What is Elasticsearch and when should you use it?
If you are here to learn about Elasticsearch, you are in the right place. Here we will discuss the basics of Elasticsearch, when to use it, and its advantages.
As per the official documentation:
- Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured data.
- It is built on Apache Lucene, an open-source full-text search library written entirely in Java.
- Elasticsearch works as a wrapper around Lucene. You use the API provided by Elasticsearch, and it translates your requests into Lucene operations when storing and retrieving your documents.
- Elasticsearch was first released in 2010.
- The relationship between Elasticsearch and Apache Lucene is like that between a car and its engine. The engine does all the work, but you do not need to know how it works to drive the car. The car company has encapsulated the mechanism of the engine behind a few controls (the API), such as the gearbox, clutch, and pedals.
Elasticsearch lets you store vast volumes of data and can typically return the data you need within milliseconds. This is because, instead of scanning the text directly, it searches an index. Your data is searchable almost as soon as it is indexed, that is, in near real time.
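To make the idea of searching an index rather than the raw text more concrete, here is a minimal sketch of an inverted index in plain Python. It is only an illustration: the function names and the whitespace tokenizer are assumptions made for this example, and the index structures Lucene actually uses are far more sophisticated.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "Lucene is a search library written in Java",
    3: "A relational database stores rows in tables",
}
index = build_inverted_index(docs)
print(sorted(search(index, "search engine")))  # prints [1]
```

A lookup only touches the entries for the query terms, so the cost of a search does not depend on reading every document.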
Basic terms to help you get around:
A field is the smallest unit of data in Elasticsearch. Each field has a data type and can hold a single piece of data. Fields support basic data types such as boolean, string, and number, as well as advanced data types such as geo data, IP addresses, and many more. Examples of fields are name, class, roll no, and location.
A document is the basic unit of information stored in Elasticsearch, expressed in JSON format. You can compare a document in Elasticsearch with a row in a relational database. Each document has a unique ID, which you can assign when adding the data to Elasticsearch. If you do not provide an ID, Elasticsearch automatically generates a unique ID for the document.
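The ID behaviour described above can be sketched with a toy in-memory store. `TinyIndex` is a hypothetical class written for this post, not the Elasticsearch client, and the `uuid`-based ID is just a stand-in for whatever ID Elasticsearch would generate.

```python
import json
import uuid


class TinyIndex:
    """Toy in-memory stand-in for an index; not the Elasticsearch client."""

    def __init__(self):
        self.docs = {}

    def index(self, document, doc_id=None):
        # Use the caller's ID if one is given, otherwise generate a unique one.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        self.docs[doc_id] = document
        return doc_id


students = TinyIndex()
explicit_id = students.index({"name": "Asha", "class": 10, "roll_no": 7}, doc_id="1")
generated_id = students.index({"name": "Ravi", "class": 10, "roll_no": 8})

# Each stored document is plain JSON-compatible data.
print(json.dumps(students.docs[explicit_id]))
print(generated_id)
```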
You can think of an index in Elasticsearch as a database in a relational database system. An index contains documents that have similar characteristics or are logically related to each other. On an eCommerce website, for example, there would be one index for products, one for customer records, one for orders, and so on.
A type is a logical partition of an index, comparable to a table in an RDBMS. For example, a blog index could have a user type, an article type, a comment type, and so on. In earlier versions, the _type field was combined with the document’s _id to generate a _uid field, so documents of different types with the same _id could exist in a single index. Mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0.
We will discuss why this has been removed in the upcoming tutorials.
Indices is the plural of index. If we have more than one index, we refer to them as indices.
Data is stored on shards so that it can be managed efficiently. Each Elasticsearch shard is an Apache Lucene index. The maximum number of documents in a single Lucene index is 2,147,483,519. When you index data in Elasticsearch, it automatically distributes the data across multiple Lucene indexes. Shards also help with performance: if an index has multiple shards on different nodes, a query can run on those shards in parallel rather than on a single shard.
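As a rough illustration of how documents can be spread across shards, the sketch below hashes a routing value (by default the document ID) and takes it modulo the number of primary shards. This is a simplified model: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen for this example, not the hash function Elasticsearch uses internally.

```python
import zlib
from collections import Counter


def pick_shard(routing_value, num_primary_shards):
    """Simplified routing: hash the routing value, then take it modulo the shard count."""
    # zlib.crc32 is a stand-in hash for this sketch only.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


num_shards = 3
placement = Counter(pick_shard(f"doc-{i}", num_shards) for i in range(1000))

# Shows how many of the 1000 hypothetical documents landed on each shard.
print(dict(placement))
```

Because the shard is derived from the routing value, the same document ID always maps to the same shard, which is how a document can be found again without asking every shard.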
Replica shards are copies of primary shards and protect against data loss in case of hardware failure. A replica is never stored on the same node as its primary. For example, if you have two nodes, with three primary shards on node one and two on node two, then the replicas of node one's shards are stored on node two, and the replicas of node two's shards are stored on node one. If a primary shard becomes unavailable because of a network problem or hardware failure, a replica is promoted to primary and takes over its role.
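The two-node example can be sketched in a few lines of Python. This is a toy model under simple assumptions (one replica per shard, hypothetical node names, a naive "least loaded" placement rule); the real shard allocation logic in Elasticsearch takes many more factors into account.

```python
def place_replicas(primaries, nodes):
    """Put each shard's replica on a node other than the one holding its primary."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to hold replicas")
    replicas = {}
    for shard, primary_node in primaries.items():
        candidates = [n for n in nodes if n != primary_node]
        # Pick the candidate node that currently holds the fewest replicas.
        replicas[shard] = min(candidates, key=lambda n: list(replicas.values()).count(n))
    return replicas


def fail_node(primaries, replicas, failed):
    """Promote the replica to primary for every shard whose primary was on the failed node."""
    return {
        shard: (replicas[shard] if node == failed else node)
        for shard, node in primaries.items()
    }


nodes = ["node-1", "node-2"]
# Three primary shards on node one, two on node two, as in the example above.
primaries = {0: "node-1", 1: "node-1", 2: "node-1", 3: "node-2", 4: "node-2"}
replicas = place_replicas(primaries, nodes)
after_failure = fail_node(primaries, replicas, "node-1")

print(replicas)
print(after_failure)
```

After `node-1` fails, every shard still has a primary on `node-2`, so no data is lost.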
A node is a single running instance of Elasticsearch. Because Elasticsearch is a distributed system, we can spread our data across multiple nodes, and each node stores some of the shards.
A cluster is a group of one or more connected Elasticsearch nodes. One node is elected as the master node, which manages cluster-wide changes such as adding and removing nodes. You can think of the cluster as the top-level entity in Elasticsearch.
Mapping is similar to a schema in an RDBMS. It defines the type of value stored in each field. We can define a mapping explicitly through the mapping API. If we do not define a mapping and index the data directly, Elasticsearch creates the mapping automatically. This process is known as dynamic mapping.
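The difference between an explicit mapping and dynamic mapping can be sketched as follows. `explicit_mapping` shows the general shape of a mapping body for a few hypothetical student fields, while `guess_field_type` is a deliberately simplified imitation of dynamic mapping written for this post. The real rules handle more cases, such as date detection and arrays.

```python
# Shape of an explicit mapping body for a hypothetical "students" index.
explicit_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "roll_no": {"type": "integer"},
            "location": {"type": "geo_point"},
        }
    }
}


def guess_field_type(value):
    """Very simplified imitation of dynamic mapping for a single JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(document):
    """Infer a field-to-type mapping from one sample document."""
    return {field: {"type": guess_field_type(v)} for field, v in document.items()}


print(guess_mapping({"name": "Asha", "roll_no": 7, "passed": True, "score": 91.5}))
```

Defining the mapping yourself is useful when the guessed type is not the one you want, for example `integer` instead of `long`, or `geo_point` for a location.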
These are the basic terms used in Elasticsearch.
Now let us look at its advantages and why it is a popular choice for large enterprises.
1) Distributed System:
Elasticsearch is distributed in nature. This means we can spread the data across multiple servers, dividing the processing load among them and improving speed.
2) Fuzzy Search:
While searching, users sometimes misspell words, for example typing “iotadal” instead of “Iotasol”. An exact-match query in an RDBMS would return no results, but Elasticsearch can still return the matching data. This is called fuzzy search.
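Fuzzy matching is based on edit distance, the number of single-character changes needed to turn one word into another. The sketch below uses plain Levenshtein distance as an approximation and a hypothetical `fuzzy_match` helper with an assumed limit of two edits. It also shows the general shape of a fuzzy query body, where the field name `company` is made up for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, term, max_edits=2):
    """True if the query is within max_edits of the term, ignoring case."""
    return edit_distance(query.lower(), term.lower()) <= max_edits


print(edit_distance("iotadal", "iotasol"))  # prints 2
print(fuzzy_match("iotadal", "Iotasol"))    # prints True

# Shape of a fuzzy query body on a hypothetical "company" field.
fuzzy_query = {
    "query": {"fuzzy": {"company": {"value": "iotadal", "fuzziness": "AUTO"}}}
}
```

The misspelling differs from the intended word by two substitutions, which is why a search that tolerates two edits still finds it.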
3) Near Real-Time Search:
A document is ready to be searched almost as soon as it is stored or indexed.
4) JSON based:
Elasticsearch uses JSON for both data storage and communication.
5) Schema Free:
Elasticsearch is schema-free; we can add our data and start using it right away. Defining the schema helps, but it is not a necessity.
6) Built-in Cache:
ES has a built-in cache mechanism to make the queries run faster by caching previous searches.
7) Vast documentation and excellent community support
8) Geo-Location Search:
Elasticsearch provides geo-location search out of the box. This topic deserves a blog of its own, which we will address in subsequent posts.
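As a small preview, a typical geo-location search finds everything within a given distance of a point. The sketch below computes that with the haversine formula in plain Python, as a stand-in for the work Elasticsearch does for you, and then shows the general shape of a `geo_distance` query. The place names, coordinates, and the `location` field are sample values chosen for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def within(places, lat, lon, max_km):
    """Names of all places no farther than max_km from the given point."""
    return [name for name, (plat, plon) in places.items()
            if haversine_km(lat, lon, plat, plon) <= max_km]


# Sample places with approximate (latitude, longitude) coordinates.
places = {
    "Delhi": (28.6139, 77.2090),
    "Gurugram": (28.4595, 77.0266),
    "Mumbai": (19.0760, 72.8777),
}
print(within(places, 28.6139, 77.2090, 50))  # prints ['Delhi', 'Gurugram']

# Shape of an equivalent query body, assuming a geo_point field named "location".
geo_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "50km",
                    "location": {"lat": 28.6139, "lon": 77.2090},
                }
            }
        }
    }
}
```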
9) Based on REST API:
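Elasticsearch is accessed over HTTP with JSON request bodies, so any language that can make an HTTP call can talk to it. The sketch below only builds the requests with Python's standard library and does not send them. The base URL assumes a node running locally on the default port, and the `products` index and sample document are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that would index one document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(index, query):
    """Build (but do not send) a POST request that would search an index."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


index_request = build_index_request("products", "1", {"name": "laptop", "price": 999})
search_request = build_search_request("products", {"match": {"name": "laptop"}})

print(index_request.get_method(), index_request.full_url)
print(search_request.get_method(), search_request.full_url)
```

With a node running, passing either request to `urllib.request.urlopen` would send it and return the JSON response.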
We will wrap up the blog at this point. You are now familiar with the basic building blocks of Elasticsearch.
If you have any questions or need more details, please feel free to reach out to us. Team Iotasol is here to help! Contact us now!