Hmmm, if you are here to learn about Elasticsearch, then you are in the right place. In this blog, we are going to discuss the basics of Elasticsearch, when to use it, and the advantages of using it.
As per the official documentation:
Elasticsearch lets you store vast volumes of data and retrieve the data you need within milliseconds. This is possible because, instead of scanning the text directly, it searches an index (an inverted index that maps each term to the documents containing it). Your data is ready to be searched shortly after it is indexed, which is why Elasticsearch is described as near real-time.
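To make "it searches on an index" concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy model, not Elasticsearch's (Lucene's) actual implementation: the tokenizer, the function names, and the sample documents are hypothetical, there is no relevance scoring, and the query uses AND semantics for simplicity (Elasticsearch's `match` query defaults to OR).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified analyzer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document IDs that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Look up each query term and keep documents that contain all of them.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "1": "Red running shoes",
    "2": "Blue running jacket",
    "3": "Red leather jacket",
}
index = build_inverted_index(docs)
print(sorted(search(index, "red jacket")))  # ['3']
print(sorted(search(index, "running")))     # ['1', '2']
```

The search never reads the document text; it only intersects precomputed sets of IDs, which is why lookups stay fast as the number of documents grows.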
Basic terms to help you find your way around:
1) Fields: Fields are the smallest unit of data in Elasticsearch. Each field has a data type and holds a piece of data. Elasticsearch supports basic data types such as boolean, string (text and keyword), and numeric types, as well as advanced data types such as geo-points, IP addresses, and many more. Examples of fields are name, class, roll number, and location.
2) Documents: A document is the basic unit of information that can be stored in Elasticsearch. You can compare a document in Elasticsearch to a row in a relational database. In Elasticsearch, a document is stored in JSON format. Each document has a unique ID, which you can assign when adding the data to Elasticsearch. If you do not provide an ID, Elasticsearch automatically generates a unique ID for each document.
3) Index: You can think of an index in Elasticsearch as a database in a relational database system. An index contains documents that have similar characteristics or are logically related to each other. For example, an e-commerce website might have one index for products, one for customer records, one for orders, and so on.
4) Type: Types were logical partitions of an index; in RDBMS terms, you can compare a type to a table. For example, you could have a blog index, and inside it you could have types such as user, article, and comment. In earlier versions, the _type field was combined with the document’s _id to generate a _uid field, so documents of different types with the same _id could exist in a single index. Mapping types have been removed starting with Elasticsearch 7.0.
We will discuss why types were removed in upcoming tutorials.
5) Indices: Indices is the plural of index. If we have more than one index, we refer to them as indices.
6) Shards: All the data in an index is stored in shards. To manage data in Elasticsearch, we divide an index's data into shards, and each shard is an Apache Lucene index. A single Lucene index can hold at most 2,147,483,519 documents. When you index your data, Elasticsearch automatically distributes the documents across the shards, and therefore across multiple Lucene indexes. Shards also help performance: if the shards are spread over different nodes, a query can run on multiple shards in parallel rather than on a single shard.
7) Replica: Replica shards are copies of the primary shards and are used to prevent data loss in case of hardware failure. A replica is never allocated on the same node as its primary shard. For example, if you have two nodes, where node 1 holds 3 primary shards and node 2 holds 2 primary shards, then the replicas of node 1's shards are stored on node 2, and the replicas of node 2's shards are stored on node 1. If a primary shard becomes unavailable due to a network problem or hardware failure, one of its replicas is promoted to primary and takes over its role.
8) Node: A node is a single running instance of Elasticsearch. Elasticsearch is a distributed system, so we can distribute our data across multiple nodes, and each node stores some of the shards.
9) Cluster: A cluster is a group of one or more Elasticsearch nodes that are connected together. One node is elected as the master node, which manages cluster-wide tasks such as adding and removing nodes. You can think of the cluster as the highest-level entity in Elasticsearch.
10) Mapping: A mapping is similar to a schema in an RDBMS. It defines what type of value is stored in each field. We can define a mapping explicitly using the mapping API. If we index data without defining a mapping, Elasticsearch creates one automatically by inferring the field types. This process is known as dynamic mapping.
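The Shards item says Elasticsearch distributes documents across shards automatically. In simplified form, the target shard is chosen as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's `_id`. The sketch below assumes that simplified formula and uses `zlib.crc32` as a stand-in hash (Elasticsearch itself uses Murmur3); the document IDs are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash: Elasticsearch uses Murmur3; crc32 is used here only
    # because it is stable across runs and ships with Python.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_PRIMARY_SHARDS = 3
shards: dict[int, list[str]] = {i: [] for i in range(NUM_PRIMARY_SHARDS)}

for doc_id in ["p-100", "p-101", "p-102", "p-103", "p-104", "p-105"]:
    # By default the routing value is the document _id.
    shards[shard_for(doc_id, NUM_PRIMARY_SHARDS)].append(doc_id)

for shard_id, ids in shards.items():
    print(shard_id, ids)
```

Because the shard number depends on the number of primary shards, changing that number would send existing documents to different shards; this is one reason the primary shard count cannot simply be changed on an existing index.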
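The Replica item's two-node example can be modelled as a small simulation: place primaries round-robin, put each replica on a different node, and promote a surviving replica when a node fails. This is a simplified sketch with hypothetical names (`allocate`, `fail_node`); Elasticsearch's real shard allocator weighs many more factors, and promotion is coordinated by the master node.

```python
from dataclasses import dataclass, replace


@dataclass
class ShardCopy:
    shard_id: int
    node: str
    primary: bool


def allocate(num_shards: int, nodes: list[str]) -> list[ShardCopy]:
    # Place primaries round-robin; each replica goes to the next node,
    # so a replica never sits on the same node as its primary.
    if len(nodes) < 2:
        raise ValueError("a replica needs a second node")
    copies = []
    for shard_id in range(num_shards):
        primary_node = nodes[shard_id % len(nodes)]
        replica_node = nodes[(shard_id + 1) % len(nodes)]
        copies.append(ShardCopy(shard_id, primary_node, True))
        copies.append(ShardCopy(shard_id, replica_node, False))
    return copies


def fail_node(copies: list[ShardCopy], dead: str) -> list[ShardCopy]:
    # Drop copies on the failed node (copying the rest so the input is
    # untouched) and promote a replica for any shard that lost its primary.
    survivors = [replace(c) for c in copies if c.node != dead]
    for shard_id in {c.shard_id for c in survivors}:
        group = [c for c in survivors if c.shard_id == shard_id]
        if not any(c.primary for c in group):
            group[0].primary = True
    return survivors


# 5 primaries on 2 nodes: node-1 gets shards 0, 2, 4 and node-2 gets 1, 3.
cluster = allocate(5, ["node-1", "node-2"])
after = fail_node(cluster, "node-1")
print(sorted(c.shard_id for c in after if c.primary))  # [0, 1, 2, 3, 4]
```

After node 1 fails, node 2 still holds a copy of every shard, so all five shards keep serving data once the replicas of shards 0, 2, and 4 are promoted.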
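Dynamic mapping, described in the Mapping item, infers a field type from each JSON value. The sketch below mimics a few of the default rules (booleans become `boolean`, whole numbers `long`, decimals `float`, strings `text` with a `keyword` sub-field, objects nested `properties`). It is a simplification: date detection, arrays, and null values are not handled, and the sample document and its field names are hypothetical.

```python
def infer_mapping(doc: dict) -> dict:
    # Simplified dynamic mapping: infer a field type from each JSON value.
    properties = {}
    for field, value in doc.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, str):
            # Strings become full-text "text" plus an exact-match "keyword" sub-field.
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        elif isinstance(value, dict):
            # Inner objects get their own nested "properties".
            properties[field] = infer_mapping(value)
    return {"properties": properties}


student = {"name": "Asha", "roll_no": 42, "present": True, "address": {"city": "Pune"}}
mapping = infer_mapping(student)
print(mapping["properties"]["roll_no"])  # {'type': 'long'}
```

Relying on inference is convenient for getting started, but an explicit mapping avoids surprises such as a numeric ID being indexed as full-text because its first value arrived as a string.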
Now that we know the basic terminology used in Elasticsearch (ES), let us talk about its advantages and why it is a popular choice for large enterprises.
1) Distributed System: Elasticsearch is distributed by nature, which means we can spread the data across multiple servers. The processing load is divided among the servers, which helps improve speed.
2) Fuzzy Search: When searching, users sometimes misspell words, for example typing “iotadal” instead of “Iotasol”. A typical exact-match query in an RDBMS would return no results, but Elasticsearch can still return the matching data when fuzzy matching is enabled on the query. This is called fuzzy search.
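3) Near Real-Time: Shortly after a document is stored (indexed), it is ready to be searched; by default, Elasticsearch refreshes its indices every second to make new documents searchable.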
4) JSON-based: Data is stored as JSON documents, and requests and responses are also JSON.
5) Schema-Free: ES is schema-free, i.e. we can simply add our data and start using ES. Defining a schema (mapping) helps, but it is not required.
6) Built-in cache: ES has a built-in cache mechanism to make the queries run faster by caching previous searches.
7) Extensive documentation and excellent community support.
8) Geo-location search: Elasticsearch provides geo-location search out of the box. This topic deserves a blog of its own, which we will address in subsequent posts.
9) REST API: Elasticsearch is accessed through a RESTful HTTP API.
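The Distributed System advantage relies on a scatter-gather pattern: the query is sent to every shard in parallel, each shard returns its best local hits, and a coordinating step merges them. The sketch below models that with threads; the scoring (counting term occurrences) is a hypothetical stand-in for Elasticsearch's relevance scoring, and in a real cluster the shards live on different nodes rather than in one process.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

Shard = dict[str, str]  # doc_id -> text (toy shard)


def search_shard(shard: Shard, term: str, k: int) -> list[tuple[int, str]]:
    # Score = number of occurrences of the term (stand-in for relevance).
    hits = [(text.lower().split().count(term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [h for h in hits if h[0] > 0])


def distributed_search(shards: list[Shard], term: str, k: int) -> list[tuple[int, str]]:
    # Scatter: query every shard in parallel. Gather: merge the per-shard top-k.
    with ThreadPoolExecutor() as pool:
        per_shard = pool.map(lambda s: search_shard(s, term.lower(), k), shards)
        return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


shards = [
    {"1": "red red shoe", "2": "blue shoe"},
    {"3": "red hat", "4": "red red red scarf"},
]
print(distributed_search(shards, "red", 2))  # [(3, '4'), (2, '1')]
```

Each shard only has to return its own top `k` hits, so the merge step stays small no matter how many documents each shard holds.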
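Fuzzy search, from the Fuzzy Search item, is based on edit distance: a term matches if it can be turned into the query by a few single-character edits. The sketch below uses classic Levenshtein distance and mirrors the `AUTO` fuzziness rule (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two). Elasticsearch's fuzzy matching also counts a transposition of two adjacent characters as one edit by default, which this simplified version does not; the candidate list is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    # Mirrors the AUTO rule: 0-2 chars exact, 3-5 chars one edit, longer two edits.
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query: str, terms: list[str]) -> list[str]:
    q = query.lower()
    max_edits = auto_fuzziness(q)
    return [t for t in terms if levenshtein(q, t.lower()) <= max_edits]


print(fuzzy_match("iotadal", ["Iotasol", "Elastic", "iota"]))  # ['Iotasol']
```

“iotadal” and “iotasol” differ by two substitutions, which is within the two edits allowed for a seven-character term, while “iota” needs three deletions and is rejected.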
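The Near Real-Time item comes down to a refresh step: newly indexed documents sit in a buffer and become visible to search only after the next refresh, which Elasticsearch performs every second by default. The toy class below (hypothetical, not an Elasticsearch API) makes that gap visible by requiring an explicit `refresh()` call.

```python
class NearRealTimeIndex:
    # Toy model: writes go to a buffer and become searchable only after refresh().
    def __init__(self) -> None:
        self._buffer: list[tuple[str, str]] = []
        self._searchable: dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Accepted immediately, but not yet visible to search.
        self._buffer.append((doc_id, text))

    def refresh(self) -> None:
        # Elasticsearch does this periodically (every 1 second by default).
        for doc_id, text in self._buffer:
            self._searchable[doc_id] = text
        self._buffer.clear()

    def search(self, word: str) -> list[str]:
        # Only refreshed documents are considered.
        return [d for d, t in self._searchable.items() if word.lower() in t.lower().split()]


idx = NearRealTimeIndex()
idx.index("1", "Iotasol blog")
print(idx.search("blog"))  # []
idx.refresh()
print(idx.search("blog"))  # ['1']
```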
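Because Elasticsearch is driven by a REST API with JSON bodies, any HTTP client can talk to it. The sketch below only builds two requests with Python's standard `urllib`: `PUT /<index>/_doc/<id>` to store a document under an explicit ID, and a `_search` request with a `match` query using `"fuzziness": "AUTO"`. It assumes a local, unsecured cluster at `http://localhost:9200` (Elasticsearch 8.x enables TLS and authentication by default, which this sketch ignores), and the `products` index is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured cluster


def index_request(index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    # PUT /<index>/_doc/<id> stores a JSON document under an explicit ID.
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def fuzzy_search_request(index: str, field: str, text: str) -> urllib.request.Request:
    # POST /<index>/_search with a match query that tolerates typos.
    body = {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_request("products", "1", {"name": "Red running shoes"})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/products/_doc/1
# With a cluster running, send it with: urllib.request.urlopen(req)
```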
We will wrap up the blog at this point. You are now familiar with the core building blocks of Elasticsearch.