If you are here to learn about Elasticsearch, then you are at the right place. Here we will discuss the basics of Elasticsearch, when to use it, and its advantages.
Elasticsearch helps you store vast volumes of data and retrieve the data you need within milliseconds. This is possible because, instead of scanning the text directly, it searches an index that maps each term to the documents containing it. Your saved data is ready to be searched shortly after it is indexed, which is why Elasticsearch is described as near real-time.
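To see why searching an index is faster than scanning text, here is a minimal Python sketch of an inverted index, assuming a very simple tokenizer (lowercase, split on non-alphanumeric characters) as a rough stand-in for Elasticsearch's analyzers. The function names are hypothetical and this is not how Elasticsearch or Lucene implements it internally; it only shows the idea of looking terms up instead of reading every document.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-alphanumeric characters (a crude analyzer stand-in).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Map every term to the set of document IDs that contain it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    # AND semantics: return IDs of documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "1": "Elasticsearch stores JSON documents",
    "2": "Lucene powers Elasticsearch search",
    "3": "Relational databases use tables",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "elasticsearch search")))
```

A query only touches the entries for its own terms, so its cost depends on how many documents match rather than on the total amount of text stored.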
Fields are the smallest unit of data in Elasticsearch. Each field has a data type and can hold a single piece of data. Fields support basic data types such as Boolean, string, and number, as well as advanced data types such as geo data, IP address, and many more. Examples of fields are name, class, roll no, and location.
A document is the basic unit of information stored in Elasticsearch, saved in JSON format. You can compare a document in Elasticsearch to a row in a relational database. Each document has a unique ID, which you can assign while adding the data to Elasticsearch. If you do not provide an ID, Elasticsearch automatically generates a unique one for each document.
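The relationship between fields, documents, and IDs can be sketched with a tiny in-memory store. `DocumentStore` is a hypothetical class for illustration only, not the Elasticsearch client API, and the auto-generated ID uses Python's `uuid4` as a stand-in; Elasticsearch's own auto-generated IDs look different.

```python
import json
import uuid
from typing import Any, Dict, Optional


class DocumentStore:
    """Hypothetical in-memory stand-in for an index (illustration only)."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def index(self, doc: Dict[str, Any], doc_id: Optional[str] = None) -> str:
        # If the caller gives no ID, generate a unique one automatically.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Keep the document as JSON text, since documents are stored as JSON.
        self._docs[doc_id] = json.dumps(doc)
        return doc_id

    def get(self, doc_id: str) -> Dict[str, Any]:
        return json.loads(self._docs[doc_id])


store = DocumentStore()
# A document made of fields (name, class, roll_no, location) with an explicit ID.
store.index({"name": "Asha", "class": "10", "roll_no": 12, "location": "Delhi"}, doc_id="student-1")
# A document without an ID gets one generated for it.
generated_id = store.index({"name": "Ravi", "class": "9", "roll_no": 7, "location": "Pune"})
print(store.get("student-1")["roll_no"], generated_id)
```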
You can think of an index in Elasticsearch as the database in a relational database system. An index contains documents that have similar characteristics or are logically related to each other. For example, an eCommerce website might have one index for products, one for customer records, one for orders, and so on.
A type was a logical partition of an index, comparable to a table in an RDBMS. For example, you could have a blog index containing a user type, an article type, a comment type, and so on. In earlier versions, the _type field was combined with the document’s _id to generate a _uid field, so documents of different types with the same _id could exist in a single index. This was removed in Elasticsearch 7.0.
We will discuss why this has been removed in the upcoming tutorials.
Indices is simply the plural of index: when we have more than one index, we refer to them as indices.
Elasticsearch stores data in shards so that it can manage it efficiently. Each Elasticsearch shard is an Apache Lucene index, and a single Lucene index can hold at most 2,147,483,519 documents. When you index data, Elasticsearch automatically distributes it across multiple shards, and therefore across multiple Lucene indexes. Shards also help performance: if multiple shards sit on different nodes, a query can run on all of them in parallel instead of on a single shard.
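The following Python sketch illustrates both ideas: spreading documents across shards, then querying every shard in parallel and merging the results. It assumes the routing rule described in the Elasticsearch documentation, shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to the document ID. Elasticsearch uses a Murmur3 hash; `zlib.crc32` is only a stand-in here, and the three-shard count and helper names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

NUM_PRIMARY_SHARDS = 3  # assumption for this example


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # shard = hash(routing) % number_of_primary_shards (crc32 stands in for Murmur3).
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def distribute(docs: Dict[str, str], num_shards: int = NUM_PRIMARY_SHARDS) -> List[Dict[str, str]]:
    # Route each document to exactly one shard, using its ID as the routing value.
    shards: List[Dict[str, str]] = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard: Dict[str, str], term: str) -> List[str]:
    # Naive per-shard search: IDs of documents whose words include the term.
    return [doc_id for doc_id, text in shard.items() if term in text.lower().split()]


def scatter_gather(shards: List[Dict[str, str]], term: str) -> List[str]:
    # Query every shard in parallel, then merge the partial results.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term), shards))
    return sorted(doc_id for part in partials for doc_id in part)


docs = {f"doc-{i}": ("red shoes" if i % 2 == 0 else "blue shirt") for i in range(10)}
shards = distribute(docs)
print([len(s) for s in shards], scatter_gather(shards, "red"))
```

Because the shard is computed from the routing value alone, any node can work out where a document lives without a lookup table. The trade-off is that the number of primary shards cannot simply be changed later without re-routing the data.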
Replica shards are copies of the primary shards and are used to prevent data loss in case of hardware failure. A replica is never placed on the same node as its primary shard. For example, suppose an index has five primary shards spread over two nodes: three on node one and two on node two. The replicas of the three primaries on node one are stored on node two, and the replicas of the two primaries on node two are stored on node one. If a primary shard becomes unavailable due to a network problem or hardware failure, its replica is promoted to primary and takes over its role.
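Here is a simplified Python sketch of the two-node example above: each primary gets one replica on a different node, and when a node fails, the replicas of its lost primaries are promoted. The `allocate` and `fail_node` helpers are hypothetical and assume one replica per shard; Elasticsearch's real shard allocator weighs many more factors.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShardCopy:
    shard: int
    primary: bool
    node: str


def allocate(primaries: Dict[int, str], nodes: List[str]) -> List[ShardCopy]:
    """Place one replica per primary on a node other than the primary's node."""
    copies: List[ShardCopy] = []
    for shard, node in primaries.items():
        copies.append(ShardCopy(shard, True, node))
        # A replica must never share a node with its own primary.
        others = [n for n in nodes if n != node]
        if others:
            copies.append(ShardCopy(shard, False, others[shard % len(others)]))
    return copies


def fail_node(copies: List[ShardCopy], dead_node: str) -> List[ShardCopy]:
    """Drop every copy on the dead node and promote replicas of lost primaries."""
    lost_primaries = {c.shard for c in copies if c.primary and c.node == dead_node}
    survivors = [c for c in copies if c.node != dead_node]
    for c in survivors:
        if c.shard in lost_primaries and not c.primary:
            c.primary = True
            lost_primaries.discard(c.shard)  # promote only one replica per shard
    return survivors


# Five primaries: three on node-1, two on node-2 (the example from the text).
primaries = {0: "node-1", 1: "node-1", 2: "node-1", 3: "node-2", 4: "node-2"}
copies = allocate(primaries, ["node-1", "node-2"])
after = fail_node(copies, "node-1")
print([(c.shard, c.primary, c.node) for c in after])
```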
A node is a single running instance of Elasticsearch. Because Elasticsearch is a distributed system, we can spread our data across multiple nodes, and each node stores some of the shards.
A cluster is a group of one or more connected Elasticsearch nodes. One node is elected as the master node, which manages the adding and removal of other nodes. You can think of the cluster as the top-level entity in Elasticsearch.
Mapping is similar to a schema in an RDBMS. It defines the type of value stored in each field. We can define a mapping explicitly using the mapping API. Even if we do not define a mapping and index the data directly, Elasticsearch creates one automatically. This process is known as dynamic mapping.
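To make dynamic mapping concrete, here is a simplified Python sketch of the kind of inference Elasticsearch performs when it sees a new JSON document: booleans become `boolean`, whole numbers `long`, decimals `float`, date-like strings `date`, and other strings `text` with a `keyword` sub-field. The helper names are hypothetical, and the date detection here only recognizes `YYYY-MM-DD` strings, whereas Elasticsearch's default date detection accepts more formats.

```python
from datetime import date
from typing import Any, Dict


def infer_field_type(value: Any) -> Dict[str, Any]:
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # simplified date detection (YYYY-MM-DD)
            return {"type": "date"}
        except ValueError:
            # Plain strings: full-text field plus an exact-match keyword sub-field.
            return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return infer_mapping(value)  # nested object gets its own properties
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        return infer_field_type(next(v for v in value if v is not None))
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    properties: Dict[str, Any] = {}
    for field, value in doc.items():
        # Nulls and empty arrays carry no type information, so they are skipped.
        if value is None or (isinstance(value, list) and all(v is None for v in value)):
            continue
        properties[field] = infer_field_type(value)
    return {"properties": properties}


print(infer_mapping({"name": "Asha", "roll_no": 12, "active": True, "score": 9.5, "joined": "2024-01-15"}))
```

Defining the mapping yourself is still worthwhile when inference would guess wrong, for example a numeric-looking ID that you want treated as a keyword.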
These are the basic terms used in Elasticsearch.
Elasticsearch is distributed in nature. This means we can spread the data across multiple servers, dividing the processing load among them and improving speed.
While searching, users sometimes misspell words, for example typing “iotadal” instead of “Iotasol”. In an RDBMS, an exact-match query like this would return no results, but Elasticsearch can still return the matching data. This capability is called fuzzy search.
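Fuzzy search works by allowing a small number of single-character edits between the query and an indexed term. The Python sketch below uses plain Levenshtein distance together with a cut-off modeled on Elasticsearch's documented `AUTO` fuzziness (exact match for terms of 1 to 2 characters, one edit for 3 to 5, two edits for longer terms). This is a simplification: Elasticsearch's fuzzy query also counts a swap of two adjacent characters as a single edit by default, and it searches the term dictionary far more efficiently than this linear scan. The function names are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when chars match)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    # Modeled on "AUTO": 0 edits for 1-2 chars, 1 edit for 3-5, 2 edits for longer terms.
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_search(query: str, terms: List[str]) -> List[str]:
    # Assumes terms are compared case-insensitively, as a lowercasing analyzer would do.
    q = query.lower()
    max_edits = auto_fuzziness(q)
    return [t for t in terms if levenshtein(q, t.lower()) <= max_edits]


print(fuzzy_search("iotadal", ["Iotasol", "Elastic", "Lucene"]))
```

"iotadal" and "iotasol" differ by two substitutions, and a seven-character query allows two edits, so the misspelled query still finds "Iotasol".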
As soon as a document is stored or indexed, it is ready to be searched in near real-time.
Elasticsearch uses JSON for data storage and communication.
Elasticsearch is schema-free: we can add our data and start using it right away. Defining a schema (mapping) helps, but it is not a necessity.
Elasticsearch has a built-in cache mechanism that makes repeated queries run faster by caching the results of previous searches.
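The basic idea behind caching previous searches can be shown with a small least-recently-used (LRU) cache keyed by the query. This is an illustrative Python sketch only: `QueryCache` is a hypothetical class, and Elasticsearch's actual caches (such as the node query cache and the shard request cache) are more specialized and are invalidated when the underlying data changes, which the `invalidate` method only loosely imitates.

```python
from collections import OrderedDict
from typing import Any, Callable


class QueryCache:
    """Tiny LRU cache keyed by query string (illustrative, not Elasticsearch's implementation)."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, query: str, compute: Callable[[str], Any]) -> Any:
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        result = compute(query)  # run the (expensive) search only on a miss
        self._entries[query] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result

    def invalidate(self) -> None:
        # Cached results become stale once the data changes, so drop them all.
        self._entries.clear()


cache = QueryCache(capacity=2)
run_search = lambda q: f"results for {q}"  # hypothetical stand-in for a real search
cache.get_or_compute("shoes", run_search)
cache.get_or_compute("shoes", run_search)  # served from the cache
print(cache.hits, cache.misses)
```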
Elasticsearch provides geo-location search out of the box. This topic deserves a separate blog, which we will cover in a subsequent post.
We will wrap up the blog at this point. You are now familiar with the core building blocks of Elasticsearch.