elasticsearch underlying principles CRUD

elasticsearch Box: https: //

First, prior knowledge

Before curd document for in-depth analysis, we have to understand the following little knowledge, a few do not understand what knowledge we will be hard to understand how the document is of CRUD.


1.1 Routing (index) the primary shard immutable

We have not considered this issue, when you index a document that is stored on a single master slice. How Elasticsearch know which documents belong to slice it? When you create a new document, how it is to know should be stored in slices 1 or 2 slices on it? The process is not random, because we want to retrieve documents in the future. In fact, it is based on a simple algorithm decision:

shard = hash(routing) % number_of_primary_shards 

routing value is an arbitrary string, which is the default, but can also be customized _id. The routing string generated by the hash function a number, and then dividing by the number of main sections of a remainder obtained (REMAINDER), the remainder is always the range 0 to number_of_primary_shards – 1, this figure is a fragment of a particular document is located.

This also explains why the number of primary fragmentation can only be defined when creating an index and can not be modified: if the number of primary fragmentation change in the future, all the previous routing value becomes ineffective, the document will never be found.

We show you the process of this route. Suppose that we have three nodes, a student index, corresponding to the three primary shard and a replica shard. At this time, the cluster shown in Figure 1

Inserting a document to the node, and we specify _id (es can be customized in the _id es may be automatically generated) If _id = 1000, we have described above, according to this case will be calculated according to the following algorithm which will be hit which shard. Assuming that hash (1000) = 13;

shard = hash(routing) % number_of_primary_shards
shard = 13 % 3 = 1;(假设hash(1000) = 13

) Note: Because es hash function specifically how to calculate the unknown, not important, we are mainly concerned about the principle.

The calculation of the available interrupt request will hit P1, shard case the document will be inserted into P1. Is not it simple.

Above it is also es routing process, also known as es index (the index is a verb, to understand it) process.


1.2, shard load balancing and peer nodes

Each node in the cluster es, each Shard (including primary shard and replica shard) are equipped to handle any request. This means that between the nodes in the cluster are highly es load balancing, that is not the only entrance is the master node traffic, each node is equipped to handle the request. primary shard and a replica shard is also the height of the load balancing, because not only the primary shard only have the ability to deal with curd, replica shard can process the request retrieval. This is also why the performance of the performance of one of es so well.


Two, document add, delete, change

2.1, additions and deletions process analysis

New, indexing and deletion requests are written (write) operation, they must successfully complete before copied to the relevant replica shard fragments on the primary shard.

About new document indexing process can refer to the “es of the indexing process.”




As shown above, the client sends a request to the client in response es cluster can be divided into six stages or more.

Stage 1:

Node1 client to initiate add, delete, change your request. node1 will carry out related work as a coordinator node (coordinate node).

Phase 2:

node1 calculated according to hit the primary shard document _id is P1, then forwards the request to node2, P1 slice located above the node2.

Stage 3:

node2 process the request on P1. If the request is processed successfully, node2 will continue to forward the request to its copy of the R1. R1 is located node3.

Stage 4:

R1 node3 message processed in the request, if successful, will node3 process is successful is returned to the node2.

Stage 5:

node2 receive a copy of the message P1 process is successful, which means that the request has been processed. Then returns the processing result to the node node1.

Stage 6:

After node1 coordinating node receives a response result, the result returned to the client.

The whole process is complete.

See here we are not also considering a request comes in, I have to wait for all fragments are considered processed this operation is done, so is not it affect the response speed. Based on this thinking, es also provides us with the support custom parameters, for example, we can use replication parameter to specify the primary shard is not to wait until the response to the client after processing is complete replica shard. However, this configuration is not recommended configuration parameters, we know there is such a thing on the line.

replication: The default valuesyncThis value means that primary shard will respond to client needs to wait until after all the copies of fragments have been completed. If we set the value toasync, Will return after completion of primary shard means to the client, but that does not mean it does not forward the request to the copy of the master slice still forwards the requests to the replica shard, but we are no longer sure a copy We not have completed this request, which would not guarantee data consistency.

2.1, write guarantee consistency

It should be noted first that, in fact, are additions and deletions to a write operation, so write here refers to the three additions and deletions to the operation.
Here we are talking about write consistency refers to the consistency of the data on the primary shard and a replica of the shard. es API provides us with a customizable parameter consistency. This parameter allows us to customize the additions and deletions to deal with a request, is not necessary to require all the fragments are active will be performed.
This optional parameter values ​​are three: one, all, quorum (default, default).
1 one:要求我们这个写操作,只要有一个primary shard是active活跃可用的,就可以执行。
2 all:要求我们这个写操作,必须所有的primary shard和replica shard都是活跃的,才可以执行这个写操作
3 quorum:默认的值,要求所有的shard中,必须是大部分的shard都是活跃的,可用的,才可以执行这个写操作

The above three points is actually very good understanding, only a quorum so-called “most” feel not so clear. Formulas below, a write operation is to be performed when the number of fragments in the cluster Active (available) to achieve the following formula results. Otherwise the operation will fail.

int( (primary + number_of_replicas) / 2 ) + 1

Still with our example above, suppose a student we created index, and is provided to three primary shard, replica shard is 1 (this is an index with respect to it, for the primary slice means each of the numbers 1 a primary shard corresponds to the existence of a copy). Which means primary = 3, number_of_replicas = 1 (still relative to the index). The total number of shard of 6.

At this time, the above calculation formula can be seen:

int((3+1)/2) + 1 = 3

That is when the number of shard cluster available in> = 3 write operation is to be performed.
Having said that did not seem to explain the above have to do with write consistency. es to ensure consistency written by quorum is ensured, that the quorum required number of available shard es cluster meet certain requirements to perform. Also indirectly ensure data consistency shard of.
Specific use is also very simple
PUT /index/type/id?consistency=quorum

Of course, if we do not specify that use the default, which is the quorum.


Three, document retrieval

And document retrieval process is slightly different additions and deletions: From the main documents can slice (primary shard) or a copy of any slice (replicashard) are retrieved.


Retrieval process can be divided into four stages

Stage 1:

The client sends a search request to node1. node1 will carry out related work as a coordinator node (coordinate node).

Phase 2:

node1 calculated according to the document _id hit the primary shard for the P1, P1 node1 will find all copies, and then through the round-robin polling algorithm randomly, randomly selected in a primary shard and all of its replica, so that the read request load balancing. If this time is randomly selected P1, node1 forwards the request to the node corresponding to P1.

Stage 3:

P1 processed the request returns the result to the coordinator node node1.

Stage 4:

Coordinating node node1 node2 receives the respective results, and then returns the result to the client.

It is possible that an indexed document already exists on the primary shard without having time to synchronize to the replication shard. The copy shard reports that the document was not found and the primary shard returns the document successfully. Once the index request is successfully returned to the user, the document is available on both the primary and replication shard.
A request for mget and bulk quantities, and there is little difference between a single document retrieval, the difference is the coordinator node needs to calculate slices of each document resides. It is split into multiple document request for the requested document each tile, then forward each of the participating nodes. Upon receiving the response of each node, and organize them into a single combined response response, and finally returned to the client.



“Elasticsearch- Definitive Guide”


If the wrong place also please leave a message correction.

The original is not easy, please indicate the original address: https: //

Leave a Reply