elasticsearch Box: https: //www.cnblogs.com/hello-shf/category/1550315.html
First, prior knowledge
Before curd document for in-depth analysis, we have to understand the following little knowledge, a few do not understand what knowledge we will be hard to understand how the document is of CRUD.
1.1 Routing (index) the primary shard immutable
We have not considered this issue, when you index a document that is stored on a single master slice. How Elasticsearch know which documents belong to slice it? When you create a new document, how it is to know should be stored in slices 1 or 2 slices on it? The process is not random, because we want to retrieve documents in the future. In fact, it is based on a simple algorithm decision:
shard = hash(routing) % number_of_primary_shards
routing value is an arbitrary string, which is the default, but can also be customized _id. The routing string generated by the hash function a number, and then dividing by the number of main sections of a remainder obtained (REMAINDER), the remainder is always the range 0 to number_of_primary_shards – 1, this figure is a fragment of a particular document is located.
This also explains why the number of primary fragmentation can only be defined when creating an index and can not be modified: if the number of primary fragmentation change in the future, all the previous routing value becomes ineffective, the document will never be found.
We show you the process of this route. Suppose that we have three nodes, a student index, corresponding to the three primary shard and a replica shard. At this time, the cluster shown in Figure 1
Inserting a document to the node, and we specify _id (es can be customized in the _id es may be automatically generated) If _id = 1000, we have described above, according to this case will be calculated according to the following algorithm which will be hit which shard. Assuming that hash (1000) = 13;
shard = hash(routing) % number_of_primary_shards 即： shard = 13 % 3 = 1；（假设hash（1000） = 13
) Note: Because es hash function specifically how to calculate the unknown, not important, we are mainly concerned about the principle.
The calculation of the available interrupt request will hit P1, shard case the document will be inserted into P1. Is not it simple.
Above it is also es routing process, also known as es index (the index is a verb, to understand it) process.
1.2, shard load balancing and peer nodes
Each node in the cluster es, each Shard (including primary shard and replica shard) are equipped to handle any request. This means that between the nodes in the cluster are highly es load balancing, that is not the only entrance is the master node traffic, each node is equipped to handle the request. primary shard and a replica shard is also the height of the load balancing, because not only the primary shard only have the ability to deal with curd, replica shard can process the request retrieval. This is also why the performance of the performance of one of es so well.
Two, document add, delete, change
2.1, additions and deletions process analysis
New, indexing and deletion requests are written (write) operation, they must successfully complete before copied to the relevant replica shard fragments on the primary shard.
About new document indexing process can refer to the “es of the indexing process.”
As shown above, the client sends a request to the client in response es cluster can be divided into six stages or more.
Node1 client to initiate add, delete, change your request. node1 will carry out related work as a coordinator node (coordinate node).
node1 calculated according to hit the primary shard document _id is P1, then forwards the request to node2, P1 slice located above the node2.
node2 process the request on P1. If the request is processed successfully, node2 will continue to forward the request to its copy of the R1. R1 is located node3.
R1 node3 message processed in the request, if successful, will node3 process is successful is returned to the node2.
node2 receive a copy of the message P1 process is successful, which means that the request has been processed. Then returns the processing result to the node node1.
After node1 coordinating node receives a response result, the result returned to the client.
The whole process is complete.
See here we are not also considering a request comes in, I have to wait for all fragments are considered processed this operation is done, so is not it affect the response speed. Based on this thinking, es also provides us with the support custom parameters, for example, we can use replication parameter to specify the primary shard is not to wait until the response to the client after processing is complete replica shard. However, this configuration is not recommended configuration parameters, we know there is such a thing on the line.
2.1, write guarantee consistency
1 one：要求我们这个写操作，只要有一个primary shard是active活跃可用的，就可以执行。 2 all：要求我们这个写操作，必须所有的primary shard和replica shard都是活跃的，才可以执行这个写操作 3 quorum：默认的值，要求所有的shard中，必须是大部分的shard都是活跃的，可用的，才可以执行这个写操作
The above three points is actually very good understanding, only a quorum so-called “most” feel not so clear. Formulas below, a write operation is to be performed when the number of fragments in the cluster Active (available) to achieve the following formula results. Otherwise the operation will fail.
int( (primary + number_of_replicas) / 2 ) + 1
Still with our example above, suppose a student we created index, and is provided to three primary shard, replica shard is 1 (this is an index with respect to it, for the primary slice means each of the numbers 1 a primary shard corresponds to the existence of a copy). Which means primary = 3, number_of_replicas = 1 (still relative to the index). The total number of shard of 6.
At this time, the above calculation formula can be seen:
int((3+1)/2) + 1 = 3
Of course, if we do not specify that use the default, which is the quorum.
Three, document retrieval
And document retrieval process is slightly different additions and deletions: From the main documents can slice (primary shard) or a copy of any slice (replicashard) are retrieved.
Retrieval process can be divided into four stages
The client sends a search request to node1. node1 will carry out related work as a coordinator node (coordinate node).
node1 calculated according to the document _id hit the primary shard for the P1, P1 node1 will find all copies, and then through the round-robin polling algorithm randomly, randomly selected in a primary shard and all of its replica, so that the read request load balancing. If this time is randomly selected P1, node1 forwards the request to the node corresponding to P1.
P1 processed the request returns the result to the coordinator node node1.
Coordinating node node1 node2 receives the respective results, and then returns the result to the client.
“Elasticsearch- Definitive Guide”
If the wrong place also please leave a message correction.
The original is not easy, please indicate the original address: https: //www.cnblogs.com/hello-shf/p/11543480.html