Elasticsearch for Beginners


One, the interesting story behind Elasticsearch

Many years ago, a newly married, unemployed developer named Shay Banon moved to London with his wife, who was learning to cook there. While looking for a paying job, he started working with an early version of Lucene in order to build a recipe search engine for her. Using Lucene directly is difficult, so Shay began writing an abstraction layer that let Java developers add search to their programs easily. He released it as his first open source project, Compass. Later Shay took a job that mainly involved in-memory data grids in a high-performance, distributed environment. The need for a high-performance, real-time, distributed search engine was especially clear there, so he decided to rewrite Compass, turn it into a standalone service, and name it Elasticsearch. The first public version was released in February 2010. Since then Elasticsearch has become one of the most active projects on GitHub, with more than 300 contributors (currently 736). A company has formed around Elasticsearch to provide commercial services and develop new features, but Elasticsearch will always remain open source and available to everyone. It is said that Shay's wife is still waiting for her recipe search engine …


Two, Elasticsearch overview

Elasticsearch is an open source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, full-featured search engine library available today, open source or proprietary. But Lucene is just a library. To use it to the full, you need to work in Java and integrate Lucene directly into your application. Worse, you may need a degree in information retrieval to understand how it works. Lucene is very complex. Elasticsearch is also written in Java and uses Lucene internally for indexing and search, but its goal is to make full-text search easy by hiding the complexity of Lucene behind a simple, consistent RESTful API. However, Elasticsearch is more than just Lucene, and more than just a full-text search engine. It can be accurately described as follows:

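1. A distributed real-time document store in which every field can be indexed and searched
2. A distributed search engine with real-time analytics
3. Able to scale to hundreds of server nodes and to handle petabytes of structured or unstructured data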


2.1, Elasticsearch functions

1, distributed search engine

ES can serve as a distributed search engine. For on-site search such as Baidu's, product search such as Taobao's, or a general web search system, ES is a good technology choice.

2, data analysis engine

ES provides a rich API that supports personalized search and, on top of search, data analysis. On an e-commerce site, for example, we can query the best-selling products of the last few days.
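As a rough illustration of querying the best-selling products of the last few days, the request body below combines a range query with a terms aggregation. The index layout is hypothetical: the field names `sold_at` and `product_id` (assumed to be a date field and a keyword field, with one document per sale), the aggregation name `top_products`, and the three-day window are assumptions made for this sketch.

```python
import json


def best_sellers_query(days: int = 3, top_n: int = 10) -> dict:
    """Build a search body that ranks products by number of sales in the last `days` days.

    `sold_at` and `product_id` are hypothetical field names.
    """
    return {
        "size": 0,  # return only the aggregation result, not the matching documents
        "query": {
            # Date math: `days` days ago, rounded down to the start of that day.
            "range": {"sold_at": {"gte": f"now-{days}d/d"}}
        },
        "aggs": {
            "top_products": {
                # One bucket per product; keep the top_n buckets with the most documents.
                "terms": {"field": "product_id", "size": top_n}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(best_sellers_query(), indent=2))
```

The printed JSON is the kind of body that would follow a `_search` request line in the Kibana console.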

3, near real-time processing of massive data

ES is a distributed search engine. Internally it uses clustering and sharding to spread the storage and retrieval of massive data sets across multiple servers, which greatly improves scalability and disaster recovery.

Near real-time is a relative concept: if responses typically arrive within seconds, we call the system near real-time. For ES it covers two things. First, written data can be searched after about 1s. Second, search and analysis respond within seconds.
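The first point, written data becoming searchable after about 1s, can be pictured with a toy model: new documents sit in a buffer and only become visible to search when the index is refreshed. The class below is a simplified illustration, not how Elasticsearch is implemented; the class name, the method names, and the explicit `now` clock argument are all made up for this sketch.

```python
class ToyNearRealTimeIndex:
    """Toy model: writes are buffered and become searchable on the next refresh."""

    def __init__(self, refresh_interval: float = 1.0):
        self.refresh_interval = refresh_interval
        self._buffer = []        # written but not yet searchable
        self._searchable = []    # visible to search
        self._last_refresh = 0.0

    def index(self, doc: str, now: float) -> None:
        """Accept a document at time `now`; it is not searchable yet."""
        self._maybe_refresh(now)
        self._buffer.append(doc)

    def search(self, word: str, now: float) -> list:
        """Return the searchable documents that contain `word`."""
        self._maybe_refresh(now)
        return [d for d in self._searchable if word in d.split()]

    def _maybe_refresh(self, now: float) -> None:
        # Move buffered documents into the searchable set once per interval.
        if now - self._last_refresh >= self.refresh_interval:
            self._searchable.extend(self._buffer)
            self._buffer.clear()
            self._last_refresh = now
```

A document indexed at `now=0.2` is invisible to a search at `now=0.5` and visible to a search at `now=1.3`, because a refresh falls in between.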


2.2, Elasticsearch features

1, distributed

ES is a distributed search engine, so it copes well with the usual distributed concerns: data migration, disaster recovery, dynamic scaling, and load balancing.

2, massive data

ES can handle petabyte-scale data. Because its distributed architecture supports dynamic scaling, processing and storing massive data sets is no longer a problem.


Three, some basic Elasticsearch concepts

1, basic data concepts in ES


An index is similar to a "database" in a relational database: it is where we store and index related data.

Tip: In fact, our data is stored and indexed in shards, and an index is just a logical namespace that groups one or more shards together. However, this is an internal detail: our program does not need to care about shards at all. As far as the program is concerned, documents are stored in an index. Elasticsearch takes care of the rest.


A type is similar to a table in MySQL.

In an application, we use objects to represent "things" such as a user, a blog post, a comment, or an email. Each object belongs to a class, which defines the attributes or data associated with the object. Objects of the user class might have a name, sex, age, and email address. In a relational database, we usually store objects of the same class in the same table, because they share the same structure. Similarly, in Elasticsearch, we use documents of the same type to represent the same kind of "thing", because they share the same data structure. Each type has its own mapping, or schema definition, much like the columns of a table in a traditional database. All documents of a type live in the index, and the type's mapping tells Elasticsearch how those documents should be indexed. We will explore how to define and manage mappings in the "Mapping" section; for now we rely on Elasticsearch to handle the data structure automatically.


A document is the basic unit of indexing in ES; it is similar to a row in MySQL. Document data is in JSON format.
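For instance, the user object described earlier (name, sex, age, email address) might look like this as a JSON document. The field values are invented for illustration, and Python's standard `json` module is used only to show the serialized form.

```python
import json

# A hypothetical user document; all field values are made up.
user = {
    "name": "John Smith",
    "sex": "male",
    "age": 25,
    "email": "john@example.com",
}

# Serialize to the JSON text that would be sent as the document body.
body = json.dumps(user)
print(body)
```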


In MySQL we use a primary key to express uniqueness; in ES, the id plays that role for a record. ES can also generate the id itself. Automatically generated IDs have the following characteristics: they are URL-safe, Base64-encoded GUIDs, and they are guaranteed not to collide when generated in a distributed system (globally unique). Of course, we can also specify the id ourselves.
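To make "URL-safe, Base64-encoded GUID" concrete, here is a minimal sketch that builds an ID with those properties from a random UUID. It is not the algorithm Elasticsearch uses internally, and the function name `make_doc_id` is hypothetical.

```python
import base64
import uuid


def make_doc_id() -> str:
    """Return a URL-safe, Base64-encoded ID derived from a random 128-bit UUID."""
    raw = uuid.uuid4().bytes                      # 16 random bytes
    encoded = base64.urlsafe_b64encode(raw)       # uses '-' and '_' instead of '+' and '/'
    return encoded.rstrip(b"=").decode("ascii")   # drop the '=' padding


if __name__ == "__main__":
    print(make_doc_id())
```

Because the ID contains only letters, digits, `-`, and `_`, it can be placed in a request URL without escaping.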


2, distributed concepts in ES

Cluster:

Anyone familiar with distributed systems will recognize the term. Here a cluster means an ES cluster: a group of ES instances joined together in a distributed way.

Node:

Each ES instance in an ES cluster is called a node. Put simply, a node is one running instance of ES within the cluster.


3, two concepts in the ES storage strategy

Shard and replica:

To add data to Elasticsearch, we need an index: a place to store related data. In reality, an index is just a "logical namespace" that points to one or more shards. A shard is a low-level "worker unit" that holds only a part of all the data in the index. A shard is a single Lucene instance and is a complete search engine in its own right. Our documents are stored and indexed in shards, but our application does not talk to them directly; instead, it talks to the index. Shards are the key to how Elasticsearch distributes data across the cluster. Think of a shard as a container for data. Documents are stored in shards, and shards are allocated to the nodes of your cluster. When your cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes to keep the cluster balanced. A shard is either a primary shard or a replica shard.

Each document in your index belongs to a single primary shard, so the number of primary shards determines the maximum amount of data the index can hold. In theory there is no limit to the amount of data a primary shard can hold, but there is a practical limit. The maximum size of a shard depends entirely on your use case: the hardware you have, the size and complexity of your documents, how you index and query them, and the response times you expect.

A replica shard is just a copy of a primary shard. It protects against data loss from hardware failure and can serve read requests such as searching or retrieving a document. The number of primary shards is fixed when the index is created, but the number of replica shards can be changed at any time. By default (in the 6.x version used here), an index is assigned five primary shards, and each primary shard has one replica.
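One reason the primary shard count is fixed is the way a document is routed to a shard: roughly, the shard number is a hash of the routing value (the document id by default) modulo the number of primary shards. The sketch below illustrates that formula with `zlib.crc32` as a stand-in hash. Elasticsearch uses its own hash function, so real shard numbers will differ, and `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard as hash(routing value) % number of primary shards."""
    # crc32 is only a stand-in for the hash function Elasticsearch really uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10)]
    with_five = [shard_for(i, 5) for i in ids]
    with_six = [shard_for(i, 6) for i in ids]
    # The same id maps to the same shard every time.
    assert with_five == [shard_for(i, 5) for i in ids]
    # Printing both lists lets you compare placements under 5 and 6 shards.
    print(with_five)
    print(with_six)
```

If the shard count changed after documents were written, the same id would often resolve to a different shard, and existing documents could no longer be found where the formula points.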

    1, primary shard: the main shard
    2, replica shard: the copy shard (also called a backup shard)


Note that there is an industry convention: the word shard on its own generally refers to a primary shard, and the word replica on its own refers to a replica shard.

Also note that replicas are configured per index. If an index has one replica, that means one replica shard for each primary shard: with 5 primary shards there are 5 replica shards, in one-to-one correspondence with the primaries.

A very important point: a primary shard and its own replica shard cannot be on the same node. Important things are worth saying three times:

A primary shard and its replica shard are never on the same node

A primary shard and its replica shard are never on the same node

A primary shard and its replica shard are never on the same node

So the smallest highly available ES configuration is two servers.
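The placement rule above (a primary shard and its replica never share a node) can be sketched as a toy allocator. This illustrates the constraint only and is not Elasticsearch's allocation algorithm; the round-robin placement and the function name `allocate` are assumptions.

```python
def allocate(num_primaries: int, nodes: list) -> dict:
    """Place primaries round-robin, then each replica on a node other than its primary's.

    Returns {"assigned": [(shard, role, node), ...], "unassigned": [(shard, role), ...]}.
    """
    assigned, unassigned = [], []
    for shard in range(num_primaries):
        primary_node = nodes[shard % len(nodes)]
        assigned.append((shard, "primary", primary_node))
        # A replica may only go to a node that does not hold its own primary.
        candidates = [n for n in nodes if n != primary_node]
        if candidates:
            assigned.append((shard, "replica", candidates[shard % len(candidates)]))
        else:
            unassigned.append((shard, "replica"))
    return {"assigned": assigned, "unassigned": unassigned}


if __name__ == "__main__":
    print(allocate(5, ["node-1"])["unassigned"])
    print(allocate(5, ["node-1", "node-2"])["unassigned"])
```

With a single node every replica stays unassigned; with two nodes every replica finds a home, which is why two servers are the minimum for high availability.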


Four, Elasticsearch installation and development tools

I installed Elasticsearch version 6.6.2.

Development tool: kibana-6.6.2 (note that the Kibana version must match the Elasticsearch version)

I also set up one more development tool locally: elasticsearch-head

For installation, search Baidu; there are many detailed installation guides, so I will not repeat the steps here.

[Screenshot: how to run a curl-style request in Kibana]


Five, cluster health status

Elasticsearch exposes many statistics for cluster monitoring. The most important is cluster health, which is shown in the status field as green, yellow, or red.

Run in Kibana: GET /_cat/health?v

epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1568794410 08:13:30  my-application yellow          1         1     47  47    0    0       40             0                  -                 54.0%

Here we can see that the current health of my local cluster is yellow. The question is: how is cluster health determined?

green (healthy):     All primary shards and replica shards are running normally.
yellow (sub-healthy):     All primary shards are running normally, but not all replica shards are.
red (unhealthy):     At least one primary shard is not running normally.
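These rules can be written down as a small function. The shard representation (a list of `(role, is_active)` tuples) and the function name `cluster_health` are hypothetical, chosen only to make the rules concrete.

```python
def cluster_health(shards: list) -> str:
    """Derive green/yellow/red from a list of (role, is_active) tuples.

    role is "primary" or "replica"; is_active is True when the shard is running.
    """
    primaries_ok = all(active for role, active in shards if role == "primary")
    replicas_ok = all(active for role, active in shards if role == "replica")
    if not primaries_ok:
        return "red"       # at least one primary shard is not active
    if not replicas_ok:
        return "yellow"    # all primaries are active, but some replica is not
    return "green"         # every primary and every replica is active
```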


My local Elasticsearch has only a single node. Because a primary shard and its replica shard cannot be assigned to the same node, the replica shards on my local Elasticsearch remain unassigned, so the health status is yellow.
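The 54.0% in the health output can be cross-checked from the other columns: 47 active shards out of 47 + 40 shards in total. The sketch below parses the two lines of `_cat/health` text with plain string splitting. The sample text uses the full `_cat/health` header, including the `node.total` and `node.data` columns; the helper names are hypothetical, and the calculation assumes no shards are initializing, as is the case here.

```python
SAMPLE = """\
epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1568794410 08:13:30  my-application yellow          1         1     47  47    0    0       40             0                  -                 54.0%
"""


def parse_cat_health(text: str) -> dict:
    """Map each column name in the header line to its value in the data line."""
    header, values = [line.split() for line in text.strip().splitlines()[:2]]
    return dict(zip(header, values))


def active_percent(row: dict) -> float:
    """Active shards as a percentage of active plus unassigned shards."""
    active = int(row["shards"])
    total = active + int(row["unassign"])
    return round(100.0 * active / total, 1)


if __name__ == "__main__":
    row = parse_cat_health(SAMPLE)
    print(row["status"], active_percent(row))
```

The 40 unassigned shards are the replicas that have no second node to live on.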



Reference: "Elasticsearch: The Definitive Guide"


If anything here is wrong, please leave a comment with a correction.


