Why you need to write good SQL

  • MySQL performance
      Maximum data volume
      Maximum number of concurrent connections
      Keep queries under 0.5 seconds
      Implementation principles
  • Table design
      Data types
      Avoid NULL
      text type optimization
  • Index optimization
      Index classification
      Optimization principles
  • SQL optimization
      Batch processing
      <> operator optimization
      OR optimization
      IN optimization
      Do not operate on columns
      Avoid SELECT *
      LIKE optimization
      JOIN optimization
      LIMIT optimization
  • Other databases

The projects the blogger is responsible for mainly use Alibaba Cloud's MySQL database. Slow SQL alarms have been firing frequently of late, with the longest execution time reaching 5 minutes. After exporting and analyzing the logs, the main causes turned out to be queries that missed the index and queries with no paging. These are very basic mistakes. It sent a chill down my spine, and the team's technical level clearly needs to improve. While reworking these SQL statements I summed up some experience to share with you. If anything is wrong, criticism and corrections are welcome.

MySQL Performance

Maximum data volume

Talking about performance without considering data volume and concurrency is meaningless. MySQL does not limit the maximum number of records in a single table; the limit comes from the operating system's limit on file size.

File system    Single file size limit
FAT32          Maximum 4GB
NTFS           Maximum 64GB
NTFS5.0        Maximum 2TB
EXT2           16GB with a 1024-byte block size; 2TB with a 4096-byte block size
EXT3           4TB with a 4KB block size
EXT4           Theoretically more than 16TB

The "Alibaba Java Development Manual" suggests sharding (splitting databases and tables) once a single table exceeds 5 million rows or 2GB in size. Performance is determined by a combination of factors. Setting aside business complexity, the factors in order of impact are hardware configuration, MySQL configuration, table design, and index optimization. The 5 million figure is for reference only, not an iron law. The blogger has operated a single table with more than 400 million rows, where a paged query for the latest 20 records took 0.6 seconds. The SQL was roughly:

select field_1, field_2 from table where id < #{prePageMinId} order by id desc limit 20

Here prePageMinId is the smallest ID among the records of the previous page. The query speed was acceptable at the time, but as the data keeps growing it will be overwhelmed one day. Sharding is a long-running, high-risk job, so first try to optimize within the current structure, for example by upgrading hardware or migrating historical data, and shard only when there is really no other way. Interested readers can look into the basic ideas of sharding.
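The paging query in the paragraph above is keyset (seek) pagination: each page is found by filtering on the last id already shown instead of skipping an offset. A minimal sketch follows, using Python's built-in sqlite3 as a stand-in for MySQL. The `records` table, its columns, and the `fetch_page` helper are hypothetical names chosen for illustration.

```python
import sqlite3

def fetch_page(conn, prev_page_min_id=None, page_size=20):
    """Return one page of rows, newest first, using keyset pagination."""
    if prev_page_min_id is None:
        # First page: no boundary id is known yet.
        sql = "SELECT id, field_1 FROM records ORDER BY id DESC LIMIT ?"
        params = (page_size,)
    else:
        # Later pages: seek directly below the smallest id already shown.
        sql = ("SELECT id, field_1 FROM records WHERE id < ? "
               "ORDER BY id DESC LIMIT ?")
        params = (prev_page_min_id, page_size)
    return conn.execute(sql, params).fetchall()

# Demo data: 100 rows with ids 1..100 (SQLite stands in for MySQL here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field_1 TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 101)])

first_page = fetch_page(conn)
second_page = fetch_page(conn, prev_page_min_id=first_page[-1][0])
```

Because every page starts from an indexed id comparison, the cost of a page should not grow with how far back the reader has paged, which is the property the 400-million-row example relies on.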

Maximum number of concurrent connections

Concurrency here means the number of requests the database can handle at the same time, and it is determined by max_connections and max_user_connections. max_connections is the maximum number of connections to the MySQL instance, with an upper limit of 16384 (current MySQL versions document a maximum of 100000); max_user_connections is the maximum number of connections per database user. MySQL allocates buffers for each connection, so more connections consume more memory. Set the limit too high and the hardware cannot cope; set it too low and the hardware is underused. As a rule of thumb, the ratio of peak used connections (max_used_connections) to max_connections should exceed 10%, calculated as follows:

max_used_connections / max_connections * 100% = 3/100 *100% ≈ 3%

View the connection limits and the peak number of connections used so far:

show variables like '%max_connections%';
show variables like '%max_user_connections%';
show global status like 'Max_used_connections';

Modify the connection limits in the configuration file my.cnf:

[mysqld]
max_connections = 100
max_user_connections = 20

Keep queries under 0.5 seconds

It is recommended to keep a single query under 0.5 seconds. The 0.5-second figure is an empirical value derived from the three-second rule of user experience: if an operation gets no response within three seconds, users get bored or even leave. Response time = client UI rendering time + network request time + application processing time + database query time, so 0.5 seconds is the 1/6 of the budget left to the database.

Implementation principles

Compared with NoSQL databases, MySQL is a delicate, fragile thing. It is like a frail student in PE class: it quarrels with classmates over the smallest dispute (hard to scale), is out of breath after running two steps (small capacity, low concurrency), and often takes sick leave (too many SQL constraints). Nowadays everyone builds distributed systems, and scaling applications is much easier than scaling the database, so the implementation principle is: let the database do less work and the application do more.

    Make full use of indexes but do not abuse them; remember that indexes also consume disk and CPU.

    Do not use database functions to format data; leave that to the application.

    Do not use foreign key constraints; ensure data accuracy in the application.

    In write-heavy, read-light scenarios, unique indexes are not recommended; ensure uniqueness in the application.

    Add redundant fields where appropriate and try creating intermediate tables; compute intermediate results in the application, trading space for time.

    Do not run extremely time-consuming transactions; have the application split them into smaller transactions.

    Estimate the load and data growth of important tables (such as the order table) and optimize in advance.

Data table design

Data types

Principle for choosing data types: prefer the simpler type or the one with the smaller footprint.

    If the range suffices, prefer tinyint, smallint, or mediumint over int.

    If the string length is fixed, use the char type.

    If varchar suffices, do not use the text type.

    For high precision use the decimal type; BIGINT can also be used, for example by multiplying a value with two decimal places by 100 before storing it.

    Prefer timestamp over datetime.
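The "multiply by 100" idea from the list above can be sketched as follows. The helper names `to_cents` and `from_cents` are hypothetical; the point is only that amounts are converted with decimal arithmetic and stored as integers, so binary floating-point rounding never enters the picture.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_cents(amount: str) -> int:
    """Convert a decimal string such as '19.99' to integer cents for a BIGINT column."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)

def from_cents(cents: int) -> Decimal:
    """Convert stored integer cents back to an amount with two decimal places."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))

# Sums are exact because they are plain integer additions.
stored = to_cents("19.99") + to_cents("0.01")
```

The application then owns the conversion in both directions, which fits the principle of letting the database do less work.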

Type         Bytes      Description
datetime     8 bytes    '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999'
timestamp    4 bytes    '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.999999'

Compared with datetime, timestamp takes up less space, and its values are stored in UTC and converted to and from the session time zone automatically.

Avoid null

In MySQL, a NULL field still takes up space and makes indexes and index statistics more complex. Updating a NULL value to a non-NULL value cannot be done in place, which easily causes index splits and hurts performance. Replace NULL with a meaningful value wherever possible; this also avoids IS NOT NULL checks in SQL statements.

text type optimization

Because text fields store large amounts of data, the table grows quickly and slows queries on the other fields. It is recommended to move them out into a child table, associated by the business (natural) key.

Index optimization

Index classification

    Ordinary index: the most basic index.

    Composite index: an index built on multiple fields, which speeds up retrieval for queries with compound conditions.

    Unique index: similar to an ordinary index, but the values of the indexed column must be unique; NULL values are allowed.

    Composite unique index: the combination of column values must be unique.

    Primary key index: a special unique index that uniquely identifies a record in the table; NULL values are not allowed. It is usually created through a primary key constraint.

    Full-text index: used for querying large volumes of text. Both InnoDB and MyISAM support full-text indexes since MySQL 5.6. Because query precision and scalability are poor, more companies choose Elasticsearch.

Optimization principles

    Paging matters: if a query returns more than roughly 30% of the table's rows, MySQL will not use the index.

    Keep the number of indexes per table to at most 5, and the number of fields per index to at most 5.

    Prefix indexes can be used on strings, with the prefix length kept to 5-8 characters.

    Indexing a field with very low uniqueness is pointless, for example an is-deleted flag or gender.

    Make sensible use of covering indexes, as shown below:
select login_name, nick_name from member where login_name = ?

Build a composite index on the login_name and nick_name fields. It is faster than a single-column index on login_name, because the query can be answered from the index alone.
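Whether a query is really served from the index alone can be checked from the query plan. The sketch below uses Python's built-in sqlite3 as a stand-in, so the plan text is SQLite's ("USING COVERING INDEX"); in MySQL the corresponding sign is "Using index" in the Extra column of EXPLAIN. The index name `idx_login_nick` and the extra `email` column are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, login_name TEXT, "
             "nick_name TEXT, email TEXT)")
# Composite index holding both columns the query needs.
conn.execute("CREATE INDEX idx_login_nick ON member (login_name, nick_name)")
conn.execute("INSERT INTO member (login_name, nick_name, email) "
             "VALUES ('alice', 'Ali', 'alice@example.com')")

query = "SELECT login_name, nick_name FROM member WHERE login_name = ?"
# EXPLAIN QUERY PLAN rows have four columns; the last one is the detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("alice",))]
rows = conn.execute(query, ("alice",)).fetchall()
```

Adding `email` to the select list would force a lookup into the table itself, so the plan would no longer report a covering index.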

SQL optimization

Batch processing

As a child, the blogger saw a fish pond with a small outlet dug to drain the water, and all kinds of debris floating on the surface. Duckweed and leaves always pass through the outlet, but branches block other objects and sometimes get stuck, so they have to be cleared by hand. MySQL is the fish pond, the maximum concurrency and network bandwidth are the outlet, and user SQL is the floating debris. Queries without paging parameters, or update and delete operations that affect large amounts of data, are the branches. We want to break them up and process them in batches. Example:

    Business description: mark all of a user's expired coupons as unavailable.
    SQL statement: update `coupon` set status = 0 where expire_date <= #{currentDate} and status = 1;

If a large number of coupons need to be updated, this single statement may block other SQL. Batch-processing pseudo-code is as follows:

int PAGE_SIZE = 100;
while (true) {
    // Updated rows no longer match status = 1, so always fetch the first page (no offset).
    List<Long> batchIdList = queryList("select id from `coupon` where expire_date <= #{currentDate} and status = 1 limit #{PAGE_SIZE}");
    if (CollectionUtils.isEmpty(batchIdList)) {
        return;
    }
    update("update `coupon` set status = 0 where status = 1 and id in #{batchIdList}");
}
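A runnable version of the batch loop might look like the sketch below. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the function name `expire_coupons_in_batches` is hypothetical. Each batch always re-reads the first page rather than advancing an offset, because rows that were just updated no longer match `status = 1`.

```python
import sqlite3

PAGE_SIZE = 100

def expire_coupons_in_batches(conn, current_date, page_size=PAGE_SIZE):
    """Mark expired coupons unavailable, one small transaction per batch."""
    batches = 0
    while True:
        # Updated rows stop matching status = 1, so always take the first page.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM coupon WHERE expire_date <= ? AND status = 1 "
            "ORDER BY id LIMIT ?", (current_date, page_size))]
        if not ids:
            return batches
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE coupon SET status = 0 WHERE status = 1 AND id IN ({placeholders})",
            ids)
        conn.commit()  # short transaction: locks are released between batches
        batches += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupon (id INTEGER PRIMARY KEY, expire_date TEXT, status INTEGER)")
# 250 expired coupons and 50 that are still valid.
conn.executemany("INSERT INTO coupon (expire_date, status) VALUES (?, 1)",
                 [("2019-06-01",)] * 250 + [("2019-12-31",)] * 50)
conn.commit()
batch_count = expire_coupons_in_batches(conn, "2019-07-01")
```

With 250 expired rows and a page size of 100, the loop runs three update batches and then stops on the first empty page.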

<> operator optimization

The <> operator usually cannot use an index. For example, the following queries orders whose amount is not 100:
    select id from orders where amount != 100;
    If orders with an amount of 100 are extremely rare, that is, the data distribution is severely skewed, the index may still be used. Given this uncertainty, use union to combine the results, rewritten as follows:

(select id from orders where amount > 100)
 union all
(select id from orders where amount < 100 and amount > 0)

OR optimization

Under the InnoDB engine, OR cannot use a composite index, for example:

select id,product_name from orders where mobile_no = '13421800407' or user_id = 100;

OR does not hit the composite index on mobile_no + user_id. Use union instead, as follows:

(select id,product_name from orders where mobile_no = '13421800407')
 union
(select id,product_name from orders where user_id = 100);

The query is most efficient when the id and product_name fields are also indexed.
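The rewrite uses union rather than union all because a row matching both conditions would otherwise appear twice. The sketch below checks that the two forms return the same rows, using Python's built-in sqlite3 as a stand-in for MySQL. The sample data and the two single-column indexes (`idx_mobile`, `idx_user`) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_name TEXT, "
             "mobile_no TEXT, user_id INTEGER)")
# Assumed for illustration: one single-column index per branch of the union.
conn.execute("CREATE INDEX idx_mobile ON orders (mobile_no)")
conn.execute("CREATE INDEX idx_user ON orders (user_id)")
conn.executemany(
    "INSERT INTO orders (product_name, mobile_no, user_id) VALUES (?, ?, ?)",
    [("book", "13421800407", 100),   # matches both conditions
     ("pen", "13421800407", 7),      # matches mobile_no only
     ("lamp", "13900000000", 100),   # matches user_id only
     ("desk", "13900000000", 8)])    # matches neither

or_rows = conn.execute(
    "SELECT id, product_name FROM orders "
    "WHERE mobile_no = ? OR user_id = ? ORDER BY id",
    ("13421800407", 100)).fetchall()
union_rows = conn.execute(
    "SELECT id, product_name FROM orders WHERE mobile_no = ? "
    "UNION "
    "SELECT id, product_name FROM orders WHERE user_id = ? "
    "ORDER BY id",
    ("13421800407", 100)).fetchall()
```

The "book" row satisfies both branches but appears once in `union_rows`, so the union form is a drop-in replacement for the OR form.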

IN optimization

    IN suits a large main table with a small subquery table; EXISTS suits a small main table with a large subquery table. As the query optimizer keeps improving, the two perform almost identically in many scenarios.

    Try rewriting the query as a join, for example:

select id from orders where user_id in (select id from user where level = 'VIP');

Using JOIN, as shown below:

select o.id from orders o join user u on o.user_id = u.id where u.level = 'VIP';

Do not operate on columns

Computing on an indexed column in a query usually makes the index unusable. For example, to query the orders of a single day:

select id from orders where date_format(create_time,'%Y-%m-%d') = '2019-07-01';

The date_format function prevents the query from using the index. After rewriting:

select id from orders where create_time between '2019-07-01 00:00:00' and '2019-07-01 23:59:59';
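One caveat with the between form: if create_time stores fractional seconds, rows in the final second of the day fall outside the range. A half-open range (start inclusive, next day exclusive) avoids that while still leaving the column untouched. The sketch below only builds the bounds; the `day_bounds` helper is hypothetical, and the `%s` placeholder style is an assumption that matches common Python MySQL drivers.

```python
from datetime import date, datetime, time, timedelta

def day_bounds(day: date):
    """Return (start, end) so that start <= create_time < end covers the whole day."""
    start = datetime.combine(day, time.min)
    return start, start + timedelta(days=1)

start, end = day_bounds(date(2019, 7, 1))
# Parameterized form of the rewritten query, with a half-open upper bound.
sql = "select id from orders where create_time >= %s and create_time < %s"
params = (start.strftime("%Y-%m-%d %H:%M:%S"), end.strftime("%Y-%m-%d %H:%M:%S"))
```

The comparison stays on the bare column, so an index on create_time remains usable.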

Avoid SELECT *

If you do not need every column in the table, avoid SELECT *. It reads and transfers columns you do not need, and it cannot be served from a covering index, so indexes are used less effectively.

LIKE optimization

like is used for fuzzy queries, for example (field is indexed):

SELECT column FROM table WHERE field like '%keyword%';

This query misses the index. Replace it with the following:

SELECT column FROM table WHERE field like 'keyword%';

With the leading % removed, the query hits the index. But what if the product manager insists on fuzzy matching on both sides? A fulltext index is worth a try, but Elasticsearch is the ultimate weapon.

JOIN optimization

JOIN is implemented with the Nested Loop Join algorithm: the result set of the driving table serves as the base data, each of its rows is used as a filter condition to query the next table in a loop, and the results are then merged. With multiple joins, the preceding result set becomes the loop data for querying the next table.

    Add as many query conditions as possible on both the driving table and the driven table, satisfy the ON conditions and use WHERE sparingly, and let the small result set drive the large one.

    Index the join field on the driven table; when an index cannot be built, set a sufficiently large Join Buffer Size.

    Do not join more than three tables; try adding redundant fields instead.

LIMIT optimization

Paged queries with limit get slower the further back you page. The principle of the fix is to reduce the scan range, as shown below:

select * from orders order by id desc limit 100000,10
Takes 0.4 seconds
select * from orders order by id desc limit 1000000,10
Takes 5.2 seconds

First filter by ID to narrow the query range, written as follows:

select * from orders where id <= (select id from orders order by id desc limit 1000000, 1) order by id desc limit 0,10
Takes 0.5 seconds
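With descending order, the subquery returns the first row of the wanted page, so the outer comparison has to be `<=` for the rewritten query to return the same page as the plain limit query. The sketch below checks that equivalence on a small table, using Python's built-in sqlite3 as a stand-in for MySQL. The table contents and the `OFFSET` and `PAGE_SIZE` values are made up for the demonstration, and timings on such a small table say nothing about the figures quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 1001)])

OFFSET, PAGE_SIZE = 900, 10

# Plain deep paging: the engine walks and discards OFFSET full rows first.
slow = conn.execute(
    "SELECT * FROM orders ORDER BY id DESC LIMIT ? OFFSET ?",
    (PAGE_SIZE, OFFSET)).fetchall()

# Rewritten: locate the boundary id using only the primary key, then read the page.
fast = conn.execute(
    "SELECT * FROM orders WHERE id <= "
    "(SELECT id FROM orders ORDER BY id DESC LIMIT 1 OFFSET ?) "
    "ORDER BY id DESC LIMIT ?",
    (OFFSET, PAGE_SIZE)).fetchall()
```

Both queries return the rows with ids 100 down to 91; the rewritten one only has to skip over primary key entries rather than full rows.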

If the query condition involves only the primary key ID, write it as follows:

select id from orders where id between 1000000 and 1000010 order by id desc
Takes 0.3 seconds

What if the above is still too slow? Then the only option left is a cursor; interested readers can look into implementing paged queries with a JDBC cursor.

Other databases

As a back-end developer, be sure to master MySQL or SQL Server as your core storage, and also keep an active interest in NoSQL databases. They are mature and widely adopted enough to solve performance bottlenecks in specific scenarios.

Category       Database     Characteristics
Key-value      Memcache     Content caching; handles high load on large volumes of data
Key-value      Redis        Content caching; supports more data types than Memcache and can persist data
Columnar       HBase        Core database of the Hadoop ecosystem; massive structured data storage, essential for big data
Document       MongoDb      Well-known document database; can also be used for caching
Document       CouchDB      Apache open source project; focuses on ease of use and supports a REST API
Document       SequoiaDB    Well-known document database
Graph          Neo4J        Building relationship graphs for social networks, recommendation systems, and the like

Reference (excerpted; rights to the original text belong to the original author):

https://www.jianshu.com/p/6864abb4d885

Chicken soup: Since you've made a choice, why ask why the choice. - Wei Zhuang
