
Can the Repeatable Read transaction isolation level prevent phantom reads?

Foreword

Whenever the topic of database transaction isolation levels comes up, you will see this table.

According to it, the Repeatable Read isolation level prevents dirty reads and non-repeatable reads, but phantom reads can still occur. Does that mean the Repeatable Read isolation level cannot defend against phantom reads?

Whether in database textbooks or in various tutorials on the web, you often see the claim that the RR level is "repeatable read" but cannot solve phantom reads, and that only Serializable can. Is this statement correct?

In this article, I will focus on one question in MySQL: can Repeatable Read prevent phantom reads? I believe that after reading it, you will have a new understanding of transaction isolation levels.

Our database has a Users table with the following structure and data; the operations below are all performed on this table.

Long article warning: reading this will probably take you twenty to thirty minutes.

What is a phantom read?

Before talking about phantom reads, we first need to understand dirty reads and non-repeatable reads.

Dirty read

A dirty read can happen when one transaction reads data that another transaction has modified but not yet committed.

In our example, transaction 2 modifies a row but has not committed, and transaction 1 reads this uncommitted data. If transaction 2 later rolls back, or makes further modifications on top of the change it just made, then the data transaction 1 read is incorrect: this is a dirty read.

Non-repeatable read

The “non-repeatable read” phenomenon occurs when a SELECT is performed without acquiring a read lock, or when the read lock is released as soon as the SELECT finishes; if another transaction updates the data in between, a second read returns a different result.

In this case, transaction 2 commits successfully, so its modification to the row with id = 1 is visible to other transactions. Transaction 1 read age = 1 the first time and age = 2 the second time; the two results are inconsistent, which is a non-repeatable read.

Phantom read

A “phantom read” is a special case of the non-repeatable read: while transaction 1 runs the same range query (“SELECT … WHERE …”) twice, transaction 2 creates (e.g. with an INSERT) a new row in the table that happens to satisfy transaction 1's WHERE clause.

In the figure, transaction 1 executes the same query twice, and the second execution returns one more row than the first: this is a phantom read.
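Since the original figure is not reproduced here, a minimal sketch of such a session pair (the Users columns and values below are illustrative assumptions, not the exact data from the figure) might look like this:

-- Session 1 (transaction 1)
BEGIN;
SELECT * FROM Users WHERE class_id = 1;   -- first read: returns N rows

-- Session 2 (transaction 2)
INSERT INTO Users (name, age, class_id) VALUES ('new user', 20, 1);   -- committed (autocommit)

-- Session 1 again
SELECT * FROM Users WHERE class_id = 1;   -- if this now returns N + 1 rows, that is a phantom read
COMMIT;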

What exactly is the difference between the three?

Having introduced the three phenomena, I'm sure many readers are still not clear on how exactly they differ, so let me summarize.

Dirty read: reading data from another transaction that has not yet committed.
    Non-repeatable read: reading data (an UPDATE) that another transaction has committed.

Both non-repeatable reads and phantom reads involve reading data committed by other transactions, but they target different operations.
    Non-repeatable read: UPDATE.
    Phantom read: INSERT, DELETE.

MySQL's four transaction isolation levels

Read Uncommitted

Read Uncommitted (READ UNCOMMITTED) is the lowest isolation level. At this level, if one transaction has begun writing data, other transactions are not allowed to write the same data at the same time, but they are allowed to read that row, so uncommitted changes are visible.

Using the figure to analyze the dirty read: after transaction 2 updates the row with id = 1, transaction 1 is still allowed to read that row, so when transaction 1 executes its query a second time, it reads the result of transaction 2's uncommitted update, producing a dirty read.
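Since the figure is not shown here, a minimal two-session sketch of that dirty read (column names and values are illustrative):

-- Session 2 (transaction 2)
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
UPDATE Users SET age = 2 WHERE id = 1;   -- modified but NOT committed

-- Session 1 (transaction 1)
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
SELECT age FROM Users WHERE id = 1;      -- reads age = 2, the uncommitted value: a dirty read

-- Session 2
ROLLBACK;                                -- the value transaction 1 read never officially existed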

Read Committed

Because MySQL's InnoDB defaults to the RR level, we first need to switch the session to the RC level and set the binlog format:

SET SESSION transaction isolation level read committed;
SET SESSION binlog_format = 'ROW'; -- or MIXED

At the Read Committed (READ COMMITTED) level, a transaction that is reading data still allows other transactions to access those rows, but an uncommitted write prevents other transactions from accessing the row; the lock is held until the writing transaction commits.

Similarly, let's analyze the dirty read: after transaction 2 updates the row with id = 1 and before it commits, it holds a write lock on the row, so when transaction 1 tries to read the row with id = 1 it is blocked and has to wait until transaction 2 ends. This avoids the dirty read.

Similarly, let's analyze the non-repeatable read: after transaction 1 reads the row with id = 1, it does not lock that row, so transaction 2 can update it. Once transaction 2 updates the row and commits, the change takes effect immediately, so when transaction 1 runs the same query again, the result differs from the first read. So Read Committed cannot prevent non-repeatable reads.

Repeatable Read

Repeatable Read (REPEATABLE READ) is an isolation level between Read Committed and Serializable. It is InnoDB's default isolation level and the focus of this article, so I will keep you in suspense here and cover it in more detail later.

Serializable

Serializable (SERIALIZABLE) is the highest isolation level. It keeps read and write locks on the selected objects until the end of the transaction, so it can defend against all of the problems above, but because execution is serialized it is less efficient.

With this background knowledge in place, let's officially start the discussion.

Can Repeatable Read prevent phantom reads?

Repeatable read

Before talking about Repeatable Read, let's run the following experiment on MySQL's InnoDB.

As you can see, transaction A reads neither the data updated by transaction B nor the data inserted by transaction C, so in this scenario Repeatable Read defends against both the non-repeatable read and the phantom read.
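Since the screenshot of the experiment is not reproduced here, a rough sketch of it in SQL (assuming the same illustrative Users columns) is:

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT * FROM Users;                                                 -- first snapshot

-- Session B (autocommit)
UPDATE Users SET age = age + 1 WHERE id = 1;                         -- committed update

-- Session C (autocommit)
INSERT INTO Users (name, age, class_id) VALUES ('new user', 20, 1);  -- committed insert

-- Session A
SELECT * FROM Users;   -- identical to the first read: no non-repeatable read and no phantom read
COMMIT;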

By now, I believe you already know the answer. So how is this achieved?

Optimistic locking and pessimistic locking

The locking of rows mentioned above is a pessimistic locking mechanism. Many articles claim that the Repeatable Read isolation level cannot prevent phantom reads; their reasoning is that Repeatable Read locks the rows it reads so that other transactions cannot modify them until the transaction ends, but such a scheme can only lock existing rows and cannot stop new rows from coming in, so phantom reads would still occur.

But MySQL, Oracle, PostgreSQL, etc. are already very mature databases; how could they so crudely use a scheme that hurts performance like this?

Let me introduce pessimistic locking and optimistic locking.

Pessimistic locking

As the name implies, pessimistic locking takes a conservative attitude toward modification of the data by the outside world (including other transactions in the current system, as well as transactions from external systems), and therefore keeps the data locked throughout the entire processing period. The data is locked while it is being read, so other transactions cannot modify it; it is also locked while it is being modified or deleted, so other transactions cannot read it.

Optimistic locking

Compared with pessimistic locking, optimistic locking takes a more relaxed approach. Pessimistic locking is in most cases implemented by relying on the database's own locking mechanism, in order to guarantee the greatest degree of exclusivity for the operation. But this brings considerable database performance overhead, especially for long transactions, which often cannot afford it.

Optimistic locking solves this problem to some extent. It is mostly implemented with a data version (Version) recording mechanism. What is a data version? It means adding a version identifier to the data, generally by adding a "version" field to the database table. When the data is read out, the version number is read along with it; when the data is updated, the version number is incremented by one. The version of the submitted data is then compared with the current version recorded in the database table: if the submitted version number is greater than the current one, the update is applied; otherwise the data is considered stale.
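As an illustration only (assuming a hypothetical version column on the Users table; the article's own table may not have one), an optimistic update could look like this:

-- Read the row together with its version
SELECT id, age, version FROM Users WHERE id = 1;   -- suppose this returns version = 3

-- Try to update only if nobody has bumped the version in the meantime
UPDATE Users
SET age = 2, version = version + 1
WHERE id = 1 AND version = 3;
-- 1 row affected  -> our update won
-- 0 rows affected -> another transaction got there first; the data we read is stale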

MySQL, Oracle, PostgreSQL, etc. all use MVCC (multi-version concurrency control), which is theoretically grounded in optimistic locking, to avoid non-repeatable reads and phantom reads. There is no fixed standard for implementing MVCC, and each database has its own implementation; what is discussed here is InnoDB's MVCC.

MVCC (multi-version concurrency control)

In InnoDB, MVCC is implemented by appending two extra hidden values to each row: one records when the row was created, the other records when the row expired (or was deleted). In practice, what is stored is not a timestamp but a transaction version number; every time a new transaction is opened, the transaction version number is incremented. Under the Repeatable Read isolation level:

    SELECT: read only rows whose creation version number is <= the current transaction's version number, and whose deletion version number is either empty or > the current transaction's version number.

    INSERT: save the current transaction's version number as the creation version number of the new row.

    DELETE: save the current transaction's version number as the deletion version number of the row.

    UPDATE: insert a new row whose creation version number is the current transaction's version number, and at the same time save the current transaction's version number as the deletion version number of the original row.

With MVCC, although every row needs extra storage space to record its versions and some extra row-checking and maintenance work is required, the use of locks is reduced: most read operations take no locks, reading data is simple, and performance is good.

Careful readers will also have noticed that the data read through MVCC is historical data rather than the latest data. For business logic that is particularly sensitive to data freshness, this can cause problems; it is MVCC's weak spot. Is there a solution? Of course.

Reading this historical data through MVCC is called a snapshot read, and reading the current version of the data in the database is called a current read.

Snapshot read

The ordinary SELECT we usually use is a snapshot read, which reduces the overhead caused by locking.

select * from table ....

Current read

Operations that modify data (UPDATE, INSERT, DELETE) all use the current read. When these operations are performed, the latest record is read, and even data committed by other transactions can be seen. Suppose you want to update a record, but another transaction has already deleted that row and committed; if you update it anyway there will be a conflict, which is why the update needs to know the latest data. Reading the latest data requires locking: the first of the following statements takes a shared lock, the others take exclusive locks.

select * from table where ? lock in share mode; 
select * from table where ? for update; 
insert; 
update; 
delete;

Let's rerun the experiment using current reads.

As you can see, at the Read Committed isolation level, transaction 1 updates all rows with class_id = 1; after transaction 2 inserts a row, transaction 1 inexplicably ends up with one more class_id = 1 row than before, and that row was not modified by the earlier UPDATE. This is a phantom read under Read Committed.

At the Repeatable Read isolation level, the situation is completely different. After transaction 1's UPDATE, the data is locked and transaction 2 cannot insert a new row, so transaction 1's data is consistent before and after the UPDATE and the phantom read is avoided. What is clear is that the UPDATE must be locking more than just the few rows it matched, because locking only those could not stop the INSERT. Some readers will say: then it must be locking the whole table.
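A sketch of that repeatable-read experiment (again with illustrative column values):

-- Session 1 (REPEATABLE READ)
BEGIN;
UPDATE Users SET name = 'updated' WHERE class_id = 1;   -- current read: takes next-key locks

-- Session 2
INSERT INTO Users (name, age, class_id) VALUES ('new user', 20, 1);
-- blocks here until session 1 commits (or the lock wait times out), so no phantom row appears

-- Session 1
SELECT * FROM Users WHERE class_id = 1;   -- exactly the rows the UPDATE touched
COMMIT;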

Again, MySQL is already a mature database; how could it resort to such an inefficient approach? In fact, the locking here is implemented with next-key locks.

Next-Key Lock

In the Users table, class_id is a secondary (non-clustered) index. The database maintains the relationship between a secondary index and the primary key with a B+ tree. Simply put, we first find all index nodes with class_id = 1; those nodes store the primary key of the corresponding rows, i.e. id = 1, and we then use the primary key id = 1 to find the actual row. This process is called a back-to-the-table lookup.

Don't understand the underlying principles of database indexes? That's because you don't have a B-tree in mind.
    Go and learn: https://www.cnblogs.com/sujing/p/11110292.html

I wanted to draw a B+ tree for the example used in this article, but my drawing turned out ugly, and to avoid lowering the tone of this article I will instead borrow the B+ tree diagram from the article above to explain next-key locks.

Suppose the Users table we used above needs a non-clustered index on Name. How is that implemented? Let's take a look:

A characteristic of the B+ tree is that all data is stored in the leaf nodes. Take the non-clustered index entry for Qin Shousheng as an example: the leaf node stores the id corresponding to Qin Shousheng, i.e. 34 in the figure. When we perform a current read on this row, a row lock is placed on it. The row lock is easy to understand: it prevents other transactions from updating or deleting the row. But why do we also need a gap lock?

Again, all data in a B+ tree is stored in the leaf nodes. If a new row named Qin Shousheng comes in, it must be placed immediately before or after the entry with id = 34. If we lock the ranges before and after that entry, then naturally no new Qin Shousheng can get in.

And what if a new row whose name falls outside this range needs to be inserted? Because the ranges before and after it are not locked, it can be inserted successfully, which greatly improves the database's concurrency.
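A sketch of this gap behavior (the names, and whether they fall inside the locked range, are purely illustrative assumptions):

-- Session 1 (REPEATABLE READ)
BEGIN;
SELECT * FROM Users WHERE name = 'Qin Shousheng' FOR UPDATE;
-- current read: locks the matching index record (id = 34) and the gaps around it

-- Session 2
INSERT INTO Users (name, age, class_id) VALUES ('Qin Shousheng', 20, 1);
-- lands inside the locked gap: blocks until session 1 ends

INSERT INTO Users (name, age, class_id) VALUES ('Zhang San', 20, 1);
-- assumed to sort outside the locked gap: succeeds immediately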

Where Repeatable Read stumbles

I said above that Repeatable Read prevents non-repeatable reads and also defends against phantom reads. Does it defend against all phantom reads? Of course not; there are times when it stumbles.

Take the following example (a SQL sketch follows the list):

1. Transaction A runs a SELECT. Transaction B's INSERT would indeed add a gap lock, but once transaction B commits, the gap lock is released (and after the release transaction A can operate freely).
    2. Transaction A runs the SELECT again; thanks to MVCC, the result is the same as the first SELECT.
    3. Transaction A then runs an UPDATE with no WHERE condition; the UPDATE acts on every row (including the row newly added by transaction B).
    4. Transaction A runs the SELECT once more; transaction B's new row appears, and it has been modified by the UPDATE.
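A sketch of these four steps in SQL (illustrative values; the point is the mix of snapshot reads and a current-read UPDATE):

-- Session A (REPEATABLE READ)
BEGIN;
SELECT * FROM Users;                        -- step 1: snapshot read, say N rows

-- Session B
BEGIN;
INSERT INTO Users (name, age, class_id) VALUES ('new user', 20, 1);
COMMIT;                                     -- any gap lock is released once B commits

-- Session A
SELECT * FROM Users;                        -- step 2: MVCC snapshot, still N rows
UPDATE Users SET age = age + 1;             -- step 3: current read, also hits B's new row
SELECT * FROM Users;                        -- step 4: N + 1 rows, and the new row was modified
COMMIT;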

The explanation of phantom reads given by MySQL officially is: as long as, within one transaction, the second SELECT returns extra rows, it counts as a phantom read. So in this scenario, a phantom read has occurred.

Finally, I'll leave you with a question: do you know why the phantom read in the example above happens? Discuss in the comments.

Reference articles:
    MySQL 5.6 Reference Manual
    Understanding InnoDB transaction isolation levels
    MySQL · Source code analysis · The big differences of the InnoDB Repeatable Read isolation level
    Don't understand the underlying principles of database indexes? That's because you don't have a B-tree in mind
    The relationship between InnoDB transaction isolation levels and locks
    Eliminating phantom reads with the Next-Key Lock in MySQL InnoDB row locking
