MySQL: Thinking caused by USE DB blocking failure

MySQL: Thinking caused by USE DB blocking failure

When encountering a failure, we often think about how to solve the failure, rather than thinking about the cause of the failure from the root of the failure? This result can only make us get fish and lose it. Today, we will share a case of thinking caused by the USE DB blockage failure.

Fault description

Today a friend encountered a serious failure in the database. The failure environment is as follows:

  • MYSQL 5.6.16
  • RR isolation level
  • GITD closed

The performance is as follows:

  • use db cannot enter the database
  • show table status cannot query table information
  • From schema.processlist, there are a lot of Waiting for table metadata lock

In a hurry, he killed a lot of threads and found that he still couldn't recover. Finally, he killed a thing that was not submitted in time before returning to normal. Only a screenshot of the following figure is left:

Fault information extraction

Returning to the figure above, we can summarize the sentence types as follows:

its STATE is sending data

is Waiting for table metadata lock

is Waiting for table metadata lock

Its STATE is Waiting for table metadata lock

Information analysis

It is not easy to analyze this case because it is a comprehensive case of MYSQL layer MDL LOCK and RR mode innodb row lock, and we have to be more sensitive to the STATE of schema.processlist.

It is recommended to read my article as follows to learn MDL LOCK:

The verification of MDL LOCK in this section uses the following two methods:

Method 1 : The author adds log output to the MDL LOCK source code lock function. If you want to analyze various statements and the type of MDL LOCK, you can only use this method, because MDL LOCK locks often happen in a flash, performance_schema.metadata_locks does not Way to observe.

Method 2 : Use performance_schema.metadata_locks of version 5.7 to observe under congestion.

Open the mdl monitoring method in P_S as follows:

1. About CREATE TABLE A AS SELECT B Analysis of B table sending data

The status of sending data can actually represent a lot of meanings. From my current understanding, this is a general term for such statements of SELECT type statements of the MYSQL upper layer when the INNODB layer and the MYSQL layer exchange data, so it appears. May contain:

  • It is true that the amount of data that needs to be accessed is particularly large and may need to be optimized.
  • Because the INNODB layer needs to wait to obtain the row lock, such as our common SELECT FOR UPDATE.

At the same time, we also need to pay attention to the locking method of SELECT B in RR mode and INSERT...SELECT are the same. Reference will not go into details:

From the situation he responded to because he killed a long-term uncommitted thing at the end, he was because it was situation 2. And the entire CREATE TABLE A AS SELECT B statement cannot be acquired because some databases on the B table are locked, resulting in the entire statement being in the sending data state.

2. Analysis on SHOW TABLE STATUS[like'A'] Waiting for table metadata lock

This is the most important part of this case. SHOW TABLE STATUS[like'A'] is actually blocked and its STATE is Waiting for table metadata lock. Note that this is a table because there are many types of MDL LOCK. In the article I introduced in MDL, I mentioned that when desc is a table, MDL_SHARED_HIGH_PRIO(SH) will be displayed. In fact, MDL_SHARED_HIGH_PRIO(SH) will be displayed on this table during SHOW TABLE STATUS.

Both methods can observe the existence of MDL_SHARED_HIGH_PRIO(SH) and the one I simulated is under congestion.

But MDL_SHARED_HIGH_PRIO(SH) is an MDL LOCK type with a very high priority. The performance is as follows:

  • compatibility:

  • Blocking queue priority:

The condition of its being blocked is not possible except being blocked by MDL_EXCLUSIVE(X). So this is a very important breakthrough.

3. about the analysis of CREATE TABLE A AS SELECT B adding MDL LOCK to A table

I didn't know this before, and it was also the place that took the most time in this case. The previous article has analyzed that SHOW TABLE STATUS[like'A'] such as MDL_SHARED_HIGH_PRIO(SH) MDL LOCK statement is blocked in MDL There is only one possibility on LOCK that is MDL_EXCLUSIVE(X) on the A list.

Then I began to suspect that this DDL statement will MDL_EXCLUSIVE(X) on the A table before the end of the statement, and then the actual test is unsurprisingly like this:

It s a pity here that MDL_EXCLUSIVE(X) is not displayed in performance_schema.metadata_locks, but is displayed as MDL_SHARED(S). But we can see in the log I output that the upgrade operation has been done here to upgrade MDL_SHARED(S) to MDL_EXCLUSIVE(X ). And from the previous compatibility list, only MDL_EXCLUSIVE(X) will block MDL_SHARED_HIGH_PRIO(SH). So we should be able to confirm that the upgrade operation is indeed done here, otherwise SHOW TABLE STATUS[like'A'] will not be blocked.

4. Analysis on SELECT * FROM A Waiting for table metadata lock

Maybe everyone thinks that SELECT will not be locked, but that is at the innodb level. MDL_SHARED_READ(SR) at the MYSQL level is as follows:

You can see that there is indeed MDL_SHARED_READ (SR), and it is currently blocked

Its compatibility is as follows:

Obviously MDL_SHARED_READ(SR) and MDL_SHARED_HIGH_PRIO(SH) are incompatible and need to wait.

5. Analysis on DROP TABLE A Waiting for table metadata lock

This is a good analysis because table A has an X lock and DROP TABLE A must have an MDL_EXCLUSIVE(X) lock. It is of course incompatible with MDL_EXCLUSIVE(X). as follows:

Where EXCLUSIVE is what we call MDL_EXCLUSIVE(X), it does exist and is currently blocked

6. Why is the use db blocked?

If you use the mysql client without using the -A option (or no-auto-rehash), at least the following things should be done when USE DB:

1. The MDL (SH) lock on each table under db is as follows (call MDL_context::acquire_lock here to give information on blocking)

You can see that USE DB is indeed blocked because of MDL_SHARED_HIGH_PRIO(SH).

2. Add to the table cache for each table and open the table (call open_table_from_share())

Then this situation is exactly the same as the situation where SHOW TABLE STATUS[like'A'] is blocked, and it is also caused by the incompatibility of MDL locks.

Analysis and combing

With the previous analysis, we can sort out the reasons for this failure as follows:

statement that has not been submitted for a long time on table B will add innodb row lock to some data in table B at the innodb layer.
  • The blockage of CREATE TABLE A AS SELECT B caused by step 1
Because SELECT B in RR mode must lock the satisfied data on table B, wait is triggered because step 1 has been locked, and STATE is sending data.
  • Blocking of other sentences caused by step 2
Because CRATE TABLE A AS SELECT B will have MDL_EXCLUSIVE(X) before the creation of the A table is completed, this lock will block all other statements about the A table, including DESC/SHOW TABLE STATUS/USE DB (non-A). Only use the MDL_SHARED_HIGH_PRIO(SH)MDL LOCK statement. STATE is unified as Waiting for table metadata lock.
Mock test

test environment:

GITD closes the
RR isolation level

Use script:

Proceed as follows:

Finally, the waiting state we see is as follows:

In this way, we can perfectly simulate the online state. If we kill the things in session1, all will be unlocked naturally. Let's take a look at the output in performance_schema.metadata_locks:

We can see the above output, but we need to pay attention to LOCK_TYPE: SHARED, it is impossible to block LOCK_TYPE: SHARED_HIGH_PRIO (you can refer to the appendix or the MDL LOCK analysis article I wrote before). (X).

  • In RC mode, although table B in CREATE TABLE A SELECT B will not have any INNODB ROW LOCK, if table B is very large, table A will also be protected by MDL_EXCLUSIVE(X), so USE DB\SHOW TABLE STATUS will also be triggered. Happening.
  • If you turn on GTID, you cannot use statements like CREATE TABLE A SELECT B.
  • For systems with mixed DML/DDL, you must pay attention to concurrency, as in this example, if you notice high concurrency, you can find ways to avoid it.
  • This case shows once again that things that are not submitted for a long time may cause tragedy, so it is recommended to monitor transactions that are not completed for more than N seconds.


Compatibility matrix

Waiting queue priority matrix

Original release time: 2017-12-22 author: Gao Yunqi article from community partners' data and the cloud "for information may concern" data cloud "micro-channel public number