This article explains the principles and applications of Unique SQL in detail

This article explains the principles and applications of Unique SQL in detail

Abstract: A certain algorithm is combined with each node in the parse tree to calculate an integer value to uniquely identify this type of SQL. This integer value is called Unique SQL ID. SQL statements with the same Unique SQL ID belong to the same "Unique SQL".

1. What is Unique SQL

When the user executes the SQL statement, each SQL statement text will enter the parser to generate a "parse tree". Traverse each node in the parse tree, ignore the constant value, combine each node in the tree with a certain algorithm, and calculate an integer value to uniquely identify this type of SQL. This integer value is called Unique SQL ID , SQL statements with the same Unique SQL ID belong to the same "Unique SQL".

For example, the user enters the following two SQL statements one after another:

select * from t1 where id = 1; select * from t1 where id = 2;

These two SQL statements are the same except for the constant values of the filter conditions. The topological structure of the parse tree generated from this is exactly the same, so the Unique SQL ID is also the same. Therefore, the two statements belong to the same Unique SQL as follows:

select * from t1 where id = ?;

The GaussDB kernel summarizes statistical information for all the SQL statements in the above form and presents them to users through views. In this way, the interference of some irrelevant constant values can be eliminated, statistical data of a certain type of SQL statement can be obtained, and a numerical basis can be provided for performance analysis and problem location.

Note that for the calculation of Unique SQL ID, only constant values will be excluded, but other differences will not be excluded. For example, the SQL statement "select * from t2 where id = 1;" and the above SQL do not belong to the same Unique SQL, and the same SQL statement executed from different CN nodes by different users does not belong to the same Unique SQL.

2. How does Unique SQL count

After receiving the SQL request, the GaussDB kernel first calculates its Unique SQL ID. If the Unique SQL ID already exists, the relevant statistical information is directly updated. If it does not exist, first create a Unique SQL, and then update the statistics, as shown in the following figure:

Unique SQL statistics include execution times, response time, Cache/IO number, row activity and time distribution, etc., which can be queried through the following two views:

  • gs_instr_unique_sql
  • pgxc_instr_unique_sql

The former displays the Unique SQL information on the current CN (Coordinator Node) node (the node that executes the current SQL command), and the latter displays the Unique SQL information on all CN nodes in the system. The format of the two views is the same, and both consist of the fields in the following table:

3. How to use Unique SQL

To use the Unique SQL function, you need to turn on the following variable switches:

  • enable_resource_check (default is on)
  • track_counts (default is on, which affects row activity and Cache/IO related fields)

In addition, instr_unique_sql_count needs to be set to a positive integer. This variable is 0 by default and cannot be modified in the gsql session. It needs to be set by SIGHUP, for example:

gs_guc reload -Z coordinator -D/path/to/coordinator1/-c "instr_unique_sql_count=20">/dev/null

The instr_unique_sql_count parameter determines the number of unique sql collected by the system. When the number of uniques collected reaches this number, new SQL is no longer collected. If the value is increased, the original unique sql information will be retained and new unique sql will be collected at the same time. If the value is reduced, all the unique sql information collected in the current CN node will be cleared, and then new unique sql will be collected.

After setting the above variables, the Unique sql statistical view can be queried like a normal view, for example:

postgres=# select node_name,query,n_calls from pgxc_instr_unique_sql; node_name | query | n_calls --------------+------------------ ------------------------------------------+------- - coordinator2 | select node_name,query,n_calls from pgxc_instr_unique_sql; | 0 (1 row)

The system function reset_instr_unique_sql can clean up unique sql information. This function has 3 parameters, which have the following meanings:

**1. scope: **If it is "GLOBAL", all data on the CN node will be cleared; if it is "LOCAL", only the data on the current CN will be cleared.

**2. type: **If it is "ALL", all data will be cleared; if it is "BY_USERID", only the unique SQL of the specified user will be cleared; if it is "BY_CNID", only the unique SQL of the specified CN will be cleared.

**3. value: **If type="ALL", this parameter is meaningless; if type="BY_USERID", this parameter is the ID of the specified user, if type="BY_CNID", this parameter is the ID of the specified CN.

E.g:

postgres=# select reset_instr_unique_sql('global','all',0); reset_instr_unique_sql ------------------------ t (1 row)

In addition, if the database process is restarted, the previously collected unique SQL information will be cleared.

4. Use Unique SQL to assist in locating problems

The unique sql view provides a wealth of information, and users can choose the information that is helpful to them according to their needs. This section provides examples for the actual situations that customers encounter in the production environment to illustrate several ways to use this view, which can be used as a reference for performance optimization.

4.1 Disk contention caused by abnormal row activity in the query

Abnormal row activity may cause disk contention and slow business operations. By viewing the fluctuations of the number of scanned rows, returned functions, and changed rows, you can find abnormal row activity and help locate the cause.

postgres = # select sum (n_returned_rows) n_returned_rows, sum (n_tuples_fetched) n_tuples_fetched, sum (n_tuples_returned) n_tuples_returned, sum (n_tuples_inserted) n_tuples_inserted, sum (n_tuples_updated) n_tuples_updated, sum (n_tuples_deleted) n_tuples_deleted from pgxc_instr_unique_sql; n_returned_rows | n_tuples_fetched | n_tuples_returned | n_tuples_inserted | n_tuples_updated | n_tuples_deleted -----------------+------------------+---------- ---------+-------------------+------------------+- ----------------- 234 | 0 | 0 | 0 | 0 | 0 (1 row)

4.2 Query Top SQL's resource occupancy

The SQL statements in the unique SQL view can be sorted based on execution time, CPU time, number of scan lines, physical read/logical read, etc., to find out those SQL statements that take up the most resources, and analyze their performance in a targeted manner. The impact and cause, to help find and locate the problem. E.g,

  • Sort by SQL execution time order or reverse order:

SELECT user_name, unique_sql_id, query, total_elapse_time FROM pgxc_instr_unique_sql ORDER BY total_elapse_time ASC or DESC;

  • Sort in order or in reverse order according to the CPU time occupied by SQL execution:

SELECT user_name, unique_sql_id, query, cpu_time FROM pgxc_instr_unique_sql ORDER BY cpu_time ASC or DESC;

  • Sort by the number of rows in SQL order or in reverse order:

SELECT user_name, unique_sql_id, query, n_tuples_returned FROM pgxc_instr_unique_sql ORDER BY n_tuples_returned ASC or DESC;

  • Sort in order or reverse order according to the SQL total scan line:

SELECT user_name, unique_sql_id, query, n_tuples_fetched + n_tuples_returned FROM pgxc_instr_unique_sql ORDER BY n_tuples_fetched + n_tuples_returned ASC or DESC;

  • Sort by SQL execution executor time in order or in reverse order:

SELECT user_name, unique_sql_id, query, execution_time FROM pgxc_instr_unique_sql ORDER BY execution_time ASC or DESC;

  • Sort in order or in reverse order according to the number of physical reads executed by SQL:

SELECT user_name, unique_sql_id, query, n_blocks_fetched FROM pgxc_instr_unique_sql ORDER BY n_blocks_fetched ASC or DESC;

  • Sort in order or in reverse order according to the number of SQL execution logic reads:

SELECT user_name, unique_sql_id, query, n_blocks_hit FROM pgxc_instr_unique_sql ORDER BY n_blocks_hit ASC or DESC;

4.3 Query the number of logical reads/physical reads

Excessive logical read/physical read may cause SQL statements to take up more CPU time. By querying the unique SQL view, you can get the number of sql statement logical/physical read data blocks, and assist in determining the reason for the slow response:

  • Query the number of physical read blocks:

SELECT n_blocks_fetched FROM pgxc_instr_unique_sql;

  • Query the number of logical read blocks:

SELECT n_blocks_hit FROM pgxc_instr_unique_sql;

4.4 Diagnosing low performance due to insufficient memory quota

If the database buffer is set too small, the results of each SQL statement execution will not be cached. If the current SQL execution is completed, if there are other SQL executions, it will crowd out the execution results of the last or several SQL caches in the memory In the next round, if the current SQL is executed again, it needs to perform physical IO to read data from the disk, instead of directly fetching data from the cache, resulting in poor SQL execution performance.

Whether the buffer quota is large enough can be judged by the hit rate. Buffer hit rate=n_blocks_hit/n_blocks_fetched, you can query unique SQL to diagnose whether there is insufficient memory quota:

SELECT (n_blocks_hit/n_blocks_fetched) AS hit_ratio from pgxc_instr_unique_sql;

Click to follow and learn about Huawei Cloud's fresh technology for the first time~