Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries - Mailing list pgsql-hackers

From Remi Colinet
Subject Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries
Date
Msg-id CADdR5nxvTuK34ShnDYcLpnkS06nofrPZVr4sW2RcSdDCKWFg0Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries  (Maksim Milyutin <m.milyutin@postgrespro.ru>)
Responses Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries  (Remi Colinet <remi.colinet@gmail.com>)
Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries  (Maksim Milyutin <m.milyutin@postgrespro.ru>)
List pgsql-hackers
Maksim,


2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:
On 18.04.2017 17:39, Remi Colinet wrote:
Hello Maksim,

The core implementation I suggested for the new PROGRESS command uses
different functions from the one used by EXPLAIN for its core
implementation.
Some source code is shared with EXPLAIN command. But this shared code is
only related to quals, properties, children, subPlans and few other nodes.

All other code for PROGRESS is new code.

I don't believe explain.c code can be fully shared with the one of the
new PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from
the core code used for EXPLAIN.


Perhaps you will be forced to duplicate significant snippets of functionality from explain.c into your progress.c.
 
Currently, few code is duplicated between EXPLAIN and PROGRESS commands. The duplicated code could be moved to file src/backend/commands/report.c which is used to gather shared code between the 2 commands. I will try to complete this code sharing as much as possible.

The main point is that PROGRESS uses the same design pattern as EXPLAIN by parsing the query tree. The work horse of the PROGRESS command is ProgressNode() which calls recursively sub nodes until we reach leaf nodes such as SeqScan, IndexScan, TupleStore, Sort, Material, ... . EXPLAIN command uses a similar work horse with function ExplainNode() which eventually calls different leaf nodes.

Some of the leaf nodes which are common to the 2 commands have been put in the file src/backend/commands/report.c. May be some further code sharing is also possible for the work horse by using a template function which would call EXPLAIN specific leaf node functions or PROGRESS specific leaf node functions.
 


Regarding the queryDesc state of SQL query upon receiving a request to
report its execution progress, it does not bring any issue. The request
is noted when the signal is received by the monitored backend. Then, the
backend continues its execution code path. When interrupts are checked
in the executor code, the request will be dealt.


Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the
status of the SQL query, progress.c code checks the status and report
either that the SQL query does not have a valid status, or otherwise the
current execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query state. Such
checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDescc is NULL, we can just report a <idle
transaction> output. This is currently the case with the patch suggested.


It's interesting question - how much the active query is in a usable state on the stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full consistency [1].

I wonder what you mean about usable state.

Currently, the code suggested tests the queryDesc pointer and all the sub nodes pointers in order to detect NULL pointers. When the progress report is collected by the backend, this backend does the collect and consequently does not run the query. So the query tree is not being modified. At this moment, whatever is the query state, we can manage to deal with its static state. It is only a tree which could also be just a NULL pointer in the most extreme case. Such case is dealt in the current code.
 

So far, I've found this new command very handy. It allows to evaluate
the time needed to complete a SQL query.


Could you explain how you get the percent of execution for nodes of plan tree and overall for query?

The progress of execution of the query is computed as follows at 2 different places for each leaf node type (Scan, IndexScan, Sort, Material, TupleStore, ...):

- one place in the executor code, or in access methods code, or in sort utilities code, used during the execution of the SQL query in which following values are counted for instance: rows R/W, blocks, R/W, tapes R/W used for sort, tuple store R/W, ... . Some of these values are already computed in the current Postgresql official source code. Some other values had to be added and collected.

- one place in the leaf function of each node type (ProgressScan(), ProgressSort(), ...) in which percents are computed and are then dumped together with raw values collected during execution, in the report. The dump details can be selected with the VERBOSE option of the PROGRESS command (For instance # PROGRESS VERBOSE $pid)

For instance:

1/ read/write rows are collected when running the executor in the file src/backend/executor/execProcnode.c
==============================================================================

This is already implemented in the current official source tree. Nothing mode needs to be implemented to collect values (total rows number and current rows number already fetched are collected).
The report is implemented in leaf function ProgressScanRows().

2/ read/write blocks are collected in the file src/backend/access/heap/heapam.c
==========================================================

This is already implemented in the current official source tree. Nothing more needs to be implemented to collect values during execution.
 The report is implemented in a leaf function ProgressScanBlks().

3/ the sort progressions are collected in the file src/backend/utils/sort/tuplesort.c
==========================================================

[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplesort.c
 src/backend/utils/sort/tuplesort.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 127 insertions(+), 15 deletions(-)
[root@rco pg]#

New fields have been added to compute the different I/O (read/write/merge/...) per tapes for instance, during a sort on tape.
The report of Sort node is implemented in leaf function ProgressSort()
 
4/ the tuple store progressions are computed in the file src/backend/utils/sort/tuplestore.c
=================================================================

[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplestore.c
 src/backend/utils/sort/tuplestore.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 56 insertions(+), 17 deletions(-)
[root@rco pg]#

New fields have been added to collect the I/O needed for such tuple store.
The report of TupleStore node is implemented in leaf function ProgressTupleStore().


Other node types have been implemented: TidScan, IndexScan, LimitScan, CustomScan, Hash, ModifyTable.
Such node may require some new fields to collect values during the SQL query execution.

Overall, the overhead caused by new values collected during the SQL query execution, is very low.
A few values need to be collected.


A further improvement would be to report the memory, disk and time
resources used by the monitored backend. An overuse of memory, disk and
time resources can prevent the SQL query to complete.


This functionality is somehow implemented in explain.c. You can see my patch to this file [2]. I only manipulate runtime statistics (data in the structure 'Instrumentation') to achieve the printing state of running query.

I will check your patch and try to add such feature to the current patch.
It provides a valuable hint to estimate whether a SQL query has a chance to complete and will not reached the resource limits.
.

1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
2. https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

Thanks for your suggestions and comments

Regards
Remi

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

pgsql-hackers by date:

Previous
From: Victor Yegorov
Date:
Subject: Re: [HACKERS] Relation cache invalidation on replica
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] OK, so culicidae is *still* broken