[HACKERS] [PATCH] New command to monitor progression of long running queries - Mailing list pgsql-hackers

From Remi Colinet
Subject [HACKERS] [PATCH] New command to monitor progression of long running queries
Date
Msg-id CADdR5nyi8fUXv7mOn-TmuY4JF7yQRhmCE_dZxqzd8YGdwc28yg@mail.gmail.com
Whole thread Raw
Responses Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries  (Maksim Milyutin <m.milyutin@postgrespro.ru>)
Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries  (Vinayak Pokale <vinpokale@gmail.com>)
List pgsql-hackers
Hello,

I've implemented a new command named PROGRESS to monitor progression of long running SQL queries in a backend process.


Use case
=======

A use case is shown in the below example based on a table named t_10m with 10 millions rows.

The table has been created with :

CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id, md5(random()::text) AS md5;

1/ Start a first psql session to run long SQL queries:

[pgadm@rco ~]$ psql -A -d test
psql (10devel)
Type "help" for help.
test=#

The option -A is used to allow rows to be output without formatting work.

Redirect output to a file in order to let the query run without terminal interaction:
test=# \o out

Start a long running query:
test=# select * from t_10M order by md5;

2/ In a second psql session, list the backend pid and their SQL query

[pgadm@rco ~]$ psql -d test
psql (10devel)
Type "help" for help.

test=# select pid, query from pg_stat_activity ;
  pid  |                   query                  
-------+-------------------------------------------
 19081 |
 19084 |
 19339 | select pid, query from pg_stat_activity ;
 19341 | select * from t_10m order by md5;
 19727 | select * from t_10m order by md5;
 19726 | select * from t_10m order by md5;
 19079 |
 19078 |
 19080 |
(9 rows)

test=#

Chose the pid of the backend running the long SQL query to be monitored. Above example is a parallel SQL query. Lowest pid is the main backend of the query.

test=# PROGRESS 19341;
                                       PLAN PROGRESS                                      
-------------------------------------------------------------------------------------------
 Gather Merge
   ->  Sort=> dumping tuples to tapes
         rows r/w merge 0/0 rows r/w effective 0/2722972 0%
         Sort Key: md5
         ->  Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks 125938/161222 78%
(5 rows)

test=#

The query of the monitored backend is:
test=# select * from t_10M order by md5;

Because the table has 10 millions of rows, the sort is done on tapes.


Design of the command
=================

The design of the patch/command is:
- the user issue the "PROGRESS pid" command from a psql session. The pid is the one of the backend which runs the SQL query for which we want to get a progression report. It can be determined from the view pg_stat_activity.
- the monitoring backend, upon receiving the "PROGRESS pid" command from psql utility used in step above, sends a signal to the backend whose process pid is the one provided in the PROGRESS command.
- the monitored backend receives the signal and notes the request as for any interrupt. Then, it continues its execution of its SQL query until interrupts can be serviced.
- when the monitored process can service the interrupts, it deals with the progress request by collecting its execution tree with the execution progress of each long running node. At this time, the SQL query is no more running. The progression of each node is calculated during the execution of the SQL query which is at this moment stopped. The execution tree is dumped in shared memory pages allocated at the start of the server. Then, the monitored backend set a latch on which the monitoring process is waiting for. It then continues executing its SQL query.
- the monitoring backend collects the share memory data dumped by the monitored backed, and sends it to its psql session, as a list of rows.

The command PROGRESS does not incur any slowness on the running query because the execution progress is only computed upon receiving the progress request which is supposed to be seldom used.

The code heavily reuses the one of the explain command. In order to share as much code as possible with the EXPLAIN command, part of the EXPLAIN code which deals with reporting quals for instance, has been moved to a new report.c file in the src/backend/commands folder. This code in report.c is shared between explain.c source code and PROGRESS command source code which is in progress.c file.

The progression reported by PROGRESS command is given in terms of rows, blocks, bytes and percents. The values displayed depend on the node type in the execution plan.

The current patch implements all the possible nodes which could take a lot of time:
- Sequential scan nodes with rows and block progress (node type T_SeqScan, T_SampleScan, T_BitmapHeaepScan, T_SubqueryScan, T_FunctionScan, T_ValuesScan, T_CteScan, T_WorkTableScan)
- Tuple id scan node with rows and blocks progress (T_TidScan)
- Limit node with rows progress (T_Limit)
- Foreign and custom scan with rows and blocks progress (T_ForeignScan, T_CustomScan)
- Index scan, index only scan and bitmap index scan with rows and blocks progress


Patch
====

The diff stat of the patch is:

[root@rco pg]# git diff --stat master..
 contrib/auto_explain/auto_explain.c       |    5 +-
 contrib/postgres_fdw/postgres_fdw.c       |   13 +-
 src/backend/access/heap/heapam.c          |    2 +
 src/backend/commands/Makefile             |    3 +-
 src/backend/commands/explain.c            | 2834 ++++++++++++++-----------------------------------------------------------------------------------------------
 src/backend/commands/prepare.c            |    5 +-
 src/backend/commands/progress.c           | 1314 +++++++++++++++++++++++++++++++++++++++++++++++++++
 src/backend/commands/report.c             | 2120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/backend/executor/execProcnode.c       |   31 ++
 src/backend/executor/nodeBitmapHeapscan.c |   13 +-
 src/backend/executor/nodeIndexonlyscan.c  |   13 +-
 src/backend/executor/nodeIndexscan.c      |   15 +-
 src/backend/executor/nodeSamplescan.c     |   12 +-
 src/backend/executor/nodeSeqscan.c        |   16 +-
 src/backend/nodes/bitmapset.c             |   19 +
 src/backend/nodes/outfuncs.c              |  245 ++++++++++
 src/backend/parser/gram.y                 |   99 +++-
 src/backend/postmaster/postmaster.c       |    1 +
 src/backend/storage/file/buffile.c        |   47 ++
 src/backend/storage/ipc/ipci.c            |    3 +
 src/backend/storage/ipc/procarray.c       |   57 +++
 src/backend/storage/ipc/procsignal.c      |    4 +
 src/backend/storage/lmgr/lwlock.c         |    7 +-
 src/backend/storage/lmgr/lwlocknames.txt  |    1 +
 src/backend/tcop/postgres.c               |   10 +
 src/backend/tcop/pquery.c                 |   25 +
 src/backend/tcop/utility.c                |   10 +
 src/backend/utils/init/globals.c          |   12 +
 src/backend/utils/sort/tuplesort.c        |  142 +++++-
 src/backend/utils/sort/tuplestore.c       |   73 ++-
 src/include/commands/explain.h            |   67 +--
 src/include/commands/prepare.h            |    2 +-
 src/include/commands/report.h             |  136 ++++++
 src/include/executor/execdesc.h           |    2 +
 src/include/executor/progress.h           |   52 ++
 src/include/foreign/fdwapi.h              |   10 +-
 src/include/nodes/bitmapset.h             |    1 +
 src/include/nodes/execnodes.h             |    3 +
 src/include/nodes/extensible.h            |    6 +-
 src/include/nodes/nodes.h                 |    8 +
 src/include/nodes/parsenodes.h            |   11 +
 src/include/nodes/plannodes.h             |   11 +
 src/include/parser/kwlist.h               |    4 +
 src/include/pgstat.h                      |    3 +-
 src/include/storage/buffile.h             |    8 +
 src/include/storage/procarray.h           |    3 +
 src/include/storage/procsignal.h          |    3 +
 src/include/utils/tuplesort.h             |   71 ++-
 src/include/utils/tuplestore.h            |   33 ++
 49 files changed, 4979 insertions(+), 2606 deletions(-)
[root@rco pg]#

The progress command can be used with the watch command of psql making it more handy to monitor a long running query.
The default format of the PROGRESS command is text. It can be easily expanded to json and xml as for EXPLAIN command.

The patch is based on commit 85a0781334a204c15c9c6ea9d3e6c75334c2beb6 (Date: Fri Apr 14 17:51:25 2017 -0400)


Use cases
========

Some further examples of use are shown below in the test_v1.txt file.


What do you make of this idea/patch?

Does it make sense?

Any suggestion is welcome.

The current patch is still work in progress. It is meanwhile stable. It can be used with regular queries. Utilities commands are not supported for the moment.
Documentation is not yet written.

Regards
Remi

Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] pgbench - allow to store select results intovariables
Next
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] some review comments on logical rep code