Re: [HACKERS] [PATCH v2] Progress command to monitor progression oflong running SQL queries - Mailing list pgsql-hackers

From Remi Colinet
Subject Re: [HACKERS] [PATCH v2] Progress command to monitor progression oflong running SQL queries
Date
Msg-id CADdR5nxv+1boZdt-wv9D9QfH3oh7a-vEVhMkdvjn7mnGOY=nhA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [PATCH v2] Progress command to monitor progression oflong running SQL queries  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2017-05-13 3:53 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:
On Wed, May 10, 2017 at 10:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Regarding the patch, this is way to close to the progress facility
> already in place. So why don't you extend it for the executor?

I don't think that's a good idea.  The existing progress facility
advertises a fixed number of counters with a command-type-specific
interpretation, but for *query* progress reporting, we really need a
richer model.  A query plan can contain an unbounded number of
executor nodes and Remi's idea is, quite reasonably, to report
something about each one.
 
Oracle's way to report progress of long queries is to use a v$session_longops view which collects progress reports for any backend which has been running a SQL query for more than 6 seconds. But it's probably even better to use a SQL function which allows for instance to provide a specific pid parameter in order to limit the report to a given backend. Users are expected to be focused on one SQL query which is long to complete.

From a design point of view, I think a patch like this has to clear a
number of hurdles.  It needs to:

1. Decide what data to expose.  The sample output looks reasonable in
this case, although the amount of code churn looks really high.

I will probably remove:
- the verbose option (PROGRESS VERBOSE PID) which exposes very low details for tapes used by Sort nodes and by HasJoinTable nodes.
- the JSON, XML output formats because the command will be replaced by a SQL function returning a row set. 
This should make the code much smaller and avoid any change to the EXPLAIN code (code sharing so far).


2. Find a way to advertise the data that it exposes.  This patch looks
like it allocates a half-MB of shared memory per backend and gives up
if that's not enough.
 
May be, it needs a dynamic allocation of shared memory instead of static allocation at start of server. This would spare shared memory and avoid failure when the report is larger than the shared memory buffer pre allocated so far.
 

3. Find a way to synchronize the access to the data that it exposes.
This patch ignores that problem completely, so multiple progress
report commands running at the same time will behave unpredictably.

So far, I've been using a single lock in shared memory to synchronize the requests for SQL progress report.

LWLockAcquire(ProgressLock, LW_EXCLUSIVE);
...
SendProcSignal(stmt->pid, PROCSIG_PROGRESS, bid);
...
LWLockRelease(ProgressLock);

It's expected that a few dba would use such feature at the same time. The lock would need to be unlocked if the current backend fails.

I've also realized that access control for progress report needs to be added.

Thx & regards
Remi

 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advanceof pgindent run.
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advanceof pgindent run.