Re: Expose lock group leader pid in pg_stat_activity - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Expose lock group leader pid in pg_stat_activity
Date
Msg-id 20200130130301.GC130922@paquier.xyz
Whole thread Raw
In response to Re: Expose lock group leader pid in pg_stat_activity  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Expose lock group leader pid in pg_stat_activity
List pgsql-hackers
On Tue, Jan 28, 2020 at 02:52:08PM +0100, Tomas Vondra wrote:
> On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
>> There were already some dependencies between the rows since parallel
>> queries were added, as you could see eg. a parallel worker while no
>> query is currently active.  This patch will make those corner cases
>> more obvious.

I was reviewing the code and one thing that I was wondering is if it
would be better to make the code more defensive and return NULL when
the PID of the referenced leader is 0 or InvalidPid.  However that
would mean that we have a dummy 2PC entry from PGPROC or something not
yet started, which makes no sense.  So your simpler version is
actually fine.  What you have here is that in the worst case you could
finish with an incorrect reference to shared memory if a PGPROC is
recycled between the moment you look for the leader field and the
moment the PID value is fetched.  That's very unlikely to happen, and
I agree that it does not really justify the cost of taking extra locks
every time we scan pg_stat_activity.

> Yeah, sure. I mean explicit dependencies, e.g. a column referencing
> values from another row, like leader_id does.

+      The leader_pid is NULL for processes not involved in parallel query.
This is missing two markups, one for "NULL" and a second for
"leader_pid".  The documentation does not match the surroundings
either, so I would suggest a reformulation for the beginning:
PID of the leader process if this process is involved in parallel query.

And actually this paragraph is not completely true, because leader_pid
remains set even after one parallel query run has been finished for a
session when leader_pid = pid as lockGroupLeader is set to NULL only
once the process is stopped in ProcKill().

>> Should I document the possible inconsistencies?
>
> I think it's worth mentioning that as a comment in the code, say before
> the pg_stat_get_activity function. IMO we don't need to document all
> possible inconsistencies, a generic explanation is enough.

Agreed that adding some information in the area when we look at the
PGPROC entries for wait events and such would be nice.

> Not sure about the user docs. Does it currently say anything about this
> topic - consistency with stat catalogs?

Not sure that it is the job of this patch to do that.  Do you have
something specific in mind?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: Asif Rehman
Date:
Subject: Re: WIP/PoC for parallel backup