Re: Expose lock group leader pid in pg_stat_activity - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Expose lock group leader pid in pg_stat_activity
Date
Msg-id 20200204142725.GA30965@nol
Whole thread Raw
In response to Re: Expose lock group leader pid in pg_stat_activity  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Expose lock group leader pid in pg_stat_activity
List pgsql-hackers
On Thu, Jan 30, 2020 at 10:03:01PM +0900, Michael Paquier wrote:
> On Tue, Jan 28, 2020 at 02:52:08PM +0100, Tomas Vondra wrote:
> > On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
> >> There were already some dependencies between the rows since parallel
> >> queries were added, as you could see eg. a parallel worker while no
> >> query is currently active.  This patch will make those corner cases
> >> more obvious.
>
> I was reviewing the code and one thing that I was wondering is if it
> would be better to make the code more defensive and return NULL when
> the PID of the referenced leader is 0 or InvalidPid.  However that
> would mean that we have a dummy 2PC entry from PGPROC or something not
> yet started, which makes no sense.  So your simpler version is
> actually fine.  What you have here is that in the worst case you could
> finish with an incorrect reference to shared memory if a PGPROC is
> recycled between the moment you look for the leader field and the
> moment the PID value is fetched.  That's very unlikely to happen, and
> I agree that it does not really justify the cost of taking extra locks
> every time we scan pg_stat_activity.

Ok.

>
> > Yeah, sure. I mean explicit dependencies, e.g. a column referencing
> > values from another row, like leader_id does.
>
> +      The leader_pid is NULL for processes not involved in parallel query.
> This is missing two markups, one for "NULL" and a second for
> "leader_pid".

The extra "leader_pid" disappeared when I reworked the doc.  I'm not sure what
you meant here for NULL as I don't see any extra markup used in closeby
documentation, so I hope this version is ok.

> The documentation does not match the surroundings
> either, so I would suggest a reformulation for the beginning:
> PID of the leader process if this process is involved in parallel query.

> And actually this paragraph is not completely true, because leader_pid
> remains set even after one parallel query run has been finished for a
> session when leader_pid = pid as lockGroupLeader is set to NULL only
> once the process is stopped in ProcKill().

Oh good point, that's unfortunately not a super friendly behavior.  I tried to
adapt the documentation to address of all that.  It's maybe slightly too
verbose, but I guess that extra clarity is welcome here.

> >> Should I document the possible inconsistencies?
> >
> > I think it's worth mentioning that as a comment in the code, say before
> > the pg_stat_get_activity function. IMO we don't need to document all
> > possible inconsistencies, a generic explanation is enough.
>
> Agreed that adding some information in the area when we look at the
> PGPROC entries for wait events and such would be nice.

I added some code comments to remind that we don't guarantee any consistency
here.

Attachment

pgsql-hackers by date:

Previous
From: Rémi Lapeyre
Date:
Subject: [PATCH v1] Allow COPY "text" format to output a header
Next
From: "曾文旌(义从)"
Date:
Subject: Re: [Proposal] Global temporary tables