On Thu, Jan 30, 2020 at 10:03:01PM +0900, Michael Paquier wrote:
> On Tue, Jan 28, 2020 at 02:52:08PM +0100, Tomas Vondra wrote:
> > On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
> >> There were already some dependencies between the rows since parallel
> >> queries were added, as you could see eg. a parallel worker while no
> >> query is currently active. This patch will make those corner cases
> >> more obvious.
>
> I was reviewing the code and one thing that I was wondering is if it
> would be better to make the code more defensive and return NULL when
> the PID of the referenced leader is 0 or InvalidPid. However that
> would mean that we have a dummy 2PC entry from PGPROC or something not
> yet started, which makes no sense. So your simpler version is
> actually fine. What you have here is that in the worst case you could
> finish with an incorrect reference to shared memory if a PGPROC is
> recycled between the moment you look for the leader field and the
> moment the PID value is fetched. That's very unlikely to happen, and
> I agree that it does not really justify the cost of taking extra locks
> every time we scan pg_stat_activity.
Ok.
>
> > Yeah, sure. I mean explicit dependencies, e.g. a column referencing
> > values from another row, like leader_id does.
>
> + The leader_pid is NULL for processes not involved in parallel query.
> This is missing two markups, one for "NULL" and a second for
> "leader_pid".
The extra "leader_pid" disappeared when I reworked the doc. I'm not sure what
you meant here for NULL as I don't see any extra markup used in closeby
documentation, so I hope this version is ok.
> The documentation does not match the surroundings
> either, so I would suggest a reformulation for the beginning:
> PID of the leader process if this process is involved in parallel query.
> And actually this paragraph is not completely true, because leader_pid
> remains set even after one parallel query run has been finished for a
> session when leader_pid = pid as lockGroupLeader is set to NULL only
> once the process is stopped in ProcKill().
Oh good point, that's unfortunately not a super friendly behavior. I tried to
adapt the documentation to address of all that. It's maybe slightly too
verbose, but I guess that extra clarity is welcome here.
> >> Should I document the possible inconsistencies?
> >
> > I think it's worth mentioning that as a comment in the code, say before
> > the pg_stat_get_activity function. IMO we don't need to document all
> > possible inconsistencies, a generic explanation is enough.
>
> Agreed that adding some information in the area when we look at the
> PGPROC entries for wait events and such would be nice.
I added some code comments to remind that we don't guarantee any consistency
here.