Thread: Expose lock group leader pid in pg_stat_activity

Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
Hello,

Guillaume (in Cc) recently pointed out [1] that it's currently not
possible to retrieve the list of parallel workers for a given backend
at the SQL level.  His use case was to develop a function in plpgsql
to sample a given query wait event, but it's not hard to imagine other
useful use cases for this information, for instance doing some
analysis on the average number of workers per parallel query, or ratio
of parallel queries.  IIUC parallel queries is for now the only user
of lock group, so this should work just fine.

I'm attaching a trivial patch to expose the group leader pid if any
in pg_stat_activity.  Quick example of usage:

=# SELECT query, leader_pid,
  array_agg(pid) filter(WHERE leader_pid != pid) AS members
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL
GROUP BY query, leader_pid;
       query       | leader_pid |    members
-------------------+------------+---------------
 select * from t1; |      28701 | {28728,28732}
(1 row)
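
As a side note, the sampling use case could then be implemented
entirely in plpgsql.  A rough, untested sketch (the function name, the
temp table and the 10ms interval are all arbitrary):

-- Sample the wait events of a leader and its parallel workers
-- through the new leader_pid column.
CREATE FUNCTION sample_query_wait_events(p_pid int, p_samples int)
RETURNS TABLE (wait_event_type text, wait_event text, samples bigint)
LANGUAGE plpgsql AS $$
BEGIN
    CREATE TEMP TABLE IF NOT EXISTS wait_samples (evt_type text, evt text);
    TRUNCATE wait_samples;
    FOR i IN 1 .. p_samples LOOP
        -- leader_pid is set for the leader and all of its workers
        INSERT INTO wait_samples
            SELECT a.wait_event_type, a.wait_event
            FROM pg_stat_activity a
            WHERE a.leader_pid = p_pid;
        PERFORM pg_sleep(0.01);
    END LOOP;
    RETURN QUERY
        SELECT s.evt_type, s.evt, count(*)
        FROM wait_samples s
        GROUP BY 1, 2
        ORDER BY 3 DESC;
END;
$$;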


[1] https://twitter.com/g_lelarge/status/1209486212190343168

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Wed, Dec 25, 2019 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Guillaume (in Cc) recently pointed out [1] that it's currently not
> possible to retrieve the list of parallel workers for a given backend
> at the SQL level.  His use case was to develop a function in plpgsql
> to sample a given query wait event, but it's not hard to imagine other
> useful use cases for this information, for instance doing some
> analysis on the average number of workers per parallel query, or ratio
> of parallel queries.  IIUC parallel queries is for now the only user
> of lock group, so this should work just fine.
>
> I'm attaching a trivial patch to expose the group leader pid if any
> in pg_stat_activity.  Quick example of usage:
>
> =# SELECT query, leader_pid,
>   array_agg(pid) filter(WHERE leader_pid != pid) AS members
> FROM pg_stat_activity
> WHERE leader_pid IS NOT NULL
> GROUP BY query, leader_pid;
>        query       | leader_pid |    members
> -------------------+------------+---------------
>  select * from t1; |      28701 | {28728,28732}
> (1 row)
>
>
> [1] https://twitter.com/g_lelarge/status/1209486212190343168

And I just realized that I forgot to update rules.out, sorry about
that.  v2 attached.

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Guillaume Lelarge
Date:
On Wed, Dec 25, 2019 at 19:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Wed, Dec 25, 2019 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Guillaume (in Cc) recently pointed out [1] that it's currently not
> possible to retrieve the list of parallel workers for a given backend
> at the SQL level.  His use case was to develop a function in plpgsql
> to sample a given query wait event, but it's not hard to imagine other
> useful use cases for this information, for instance doing some
> analysis on the average number of workers per parallel query, or ratio
> of parallel queries.  IIUC parallel queries is for now the only user
> of lock group, so this should work just fine.
>
> I'm attaching a trivial patch to expose the group leader pid if any
> in pg_stat_activity.  Quick example of usage:
>
> =# SELECT query, leader_pid,
>   array_agg(pid) filter(WHERE leader_pid != pid) AS members
> FROM pg_stat_activity
> WHERE leader_pid IS NOT NULL
> GROUP BY query, leader_pid;
>        query       | leader_pid |    members
> -------------------+------------+---------------
>  select * from t1; |      28701 | {28728,28732}
> (1 row)
>
>
> [1] https://twitter.com/g_lelarge/status/1209486212190343168

And I just realized that I forgot to update rules.out, sorry about
that.  v2 attached.

So I tried your patch this morning, and it works really well.

On a SELECT count(*), I got this:

SELECT leader_pid, pid, wait_event_type, wait_event, state, backend_type FROM pg_stat_activity WHERE pid=111439 or leader_pid=111439;

┌────────────┬────────┬─────────────────┬──────────────┬────────┬─────────────────┐
│ leader_pid │  pid   │ wait_event_type │  wait_event  │ state  │  backend_type   │
├────────────┼────────┼─────────────────┼──────────────┼────────┼─────────────────┤
│     111439 │ 111439 │ LWLock          │ WALWriteLock │ active │ client backend  │
│     111439 │ 116887 │ LWLock          │ WALWriteLock │ active │ parallel worker │
│     111439 │ 116888 │ IO              │ WALSync      │ active │ parallel worker │
└────────────┴────────┴─────────────────┴──────────────┴────────┴─────────────────┘
(3 rows)

and this from a CREATE INDEX:

┌────────────┬────────┬─────────────────┬────────────┬────────┬─────────────────┐
│ leader_pid │  pid   │ wait_event_type │ wait_event │ state  │  backend_type   │
├────────────┼────────┼─────────────────┼────────────┼────────┼─────────────────┤
│     111439 │ 111439 │                 │            │ active │ client backend  │
│     111439 │ 118775 │                 │            │ active │ parallel worker │
└────────────┴────────┴─────────────────┴────────────┴────────┴─────────────────┘
(2 rows)

Anyway, it applies cleanly, it compiles, and it works. Documentation is available. So it looks to me it's good to go :)


--
Guillaume.

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Thu, Dec 26, 2019 at 9:08 AM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> On Wed, Dec 25, 2019 at 19:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Wed, Dec 25, 2019 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >
>> > Guillaume (in Cc) recently pointed out [1] that it's currently not
>> > possible to retrieve the list of parallel workers for a given backend
>> > at the SQL level.  His use case was to develop a function in plpgsql
>> > to sample a given query wait event, but it's not hard to imagine other
>> > useful use cases for this information, for instance doing some
>> > analysis on the average number of workers per parallel query, or ratio
>> > of parallel queries.  IIUC parallel queries is for now the only user
>> > of lock group, so this should work just fine.
>> >
>> > I'm attaching a trivial patch to expose the group leader pid if any
>> > in pg_stat_activity.  Quick example of usage:
>> >
>> > =# SELECT query, leader_pid,
>> >   array_agg(pid) filter(WHERE leader_pid != pid) AS members
>> > FROM pg_stat_activity
>> > WHERE leader_pid IS NOT NULL
>> > GROUP BY query, leader_pid;
>> >        query       | leader_pid |    members
>> > -------------------+------------+---------------
>> >  select * from t1; |      28701 | {28728,28732}
>> > (1 row)
>> >
>> >
>> > [1] https://twitter.com/g_lelarge/status/1209486212190343168
>>
>> And I just realized that I forgot to update rules.out, sorry about
>> that.  v2 attached.
>
>
> So I tried your patch this morning, and it works really well.
>
> On a SELECT count(*), I got this:
>
> SELECT leader_pid, pid, wait_event_type, wait_event, state, backend_type FROM pg_stat_activity WHERE pid=111439 or leader_pid=111439;
>
> ┌────────────┬────────┬─────────────────┬──────────────┬────────┬─────────────────┐
> │ leader_pid │  pid   │ wait_event_type │  wait_event  │ state  │  backend_type   │
> ├────────────┼────────┼─────────────────┼──────────────┼────────┼─────────────────┤
> │     111439 │ 111439 │ LWLock          │ WALWriteLock │ active │ client backend  │
> │     111439 │ 116887 │ LWLock          │ WALWriteLock │ active │ parallel worker │
> │     111439 │ 116888 │ IO              │ WALSync      │ active │ parallel worker │
> └────────────┴────────┴─────────────────┴──────────────┴────────┴─────────────────┘
> (3 rows)
>
> and this from a CREATE INDEX:
>
> ┌────────────┬────────┬─────────────────┬────────────┬────────┬─────────────────┐
> │ leader_pid │  pid   │ wait_event_type │ wait_event │ state  │  backend_type   │
> ├────────────┼────────┼─────────────────┼────────────┼────────┼─────────────────┤
> │     111439 │ 111439 │                 │            │ active │ client backend  │
> │     111439 │ 118775 │                 │            │ active │ parallel worker │
> └────────────┴────────┴─────────────────┴────────────┴────────┴─────────────────┘
> (2 rows)
>
> Anyway, it applies cleanly, it compiles, and it works. Documentation is available. So it looks to me it's good to go :)

Thanks for the review Guillaume.  Double checking the doc, I see that
I made a copy/pasto mistake in the new field name.  Attached v3 should
be all good.

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Guillaume Lelarge
Date:
On Thu, Dec 26, 2019 at 09:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Dec 26, 2019 at 9:08 AM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> On Wed, Dec 25, 2019 at 19:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Wed, Dec 25, 2019 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >
>> > Guillaume (in Cc) recently pointed out [1] that it's currently not
>> > possible to retrieve the list of parallel workers for a given backend
>> > at the SQL level.  His use case was to develop a function in plpgsql
>> > to sample a given query wait event, but it's not hard to imagine other
>> > useful use cases for this information, for instance doing some
>> > analysis on the average number of workers per parallel query, or ratio
>> > of parallel queries.  IIUC parallel queries is for now the only user
>> > of lock group, so this should work just fine.
>> >
>> > I'm attaching a trivial patch to expose the group leader pid if any
>> > in pg_stat_activity.  Quick example of usage:
>> >
>> > =# SELECT query, leader_pid,
>> >   array_agg(pid) filter(WHERE leader_pid != pid) AS members
>> > FROM pg_stat_activity
>> > WHERE leader_pid IS NOT NULL
>> > GROUP BY query, leader_pid;
>> >        query       | leader_pid |    members
>> > -------------------+------------+---------------
>> >  select * from t1; |      28701 | {28728,28732}
>> > (1 row)
>> >
>> >
>> > [1] https://twitter.com/g_lelarge/status/1209486212190343168
>>
>> And I just realized that I forgot to update rules.out, sorry about
>> that.  v2 attached.
>
>
> So I tried your patch this morning, and it works really well.
>
> On a SELECT count(*), I got this:
>
> SELECT leader_pid, pid, wait_event_type, wait_event, state, backend_type FROM pg_stat_activity WHERE pid=111439 or leader_pid=111439;
>
> ┌────────────┬────────┬─────────────────┬──────────────┬────────┬─────────────────┐
> │ leader_pid │  pid   │ wait_event_type │  wait_event  │ state  │  backend_type   │
> ├────────────┼────────┼─────────────────┼──────────────┼────────┼─────────────────┤
> │     111439 │ 111439 │ LWLock          │ WALWriteLock │ active │ client backend  │
> │     111439 │ 116887 │ LWLock          │ WALWriteLock │ active │ parallel worker │
> │     111439 │ 116888 │ IO              │ WALSync      │ active │ parallel worker │
> └────────────┴────────┴─────────────────┴──────────────┴────────┴─────────────────┘
> (3 rows)
>
> and this from a CREATE INDEX:
>
> ┌────────────┬────────┬─────────────────┬────────────┬────────┬─────────────────┐
> │ leader_pid │  pid   │ wait_event_type │ wait_event │ state  │  backend_type   │
> ├────────────┼────────┼─────────────────┼────────────┼────────┼─────────────────┤
> │     111439 │ 111439 │                 │            │ active │ client backend  │
> │     111439 │ 118775 │                 │            │ active │ parallel worker │
> └────────────┴────────┴─────────────────┴────────────┴────────┴─────────────────┘
> (2 rows)
>
> Anyway, it applies cleanly, it compiles, and it works. Documentation is available. So it looks to me it's good to go :)

Thanks for the review Guillaume.  Double checking the doc, I see that
I made a copy/pasto mistake in the new field name.  Attached v3 should
be all good.

Feeling bad I missed this. But, yeah, it's much better with the right column name.

For me, it's looking good to be ready for committer. Should I set it this way in the Commit Fest app?


--
Guillaume.

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Thu, Dec 26, 2019 at 10:20 AM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> On Thu, Dec 26, 2019 at 09:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Thu, Dec 26, 2019 at 9:08 AM Guillaume Lelarge
>> <guillaume@lelarge.info> wrote:
>> >
>> > On Wed, Dec 25, 2019 at 19:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >>
>> >> On Wed, Dec 25, 2019 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >> >
>> >> > Guillaume (in Cc) recently pointed out [1] that it's currently not
>> >> > possible to retrieve the list of parallel workers for a given backend
>> >> > at the SQL level.  His use case was to develop a function in plpgsql
>> >> > to sample a given query wait event, but it's not hard to imagine other
>> >> > useful use cases for this information, for instance doing some
>> >> > analysis on the average number of workers per parallel query, or ratio
>> >> > of parallel queries.  IIUC parallel queries is for now the only user
>> >> > of lock group, so this should work just fine.
>> >> >
>> >> > I'm attaching a trivial patch to expose the group leader pid if any
>> >> > in pg_stat_activity.  Quick example of usage:
>> >> >
>> >> > =# SELECT query, leader_pid,
>> >> >   array_agg(pid) filter(WHERE leader_pid != pid) AS members
>> >> > FROM pg_stat_activity
>> >> > WHERE leader_pid IS NOT NULL
>> >> > GROUP BY query, leader_pid;
>> >> >        query       | leader_pid |    members
>> >> > -------------------+------------+---------------
>> >> >  select * from t1; |      28701 | {28728,28732}
>> >> > (1 row)
>> >> >
>> >> >
>> >> > [1] https://twitter.com/g_lelarge/status/1209486212190343168
>> >>
>> >> And I just realized that I forgot to update rules.out, sorry about
>> >> that.  v2 attached.
>> >
>> >
>> > So I tried your patch this morning, and it works really well.
>> >
>> > On a SELECT count(*), I got this:
>> >
>> > SELECT leader_pid, pid, wait_event_type, wait_event, state, backend_type FROM pg_stat_activity WHERE pid=111439 or leader_pid=111439;
>> >
>> > ┌────────────┬────────┬─────────────────┬──────────────┬────────┬─────────────────┐
>> > │ leader_pid │  pid   │ wait_event_type │  wait_event  │ state  │  backend_type   │
>> > ├────────────┼────────┼─────────────────┼──────────────┼────────┼─────────────────┤
>> > │     111439 │ 111439 │ LWLock          │ WALWriteLock │ active │ client backend  │
>> > │     111439 │ 116887 │ LWLock          │ WALWriteLock │ active │ parallel worker │
>> > │     111439 │ 116888 │ IO              │ WALSync      │ active │ parallel worker │
>> > └────────────┴────────┴─────────────────┴──────────────┴────────┴─────────────────┘
>> > (3 rows)
>> >
>> > and this from a CREATE INDEX:
>> >
>> > ┌────────────┬────────┬─────────────────┬────────────┬────────┬─────────────────┐
>> > │ leader_pid │  pid   │ wait_event_type │ wait_event │ state  │  backend_type   │
>> > ├────────────┼────────┼─────────────────┼────────────┼────────┼─────────────────┤
>> > │     111439 │ 111439 │                 │            │ active │ client backend  │
>> > │     111439 │ 118775 │                 │            │ active │ parallel worker │
>> > └────────────┴────────┴─────────────────┴────────────┴────────┴─────────────────┘
>> > (2 rows)
>> >
>> > Anyway, it applies cleanly, it compiles, and it works. Documentation is available. So it looks to me it's good to go :)
>>
>> Thanks for the review Guillaume.  Double checking the doc, I see that
>> I made a copy/pasto mistake in the new field name.  Attached v3 should
>> be all good.
>
>
> Feeling bad I missed this. But, yeah, it's much better with the right column name.
>
> For me, it's looking good to be ready for committer. Should I set it this way in the Commit Fest app?

If you don't see any other issue with the patch, I'd say yes.  A
committer can still put it back to waiting on author if needed.

Re: Expose lock group leader pid in pg_stat_activity

From
Guillaume Lelarge
Date:
On Thu, Dec 26, 2019 at 10:26, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Dec 26, 2019 at 10:20 AM Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> On Thu, Dec 26, 2019 at 09:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Thu, Dec 26, 2019 at 9:08 AM Guillaume Lelarge
>> <guillaume@lelarge.info> wrote:
>> >
>> > On Wed, Dec 25, 2019 at 19:30, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >>
>> >> On Wed, Dec 25, 2019 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> >> >
>> >> > Guillaume (in Cc) recently pointed out [1] that it's currently not
>> >> > possible to retrieve the list of parallel workers for a given backend
>> >> > at the SQL level.  His use case was to develop a function in plpgsql
>> >> > to sample a given query wait event, but it's not hard to imagine other
>> >> > useful use cases for this information, for instance doing some
>> >> > analysis on the average number of workers per parallel query, or ratio
>> >> > of parallel queries.  IIUC parallel queries is for now the only user
>> >> > of lock group, so this should work just fine.
>> >> >
>> >> > I'm attaching a trivial patch to expose the group leader pid if any
>> >> > in pg_stat_activity.  Quick example of usage:
>> >> >
>> >> > =# SELECT query, leader_pid,
>> >> >   array_agg(pid) filter(WHERE leader_pid != pid) AS members
>> >> > FROM pg_stat_activity
>> >> > WHERE leader_pid IS NOT NULL
>> >> > GROUP BY query, leader_pid;
>> >> >        query       | leader_pid |    members
>> >> > -------------------+------------+---------------
>> >> >  select * from t1; |      28701 | {28728,28732}
>> >> > (1 row)
>> >> >
>> >> >
>> >> > [1] https://twitter.com/g_lelarge/status/1209486212190343168
>> >>
>> >> And I just realized that I forgot to update rules.out, sorry about
>> >> that.  v2 attached.
>> >
>> >
>> > So I tried your patch this morning, and it works really well.
>> >
>> > On a SELECT count(*), I got this:
>> >
>> > SELECT leader_pid, pid, wait_event_type, wait_event, state, backend_type FROM pg_stat_activity WHERE pid=111439 or leader_pid=111439;
>> >
>> > ┌────────────┬────────┬─────────────────┬──────────────┬────────┬─────────────────┐
>> > │ leader_pid │  pid   │ wait_event_type │  wait_event  │ state  │  backend_type   │
>> > ├────────────┼────────┼─────────────────┼──────────────┼────────┼─────────────────┤
>> > │     111439 │ 111439 │ LWLock          │ WALWriteLock │ active │ client backend  │
>> > │     111439 │ 116887 │ LWLock          │ WALWriteLock │ active │ parallel worker │
>> > │     111439 │ 116888 │ IO              │ WALSync      │ active │ parallel worker │
>> > └────────────┴────────┴─────────────────┴──────────────┴────────┴─────────────────┘
>> > (3 rows)
>> >
>> > and this from a CREATE INDEX:
>> >
>> > ┌────────────┬────────┬─────────────────┬────────────┬────────┬─────────────────┐
>> > │ leader_pid │  pid   │ wait_event_type │ wait_event │ state  │  backend_type   │
>> > ├────────────┼────────┼─────────────────┼────────────┼────────┼─────────────────┤
>> > │     111439 │ 111439 │                 │            │ active │ client backend  │
>> > │     111439 │ 118775 │                 │            │ active │ parallel worker │
>> > └────────────┴────────┴─────────────────┴────────────┴────────┴─────────────────┘
>> > (2 rows)
>> >
>> > Anyway, it applies cleanly, it compiles, and it works. Documentation is available. So it looks to me it's good to go :)
>>
>> Thanks for the review Guillaume.  Double checking the doc, I see that
>> I made a copy/pasto mistake in the new field name.  Attached v3 should
>> be all good.
>
>
> Feeling bad I missed this. But, yeah, it's much better with the right column name.
>
> For me, it's looking good to be ready for committer. Should I set it this way in the Commit Fest app?

If you don't see any other issue with the patch, I'd say yes.  A
committer can still put it back to waiting on author if needed.

That's also what I thought, but as I was the only one commenting on this... Anyway, done.


--
Guillaume.

Re: Expose lock group leader pid in pg_stat_activity

From
Sergei Kornilov
Date:
Hello

I doubt that "Process ID of the lock group leader" is enough for the user documentation. I think we need to note that:
- this field is related to parallel query execution
- leader_pid = pid if the process is a parallel leader
- leader_pid points to the pid of the leader if the process is a parallel worker
- leader_pid will be NULL for non-parallel queries or idle sessions

Also, the patch has no tests. Possibly this is normal; I'm not sure how to write a reliable test for this feature.
The patch applies, compiles, and passes tests.

regards, Sergei



Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
Hello,

On Thu, Dec 26, 2019 at 12:18 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> I doubt that "Process ID of the lock group leader" is enough for the user documentation. I think we need to note that:
> - this field is related to parallel query execution
> - leader_pid = pid if the process is a parallel leader
> - leader_pid points to the pid of the leader if the process is a parallel worker
> - leader_pid will be NULL for non-parallel queries or idle sessions

As I understand it, lock group is some infrastructure that is used by
parallel queries, but could be used for something else too.  So if
more documentation is needed, we should say something like "For now,
only parallel queries can have a lock group" or something like that.

The fact that leader_pid == pid for the leader and different for the
other members should be obvious; I'm not sure that it's worth
documenting that.

> Also, the patch has no tests. Possibly this is normal; I'm not sure how to write a reliable test for this feature.

Yes, I was unsure if some extra testing was required.  We could set
force_parallel_mode to on and query "select leader_pid is not null
from pg_stat_activity where pid = pg_backend_pid()", and then the
opposite test, which should do the trick.
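
Spelled out, the test could look like this (only a sketch; I haven't
verified that this exact query is itself eligible for
force_parallel_mode):

SET force_parallel_mode = on;
SELECT leader_pid IS NOT NULL AS has_leader
FROM pg_stat_activity WHERE pid = pg_backend_pid();

RESET force_parallel_mode;
-- the opposite test, in a session that never ran a parallel query
SELECT leader_pid IS NULL AS no_leader
FROM pg_stat_activity WHERE pid = pg_backend_pid();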



Re: Expose lock group leader pid in pg_stat_activity

From
Sergei Kornilov
Date:
Hello

> As I understand it, lock group is some infrastructure that is used by
> parallel queries, but could be used for something else too. So if
> more documentation is needed, we should say something like "For now,
> only parallel queries can have a lock group" or something like that.

If lockGroupLeader will be used in some way for a non-parallel query, then the name leader_pid could be confusing, no?
I treat pg_stat_activity as a view for users. Do we document anywhere what a "lock group leader" is (except the README
in the source tree)? I mean, a user going to read the documentation will think: "ok, this field is the process ID of
the lock group leader, but what is it?". Exposing a leader pid for a parallel worker will be a clear improvement for
users, and it seems lockGroupLeader->pid is exactly this stuff. Therefore, I would like to see such a description and
meaning of the field.

> The fact that leader_pid == pid for the leader and different for the
> other members should be obvious; I'm not sure that it's worth
> documenting that.

It may not be obvious that leader_pid is not null in this case. But ok, no objections.

regards, Sergei



Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Fri, Dec 27, 2019 at 10:01 AM Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hello
>
> > As I understand it, lock group is some infrastructure that is used by
> > parallel queries, but could be used for something else too. So if
> > more documentation is needed, we should say something like "For now,
> > only parallel queries can have a lock group" or something like that.
>
> If lockGroupLeader will be used in some way for a non-parallel query, then the name leader_pid could be confusing, no?
> I treat pg_stat_activity as a view for users. Do we document anywhere what a "lock group leader" is (except the README
> in the source tree)? I mean, a user going to read the documentation will think: "ok, this field is the process ID of
> the lock group leader, but what is it?". Exposing a leader pid for a parallel worker will be a clear improvement for
> users, and it seems lockGroupLeader->pid is exactly this stuff. Therefore, I would like to see such a description and
> meaning of the field.

I think that not using "parallel" to name this field will help to
avoid confusion if the lock group infrastructure is eventually used
for something else, but that's only true if indeed we explain what a
lock group is.

> > The fact that leader_pid == pid for the leader and different for the
> > other members should be obvious; I'm not sure that it's worth
> > documenting that.
>
> It may not be obvious that leader_pid is not null in this case. But ok, no objections.

If we adapt lmgr/README to document the group locking, it also
addresses this.  What do you think of:

The leader_pid is NULL for processes not involved in parallel query.
When a process wants to cooperate with parallel workers, it becomes a
lock group leader, which means that this field will be set to its
own pid. When a parallel worker starts up, this field will be set
to the leader's pid.



Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Fri, Dec 27, 2019 at 10:15:33AM +0100, Julien Rouhaud wrote:
> I think that not using "parallel" to name this field will help to
> avoid confusion if the lock group infrastructure is eventually used
> for something else, but that's only true if indeed we explain what a
> lock group is.

As you already pointed out, src/backend/storage/lmgr/README includes a
full description of this stuff under the section "Group Locking".  So
I agree that the patch ought to document your new field in a much
better way; not mentioning the term "group locking" at all is even
better, so as not to confuse the reader, since this term is not
present in the docs.

> The leader_pid is NULL for processes not involved in parallel query.
> When a process wants to cooperate with parallel workers, it becomes a
> lock group leader, which means that this field will be set to its
> own pid. When a parallel worker starts up, this field will be set
> to the leader's pid.

The first sentence is good to have.  Now instead of "lock group
leader", I think that we had better use "parallel group leader" as in
other parts of the docs (see wait events for example).  Then we just
need to say that if leader_pid has the same value as
pg_stat_activity.pid, then we have a group leader.  If not, then it is
a parallel worker process initially spawned by the leader whose PID is
leader_pid (when executing Gather, Gather Merge, soon-to-be parallel
vacuum or for a parallel btree build, but this does not need a mention
in the docs).  There could be an argument as well to have leader_pid
set to NULL for a leader, but that would be inconsistent with what
the PGPROC entry reports for the backend.
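
For instance, entries could then be classified with something as
simple as this (an untested sketch):

SELECT pid,
       CASE WHEN leader_pid IS NULL THEN 'not in a parallel query'
            WHEN leader_pid = pid  THEN 'parallel group leader'
            ELSE 'parallel worker of ' || leader_pid
       END AS parallel_role
FROM pg_stat_activity;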

While looking at the code, I think that we could refactor things a bit
for raw_wait_event, wait_event_type and wait_event which has some
duplicated code for backend and auxiliary processes.  What about
filling in the wait event data after fetching the PGPROC entry, and
also fill in leader_pid for auxiliary processes.  This does not matter
now, perhaps it will never matter (or not), but that would make the
code much more consistent.
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Thu, Jan 16, 2020 at 04:27:27PM +0900, Michael Paquier wrote:
> While looking at the code, I think that we could refactor things a bit
> for raw_wait_event, wait_event_type and wait_event which has some
> duplicated code for backend and auxiliary processes.  What about
> filling in the wait event data after fetching the PGPROC entry, and
> also fill in leader_pid for auxiliary processes.  This does not matter
> now, perhaps it will never matter (or not), but that would make the
> code much more consistent.

And actually, the way you are looking at the leader's PID is visibly
incorrect and inconsistent because the patch takes no shared LWLock on
the leader using LockHashPartitionLockByProc() followed by
LWLockAcquire(), no?  That's incorrect because it could be perfectly
possible to crash with this code between the moment you check if
lockGroupLeader is NULL and the moment you look at
lockGroupLeader->pid if a process is being stopped in-between and
removes itself from a lock group in ProcKill().  That's also
inconsistent because it could be perfectly possible to finish with an
incorrect view of the data while scanning for all the backend entries,
like a leader set to NULL with workers pointing to the leader for
example, or even workers marked incorrectly as NULL.  The second one
may not be a problem, but the first one could be confusing.
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Thu, Jan 16, 2020 at 8:28 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Dec 27, 2019 at 10:15:33AM +0100, Julien Rouhaud wrote:
> > I think that not using "parallel" to name this field will help to
> > avoid confusion if the lock group infrastructure is eventually used
> > for something else, but that's only true if indeed we explain what a
> > lock group is.
>
> As you already pointed out, src/backend/storage/lmgr/README includes a
> full description of this stuff under the section "Group Locking".  So
> I agree that the patch ought to document your new field in a much
> better way; not mentioning the term "group locking" at all is even
> better, so as not to confuse the reader, since this term is not
> present in the docs.
>
> > The leader_pid is NULL for processes not involved in parallel query.
> > When a process wants to cooperate with parallel workers, it becomes a
> > lock group leader, which means that this field will be set to its
> > own pid. When a parallel worker starts up, this field will be set
> > to the leader's pid.
>
> The first sentence is good to have.  Now instead of "lock group
> leader", I think that we had better use "parallel group leader" as in
> other parts of the docs (see wait events for example).

Ok, I'll change it this way.

>  Then we just
> need to say that if leader_pid has the same value as
> pg_stat_activity.pid, then we have a group leader.  If not, then it is
> a parallel worker process initially spawned by the leader whose PID is
> leader_pid (when executing Gather, Gather Merge, soon-to-be parallel
> vacuum or for a parallel btree build, but this does not need a mention
> in the docs).  There could be an argument as well to have leader_pid
> set to NULL for a leader, but that would be inconsistent with what
> the PGPROC entry reports for the backend.

It would also slightly complicate things to get the full set of
backends involved in a parallel query, while excluding the leader is
entirely trivial.

> While looking at the code, I think that we could refactor things a bit
> for raw_wait_event, wait_event_type and wait_event which has some
> duplicated code for backend and auxiliary processes.  What about
> filling in the wait event data after fetching the PGPROC entry, and
> also fill in leader_pid for auxiliary processes.  This does not matter
> now, perhaps it will never matter (or not), but that would make the
> code much more consistent.

Yeah, I didn't think that auxiliary processes would be involved any
time soon, but I can include this refactoring.



Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Thu, Jan 16, 2020 at 8:49 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jan 16, 2020 at 04:27:27PM +0900, Michael Paquier wrote:
> > While looking at the code, I think that we could refactor things a bit
> > for raw_wait_event, wait_event_type and wait_event which has some
> > duplicated code for backend and auxiliary processes.  What about
> > filling in the wait event data after fetching the PGPROC entry, and
> > also fill in leader_pid for auxiliary processes.  This does not matter
> > now, perhaps it will never matter (or not), but that would make the
> > code much more consistent.
>
> And actually, the way you are looking at the leader's PID is visibly
> incorrect and inconsistent because the patch takes no shared LWLock on
> the leader using LockHashPartitionLockByProc() followed by
> LWLockAcquire(), no?  That's incorrect because it could be perfectly
> possible to crash with this code between the moment you check if
> lockGroupLeader is NULL and the moment you look at
> lockGroupLeader->pid if a process is being stopped in-between and
> removes itself from a lock group in ProcKill().  That's also
> inconsistent because it could be perfectly possible to finish with an
> incorrect view of the data while scanning for all the backend entries,
> like a leader set to NULL with workers pointing to the leader for
> example, or even workers marked incorrectly as NULL.  The second one
> may not be a problem, but the first one could be confusing.

Oh indeed.  But unless we hold some LWLock during the whole function
execution, we cannot guarantee a consistent view, right?  And isn't it
already possible to e.g. see a parallel worker in pg_stat_activity
while all other queries are shown as idle, if you're unlucky enough?

Also, LockHashPartitionLockByProc requires the leader PGPROC, and
there's no guarantee that we'll see the leader before any of the
workers, so I'm unsure how to implement what you said.  Wouldn't it be
better to simply fetch the leader PGPROC after acquiring a shared
ProcArrayLock, and use that copy to display the pid, after checking
that we retrieved a non-null PGPROC?



Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Fri, Jan 17, 2020 at 05:07:55PM +0100, Julien Rouhaud wrote:
> Oh indeed.  But unless we hold some LWLock during the whole function
> execution, we cannot guarantee a consistent view, right?

Yep.  That's possible.

> And isn't it already possible to e.g. see a parallel worker in
> pg_stat_activity while all other queries are shown as idle, if
> you're unlucky enough?

Yep.  That's possible.

> Also, LockHashPartitionLockByProc requires the leader PGPROC, and
> there's no guarantee that we'll see the leader before any of the
> workers, so I'm unsure how to implement what you said.  Wouldn't it be
> better to simply fetch the leader PGPROC after acquiring a shared
> ProcArrayLock, and use that copy to display the pid, after checking
> that we retrieved a non-null PGPROC?

Another idea would be to check if the current PGPROC entry is a leader
and print in an int[] the list of PIDs of all the workers while
holding a shared LWLock to avoid anything being unregistered.  Less
handy, but a bit more consistent.  One problem with doing that is
that you may have in your list of PIDs some worker processes that are
already gone when going through all the other backend entries.  An
advantage is that an empty array could mean "I am the leader, but
nothing has been registered yet to my group lock." (note that the
leader adds itself to lockGroupMembers).
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Sat, Jan 18, 2020 at 3:51 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 17, 2020 at 05:07:55PM +0100, Julien Rouhaud wrote:
> > Oh indeed.  But unless we hold some LWLock during the whole function
> > execution, we cannot guarantee a consistent view, right?
>
> Yep.  That's possible.
>
> > And isn't it already possible to e.g. see a parallel worker in
> > pg_stat_activity while all other queries are shown as idle, if
> > you're unlucky enough?
>
> Yep.  That's possible.
>
> > Also, LockHashPartitionLockByProc requires the leader PGPROC, and
> > there's no guarantee that we'll see the leader before any of the
> > workers, so I'm unsure how to implement what you said.  Wouldn't it be
> > better to simply fetch the leader PGPROC after acquiring a shared
> > ProcArrayLock, and use that copy to display the pid, after checking
> > that we retrieved a non-null PGPROC?
>
> Another idea would be to check if the current PGPROC entry is a leader
> and print in an int[] the list of PIDs of all the workers while
> holding a shared LWLock to avoid anything being unregistered.  Less
> handy, but a bit more consistent.  One problem with doing that is
> that you may have in your list of PIDs some worker processes that are
> already gone when going through all the other backend entries.  An
> advantage is that an empty array could mean "I am the leader, but
> nothing has been registered yet to my group lock." (note that the
> leader adds itself to lockGroupMembers).

So, AFAICT the LockHashPartitionLockByProc is required when
iterating/modifying lockGroupMembers or lockGroupLink, but just
getting the leader pid should be safe.  Since we'll never be able to
get a totally consistent view of the data here, I'm in favor of
avoiding taking extra locks.  I agree that outputting an array of the
pids would be more consistent for the leader, but that will have its
own set of corner cases.  It seems to me that a new leader_pid column
is easier to handle at the SQL level, so I kept that approach in the
attached v4.  If you have strong objections to it, I can still change
it.
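
For instance, counting the workers of each parallel query stays a
plain aggregate, with no unnesting needed (untested):

SELECT leader_pid, count(*) - 1 AS workers  -- the leader also has leader_pid = pid
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL
GROUP BY leader_pid;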

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Tomas Vondra
Date:
On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
>On Sat, Jan 18, 2020 at 3:51 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Fri, Jan 17, 2020 at 05:07:55PM +0100, Julien Rouhaud wrote:
>> > Oh indeed.  But unless we hold some LWLock during the whole function
>> > execution, we cannot guarantee a consistent view, right?
>>
>> Yep.  That's possible.
>>
>> > And isn't it already possible to e.g. see a parallel worker in
>> > pg_stat_activity while all other queries are shown as idle, if
>> > you're unlucky enough?
>>
>> Yep.  That's possible.
>>
>> > Also, LockHashPartitionLockByProc requires the leader PGPROC, and
>> > there's no guarantee that we'll see the leader before any of the
>> > workers, so I'm unsure how to implement what you said.  Wouldn't it be
>> > better to simply fetch the leader PGPROC after acquiring a shared
>> > ProcArrayLock, and use that copy to display the pid, after checking
>> > that we retrieved a non-null PGPROC?
>>
>> Another idea would be to check if the current PGPROC entry is a leader
>> and print in an int[] the list of PIDs of all the workers while
>> holding a shared LWLock to avoid anything being unregistered.  Less
>> handy, but a bit more consistent.  One problem with doing that is
>> that you may have in your list of PIDs some worker processes that are
>> already gone when going through all the other backend entries.  An
>> advantage is that an empty array could mean "I am the leader, but
>> nothing has been registered yet to my group lock." (note that the
>> leader adds itself to lockGroupMembers).
>
>So, AFAICT the LockHashPartitionLockByProc is required when
>iterating/modifying lockGroupMembers or lockGroupLink, but just
>getting the leader pid should be safe.  Since we'll never be able to
>get a totally consistent view of the data here, I'm in favor of
>avoiding taking extra locks.  I agree that outputting an array of the
>pids would be more consistent for the leader, but that will have its
>own set of corner cases.  It seems to me that a new leader_pid column
>is easier to handle at the SQL level, so I kept that approach in the
>attached v4.  If you have strong objections to it, I can still change
>it.

I agree a separate "leader_pid" column is easier to work with, as it does
not require unnesting and so on.

As for the consistency, I agree we probably can't make this perfect, as
we're fetching and processing the PGPROC records one by one. Fixing that
would require acquiring a much stronger lock on PGPROC, and perhaps some
other locks. That's pre-existing behavior, of course; it's just not very
obvious as we don't have any dependencies between the rows, I think.
Adding the leader_pid will change that, of course. But I think it's
still mostly OK, even with the possible inconsistency.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Tue, Jan 28, 2020 at 2:09 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> I agree a separate "leader_pid" column is easier to work with, as it does
> not require unnesting and so on.
>
> As for the consistency, I agree we probably can't make this perfect, as
> we're fetching and processing the PGPROC records one by one. Fixing that
> would require acquiring a much stronger lock on PGPROC, and perhaps some
> other locks. That's pre-existing behavior, of course; it's just not very
> obvious as we don't have any dependencies between the rows, I think.
> Adding the leader_pid will change that, of course. But I think it's
> still mostly OK, even with the possible inconsistency.

There were already some dependencies between the rows since parallel
queries were added, as you could see e.g. a parallel worker while no
query is currently active.  This patch will make those corner cases
more obvious.  Should I document the possible inconsistencies?



Re: Expose lock group leader pid in pg_stat_activity

From
Tomas Vondra
Date:
On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
>On Tue, Jan 28, 2020 at 2:09 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> I agree a separate "leader_pid" column is easier to work with, as it does
>> not require unnesting and so on.
>>
>> As for the consistency, I agree we probably can't make this perfect, as
>> we're fetching and processing the PGPROC records one by one. Fixing that
>> would require acquiring a much stronger lock on PGPROC, and perhaps some
>> other locks. That's pre-existing behavior, of course; it's just not very
>> obvious as we don't have any dependencies between the rows, I think.
>> Adding the leader_pid will change that, of course. But I think it's
>> still mostly OK, even with the possible inconsistency.
>
>There were already some dependencies between the rows since parallel
>queries were added, as you could see e.g. a parallel worker while no
>query is currently active.  This patch will make those corner cases
>more obvious.

Yeah, sure. I mean explicit dependencies, e.g. a column referencing
values from another row, like leader_pid does.

>Should I document the possible inconsistencies?

I think it's worth mentioning that as a comment in the code, say before
the pg_stat_get_activity function. IMO we don't need to document all
possible inconsistencies; a generic explanation is enough.

Not sure about the user docs. Does it currently say anything about this
topic - consistency with stat catalogs?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Tue, Jan 28, 2020 at 02:52:08PM +0100, Tomas Vondra wrote:
> On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
>> There were already some dependencies between the rows since parallel
>> queries were added, as you could see eg. a parallel worker while no
>> query is currently active.  This patch will make those corner cases
>> more obvious.

I was reviewing the code and one thing that I was wondering is if it
would be better to make the code more defensive and return NULL when
the PID of the referenced leader is 0 or InvalidPid.  However that
would mean that we have a dummy 2PC entry from PGPROC or something not
yet started, which makes no sense.  So your simpler version is
actually fine.  What you have here is that in the worst case you could
finish with an incorrect reference to shared memory if a PGPROC is
recycled between the moment you look for the leader field and the
moment the PID value is fetched.  That's very unlikely to happen, and
I agree that it does not really justify the cost of taking extra locks
every time we scan pg_stat_activity.

> Yeah, sure. I mean explicit dependencies, e.g. a column referencing
> values from another row, like leader_pid does.

+      The leader_pid is NULL for processes not involved in parallel query.
This is missing two markups, one for "NULL" and a second for
"leader_pid".  The documentation does not match the surroundings
either, so I would suggest a reformulation for the beginning:
PID of the leader process if this process is involved in parallel query.

And actually this paragraph is not completely true, because leader_pid
remains set (with leader_pid = pid) even after a session's parallel
query run has finished, as lockGroupLeader is set to NULL only once
the process is stopped in ProcKill().
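
This is easy to see with two sessions (an untested sketch, reusing the
t1 table from the first message's example and assuming it is big
enough to get a parallel plan):

-- session 1: run a parallel query, then go idle
SELECT count(*) FROM t1;

-- session 2: session 1 is now idle, yet its row still shows
-- leader_pid = pid, since lockGroupLeader is only reset in ProcKill()
SELECT pid, leader_pid, state
FROM pg_stat_activity WHERE backend_type = 'client backend';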

>> Should I document the possible inconsistencies?
>
> I think it's worth mentioning that as a comment in the code, say before
> the pg_stat_get_activity function. IMO we don't need to document all
> possible inconsistencies, a generic explanation is enough.

Agreed that adding some information in the area when we look at the
PGPROC entries for wait events and such would be nice.

> Not sure about the user docs. Does it currently say anything about this
> topic - consistency with stat catalogs?

Not sure that it is the job of this patch to do that.  Do you have
something specific in mind?
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Thu, Jan 30, 2020 at 10:03:01PM +0900, Michael Paquier wrote:
> On Tue, Jan 28, 2020 at 02:52:08PM +0100, Tomas Vondra wrote:
> > On Tue, Jan 28, 2020 at 02:26:34PM +0100, Julien Rouhaud wrote:
> >> There were already some dependencies between the rows since parallel
> >> queries were added, as you could see eg. a parallel worker while no
> >> query is currently active.  This patch will make those corner cases
> >> more obvious.
>
> I was reviewing the code and one thing that I was wondering is if it
> would be better to make the code more defensive and return NULL when
> the PID of the referenced leader is 0 or InvalidPid.  However that
> would mean that we have a dummy 2PC entry from PGPROC or something not
> yet started, which makes no sense.  So your simpler version is
> actually fine.  What you have here is that in the worst case you could
> finish with an incorrect reference to shared memory if a PGPROC is
> recycled between the moment you look for the leader field and the
> moment the PID value is fetched.  That's very unlikely to happen, and
> I agree that it does not really justify the cost of taking extra locks
> every time we scan pg_stat_activity.

Ok.

>
> > Yeah, sure. I mean explicit dependencies, e.g. a column referencing
> > values from another row, like leader_pid does.
>
> +      The leader_pid is NULL for processes not involved in parallel query.
> This is missing two markups, one for "NULL" and a second for
> "leader_pid".

The extra "leader_pid" disappeared when I reworked the doc.  I'm not sure what
you meant here for NULL, as I don't see any extra markup used in nearby
documentation, so I hope this version is ok.

> The documentation does not match the surroundings
> either, so I would suggest a reformulation for the beginning:
> PID of the leader process if this process is involved in parallel query.

> And actually this paragraph is not completely true, because leader_pid
> remains set (with leader_pid = pid) even after a session's parallel
> query run has finished, as lockGroupLeader is set to NULL only once
> the process is stopped in ProcKill().

Oh good point, that's unfortunately not a super friendly behavior.  I tried to
adapt the documentation to address all of that.  It's maybe slightly too
verbose, but I guess that extra clarity is welcome here.

> >> Should I document the possible inconsistencies?
> >
> > I think it's worth mentioning that as a comment in the code, say before
> > the pg_stat_get_activity function. IMO we don't need to document all
> > possible inconsistencies, a generic explanation is enough.
>
> Agreed that adding some information in the area when we look at the
> PGPROC entries for wait events and such would be nice.

I added some code comments as a reminder that we don't guarantee any consistency
here.

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Tue, Feb 04, 2020 at 03:27:25PM +0100, Julien Rouhaud wrote:
> I added some code comments as a reminder that we don't guarantee any consistency
> here.

That's mostly fine.  I have moved the comment related to
AuxiliaryPidGetProc() within the inner part of its "if" (or the
comment should be changed to be conditional).  An extra thing is that
nulls[29] was not set to true for a user without the proper permission
rights.

Does that look fine to you?
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Wed, Feb 05, 2020 at 10:48:31AM +0900, Michael Paquier wrote:
> On Tue, Feb 04, 2020 at 03:27:25PM +0100, Julien Rouhaud wrote:
> > I added some code comments to remind that we don't guarantee any consistency
> > here.
> 
> That's mostly fine.  I have moved the comment related to
> AuxiliaryPidGetProc() within the inner part of its "if" (or the
> comment should be changed to be conditional).  An extra thing is that
> nulls[29] was not set to true for a user without the proper permission
> rights.

Oh, oops indeed.

> Does that look fine to you?

This looks good, thanks a lot!



Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Wed, Feb 05, 2020 at 07:57:20AM +0100, Julien Rouhaud wrote:
> This looks good, thanks a lot!

Thanks for double-checking.  And done.
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Julien Rouhaud
Date:
On Thu, Feb 06, 2020 at 09:24:16AM +0900, Michael Paquier wrote:
> On Wed, Feb 05, 2020 at 07:57:20AM +0100, Julien Rouhaud wrote:
> > This looks good, thanks a lot!
> 
> Thanks for double-checking.  And done.

Thanks!

While on the topic, is there any reason why the backend stays a group leader
for the rest of its lifetime, and should we change that?

Also, while reading ProcKill, I noticed a typo in a comment:

    /*
     * Detach from any lock group of which we are a member.  If the leader
-    * exist before all other group members, it's PGPROC will remain allocated
+    * exist before all other group members, its PGPROC will remain allocated
     * until the last group process exits; that process must return the
     * leader's PGPROC to the appropriate list.
     */

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Thu, Feb 06, 2020 at 09:23:33AM +0100, Julien Rouhaud wrote:
> While on the topic, is there any reason why the backend stays a group leader
> for the rest of its lifetime, and should we change that?

Nothing happens without a reason.  a1c1af2 is the original commit, and
the thread is here:
https://www.postgresql.org/message-id/CA+TgmoapgKdy_Z0W9mHqZcGSo2t_t-4_V36DXaKim+X_fYp0oQ@mail.gmail.com

By looking at the surroundings, there are a couple of assumptions
behind the timing of the shutdown for the workers and the leader.
I have not studied the details much, but my guess is that it
makes the handling of the leader shutting down before its workers
easier.  Robert or Amit likely know all the details here.

> Also, while reading ProcKill, I noticed a typo in a comment:
>
>     /*
>      * Detach from any lock group of which we are a member.  If the leader
> -    * exist before all other group members, it's PGPROC will remain allocated
> +    * exist before all other group members, its PGPROC will remain allocated
>      * until the last group process exits; that process must return the
>      * leader's PGPROC to the appropriate list.
>      */

Thanks, fixed.
--
Michael

Attachment

Re: Expose lock group leader pid in pg_stat_activity

From
Justin Pryzby
Date:
On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> So, AFAICT the LockHashPartitionLockByProc is required when
> iterating/modifying lockGroupMembers or lockGroupLink, but just
> getting the leader pid should be safe.

This still seems unsafe:

git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c
+                       /*
+                        * If a PGPROC entry was retrieved, display wait events and lock
+                        * group leader information if any.  To avoid extra overhead, no
+                        * extra lock is being held, so there is no guarantee of
+                        * consistency across multiple rows.
+                        */
...
+                               PGPROC     *leader;
...
+                               leader = proc->lockGroupLeader;
+                               if (leader)
+                               {
# does something guarantee that leader doesn't change ?
+                                       values[29] = Int32GetDatum(leader->pid);
+                                       nulls[29] = false;
                                }

Michael seems to have raised the issue:

On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> And actually, the way you are looking at the leader's PID is visibly
> incorrect and inconsistent because the patch takes no shared LWLock on
> the leader using LockHashPartitionLockByProc() followed by
> LWLockAcquire(), no?  That's incorrect because it could be perfectly
> possible to crash with this code between the moment you check if 
> lockGroupLeader is NULL and the moment you look at
> lockGroupLeader->pid if a process is being stopped in-between and
> removes itself from a lock group in ProcKill().  

But I don't see how it was addressed ?

I read this:

src/backend/storage/lmgr/lock.c:         * completely valid.  We cannot safely dereference another backend's
src/backend/storage/lmgr/lock.c-         * lockGroupLeader field without holding all lock partition locks, and
src/backend/storage/lmgr/lock.c-         * it's not worth that.)

I think if you do:
|LWLockAcquire(&proc->backendLock, LW_SHARED);
..then it's at least *safe* to access leader->pid, but it may be inconsistent
unless you also call LockHashPartitionLockByProc.

I wasn't able to produce a crash, so maybe I missed something.

-- 
Justin



Re: Expose lock group leader pid in pg_stat_activity

From
Justin Pryzby
Date:
On Sun, Mar 15, 2020 at 11:27:52PM -0500, Justin Pryzby wrote:
> On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> > So, AFAICT the LockHashPartitionLockByProc is required when
> > iterating/modifying lockGroupMembers or lockGroupLink, but just
> > getting the leader pid should be safe.
> 
> This still seems unsafe:
> 
> git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c
> +                       /*
> +                        * If a PGPROC entry was retrieved, display wait events and lock
> +                        * group leader information if any.  To avoid extra overhead, no
> +                        * extra lock is being held, so there is no guarantee of
> +                        * consistency across multiple rows.
> +                        */
> ...
> +                               PGPROC     *leader;
> ...
> +                               leader = proc->lockGroupLeader;
> +                               if (leader)
> +                               {
> # does something guarantee that leader doesn't change ?
> +                                       values[29] = Int32GetDatum(leader->pid);
> +                                       nulls[29] = false;
>                                 }
> 
> Michael seems to have raised the issue:
> 
> On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> > And actually, the way you are looking at the leader's PID is visibly
> > incorrect and inconsistent because the patch takes no shared LWLock on
> > the leader using LockHashPartitionLockByProc() followed by
> > LWLockAcquire(), no?  That's incorrect because it could be perfectly
> > possible to crash with this code between the moment you check if 
> > lockGroupLeader is NULL and the moment you look at
> > lockGroupLeader->pid if a process is being stopped in-between and
> > removes itself from a lock group in ProcKill().  
> 
> But I don't see how it was addressed ?
> 
> I read this:
> 
> src/backend/storage/lmgr/lock.c:         * completely valid.  We cannot safely dereference another backend's
> src/backend/storage/lmgr/lock.c-         * lockGroupLeader field without holding all lock partition locks, and
> src/backend/storage/lmgr/lock.c-         * it's not worth that.)
> 
> I think if you do:
> |LWLockAcquire(&proc->backendLock, LW_SHARED);
> ..then it's at least *safe* to access leader->pid, but it may be inconsistent
> unless you also call LockHashPartitionLockByProc.
> 
> I wasn't able to produce a crash, so maybe I missed something.

I think I see.  Julien's v3 patch did this:
https://www.postgresql.org/message-id/attachment/106429/pgsa_leader_pid-v3.diff
+                if (proc->lockGroupLeader)
+                    values[29] = Int32GetDatum(proc->lockGroupLeader->pid);

..which is racy because a proc with a leader might die and be replaced by
another proc without a leader between the NULL check and the dereference.

But the code since v4 does:

+                leader = proc->lockGroupLeader;
+                if (leader)
+                                       values[29] = Int32GetDatum(leader->pid);

..which is safe because PROCs are allocated in shared memory, so leader is for
sure a non-NULL pointer to a PROC.  leader->pid may be read inconsistently,
which is what the comment says: "no extra lock is being held".

-- 
Justin



Re: Expose lock group leader pid in pg_stat_activity

From
Michael Paquier
Date:
On Mon, Mar 16, 2020 at 12:43:41AM -0500, Justin Pryzby wrote:
> I think I see.  Julien's v3 patch did this:
> https://www.postgresql.org/message-id/attachment/106429/pgsa_leader_pid-v3.diff
> +                if (proc->lockGroupLeader)
> +                    values[29] = Int32GetDatum(proc->lockGroupLeader->pid);
>
> ..which is racy because a proc with a leader might die and be replaced by
> another proc without a leader between 1 and 2.
>
> But the code since v4 does:
>
> +                leader = proc->lockGroupLeader;
> +                if (leader)
> +                                       values[29] = Int32GetDatum(leader->pid);
>
> ..which is safe because PROCs are allocated in shared memory, so leader is for
> sure a non-NULL pointer to a PROC.  leader->pid may be read inconsistently,
> which is what the comment says: "no extra lock is being held".

Yes, you have the correct answer here.  As shaped, the code relies on
the state of a PGPROC entry in shared memory.
--
Michael

Attachment