Thread: RFC: Logging plan of the running query

RFC: Logging plan of the running query

From

torikoshia

Date:

12 May 2021, 11:24:04

Hi,

During the discussion about memory contexts dumping[1], there
was a comment that exposing not only memory contexts but also
query plans and untruncated query string would be useful.

I also feel that it would be nice when thinking about situations
such as troubleshooting a long-running query on production
environments where we cannot use debuggers.

At that point of the above comment, I was considering exposing
such information on the shared memory.
However, since memory contexts are now exposed on the log by
pg_log_backend_memory_contexts(PID), I'm thinking about
defining a function that logs the plan of a running query and
untruncated query string on the specified PID in the same way
as below.

   postgres=# SELECT * FROM pg_log_current_plan(2155192);
    pg_log_current_plan
   ---------------------
    t
   (1 row)

   $ tail -f data/log/postgresql-2021-05-12.log

   2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of 
running query on PID 2155192
           Query Text: SELECT a.filler FROM pgbench_accounts a JOIN 
pgbench_accounts b ON a.aid = b.aid;
           Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
             Merge Cond: (a.aid = b.aid)
             ->  Index Scan using pgbench_accounts_pkey on 
pgbench_accounts a  (cost=0.42..42377.43 rows=1000000 width=89)
             ->  Index Only Scan using pgbench_accounts_pkey on 
pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)


Attached a PoC patch.

Any thoughts?

[1] 
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v1-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Pavel Stehule

Date:

12 May 2021, 12:07:44

st 12. 5. 2021 v 13:24 odesílatel torikoshia <torikoshia@oss.nttdata.com> napsal:

Hi,

During the discussion about memory contexts dumping[1], there
was a comment that exposing not only memory contexts but also
query plans and untruncated query string would be useful.

I also feel that it would be nice when thinking about situations
such as troubleshooting a long-running query on production
environments where we cannot use debuggers.

At that point of the above comment, I was considering exposing
such information on the shared memory.
However, since memory contexts are now exposed on the log by
pg_log_backend_memory_contexts(PID), I'm thinking about
defining a function that logs the plan of a running query and
untruncated query string on the specified PID in the same way
as below.

postgres=# SELECT * FROM pg_log_current_plan(2155192);
pg_log_current_plan
---------------------
t
(1 row)

$ tail -f data/log/postgresql-2021-05-12.log

2021-05-12 17:37:19.481 JST [2155192] LOG: logging the plan of
running query on PID 2155192
Query Text: SELECT a.filler FROM pgbench_accounts a JOIN
pgbench_accounts b ON a.aid = b.aid;
Merge Join (cost=0.85..83357.85 rows=1000000 width=85)
Merge Cond: (a.aid = b.aid)
-> Index Scan using pgbench_accounts_pkey on
pgbench_accounts a (cost=0.42..42377.43 rows=1000000 width=89)
-> Index Only Scan using pgbench_accounts_pkey on
pgbench_accounts b (cost=0.42..25980.42 rows=1000000 width=4)

Attached a PoC patch.

Any thoughts?

Pavel

[1]
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

12 May 2021, 12:33:29

On Wed, May 12, 2021 at 4:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> During the discussion about memory contexts dumping[1], there
> was a comment that exposing not only memory contexts but also
> query plans and untruncated query string would be useful.
>
> I also feel that it would be nice when thinking about situations
> such as troubleshooting a long-running query on production
> environments where we cannot use debuggers.
>
> At that point of the above comment, I was considering exposing
> such information on the shared memory.
> However, since memory contexts are now exposed on the log by
> pg_log_backend_memory_contexts(PID), I'm thinking about
> defining a function that logs the plan of a running query and
> untruncated query string on the specified PID in the same way
> as below.
>
>    postgres=# SELECT * FROM pg_log_current_plan(2155192);
>     pg_log_current_plan
>    ---------------------
>     t
>    (1 row)
>
>    $ tail -f data/log/postgresql-2021-05-12.log
>
>    2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of
> running query on PID 2155192
>            Query Text: SELECT a.filler FROM pgbench_accounts a JOIN
> pgbench_accounts b ON a.aid = b.aid;
>            Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
>              Merge Cond: (a.aid = b.aid)
>              ->  Index Scan using pgbench_accounts_pkey on
> pgbench_accounts a  (cost=0.42..42377.43 rows=1000000 width=89)
>              ->  Index Only Scan using pgbench_accounts_pkey on
> pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)
>
>
> Attached a PoC patch.
>
> Any thoughts?
>
> [1]
> https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

+1 for the idea. It looks like pg_log_current_plan is allowed to run
by superusers. Since it also shows up the full query text and the plan
in the server log as plain text, there are chances that the sensitive
information might be logged into the server log which is a risky thing
from security standpoint. There's another thread (see [1] below) which
discusses this issue by having a separate role for all debugging
purposes. Note that final consensus is not reached yet. We may want to
use the same role for this patch as well.

[1] - https://www.postgresql.org/message-id/CA%2BTgmoZz%3DK1bQRp0Ug%3D6uMGFWg-6kaxdHe6VSWaxq0U-YkppYQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Matthias van de Meent

Date:

12 May 2021, 12:55:14

On Wed, 12 May 2021 at 13:24, torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> During the discussion about memory contexts dumping[1], there
> was a comment that exposing not only memory contexts but also
> query plans and untruncated query string would be useful.
>
> I also feel that it would be nice when thinking about situations
> such as troubleshooting a long-running query on production
> environments where we cannot use debuggers.
>
> At that point of the above comment, I was considering exposing
> such information on the shared memory.
> However, since memory contexts are now exposed on the log by
> pg_log_backend_memory_contexts(PID), I'm thinking about
> defining a function that logs the plan of a running query and
> untruncated query string on the specified PID in the same way
> as below.
>
>    postgres=# SELECT * FROM pg_log_current_plan(2155192);
>     pg_log_current_plan
>    ---------------------
>     t
>    (1 row)
>
>    $ tail -f data/log/postgresql-2021-05-12.log
>
>    2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of
> running query on PID 2155192
>            Query Text: SELECT a.filler FROM pgbench_accounts a JOIN
> pgbench_accounts b ON a.aid = b.aid;
>            Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
>              Merge Cond: (a.aid = b.aid)
>              ->  Index Scan using pgbench_accounts_pkey on
> pgbench_accounts a  (cost=0.42..42377.43 rows=1000000 width=89)
>              ->  Index Only Scan using pgbench_accounts_pkey on
> pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)
>
>
> Attached a PoC patch.
>
> Any thoughts?

Great idea. One feature I'd suggest would be adding a 'format' option
as well, if such feature would be feasable.


With regards,

Matthias van de Meent

Re: RFC: Logging plan of the running query

From

Julien Rouhaud

Date:

12 May 2021, 14:40:46

On Wed, May 12, 2021 at 08:24:04PM +0900, torikoshia wrote:
> Hi,
> 
> During the discussion about memory contexts dumping[1], there
> was a comment that exposing not only memory contexts but also
> query plans and untruncated query string would be useful.
> 
> I also feel that it would be nice when thinking about situations
> such as troubleshooting a long-running query on production
> environments where we cannot use debuggers.
> 
> At that point of the above comment, I was considering exposing
> such information on the shared memory.
> However, since memory contexts are now exposed on the log by
> pg_log_backend_memory_contexts(PID), I'm thinking about
> defining a function that logs the plan of a running query and
> untruncated query string on the specified PID in the same way
> as below.
> 
>   postgres=# SELECT * FROM pg_log_current_plan(2155192);
>    pg_log_current_plan
>   ---------------------
>    t
>   (1 row)
> 
>   $ tail -f data/log/postgresql-2021-05-12.log
> 
>   2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of running
> query on PID 2155192
>           Query Text: SELECT a.filler FROM pgbench_accounts a JOIN
> pgbench_accounts b ON a.aid = b.aid;
>           Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
>             Merge Cond: (a.aid = b.aid)
>             ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a
> (cost=0.42..42377.43 rows=1000000 width=89)
>             ->  Index Only Scan using pgbench_accounts_pkey on
> pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)

I didn't read the POC patch yet, but +1 for having that feature.

Re: RFC: Logging plan of the running query

From

Laurenz Albe

Date:

12 May 2021, 16:08:07

On Wed, 2021-05-12 at 18:03 +0530, Bharath Rupireddy wrote:
> On Wed, May 12, 2021 at 4:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > During the discussion about memory contexts dumping[1], there
> > was a comment that exposing not only memory contexts but also
> > query plans and untruncated query string would be useful.
> > 
> >    postgres=# SELECT * FROM pg_log_current_plan(2155192);
> >     pg_log_current_plan
> >    ---------------------
> >     t
> >    (1 row)
> > 
> >    $ tail -f data/log/postgresql-2021-05-12.log
> > 
> >    2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of
> > running query on PID 2155192
> >            Query Text: SELECT a.filler FROM pgbench_accounts a JOIN
> > pgbench_accounts b ON a.aid = b.aid;
> >            Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
> >              Merge Cond: (a.aid = b.aid)
> >              ->  Index Scan using pgbench_accounts_pkey on
> > pgbench_accounts a  (cost=0.42..42377.43 rows=1000000 width=89)
> >              ->  Index Only Scan using pgbench_accounts_pkey on
> > pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)

I love the idea, but I didn't look at the patch.

> Since it also shows up the full query text and the plan
> in the server log as plain text, there are chances that the sensitive
> information might be logged into the server log which is a risky thing
> from security standpoint.

I think that is irrelevant.

A superuser can already set "log_statement = 'all'" to get this.
There is no protection from superusers, and it is pointless to require that.

Yours,
Laurenz Albe

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 May 2021, 08:23:33

Thank you all for your positive comments.

On 2021-05-12 21:55, Matthias van de Meent wrote:

> Great idea. One feature I'd suggest would be adding a 'format' option
> as well, if such feature would be feasable.

Thanks for the comment!

During the development of pg_log_backend_memory_contexts(), I tried to
make the number of contexts to record configurable by making it GUC
variable or putting it on the shared memory, but the former seemed an
overkill and the latter introduced some ugly behaviors, so we decided
to make it a static number[1].
I think we face the same difficulty here.

Allowing to select the format would be better as auto_explain does by
auto_explain.log_format, but I'm a bit doubtful that it is worth the
costs.

[1] 

https://www.postgresql.org/message-id/flat/6738f309-a41b-cbe6-bb57-a1c58ce9f05a%40oss.nttdata.com#e687d583080c96563b2fa2e32dbf3fb6

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 May 2021, 08:26:20

On 2021-05-13 01:08, Laurenz Albe wrote:
> On Wed, 2021-05-12 at 18:03 +0530, Bharath Rupireddy wrote:
>> Since it also shows up the full query text and the plan
>> in the server log as plain text, there are chances that the sensitive
>> information might be logged into the server log which is a risky thing
>> from security standpoint.

Thanks for the notification!

> I think that is irrelevant.
> 
> A superuser can already set "log_statement = 'all'" to get this.
> There is no protection from superusers, and it is pointless to require 
> that.

AFAIU, since that discussion is whether or not users other than 
superusers
should be given the privilege to execute the backtrace printing 
function,
I think it might be applicable to pg_log_current_plan().

Since restricting privilege to superusers is stricter, I'm going to 
proceed
as it is for now, but depending on the above discussion, it may be 
better to
change it.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Dilip Kumar

Date:

13 May 2021, 09:13:56

On Wed, May 12, 2021 at 4:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> During the discussion about memory contexts dumping[1], there
> was a comment that exposing not only memory contexts but also
> query plans and untruncated query string would be useful.
>
> I also feel that it would be nice when thinking about situations
> such as troubleshooting a long-running query on production
> environments where we cannot use debuggers.
>
> At that point of the above comment, I was considering exposing
> such information on the shared memory.
> However, since memory contexts are now exposed on the log by
> pg_log_backend_memory_contexts(PID), I'm thinking about
> defining a function that logs the plan of a running query and
> untruncated query string on the specified PID in the same way
> as below.
>
>    postgres=# SELECT * FROM pg_log_current_plan(2155192);
>     pg_log_current_plan
>    ---------------------
>     t
>    (1 row)

+1 for the idea.  I did not read the complete patch but while reading
through the patch, I noticed that you using elevel as LOG for printing
the stack trace.  But I think the backend whose pid you have passed,
the connected client to that backend might not have superuser
privileges and if you use elevel LOG then that message will be sent to
that connected client as well and I don't think that is secure.  So
can we use LOG_SERVER_ONLY so that we can prevent
it from sending to the client.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

13 May 2021, 09:27:56

On Thu, May 13, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> +1 for the idea.  I did not read the complete patch but while reading
> through the patch, I noticed that you using elevel as LOG for printing
> the stack trace.  But I think the backend whose pid you have passed,
> the connected client to that backend might not have superuser
> privileges and if you use elevel LOG then that message will be sent to
> that connected client as well and I don't think that is secure.  So
> can we use LOG_SERVER_ONLY so that we can prevent
> it from sending to the client.

True, we should use LOG_SERVER_ONLY and not send any logs to the client.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

13 May 2021, 09:36:41

On Thu, May 13, 2021 at 2:57 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Thu, May 13, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > +1 for the idea.  I did not read the complete patch but while reading
> > through the patch, I noticed that you using elevel as LOG for printing
> > the stack trace.  But I think the backend whose pid you have passed,
> > the connected client to that backend might not have superuser
> > privileges and if you use elevel LOG then that message will be sent to
> > that connected client as well and I don't think that is secure.  So
> > can we use LOG_SERVER_ONLY so that we can prevent
> > it from sending to the client.
>
> True, we should use LOG_SERVER_ONLY and not send any logs to the client.

I further tend to think that, is it correct to log queries with LOG
level when log_statement GUC is set? Or should it also be
LOG_SERVER_ONLY?

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

13 May 2021, 09:38:50

On Thu, May 13, 2021 at 1:56 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2021-05-13 01:08, Laurenz Albe wrote:
> > On Wed, 2021-05-12 at 18:03 +0530, Bharath Rupireddy wrote:
> >> Since it also shows up the full query text and the plan
> >> in the server log as plain text, there are chances that the sensitive
> >> information might be logged into the server log which is a risky thing
> >> from security standpoint.
>
> Thanks for the notification!
>
> > I think that is irrelevant.
> >
> > A superuser can already set "log_statement = 'all'" to get this.
> > There is no protection from superusers, and it is pointless to require
> > that.
>
> AFAIU, since that discussion is whether or not users other than
> superusers
> should be given the privilege to execute the backtrace printing
> function,
> I think it might be applicable to pg_log_current_plan().
>
> Since restricting privilege to superusers is stricter, I'm going to
> proceed
> as it is for now, but depending on the above discussion, it may be
> better to
> change it.

Yeah, we can keep it as superuser-only for now.

Might be unrelated, but just for info - there's another thread
"Granting control of SUSET gucs to non-superusers" at [1] discussing
the new roles.

[1] - https://www.postgresql.org/message-id/F9408A5A-B20B-42D2-9E7F-49CD3D1547BC%40enterprisedb.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Dilip Kumar

Date:

13 May 2021, 09:49:48

On Thu, May 13, 2021 at 3:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 2:57 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Thu, May 13, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > +1 for the idea.  I did not read the complete patch but while reading
> > > through the patch, I noticed that you using elevel as LOG for printing
> > > the stack trace.  But I think the backend whose pid you have passed,
> > > the connected client to that backend might not have superuser
> > > privileges and if you use elevel LOG then that message will be sent to
> > > that connected client as well and I don't think that is secure.  So
> > > can we use LOG_SERVER_ONLY so that we can prevent
> > > it from sending to the client.
> >
> > True, we should use LOG_SERVER_ONLY and not send any logs to the client.
>
> I further tend to think that, is it correct to log queries with LOG
> level when log_statement GUC is set? Or should it also be
> LOG_SERVER_ONLY?
>
>     /* Log immediately if dictated by log_statement */
>     if (check_log_statement(parsetree_list))
>     {
>         ereport(LOG,
>                 (errmsg("statement: %s", query_string),
>                  errhidestmt(true),
>                  errdetail_execute(parsetree_list)));
>

What is your argument behind logging it with LOG? I mean we are
sending the signal to all the backend and some backend might have the
client who is not connected as a superuser so sending the plan to
those clients is not a good idea from a security perspective.
Anyways, LOG_SERVER_ONLY is not an exposed logging level it is used
for an internal purpose.  So IMHO it should be logged with
LOG_SERVER_ONLY level.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 May 2021, 10:12:39

On 2021-05-13 18:36, Bharath Rupireddy wrote:
> On Thu, May 13, 2021 at 2:57 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> On Thu, May 13, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> 
>> wrote:
>> > +1 for the idea.  I did not read the complete patch but while reading
>> > through the patch, I noticed that you using elevel as LOG for printing
>> > the stack trace.  But I think the backend whose pid you have passed,
>> > the connected client to that backend might not have superuser
>> > privileges and if you use elevel LOG then that message will be sent to
>> > that connected client as well and I don't think that is secure.  So
>> > can we use LOG_SERVER_ONLY so that we can prevent
>> > it from sending to the client.
>> 
>> True, we should use LOG_SERVER_ONLY and not send any logs to the 
>> client.

Thanks, agree with changing it to LOG_SERVER_ONLY.

> I further tend to think that, is it correct to log queries with LOG
> level when log_statement GUC is set? Or should it also be
> LOG_SERVER_ONLY?

I feel it's OK to log with LOG_SERVER_ONLY since the log from
log_statement GUC would be printed already and independently.
ISTM people don't expect to log_statement GUC works even on
pg_log_current_plan(), do they?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

13 May 2021, 10:46:39

On Thu, May 13, 2021 at 3:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 3:06 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, May 13, 2021 at 2:57 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > On Thu, May 13, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > +1 for the idea.  I did not read the complete patch but while reading
> > > > through the patch, I noticed that you using elevel as LOG for printing
> > > > the stack trace.  But I think the backend whose pid you have passed,
> > > > the connected client to that backend might not have superuser
> > > > privileges and if you use elevel LOG then that message will be sent to
> > > > that connected client as well and I don't think that is secure.  So
> > > > can we use LOG_SERVER_ONLY so that we can prevent
> > > > it from sending to the client.
> > >
> > > True, we should use LOG_SERVER_ONLY and not send any logs to the client.
> >
> > I further tend to think that, is it correct to log queries with LOG
> > level when log_statement GUC is set? Or should it also be
> > LOG_SERVER_ONLY?
> >
> >     /* Log immediately if dictated by log_statement */
> >     if (check_log_statement(parsetree_list))
> >     {
> >         ereport(LOG,
> >                 (errmsg("statement: %s", query_string),
> >                  errhidestmt(true),
> >                  errdetail_execute(parsetree_list)));
>
> What is your argument behind logging it with LOG? I mean we are
> sending the signal to all the backend and some backend might have the
> client who is not connected as a superuser so sending the plan to
> those clients is not a good idea from a security perspective.
> Anyways, LOG_SERVER_ONLY is not an exposed logging level it is used
> for an internal purpose.  So IMHO it should be logged with
> LOG_SERVER_ONLY level.

I'm saying that -  currently, queries are logged with LOG level when
the log_statement GUC is set. The queries might be sent to the
non-superuser clients. So, your point of "sending the plan to those
clients is not a good idea from a security perspective" gets violated
right? Should the log level be changed(in the below code) from "LOG"
to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
to sidetrack the main feature.

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Dilip Kumar

Date:

13 May 2021, 11:43:47

On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I'm saying that -  currently, queries are logged with LOG level when
> the log_statement GUC is set. The queries might be sent to the
> non-superuser clients. So, your point of "sending the plan to those
> clients is not a good idea from a security perspective" gets violated
> right? Should the log level be changed(in the below code) from "LOG"
> to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
> to sidetrack the main feature.
>
>     /* Log immediately if dictated by log_statement */
>     if (check_log_statement(parsetree_list))
>     {
>         ereport(LOG,
>                 (errmsg("statement: %s", query_string),
>                  errhidestmt(true),
>                  errdetail_execute(parsetree_list)));
>

Yes, that was my exact point, that in this particular code log with
LOG_SERVER_ONLY.

Like this.
     /* Log immediately if dictated by log_statement */
     if (check_log_statement(parsetree_list))
     {
         ereport(LOG_SERVER_ONLY,
.....


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

13 May 2021, 11:45:12

On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > I'm saying that -  currently, queries are logged with LOG level when
> > the log_statement GUC is set. The queries might be sent to the
> > non-superuser clients. So, your point of "sending the plan to those
> > clients is not a good idea from a security perspective" gets violated
> > right? Should the log level be changed(in the below code) from "LOG"
> > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
> > to sidetrack the main feature.
> >
> >     /* Log immediately if dictated by log_statement */
> >     if (check_log_statement(parsetree_list))
> >     {
> >         ereport(LOG,
> >                 (errmsg("statement: %s", query_string),
> >                  errhidestmt(true),
> >                  errdetail_execute(parsetree_list)));
> >
>
> Yes, that was my exact point, that in this particular code log with
> LOG_SERVER_ONLY.
>
> Like this.
>      /* Log immediately if dictated by log_statement */
>      if (check_log_statement(parsetree_list))
>      {
>          ereport(LOG_SERVER_ONLY,

Agree, but let's discuss that in a separate thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Dilip Kumar

Date:

13 May 2021, 11:48:52

On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > I'm saying that -  currently, queries are logged with LOG level when
> > > the log_statement GUC is set. The queries might be sent to the
> > > non-superuser clients. So, your point of "sending the plan to those
> > > clients is not a good idea from a security perspective" gets violated
> > > right? Should the log level be changed(in the below code) from "LOG"
> > > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
> > > to sidetrack the main feature.
> > >
> > >     /* Log immediately if dictated by log_statement */
> > >     if (check_log_statement(parsetree_list))
> > >     {
> > >         ereport(LOG,
> > >                 (errmsg("statement: %s", query_string),
> > >                  errhidestmt(true),
> > >                  errdetail_execute(parsetree_list)));
> > >
> >
> > Yes, that was my exact point, that in this particular code log with
> > LOG_SERVER_ONLY.
> >
> > Like this.
> >      /* Log immediately if dictated by log_statement */
> >      if (check_log_statement(parsetree_list))
> >      {
> >          ereport(LOG_SERVER_ONLY,
>
> Agree, but let's discuss that in a separate thread.

Did not understand why separate thread? this is part of this thread
no? but anyways now everyone agreed that we will log with
LOG_SERVER_ONLY.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Dilip Kumar

Date:

13 May 2021, 12:57:32

On Thu, May 13, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> > > > I'm saying that -  currently, queries are logged with LOG level when
> > > > the log_statement GUC is set. The queries might be sent to the
> > > > non-superuser clients. So, your point of "sending the plan to those
> > > > clients is not a good idea from a security perspective" gets violated
> > > > right? Should the log level be changed(in the below code) from "LOG"
> > > > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
> > > > to sidetrack the main feature.
> > > >
> > > >     /* Log immediately if dictated by log_statement */
> > > >     if (check_log_statement(parsetree_list))
> > > >     {
> > > >         ereport(LOG,
> > > >                 (errmsg("statement: %s", query_string),
> > > >                  errhidestmt(true),
> > > >                  errdetail_execute(parsetree_list)));
> > > >
> > >
> > > Yes, that was my exact point, that in this particular code log with
> > > LOG_SERVER_ONLY.
> > >
> > > Like this.
> > >      /* Log immediately if dictated by log_statement */
> > >      if (check_log_statement(parsetree_list))
> > >      {
> > >          ereport(LOG_SERVER_ONLY,
> >
> > Agree, but let's discuss that in a separate thread.
>
> Did not understand why separate thread? this is part of this thread
> no? but anyways now everyone agreed that we will log with
> LOG_SERVER_ONLY.

Bharat offlist pointed to me that here he was talking about another
log that is logging the query and not specific to this patch, so let's
not discuss this point here.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

28 May 2021, 06:51:37

On 2021-05-13 21:57, Dilip Kumar wrote:
> On Thu, May 13, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> 
> wrote:
>> 
>> On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> >
>> > On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> > >
>> > > On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
>> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
>> > > >
>> > > > I'm saying that -  currently, queries are logged with LOG level when
>> > > > the log_statement GUC is set. The queries might be sent to the
>> > > > non-superuser clients. So, your point of "sending the plan to those
>> > > > clients is not a good idea from a security perspective" gets violated
>> > > > right? Should the log level be changed(in the below code) from "LOG"
>> > > > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
>> > > > to sidetrack the main feature.
>> > > >
>> > > >     /* Log immediately if dictated by log_statement */
>> > > >     if (check_log_statement(parsetree_list))
>> > > >     {
>> > > >         ereport(LOG,
>> > > >                 (errmsg("statement: %s", query_string),
>> > > >                  errhidestmt(true),
>> > > >                  errdetail_execute(parsetree_list)));
>> > > >
>> > >
>> > > Yes, that was my exact point, that in this particular code log with
>> > > LOG_SERVER_ONLY.
>> > >
>> > > Like this.
>> > >      /* Log immediately if dictated by log_statement */
>> > >      if (check_log_statement(parsetree_list))
>> > >      {
>> > >          ereport(LOG_SERVER_ONLY,
>> >
>> > Agree, but let's discuss that in a separate thread.
>> 
>> Did not understand why separate thread? this is part of this thread
>> no? but anyways now everyone agreed that we will log with
>> LOG_SERVER_ONLY.

Modified elevel from LOG to LOG_SERVER_ONLY.

I also modified the patch to log JIT Summary and GUC settings 
information.
If there is other useful information to log, I would appreciate it if 
you could point it out.

> Bharat offlist pointed to me that here he was talking about another
> log that is logging the query and not specific to this patch, so let's
> not discuss this point here.

Thanks for sharing the situation!

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v2-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

09 June 2021, 07:44:55

On 2021-05-28 15:51, torikoshia wrote:
> On 2021-05-13 21:57, Dilip Kumar wrote:
>> On Thu, May 13, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> 
>> wrote:
>>> 
>>> On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy
>>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>> >
>>> > On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> > >
>>> > > On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
>>> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
>>> > > >
>>> > > > I'm saying that -  currently, queries are logged with LOG level when
>>> > > > the log_statement GUC is set. The queries might be sent to the
>>> > > > non-superuser clients. So, your point of "sending the plan to those
>>> > > > clients is not a good idea from a security perspective" gets violated
>>> > > > right? Should the log level be changed(in the below code) from "LOG"
>>> > > > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
>>> > > > to sidetrack the main feature.
>>> > > >
>>> > > >     /* Log immediately if dictated by log_statement */
>>> > > >     if (check_log_statement(parsetree_list))
>>> > > >     {
>>> > > >         ereport(LOG,
>>> > > >                 (errmsg("statement: %s", query_string),
>>> > > >                  errhidestmt(true),
>>> > > >                  errdetail_execute(parsetree_list)));
>>> > > >
>>> > >
>>> > > Yes, that was my exact point, that in this particular code log with
>>> > > LOG_SERVER_ONLY.
>>> > >
>>> > > Like this.
>>> > >      /* Log immediately if dictated by log_statement */
>>> > >      if (check_log_statement(parsetree_list))
>>> > >      {
>>> > >          ereport(LOG_SERVER_ONLY,
>>> >
>>> > Agree, but let's discuss that in a separate thread.
>>> 
>>> Did not understand why separate thread? this is part of this thread
>>> no? but anyways now everyone agreed that we will log with
>>> LOG_SERVER_ONLY.
> 
> Modified elevel from LOG to LOG_SERVER_ONLY.
> 
> I also modified the patch to log JIT Summary and GUC settings 
> information.
> If there is other useful information to log, I would appreciate it if
> you could point it out.

Updated the patch.

- reordered superuser check which was pointed out in another thread[1]
- added a regression test

[1] https://postgr.es/m/YLxw1uVGIAP5uMPl@paquier.xyz

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v3-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

09 June 2021, 14:04:22


On 2021/06/09 16:44, torikoshia wrote:
> Updated the patch.

Thanks for updating the patch!

auto_explain can log the plan of even nested statement
if auto_explain.log_nested_statements is enabled.
But ISTM that pg_log_current_plan() cannot log that plan.
Is this intentional?
I think that it's better to make pg_log_current_plan() log
the plan of even nested statement.


+    es->format = EXPLAIN_FORMAT_TEXT;
+    es->settings = true;

Since pg_log_current_plan() is usually used to investigate and
trouble-shoot the long running queries, IMO it's better to
enable es->verbose by default and get additional information
about the queries. Thought?


+ * pg_log_current_plan
+ *        Signal a backend process to log plan the of running query.

"plan the of" is typo?


+ * Only superusers are allowed to signal to log plan because any users to
+ * issue this request at an unbounded rate would cause lots of log messages
+ * and which can lead to denial of service.

"because any users" should be "because allowing any users"
like the comment for pg_log_backend_memory_contexts()?


+ * All the actual work is deferred to ProcessLogExplainInterrupt(),

"ProcessLogExplainInterrupt()" should be "ProcessLogCurrentPlanInterrupt()"?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

10 June 2021, 02:09:22

On 2021-06-09 23:04, Fujii Masao wrote:

Thanks for your review!

> auto_explain can log the plan of even nested statement
> if auto_explain.log_nested_statements is enabled.
> But ISTM that pg_log_current_plan() cannot log that plan.
> Is this intentional?

> I think that it's better to make pg_log_current_plan() log
> the plan of even nested statement.

+1. It would be better.
But currently plan information is got from ActivePortal and ISTM there 
are no easy way to retrieve plan information of nested statements from 
ActivePortal.
Anyway I'll do some more research.


I think you are right about the following comments.
I'll fix them.

> +    es->format = EXPLAIN_FORMAT_TEXT;
> +    es->settings = true;
> 
> Since pg_log_current_plan() is usually used to investigate and
> trouble-shoot the long running queries, IMO it's better to
> enable es->verbose by default and get additional information
> about the queries. Thought?
> + * pg_log_current_plan
> + *        Signal a backend process to log plan the of running query.
> 
> "plan the of" is typo?
> 
> 
> + * Only superusers are allowed to signal to log plan because any users 
> to
> + * issue this request at an unbounded rate would cause lots of log 
> messages
> + * and which can lead to denial of service.
> 
> "because any users" should be "because allowing any users"
> like the comment for pg_log_backend_memory_contexts()?
> 
> 
> + * All the actual work is deferred to ProcessLogExplainInterrupt(),
> 
> "ProcessLogExplainInterrupt()" should be 
> "ProcessLogCurrentPlanInterrupt()"?
> 
> Regards,

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

10 June 2021, 16:20:48

On Wed, Jun 9, 2021 at 1:14 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Updated the patch.

Thanks for the patch. Here are some comments on v3 patch:

1) We could just say "Requests to log query plan of the presently
running query of a given backend along with an untruncated query
string in the server logs."
Instead of
+        They will be logged at <literal>LOG</literal> message level and
+        will appear in the server log based on the log
+        configuration set (See <xref linkend="runtime-config-logging"/>

2) It's better to do below, for reference see how pg_backend_pid,
pg_terminate_backend, pg_relpages and so on are used in the tests.
+SELECT pg_log_current_plan(pg_backend_pid());
rather than using the function in the FROM clause.
+SELECT * FROM pg_log_current_plan(pg_backend_pid());
If okay, also change it for pg_log_backend_memory_contexts.

3) Since most of the code looks same in pg_log_backend_memory_contexts
and pg_log_current_plan, I think we can have a common function
something like below:
bool
SendProcSignalForLogInfo(ProcSignalReason reason)
{
Assert(reason == PROCSIG_LOG_MEMORY_CONTEXT || reason ==
PROCSIG_LOG_CURRENT_PLAN);

if (!superuser())
{
if (reason == PROCSIG_LOG_MEMORY_CONTEXT)
errmsg("must be a superuser to log memory contexts")
else if (reason == PROCSIG_LOG_CURRENT_PLAN)
errmsg("must be a superuser to log plan of the running query")
}

if (SendProcSignal(pid, reason, proc->backendId) < 0)
{
}
}
Then we could just do:
Datum
pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
{
PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_MEMORY_CONTEXT));
}
Datum
pg_log_current_plan(PG_FUNCTION_ARGS)
{
PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_CURRENT_PLAN));
}
We can have SendProcSignalForLogInfo function defined in procsignal.c
and procsignal.h

4) I think we can have a sample function usage and how it returns true
value, how the plan looks for a simple query(select 1 or some other
simple/complex generic query or simply select
pg_log_current_plan(pg_backend_pid());) in the documentation, much
like pg_log_backend_memory_contexts.

5) Instead of just showing the true return value of the function
pg_log_current_plan in the sql test, which just shows that the signal
is sent, but it doesn't mean that the backend has processed that
signal and logged the plan. I think we can add the test using TAP
framework, one

6) Do we unnecessarily need to signal the processes such as autovacuum
launcher/workers, logical replication launcher/workers, wal senders,
wal receivers and so on. only to emit a "PID %d is not executing
queries now" message? Moreover, those processes will be waiting in
loops for timeouts to occur, then as soon as they wake up do they need
to process this extra uninformative signal?
We could choose to not signal those processes at all which might or
might not be possible.
Otherwise, we could just emit messages like "XXXX process cannot run a
query" in ProcessInterrupts.

7)Instead of
(errmsg("logging plan of the running query on PID %d\n%s",
how about below?
(errmsg("plan of the query running on backend with PID %d is:\n%s",

8) Instead of
errmsg("PID %d is not executing queries now")
how about below?
errmsg("Backend with PID %d is not running a query")

9) We could just do:
void
ProcessLogCurrentPlanInterrupt(void)
{
ExplainState *es;
LogCurrentPlanPending = false;
if (!ActivePortal || !ActivePortal->queryDesc)
errmsg("PID %d is not executing queries now");
es = NewExplainState();
ExplainQueryText();
ExplainPrintPlan();

10) How about renaming the function pg_log_current_plan to
pg_log_query_plan or pg_log_current_query_plan?

11) What happens if pg_log_current_plan is called for a parallel worker?

With Regards,
Bharath Rupireddy.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

14 June 2021, 12:18:51

On 2021-06-11 01:20, Bharath Rupireddy wrote:

Thanks for your review!

> On Wed, Jun 9, 2021 at 1:14 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> Updated the patch.
> 
> Thanks for the patch. Here are some comments on v3 patch:
> 
> 1) We could just say "Requests to log query plan of the presently
> running query of a given backend along with an untruncated query
> string in the server logs."
> Instead of
> +        They will be logged at <literal>LOG</literal> message level 
> and
> +        will appear in the server log based on the log
> +        configuration set (See <xref 
> linkend="runtime-config-logging"/>

Actually this explanation is almost the same as that of
pg_log_backend_memory_contexts().
Do you think we should change both of them?
I think it may be too detailed but accurate.

> 2) It's better to do below, for reference see how pg_backend_pid,
> pg_terminate_backend, pg_relpages and so on are used in the tests.
> +SELECT pg_log_current_plan(pg_backend_pid());
> rather than using the function in the FROM clause.
> +SELECT * FROM pg_log_current_plan(pg_backend_pid());
> If okay, also change it for pg_log_backend_memory_contexts.

Agreed.

> 3) Since most of the code looks same in pg_log_backend_memory_contexts
> and pg_log_current_plan, I think we can have a common function
> something like below:

Agreed. I'll do some refactoring.

> bool
> SendProcSignalForLogInfo(ProcSignalReason reason)
> {
> Assert(reason == PROCSIG_LOG_MEMORY_CONTEXT || reason ==
> PROCSIG_LOG_CURRENT_PLAN);
> 
> if (!superuser())
> {
> if (reason == PROCSIG_LOG_MEMORY_CONTEXT)
> errmsg("must be a superuser to log memory contexts")
> else if (reason == PROCSIG_LOG_CURRENT_PLAN)
> errmsg("must be a superuser to log plan of the running query")
> }
> 
> if (SendProcSignal(pid, reason, proc->backendId) < 0)
> {
> }
> }
> Then we could just do:
> Datum
> pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
> {
> PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_MEMORY_CONTEXT));
> }
> Datum
> pg_log_current_plan(PG_FUNCTION_ARGS)
> {
> PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_CURRENT_PLAN));
> }
> We can have SendProcSignalForLogInfo function defined in procsignal.c
> and procsignal.h
> 
> 4) I think we can have a sample function usage and how it returns true
> value, how the plan looks for a simple query(select 1 or some other
> simple/complex generic query or simply select
> pg_log_current_plan(pg_backend_pid());) in the documentation, much
> like pg_log_backend_memory_contexts.

+1.

> 5) Instead of just showing the true return value of the function
> pg_log_current_plan in the sql test, which just shows that the signal
> is sent, but it doesn't mean that the backend has processed that
> signal and logged the plan. I think we can add the test using TAP
> framework, one

I once made a tap test for pg_log_backend_memory_contexts(), but it
seemed an overkill and we agreed that it was appropriate just ensuring
the function working as below discussion.

   
https://www.postgresql.org/message-id/bbecd722d4f8e261b350186ac4bf68a7%40oss.nttdata.com

> 6) Do we unnecessarily need to signal the processes such as autovacuum
> launcher/workers, logical replication launcher/workers, wal senders,
> wal receivers and so on. only to emit a "PID %d is not executing
> queries now" message? Moreover, those processes will be waiting in
> loops for timeouts to occur, then as soon as they wake up do they need
> to process this extra uninformative signal?
> We could choose to not signal those processes at all which might or
> might not be possible.
> Otherwise, we could just emit messages like "XXXX process cannot run a
> query" in ProcessInterrupts.

Agreed.

However it needs to distinguish backends which can execute queries and
other processes such as autovacuum launcher, I don't come up with
easy ways to do so.
I'm going to think about it.

> 7)Instead of
> (errmsg("logging plan of the running query on PID %d\n%s",
> how about below?
> (errmsg("plan of the query running on backend with PID %d is:\n%s",

+1.

> 8) Instead of
> errmsg("PID %d is not executing queries now")
> how about below?
> errmsg("Backend with PID %d is not running a query")

+1.

> 
> 9) We could just do:
> void
> ProcessLogCurrentPlanInterrupt(void)
> {
> ExplainState *es;
> LogCurrentPlanPending = false;
> if (!ActivePortal || !ActivePortal->queryDesc)
> errmsg("PID %d is not executing queries now");
> es = NewExplainState();
> ExplainQueryText();
> ExplainPrintPlan();
> 
> 10) How about renaming the function pg_log_current_plan to
> pg_log_query_plan or pg_log_current_query_plan?

+1.

> 11) What happens if pg_log_current_plan is called for a parallel 
> worker?

AFAIU Parallel worker doesn't have ActivePortal, so it would always
emit the message 'PID %d is not executing queries now'.
As 6), it would be better to distinguish the parallel worker and normal
backend.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

15 June 2021, 04:27:02

On Mon, Jun 14, 2021 at 5:48 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > 1) We could just say "Requests to log query plan of the presently
> > running query of a given backend along with an untruncated query
> > string in the server logs."
> > Instead of
> > +        They will be logged at <literal>LOG</literal> message level
> > and
> > +        will appear in the server log based on the log
> > +        configuration set (See <xref
> > linkend="runtime-config-logging"/>
>
> Actually this explanation is almost the same as that of
> pg_log_backend_memory_contexts().
> Do you think we should change both of them?
> I think it may be too detailed but accurate.

I withdraw my comment. We can keep the explanation similar to
pg_log_backend_memory_contexts as it was agreed upon and committed
text. If the wordings are similar, then it will be easier for users to
understand the documentation.

> > 5) Instead of just showing the true return value of the function
> > pg_log_current_plan in the sql test, which just shows that the signal
> > is sent, but it doesn't mean that the backend has processed that
> > signal and logged the plan. I think we can add the test using TAP
> > framework, one
>
> I once made a tap test for pg_log_backend_memory_contexts(), but it
> seemed an overkill and we agreed that it was appropriate just ensuring
> the function working as below discussion.
>
> https://www.postgresql.org/message-id/bbecd722d4f8e261b350186ac4bf68a7%40oss.nttdata.com

Okay. I withdraw my comment.

> > 6) Do we unnecessarily need to signal the processes such as autovacuum
> > launcher/workers, logical replication launcher/workers, wal senders,
> > wal receivers and so on. only to emit a "PID %d is not executing
> > queries now" message? Moreover, those processes will be waiting in
> > loops for timeouts to occur, then as soon as they wake up do they need
> > to process this extra uninformative signal?
> > We could choose to not signal those processes at all which might or
> > might not be possible.
> > Otherwise, we could just emit messages like "XXXX process cannot run a
> > query" in ProcessInterrupts.
>
> Agreed.
>
> However it needs to distinguish backends which can execute queries and
> other processes such as autovacuum launcher, I don't come up with
> easy ways to do so.
> I'm going to think about it.

I'm not sure if there is any information in the shared memory
accessible to all the backends/sessions that can say a PID is
autovacuum launcher/worker, logical replication launcher/worker or any
other background or parallel worker. If we were to invent a new
mechanism just for addressing the above comment, I would rather choose
to not do that as it seems like an overkill. We can leave it up to the
user whether or not to unnecessarily signal those processes which are
bound to print "PID XXX is not executing queries now" message.

> > 11) What happens if pg_log_current_plan is called for a parallel
> > worker?
>
> AFAIU Parallel worker doesn't have ActivePortal, so it would always
> emit the message 'PID %d is not executing queries now'.
> As 6), it would be better to distinguish the parallel worker and normal
> backend.

As I said, above, I think it will be a bit tough to do. If done, it
seems like an overkill.

With Regards,
Bharath Rupireddy.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

16 June 2021, 11:36:08

On 2021-06-15 13:27, Bharath Rupireddy wrote:
> On Mon, Jun 14, 2021 at 5:48 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> > 1) We could just say "Requests to log query plan of the presently
>> > running query of a given backend along with an untruncated query
>> > string in the server logs."
>> > Instead of
>> > +        They will be logged at <literal>LOG</literal> message level
>> > and
>> > +        will appear in the server log based on the log
>> > +        configuration set (See <xref
>> > linkend="runtime-config-logging"/>
>> 
>> Actually this explanation is almost the same as that of
>> pg_log_backend_memory_contexts().
>> Do you think we should change both of them?
>> I think it may be too detailed but accurate.
> 
> I withdraw my comment. We can keep the explanation similar to
> pg_log_backend_memory_contexts as it was agreed upon and committed
> text. If the wordings are similar, then it will be easier for users to
> understand the documentation.
> 
>> > 5) Instead of just showing the true return value of the function
>> > pg_log_current_plan in the sql test, which just shows that the signal
>> > is sent, but it doesn't mean that the backend has processed that
>> > signal and logged the plan. I think we can add the test using TAP
>> > framework, one
>> 
>> I once made a tap test for pg_log_backend_memory_contexts(), but it
>> seemed an overkill and we agreed that it was appropriate just ensuring
>> the function working as below discussion.
>> 
>> https://www.postgresql.org/message-id/bbecd722d4f8e261b350186ac4bf68a7%40oss.nttdata.com
> 
> Okay. I withdraw my comment.
> 
>> > 6) Do we unnecessarily need to signal the processes such as autovacuum
>> > launcher/workers, logical replication launcher/workers, wal senders,
>> > wal receivers and so on. only to emit a "PID %d is not executing
>> > queries now" message? Moreover, those processes will be waiting in
>> > loops for timeouts to occur, then as soon as they wake up do they need
>> > to process this extra uninformative signal?
>> > We could choose to not signal those processes at all which might or
>> > might not be possible.
>> > Otherwise, we could just emit messages like "XXXX process cannot run a
>> > query" in ProcessInterrupts.
>> 
>> Agreed.
>> 
>> However it needs to distinguish backends which can execute queries and
>> other processes such as autovacuum launcher, I don't come up with
>> easy ways to do so.
>> I'm going to think about it.
> 
> I'm not sure if there is any information in the shared memory
> accessible to all the backends/sessions that can say a PID is
> autovacuum launcher/worker, logical replication launcher/worker or any
> other background or parallel worker.

As far as I looked around, there seems no easy ways to do so.

> If we were to invent a new
> mechanism just for addressing the above comment, I would rather choose
> to not do that as it seems like an overkill. We can leave it up to the
> user whether or not to unnecessarily signal those processes which are
> bound to print "PID XXX is not executing queries now" message.

+1. I'm going to proceed in this direction.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

22 June 2021, 02:30:31

On 2021-06-16 20:36, torikoshia wrote:
>> other background or parallel worker.
> 
> As far as I looked around, there seems no easy ways to do so.
> 
>> If we were to invent a new
>> mechanism just for addressing the above comment, I would rather choose
>> to not do that as it seems like an overkill. We can leave it up to the
>> user whether or not to unnecessarily signal those processes which are
>> bound to print "PID XXX is not executing queries now" message.
> 
> +1. I'm going to proceed in this direction.

Updated the patch.


On Thu, Jun 10, 2021 at 11:09 AM torikoshia <torikoshia@oss.nttdata.com> 
wrote:
> On 2021-06-09 23:04, Fujii Masao wrote:

> > auto_explain can log the plan of even nested statement
> > if auto_explain.log_nested_statements is enabled.
> > But ISTM that pg_log_current_plan() cannot log that plan.
> > Is this intentional?
> 
> > I think that it's better to make pg_log_current_plan() log
> > the plan of even nested statement.
> 
> +1. It would be better.
> But currently plan information is got from ActivePortal and ISTM there
> are no easy way to retrieve plan information of nested statements from
> ActivePortal.
> Anyway I'll do some more research.

I haven't found a proper way yet but it seems necessary to use something 
other than ActivePortal and I'm now thinking this could be a separate 
patch in the future.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v4-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

01 July 2021, 06:34:26


On 2021/06/22 11:30, torikoshia wrote:
> On Thu, Jun 10, 2021 at 11:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>> On 2021-06-09 23:04, Fujii Masao wrote:
> 
>> > auto_explain can log the plan of even nested statement
>> > if auto_explain.log_nested_statements is enabled.
>> > But ISTM that pg_log_current_plan() cannot log that plan.
>> > Is this intentional?
>>
>> > I think that it's better to make pg_log_current_plan() log
>> > the plan of even nested statement.
>>
>> +1. It would be better.
>> But currently plan information is got from ActivePortal and ISTM there
>> are no easy way to retrieve plan information of nested statements from
>> ActivePortal.
>> Anyway I'll do some more research.
> 
> I haven't found a proper way yet but it seems necessary to use something other than ActivePortal and I'm now thinking
thiscould be a separate patch in the future. 
 

     DO $$
     BEGIN
     PERFORM pg_sleep(100);
     END$$;

When I called pg_log_current_query_plan() to send the signal to
the backend executing the above query, I got the following log message.
I think that this is not expected message. I guess this issue happened
because the information about query text and plan is retrieved
from ActivePortal. If this understanding is right, ISTM that we should
implement new mechanism so that we can retrieve those information
even while nested query is being executed.

     LOG:  backend with PID 42449 is not running a query

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

02 July 2021, 14:21:38

On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Updated the patch.

Thanks for the patch. Here are some comments on the v4 patch:

1) Can we do + ExplainState *es = NewExplainState(); and es
assignments after if (!ActivePortal || !ActivePortal->queryDesc), just
to avoid unnecessary call in case of error hit? Also note that, we can
easily hit the error case.

2) It looks like there's an improper indentation. MyProcPid and
es->str->data, should start from the ".
+ ereport(LOG_SERVER_ONLY,
+ (errmsg("backend with PID %d is not running a query",
+ MyProcPid)));

+ ereport(LOG_SERVER_ONLY,
+ (errmsg("plan of the query running on backend with PID %d is:\n%s",
+ MyProcPid,
+ es->str->data),
For reference see errmsg("unrecognized value for EXPLAIN option \"%s\": \"%s\"",

3)I prefer to do this so that any new piece of code can be introduced
in between easily and it will be more readable as well.
+Datum
+pg_log_current_query_plan(PG_FUNCTION_ARGS)
+{
+ pid_t pid;
+ bool result;
+
+ pid = PG_GETARG_INT32(0);
+ result = SendProcSignalForLogInfo(pid, PROCSIG_LOG_CURRENT_PLAN);
+
+ PG_RETURN_BOOL(result);
+}
If okay, please also change for the pg_log_backend_memory_contexts.

4) Extra whitespace before the second line i.e. 2nd line reason should
be aligned with the 1st line reason.
+ Assert(reason == PROCSIG_LOG_MEMORY_CONTEXT ||
+ reason == PROCSIG_LOG_CURRENT_PLAN);

5) How about "Requests to log the plan of the query currently running
on the backend with specified process ID along with the untruncated
query string"?
+        Requests to log the untruncated query string and its plan for
+        the query currently running on the backend with the specified
+        process ID.

6) A typo: it is "nested statements (..) are not"
+    Note that nested statements (statements executed inside a function) is not

7) I'm not sure what you mean by "Some functions output what you want
to the log."
--- Memory contexts are logged and they are not returned to the function.
+-- Some functions output what you want to the log.
Instead, can we say "These functions return true if the specified
backend is successfully signaled, otherwise false. Upon receiving the
signal, the backend will log the information to the server log."

Regards,
Bharath Rupireddy.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

09 July 2021, 05:05:16

On 2021-07-02 23:21, Bharath Rupireddy wrote:
> On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> Updated the patch.
> 
> Thanks for the patch. Here are some comments on the v4 patch:

Thanks for your comments and suggestions!
I agree with you and updated the patch.

On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> 
wrote:

>      DO $$
>      BEGIN
>      PERFORM pg_sleep(100);
>      END$$;
> 
> When I called pg_log_current_query_plan() to send the signal to
> the backend executing the above query, I got the following log message.
> I think that this is not expected message. I guess this issue happened
> because the information about query text and plan is retrieved
> from ActivePortal. If this understanding is right, ISTM that we should
> implement new mechanism so that we can retrieve those information
> even while nested query is being executed.

I'm now working on this comment.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v5-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Masahiro Ikeda

Date:

13 July 2021, 14:11:37

On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> 
wrote:
> Updated the patch.

Hi, torikoshi-san

Thanks for your great work! I'd like to use this feature in v15.
I confirmed that it works with queries I tried and make check-world has 
no error.

When I tried this feature, I realized two things. So, I share them.

(1) About output contents

> The format of the query plan is the same as when <literal>FORMAT 
> TEXT</literal>
> and <literal>VEBOSE</literal> are used in the 
> <command>EXPLAIN</command> command.
> For example:

I think the above needs to add COSTS and SETTINGS options too, and it's 
better to use an
example which the SETTINGS option works like the following.

```
2021-07-13 21:59:56 JST 69757 [client backend] LOG:  plan of the query 
running on backend with PID 69757 is:
         Query Text: PREPARE query2 AS SELECT COUNT(*) FROM 
pgbench_accounts t1, pgbench_accounts t2;
         Aggregate  (cost=3750027242.84..3750027242.86 rows=1 width=8)
           Output: count(*)
           ->  Nested Loop  (cost=0.84..3125027242.84 rows=250000000000 
width=0)
                 ->  Index Only Scan using pgbench_accounts_pkey on 
public.pgbench_accounts t1  (cost=0.42..12996.42 rows=500000 width=0)
                       Output: t1.aid
                 ->  Materialize  (cost=0.42..15496.42 rows=500000 
width=0)
                       ->  Index Only Scan using pgbench_accounts_pkey on 
public.pgbench_accounts t2  (cost=0.42..12996.42 rows=500000 width=0)
         Settings: effective_cache_size = '8GB', work_mem = '16MB'
```

(2) About EXPLAIN "BUFFER" option

When I checked EXPLAIN option, I found there is another option "BUFFER" 
which can be
used without the "ANALYZE" option.

I'm not sure it's useful because your target use-case is analyzing a 
long-running query,
not its planning phase. If so, the planning buffer usage is not so much 
useful. But, since
the overhead to output buffer usages is not high and it's used for 
debugging use cases,
I wonder it's not a bad idea to output buffer usages too. Thought?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

19 July 2021, 02:28:40

On Tue, Jul 13, 2021 at 11:11 PM Masahiro Ikeda 
<ikedamsh@oss.nttdata.com> wrote:

> When I tried this feature, I realized two things. So, I share them.

Thanks for your review!

> (1) About output contents
> 
> > The format of the query plan is the same as when <literal>FORMAT
> > TEXT</literal>
> > and <literal>VEBOSE</literal> are used in the
> > <command>EXPLAIN</command> command.
> > For example:

> I think the above needs to add COSTS and SETTINGS options too, and it's
> better to use an
> example which the SETTINGS option works like the following.

Agreed. Updated the patch.

> (2) About EXPLAIN "BUFFER" option
> 
> When I checked EXPLAIN option, I found there is another option "BUFFER"
> which can be
> used without the "ANALYZE" option.
> 
> I'm not sure it's useful because your target use-case is analyzing a
> long-running query,
> not its planning phase. If so, the planning buffer usage is not so much
> useful. But, since
> the overhead to output buffer usages is not high and it's used for
> debugging use cases,
> I wonder it's not a bad idea to output buffer usages too. Thought?

As you pointed out, I also think it would be useful when queries are 
taking a long time in the planning phase.
However, as far as I read ExplainOneQuery(), the buffer usages in the 
planner phase are not retrieved by default. They are retrieved only when 
BUFFERS is specified in the EXPLAIN.

If we change it to always get the buffer usages and expose them as a 
global variable, we can get them through pg_log_current_plan(), but I 
think it doesn't pay.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v6-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

19 July 2021, 06:07:48


On 2021/07/19 11:28, torikoshia wrote:
> Agreed. Updated the patch.

Thanks for updating the patch!

+bool
+SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason)

I don't think that procsignal.c is proper place to check the permission and
check whether the specified PID indicates a PostgreSQL server process, etc
because procsignal.c just provides fundamental routines for interprocess
signaling. Isn't it better to move the function to signalfuncs.c or elsewhere?


+    ExplainQueryText(es, ActivePortal->queryDesc);
+    ExplainPrintPlan(es, ActivePortal->queryDesc);
+    ExplainPrintJITSummary(es, ActivePortal->queryDesc);

When text format is used, ExplainBeginOutput() and ExplainEndOutput()
do nothing. So (I guess) you thought that they don't need to be called and
implemented the code in that way. But IMO it's better to comment
why they don't need to be called, or to just call both of them
even if they do nothing in text format.


+    ExplainPrintJITSummary(es, ActivePortal->queryDesc);

It's better to check es->costs before calling this function,
like explain_ExecutorEnd() and ExplainOnePlan() do?


+    result = SendProcSignalForLogInfo(pid, PROCSIG_LOG_CURRENT_PLAN);
+
+    PG_RETURN_BOOL(result);

Currently SendProcSignalForLogInfo() calls PG_RETURN_BOOL() in some cases,
but instead it should just return true/false because pg_log_current_query_plan()
expects that?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

27 July 2021, 18:34:44


On 2021/07/09 14:05, torikoshia wrote:
> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>> Updated the patch.
>>
>> Thanks for the patch. Here are some comments on the v4 patch:
> 
> Thanks for your comments and suggestions!
> I agree with you and updated the patch.
> 
> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> 
>>      DO $$
>>      BEGIN
>>      PERFORM pg_sleep(100);
>>      END$$;
>>
>> When I called pg_log_current_query_plan() to send the signal to
>> the backend executing the above query, I got the following log message.
>> I think that this is not expected message. I guess this issue happened
>> because the information about query text and plan is retrieved
>> from ActivePortal. If this understanding is right, ISTM that we should
>> implement new mechanism so that we can retrieve those information
>> even while nested query is being executed.
> 
> I'm now working on this comment.

One idea is to define new global pointer, e.g., "QueryDesc *ActiveQueryDesc;".
This global pointer is set to queryDesc in ExecutorRun()
(also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd() and
when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
get the plan of the currently running query from that global pointer
instead of ActivePortal, and log it. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Pavel Stehule

Date:

27 July 2021, 18:45:34

út 27. 7. 2021 v 20:34 odesílatel Fujii Masao <masao.fujii@oss.nttdata.com> napsal:

On 2021/07/09 14:05, torikoshia wrote:
> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>> Updated the patch.
>>
>> Thanks for the patch. Here are some comments on the v4 patch:
>
> Thanks for your comments and suggestions!
> I agree with you and updated the patch.
>
> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>>      DO $$
>>      BEGIN
>>      PERFORM pg_sleep(100);
>>      END$$;
>>
>> When I called pg_log_current_query_plan() to send the signal to
>> the backend executing the above query, I got the following log message.
>> I think that this is not expected message. I guess this issue happened
>> because the information about query text and plan is retrieved
>> from ActivePortal. If this understanding is right, ISTM that we should
>> implement new mechanism so that we can retrieve those information
>> even while nested query is being executed.
>
> I'm now working on this comment.

One idea is to define new global pointer, e.g., "QueryDesc *ActiveQueryDesc;".
This global pointer is set to queryDesc in ExecutorRun()
(also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd() and
when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
get the plan of the currently running query from that global pointer
instead of ActivePortal, and log it. Thought?

It cannot work - there can be a lot of nested queries, and at the end you cannot reset to null, but you should return back pointer to outer query.

Regards

Pavel

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

28 July 2021, 11:44:00

On 2021-07-28 03:45, Pavel Stehule wrote:
> út 27. 7. 2021 v 20:34 odesílatel Fujii Masao
> <masao.fujii@oss.nttdata.com> napsal:
> 
>> On 2021/07/09 14:05, torikoshia wrote:
>>> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>>>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia
>> <torikoshia@oss.nttdata.com> wrote:
>>>>> Updated the patch.
>>>> 
>>>> Thanks for the patch. Here are some comments on the v4 patch:
>>> 
>>> Thanks for your comments and suggestions!
>>> I agree with you and updated the patch.
>>> 
>>> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> 
>>>> DO $$
>>>> BEGIN
>>>> PERFORM pg_sleep(100);
>>>> END$$;
>>>> 
>>>> When I called pg_log_current_query_plan() to send the signal to
>>>> the backend executing the above query, I got the following log
>> message.
>>>> I think that this is not expected message. I guess this issue
>> happened
>>>> because the information about query text and plan is retrieved
>>>> from ActivePortal. If this understanding is right, ISTM that we
>> should
>>>> implement new mechanism so that we can retrieve those information
>>>> even while nested query is being executed.
>>> 
>>> I'm now working on this comment.
>> 
>> One idea is to define new global pointer, e.g., "QueryDesc
>> *ActiveQueryDesc;".
>> This global pointer is set to queryDesc in ExecutorRun()
>> (also maybe ExecutorStart()). And this is reset to NULL in
>> ExecutorEnd() and
>> when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
>> get the plan of the currently running query from that global pointer
>> instead of ActivePortal, and log it. Thought?
> 
> It cannot work - there can be a lot of nested queries, and at the end
> you cannot reset to null, but you should return back pointer to outer
> query.

Thanks for your comment!

I'm wondering if we can avoid this problem by saving one outer level 
QueryDesc in addition to the current one.
I'm going to try it.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

10 August 2021, 12:22:49

On 2021-07-28 20:44, torikoshia wrote:
> On 2021-07-28 03:45, Pavel Stehule wrote:
>> út 27. 7. 2021 v 20:34 odesílatel Fujii Masao
>> <masao.fujii@oss.nttdata.com> napsal:
>> 
>>> On 2021/07/09 14:05, torikoshia wrote:
>>>> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>>>>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia
>>> <torikoshia@oss.nttdata.com> wrote:
>>>>>> Updated the patch.
>>>>> 
>>>>> Thanks for the patch. Here are some comments on the v4 patch:
>>>> 
>>>> Thanks for your comments and suggestions!
>>>> I agree with you and updated the patch.
>>>> 
>>>> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>>> 
>>>>> DO $$
>>>>> BEGIN
>>>>> PERFORM pg_sleep(100);
>>>>> END$$;
>>>>> 
>>>>> When I called pg_log_current_query_plan() to send the signal to
>>>>> the backend executing the above query, I got the following log
>>> message.
>>>>> I think that this is not expected message. I guess this issue
>>> happened
>>>>> because the information about query text and plan is retrieved
>>>>> from ActivePortal. If this understanding is right, ISTM that we
>>> should
>>>>> implement new mechanism so that we can retrieve those information
>>>>> even while nested query is being executed.
>>>> 
>>>> I'm now working on this comment.
>>> 
>>> One idea is to define new global pointer, e.g., "QueryDesc
>>> *ActiveQueryDesc;".
>>> This global pointer is set to queryDesc in ExecutorRun()
>>> (also maybe ExecutorStart()). And this is reset to NULL in
>>> ExecutorEnd() and
>>> when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
>>> get the plan of the currently running query from that global pointer
>>> instead of ActivePortal, and log it. Thought?
>> 
>> It cannot work - there can be a lot of nested queries, and at the end
>> you cannot reset to null, but you should return back pointer to outer
>> query.
> 
> Thanks for your comment!
> 
> I'm wondering if we can avoid this problem by saving one outer level
> QueryDesc in addition to the current one.
> I'm going to try it.

I have updated the patch in this way.

In this patch, getting the plan to the DO statement is as follows.

---------------------------------
   (pid:76608)=# DO $$
    BEGIN
    PERFORM pg_sleep(15);
    END$$;

   (pid:74482)=# SELECT pg_log_current_query_plan(76608);

   LOG: 00000: plan of the query running on backend with PID 76608 is:
         Query Text: SELECT pg_sleep(15)
         Result  (cost=0.00..0.01 rows=1 width=4)
           Output: pg_sleep('15'::double precision)

    -- pid:76608 finished DO statement:
   (pid:74482)=# SELECT pg_log_current_query_plan(76608);

   LOG:  00000: backend with PID 76608 is not running a query
---------------------------------

Any thoughts?

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v7-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

10 August 2021, 15:21:11


On 2021/08/10 21:22, torikoshia wrote:
> I have updated the patch in this way.

Thanks for updating the patch!


> In this patch, getting the plan to the DO statement is as follows.

Looks good to me.


> Any thoughts?

+    ereport(LOG_SERVER_ONLY,
+            (errmsg("plan of the query running on backend with PID %d is:\n%s",
+                    MyProcPid, es->str->data),
+             errhidestmt(true)));

Shouldn't we hide context information by calling errhidecontext(true)?



While "make installcheck" regression test was running, I repeated
executing pg_log_current_query_plan() and got the failure of join_hash test
with the following diff. This means that pg_log_current_query_plan() could
cause the query that should be completed successfully to fail with the error.
Isn't this a bug?

I *guess* that the cause of this issue is that ExplainNode() can call
InstrEndLoop() more than once unexpectedly.

  ------------------------------------------------------------------------------
  $$
    select count(*) from simple r join simple s using (id);
  $$);
- initially_multibatch | increased_batches
-----------------------+-------------------
- f                    | f
-(1 row)
-
+ERROR:  InstrEndLoop called on running node
+CONTEXT:  PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement
  rollback to settings;
  -- parallel with parallel-oblivious hash join
  savepoint settings;
@@ -687,11 +684,9 @@
      left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
      on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
  $$);
- multibatch
-------------
- t
-(1 row)
-
+ERROR:  InstrEndLoop called on running node
+CONTEXT:  parallel worker
+PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement
  rollback to settings;
  -- single-batch with rescan, parallel-aware
  savepoint settings;
  ------------------------------------------------------------------------------

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

11 August 2021, 12:14:05

On 2021-08-11 00:21, Fujii Masao wrote:

> On 2021/08/10 21:22, torikoshia wrote:
>> I have updated the patch in this way.
> 
> Thanks for updating the patch!
> 
> 
>> In this patch, getting the plan to the DO statement is as follows.
> 
> Looks good to me.
> 
> 
>> Any thoughts?
> 
> +    ereport(LOG_SERVER_ONLY,
> +            (errmsg("plan of the query running on backend with PID %d is:\n%s",
> +                    MyProcPid, es->str->data),
> +             errhidestmt(true)));
> 
> Shouldn't we hide context information by calling errhidecontext(true)?

Agreed.

> While "make installcheck" regression test was running, I repeated
> executing pg_log_current_query_plan() and got the failure of join_hash 
> test
> with the following diff. This means that pg_log_current_query_plan() 
> could
> cause the query that should be completed successfully to fail with the 
> error.
> Isn't this a bug?

Thanks for finding the bug.
I also reproduced it.

> I *guess* that the cause of this issue is that ExplainNode() can call
> InstrEndLoop() more than once unexpectedly.

As far as I looked into, pg_log_current_plan() can call InstrEndLoop() 
through ExplainNode().
I added a flag to ExplainState to avoid calling InstrEndLoop() when 
ExplainNode() is called from pg_log_current_plan().

> 
>  
> ------------------------------------------------------------------------------
>  $$
>    select count(*) from simple r join simple s using (id);
>  $$);
> - initially_multibatch | increased_batches
> -----------------------+-------------------
> - f                    | f
> -(1 row)
> -
> +ERROR:  InstrEndLoop called on running node
> +CONTEXT:  PL/pgSQL function hash_join_batches(text) line 6 at FOR
> over EXECUTE statement
>  rollback to settings;
>  -- parallel with parallel-oblivious hash join
>  savepoint settings;
> @@ -687,11 +684,9 @@
>      left join (select b1.id, b1.t from join_bar b1 join join_bar b2
> using (id)) ss
>      on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
>  $$);
> - multibatch
> -------------
> - t
> -(1 row)
> -
> +ERROR:  InstrEndLoop called on running node
> +CONTEXT:  parallel worker
> +PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE 
> statement
>  rollback to settings;
>  -- single-batch with rescan, parallel-aware
>  savepoint settings;
>  
> ------------------------------------------------------------------------------
> 
> Regards,

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v8-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

19 August 2021, 16:12:11


On 2021/08/11 21:14, torikoshia wrote:
> As far as I looked into, pg_log_current_plan() can call InstrEndLoop() through ExplainNode().
> I added a flag to ExplainState to avoid calling InstrEndLoop() when ExplainNode() is called from
pg_log_current_plan().

Thanks for updating the patch!
I tried to test the patch again and encountered two issues.

(1)
The following WITH RECURSIVE query failed with the error
"ERROR:  failed to find plan for CTE sg" when I ran
pg_log_current_query_plan() against the backend executing that query.
Is this a bug?

     create table graph0( f int, t int, label text );
     insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 -> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4,
5,'arc 4 -> 5');
 

     with recursive search_graph(f, t, label, i) as (
         select *, 1||pg_sleep(1)::text from graph0 g
         union distinct
         select g.*,1||pg_sleep(1)::text
         from graph0 g, search_graph sg
        where g.f = sg.t
     ) search breadth first by f, t set seq
     select * from search_graph order by seq;


(2)
When I ran pg_log_current_query_plan() while "make installcheck" test
was running, I got the following assertion failure.

TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line:
894,PID: 61512)
 

0   postgres                            0x000000010ec23557 ExceptionalCondition + 231
1   postgres                            0x000000010e9eff15 LockAcquireExtended + 1461
2   postgres                            0x000000010e9ed14d LockRelationOid + 61
3   postgres                            0x000000010e41251b relation_open + 91
4   postgres                            0x000000010e509679 table_open + 25
5   postgres                            0x000000010ebf9462 SearchCatCacheMiss + 274
6   postgres                            0x000000010ebf5979 SearchCatCacheInternal + 761
7   postgres                            0x000000010ebf566c SearchCatCache + 60
8   postgres                            0x000000010ec1a9e0 SearchSysCache + 144
9   postgres                            0x000000010ec1ae03 SearchSysCacheExists + 51
10  postgres                            0x000000010e58ce35 TypeIsVisible + 437
11  postgres                            0x000000010ea98e4c format_type_extended + 1964
12  postgres                            0x000000010ea9900e format_type_with_typemod + 30
13  postgres                            0x000000010eb78d76 get_const_expr + 742
14  postgres                            0x000000010eb79bc8 get_rule_expr + 232
15  postgres                            0x000000010eb8140f get_func_expr + 1247
16  postgres                            0x000000010eb79dcd get_rule_expr + 749
17  postgres                            0x000000010eb81688 get_rule_expr_paren + 136
18  postgres                            0x000000010eb7bf38 get_rule_expr + 9304
19  postgres                            0x000000010eb72ad5 deparse_expression_pretty + 149
20  postgres                            0x000000010eb73463 deparse_expression + 83
21  postgres                            0x000000010e68eaf1 show_plan_tlist + 353
22  postgres                            0x000000010e68adaf ExplainNode + 4991
23  postgres                            0x000000010e688b4b ExplainPrintPlan + 283
24  postgres                            0x000000010e68e1aa ProcessLogCurrentPlanInterrupt + 266
25  postgres                            0x000000010ea133bb ProcessInterrupts + 3435
26  postgres                            0x000000010e738c97 vacuum_delay_point + 55
27  postgres                            0x000000010e42bb4b ginInsertCleanup + 1531
28  postgres                            0x000000010e42d418 gin_clean_pending_list + 776
29  postgres                            0x000000010e74955a ExecInterpExpr + 2522
30  postgres                            0x000000010e7487e2 ExecInterpExprStillValid + 82
31  postgres                            0x000000010e7ae83b ExecEvalExprSwitchContext + 59
32  postgres                            0x000000010e7ae7be ExecProject + 78
33  postgres                            0x000000010e7ae4e9 ExecResult + 345
34  postgres                            0x000000010e764e42 ExecProcNodeFirst + 82
35  postgres                            0x000000010e75ccb2 ExecProcNode + 50
36  postgres                            0x000000010e758301 ExecutePlan + 193
37  postgres                            0x000000010e7581d1 standard_ExecutorRun + 609
38  auto_explain.so                     0x000000010f1df3a7 explain_ExecutorRun + 247
39  postgres                            0x000000010e757f3b ExecutorRun + 91
40  postgres                            0x000000010ea1cb49 PortalRunSelect + 313
41  postgres                            0x000000010ea1c4dd PortalRun + 861
42  postgres                            0x000000010ea17474 exec_simple_query + 1540
43  postgres                            0x000000010ea164d4 PostgresMain + 2580
44  postgres                            0x000000010e91d159 BackendRun + 89
45  postgres                            0x000000010e91c6a5 BackendStartup + 565
46  postgres                            0x000000010e91b3fe ServerLoop + 638
47  postgres                            0x000000010e918b9d PostmasterMain + 6717
48  postgres                            0x000000010e7efd43 main + 819
49  libdyld.dylib                       0x00007fff6a46e3d5 start + 1
50  ???                                 0x0000000000000003 0x0 + 3

LOG:  server process (PID 61512) was terminated by signal 6: Abort trap: 6
DETAIL:  Failed process was running: select gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

07 September 2021, 03:39:46

On 2021-08-20 01:12, Fujii Masao wrote:
> On 2021/08/11 21:14, torikoshia wrote:
>> As far as I looked into, pg_log_current_plan() can call InstrEndLoop() 
>> through ExplainNode().
>> I added a flag to ExplainState to avoid calling InstrEndLoop() when 
>> ExplainNode() is called from pg_log_current_plan().
> 
> Thanks for updating the patch!
> I tried to test the patch again and encountered two issues.

Thanks for finding these issues!

> 
> (1)
> The following WITH RECURSIVE query failed with the error
> "ERROR:  failed to find plan for CTE sg" when I ran
> pg_log_current_query_plan() against the backend executing that query.
> Is this a bug?
> 
>     create table graph0( f int, t int, label text );
>     insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');
> 
>     with recursive search_graph(f, t, label, i) as (
>         select *, 1||pg_sleep(1)::text from graph0 g
>         union distinct
>         select g.*,1||pg_sleep(1)::text
>         from graph0 g, search_graph sg
>        where g.f = sg.t
>     ) search breadth first by f, t set seq
>     select * from search_graph order by seq;

This ERROR occurred without applying the patch and just calling 
EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST.

I'm going to make another thread to discuss it.

> (2)
> When I ran pg_log_current_query_plan() while "make installcheck" test
> was running, I got the following assertion failure.
> 
> TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type ==
> LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)
> 
> 0   postgres                            0x000000010ec23557
> ExceptionalCondition + 231
> 1   postgres                            0x000000010e9eff15
> LockAcquireExtended + 1461
> 2   postgres                            0x000000010e9ed14d 
> LockRelationOid + 61
> 3   postgres                            0x000000010e41251b 
> relation_open + 91
> 4   postgres                            0x000000010e509679 table_open + 
> 25
> 5   postgres                            0x000000010ebf9462
> SearchCatCacheMiss + 274
> 6   postgres                            0x000000010ebf5979
> SearchCatCacheInternal + 761
> 7   postgres                            0x000000010ebf566c 
> SearchCatCache + 60
> 8   postgres                            0x000000010ec1a9e0 
> SearchSysCache + 144
> 9   postgres                            0x000000010ec1ae03
> SearchSysCacheExists + 51
> 10  postgres                            0x000000010e58ce35 
> TypeIsVisible + 437
> 11  postgres                            0x000000010ea98e4c
> format_type_extended + 1964
> 12  postgres                            0x000000010ea9900e
> format_type_with_typemod + 30
> 13  postgres                            0x000000010eb78d76 
> get_const_expr + 742
> 14  postgres                            0x000000010eb79bc8 
> get_rule_expr + 232
> 15  postgres                            0x000000010eb8140f 
> get_func_expr + 1247
> 16  postgres                            0x000000010eb79dcd 
> get_rule_expr + 749
> 17  postgres                            0x000000010eb81688
> get_rule_expr_paren + 136
> 18  postgres                            0x000000010eb7bf38 
> get_rule_expr + 9304
> 19  postgres                            0x000000010eb72ad5
> deparse_expression_pretty + 149
> 20  postgres                            0x000000010eb73463
> deparse_expression + 83
> 21  postgres                            0x000000010e68eaf1 
> show_plan_tlist + 353
> 22  postgres                            0x000000010e68adaf ExplainNode 
> + 4991
> 23  postgres                            0x000000010e688b4b
> ExplainPrintPlan + 283
> 24  postgres                            0x000000010e68e1aa
> ProcessLogCurrentPlanInterrupt + 266
> 25  postgres                            0x000000010ea133bb
> ProcessInterrupts + 3435
> 26  postgres                            0x000000010e738c97
> vacuum_delay_point + 55
> 27  postgres                            0x000000010e42bb4b
> ginInsertCleanup + 1531
> 28  postgres                            0x000000010e42d418
> gin_clean_pending_list + 776
> 29  postgres                            0x000000010e74955a 
> ExecInterpExpr + 2522
> 30  postgres                            0x000000010e7487e2
> ExecInterpExprStillValid + 82
> 31  postgres                            0x000000010e7ae83b
> ExecEvalExprSwitchContext + 59
> 32  postgres                            0x000000010e7ae7be ExecProject 
> + 78
> 33  postgres                            0x000000010e7ae4e9 ExecResult + 
> 345
> 34  postgres                            0x000000010e764e42
> ExecProcNodeFirst + 82
> 35  postgres                            0x000000010e75ccb2 ExecProcNode 
> + 50
> 36  postgres                            0x000000010e758301 ExecutePlan 
> + 193
> 37  postgres                            0x000000010e7581d1
> standard_ExecutorRun + 609
> 38  auto_explain.so                     0x000000010f1df3a7
> explain_ExecutorRun + 247
> 39  postgres                            0x000000010e757f3b ExecutorRun 
> + 91
> 40  postgres                            0x000000010ea1cb49 
> PortalRunSelect + 313
> 41  postgres                            0x000000010ea1c4dd PortalRun + 
> 861
> 42  postgres                            0x000000010ea17474
> exec_simple_query + 1540
> 43  postgres                            0x000000010ea164d4 PostgresMain 
> + 2580
> 44  postgres                            0x000000010e91d159 BackendRun + 
> 89
> 45  postgres                            0x000000010e91c6a5 
> BackendStartup + 565
> 46  postgres                            0x000000010e91b3fe ServerLoop + 
> 638
> 47  postgres                            0x000000010e918b9d 
> PostmasterMain + 6717
> 48  postgres                            0x000000010e7efd43 main + 819
> 49  libdyld.dylib                       0x00007fff6a46e3d5 start + 1
> 50  ???                                 0x0000000000000003 0x0 + 3
> 
> LOG:  server process (PID 61512) was terminated by signal 6: Abort 
> trap: 6
> DETAIL:  Failed process was running: select
> gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;

As far as I understand, since explaining plans can acquire heavyweight 
lock for example to get column names, when page lock is held at the time 
of the interrupt, this assertion error occurs.

The attached patch tries to avoid this by checking each LocalLock entry 
and when finding even one, giving up logging the plan.

Thoughts?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v9-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

08 September 2021, 12:06:26

On 2021-09-07 12:39, torikoshia wrote:
> On 2021-08-20 01:12, Fujii Masao wrote:
>> On 2021/08/11 21:14, torikoshia wrote:
>>> As far as I looked into, pg_log_current_plan() can call 
>>> InstrEndLoop() through ExplainNode().
>>> I added a flag to ExplainState to avoid calling InstrEndLoop() when 
>>> ExplainNode() is called from pg_log_current_plan().
>> 
>> Thanks for updating the patch!
>> I tried to test the patch again and encountered two issues.
> 
> Thanks for finding these issues!
> 
>> 
>> (1)
>> The following WITH RECURSIVE query failed with the error
>> "ERROR:  failed to find plan for CTE sg" when I ran
>> pg_log_current_query_plan() against the backend executing that query.
>> Is this a bug?
>> 
>>     create table graph0( f int, t int, label text );
>>     insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
>> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');
>> 
>>     with recursive search_graph(f, t, label, i) as (
>>         select *, 1||pg_sleep(1)::text from graph0 g
>>         union distinct
>>         select g.*,1||pg_sleep(1)::text
>>         from graph0 g, search_graph sg
>>        where g.f = sg.t
>>     ) search breadth first by f, t set seq
>>     select * from search_graph order by seq;
> 
> This ERROR occurred without applying the patch and just calling
> EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST.
> 
> I'm going to make another thread to discuss it.
> 
>> (2)
>> When I ran pg_log_current_query_plan() while "make installcheck" test
>> was running, I got the following assertion failure.
>> 
>> TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type ==
>> LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)
>> 
>> 0   postgres                            0x000000010ec23557
>> ExceptionalCondition + 231
>> 1   postgres                            0x000000010e9eff15
>> LockAcquireExtended + 1461
>> 2   postgres                            0x000000010e9ed14d 
>> LockRelationOid + 61
>> 3   postgres                            0x000000010e41251b 
>> relation_open + 91
>> 4   postgres                            0x000000010e509679 table_open 
>> + 25
>> 5   postgres                            0x000000010ebf9462
>> SearchCatCacheMiss + 274
>> 6   postgres                            0x000000010ebf5979
>> SearchCatCacheInternal + 761
>> 7   postgres                            0x000000010ebf566c 
>> SearchCatCache + 60
>> 8   postgres                            0x000000010ec1a9e0 
>> SearchSysCache + 144
>> 9   postgres                            0x000000010ec1ae03
>> SearchSysCacheExists + 51
>> 10  postgres                            0x000000010e58ce35 
>> TypeIsVisible + 437
>> 11  postgres                            0x000000010ea98e4c
>> format_type_extended + 1964
>> 12  postgres                            0x000000010ea9900e
>> format_type_with_typemod + 30
>> 13  postgres                            0x000000010eb78d76 
>> get_const_expr + 742
>> 14  postgres                            0x000000010eb79bc8 
>> get_rule_expr + 232
>> 15  postgres                            0x000000010eb8140f 
>> get_func_expr + 1247
>> 16  postgres                            0x000000010eb79dcd 
>> get_rule_expr + 749
>> 17  postgres                            0x000000010eb81688
>> get_rule_expr_paren + 136
>> 18  postgres                            0x000000010eb7bf38 
>> get_rule_expr + 9304
>> 19  postgres                            0x000000010eb72ad5
>> deparse_expression_pretty + 149
>> 20  postgres                            0x000000010eb73463
>> deparse_expression + 83
>> 21  postgres                            0x000000010e68eaf1 
>> show_plan_tlist + 353
>> 22  postgres                            0x000000010e68adaf ExplainNode 
>> + 4991
>> 23  postgres                            0x000000010e688b4b
>> ExplainPrintPlan + 283
>> 24  postgres                            0x000000010e68e1aa
>> ProcessLogCurrentPlanInterrupt + 266
>> 25  postgres                            0x000000010ea133bb
>> ProcessInterrupts + 3435
>> 26  postgres                            0x000000010e738c97
>> vacuum_delay_point + 55
>> 27  postgres                            0x000000010e42bb4b
>> ginInsertCleanup + 1531
>> 28  postgres                            0x000000010e42d418
>> gin_clean_pending_list + 776
>> 29  postgres                            0x000000010e74955a 
>> ExecInterpExpr + 2522
>> 30  postgres                            0x000000010e7487e2
>> ExecInterpExprStillValid + 82
>> 31  postgres                            0x000000010e7ae83b
>> ExecEvalExprSwitchContext + 59
>> 32  postgres                            0x000000010e7ae7be ExecProject 
>> + 78
>> 33  postgres                            0x000000010e7ae4e9 ExecResult 
>> + 345
>> 34  postgres                            0x000000010e764e42
>> ExecProcNodeFirst + 82
>> 35  postgres                            0x000000010e75ccb2 
>> ExecProcNode + 50
>> 36  postgres                            0x000000010e758301 ExecutePlan 
>> + 193
>> 37  postgres                            0x000000010e7581d1
>> standard_ExecutorRun + 609
>> 38  auto_explain.so                     0x000000010f1df3a7
>> explain_ExecutorRun + 247
>> 39  postgres                            0x000000010e757f3b ExecutorRun 
>> + 91
>> 40  postgres                            0x000000010ea1cb49 
>> PortalRunSelect + 313
>> 41  postgres                            0x000000010ea1c4dd PortalRun + 
>> 861
>> 42  postgres                            0x000000010ea17474
>> exec_simple_query + 1540
>> 43  postgres                            0x000000010ea164d4 
>> PostgresMain + 2580
>> 44  postgres                            0x000000010e91d159 BackendRun 
>> + 89
>> 45  postgres                            0x000000010e91c6a5 
>> BackendStartup + 565
>> 46  postgres                            0x000000010e91b3fe ServerLoop 
>> + 638
>> 47  postgres                            0x000000010e918b9d 
>> PostmasterMain + 6717
>> 48  postgres                            0x000000010e7efd43 main + 819
>> 49  libdyld.dylib                       0x00007fff6a46e3d5 start + 1
>> 50  ???                                 0x0000000000000003 0x0 + 3
>> 
>> LOG:  server process (PID 61512) was terminated by signal 6: Abort 
>> trap: 6
>> DETAIL:  Failed process was running: select
>> gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;
> 
> As far as I understand, since explaining plans can acquire heavyweight
> lock for example to get column names, when page lock is held at the
> time of the interrupt, this assertion error occurs.
> 
> The attached patch tries to avoid this by checking each LocalLock
> entry and when finding even one, giving up logging the plan.
> 
> Thoughts?

Regression tests failed on windows.
Updated patch attached.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v10-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Ekaterina Sokolova

Date:

13 October 2021, 14:28:30

Hi, hackers!

• The last version of patch is correct applied. It changes 8 files from 
/src/backend, and 9 other files.

• I have 1 error and 1 warning during compilation on Mac.

explain.c:4985:25: error: implicit declaration of function 
'GetLockMethodLocalHash' is invalid in C99 
[-Werror,-Wimplicit-function-declaration]
         hash_seq_init(&status, GetLockMethodLocalHash());
explain.c:4985:25: warning: incompatible integer to pointer conversion 
passing 'int' to parameter of type 'HTAB *' (aka 'struct HTAB *') 
[-Wint-conversion]
         hash_seq_init(&status, GetLockMethodLocalHash());

This error doesn't appear at my second machine with Ubuntu.

I found the reason. You delete #ifdef USE_ASSERT_CHECKING from 
implementation of function GetLockMethodLocalHash(void), but this ifdef 
exists around function declaration. There may be a situation, when 
implementation exists without declaration, so files with using of 
function produce errors. I create new version of patch with fix of this 
problem.

I'm agree that seeing the details of a query is a useful feature, but I 
have several doubts:

1) There are lots of changes of core's code. But not all users need this 
functionality. So adding this functionality like extension seemed more 
reasonable.

2) There are many tools available to monitor the status of a query. How 
much do we need another one? For example:
     • pg_stat_progress_* is set of views with current status of ANALYZE, 
CREATE INDEX, VACUUM, CLUSTER, COPY, Base Backup. You can find it in 
PostgreSQL documentation [1].
     • pg_query_state is contrib with 2 patches for core (I hope someday 
Community will support adding this patches to PostgreSQL). It contains 
function with printing table with pid, full query text, plan and current 
progress of every node like momentary EXPLAIN ANALYSE for SELECT, 
UPDATE, INSERT, DELETE. So it supports every flags and formats of 
EXPLAIN. You can find current version of pg_query_state on github [2]. 
Also I found old discussion about first version of it in Community [3].

3) Have you measured the overload of your feature? It would be really 
interesting to know the changes in speed and performance.

Thank you for working on this issue. I would be glad to continue to 
follow the development of this issue.

Links above:
[1] https://www.postgresql.org/docs/current/progress-reporting.html
[2] https://github.com/postgrespro/pg_query_state
[3] 
https://www.postgresql.org/message-id/dbfb1a42-ee58-88fd-8d77-550498f52bc5@postgrespro.ru

-- 
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v11-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

15 October 2021, 06:17:22

On 2021-10-13 23:28, Ekaterina Sokolova wrote:
> Hi, hackers!
> 
> • The last version of patch is correct applied. It changes 8 files
> from /src/backend, and 9 other files.
> 
> • I have 1 error and 1 warning during compilation on Mac.
> 
> explain.c:4985:25: error: implicit declaration of function
> 'GetLockMethodLocalHash' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>         hash_seq_init(&status, GetLockMethodLocalHash());
> explain.c:4985:25: warning: incompatible integer to pointer conversion
> passing 'int' to parameter of type 'HTAB *' (aka 'struct HTAB *')
> [-Wint-conversion]
>         hash_seq_init(&status, GetLockMethodLocalHash());
> 
> This error doesn't appear at my second machine with Ubuntu.
> 
> I found the reason. You delete #ifdef USE_ASSERT_CHECKING from
> implementation of function GetLockMethodLocalHash(void), but this
> ifdef exists around function declaration. There may be a situation,
> when implementation exists without declaration, so files with using of
> function produce errors. I create new version of patch with fix of
> this problem.

Thanks for fixing that!

> I'm agree that seeing the details of a query is a useful feature, but
> I have several doubts:
> 
> 1) There are lots of changes of core's code. But not all users need
> this functionality. So adding this functionality like extension seemed
> more reasonable.

It would be good if we can implement this feature in an extension, but 
as pg_query_state extension needs applying patches to PostgreSQL, I 
think this kind of feature needs PostgreSQL core modification.
IMHO extensions which need core modification are not easy to use in 
production environments..

> 2) There are many tools available to monitor the status of a query.
> How much do we need another one? For example:
>     • pg_stat_progress_* is set of views with current status of
> ANALYZE, CREATE INDEX, VACUUM, CLUSTER, COPY, Base Backup. You can
> find it in PostgreSQL documentation [1].
>     • pg_query_state is contrib with 2 patches for core (I hope
> someday Community will support adding this patches to PostgreSQL). It
> contains function with printing table with pid, full query text, plan
> and current progress of every node like momentary EXPLAIN ANALYSE for
> SELECT, UPDATE, INSERT, DELETE. So it supports every flags and formats
> of EXPLAIN. You can find current version of pg_query_state on github
> [2]. Also I found old discussion about first version of it in
> Community [3].

Thanks for introducing the extension!

I only took a quick look at pg_query_state, I have some questions.

pg_query_state seems using shm_mq to expose the plan information, but 
there was a discussion that this kind of architecture would be tricky to 
do properly [1].
Does pg_query_state handle difficulties listed on the discussion?

It seems the caller of the pg_query_state() has to wait until the target 
process pushes the plan information into shared memory, can it lead to 
deadlock situations?
I came up with this question because when trying to make a view for 
memory contexts of other backends, we encountered deadlock situations. 
After all, we gave up view design and adopted sending signal and 
logging.

Some of the comments of [3] seem useful for my patch, I'm going to 
consider them. Thanks!

> 3) Have you measured the overload of your feature? It would be really
> interesting to know the changes in speed and performance.

I haven't measured it yet, but I believe that the overhead for backends 
which are not called pg_log_current_plan() would be slight since the 
patch just adds the logic for saving QueryDesc on ExecutorRun().
The overhead for backends which is called pg_log_current_plan() might 
not slight, but since the target process are assumed dealing with 
long-running query and the user want to know its plan, its overhead 
would be worth the cost.

> Thank you for working on this issue. I would be glad to continue to
> follow the development of this issue.

Thanks for your help!

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

15 October 2021, 10:12:26

On 2021-10-15 15:17, torikoshia wrote:
> I only took a quick look at pg_query_state, I have some questions.
> 
> pg_query_state seems using shm_mq to expose the plan information, but
> there was a discussion that this kind of architecture would be tricky
> to do properly [1].
> Does pg_query_state handle difficulties listed on the discussion?

Sorry, I forgot to add the URL.
[1] 
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

> It seems the caller of the pg_query_state() has to wait until the
> target process pushes the plan information into shared memory, can it
> lead to deadlock situations?
> I came up with this question because when trying to make a view for
> memory contexts of other backends, we encountered deadlock situations.
> After all, we gave up view design and adopted sending signal and
> logging.

Discussion at the following URL.
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Ekaterina Sokolova

Date:

02 November 2021, 11:32:59

Hi!

I'm here to answer your questions about contrib/pg_query_state.
>> I only took a quick look at pg_query_state, I have some questions.

>> pg_query_state seems using shm_mq to expose the plan information, but
>> there was a discussion that this kind of architecture would be tricky
>> to do properly [1].
>> Does pg_query_state handle difficulties listed on the discussion?
> [1] 
> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

I doubt that it was the right link.
But on the topic I will say that extension really use shared memory, 
interaction is implemented by sending / receiving messages. This 
architecture provides the required reliability and convenience.

>> It seems the caller of the pg_query_state() has to wait until the
>> target process pushes the plan information into shared memory, can it
>> lead to deadlock situations?
>> I came up with this question because when trying to make a view for
>> memory contexts of other backends, we encountered deadlock situations.
>> After all, we gave up view design and adopted sending signal and
>> logging.
> 
> Discussion at the following URL.
> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

Before extracting information about side process we check its state. 
Information will only be retrieved for a process willing to provide it. 
Otherwise, we will receive an error message about impossibility of 
getting query execution statistics + process status. Also checking fact 
of extracting your own status exists. This is even verified in tests.

Thanks for your attention.
Just in case, I am ready to discuss this topic in more detail.

About overhead:
> I haven't measured it yet, but I believe that the overhead for backends
> which are not called pg_log_current_plan() would be slight since the
> patch just adds the logic for saving QueryDesc on ExecutorRun().
> The overhead for backends which is called pg_log_current_plan() might
> not slight, but since the target process are assumed dealing with
> long-running query and the user want to know its plan, its overhead
> would be worth the cost.
I think it would be useful for us to have couple of examples with a 
different number of rows compared to using without this functionality.

Hope this helps.

-- 
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: RFC: Logging plan of the running query

From

Daniel Gustafsson

Date:

03 November 2021, 12:48:51

This patch no longer applies on top of HEAD, please submit a rebased version.

--
Daniel Gustafsson        https://vmware.com/

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

04 November 2021, 12:49:21

On 2021-11-02 20:32, Ekaterina Sokolova wrote:
Thanks for your response!

> Hi!
> 
> I'm here to answer your questions about contrib/pg_query_state.
>>> I only took a quick look at pg_query_state, I have some questions.
> 
>>> pg_query_state seems using shm_mq to expose the plan information, but
>>> there was a discussion that this kind of architecture would be tricky
>>> to do properly [1].
>>> Does pg_query_state handle difficulties listed on the discussion?
>> [1] 
>> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com
> 
> I doubt that it was the right link.

Sorry for make you confused, here is the link.

   
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

> But on the topic I will say that extension really use shared memory,
> interaction is implemented by sending / receiving messages. This
> architecture provides the required reliability and convenience.

As described in the link, using shared memory for this kind of work 
would need DSM and It'd be also necessary to exchange information 
between requestor and responder.

For example, when I looked at a little bit of pg_query_state code, it 
looks like the size of the queue is fixed at QUEUE_SIZE, and I wonder 
how plans that exceed QUEUE_SIZE are handled.

>>> It seems the caller of the pg_query_state() has to wait until the
>>> target process pushes the plan information into shared memory, can it
>>> lead to deadlock situations?
>>> I came up with this question because when trying to make a view for
>>> memory contexts of other backends, we encountered deadlock 
>>> situations.
>>> After all, we gave up view design and adopted sending signal and
>>> logging.
>> 
>> Discussion at the following URL.
>> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com
> 
> Before extracting information about side process we check its state.
> Information will only be retrieved for a process willing to provide
> it. Otherwise, we will receive an error message about impossibility of
> getting query execution statistics + process status. Also checking
> fact of extracting your own status exists. This is even verified in
> tests.
> 
> Thanks for your attention.
> Just in case, I am ready to discuss this topic in more detail.

I imagined the following procedure.
Does it cause dead lock in pg_query_state?

- session1
BEGIN; TRUNCATE t;

- session2
BEGIN; TRUNCATE t; -- wait

- session1
SELECT * FROM pg_query_state(<pid of session>); -- wait and dead locked?

> About overhead:
>> I haven't measured it yet, but I believe that the overhead for 
>> backends
>> which are not called pg_log_current_plan() would be slight since the
>> patch just adds the logic for saving QueryDesc on ExecutorRun().
>> The overhead for backends which is called pg_log_current_plan() might
>> not slight, but since the target process are assumed dealing with
>> long-running query and the user want to know its plan, its overhead
>> would be worth the cost.
> I think it would be useful for us to have couple of examples with a
> different number of rows compared to using without this functionality.

Do you have any expectaion that the number of rows would affect the 
performance of this functionality?
This patch adds some codes to ExecutorRun(), but I thought the number of 
rows would not give impact on the performance.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Justin Pryzby

Date:

12 November 2021, 18:37:09

On Wed, Oct 13, 2021 at 05:28:30PM +0300, Ekaterina Sokolova wrote:
> Hi, hackers!
> 
>     • pg_query_state is contrib with 2 patches for core (I hope someday
> Community will support adding this patches to PostgreSQL). It contains

I reviewed this version of the patch - I have some language fixes.

I didn't know about pg_query_state, thanks.

> To improve this situation, this patch adds
> pg_log_current_query_plan() function that requests to log the
> plan of the specified backend process.

To me, "current plan" seems to mean "plan of *this* backend" (which makes no
sense to log).  I think the user-facing function could be called
pg_log_query_plan().  It's true that the implementation is a request to another
backend to log its *own* query plan - but users shouldn't need to know about
the implementation.

> +        Only superusers can request to log plan of the running query.

.. log the plan of a running query.

> +    Note that nested statements (statements executed inside a function) are not
> +    considered for logging. Only the deepest nesting query's plan is logged.

Only the plan of the most deeply nested query is logged.

> +                (errmsg("backend with PID %d is not running a query",
> +                    MyProcPid)));

The extra parens around errmsg() are not needed since e3a87b499.

> +                (errmsg("backend with PID %d is now holding a page lock. Try again",

remove "now"

> +            (errmsg("plan of the query running on backend with PID %d is:\n%s",
> +                    MyProcPid, es->str->data),

Maybe this should say "query plan running on backend with PID 17793 is:"

> + * would cause lots of log messages and which can lead to denial of

remove "and"

> +                 errmsg("must be a superuser to log information about specified process")));

I think it should say not say "specified", since that sounds like the user
might have access to log information about some other processes:
| must be a superuser to log information about processes

> +
> +    proc = BackendPidGetProc(pid);
> +
> +    /*
> +     * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> +     * we reach kill(), a process for which we get a valid proc here might
> +     * have terminated on its own.  There's no way to acquire a lock on an
> +     * arbitrary process to prevent that. But since this mechanism is usually
> +     * used to below purposes, it might end its own first and the information

used for below purposes,

-- 
Justin

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

13 November 2021, 13:29:59

On Wed, Oct 13, 2021 at 7:58 PM Ekaterina Sokolova
<e.sokolova@postgrespro.ru> wrote:
> Thank you for working on this issue. I would be glad to continue to
> follow the development of this issue.

Thanks for the patch. I'm not sure if v11 is the latest patch, if yes,
I have the following comments:

1) Firstly, v11 patch isn't getting applied on the master -
http://cfbot.cputube.org/patch_35_3142.log.

2) I think we are moving away from if (!superuser()) checks, see the
commit [1]. The goal is to let the GRANT-REVOKE system deal with who
is supposed to run these system functions. Since
pg_log_current_query_plan also writes the info to server logs, I think
it should do the same thing as commit [1] did for
pg_log_backend_memory_contexts.

With v11, you are re-introducing the superuser() check in the
pg_log_backend_memory_contexts which is wrong.

3) I think SendProcSignalForLogInfo can be more generic, meaning, it
can also send signal to auxiliary processes if asked to do this will
simplify the things for pg_log_backend_memory_contexts and other
patches like pg_print_backtrace. I would imagine it to be "bool
SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool
signal_aux_proc);".

[1] commit f0b051e322d530a340e62f2ae16d99acdbcb3d05
Author: Jeff Davis <jdavis@postgresql.org>
Date:   Tue Oct 26 13:13:52 2021 -0700

    Allow GRANT on pg_log_backend_memory_contexts().

    Remove superuser check, allowing any user granted permissions on
    pg_log_backend_memory_contexts() to log the memory contexts of any
    backend.

    Note that this could allow a privileged non-superuser to log the
    memory contexts of a superuser backend, but as discussed, that does
    not seem to be a problem.

    Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier,
Kyotaro Horiguchi, Andres Freund
    Discussion:
https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com

Regards,
Bharath Rupireddy.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

15 November 2021, 12:59:15

On 2021-11-13 22:29, Bharath Rupireddy wrote:
Thanks for your review!

> On Wed, Oct 13, 2021 at 7:58 PM Ekaterina Sokolova
> <e.sokolova@postgrespro.ru> wrote:
>> Thank you for working on this issue. I would be glad to continue to
>> follow the development of this issue.
> 
> Thanks for the patch. I'm not sure if v11 is the latest patch, if yes,
> I have the following comments:
> 
> 1) Firstly, v11 patch isn't getting applied on the master -
> http://cfbot.cputube.org/patch_35_3142.log.
Updated the patch.

> 2) I think we are moving away from if (!superuser()) checks, see the
> commit [1]. The goal is to let the GRANT-REVOKE system deal with who
> is supposed to run these system functions. Since
> pg_log_current_query_plan also writes the info to server logs, I think
> it should do the same thing as commit [1] did for
> pg_log_backend_memory_contexts.
> 
> With v11, you are re-introducing the superuser() check in the
> pg_log_backend_memory_contexts which is wrong.

Yeah, I removed superuser() check and make it possible to be executed by 
non-superusers when users are granted to do so.
> 
> 3) I think SendProcSignalForLogInfo can be more generic, meaning, it
> can also send signal to auxiliary processes if asked to do this will
> simplify the things for pg_log_backend_memory_contexts and other
> patches like pg_print_backtrace. I would imagine it to be "bool
> SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool
> signal_aux_proc);".

I agree with your idea.
Since sending signals to auxiliary processes to dump memory contexts and 
pg_print_backtrace is still under discussion, IMHO it would be better to 
refactor SendProcSignalForLogInfo after these patches are commited.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v12-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

15 November 2021, 14:00:12

On 2021-11-13 03:37, Justin Pryzby wrote:

> I reviewed this version of the patch - I have some language fixes.

Thanks for your review!
Attached patch that reflects your comments.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v13-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Bharath Rupireddy

Date:

15 November 2021, 14:15:49

On Mon, Nov 15, 2021 at 6:29 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > 3) I think SendProcSignalForLogInfo can be more generic, meaning, it
> > can also send signal to auxiliary processes if asked to do this will
> > simplify the things for pg_log_backend_memory_contexts and other
> > patches like pg_print_backtrace. I would imagine it to be "bool
> > SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool
> > signal_aux_proc);".
>
> I agree with your idea.
> Since sending signals to auxiliary processes to dump memory contexts and
> pg_print_backtrace is still under discussion, IMHO it would be better to
> refactor SendProcSignalForLogInfo after these patches are commited.

+1.

I have another comment: isn't it a good idea that an overloaded
version of the new function pg_log_query_plan can take the available
explain command options as a text argument? I'm not sure if it is
possible to get the stats like buffers, costs etc of a running query,
if yes, something like pg_log_query_plan(pid, 'buffers',
'costs'....);? It looks to be an overkill at first sight, but these
can be useful to know more detailed plan of the query.

Thoughts?

Regards,
Bharath Rupireddy.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

16 November 2021, 11:48:27

On 2021-11-15 23:15, Bharath Rupireddy wrote:

> I have another comment: isn't it a good idea that an overloaded
> version of the new function pg_log_query_plan can take the available
> explain command options as a text argument? I'm not sure if it is
> possible to get the stats like buffers, costs etc of a running query,
> if yes, something like pg_log_query_plan(pid, 'buffers',
> 'costs'....);? It looks to be an overkill at first sight, but these
> can be useful to know more detailed plan of the query.

I also think the overloaded version would be useful.
However as discussed in [1], it seems to introduce other difficulties.
I think it would be enough that the first version of pg_log_query_plan 
doesn't take any parameters.

[1] 
https://www.postgresql.org/message-id/ce86e4f72f09d5497e8ad3a162861d33%40oss.nttdata.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Ekaterina Sokolova

Date:

17 November 2021, 13:44:51

Hi!

You forgot my last fix to build correctly on Mac. I have added it.

About our discussion of pg_query_state:

torikoshia писал 2021-11-04 15:49:
>> I doubt that it was the right link.
> Sorry for make you confused, here is the link.
> https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

Thank you. I'll see it soon.

> I imagined the following procedure.
> Does it cause dead lock in pg_query_state?
> 
> - session1
> BEGIN; TRUNCATE t;
> 
> - session2
> BEGIN; TRUNCATE t; -- wait
> 
> - session1
> SELECT * FROM pg_query_state(<pid of session>); -- wait and dead 
> locked?

As I know, pg_query_state use non-blocking read and write. I have wrote 
few tests trying to deadlock it (on 14 version), but all finished 
correctly.

Have a nice day. Please feel free to contact me if you need any further 
information.

-- 
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v13-0002-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

26 November 2021, 03:39:48

On 2021-11-17 22:44, Ekaterina Sokolova wrote:
> Hi!
> 
> You forgot my last fix to build correctly on Mac. I have added it.

Thanks for the notification!
Since the patch could not be applied to the HEAD anymore, I also updated 
it.

> 
> About our discussion of pg_query_state:
> 
> torikoshia писал 2021-11-04 15:49:
>>> I doubt that it was the right link.
>> Sorry for make you confused, here is the link.
>> https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com
> 
> Thank you. I'll see it soon.
> 
>> I imagined the following procedure.
>> Does it cause dead lock in pg_query_state?
>> 
>> - session1
>> BEGIN; TRUNCATE t;
>> 
>> - session2
>> BEGIN; TRUNCATE t; -- wait
>> 
>> - session1
>> SELECT * FROM pg_query_state(<pid of session>); -- wait and dead 
>> locked?
> 
> As I know, pg_query_state use non-blocking read and write. I have
> wrote few tests trying to deadlock it (on 14 version), but all
> finished correctly.
> 
> Have a nice day. Please feel free to contact me if you need any
> further information.

Thanks for your information and help!

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v14-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

07 January 2022, 05:30:37

On 2021-11-26 12:39, torikoshia wrote:
> Since the patch could not be applied to the HEAD anymore, I also 
> updated it.

Updated the patch for fixing compiler warning about the format on 
windows.


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v15-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

07 January 2022, 11:58:19

On 2022-01-07 14:30, torikoshia wrote:

> Updated the patch for fixing compiler warning about the format on 
> windows.

I got another compiler warning, updated the patch again.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v16-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

07 January 2022, 12:54:31


On 2022/01/07 20:58, torikoshia wrote:
> On 2022-01-07 14:30, torikoshia wrote:
> 
>> Updated the patch for fixing compiler warning about the format on windows.
> 
> I got another compiler warning, updated the patch again.

Thanks for updating the patch!

I ran the following query every 0.1s by using \watch psql command from three different sessions while make installcheck
testwas running.
 

     SELECT pg_log_query_plan(pid) FROM pg_stat_activity;


And then I got the segmentation fault as follows.

2022-01-07 21:40:32 JST [postmaster] LOG:  server process (PID 51017) was terminated by signal 11: Segmentation fault:
11
2022-01-07 21:40:32 JST [postmaster] DETAIL:  Failed process was running: select description, (test_conv(inbytes,
'utf8','utf8')).* from utf8_verification_inputs;
 
2022-01-07 21:40:32 JST [postmaster] LOG:  terminating any other active server processes
2022-01-07 21:40:32 JST [postmaster] LOG:  all server processes terminated; reinitializing


The backtrace I got from the core file was:

(lldb) target create --core "/cores/core.51017"
Core file '/cores/core.51017' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
   * frame #0: 0x00007fff20484552 libsystem_platform.dylib`_platform_strlen + 18
     frame #1: 0x000000011076e36c postgres`dopr(target=0x00007ffedfd1b1d8, format="\n", args=0x00007ffedfd1b450) at
snprintf.c:444:20
     frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query Text:
\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f"...,
count=1024,fmt="%s: %s\n", args=0x00007ffedfd1b450) at snprintf.c:195:2
 
     frame #3: 0x000000011075e51d postgres`pvsnprintf(buf="Query Text:
\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f"...,
len=1024,fmt="%s: %s\n", args=0x00007ffedfd1b450) at psprintf.c:110:13
 
     frame #4: 0x000000011076052d postgres`appendStringInfoVA(str=0x00007fe56804bec0, fmt="%s: %s\n",
args=0x00007ffedfd1b450)at stringinfo.c:149:13
 
     frame #5: 0x0000000110760449 postgres`appendStringInfo(str=0x00007fe56804bec0, fmt="%s: %s\n") at
stringinfo.c:103:12
     frame #6: 0x000000011016f6cf postgres`ExplainProperty(qlabel="Query Text", unit=0x0000000000000000, value="",
numeric=false,es=0x00007fe56804be28) at explain.c:4390:5
 
     frame #7: 0x000000011016eace postgres`ExplainPropertyText(qlabel="Query Text", value="", es=0x00007fe56804be28) at
explain.c:4435:2
     frame #8: 0x000000011016ea97 postgres`ExplainQueryText(es=0x00007fe56804be28, queryDesc=0x00007fe568022320) at
explain.c:969:3
     frame #9: 0x000000011016faeb postgres`ProcessLogCurrentPlanInterrupt at explain.c:5009:2
     frame #10: 0x00000001104dda84 postgres`ProcessInterrupts at postgres.c:3373:3
     frame #11: 0x0000000110c81c57 plpgsql.so`exec_stmts(estate=0x00007ffedfd1bab8, stmts=0x00007fe568050e00) at
pl_exec.c:1994:3
     frame #12: 0x0000000110c80a78 plpgsql.so`exec_stmt_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c570) at
pl_exec.c:1777:9
     frame #13: 0x0000000110c81c91 plpgsql.so`exec_stmts(estate=0x00007ffedfd1bab8, stmts=0x00007fe56801c5c8) at
pl_exec.c:1999:10
     frame #14: 0x0000000110c80f21 plpgsql.so`exec_stmt_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c678) at
pl_exec.c:1926:8
     frame #15: 0x0000000110c7ed92 plpgsql.so`exec_toplevel_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c678)
atpl_exec.c:1617:7
 
     frame #16: 0x0000000110c7d3e4 plpgsql.so`plpgsql_exec_function(func=0x00007fe56a00aa20, fcinfo=0x00007fe56803c888,
simple_eval_estate=0x0000000000000000,simple_eval_resowner=0x0000000000000000, procedure_resowner=0x0000000000000000,
atomic=true)at pl_exec.c:611:7
 
     frame #17: 0x0000000110c9bf12 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fe56803c888) at pl_handler.c:277:13
     frame #18: 0x0000000110228b06 postgres`ExecInterpExpr(state=0x00007fe56803bf88, econtext=0x00007fe56803ba70,
isnull=0x00007ffedfd1bf7f)at execExprInterp.c:725:8
 
     frame #19: 0x000000011024858b postgres`ExecEvalExprSwitchContext(state=0x00007fe56803bf88,
econtext=0x00007fe56803ba70,isNull=0x00007ffedfd1bf7f) at executor.h:339:13
 
     frame #20: 0x00000001102482aa postgres`ExecProject(projInfo=0x00007fe56803bf80) at executor.h:373:9
     frame #21: 0x0000000110247e2f postgres`ExecScan(node=0x00007fe56803b958, accessMtd=(postgres`SeqNext at
nodeSeqscan.c:51),recheckMtd=(postgres`SeqRecheck at nodeSeqscan.c:90)) at execScan.c:238:12
 
     frame #22: 0x000000011028c575 postgres`ExecSeqScan(pstate=0x00007fe56803b958) at nodeSeqscan.c:112:9
     frame #23: 0x000000011023bbe2 postgres`ExecProcNode(node=0x00007fe56803b958) at executor.h:257:9
     frame #24: 0x0000000110237471 postgres`ExecutePlan(estate=0x00007fe56803b720, planstate=0x00007fe56803b958,
use_parallel_mode=false,operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection,
dest=0x00007fe568046ac8,execute_once=true) at execMain.c:1561:10
 
     frame #25: 0x0000000110237320 postgres`standard_ExecutorRun(queryDesc=0x00007fe568037320,
direction=ForwardScanDirection,count=0, execute_once=true) at execMain.c:371:3
 
     frame #26: 0x00000001102370b8 postgres`ExecutorRun(queryDesc=0x00007fe568037320, direction=ForwardScanDirection,
count=0,execute_once=true) at execMain.c:313:3
 
     frame #27: 0x00000001104e6bdb postgres`PortalRunSelect(portal=0x00007fe567825920, forward=true, count=0,
dest=0x00007fe568046ac8)at pquery.c:921:4
 
     frame #28: 0x00000001104e65ba postgres`PortalRun(portal=0x00007fe567825920, count=9223372036854775807,
isTopLevel=true,run_once=true, dest=0x00007fe568046ac8, altdest=0x00007fe568046ac8, qc=0x00007ffedfd1c350) at
pquery.c:765:18
     frame #29: 0x00000001104e1861 postgres`exec_simple_query(query_string="select description, (test_conv(inbytes,
'utf8','utf8')).* from utf8_verification_inputs;") at postgres.c:1216:10
 
     frame #30: 0x00000001104e099c postgres`PostgresMain(dbname="regression", username="postgres") at
postgres.c:4505:7
     frame #31: 0x00000001103ee2a2 postgres`BackendRun(port=0x00007fe567704080) at postmaster.c:4594:2
     frame #32: 0x00000001103ed868 postgres`BackendStartup(port=0x00007fe567704080) at postmaster.c:4322:3
     frame #33: 0x00000001103ec64c postgres`ServerLoop at postmaster.c:1802:7
     frame #34: 0x00000001103e9e29 postgres`PostmasterMain(argc=3, argv=0x00007fe567405e70) at postmaster.c:1474:11
     frame #35: 0x00000001102c8b69 postgres`main(argc=3, argv=0x00007fe567405e70) at main.c:198:3
     frame #36: 0x00007fff2045cf3d libdyld.dylib`start + 1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

24 January 2022, 05:33:05

On 2022-01-14 15:38, Julien Rouhaud wrote:
> Hi,
> 
> On Fri, Jan 07, 2022 at 09:54:31PM +0900, Fujii Masao wrote:

>> I ran the following query every 0.1s by using \watch psql command from 
>> three different sessions while make installcheck test was running.
>> 
>>     SELECT pg_log_query_plan(pid) FROM pg_stat_activity;
>> 
>> 
>> And then I got the segmentation fault as follows.
>> 
>> 2022-01-07 21:40:32 JST [postmaster] LOG:  server process (PID 51017) 
>> was terminated by signal 11: Segmentation fault: 11
>> 2022-01-07 21:40:32 JST [postmaster] DETAIL:  Failed process was 
>> running: select description, (test_conv(inbytes, 'utf8', 'utf8')).* 
>> from utf8_verification_inputs;
>> 2022-01-07 21:40:32 JST [postmaster] LOG:  terminating any other 
>> active server processes
>> 2022-01-07 21:40:32 JST [postmaster] LOG:  all server processes 
>> terminated; reinitializing
>> 
>> 
>> The backtrace I got from the core file was:
>> 
>> (lldb) target create --core "/cores/core.51017"
>> Core file '/cores/core.51017' (x86_64) was loaded.
>> (lldb) bt
>> * thread #1, stop reason = signal SIGSTOP
>>   * frame #0: 0x00007fff20484552 
>> libsystem_platform.dylib`_platform_strlen + 18
>>     frame #1: 0x000000011076e36c 
>> postgres`dopr(target=0x00007ffedfd1b1d8, format="\n", 
>> args=0x00007ffedfd1b450) at snprintf.c:444:20
>>     frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query 
>> Text: 
>>

\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x

7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f

\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x

7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f
\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f"..., 
>> count=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at 
>> snprintf.c:195:2

Thanks for testing!
I could reproduce the situation and confirmed almost the same backtrace.

The cause seems that the previous patch didn't do cleanup when the 
transactions were aborted. It leaded segmentation faults when 
referencing ActiveQueryDesc-> after the transaction were aborted.

The attached patch added cleanups when transactions and subtransactions 
are aborted.

BTW, the latest patch for pg_log_backtrace() added a new function for 
deduplicating codes for pg_log_*()[1].
It seems better to use it for pg_log_query_plan() after the patch is 
merged.


[1] 
https://www.postgresql.org/message-id/flat/20211115194251.GP17618%40telsasoft.com#4e681d04c78867b2cce44b3cb1791bf5
  duplicate code to a new function
CheckPostgresProcessId which can be used by pg_log_backtrace,

> On top of the visibly use-after-pfree memory issue, the patch doesn't 
> apply

Thanks for letting me know!
As mentioned above, I updated the patch.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v17-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

27 January 2022, 11:18:37


On 2022/01/24 14:33, torikoshia wrote:
> As mentioned above, I updated the patch.

Thanks for updating the patch!

Here are another review comments:

+LOG:  plan of the query running on backend with PID 17793 is:

This seems not the same as what actually logged.

+        ereport(WARNING,
+                (errmsg("PID %d is not a PostgreSQL server process", pid)));

Like commit 7fa945b857 changed, this warning message should be "PID %d is not a PostgreSQL backend process"?

+    if (SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId) < 0)

proc->backendId should be specified instead of InvalidBackendId, to speed up the processing in SendProcSignal()?

+    PROCSIG_LOG_CURRENT_PLAN, /* ask backend to log plan of the current query */
+volatile sig_atomic_t LogCurrentPlanPending = false;
+extern void HandleLogCurrentPlanInterrupt(void);
+extern void ProcessLogCurrentPlanInterrupt(void);

Isn't it better to use the names that are more consistent with the function name, i.e., pg_log_query_plan? For example,
PROCSIG_LOG_QUERY_PLANinstead of PROCSIG_LOG_CURRENT_PLAN?
 

+        ereport(LOG_SERVER_ONLY,
+                errmsg("backend with PID %d is not running a query",
+                    MyProcPid));

errhidestmt(true) and errhidecontext(true) need to be added, don't they? Otherwise, for example, if pg_log_query_plan()
isexecuted after debug_query_string is set but before ActiveQueryDesc is set, STATEMENT message would be output even
thoughthe message saying "not running a query" is output. Which seems confusing.
 

+    hash_seq_init(&status, GetLockMethodLocalHash());
+    while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
+    {
+        if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)

Why did you use the search for local lock hash instead of IsPageLockHeld flag variable, to check whether a page lock is
heldor not? Because there is the corner case where the interrupt is processed after the local lock is registered into
thehash but before IsPageLockHeld is enabled?
 

There is the case where the request to log a query plan is skipped even while the target backend is running a query. If
thishappens, users can just retry pg_log_query_plan(). These things should be documented?
 

+            ereport(LOG_SERVER_ONLY,
+                errmsg("backend with PID %d is holding a page lock. Try again",
+                    MyProcPid));

It seems more proper to output this message in DETAIL or HINT, instead. So how about something like the following
messages?

LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few ....

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

28 January 2022, 08:45:49

On 2022-01-27 20:18, Fujii Masao wrote:
> Here are another review comments:

Thanks for reviewing!

> +LOG:  plan of the query running on backend with PID 17793 is:
> 
> This seems not the same as what actually logged.

Modified.

> +        ereport(WARNING,
> +                (errmsg("PID %d is not a PostgreSQL server process", pid)));
> 
> Like commit 7fa945b857 changed, this warning message should be "PID %d
> is not a PostgreSQL backend process"?

Modified.

> +    if (SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId) < 
> 0)
> 
> proc->backendId should be specified instead of InvalidBackendId, to
> speed up the processing in SendProcSignal()?

Agreed. Modified.

> +    PROCSIG_LOG_CURRENT_PLAN, /* ask backend to log plan of the current 
> query */
> +volatile sig_atomic_t LogCurrentPlanPending = false;
> +extern void HandleLogCurrentPlanInterrupt(void);
> +extern void ProcessLogCurrentPlanInterrupt(void);
> 
> Isn't it better to use the names that are more consistent with the
> function name, i.e., pg_log_query_plan? For example,
> PROCSIG_LOG_QUERY_PLAN instead of PROCSIG_LOG_CURRENT_PLAN?

Agreed.
I removed 'current' from the variable and function names and used 
'query' instead.

> +        ereport(LOG_SERVER_ONLY,
> +                errmsg("backend with PID %d is not running a query",
> +                    MyProcPid));
> 
> errhidestmt(true) and errhidecontext(true) need to be added, don't
> they? Otherwise, for example, if pg_log_query_plan() is executed after
> debug_query_string is set but before ActiveQueryDesc is set, STATEMENT
> message would be output even though the message saying "not running a
> query" is output. Which seems confusing.

Agreed. Added errhidestmt(true) and errhidecontext(true).

> +    hash_seq_init(&status, GetLockMethodLocalHash());
> +    while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
> +    {
> +        if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
> 
> Why did you use the search for local lock hash instead of
> IsPageLockHeld flag variable, to check whether a page lock is held or
> not? Because there is the corner case where the interrupt is processed
> after the local lock is registered into the hash but before
> IsPageLockHeld is enabled?

As far as I read CheckAndSetLockHeld(), IsPageLockHeld can be used only 
when USE_ASSERT_CHECKING is enabled.
Since removing USE_ASSERT_CHECKING from CheckAndSetLockHeld() would give 
performance impact on every granting/removing local lock, I used the 
search for local local hash.

> There is the case where the request to log a query plan is skipped
> even while the target backend is running a query. If this happens,
> users can just retry pg_log_query_plan(). These things should be
> documented?

Agreed.
Added following:

   +        Note that there is the case where the request to log a query
   +        plan is skipped even while the target backend is running a
   +        query due to lock conflict avoidance.
   +        If this happens, users can just retry pg_log_query_plan().

|

> +            ereport(LOG_SERVER_ONLY,
> +                errmsg("backend with PID %d is holding a page lock. Try again",
> +                    MyProcPid));
> 
> It seems more proper to output this message in DETAIL or HINT,
> instead. So how about something like the following messages?
> 
> LOG: could not log the query plan
> DETAIL: query plan cannot be logged while page level lock is being held
> HINT: Try pg_log_query_plan() after a few ....

Agreed.
I felt the HINT message 'after a few ...' is difficult to describe, and 
wrote as below.

| HINT: Retrying pg_log_query_plan() might succeed since the lock 
duration of page level locks are usually short

How do you think?

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v18-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

31 January 2022, 16:51:11

On 2022/01/28 17:45, torikoshia wrote:
>> There is the case where the request to log a query plan is skipped
>> even while the target backend is running a query. If this happens,
>> users can just retry pg_log_query_plan(). These things should be
>> documented?
> 
> Agreed.
> Added following:
> 
>    +        Note that there is the case where the request to log a query
>    +        plan is skipped even while the target backend is running a
>    +        query due to lock conflict avoidance.
>    +        If this happens, users can just retry pg_log_query_plan().

This may cause users to misunderstand that pg_log_query_plan() fails while the target backend is holding *any* locks?
Isn'tit better to mention "page-level locks", instead? So how about the following?

--------------------------
Note that the request to log the query plan will be ignored if it's received during a short period while the target
backendis holding a page-level lock.

--------------------------

>> +            ereport(LOG_SERVER_ONLY,
>> +                errmsg("backend with PID %d is holding a page lock. Try again",
>> +                    MyProcPid));
>>
>> It seems more proper to output this message in DETAIL or HINT,
>> instead. So how about something like the following messages?
>>
>> LOG: could not log the query plan
>> DETAIL: query plan cannot be logged while page level lock is being held
>> HINT: Try pg_log_query_plan() after a few ....
> 
> Agreed.
> I felt the HINT message 'after a few ...' is difficult to describe, and wrote as below.
> 
> | HINT: Retrying pg_log_query_plan() might succeed since the lock duration of page level locks are usually short
> 
> How do you think?

Or we don't need HINT message?

+                errmsg("could not log the query plan"),
+                errdetail("query plan cannot be logged while page level lock is being held"),

In detail message, the first word of sentences should be capitalized. How about "Cannot log the query plan while
holdingpage-level lock.", instead?

Thanks for updating the patch! Here are some review comments.

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_log_query_plan</primary>
+        </indexterm>

This entry is placed before one for pg_log_backend_memory_contexts(). But it should be *after* that since those entries
seemto be placed in alphabetical order in the table?

+        Requests to log the plan of the query currently running on the
+        backend with specified process ID along with the untruncated
+        query string.

Other descriptions about logging of query string seem not to mention something like "untruncated query string". For
example,auto_explain, log_statement, etc. Why do we need to mention "along with the untruncated query string" specially
forpg_log_query_plan()?

+    Note that nested statements (statements executed inside a function) are not
+    considered for logging. Only the plan of the most deeply nested query is logged.

Now the plan of even nested statement can be logged. So this description needs to be updated?

@@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)

      MemoryContextSwitchTo(oldcontext);

+    ActiveQueryDesc = NULL;

ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be reset to NULL in standard_ExecutorFinish()?

Currently even during ProcessLogQueryPlanInterrupt(), pg_log_query_plan() can be call and another
ProcessLogQueryPlanInterrupt()can be executed. So repeatable re-entrances to ProcessLogQueryPlanInterrupt() could cause
"stackdepth limit exceeded" error. To avoid this, shouldn't we make ProcessLogQueryPlanInterrupt() do nothing and
returnimmediately, if it's called during another ProcessLogQueryPlanInterrupt()?

pg_log_backend_memory_contexts() also might have the same issue.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Kyotaro Horiguchi

Date:

01 February 2022, 08:27:57

At Tue, 1 Feb 2022 01:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>
>
> On 2022/01/28 17:45, torikoshia wrote:
> >> There is the case where the request to log a query plan is skipped
> >> even while the target backend is running a query. If this happens,
> >> users can just retry pg_log_query_plan(). These things should be
> >> documented?
> > Agreed.
> > Added following:
> >    +        Note that there is the case where the request to log a
> >  query
> >    +        plan is skipped even while the target backend is running a
> >    +        query due to lock conflict avoidance.
> >    +        If this happens, users can just retry pg_log_query_plan().
>
> This may cause users to misunderstand that pg_log_query_plan() fails
> while the target backend is holding *any* locks? Isn't it better to
> mention "page-level locks", instead? So how about the following?
>
> --------------------------
> Note that the request to log the query plan will be ignored if it's
> received during a short period while the target backend is holding a
> page-level lock.
> --------------------------
>
>
> >> +            ereport(LOG_SERVER_ONLY,
> >> +                errmsg("backend with PID %d is holding a page
> >> lock. Try again",
> >> +                    MyProcPid));
> >>
> >> It seems more proper to output this message in DETAIL or HINT,
> >> instead. So how about something like the following messages?
> >>
> >> LOG: could not log the query plan
> >> DETAIL: query plan cannot be logged while page level lock is being
> >> held
> >> HINT: Try pg_log_query_plan() after a few ....
> > Agreed.
> > I felt the HINT message 'after a few ...' is difficult to describe,
> > and wrote as below.
> > | HINT: Retrying pg_log_query_plan() might succeed since the lock
> > | duration of page level locks are usually short
> > How do you think?
>
> Or we don't need HINT message?
>
>
> +                errmsg("could not log the query plan"),
> + errdetail("query plan cannot be logged while page level lock is
> being held"),
>
> In detail message, the first word of sentences should be
> capitalized. How about "Cannot log the query plan while holding
> page-level lock.", instead?
>
>
> Thanks for updating the patch! Here are some review comments.
>
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>pg_log_query_plan</primary>
> +        </indexterm>
>
> This entry is placed before one for
> pg_log_backend_memory_contexts(). But it should be *after* that since
> those entries seem to be placed in alphabetical order in the table?
>
>
> + Requests to log the plan of the query currently running on the
> +        backend with specified process ID along with the untruncated
> +        query string.
>
> Other descriptions about logging of query string seem not to mention
> something like "untruncated query string". For example, auto_explain,
> log_statement, etc. Why do we need to mention "along with the
> untruncated query string" specially for pg_log_query_plan()?
>
>
> + Note that nested statements (statements executed inside a function)
> are not
> + considered for logging. Only the plan of the most deeply nested
> query is logged.
>
> Now the plan of even nested statement can be logged. So this
> description needs to be updated?
>
>
> @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
>        MemoryContextSwitchTo(oldcontext);
>  +    ActiveQueryDesc = NULL;
>
> ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be
> reset to NULL in standard_ExecutorFinish()?
>
>
> Currently even during ProcessLogQueryPlanInterrupt(),
> pg_log_query_plan() can be call and another
> ProcessLogQueryPlanInterrupt() can be executed. So repeatable
> re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack
> depth limit exceeded" error. To avoid this, shouldn't we make
> ProcessLogQueryPlanInterrupt() do nothing and return immediately, if
> it's called during another ProcessLogQueryPlanInterrupt()?
>
> pg_log_backend_memory_contexts() also might have the same issue.

Good catch.

By the way, I'm anxious about the following part and I'd like to
remove it.

+     * Ensure no page lock is held on this process.

It seems to me what is wrong is ginInsertCleanup(), not this feature.
As I read the comment for the assertion, I don't believe we want to
allow CFI while holding a page lock.  And AFAICS the function is the
only point where doing that.  (It is the alone user of LockPage()...)

This is the assertion.

lock.c: 902
>     /*
>      * We don't acquire any other heavyweight lock while holding the page lock
>      * except for relation extension.
>      */
>     Assert(!IsPageLockHeld ||
>            (locktag->locktag_type == LOCKTAG_RELATION_EXTEND));

It is added by a recent commit 72e78d831ab5550c39f2dcc7cc5d44c406ec3dc2.

> * Similar to relation extension, page locks are also held for a short
> * duration, so imposing such a restriction won't hurt.

I don't believe a path involving vacuum_delay_point() calls is
short-duration'ed.

https://www.postgresql.org/message-id/CAH2-WznzCPUKnOV%2Bre30_rHLCkqQWm2JTSVjTCAei9LySTc2pw%40mail.gmail.com

> One thing that really bothers me about commit e2c79e14 is that
> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
> before that commit, and is now the only code in the entire system that
> calls LockPage()/ConditionalLockPage() (the hash am no longer uses
> page heavyweight locks following recent work there).

I agree to the discussion.  Can't we use other mechanism here to get
rid of the Lockpage()?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

01 February 2022, 14:11:03


On 2022/02/01 17:27, Kyotaro Horiguchi wrote:
>> * Similar to relation extension, page locks are also held for a short
>> * duration, so imposing such a restriction won't hurt.
> 
> I don't believe a path involving vacuum_delay_point() calls is
> short-duration'ed.

Yes.


>> One thing that really bothers me about commit e2c79e14 is that
>> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
>> before that commit, and is now the only code in the entire system that
>> calls LockPage()/ConditionalLockPage() (the hash am no longer uses
>> page heavyweight locks following recent work there).
> 
> I agree to the discussion.  Can't we use other mechanism here to get
> rid of the Lockpage()?

I have no good idea for that yet, but I agree it's better to get rid of page level lock.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Kyotaro Horiguchi

Date:

02 February 2022, 07:49:57

At Tue, 1 Feb 2022 23:11:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> >> One thing that really bothers me about commit e2c79e14 is that
> >> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
> >> before that commit, and is now the only code in the entire system that
> >> calls LockPage()/ConditionalLockPage() (the hash am no longer uses
> >> page heavyweight locks following recent work there).
> > I agree to the discussion.  Can't we use other mechanism here to get
> > rid of the Lockpage()?
> 
> I have no good idea for that yet, but I agree it's better to get rid
> of page level lock.

It's my turn?

The page lock is used to hold-off simultaneous cleanups on the same
index.  ShareUpdateExclusive lock on the index relation works that
way. In that path it seems like we are always holding a RowExclusive
lock, so it seems to me we can use ShareUpdateExclusive for our
purpose.

There might be a false blocking case when another backend is holding a
conflicting lock on the index.  They are, Share, ShareRowExclusive,
Exclusive and AccessExclusive.  The last three cases don't seem worth
discussion.  I'm not sure about Share and Share Row cases. AFAICS
Share lock is taken on an index in ATExecReplicaIdentity, and there no
use of ShareRowExclusive lock on an index. It's no use discussing about
explicit locking.

So aren't we able to use ShareUpdateExclusive lock for that?

In the attached patch, ginInsertCleanup has an extra check for such
stronger locks not being held.  At least "make check" doesn't cause
the extra assertion to fire.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 7409fdc165..1af9a69abb 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -791,20 +791,29 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
     bool        fsm_vac = false;
     Size        workMemory;
 
-    /*
+    /*r
      * We would like to prevent concurrent cleanup process. For that we will
      * lock metapage in exclusive mode using LockPage() call. Nobody other
      * will use that lock for metapage, so we keep possibility of concurrent
      * insertion into pending list
      */
 
+    /*
+     * we use ShareUpdateExclusive lock on this relation to hold-off concurrent
+     * cleanup
+     */
+    Assert(!CheckRelationLockedByMe(index, ShareUpdateExclusiveLock, false));
+
+    /* tentative debug-purpose assertion for stronger locks */
+    Assert(!CheckRelationLockedByMe(index, ShareLock, true));
+    
     if (forceCleanup)
     {
         /*
          * We are called from [auto]vacuum/analyze or gin_clean_pending_list()
          * and we would like to wait concurrent cleanup to finish.
          */
-        LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+        LockRelation(index, ShareUpdateExclusiveLock);
         workMemory =
             (IsAutoVacuumWorkerProcess() && autovacuum_work_mem != -1) ?
             autovacuum_work_mem : maintenance_work_mem;
@@ -816,7 +825,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
          * just exit in hope that concurrent process will clean up pending
          * list.
          */
-        if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
+        if (!ConditionalLockRelation(index, ShareUpdateExclusiveLock))
             return;
         workMemory = work_mem;
     }
@@ -830,7 +839,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
     {
         /* Nothing to do */
         UnlockReleaseBuffer(metabuffer);
-        UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+        UnlockRelation(index, ShareUpdateExclusiveLock);
         return;
     }
 
@@ -1002,7 +1011,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
         page = BufferGetPage(buffer);
     }
 
-    UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+    UnlockRelation(index, ShareUpdateExclusiveLock);
     ReleaseBuffer(metabuffer);
 
     /*

Re: RFC: Logging plan of the running query

From

Kyotaro Horiguchi

Date:

02 February 2022, 08:04:21

At Wed, 02 Feb 2022 16:49:57 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> So aren't we able to use ShareUpdateExclusive lock for that?
> 
> In the attached patch, ginInsertCleanup has an extra check for such
> stronger locks not being held.  At least "make check" doesn't cause
> the extra assertion to fire.

Actually, the discussion is a bit dubious. What we need really to
check is wheter such locks are not held on an index *elsewhere*.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

02 February 2022, 12:59:12

2022-02-01 01:51, Fujii Masao wrote:
Thanks for reviewing and suggestions!

>>    +        Note that there is the case where the request to log a 
>> query
>>    +        plan is skipped even while the target backend is running a
>>    +        query due to lock conflict avoidance.
>>    +        If this happens, users can just retry pg_log_query_plan().
> 
> This may cause users to misunderstand that pg_log_query_plan() fails
> while the target backend is holding *any* locks? Isn't it better to
> mention "page-level locks", instead? So how about the following?
> 
> --------------------------
> Note that the request to log the query plan will be ignored if it's
> received during a short period while the target backend is holding a
> page-level lock.
> --------------------------

Agreed.

>>> +            ereport(LOG_SERVER_ONLY,
>>> +                errmsg("backend with PID %d is holding a page lock. 
>>> Try again",
>>> +                    MyProcPid));
>>> 
>>> It seems more proper to output this message in DETAIL or HINT,
>>> instead. So how about something like the following messages?
>>> 
>>> LOG: could not log the query plan
>>> DETAIL: query plan cannot be logged while page level lock is being 
>>> held
>>> HINT: Try pg_log_query_plan() after a few ....
>> 
>> Agreed.
>> I felt the HINT message 'after a few ...' is difficult to describe, 
>> and wrote as below.
>> 
>> | HINT: Retrying pg_log_query_plan() might succeed since the lock 
>> duration of page level locks are usually short
>> 
>> How do you think?
> 
> Or we don't need HINT message?

Removed the HINT message.

> +                             errmsg("could not log the query plan"),
> +                             errdetail("query plan cannot be logged 
> while page level lock is
> being held"),
> 
> In detail message, the first word of sentences should be capitalized.
> How about "Cannot log the query plan while holding page-level lock.",
> instead?

Agreed.

> Thanks for updating the patch! Here are some review comments.
> 
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>pg_log_query_plan</primary>
> +        </indexterm>
> 
> This entry is placed before one for pg_log_backend_memory_contexts().
> But it should be *after* that since those entries seem to be placed in
> alphabetical order in the table?

Modified it.

> +        Requests to log the plan of the query currently running on the
> +        backend with specified process ID along with the untruncated
> +        query string.
> 
> Other descriptions about logging of query string seem not to mention
> something like "untruncated query string". For example, auto_explain,
> log_statement, etc. Why do we need to mention "along with the
> untruncated query string" specially for pg_log_query_plan()?

Modified it as below:

              Requests to log the plan of the query currently running on 
the
     -        backend with specified process ID along with the 
untruncated
     -        query string.
     -        They will be logged at <literal>LOG</literal> message level 
and
     +        backend with specified process ID.
     +        It will be logged at <literal>LOG</literal> message level 
and

> +    Note that nested statements (statements executed inside a 
> function) are not
> +    considered for logging. Only the plan of the most deeply nested
> query is logged.
> 
> Now the plan of even nested statement can be logged. So this
> description needs to be updated?

Modified it as below:

     -    Note that nested statements (statements executed inside a 
function) are not
     -    considered for logging. Only the plan of the most deeply nested 
query is logged.
     +    Note that when the statements are executed inside a function, 
only the
     +    plan of the most deeply nested query is logged.

> @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
>       MemoryContextSwitchTo(oldcontext);
>  +    ActiveQueryDesc = NULL;
> 
> ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be
> reset to NULL in standard_ExecutorFinish()?

ActiveQueryDesc should not be reset in standard_ExecutorFinish().
Removed it.

> Currently even during ProcessLogQueryPlanInterrupt(),
> pg_log_query_plan() can be call and another
> ProcessLogQueryPlanInterrupt() can be executed. So repeatable
> re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack
> depth limit exceeded" error. To avoid this, shouldn't we make
> ProcessLogQueryPlanInterrupt() do nothing and return immediately, if
> it's called during another ProcessLogQueryPlanInterrupt()?
> 
> pg_log_backend_memory_contexts() also might have the same issue.

As you pointed out offlist, the issue could occur even when both 
pg_log_backend_memory_contexts and pg_log_query_plan are called and it 
may be better to make another patch.


You also pointed out offlist that it would be necessary to call 
LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR.
AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), 
so I added PG_TRY/CATCH and make it call LockErrorCleanup() when ERROR 
occurs.



On 2022-02-01 17:27, Kyotaro Horiguchi wrote:

Thanks for reviewing Horiguchi-san!

> By the way, I'm anxious about the following part and I'd like to
> remove it.

I also think it would be nice if it's possible.
> 
> +        * Ensure no page lock is held on this process.
> 
> It seems to me what is wrong is ginInsertCleanup(), not this feature.

> Actually, the discussion is a bit dubious. What we need really to
check is wheter such locks are not held on an index *elsewhere*.

Since I'm not sure how long it will take to discuss this point, the 
attached patch is based on the current HEAD at this time.
I also think it may be better to discuss it on another thread.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v19-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Robert Treat

Date:

03 February 2022, 08:09:07

On Wed, Feb 2, 2022 at 7:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> 2022-02-01 01:51, Fujii Masao wrote:
<snip>
> > +    Note that nested statements (statements executed inside a
> > function) are not
> > +    considered for logging. Only the plan of the most deeply nested
> > query is logged.
> >
> > Now the plan of even nested statement can be logged. So this
> > description needs to be updated?
>
> Modified it as below:
>
>      -    Note that nested statements (statements executed inside a
> function) are not
>      -    considered for logging. Only the plan of the most deeply nested
> query is logged.
>      +    Note that when the statements are executed inside a function,
> only the
>      +    plan of the most deeply nested query is logged.
>

Minor nit, but I think the "the" is superfluous.. ie.

"Note that when statements are executed inside a function,
only the plan of the most deeply nested query is logged"

<snip>
> On 2022-02-01 17:27, Kyotaro Horiguchi wrote:
>
> Thanks for reviewing Horiguchi-san!
>
> > By the way, I'm anxious about the following part and I'd like to
> > remove it.
>
> I also think it would be nice if it's possible.
> >
> > +        * Ensure no page lock is held on this process.
> >
> > It seems to me what is wrong is ginInsertCleanup(), not this feature.
>
> > Actually, the discussion is a bit dubious. What we need really to
> check is wheter such locks are not held on an index *elsewhere*.
>
> Since I'm not sure how long it will take to discuss this point, the
> attached patch is based on the current HEAD at this time.
> I also think it may be better to discuss it on another thread.
>

While I agree on the above points, IMHO I don't believe it should be a
show-stopper for adding this functionality to v15, but we have a few
more commitments before we get to that point.


Robert Treat
https://xzilla.net

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

07 February 2022, 16:13:44

On 2022/02/02 21:59, torikoshia wrote:
>> This may cause users to misunderstand that pg_log_query_plan() fails
>> while the target backend is holding *any* locks? Isn't it better to
>> mention "page-level locks", instead? So how about the following?
>>
>> --------------------------
>> Note that the request to log the query plan will be ignored if it's
>> received during a short period while the target backend is holding a
>> page-level lock.
>> --------------------------
> 
> Agreed.

On second thought, this note is confusing rather than helpful? Because the users don't know when and what operation
needspage-level lock. So now I'm thinking it's better to remove this note.

> As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan
arecalled and it may be better to make another patch.

OK.

> You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt()
canincur ERROR.

> AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and make it call
LockErrorCleanup()when ERROR occurs.

As we discussed off-list, this treatment might not be necessary. Because when ERROR or FATAL error happens,
AbortTransaction()is called and LockErrorCleanup() is also called inside there.

> Since I'm not sure how long it will take to discuss this point, the attached patch is based on the current HEAD at
thistime.

Thanks for updating the patch!

@@ -5048,6 +5055,12 @@ AbortSubTransaction(void)
       */
      PG_SETMASK(&UnBlockSig);

+    /*
+     * When ActiveQueryDesc is referenced after abort, some of its elements
+     * are freed. To avoid accessing them, reset ActiveQueryDesc here.
+     */
+    ActiveQueryDesc = NULL;

AbortSubTransaction() should reset ActiveQueryDesc to save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
OtherwiseActiveQueryDesc of top-level statement will be unavailable after subtransaction is aborted in the nested
statements.

For example, no plan is logged while the following "select pg_sleep(test())" is running, because the exception inside
test()function aborts the subtransaction and resets ActiveQueryDesc to NULL.

create or replace function test () returns int as $$
begin
     perform 1/0;
exception when others then
     return 30;
end;
$$ language plpgsql;

select pg_sleep(test());

-CREATE ROLE regress_log_memory;
+CREATE ROLE regress_log;

Isn't this name too generic? How about regress_log_function, for example?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Kyotaro Horiguchi

Date:

08 February 2022, 08:18:52

At Tue, 8 Feb 2022 01:13:44 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2022/02/02 21:59, torikoshia wrote:
> >> This may cause users to misunderstand that pg_log_query_plan() fails
> >> while the target backend is holding *any* locks? Isn't it better to
> >> mention "page-level locks", instead? So how about the following?
> >>
> >> --------------------------
> >> Note that the request to log the query plan will be ignored if it's
> >> received during a short period while the target backend is holding a
> >> page-level lock.
> >> --------------------------
> > Agreed.
> 
> On second thought, this note is confusing rather than helpful? Because
> the users don't know when and what operation needs page-level lock. So
> now I'm thinking it's better to remove this note.

*I* agree to removing the note. And the following error message looks
as mysterious as the note is, and the DETAIL doesn't help..

            ereport(LOG_SERVER_ONLY,
+                errmsg("could not log the query plan"),
+                errdetail("Cannot log the query plan while holding page-level lock."));
+            hash_seq_term(&status);

We should tell the command can be retried soon, like this?

"LOG:  ignored request for logging query plan due to lock confilcts"
"HINT:  You can try again in a moment."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

08 February 2022, 15:12:50

On 2022-02-03 17:09, Robert Treat wrote:
> On Wed, Feb 2, 2022 at 7:59 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> 2022-02-01 01:51, Fujii Masao wrote:
> <snip>
>> > +    Note that nested statements (statements executed inside a
>> > function) are not
>> > +    considered for logging. Only the plan of the most deeply nested
>> > query is logged.
>> >
>> > Now the plan of even nested statement can be logged. So this
>> > description needs to be updated?
>> 
>> Modified it as below:
>> 
>>      -    Note that nested statements (statements executed inside a
>> function) are not
>>      -    considered for logging. Only the plan of the most deeply 
>> nested
>> query is logged.
>>      +    Note that when the statements are executed inside a 
>> function,
>> only the
>>      +    plan of the most deeply nested query is logged.
>> 
> 
> Minor nit, but I think the "the" is superfluous.. ie.
> 
> "Note that when statements are executed inside a function,
> only the plan of the most deeply nested query is logged"

Thanks! Modified it.


On 2022-02-08 01:13, Fujii Masao wrote:
Thanks for the comments!

> On 2022/02/02 21:59, torikoshia wrote:
>>> This may cause users to misunderstand that pg_log_query_plan() fails
>>> while the target backend is holding *any* locks? Isn't it better to
>>> mention "page-level locks", instead? So how about the following?
>>> 
>>> --------------------------
>>> Note that the request to log the query plan will be ignored if it's
>>> received during a short period while the target backend is holding a
>>> page-level lock.
>>> --------------------------
>> 
>> Agreed.
> 
> On second thought, this note is confusing rather than helpful? Because
> the users don't know when and what operation needs page-level lock. So
> now I'm thinking it's better to remove this note.

Removed it.
> 
> 
>> As you pointed out offlist, the issue could occur even when both 
>> pg_log_backend_memory_contexts and pg_log_query_plan are called and it 
>> may be better to make another patch.
> 
> OK.
> 
> 
>> You also pointed out offlist that it would be necessary to call 
>> LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR.
>> AFAICS it can call ereport(ERROR), i.e., palloc0() in 
>> NewExplainState(), so I added PG_TRY/CATCH and make it call 
>> LockErrorCleanup() when ERROR occurs.
> 
> As we discussed off-list, this treatment might not be necessary.
> Because when ERROR or FATAL error happens, AbortTransaction() is
> called and LockErrorCleanup() is also called inside there.

Agreed.

>> Since I'm not sure how long it will take to discuss this point, the 
>> attached patch is based on the current HEAD at this time.
> 
> Thanks for updating the patch!
> 
> @@ -5048,6 +5055,12 @@ AbortSubTransaction(void)
>       */
>      PG_SETMASK(&UnBlockSig);
>  +    /*
> +     * When ActiveQueryDesc is referenced after abort, some of its 
> elements
> +     * are freed. To avoid accessing them, reset ActiveQueryDesc here.
> +     */
> +    ActiveQueryDesc = NULL;
> 
> AbortSubTransaction() should reset ActiveQueryDesc to
> save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
> Otherwise ActiveQueryDesc of top-level statement will be unavailable
> after subtransaction is aborted in the nested statements.
> 
> For example, no plan is logged while the following "select
> pg_sleep(test())" is running, because the exception inside test()
> function aborts the subtransaction and resets ActiveQueryDesc to NULL.
> 
> create or replace function test () returns int as $$
> begin
>      perform 1/0;
> exception when others then
>      return 30;
> end;
> $$ language plpgsql;
> 
> select pg_sleep(test());

Agreed.

BTW, since the above example results in calling ExecutorRun() only once, 
the output didn't differ even after ActiveQueryDesc is reset to 
save_ActiveQueryDesc.

The below definition of test() worked as expected.

  create or replace function test () returns int as $$
  begin
      perform 1;
      perform 1/0;
  exception when others then
      return 30;
  end;
  $$ language plpgsql;

> 
> 
> -CREATE ROLE regress_log_memory;
> +CREATE ROLE regress_log;
> 
> Isn't this name too generic? How about regress_log_function, for 
> example?

Agreed.


On 2022-02-08 17:18, Kyotaro Horiguchi wrote:
> At Tue, 8 Feb 2022 01:13:44 +0900, Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote in
>> 
>> 
>> On 2022/02/02 21:59, torikoshia wrote:
>> >> This may cause users to misunderstand that pg_log_query_plan() fails
>> >> while the target backend is holding *any* locks? Isn't it better to
>> >> mention "page-level locks", instead? So how about the following?
>> >>
>> >> --------------------------
>> >> Note that the request to log the query plan will be ignored if it's
>> >> received during a short period while the target backend is holding a
>> >> page-level lock.
>> >> --------------------------
>> > Agreed.
>> 
>> On second thought, this note is confusing rather than helpful? Because
>> the users don't know when and what operation needs page-level lock. So
>> now I'm thinking it's better to remove this note.
> 
> *I* agree to removing the note. And the following error message looks
> as mysterious as the note is, and the DETAIL doesn't help..
> 
>             ereport(LOG_SERVER_ONLY,
> +                errmsg("could not log the query plan"),
> +                errdetail("Cannot log the query plan while holding page-level 
> lock."));
> +            hash_seq_term(&status);
> 
> We should tell the command can be retried soon, like this?
> 
> "LOG:  ignored request for logging query plan due to lock confilcts"
> "HINT:  You can try again in a moment."

Thanks, it seems better.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v20-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Fujii Masao

Date:

21 February 2022, 17:39:16


On 2022/02/09 0:12, torikoshia wrote:
> BTW, since the above example results in calling ExecutorRun() only once, the output didn't differ even after
ActiveQueryDescis reset to save_ActiveQueryDesc.
 
> 
> The below definition of test() worked as expected.
> 
>   create or replace function test () returns int as $$
>   begin
>       perform 1;
>       perform 1/0;
>   exception when others then
>       return 30;
>   end;
>   $$ language plpgsql;
So in this case ActiveQueryDesc set by ExecutorRun() called only once is still valid even when AbortSubTransaction() is
called.That is, that ActiveQueryDesc should NOT be reset to save_ActiveQueryDesc in this case, should it?
 

OTOH, in your example, ActiveQueryDesc set by the second call to ExecutorRun() should be reset in
AbortSubTransaction().Then ActiveQueryDesc set by the first call should be valid.
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

09 March 2022, 10:04:49

On 2022-02-08 01:13, Fujii Masao wrote:
> AbortSubTransaction() should reset ActiveQueryDesc to
> save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
> Otherwise ActiveQueryDesc of top-level statement will be unavailable
> after subtransaction is aborted in the nested statements.

I once agreed above suggestion and made v20 patch making 
save_ActiveQueryDesc a global variable, but it caused segfault when 
calling pg_log_query_plan() after FreeQueryDesc().

OTOH, doing some kind of reset of ActiveQueryDesc seems necessary since 
it also caused segfault when running pg_log_query_plan() during 
installcheck.

There may be a better way, but resetting ActiveQueryDesc to NULL seems 
safe and simple.
Of course it makes pg_log_query_plan() useless after a subtransaction is 
aborted.
However, if it does not often happen that people want to know the 
running query's plan whose subtransaction is aborted, resetting 
ActiveQueryDesc to NULL would be acceptable.

Attached is a patch that sets ActiveQueryDesc to NULL when a 
subtransaction is aborted.

How do you think?

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v21-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

16 May 2022, 08:02:40

On 2022-03-09 19:04, torikoshia wrote:
> On 2022-02-08 01:13, Fujii Masao wrote:
>> AbortSubTransaction() should reset ActiveQueryDesc to
>> save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
>> Otherwise ActiveQueryDesc of top-level statement will be unavailable
>> after subtransaction is aborted in the nested statements.
> 
> I once agreed above suggestion and made v20 patch making
> save_ActiveQueryDesc a global variable, but it caused segfault when
> calling pg_log_query_plan() after FreeQueryDesc().
> 
> OTOH, doing some kind of reset of ActiveQueryDesc seems necessary
> since it also caused segfault when running pg_log_query_plan() during
> installcheck.
> 
> There may be a better way, but resetting ActiveQueryDesc to NULL seems
> safe and simple.
> Of course it makes pg_log_query_plan() useless after a subtransaction
> is aborted.
> However, if it does not often happen that people want to know the
> running query's plan whose subtransaction is aborted, resetting
> ActiveQueryDesc to NULL would be acceptable.
> 
> Attached is a patch that sets ActiveQueryDesc to NULL when a
> subtransaction is aborted.
> 
> How do you think?

Attached new patch to fix patch apply failures.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v22-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

08 September 2022, 10:19:09

On 2022-05-16 17:02, torikoshia wrote:
> On 2022-03-09 19:04, torikoshia wrote:
>> On 2022-02-08 01:13, Fujii Masao wrote:
>>> AbortSubTransaction() should reset ActiveQueryDesc to
>>> save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
>>> Otherwise ActiveQueryDesc of top-level statement will be unavailable
>>> after subtransaction is aborted in the nested statements.
>> 
>> I once agreed above suggestion and made v20 patch making
>> save_ActiveQueryDesc a global variable, but it caused segfault when
>> calling pg_log_query_plan() after FreeQueryDesc().
>> 
>> OTOH, doing some kind of reset of ActiveQueryDesc seems necessary
>> since it also caused segfault when running pg_log_query_plan() during
>> installcheck.
>> 
>> There may be a better way, but resetting ActiveQueryDesc to NULL seems
>> safe and simple.
>> Of course it makes pg_log_query_plan() useless after a subtransaction
>> is aborted.
>> However, if it does not often happen that people want to know the
>> running query's plan whose subtransaction is aborted, resetting
>> ActiveQueryDesc to NULL would be acceptable.
>> 
>> Attached is a patch that sets ActiveQueryDesc to NULL when a
>> subtransaction is aborted.
>> 
>> How do you think?
Attached new patch to fix patch apply failures again.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v23-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

20 September 2022, 14:41:57

On 2022-09-19 17:47, Алена Рыбакина wrote:
Thanks for your review and comments!

> Hi,
> 
> I'm sorry,if this message is duplicated previous this one, but I'm not
> sure that the previous message is sent correctly. I sent it from email
> address a.rybakina@postgrespro.ru and I couldn't send this one email
> from those address.

I've successfully received your mail from both a.rybakina@postgrespro.ru 
and lena.ribackina@yandex.ru.

> I like idea to create patch for logging query plan. After reviewing
> this code and notice some moments and I'd rather ask you some
> questions.
> 
> Firstly, I suggest some editing in the comment of commit. I think, it 
> is
> turned out the more laconic and the same clear. I wrote it below since 
> I
> can't think of any other way to add it.

> ```
> Currently, we have to wait for finishing of the query execution to 
> check
> its plan.
> This is not so convenient in investigation long-running queries on
> production
> environments where we cannot use debuggers.

> To improve this situation there is proposed the patch containing the
> pg_log_query_plan()
> function which request to log plan of the specified backend process.

> By default, only superusers are allowed to request log of the plan
> otherwise
> allowing any users to issue this request could create cause lots of log
> messages
> and it can lead to denial of service.
> 
> At the next requesting CHECK_FOR_INTERRUPTS(), the target backend logs
> its plan at
> LOG_SERVER_ONLY level and therefore this plan will appear in the server
> log only,
> not to be sent to the client.
Thanks, I have incorporated your comments.
Since the latter part of the original message comes from the commit 
message of pg_log_backend_memory_contexts(43620e328617c), so I left it 
as it was for consistency.


> Secondly, I have question about deleting USE_ASSERT_CHECKING in lock.h?
> It supposed to have been checked in another placed of the code by
> matching values. I am worry about skipping errors due to untesting with
> assert option in the places where it (GetLockMethodLocalHash)
> participates and we won't able to get core file in segfault cases. I
> might not understand something, then can you please explain to me?
Since GetLockMethodLocalHash() is only used for assertions, this is only 
defined when USE_ASSERT_CHECKING is enabled.
This patch uses GetLockMethodLocalHash() not only for the assertion 
purpose, so I removed "ifdef USE_ASSERT_CHECKING" for this function.
I belive it does not lead to skip errors.

> Thirdly, I have incomprehension of the point why save_ActiveQueryDesc 
> is
> declared in the pquery.h? I am seemed to save_ActiveQueryDesc to be 
> used
> in an once time in the ExecutorRun function and  its declaration
> superfluous. I added it in the attached patch.
Exactly.

> Fourthly, it seems to me there are not enough explanatory comments in
> the code. I also added them in the attached patch.
Thanks!

| +   /*
| +    * Save value of ActiveQueryDesc before having called
| +    * ExecutorRun_hook function due to having reset by
| +    * AbortSubTransaction.
| +    */
| +
|     save_ActiveQueryDesc = ActiveQueryDesc;
|     ActiveQueryDesc = queryDesc;
|
| @@ -314,6 +320,7 @@ ExecutorRun(QueryDesc *queryDesc,
|     else
|         standard_ExecutorRun(queryDesc, direction, count, 
execute_once);
|
| +   /* We set the actual value of ActiveQueryDesc */
|     ActiveQueryDesc = save_ActiveQueryDesc;

Since these processes are needed for nested queries, not only for
AbortSubTransaction[1], added comments on it.

| +/* Function to handle the signal to output the query plan. */
|  extern void HandleLogQueryPlanInterrupt(void);

I feel this comment is unnecessary since the explanation of 
HandleLogQueryPlanInterrupt() is written in explain.c and no functions 
in explain.h have comments in it.

> Lastly, I have incomprehension about handling signals since have been
> unused it before. Could another signal disabled calling this signal to
> log query plan? I noticed this signal to be checked the latest in
> procsignal_sigusr1_handler function.
Are you concerned that one signal will not be processed when multiple 
signals are sent in succession?
AFAIU both of them are processed since SendProcSignal flags 
ps_signalFlags for each signal.

```
  SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
  {
      volatile ProcSignalSlot *slot;
...(snip)...
278         if (slot->pss_pid == pid)
279         {
280             /* Atomically set the proper flag */
281             slot->pss_signalFlags[reason] = true;
282             /* Send signal */
283             return kill(pid, SIGUSR1);
```

Comments of ProcSignalReason also say 'We can cope with concurrent 
signals for different reasons'.

```C
  /*
   * Reasons for signaling a Postgres child process (a backend or an 
auxiliary
   * process, like checkpointer).  We can cope with concurrent signals 
for different
   * reasons.  However, if the same reason is signaled multiple times in 
quick
   * succession, the process is likely to observe only one notification 
of it.
   * This is okay for the present uses.
   ...
  typedef enum
  {
      PROCSIG_CATCHUP_INTERRUPT,  /* sinval catchup interrupt */
      PROCSIG_NOTIFY_INTERRUPT,   /* listen/notify interrupt */
      PROCSIG_PARALLEL_MESSAGE,   /* message from cooperating parallel 
backend */
      PROCSIG_WALSND_INIT_STOPPING,   /* ask walsenders to prepare for 
shutdown  */
      PROCSIG_BARRIER,            /* global barrier interrupt  */
      PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory 
contexts */
      PROCSIG_LOG_QUERY_PLAN, /* ask backend to log plan of the current 
query */
...
  } ProcSignalReason;
```


[1] 
https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v24-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

Alena Rybakina

Date:

21 September 2022, 08:30:25

Ok, I get it.

Since GetLockMethodLocalHash() is only used for assertions, this is only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for the assertion purpose, so I removed "ifdef USE_ASSERT_CHECKING" for this function. I belive it does not lead to skip errors.

Agree.

Since these processes are needed for nested queries, not only for AbortSubTransaction[1], added comments on it.

I also noticed it. However I also discovered that above function declarations to be aplied for explain command and used to be printed details of the executed query.

We have a similar task to print the plan of an interrupted process making a request for a specific pid.

In short, I think, this task is different and for separating these parts I added this comment.

I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments in it.

Yes, I was worried about it. I understood it, thank for explaining.

AFAIU both of them are processed since SendProcSignal flags ps_signalFlags for each signal.

```
SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
{
volatile ProcSignalSlot *slot;
...(snip)...
278 if (slot->pss_pid == pid)
279 {
280 /* Atomically set the proper flag */
281 slot->pss_signalFlags[reason] = true;
282 /* Send signal */
283 return kill(pid, SIGUSR1);
```

Comments of ProcSignalReason also say 'We can cope with concurrent signals for different reasons'.

```C
/*
* Reasons for signaling a Postgres child process (a backend or an auxiliary
* process, like checkpointer). We can cope with concurrent signals for different
* reasons. However, if the same reason is signaled multiple times in quick
* succession, the process is likely to observe only one notification of it.
* This is okay for the present uses.
...
typedef enum
{
PROCSIG_CATCHUP_INTERRUPT, /* sinval catchup interrupt */
PROCSIG_NOTIFY_INTERRUPT, /* listen/notify interrupt */
PROCSIG_PARALLEL_MESSAGE, /* message from cooperating parallel backend */
PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */
PROCSIG_BARRIER, /* global barrier interrupt */
PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
PROCSIG_LOG_QUERY_PLAN, /* ask backend to log plan of the current query */
...
} ProcSignalReason;
```

[1] https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

22 September 2022, 14:12:16

On 2022-09-21 17:30, Alena Rybakina wrote:

Thanks for your reply!

> I also noticed it. However I also discovered that above function
> declarations to be aplied for explain command and used to be printed
> details of the executed query.
> 
> We have a similar task to print the plan of an interrupted process
> making a request for a specific pid.
> 
> In short, I think, this task is different and for separating these
> parts I added this comment.

I'm not sure I understand your comment correctly, do you mean 
HandleLogQueryPlanInterrupt() should not be placed in explain.c?

It may be so.
However, given that ProcesLogMemoryContextInterrupt(), which similarly 
handles interrupts for pg_log_backend_memory_contexts(), is located in 
mcxt.c, I also think current location might be acceptable.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

Alena Rybakina

Date:

23 September 2022, 09:43:19

Sorry, I wrote confusingly at that time.

No, I suggested adding comment about the explanation of 
HandleLogQueryPlanInterrupt() only in the explain.h and not removing 
from the explain.c.

I seemed to be necessary separating declaration function for 'explaining 
feature' of executed query from our logging plan of the running query 
interrupts function. But now, I doubt it.

> I'm not sure I understand your comment correctly, do you mean 
> HandleLogQueryPlanInterrupt() should not be placed in explain.c? 


Thank you for having reminded about this function and I looked at 
ProcessLogMemoryContextInterrupt() declaration. I'm noticed comments in 
the memutils.h are missed tooю

Description of this function is written only in mcxt.c.
> However, given that ProcesLogMemoryContextInterrupt(), which similarly 
> handles interrupts for pg_log_backend_memory_contexts(), is located in 
> mcxt.c, I also think current location might be acceptable. 


So I think you are right and the comment about the explanation of 
HandleLogQueryPlanInterrupt() written in explain.h is redundant.

> I feel this comment is unnecessary since the explanation of 
> HandleLogQueryPlanInterrupt() is written in explain.c and no functions 
> in explain.h have comments in it. 

Regards,

-- 
Alena Rybakina
Postgres Professional

Re: RFC: Logging plan of the running query

From

Andres Freund

Date:

06 December 2022, 18:41:01

Hi,

This patch does not currently build, due to a conflicting oid:

https://cirrus-ci.com/task/4638460594618368?logs=build#L108
[17:26:44.602] /usr/bin/perl ../src/include/catalog/../../backend/catalog/genbki.pl --include-path=../src/include
--set-version=16--output=src/include/catalog ../src/include/catalog/pg_proc.h ../src/include/catalog/pg_type.h
../src/include/catalog/pg_attribute.h../src/include/catalog/pg_class.h ../src/include/catalog/pg_attrdef.h
../src/include/catalog/pg_constraint.h../src/include/catalog/pg_inherits.h ../src/include/catalog/pg_index.h
../src/include/catalog/pg_operator.h../src/include/catalog/pg_opfamily.h ../src/include/catalog/pg_opclass.h
../src/include/catalog/pg_am.h../src/include/catalog/pg_amop.h ../src/include/catalog/pg_amproc.h
../src/include/catalog/pg_language.h../src/include/catalog/pg_largeobject_metadata.h
../src/include/catalog/pg_largeobject.h../src/include/catalog/pg_aggregate.h ../src/include/catalog/pg_statistic.h
../src/include/catalog/pg_statistic_ext.h../src/include/catalog/pg_statistic_ext_data.h
../src/include/catalog/pg_rewrite.h../src/include/catalog/pg_trigger.h ../src/include/catalog/pg_event_trigger.h
../src/include/catalog/pg_description.h../src/include/catalog/pg_cast.h ../src/include/catalog/pg_enum.h
../src/include/catalog/pg_namespace.h../src/include/catalog/pg_conversion.h ../src/include/catalog/pg_depend.h
../src/include/catalog/pg_database.h../src/include/catalog/pg_db_role_setting.h ../src/include/catalog/pg_tablespace.h
../src/include/catalog/pg_authid.h../src/include/catalog/pg_auth_members.h ../src/include/catalog/pg_shdepend.h
../src/include/catalog/pg_shdescription.h../src/include/catalog/pg_ts_config.h
../src/include/catalog/pg_ts_config_map.h../src/include/catalog/pg_ts_dict.h ../src/include/catalog/pg_ts_parser.h
../src/include/catalog/pg_ts_template.h../src/include/catalog/pg_extension.h
../src/include/catalog/pg_foreign_data_wrapper.h../src/include/catalog/pg_foreign_server.h
../src/include/catalog/pg_user_mapping.h../src/include/catalog/pg_foreign_table.h ../src/include/catalog/pg_policy.h
../src/include/catalog/pg_replication_origin.h../src/include/catalog/pg_default_acl.h
../src/include/catalog/pg_init_privs.h../src/include/catalog/pg_seclabel.h ../src/include/catalog/pg_shseclabel.h
../src/include/catalog/pg_collation.h../src/include/catalog/pg_parameter_acl.h
../src/include/catalog/pg_partitioned_table.h../src/include/catalog/pg_range.h ../src/include/catalog/pg_transform.h
../src/include/catalog/pg_sequence.h../src/include/catalog/pg_publication.h
../src/include/catalog/pg_publication_namespace.h../src/include/catalog/pg_publication_rel.h
../src/include/catalog/pg_subscription.h../src/include/catalog/pg_subscription_rel.h 
[17:26:44.602] Duplicate OIDs detected:
[17:26:44.602] 4550
[17:26:44.602] found 1 duplicate OID(s) in catalog data

I suggest you choose a random oid out of the "development purposes" range:

 *        OIDs 1-9999 are reserved for manual assignment (see .dat files in
 *        src/include/catalog/).  Of these, 8000-9999 are reserved for
 *        development purposes (such as in-progress patches and forks);
 *        they should not appear in released versions.

Greetings,

Andres Freund

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

08 December 2022, 05:10:32

On 2022-12-07 03:41, Andres Freund wrote:
> Hi,
> 
> This patch does not currently build, due to a conflicting oid:
> 
> I suggest you choose a random oid out of the "development purposes" 
> range:

Thanks for your advice!
Attached updated patch.

BTW, since this patch depends on ProcessInterrupts() and EXPLAIN codes 
which is used in auto_explain, I'm feeling that the following discussion 
also applies to this patch.

> -- 
> https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com
> 
> explaining a query is a pretty
> complicated operation that involves catalog lookups and lots of
> complicated stuff, and I don't think that it would be safe to do all
> of that at any arbitrary point in the code where ProcessInterrupts()
> happened to fire.

If I can't come up with some workaround during the next Commitfest, I'm 
going to withdraw this proposal.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v25-0001-log-running-query-plan.patch

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

02 June 2023, 17:51:22

Hello,

Thanks for working on this patch!

On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> BTW, since this patch depends on ProcessInterrupts() and EXPLAIN codes
> which is used in auto_explain, I'm feeling that the following discussion
> also applies to this patch.
>
> > --
> > https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com
> >
> > explaining a query is a pretty
> > complicated operation that involves catalog lookups and lots of
> > complicated stuff, and I don't think that it would be safe to do all
> > of that at any arbitrary point in the code where ProcessInterrupts()
> > happened to fire.
>
> If I can't come up with some workaround during the next Commitfest, I'm
> going to withdraw this proposal.

While at PGCon this week I'd brought up this idea with a few people
without realizing you'd already worked on it previously, and then
after I discovered this thread several of us (Greg, Ronan, David,
Heikki, and myself -- all cc'd) discussed the safety concerns over
dinner last night.

Our conclusion was that all of the concerns we could come up with (for
example, walking though the code for ExplainTargetRel() and discussing
the relevant catalog and syscache accesses) all applied specifically
to Robert's concerns about running explain inside an aborted
transaction. After all, I'd originally started that thread to ask
about running auto-explain after a query timeout.

To put it positively: we believe that, for example, catalog accesses
inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
an existing valid transaction/query state, as it would be for this
patch -- are safe. If there were problems, then those problems are
likely bugs we already have in other CFI cases.

Another concern Robert raised may apply here: what if a person tries
to cancel a query when we're explaining? I believe that's a reasonable
question to ask, but I believe it's a trade-off that's worth making
for the (significant) introspection benefits this patch would provide.

On to the patch itself: overall I think it looks like it's in pretty
good shape. I also noticed we don't have any tests (I assume it'd have
to be TAP tests) of the actual output happening, and I think it would
be worth adding that.

Are you interested in re-opening this patch? I'd be happy to provide
further review and help to try to push this along.

I've rebased the patch and attached as v26.

Thanks,
James Coleman

Attachment

v26-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

05 June 2023, 08:30:18

On 2023-06-03 02:51, James Coleman wrote:
> Hello,
> 
> Thanks for working on this patch!
> 
> On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> BTW, since this patch depends on ProcessInterrupts() and EXPLAIN codes
>> which is used in auto_explain, I'm feeling that the following 
>> discussion
>> also applies to this patch.
>> 
>> > --
>> > https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com
>> >
>> > explaining a query is a pretty
>> > complicated operation that involves catalog lookups and lots of
>> > complicated stuff, and I don't think that it would be safe to do all
>> > of that at any arbitrary point in the code where ProcessInterrupts()
>> > happened to fire.
>> 
>> If I can't come up with some workaround during the next Commitfest, 
>> I'm
>> going to withdraw this proposal.
> 
> While at PGCon this week I'd brought up this idea with a few people
> without realizing you'd already worked on it previously, and then
> after I discovered this thread several of us (Greg, Ronan, David,
> Heikki, and myself -- all cc'd) discussed the safety concerns over
> dinner last night.
> 
> Our conclusion was that all of the concerns we could come up with (for
> example, walking though the code for ExplainTargetRel() and discussing
> the relevant catalog and syscache accesses) all applied specifically
> to Robert's concerns about running explain inside an aborted
> transaction. After all, I'd originally started that thread to ask
> about running auto-explain after a query timeout.
> 
> To put it positively: we believe that, for example, catalog accesses
> inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> an existing valid transaction/query state, as it would be for this
> patch -- are safe. If there were problems, then those problems are
> likely bugs we already have in other CFI cases.

Thanks a lot for the discussion and sharing it!
I really appreciate it.

BTW I'm not sure whether all the CFI are called in valid transaction,
do you think we should check each of them?

> Another concern Robert raised may apply here: what if a person tries
> to cancel a query when we're explaining? I believe that's a reasonable
> question to ask, but I believe it's a trade-off that's worth making
> for the (significant) introspection benefits this patch would provide.

Is the concern here limited to the case where explain code goes crazy
as Robert pointed out?
If so, this may be a trade-off worth doing.
I am a little concerned about whether there will be cases where the
explain code is not problematic but just takes long time.

> On to the patch itself: overall I think it looks like it's in pretty
> good shape. I also noticed we don't have any tests (I assume it'd have
> to be TAP tests) of the actual output happening, and I think it would
> be worth adding that.

Checking the log output may be better, but I didn't add it since there
is only a little content that can be checked, and similar function
pg_log_backend_memory_contexts() does not do such type of tests.

> Are you interested in re-opening this patch? I'd be happy to provide
> further review and help to try to push this along.
Sure, I'm going to re-open this.

> I've rebased the patch and attached as v26.
Thanks again for your work!

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

05 June 2023, 18:26:25

On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-06-03 02:51, James Coleman wrote:
> > Hello,
> >
> > Thanks for working on this patch!

Sure thing! I'm *very interested* in seeing this available, and I
think it paves the way for some additional features later on...

> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com>
> ...
> > To put it positively: we believe that, for example, catalog accesses
> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> > an existing valid transaction/query state, as it would be for this
> > patch -- are safe. If there were problems, then those problems are
> > likely bugs we already have in other CFI cases.
>
> Thanks a lot for the discussion and sharing it!
> I really appreciate it.
>
> BTW I'm not sure whether all the CFI are called in valid transaction,
> do you think we should check each of them?

I kicked off the regressions tests with a call to
ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()
call. Several hours and 52 GB of logs later I have confirmed that
(with the attached revision) at the very least the regression test
suite can't trigger any kind of failures regardless of when we trigger
this. The existing code in the patch for only running the explain when
there's an active query handling that.

> > Another concern Robert raised may apply here: what if a person tries
> > to cancel a query when we're explaining? I believe that's a reasonable
> > question to ask, but I believe it's a trade-off that's worth making
> > for the (significant) introspection benefits this patch would provide.
>
> Is the concern here limited to the case where explain code goes crazy
> as Robert pointed out?
> If so, this may be a trade-off worth doing.
> I am a little concerned about whether there will be cases where the
> explain code is not problematic but just takes long time.

The explain code could take a long time in some rare cases (e.g., we
discovered a bug a few years back with the planning code that actually
descends an index to find the max value), but I think the trade-off is
worth it.

> > On to the patch itself: overall I think it looks like it's in pretty
> > good shape. I also noticed we don't have any tests (I assume it'd have
> > to be TAP tests) of the actual output happening, and I think it would
> > be worth adding that.
>
> Checking the log output may be better, but I didn't add it since there
> is only a little content that can be checked, and similar function
> pg_log_backend_memory_contexts() does not do such type of tests.

Fair enough. I still think that would be an improvement here, but
others could also weigh in.

> > Are you interested in re-opening this patch? I'd be happy to provide
> > further review and help to try to push this along.
> Sure, I'm going to re-open this.

I've attached v27. The important change here in 0001 is that it
guarantees the interrupt handler is re-entrant, since that was a bug
exposed by my testing. I've also included 0002 which is only meant for
testing (it attempts to log in the plan in every
CHECK_FOR_INTERRUPTS() call).

Regards,
James

Attachment

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

12 June 2023, 04:07:34

On 2023-06-06 03:26, James Coleman wrote:
> On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2023-06-03 02:51, James Coleman wrote:
>> > Hello,
>> >
>> > Thanks for working on this patch!
> 
> Sure thing! I'm *very interested* in seeing this available, and I
> think it paves the way for some additional features later on...
> 
>> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com>
>> ...
>> > To put it positively: we believe that, for example, catalog accesses
>> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
>> > an existing valid transaction/query state, as it would be for this
>> > patch -- are safe. If there were problems, then those problems are
>> > likely bugs we already have in other CFI cases.
>> 
>> Thanks a lot for the discussion and sharing it!
>> I really appreciate it.
>> 
>> BTW I'm not sure whether all the CFI are called in valid transaction,
>> do you think we should check each of them?
> 
> I kicked off the regressions tests with a call to
> ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()
> call. Several hours and 52 GB of logs later I have confirmed that
> (with the attached revision) at the very least the regression test
> suite can't trigger any kind of failures regardless of when we trigger
> this. The existing code in the patch for only running the explain when
> there's an active query handling that.

Thanks for the testing!

> I've attached v27. The important change here in 0001 is that it
> guarantees the interrupt handler is re-entrant, since that was a bug
> exposed by my testing. I've also included 0002 which is only meant for
> testing (it attempts to log in the plan in every
> CHECK_FOR_INTERRUPTS() call).

When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
ProcessLogQueryPlanInterruptActive can remain true.
After that, pg_log_query_plan() does nothing but just returns.

AFAIU, v26 logs plan for each pg_log_query_plan() even when another
pg_log_query_plan() is running, but it doesn't cause errors or critical
problem.

Considering the problem solved and introduced by v27, I think v26 might
be fine.
How do you think?

> 
> Regards,
> James

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

12 June 2023, 15:52:01

On Sun, Jun 11, 2023 at 11:07 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-06-06 03:26, James Coleman wrote:
> > On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia@oss.nttdata.com>
> > wrote:
> >>
> >> On 2023-06-03 02:51, James Coleman wrote:
> >> > Hello,
> >> >
> >> > Thanks for working on this patch!
> >
> > Sure thing! I'm *very interested* in seeing this available, and I
> > think it paves the way for some additional features later on...
> >
> >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com>
> >> ...
> >> > To put it positively: we believe that, for example, catalog accesses
> >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> >> > an existing valid transaction/query state, as it would be for this
> >> > patch -- are safe. If there were problems, then those problems are
> >> > likely bugs we already have in other CFI cases.
> >>
> >> Thanks a lot for the discussion and sharing it!
> >> I really appreciate it.
> >>
> >> BTW I'm not sure whether all the CFI are called in valid transaction,
> >> do you think we should check each of them?
> >
> > I kicked off the regressions tests with a call to
> > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()
> > call. Several hours and 52 GB of logs later I have confirmed that
> > (with the attached revision) at the very least the regression test
> > suite can't trigger any kind of failures regardless of when we trigger
> > this. The existing code in the patch for only running the explain when
> > there's an active query handling that.
>
> Thanks for the testing!
>
> > I've attached v27. The important change here in 0001 is that it
> > guarantees the interrupt handler is re-entrant, since that was a bug
> > exposed by my testing. I've also included 0002 which is only meant for
> > testing (it attempts to log in the plan in every
> > CHECK_FOR_INTERRUPTS() call).
>
> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> ProcessLogQueryPlanInterruptActive can remain true.
> After that, pg_log_query_plan() does nothing but just returns.
>
> AFAIU, v26 logs plan for each pg_log_query_plan() even when another
> pg_log_query_plan() is running, but it doesn't cause errors or critical
> problem.
>
> Considering the problem solved and introduced by v27, I think v26 might
> be fine.
> How do you think?

The testing I did with calling this during every CFI is what uncovered
the re-entrancy problem. IIRC (without running that test again) the
problem was a stack overflow. Now, to be sure this is a particularly
degenerate case because in real-world usage it'd be impossible in
practice, I think, to trigger that many calls to this function (and by
extension the interrupt handler).

If SIGINT is the only concern we could reset
ProcessLogQueryPlanInterruptActive in error handling code. I admit
that part of my thought process here is thinking ahead to an
additional patch I'd like to see on top of this, which is logging a
query plan before cleaning up when statement timeout occurs. The
re-entrancy issue becomes more interesting then, I think, since we
would then have automated calling of the logging code. BTW: I'd
thought that would make a nice follow-up patch for this, but if you'd
prefer I could add it as another patch in the series here.

What do you think about resetting the flag versus just not having it?

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 June 2023, 15:22:14

On 2023-06-13 00:52, James Coleman wrote:
>> 
>> > I've attached v27. The important change here in 0001 is that it
>> > guarantees the interrupt handler is re-entrant, since that was a bug
>> > exposed by my testing. I've also included 0002 which is only meant for
>> > testing (it attempts to log in the plan in every
>> > CHECK_FOR_INTERRUPTS() call).
>> 
>> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
>> ProcessLogQueryPlanInterruptActive can remain true.
>> After that, pg_log_query_plan() does nothing but just returns.
>> 
>> AFAIU, v26 logs plan for each pg_log_query_plan() even when another
>> pg_log_query_plan() is running, but it doesn't cause errors or 
>> critical
>> problem.
>> 
>> Considering the problem solved and introduced by v27, I think v26 
>> might
>> be fine.
>> How do you think?
> 
> The testing I did with calling this during every CFI is what uncovered
> the re-entrancy problem. IIRC (without running that test again) the
> problem was a stack overflow. Now, to be sure this is a particularly
> degenerate case because in real-world usage it'd be impossible in
> practice, I think, to trigger that many calls to this function (and by
> extension the interrupt handler).

Yeah.In addition, currently only superusers are allowed to execute
pg_log_query_plan(), I think we don't need to think about cases
where users are malicious.

> If SIGINT is the only concern we could reset
> ProcessLogQueryPlanInterruptActive in error handling code. I admit
> that part of my thought process here is thinking ahead to an
> additional patch I'd like to see on top of this, which is logging a
> query plan before cleaning up when statement timeout occurs.

I remember this is what you wanted do.[1]

> The
> re-entrancy issue becomes more interesting then, I think, since we
> would then have automated calling of the logging code. BTW: I'd
> thought that would make a nice follow-up patch for this, but if you'd
> prefer I could add it as another patch in the series here.
> 
> What do you think about resetting the flag versus just not having it?

If I understand you correctly, adding the flag is not necessary for this
proposal.
To keep the patch simple, I prefer not having it.


[1] 

https://www.postgresql.org/message-id/flat/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com#b57432077f8045be8588049269f7a8dd

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

13 June 2023, 15:53:39

On Tue, Jun 13, 2023 at 11:22 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-06-13 00:52, James Coleman wrote:
> >>
> >> > I've attached v27. The important change here in 0001 is that it
> >> > guarantees the interrupt handler is re-entrant, since that was a bug
> >> > exposed by my testing. I've also included 0002 which is only meant for
> >> > testing (it attempts to log in the plan in every
> >> > CHECK_FOR_INTERRUPTS() call).
> >>
> >> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> >> ProcessLogQueryPlanInterruptActive can remain true.
> >> After that, pg_log_query_plan() does nothing but just returns.
> >>
> >> AFAIU, v26 logs plan for each pg_log_query_plan() even when another
> >> pg_log_query_plan() is running, but it doesn't cause errors or
> >> critical
> >> problem.
> >>
> >> Considering the problem solved and introduced by v27, I think v26
> >> might
> >> be fine.
> >> How do you think?
> >
> > The testing I did with calling this during every CFI is what uncovered
> > the re-entrancy problem. IIRC (without running that test again) the
> > problem was a stack overflow. Now, to be sure this is a particularly
> > degenerate case because in real-world usage it'd be impossible in
> > practice, I think, to trigger that many calls to this function (and by
> > extension the interrupt handler).
>
> Yeah.In addition, currently only superusers are allowed to execute
> pg_log_query_plan(), I think we don't need to think about cases
> where users are malicious.
>
> > If SIGINT is the only concern we could reset
> > ProcessLogQueryPlanInterruptActive in error handling code. I admit
> > that part of my thought process here is thinking ahead to an
> > additional patch I'd like to see on top of this, which is logging a
> > query plan before cleaning up when statement timeout occurs.
>
> I remember this is what you wanted do.[1]
>
> > The
> > re-entrancy issue becomes more interesting then, I think, since we
> > would then have automated calling of the logging code. BTW: I'd
> > thought that would make a nice follow-up patch for this, but if you'd
> > prefer I could add it as another patch in the series here.
> >
> > What do you think about resetting the flag versus just not having it?
>
> If I understand you correctly, adding the flag is not necessary for this
> proposal.
> To keep the patch simple, I prefer not having it.
>

I'm going to re-run tests with my patch version + resetting the flag
on SIGINT (and any other error condition) to be certain that the issue
you uncovered (where backends get stuck after a SIGINT not responding
to the requested plan logging) wasn't masking any other issues.

As long as that run is clean also then I believe the patch is safe
as-is even without the re-entrancy guard.

I'll report back with the results of that testing.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

14 June 2023, 16:48:52

On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> wrote:
>
> ...
> I'm going to re-run tests with my patch version + resetting the flag
> on SIGINT (and any other error condition) to be certain that the issue
> you uncovered (where backends get stuck after a SIGINT not responding
> to the requested plan logging) wasn't masking any other issues.
>
> As long as that run is clean also then I believe the patch is safe
> as-is even without the re-entrancy guard.
>
> I'll report back with the results of that testing.

The tests have been running since last night, but have been apparently
hung now for many hours. I haven't been able to fully look into it,
but so far I know the hung (100% CPU) backend last logged this:

2023-06-14 02:00:30.045 UTC client backend[84461]
pg_regress/updatable_views LOG:  query plan running on backend with
PID 84461 is:
        Query Text: SELECT table_name, column_name, is_updatable
          FROM information_schema.columns
         WHERE table_name LIKE E'r_\\_view%'
         ORDER BY table_name, ordinal_position;

The last output from the regression test harness was:

# parallel group (5 tests):  index_including create_view
index_including_gist create_index create_index_spgist
ok 66        + create_index                            36508 ms
ok 67        + create_index_spgist                     38588 ms
ok 68        + create_view                              1394 ms
ok 69        + index_including                           654 ms
ok 70        + index_including_gist                     1701 ms
# parallel group (16 tests):  errors create_cast drop_if_exists
create_aggregate roleattributes constraints hash_func typed_table
infinite_recurse

Attaching gdb to the hung backend shows this:

#0  0x00005601ab1f9529 in ProcLockWakeup
(lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
<default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
#1  0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
proclock=proclock@entry=0x7f5325d40d60,
lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
<default_lockmethod>,
    hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>)
at lock.c:1673
#2  0x00005601ab1e9e21 in LockRefindAndRelease
(lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
<default_lockmethod>, proc=<optimized out>,
locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1,
    decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
at lock.c:3150
#3  0x00005601ab1edb27 in LockReleaseAll
(lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
#4  0x00005601ab1f8599 in ProcReleaseLocks
(isCommit=isCommit@entry=true) at proc.c:781
#5  0x00005601ab37f1f4 in ResourceOwnerReleaseInternal
(owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
resowner.c:618
#6  0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>,
phase=phase@entry=RESOURCE_RELEASE_LOCKS,
isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
resowner.c:494
#7  0x00005601aaec1d84 in CommitTransaction () at xact.c:2334
#8  0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067
#9  0x00005601ab200a66 in finish_xact_command () at postgres.c:2783
#10 0x00005601ab20338f in exec_simple_query (
    query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
column_name, is_updatable\n  FROM information_schema.columns\n WHERE
table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name,
ordinal_position;") at postgres.c:1300

I am unable to connect to the regression test Postgres instance --
psql just hangs, so the lock seems to have affected the postmaster
also.

I'm wondering if this might represent a bug in the current patch.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

15 June 2023, 13:00:46

On 2023-06-15 01:48, James Coleman wrote:
> On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> 
> wrote:
>> 
>> ...
>> I'm going to re-run tests with my patch version + resetting the flag
>> on SIGINT (and any other error condition) to be certain that the issue
>> you uncovered (where backends get stuck after a SIGINT not responding
>> to the requested plan logging) wasn't masking any other issues.
>> 
>> As long as that run is clean also then I believe the patch is safe
>> as-is even without the re-entrancy guard.
>> 
>> I'll report back with the results of that testing.
> 
> The tests have been running since last night, but have been apparently
> hung now for many hours. I haven't been able to fully look into it,
> but so far I know the hung (100% CPU) backend last logged this:
> 
> 2023-06-14 02:00:30.045 UTC client backend[84461]
> pg_regress/updatable_views LOG:  query plan running on backend with
> PID 84461 is:
>         Query Text: SELECT table_name, column_name, is_updatable
>           FROM information_schema.columns
>          WHERE table_name LIKE E'r_\\_view%'
>          ORDER BY table_name, ordinal_position;
> 
> The last output from the regression test harness was:
> 
> # parallel group (5 tests):  index_including create_view
> index_including_gist create_index create_index_spgist
> ok 66        + create_index                            36508 ms
> ok 67        + create_index_spgist                     38588 ms
> ok 68        + create_view                              1394 ms
> ok 69        + index_including                           654 ms
> ok 70        + index_including_gist                     1701 ms
> # parallel group (16 tests):  errors create_cast drop_if_exists
> create_aggregate roleattributes constraints hash_func typed_table
> infinite_recurse
> 
> Attaching gdb to the hung backend shows this:
> 
> #0  0x00005601ab1f9529 in ProcLockWakeup
> (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
> #1  0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
> proclock=proclock@entry=0x7f5325d40d60,
> lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> <default_lockmethod>,
>     hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>)
> at lock.c:1673
> #2  0x00005601ab1e9e21 in LockRefindAndRelease
> (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> <default_lockmethod>, proc=<optimized out>,
> locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1,
>     
> decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
> at lock.c:3150
> #3  0x00005601ab1edb27 in LockReleaseAll
> (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
> #4  0x00005601ab1f8599 in ProcReleaseLocks
> (isCommit=isCommit@entry=true) at proc.c:781
> #5  0x00005601ab37f1f4 in ResourceOwnerReleaseInternal
> (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
> isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
> resowner.c:618
> #6  0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>,
> phase=phase@entry=RESOURCE_RELEASE_LOCKS,
> isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
> resowner.c:494
> #7  0x00005601aaec1d84 in CommitTransaction () at xact.c:2334
> #8  0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067
> #9  0x00005601ab200a66 in finish_xact_command () at postgres.c:2783
> #10 0x00005601ab20338f in exec_simple_query (
>     query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
> column_name, is_updatable\n  FROM information_schema.columns\n WHERE
> table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name,
> ordinal_position;") at postgres.c:1300
> 
> I am unable to connect to the regression test Postgres instance --
> psql just hangs, so the lock seems to have affected the postmaster
> also.
> 
> I'm wondering if this might represent a bug in the current patch.

Thanks for running and analyzing the test!

Could you share me how you are running the test?

I imagined something like below, but currently couldn't reproduce it.
- apply both v26-0001 and v27-0002 and build
- run PostgreSQL with default GUCssaaa
- make installcheck-world
- run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' 
during make installcheck-world

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

15 June 2023, 16:34:23

On Thu, Jun 15, 2023 at 9:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-06-15 01:48, James Coleman wrote:
> > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com>
> > wrote:
> >>
> >> ...
> >> I'm going to re-run tests with my patch version + resetting the flag
> >> on SIGINT (and any other error condition) to be certain that the issue
> >> you uncovered (where backends get stuck after a SIGINT not responding
> >> to the requested plan logging) wasn't masking any other issues.
> >>
> >> As long as that run is clean also then I believe the patch is safe
> >> as-is even without the re-entrancy guard.
> >>
> >> I'll report back with the results of that testing.
> >
> > The tests have been running since last night, but have been apparently
> > hung now for many hours. I haven't been able to fully look into it,
> > but so far I know the hung (100% CPU) backend last logged this:
> >
> > 2023-06-14 02:00:30.045 UTC client backend[84461]
> > pg_regress/updatable_views LOG:  query plan running on backend with
> > PID 84461 is:
> >         Query Text: SELECT table_name, column_name, is_updatable
> >           FROM information_schema.columns
> >          WHERE table_name LIKE E'r_\\_view%'
> >          ORDER BY table_name, ordinal_position;
> >
> > The last output from the regression test harness was:
> >
> > # parallel group (5 tests):  index_including create_view
> > index_including_gist create_index create_index_spgist
> > ok 66        + create_index                            36508 ms
> > ok 67        + create_index_spgist                     38588 ms
> > ok 68        + create_view                              1394 ms
> > ok 69        + index_including                           654 ms
> > ok 70        + index_including_gist                     1701 ms
> > # parallel group (16 tests):  errors create_cast drop_if_exists
> > create_aggregate roleattributes constraints hash_func typed_table
> > infinite_recurse
> >
> > Attaching gdb to the hung backend shows this:
> >
> > #0  0x00005601ab1f9529 in ProcLockWakeup
> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> > <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
> > #1  0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
> > proclock=proclock@entry=0x7f5325d40d60,
> > lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> > <default_lockmethod>,
> >     hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>)
> > at lock.c:1673
> > #2  0x00005601ab1e9e21 in LockRefindAndRelease
> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> > <default_lockmethod>, proc=<optimized out>,
> > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1,
> >
> > decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
> > at lock.c:3150
> > #3  0x00005601ab1edb27 in LockReleaseAll
> > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
> > #4  0x00005601ab1f8599 in ProcReleaseLocks
> > (isCommit=isCommit@entry=true) at proc.c:781
> > #5  0x00005601ab37f1f4 in ResourceOwnerReleaseInternal
> > (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
> > resowner.c:618
> > #6  0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>,
> > phase=phase@entry=RESOURCE_RELEASE_LOCKS,
> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
> > resowner.c:494
> > #7  0x00005601aaec1d84 in CommitTransaction () at xact.c:2334
> > #8  0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067
> > #9  0x00005601ab200a66 in finish_xact_command () at postgres.c:2783
> > #10 0x00005601ab20338f in exec_simple_query (
> >     query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
> > column_name, is_updatable\n  FROM information_schema.columns\n WHERE
> > table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name,
> > ordinal_position;") at postgres.c:1300
> >
> > I am unable to connect to the regression test Postgres instance --
> > psql just hangs, so the lock seems to have affected the postmaster
> > also.
> >
> > I'm wondering if this might represent a bug in the current patch.
>
> Thanks for running and analyzing the test!

Sure thing!

> Could you share me how you are running the test?
>
> I imagined something like below, but currently couldn't reproduce it.
> - apply both v26-0001 and v27-0002 and build
> - run PostgreSQL with default GUCssaaa
> - make installcheck-world
> - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1'
> during make installcheck-world

Apologies, I should have attached my updated patch (with the fix for
the bug you'd reporting with the re-entrancy guard). Attached is v28
which sets ProcessLogQueryPlanInterruptActive to false in errfinish
when necessary. Once built with those two patches I'm simply running
`make check`.

Regards,
James Coleman

Attachment

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

27 June 2023, 10:03:21

On 2023-06-16 01:34, James Coleman wrote:
> On Thu, Jun 15, 2023 at 9:00 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2023-06-15 01:48, James Coleman wrote:
>> > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com>
>> > wrote:
>> >>
>> >> ...
>> >> I'm going to re-run tests with my patch version + resetting the flag
>> >> on SIGINT (and any other error condition) to be certain that the issue
>> >> you uncovered (where backends get stuck after a SIGINT not responding
>> >> to the requested plan logging) wasn't masking any other issues.
>> >>
>> >> As long as that run is clean also then I believe the patch is safe
>> >> as-is even without the re-entrancy guard.
>> >>
>> >> I'll report back with the results of that testing.
>> >
>> > The tests have been running since last night, but have been apparently
>> > hung now for many hours. I haven't been able to fully look into it,
>> > but so far I know the hung (100% CPU) backend last logged this:
>> >
>> > 2023-06-14 02:00:30.045 UTC client backend[84461]
>> > pg_regress/updatable_views LOG:  query plan running on backend with
>> > PID 84461 is:
>> >         Query Text: SELECT table_name, column_name, is_updatable
>> >           FROM information_schema.columns
>> >          WHERE table_name LIKE E'r_\\_view%'
>> >          ORDER BY table_name, ordinal_position;
>> >
>> > The last output from the regression test harness was:
>> >
>> > # parallel group (5 tests):  index_including create_view
>> > index_including_gist create_index create_index_spgist
>> > ok 66        + create_index                            36508 ms
>> > ok 67        + create_index_spgist                     38588 ms
>> > ok 68        + create_view                              1394 ms
>> > ok 69        + index_including                           654 ms
>> > ok 70        + index_including_gist                     1701 ms
>> > # parallel group (16 tests):  errors create_cast drop_if_exists
>> > create_aggregate roleattributes constraints hash_func typed_table
>> > infinite_recurse
>> >
>> > Attaching gdb to the hung backend shows this:
>> >
>> > #0  0x00005601ab1f9529 in ProcLockWakeup
>> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
>> > <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
>> > #1  0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
>> > proclock=proclock@entry=0x7f5325d40d60,
>> > lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
>> > <default_lockmethod>,
>> >     hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>)
>> > at lock.c:1673
>> > #2  0x00005601ab1e9e21 in LockRefindAndRelease
>> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
>> > <default_lockmethod>, proc=<optimized out>,
>> > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1,
>> >
>> > decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
>> > at lock.c:3150
>> > #3  0x00005601ab1edb27 in LockReleaseAll
>> > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
>> > #4  0x00005601ab1f8599 in ProcReleaseLocks
>> > (isCommit=isCommit@entry=true) at proc.c:781
>> > #5  0x00005601ab37f1f4 in ResourceOwnerReleaseInternal
>> > (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
>> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
>> > resowner.c:618
>> > #6  0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>,
>> > phase=phase@entry=RESOURCE_RELEASE_LOCKS,
>> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
>> > resowner.c:494
>> > #7  0x00005601aaec1d84 in CommitTransaction () at xact.c:2334
>> > #8  0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067
>> > #9  0x00005601ab200a66 in finish_xact_command () at postgres.c:2783
>> > #10 0x00005601ab20338f in exec_simple_query (
>> >     query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
>> > column_name, is_updatable\n  FROM information_schema.columns\n WHERE
>> > table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name,
>> > ordinal_position;") at postgres.c:1300
>> >
>> > I am unable to connect to the regression test Postgres instance --
>> > psql just hangs, so the lock seems to have affected the postmaster
>> > also.
>> >
>> > I'm wondering if this might represent a bug in the current patch.
>> 
>> Thanks for running and analyzing the test!
> 
> Sure thing!
> 
>> Could you share me how you are running the test?
>> 
>> I imagined something like below, but currently couldn't reproduce it.
>> - apply both v26-0001 and v27-0002 and build
>> - run PostgreSQL with default GUCssaaa
>> - make installcheck-world
>> - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1'
>> during make installcheck-world
> 
> Apologies, I should have attached my updated patch (with the fix for
> the bug you'd reporting with the re-entrancy guard). Attached is v28
> which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> when necessary. Once built with those two patches I'm simply running
> `make check`.

Thanks!

However, I haven't succeeded in reproducing the problem as below:

> The tests have been running since last night, but have been apparently
> hung now for many hours. I haven't been able to fully look into it,
> but so far I know the hung (100% CPU) backend last logged this:

Did you do something like this?

   $ ./configure --prefix=/home/ubuntu/pgsql/master --enable-cassert
   $ git apply 
../patch/v28-0001-Add-function-to-log-the-plan-of-the-query.patch 
../patch/v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
   $ make
   $ make check

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

17 August 2023, 14:02:22

On 2023-06-16 01:34, James Coleman wrote:
> Attached is v28
> which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> when necessary. Once built with those two patches I'm simply running
> `make check`.

With v28-0001 and v28-0002 patch, I confirmed backend processes consume 
huge
amount of memory and under some environments they were terminated by OOM
killer.

This was because memory was allocated from existing memory contexts and 
they
were not freed after ProcessLogQueryPlanInterrupt().
Updated the patch to use dedicated memory context for
ProcessLogQueryPlanInterrupt().

Applying attached patch and v28-0002 patch, `make check` successfully
completed after 20min and 50GB of logs on my environment.

>>> On 2023-06-15 01:48, James Coleman wrote:
>>> > The tests have been running since last night, but have been apparently
>>> > hung now for many hours.

I don't know if this has anything to do with the hung you faced, but I 
thought
it might be possible that the large amount of memory usage resulted in
swapping, which caused a significant delay in processing.

If possible, I would be very grateful if you could try to reproduce this 
with
the v29 patch.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Attachment

v29-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

25 August 2023, 11:43:26

On Thu, Aug 17, 2023 at 10:02 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-06-16 01:34, James Coleman wrote:
> > Attached is v28
> > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> > when necessary. Once built with those two patches I'm simply running
> > `make check`.
>
> With v28-0001 and v28-0002 patch, I confirmed backend processes consume
> huge
> amount of memory and under some environments they were terminated by OOM
> killer.
>
> This was because memory was allocated from existing memory contexts and
> they
> were not freed after ProcessLogQueryPlanInterrupt().
> Updated the patch to use dedicated memory context for
> ProcessLogQueryPlanInterrupt().
>
> Applying attached patch and v28-0002 patch, `make check` successfully
> completed after 20min and 50GB of logs on my environment.
>
> >>> On 2023-06-15 01:48, James Coleman wrote:
> >>> > The tests have been running since last night, but have been apparently
> >>> > hung now for many hours.
>
> I don't know if this has anything to do with the hung you faced, but I
> thought
> it might be possible that the large amount of memory usage resulted in
> swapping, which caused a significant delay in processing.

Ah, yes, I think that could be a possible explanation. I was delaying
on this thread because I wasn't comfortable with having caused an
issue once (even if I couldn't easily reproduce) without at least some
theory as to the cause (and a fix).

> If possible, I would be very grateful if you could try to reproduce this
> with
> the v29 patch.

I'll kick off some testing.

Thanks,
James Coleman

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

26 August 2023, 12:03:26

On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Thu, Aug 17, 2023 at 10:02 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > On 2023-06-16 01:34, James Coleman wrote:
> > > Attached is v28
> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> > > when necessary. Once built with those two patches I'm simply running
> > > `make check`.
> >
> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
> > huge
> > amount of memory and under some environments they were terminated by OOM
> > killer.
> >
> > This was because memory was allocated from existing memory contexts and
> > they
> > were not freed after ProcessLogQueryPlanInterrupt().
> > Updated the patch to use dedicated memory context for
> > ProcessLogQueryPlanInterrupt().
> >
> > Applying attached patch and v28-0002 patch, `make check` successfully
> > completed after 20min and 50GB of logs on my environment.
> >
> > >>> On 2023-06-15 01:48, James Coleman wrote:
> > >>> > The tests have been running since last night, but have been apparently
> > >>> > hung now for many hours.
> >
> > I don't know if this has anything to do with the hung you faced, but I
> > thought
> > it might be possible that the large amount of memory usage resulted in
> > swapping, which caused a significant delay in processing.
>
> Ah, yes, I think that could be a possible explanation. I was delaying
> on this thread because I wasn't comfortable with having caused an
> issue once (even if I couldn't easily reproduce) without at least some
> theory as to the cause (and a fix).
>
> > If possible, I would be very grateful if you could try to reproduce this
> > with
> > the v29 patch.
>
> I'll kick off some testing.
>

I don't have time to investigate what's happening here, but 24 hours
later the first "make check" is still running, and at first glance it
seems to have the same behavior I'd seen that first time. The test
output is to this point:

# parallel group (5 tests):  index_including create_view
index_including_gist create_index create_index_spgist
ok 66        + create_index                            26365 ms
ok 67        + create_index_spgist                     27675 ms
ok 68        + create_view                              1235 ms
ok 69        + index_including                          1102 ms
ok 70        + index_including_gist                     1633 ms
# parallel group (16 tests):  create_aggregate create_cast errors
roleattributes drop_if_exists hash_func typed_table create_am
infinite_recurse

and it hasn't progressed past that point since at least ~16 hours ago
(the first several hours of the run I wasn't monitoring it).

I haven't connected up gdb yet, and won't be able to until maybe
tomorrow, but here's the ps output for postgres processes that are
running:

admin    3213249  0.0  0.0 196824 20552 ?        Ss   Aug25   0:00
/home/admin/postgresql-test/bin/postgres -D
/home/admin/postgresql-test-data
admin    3213250  0.0  0.0 196964  7284 ?        Ss   Aug25   0:00
postgres: checkpointer
admin    3213251  0.0  0.0 196956  4276 ?        Ss   Aug25   0:00
postgres: background writer
admin    3213253  0.0  0.0 196956  8600 ?        Ss   Aug25   0:00
postgres: walwriter
admin    3213254  0.0  0.0 198424  7316 ?        Ss   Aug25   0:00
postgres: autovacuum launcher
admin    3213255  0.0  0.0 198412  5488 ?        Ss   Aug25   0:00
postgres: logical replication launcher
admin    3237967  0.0  0.0   2484   516 pts/4    S+   Aug25   0:00
/bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir=     --dlpath=. --max-concurrent-tests=20
--schedule=./parallel_schedule
admin    3237973  0.0  0.0 197880 20688 pts/4    S+   Aug25   0:00
postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-7mmGUa
admin    3237976  0.0  0.1 198332 44608 ?        Ss   Aug25   0:00
postgres: checkpointer
admin    3237977  0.0  0.0 198032  4640 ?        Ss   Aug25   0:00
postgres: background writer
admin    3237979  0.0  0.0 197880  8580 ?        Ss   Aug25   0:00
postgres: walwriter
admin    3237980  0.0  0.0 199484  7608 ?        Ss   Aug25   0:00
postgres: autovacuum launcher
admin    3237981  0.0  0.0 199460  5488 ?        Ss   Aug25   0:00
postgres: logical replication launcher
admin    3243644  0.0  0.2 252400 74656 ?        Ss   Aug25   0:01
postgres: admin regression [local] SELECT waiting
admin    3243645  0.0  0.1 205480 33992 ?        Ss   Aug25   0:00
postgres: admin regression [local] SELECT waiting
admin    3243654 99.9  0.0 203156 31504 ?        Rs   Aug25 1437:49
postgres: admin regression [local] VACUUM
admin    3243655  0.0  0.1 212036 38504 ?        Ss   Aug25   0:00
postgres: admin regression [local] SELECT waiting
admin    3243656  0.0  0.0 206024 30892 ?        Ss   Aug25   0:00
postgres: admin regression [local] DELETE waiting
admin    3243657  0.0  0.1 205568 32232 ?        Ss   Aug25   0:00
postgres: admin regression [local] ALTER TABLE waiting
admin    3243658  0.0  0.0 203740 21532 ?        Ss   Aug25   0:00
postgres: admin regression [local] ANALYZE waiting
admin    3243798  0.0  0.0 199884  8464 ?        Ss   Aug25   0:00
postgres: autovacuum worker
admin    3244733  0.0  0.0 199492  5956 ?        Ss   Aug25   0:00
postgres: autovacuum worker
admin    3245652  0.0  0.0 199884  8468 ?        Ss   Aug25   0:00
postgres: autovacuum worker

As you can see there are a bunch of backends presumably waiting, and
also the VACUUM process has been pegging a single CPU core for at
least since that ~16 hour ago mark.

I hope to be able to do more investigation later, but I wanted to at
least give you this information now.

James Coleman

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

28 August 2023, 07:01:53

On 2023-08-26 21:03, James Coleman wrote:
> On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote:
>> 
>> On Thu, Aug 17, 2023 at 10:02 AM torikoshia 
>> <torikoshia@oss.nttdata.com> wrote:
>> >
>> > On 2023-06-16 01:34, James Coleman wrote:
>> > > Attached is v28
>> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
>> > > when necessary. Once built with those two patches I'm simply running
>> > > `make check`.
>> >
>> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
>> > huge
>> > amount of memory and under some environments they were terminated by OOM
>> > killer.
>> >
>> > This was because memory was allocated from existing memory contexts and
>> > they
>> > were not freed after ProcessLogQueryPlanInterrupt().
>> > Updated the patch to use dedicated memory context for
>> > ProcessLogQueryPlanInterrupt().
>> >
>> > Applying attached patch and v28-0002 patch, `make check` successfully
>> > completed after 20min and 50GB of logs on my environment.
>> >
>> > >>> On 2023-06-15 01:48, James Coleman wrote:
>> > >>> > The tests have been running since last night, but have been apparently
>> > >>> > hung now for many hours.
>> >
>> > I don't know if this has anything to do with the hung you faced, but I
>> > thought
>> > it might be possible that the large amount of memory usage resulted in
>> > swapping, which caused a significant delay in processing.
>> 
>> Ah, yes, I think that could be a possible explanation. I was delaying
>> on this thread because I wasn't comfortable with having caused an
>> issue once (even if I couldn't easily reproduce) without at least some
>> theory as to the cause (and a fix).
>> 
>> > If possible, I would be very grateful if you could try to reproduce this
>> > with
>> > the v29 patch.
>> 
>> I'll kick off some testing.
>> 
> 
> I don't have time to investigate what's happening here, but 24 hours
> later the first "make check" is still running, and at first glance it
> seems to have the same behavior I'd seen that first time. The test
> output is to this point:
> 
> # parallel group (5 tests):  index_including create_view
> index_including_gist create_index create_index_spgist
> ok 66        + create_index                            26365 ms
> ok 67        + create_index_spgist                     27675 ms
> ok 68        + create_view                              1235 ms
> ok 69        + index_including                          1102 ms
> ok 70        + index_including_gist                     1633 ms
> # parallel group (16 tests):  create_aggregate create_cast errors
> roleattributes drop_if_exists hash_func typed_table create_am
> infinite_recurse
> 
> and it hasn't progressed past that point since at least ~16 hours ago
> (the first several hours of the run I wasn't monitoring it).
> 
> I haven't connected up gdb yet, and won't be able to until maybe
> tomorrow, but here's the ps output for postgres processes that are
> running:
> 
> admin    3213249  0.0  0.0 196824 20552 ?        Ss   Aug25   0:00
> /home/admin/postgresql-test/bin/postgres -D
> /home/admin/postgresql-test-data
> admin    3213250  0.0  0.0 196964  7284 ?        Ss   Aug25   0:00
> postgres: checkpointer
> admin    3213251  0.0  0.0 196956  4276 ?        Ss   Aug25   0:00
> postgres: background writer
> admin    3213253  0.0  0.0 196956  8600 ?        Ss   Aug25   0:00
> postgres: walwriter
> admin    3213254  0.0  0.0 198424  7316 ?        Ss   Aug25   0:00
> postgres: autovacuum launcher
> admin    3213255  0.0  0.0 198412  5488 ?        Ss   Aug25   0:00
> postgres: logical replication launcher
> admin    3237967  0.0  0.0   2484   516 pts/4    S+   Aug25   0:00
> /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
> PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
> LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
> INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
> ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
> --inputdir=. --bindir=     --dlpath=. --max-concurrent-tests=20
> --schedule=./parallel_schedule
> admin    3237973  0.0  0.0 197880 20688 pts/4    S+   Aug25   0:00
> postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
> listen_addresses= -k /tmp/pg_regress-7mmGUa
> admin    3237976  0.0  0.1 198332 44608 ?        Ss   Aug25   0:00
> postgres: checkpointer
> admin    3237977  0.0  0.0 198032  4640 ?        Ss   Aug25   0:00
> postgres: background writer
> admin    3237979  0.0  0.0 197880  8580 ?        Ss   Aug25   0:00
> postgres: walwriter
> admin    3237980  0.0  0.0 199484  7608 ?        Ss   Aug25   0:00
> postgres: autovacuum launcher
> admin    3237981  0.0  0.0 199460  5488 ?        Ss   Aug25   0:00
> postgres: logical replication launcher
> admin    3243644  0.0  0.2 252400 74656 ?        Ss   Aug25   0:01
> postgres: admin regression [local] SELECT waiting
> admin    3243645  0.0  0.1 205480 33992 ?        Ss   Aug25   0:00
> postgres: admin regression [local] SELECT waiting
> admin    3243654 99.9  0.0 203156 31504 ?        Rs   Aug25 1437:49
> postgres: admin regression [local] VACUUM
> admin    3243655  0.0  0.1 212036 38504 ?        Ss   Aug25   0:00
> postgres: admin regression [local] SELECT waiting
> admin    3243656  0.0  0.0 206024 30892 ?        Ss   Aug25   0:00
> postgres: admin regression [local] DELETE waiting
> admin    3243657  0.0  0.1 205568 32232 ?        Ss   Aug25   0:00
> postgres: admin regression [local] ALTER TABLE waiting
> admin    3243658  0.0  0.0 203740 21532 ?        Ss   Aug25   0:00
> postgres: admin regression [local] ANALYZE waiting
> admin    3243798  0.0  0.0 199884  8464 ?        Ss   Aug25   0:00
> postgres: autovacuum worker
> admin    3244733  0.0  0.0 199492  5956 ?        Ss   Aug25   0:00
> postgres: autovacuum worker
> admin    3245652  0.0  0.0 199884  8468 ?        Ss   Aug25   0:00
> postgres: autovacuum worker
> 
> As you can see there are a bunch of backends presumably waiting, and
> also the VACUUM process has been pegging a single CPU core for at
> least since that ~16 hour ago mark.
> 
> I hope to be able to do more investigation later, but I wanted to at
> least give you this information now.

Thanks a lot for testing the patch!
I really appreciate your cooperation.

Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu 
22.04 and macOS, but unfortunately(fortunately?) they succeeded as 
below:

```
$ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch
$ git apply v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
$ ./configure --enable-debug --enable-cassert
$ make
$ make check

...(snip)...

# parallel group (5 tests):  index_including index_including_gist 
create_view create_index create_index_spgist
ok 66        + create_index                            25033 ms
ok 67        + create_index_spgist                     26144 ms
ok 68        + create_view                              3061 ms
ok 69        + index_including                           976 ms
ok 70        + index_including_gist                     2998 ms
# parallel group (16 tests):  create_cast errors create_aggregate 
roleattributes drop_if_exists hash_func typed_table
create_am select constraints updatable_views inherit triggers vacuum 
create_function_sql infinite_recurse
ok 71        + create_aggregate                          225 ms
ok 72        + create_function_sql                     18874 ms
ok 73        + create_cast                               168 ms

...(snip)...

# All 215 tests passed.
```

If you notice any difference, I would be grateful if you could let me 
know.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

28 August 2023, 13:47:19

On Mon, Aug 28, 2023 at 3:01 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-08-26 21:03, James Coleman wrote:
> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote:
> >>
> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia
> >> <torikoshia@oss.nttdata.com> wrote:
> >> >
> >> > On 2023-06-16 01:34, James Coleman wrote:
> >> > > Attached is v28
> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> >> > > when necessary. Once built with those two patches I'm simply running
> >> > > `make check`.
> >> >
> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
> >> > huge
> >> > amount of memory and under some environments they were terminated by OOM
> >> > killer.
> >> >
> >> > This was because memory was allocated from existing memory contexts and
> >> > they
> >> > were not freed after ProcessLogQueryPlanInterrupt().
> >> > Updated the patch to use dedicated memory context for
> >> > ProcessLogQueryPlanInterrupt().
> >> >
> >> > Applying attached patch and v28-0002 patch, `make check` successfully
> >> > completed after 20min and 50GB of logs on my environment.
> >> >
> >> > >>> On 2023-06-15 01:48, James Coleman wrote:
> >> > >>> > The tests have been running since last night, but have been apparently
> >> > >>> > hung now for many hours.
> >> >
> >> > I don't know if this has anything to do with the hung you faced, but I
> >> > thought
> >> > it might be possible that the large amount of memory usage resulted in
> >> > swapping, which caused a significant delay in processing.
> >>
> >> Ah, yes, I think that could be a possible explanation. I was delaying
> >> on this thread because I wasn't comfortable with having caused an
> >> issue once (even if I couldn't easily reproduce) without at least some
> >> theory as to the cause (and a fix).
> >>
> >> > If possible, I would be very grateful if you could try to reproduce this
> >> > with
> >> > the v29 patch.
> >>
> >> I'll kick off some testing.
> >>
> >
> > I don't have time to investigate what's happening here, but 24 hours
> > later the first "make check" is still running, and at first glance it
> > seems to have the same behavior I'd seen that first time. The test
> > output is to this point:
> >
> > # parallel group (5 tests):  index_including create_view
> > index_including_gist create_index create_index_spgist
> > ok 66        + create_index                            26365 ms
> > ok 67        + create_index_spgist                     27675 ms
> > ok 68        + create_view                              1235 ms
> > ok 69        + index_including                          1102 ms
> > ok 70        + index_including_gist                     1633 ms
> > # parallel group (16 tests):  create_aggregate create_cast errors
> > roleattributes drop_if_exists hash_func typed_table create_am
> > infinite_recurse
> >
> > and it hasn't progressed past that point since at least ~16 hours ago
> > (the first several hours of the run I wasn't monitoring it).
> >
> > I haven't connected up gdb yet, and won't be able to until maybe
> > tomorrow, but here's the ps output for postgres processes that are
> > running:
> >
> > admin    3213249  0.0  0.0 196824 20552 ?        Ss   Aug25   0:00
> > /home/admin/postgresql-test/bin/postgres -D
> > /home/admin/postgresql-test-data
> > admin    3213250  0.0  0.0 196964  7284 ?        Ss   Aug25   0:00
> > postgres: checkpointer
> > admin    3213251  0.0  0.0 196956  4276 ?        Ss   Aug25   0:00
> > postgres: background writer
> > admin    3213253  0.0  0.0 196956  8600 ?        Ss   Aug25   0:00
> > postgres: walwriter
> > admin    3213254  0.0  0.0 198424  7316 ?        Ss   Aug25   0:00
> > postgres: autovacuum launcher
> > admin    3213255  0.0  0.0 198412  5488 ?        Ss   Aug25   0:00
> > postgres: logical replication launcher
> > admin    3237967  0.0  0.0   2484   516 pts/4    S+   Aug25   0:00
> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
> > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
> > --inputdir=. --bindir=     --dlpath=. --max-concurrent-tests=20
> > --schedule=./parallel_schedule
> > admin    3237973  0.0  0.0 197880 20688 pts/4    S+   Aug25   0:00
> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
> > listen_addresses= -k /tmp/pg_regress-7mmGUa
> > admin    3237976  0.0  0.1 198332 44608 ?        Ss   Aug25   0:00
> > postgres: checkpointer
> > admin    3237977  0.0  0.0 198032  4640 ?        Ss   Aug25   0:00
> > postgres: background writer
> > admin    3237979  0.0  0.0 197880  8580 ?        Ss   Aug25   0:00
> > postgres: walwriter
> > admin    3237980  0.0  0.0 199484  7608 ?        Ss   Aug25   0:00
> > postgres: autovacuum launcher
> > admin    3237981  0.0  0.0 199460  5488 ?        Ss   Aug25   0:00
> > postgres: logical replication launcher
> > admin    3243644  0.0  0.2 252400 74656 ?        Ss   Aug25   0:01
> > postgres: admin regression [local] SELECT waiting
> > admin    3243645  0.0  0.1 205480 33992 ?        Ss   Aug25   0:00
> > postgres: admin regression [local] SELECT waiting
> > admin    3243654 99.9  0.0 203156 31504 ?        Rs   Aug25 1437:49
> > postgres: admin regression [local] VACUUM
> > admin    3243655  0.0  0.1 212036 38504 ?        Ss   Aug25   0:00
> > postgres: admin regression [local] SELECT waiting
> > admin    3243656  0.0  0.0 206024 30892 ?        Ss   Aug25   0:00
> > postgres: admin regression [local] DELETE waiting
> > admin    3243657  0.0  0.1 205568 32232 ?        Ss   Aug25   0:00
> > postgres: admin regression [local] ALTER TABLE waiting
> > admin    3243658  0.0  0.0 203740 21532 ?        Ss   Aug25   0:00
> > postgres: admin regression [local] ANALYZE waiting
> > admin    3243798  0.0  0.0 199884  8464 ?        Ss   Aug25   0:00
> > postgres: autovacuum worker
> > admin    3244733  0.0  0.0 199492  5956 ?        Ss   Aug25   0:00
> > postgres: autovacuum worker
> > admin    3245652  0.0  0.0 199884  8468 ?        Ss   Aug25   0:00
> > postgres: autovacuum worker
> >
> > As you can see there are a bunch of backends presumably waiting, and
> > also the VACUUM process has been pegging a single CPU core for at
> > least since that ~16 hour ago mark.
> >
> > I hope to be able to do more investigation later, but I wanted to at
> > least give you this information now.
>
> Thanks a lot for testing the patch!
> I really appreciate your cooperation.
>
> Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu
> 22.04 and macOS, but unfortunately(fortunately?) they succeeded as
> below:
>
> ```
> $ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch
> $ git apply v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
> $ ./configure --enable-debug --enable-cassert
> $ make
> $ make check
>
> ...(snip)...
>
> # parallel group (5 tests):  index_including index_including_gist
> create_view create_index create_index_spgist
> ok 66        + create_index                            25033 ms
> ok 67        + create_index_spgist                     26144 ms
> ok 68        + create_view                              3061 ms
> ok 69        + index_including                           976 ms
> ok 70        + index_including_gist                     2998 ms
> # parallel group (16 tests):  create_cast errors create_aggregate
> roleattributes drop_if_exists hash_func typed_table
> create_am select constraints updatable_views inherit triggers vacuum
> create_function_sql infinite_recurse
> ok 71        + create_aggregate                          225 ms
> ok 72        + create_function_sql                     18874 ms
> ok 73        + create_cast                               168 ms
>
> ...(snip)...
>
> # All 215 tests passed.
> ```
>
> If you notice any difference, I would be grateful if you could let me
> know.

I've never been able to reproduce it (haven't tested the new version,
but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
(first buster and now bullseye).

I'm attaching several stacktraces in the hope that they provide some
clues. These all match the ps output I sent earlier, though note in
that output there is both the regress instance and my test instance
(pid 3213249) running (different ports, of course, and they are from
the exact same compilation run). I've attached ps output for the
postgres processes under the make check process to simplify cross
referencing.

A few interesting things:
- There's definitely a lock on a relation that seems to be what's
blocking the processes.
- When I try to connect with psql the process forks but then hangs
(see the ps output with task names stuck in "authentication"). I've
also included a trace from one of these.

If you think a core file for any of these processes would be helpful
for debugging I'd be happy to try to figure out a way to get that to
you.

Regards,
James Coleman

Attachment

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

05 September 2023, 13:58:59

On 2023-08-28 22:47, James Coleman wrote:
> On Mon, Aug 28, 2023 at 3:01 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2023-08-26 21:03, James Coleman wrote:
>> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote:
>> >>
>> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia
>> >> <torikoshia@oss.nttdata.com> wrote:
>> >> >
>> >> > On 2023-06-16 01:34, James Coleman wrote:
>> >> > > Attached is v28
>> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
>> >> > > when necessary. Once built with those two patches I'm simply running
>> >> > > `make check`.
>> >> >
>> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
>> >> > huge
>> >> > amount of memory and under some environments they were terminated by OOM
>> >> > killer.
>> >> >
>> >> > This was because memory was allocated from existing memory contexts and
>> >> > they
>> >> > were not freed after ProcessLogQueryPlanInterrupt().
>> >> > Updated the patch to use dedicated memory context for
>> >> > ProcessLogQueryPlanInterrupt().
>> >> >
>> >> > Applying attached patch and v28-0002 patch, `make check` successfully
>> >> > completed after 20min and 50GB of logs on my environment.
>> >> >
>> >> > >>> On 2023-06-15 01:48, James Coleman wrote:
>> >> > >>> > The tests have been running since last night, but have been apparently
>> >> > >>> > hung now for many hours.
>> >> >
>> >> > I don't know if this has anything to do with the hung you faced, but I
>> >> > thought
>> >> > it might be possible that the large amount of memory usage resulted in
>> >> > swapping, which caused a significant delay in processing.
>> >>
>> >> Ah, yes, I think that could be a possible explanation. I was delaying
>> >> on this thread because I wasn't comfortable with having caused an
>> >> issue once (even if I couldn't easily reproduce) without at least some
>> >> theory as to the cause (and a fix).
>> >>
>> >> > If possible, I would be very grateful if you could try to reproduce this
>> >> > with
>> >> > the v29 patch.
>> >>
>> >> I'll kick off some testing.
>> >>
>> >
>> > I don't have time to investigate what's happening here, but 24 hours
>> > later the first "make check" is still running, and at first glance it
>> > seems to have the same behavior I'd seen that first time. The test
>> > output is to this point:
>> >
>> > # parallel group (5 tests):  index_including create_view
>> > index_including_gist create_index create_index_spgist
>> > ok 66        + create_index                            26365 ms
>> > ok 67        + create_index_spgist                     27675 ms
>> > ok 68        + create_view                              1235 ms
>> > ok 69        + index_including                          1102 ms
>> > ok 70        + index_including_gist                     1633 ms
>> > # parallel group (16 tests):  create_aggregate create_cast errors
>> > roleattributes drop_if_exists hash_func typed_table create_am
>> > infinite_recurse
>> >
>> > and it hasn't progressed past that point since at least ~16 hours ago
>> > (the first several hours of the run I wasn't monitoring it).
>> >
>> > I haven't connected up gdb yet, and won't be able to until maybe
>> > tomorrow, but here's the ps output for postgres processes that are
>> > running:
>> >
>> > admin    3213249  0.0  0.0 196824 20552 ?        Ss   Aug25   0:00
>> > /home/admin/postgresql-test/bin/postgres -D
>> > /home/admin/postgresql-test-data
>> > admin    3213250  0.0  0.0 196964  7284 ?        Ss   Aug25   0:00
>> > postgres: checkpointer
>> > admin    3213251  0.0  0.0 196956  4276 ?        Ss   Aug25   0:00
>> > postgres: background writer
>> > admin    3213253  0.0  0.0 196956  8600 ?        Ss   Aug25   0:00
>> > postgres: walwriter
>> > admin    3213254  0.0  0.0 198424  7316 ?        Ss   Aug25   0:00
>> > postgres: autovacuum launcher
>> > admin    3213255  0.0  0.0 198412  5488 ?        Ss   Aug25   0:00
>> > postgres: logical replication launcher
>> > admin    3237967  0.0  0.0   2484   516 pts/4    S+   Aug25   0:00
>> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
>> >
PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
>> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
>> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
>> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
>> > --inputdir=. --bindir=     --dlpath=. --max-concurrent-tests=20
>> > --schedule=./parallel_schedule
>> > admin    3237973  0.0  0.0 197880 20688 pts/4    S+   Aug25   0:00
>> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
>> > listen_addresses= -k /tmp/pg_regress-7mmGUa
>> > admin    3237976  0.0  0.1 198332 44608 ?        Ss   Aug25   0:00
>> > postgres: checkpointer
>> > admin    3237977  0.0  0.0 198032  4640 ?        Ss   Aug25   0:00
>> > postgres: background writer
>> > admin    3237979  0.0  0.0 197880  8580 ?        Ss   Aug25   0:00
>> > postgres: walwriter
>> > admin    3237980  0.0  0.0 199484  7608 ?        Ss   Aug25   0:00
>> > postgres: autovacuum launcher
>> > admin    3237981  0.0  0.0 199460  5488 ?        Ss   Aug25   0:00
>> > postgres: logical replication launcher
>> > admin    3243644  0.0  0.2 252400 74656 ?        Ss   Aug25   0:01
>> > postgres: admin regression [local] SELECT waiting
>> > admin    3243645  0.0  0.1 205480 33992 ?        Ss   Aug25   0:00
>> > postgres: admin regression [local] SELECT waiting
>> > admin    3243654 99.9  0.0 203156 31504 ?        Rs   Aug25 1437:49
>> > postgres: admin regression [local] VACUUM
>> > admin    3243655  0.0  0.1 212036 38504 ?        Ss   Aug25   0:00
>> > postgres: admin regression [local] SELECT waiting
>> > admin    3243656  0.0  0.0 206024 30892 ?        Ss   Aug25   0:00
>> > postgres: admin regression [local] DELETE waiting
>> > admin    3243657  0.0  0.1 205568 32232 ?        Ss   Aug25   0:00
>> > postgres: admin regression [local] ALTER TABLE waiting
>> > admin    3243658  0.0  0.0 203740 21532 ?        Ss   Aug25   0:00
>> > postgres: admin regression [local] ANALYZE waiting
>> > admin    3243798  0.0  0.0 199884  8464 ?        Ss   Aug25   0:00
>> > postgres: autovacuum worker
>> > admin    3244733  0.0  0.0 199492  5956 ?        Ss   Aug25   0:00
>> > postgres: autovacuum worker
>> > admin    3245652  0.0  0.0 199884  8468 ?        Ss   Aug25   0:00
>> > postgres: autovacuum worker
>> >
>> > As you can see there are a bunch of backends presumably waiting, and
>> > also the VACUUM process has been pegging a single CPU core for at
>> > least since that ~16 hour ago mark.
>> >
>> > I hope to be able to do more investigation later, but I wanted to at
>> > least give you this information now.
>> 
>> Thanks a lot for testing the patch!
>> I really appreciate your cooperation.
>> 
>> Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu
>> 22.04 and macOS, but unfortunately(fortunately?) they succeeded as
>> below:
>> 
>> ```
>> $ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch
>> $ git apply 
>> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
>> $ ./configure --enable-debug --enable-cassert
>> $ make
>> $ make check
>> 
>> ...(snip)...
>> 
>> # parallel group (5 tests):  index_including index_including_gist
>> create_view create_index create_index_spgist
>> ok 66        + create_index                            25033 ms
>> ok 67        + create_index_spgist                     26144 ms
>> ok 68        + create_view                              3061 ms
>> ok 69        + index_including                           976 ms
>> ok 70        + index_including_gist                     2998 ms
>> # parallel group (16 tests):  create_cast errors create_aggregate
>> roleattributes drop_if_exists hash_func typed_table
>> create_am select constraints updatable_views inherit triggers vacuum
>> create_function_sql infinite_recurse
>> ok 71        + create_aggregate                          225 ms
>> ok 72        + create_function_sql                     18874 ms
>> ok 73        + create_cast                               168 ms
>> 
>> ...(snip)...
>> 
>> # All 215 tests passed.
>> ```
>> 
>> If you notice any difference, I would be grateful if you could let me
>> know.
> 
> I've never been able to reproduce it (haven't tested the new version,
> but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
> (first buster and now bullseye).
> 
> I'm attaching several stacktraces in the hope that they provide some
> clues. These all match the ps output I sent earlier, though note in
> that output there is both the regress instance and my test instance
> (pid 3213249) running (different ports, of course, and they are from
> the exact same compilation run). I've attached ps output for the
> postgres processes under the make check process to simplify cross
> referencing.
> 
> A few interesting things:
> - There's definitely a lock on a relation that seems to be what's
> blocking the processes.
> - When I try to connect with psql the process forks but then hangs
> (see the ps output with task names stuck in "authentication"). I've
> also included a trace from one of these.

Thanks for sharing them!

Many processes are waiting to acquire the LW lock, including the process 
trying to output the plan(select1.trace).

I suspect that this is due to a lock that was acquired prior to being 
interrupted by ProcessLogQueryPlanInterrupt(), but have not been able to 
reproduce the same situation..

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

06 September 2023, 02:17:08

On Tue, Sep 5, 2023 at 9:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-08-28 22:47, James Coleman wrote:
> > On Mon, Aug 28, 2023 at 3:01 AM torikoshia <torikoshia@oss.nttdata.com>
> > wrote:
> >>
> >> On 2023-08-26 21:03, James Coleman wrote:
> >> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote:
> >> >>
> >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia
> >> >> <torikoshia@oss.nttdata.com> wrote:
> >> >> >
> >> >> > On 2023-06-16 01:34, James Coleman wrote:
> >> >> > > Attached is v28
> >> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> >> >> > > when necessary. Once built with those two patches I'm simply running
> >> >> > > `make check`.
> >> >> >
> >> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
> >> >> > huge
> >> >> > amount of memory and under some environments they were terminated by OOM
> >> >> > killer.
> >> >> >
> >> >> > This was because memory was allocated from existing memory contexts and
> >> >> > they
> >> >> > were not freed after ProcessLogQueryPlanInterrupt().
> >> >> > Updated the patch to use dedicated memory context for
> >> >> > ProcessLogQueryPlanInterrupt().
> >> >> >
> >> >> > Applying attached patch and v28-0002 patch, `make check` successfully
> >> >> > completed after 20min and 50GB of logs on my environment.
> >> >> >
> >> >> > >>> On 2023-06-15 01:48, James Coleman wrote:
> >> >> > >>> > The tests have been running since last night, but have been apparently
> >> >> > >>> > hung now for many hours.
> >> >> >
> >> >> > I don't know if this has anything to do with the hung you faced, but I
> >> >> > thought
> >> >> > it might be possible that the large amount of memory usage resulted in
> >> >> > swapping, which caused a significant delay in processing.
> >> >>
> >> >> Ah, yes, I think that could be a possible explanation. I was delaying
> >> >> on this thread because I wasn't comfortable with having caused an
> >> >> issue once (even if I couldn't easily reproduce) without at least some
> >> >> theory as to the cause (and a fix).
> >> >>
> >> >> > If possible, I would be very grateful if you could try to reproduce this
> >> >> > with
> >> >> > the v29 patch.
> >> >>
> >> >> I'll kick off some testing.
> >> >>
> >> >
> >> > I don't have time to investigate what's happening here, but 24 hours
> >> > later the first "make check" is still running, and at first glance it
> >> > seems to have the same behavior I'd seen that first time. The test
> >> > output is to this point:
> >> >
> >> > # parallel group (5 tests):  index_including create_view
> >> > index_including_gist create_index create_index_spgist
> >> > ok 66        + create_index                            26365 ms
> >> > ok 67        + create_index_spgist                     27675 ms
> >> > ok 68        + create_view                              1235 ms
> >> > ok 69        + index_including                          1102 ms
> >> > ok 70        + index_including_gist                     1633 ms
> >> > # parallel group (16 tests):  create_aggregate create_cast errors
> >> > roleattributes drop_if_exists hash_func typed_table create_am
> >> > infinite_recurse
> >> >
> >> > and it hasn't progressed past that point since at least ~16 hours ago
> >> > (the first several hours of the run I wasn't monitoring it).
> >> >
> >> > I haven't connected up gdb yet, and won't be able to until maybe
> >> > tomorrow, but here's the ps output for postgres processes that are
> >> > running:
> >> >
> >> > admin    3213249  0.0  0.0 196824 20552 ?        Ss   Aug25   0:00
> >> > /home/admin/postgresql-test/bin/postgres -D
> >> > /home/admin/postgresql-test-data
> >> > admin    3213250  0.0  0.0 196964  7284 ?        Ss   Aug25   0:00
> >> > postgres: checkpointer
> >> > admin    3213251  0.0  0.0 196956  4276 ?        Ss   Aug25   0:00
> >> > postgres: background writer
> >> > admin    3213253  0.0  0.0 196956  8600 ?        Ss   Aug25   0:00
> >> > postgres: walwriter
> >> > admin    3213254  0.0  0.0 198424  7316 ?        Ss   Aug25   0:00
> >> > postgres: autovacuum launcher
> >> > admin    3213255  0.0  0.0 198412  5488 ?        Ss   Aug25   0:00
> >> > postgres: logical replication launcher
> >> > admin    3237967  0.0  0.0   2484   516 pts/4    S+   Aug25   0:00
> >> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
> >> >
PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
> >> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
> >> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
> >> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
> >> > --inputdir=. --bindir=     --dlpath=. --max-concurrent-tests=20
> >> > --schedule=./parallel_schedule
> >> > admin    3237973  0.0  0.0 197880 20688 pts/4    S+   Aug25   0:00
> >> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
> >> > listen_addresses= -k /tmp/pg_regress-7mmGUa
> >> > admin    3237976  0.0  0.1 198332 44608 ?        Ss   Aug25   0:00
> >> > postgres: checkpointer
> >> > admin    3237977  0.0  0.0 198032  4640 ?        Ss   Aug25   0:00
> >> > postgres: background writer
> >> > admin    3237979  0.0  0.0 197880  8580 ?        Ss   Aug25   0:00
> >> > postgres: walwriter
> >> > admin    3237980  0.0  0.0 199484  7608 ?        Ss   Aug25   0:00
> >> > postgres: autovacuum launcher
> >> > admin    3237981  0.0  0.0 199460  5488 ?        Ss   Aug25   0:00
> >> > postgres: logical replication launcher
> >> > admin    3243644  0.0  0.2 252400 74656 ?        Ss   Aug25   0:01
> >> > postgres: admin regression [local] SELECT waiting
> >> > admin    3243645  0.0  0.1 205480 33992 ?        Ss   Aug25   0:00
> >> > postgres: admin regression [local] SELECT waiting
> >> > admin    3243654 99.9  0.0 203156 31504 ?        Rs   Aug25 1437:49
> >> > postgres: admin regression [local] VACUUM
> >> > admin    3243655  0.0  0.1 212036 38504 ?        Ss   Aug25   0:00
> >> > postgres: admin regression [local] SELECT waiting
> >> > admin    3243656  0.0  0.0 206024 30892 ?        Ss   Aug25   0:00
> >> > postgres: admin regression [local] DELETE waiting
> >> > admin    3243657  0.0  0.1 205568 32232 ?        Ss   Aug25   0:00
> >> > postgres: admin regression [local] ALTER TABLE waiting
> >> > admin    3243658  0.0  0.0 203740 21532 ?        Ss   Aug25   0:00
> >> > postgres: admin regression [local] ANALYZE waiting
> >> > admin    3243798  0.0  0.0 199884  8464 ?        Ss   Aug25   0:00
> >> > postgres: autovacuum worker
> >> > admin    3244733  0.0  0.0 199492  5956 ?        Ss   Aug25   0:00
> >> > postgres: autovacuum worker
> >> > admin    3245652  0.0  0.0 199884  8468 ?        Ss   Aug25   0:00
> >> > postgres: autovacuum worker
> >> >
> >> > As you can see there are a bunch of backends presumably waiting, and
> >> > also the VACUUM process has been pegging a single CPU core for at
> >> > least since that ~16 hour ago mark.
> >> >
> >> > I hope to be able to do more investigation later, but I wanted to at
> >> > least give you this information now.
> >>
> >> Thanks a lot for testing the patch!
> >> I really appreciate your cooperation.
> >>
> >> Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu
> >> 22.04 and macOS, but unfortunately(fortunately?) they succeeded as
> >> below:
> >>
> >> ```
> >> $ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch
> >> $ git apply
> >> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
> >> $ ./configure --enable-debug --enable-cassert
> >> $ make
> >> $ make check
> >>
> >> ...(snip)...
> >>
> >> # parallel group (5 tests):  index_including index_including_gist
> >> create_view create_index create_index_spgist
> >> ok 66        + create_index                            25033 ms
> >> ok 67        + create_index_spgist                     26144 ms
> >> ok 68        + create_view                              3061 ms
> >> ok 69        + index_including                           976 ms
> >> ok 70        + index_including_gist                     2998 ms
> >> # parallel group (16 tests):  create_cast errors create_aggregate
> >> roleattributes drop_if_exists hash_func typed_table
> >> create_am select constraints updatable_views inherit triggers vacuum
> >> create_function_sql infinite_recurse
> >> ok 71        + create_aggregate                          225 ms
> >> ok 72        + create_function_sql                     18874 ms
> >> ok 73        + create_cast                               168 ms
> >>
> >> ...(snip)...
> >>
> >> # All 215 tests passed.
> >> ```
> >>
> >> If you notice any difference, I would be grateful if you could let me
> >> know.
> >
> > I've never been able to reproduce it (haven't tested the new version,
> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
> > (first buster and now bullseye).
> >
> > I'm attaching several stacktraces in the hope that they provide some
> > clues. These all match the ps output I sent earlier, though note in
> > that output there is both the regress instance and my test instance
> > (pid 3213249) running (different ports, of course, and they are from
> > the exact same compilation run). I've attached ps output for the
> > postgres processes under the make check process to simplify cross
> > referencing.
> >
> > A few interesting things:
> > - There's definitely a lock on a relation that seems to be what's
> > blocking the processes.
> > - When I try to connect with psql the process forks but then hangs
> > (see the ps output with task names stuck in "authentication"). I've
> > also included a trace from one of these.
>
> Thanks for sharing them!
>
> Many processes are waiting to acquire the LW lock, including the process
> trying to output the plan(select1.trace).
>
> I suspect that this is due to a lock that was acquired prior to being
> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able to
> reproduce the same situation..
>

I don't have time immediately to dig into this, but perhaps loading up
the core dumps would allow us to see what query is running in each
backend process (if it hasn't already been discarded by that point)
and thereby determine what point in each test process led to the error
condition?

Alternatively we might be able to apply the same trick to the test
client instead...

BTW, for my own easy reference in this thread: relid 1259 is pg_class
if I'm not mistaken.

Thoughts?
James Coleman

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

07 September 2023, 06:09:12

On 2023-09-06 11:17, James Coleman wrote:

>> > I've never been able to reproduce it (haven't tested the new version,
>> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
>> > (first buster and now bullseye).
>> >
>> > I'm attaching several stacktraces in the hope that they provide some
>> > clues. These all match the ps output I sent earlier, though note in
>> > that output there is both the regress instance and my test instance
>> > (pid 3213249) running (different ports, of course, and they are from
>> > the exact same compilation run). I've attached ps output for the
>> > postgres processes under the make check process to simplify cross
>> > referencing.
>> >
>> > A few interesting things:
>> > - There's definitely a lock on a relation that seems to be what's
>> > blocking the processes.
>> > - When I try to connect with psql the process forks but then hangs
>> > (see the ps output with task names stuck in "authentication"). I've
>> > also included a trace from one of these.
>> 
>> Thanks for sharing them!
>> 
>> Many processes are waiting to acquire the LW lock, including the 
>> process
>> trying to output the plan(select1.trace).
>> 
>> I suspect that this is due to a lock that was acquired prior to being
>> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able 
>> to
>> reproduce the same situation..
>> 
> 
> I don't have time immediately to dig into this, but perhaps loading up
> the core dumps would allow us to see what query is running in each
> backend process (if it hasn't already been discarded by that point)
> and thereby determine what point in each test process led to the error
> condition?

Thanks for the suggestion.
I am concerned that core dumps may not be readable on different 
operating systems or other environments. (Unfortunately, I do not have 
Debian on hand)

It seems that we can know what queries were running from the stack 
traces you shared.
As described above, I suspect a lock which was acquired prior to 
ProcessLogQueryPlanInterrupt() caused the issue.
I will try a little more to see if I can devise a way to create the same 
situation.

> Alternatively we might be able to apply the same trick to the test
> client instead...
> 
> BTW, for my own easy reference in this thread: relid 1259 is pg_class
> if I'm not mistaken.

Yeah, and I think it's strange that the lock to 1259 appears twice and 
must be avoided.

   #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at 
lmgr.c:117
   ..
   #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at 
explain.c:5158
   ..
   #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at 
lmgr.c:117

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

"Lepikhov Andrei"

Date:

15 September 2023, 06:21:27

On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
> On 2023-09-06 11:17, James Coleman wrote:
> It seems that we can know what queries were running from the stack 
> traces you shared.
> As described above, I suspect a lock which was acquired prior to 
> ProcessLogQueryPlanInterrupt() caused the issue.
> I will try a little more to see if I can devise a way to create the same 
> situation.
Hi,
I have explored this patch and, by and large, agree with the code. But some questions I want to discuss:
1. Maybe add a hook to give a chance for extensions, like pg_query_state, to do their job?
2. In this implementation, ProcessInterrupts does a lot of work during the explain creation: memory allocations, pass
throughthe plan, systables locks, syscache access, etc. I guess it can change the semantic meaning of 'safe place'
whereCHECK_FOR_INTERRUPTS can be called - I can imagine a SELECT query, which would call a stored procedure, which
executessome DDL and acquires row exclusive lock at the time of interruption. So, in my mind, it is too unpredictable
tomake the explain in the place of interruption processing. Maybe it is better to think about some hook at the end of
ExecProcNode,where a pending explain could be created?
 

Also, I suggest minor code change with the diff in attachment. 

-- 
Regards,
Andrei Lepikhov

Attachment

improve.diff

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

19 September 2023, 13:39:59

On 2023-09-15 15:21, Lepikhov Andrei wrote:
> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
>> On 2023-09-06 11:17, James Coleman wrote:
>> It seems that we can know what queries were running from the stack
>> traces you shared.
>> As described above, I suspect a lock which was acquired prior to
>> ProcessLogQueryPlanInterrupt() caused the issue.
>> I will try a little more to see if I can devise a way to create the 
>> same
>> situation.
> Hi,
> I have explored this patch and, by and large, agree with the code. But
> some questions I want to discuss:
> 1. Maybe add a hook to give a chance for extensions, like
> pg_query_state, to do their job?

Do you imagine adding a hook which enables adding custom interrupt codes 
like below?

https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch

If so, that would be possible, but this patch doesn't require the 
functionality and I feel it'd be better doing in independent patch.


> 2. In this implementation, ProcessInterrupts does a lot of work during
> the explain creation: memory allocations, pass through the plan,
> systables locks, syscache access, etc. I guess it can change the
> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
> called - I can imagine a SELECT query, which would call a stored
> procedure, which executes some DDL and acquires row exclusive lock at
> the time of interruption. So, in my mind, it is too unpredictable to
> make the explain in the place of interruption processing. Maybe it is
> better to think about some hook at the end of ExecProcNode, where a
> pending explain could be created?

Yeah, I withdrew this patch once for that reason, but we are resuming 
development in response to the results of a discussion by James and 
others at this year's pgcon about the safety of this process in CFI:

https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com

BTW it seems pg_query_state also enables users to get running query 
plans using CFI.
Does pg_query_state do something for this safety concern?

> Also, I suggest minor code change with the diff in attachment.

Thanks!

This might be biased opinion and objections are welcomed, but I feel the 
original patch is easier to read since PG_RETURN_BOOL(true/false) is 
located in near place to each cases.
Also the existing function pg_log_backend_memory_contexts(), which does 
the same thing, has the same form as the original patch.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

"Lepikhov Andrei"

Date:

20 September 2023, 05:39:13

On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote:
> On 2023-09-15 15:21, Lepikhov Andrei wrote:
>> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
>> I have explored this patch and, by and large, agree with the code. But
>> some questions I want to discuss:
>> 1. Maybe add a hook to give a chance for extensions, like
>> pg_query_state, to do their job?
>
> Do you imagine adding a hook which enables adding custom interrupt codes 
> like below?
>
> https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch

No, I think around the hook, which would allow us to rewrite the  pg_query_state extension without additional patches
byusing the functionality provided by your patch. I mean, an extension could provide console UI, not only server
logging.And obtain from target backend so much information in the explain as the instrumentation level of the current
querycan give.

It may make pg_query_state shorter and more stable.

>> 2. In this implementation, ProcessInterrupts does a lot of work during
>> the explain creation: memory allocations, pass through the plan,
>> systables locks, syscache access, etc. I guess it can change the
>> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
>> called - I can imagine a SELECT query, which would call a stored
>> procedure, which executes some DDL and acquires row exclusive lock at
>> the time of interruption. So, in my mind, it is too unpredictable to
>> make the explain in the place of interruption processing. Maybe it is
>> better to think about some hook at the end of ExecProcNode, where a
>> pending explain could be created?
>
> Yeah, I withdrew this patch once for that reason, but we are resuming 
> development in response to the results of a discussion by James and 
> others at this year's pgcon about the safety of this process in CFI:
>
> https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com

I can't track the logic path  of the decision provided at this conference. But my anxiety related to specific place,
whereActiveQueryDesc points top-level query and interruption comes during DDL, wrapped up in stored procedure. For
example:
CREATE TABLE test();
CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  ...
END; $$ LANGUAGE plpgsql VOLATILE;
SELECT ddl(), ... FROM ...;

> BTW it seems pg_query_state also enables users to get running query 
> plans using CFI.
> Does pg_query_state do something for this safety concern?

No, and I'm looking for the solution, which could help to rewrite pg_query_state as a clean extension, without
patches.

>> Also, I suggest minor code change with the diff in attachment.
>
> Thanks!
>
> This might be biased opinion and objections are welcomed, but I feel the 
> original patch is easier to read since PG_RETURN_BOOL(true/false) is 
> located in near place to each cases.
> Also the existing function pg_log_backend_memory_contexts(), which does 
> the same thing, has the same form as the original patch.
I got it, thank you.

-- 
Regards,
Andrei Lepikhov

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

25 September 2023, 07:21:31

On 2023-09-20 14:39, Lepikhov Andrei wrote:
Thanks for your reply.

> On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote:
>> On 2023-09-15 15:21, Lepikhov Andrei wrote:
>>> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
>>> I have explored this patch and, by and large, agree with the code. 
>>> But
>>> some questions I want to discuss:
>>> 1. Maybe add a hook to give a chance for extensions, like
>>> pg_query_state, to do their job?
>> 
>> Do you imagine adding a hook which enables adding custom interrupt 
>> codes
>> like below?
>> 
>> https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch
> 
> No, I think around the hook, which would allow us to rewrite the
> pg_query_state extension without additional patches by using the
> functionality provided by your patch. I mean, an extension could
> provide console UI, not only server logging. And obtain from target
> backend so much information in the explain as the instrumentation
> level of the current query can give.
> It may make pg_query_state shorter and more stable.
> 
>>> 2. In this implementation, ProcessInterrupts does a lot of work 
>>> during
>>> the explain creation: memory allocations, pass through the plan,
>>> systables locks, syscache access, etc. I guess it can change the
>>> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
>>> called - I can imagine a SELECT query, which would call a stored
>>> procedure, which executes some DDL and acquires row exclusive lock at
>>> the time of interruption. So, in my mind, it is too unpredictable to
>>> make the explain in the place of interruption processing. Maybe it is
>>> better to think about some hook at the end of ExecProcNode, where a
>>> pending explain could be created?
>> 
>> Yeah, I withdrew this patch once for that reason, but we are resuming
>> development in response to the results of a discussion by James and
>> others at this year's pgcon about the safety of this process in CFI:
>> 
>> https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
> 
> I can't track the logic path  of the decision provided at this
> conference. But my anxiety related to specific place, where
> ActiveQueryDesc points top-level query and interruption comes during
> DDL, wrapped up in stored procedure. For example:
> CREATE TABLE test();
> CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
> BEGIN
>   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>   ...
> END; $$ LANGUAGE plpgsql VOLATILE;
> SELECT ddl(), ... FROM ...;

Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
all CFI using 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then 
ran the following query but did not cause any problems.

```
=# CREATE TABLE test();
=# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
   PERFORM pg_sleep(5);
END; $$ LANGUAGE plpgsql VOLATILE;
=# SELECT ddl();
```

Is this the case you're worrying about?


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Andrey Lepikhov

Date:

25 September 2023, 09:49:18

On 25/9/2023 14:21, torikoshia wrote:
> On 2023-09-20 14:39, Lepikhov Andrei wrote:
> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
> all CFI using 
> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then 
> ran the following query but did not cause any problems.
> 
> ```
> =# CREATE TABLE test();
> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
> BEGIN
>    EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>    PERFORM pg_sleep(5);
> END; $$ LANGUAGE plpgsql VOLATILE;
> =# SELECT ddl();
> ```
> 
> Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the 
moment of interruption, we have a descriptor of another query than the 
query have been executing and holding resources.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

28 September 2023, 02:04:09

On 2023-09-25 18:49, Andrey Lepikhov wrote:
> On 25/9/2023 14:21, torikoshia wrote:
>> On 2023-09-20 14:39, Lepikhov Andrei wrote:
>> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
>> all CFI using 
>> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then 
>> ran the following query but did not cause any problems.
>> 
>> ```
>> =# CREATE TABLE test();
>> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
>> BEGIN
>>    EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>>    PERFORM pg_sleep(5);
>> END; $$ LANGUAGE plpgsql VOLATILE;
>> =# SELECT ddl();
>> ```
>> 
>> Is this the case you're worrying about?
> 
> I didn't find a problem either. I just feel uncomfortable if, at the
> moment of interruption, we have a descriptor of another query than the
> query have been executing and holding resources.

I think that "descriptor" here refers to ActiveQueryDesc, in which case
it is updated at the beginning of ExecutorRun(), so I am wondering if
the situation you're worried about would not occur.

---------
@@ -303,10 +306,21 @@ ExecutorRun(QueryDesc *queryDesc,
             ScanDirection direction, uint64 count,
             bool execute_once)
  {
+   /*
+    * Update ActiveQueryDesc here to enable retrieval of the currently
+    * running queryDesc for nested queries.
+    */
+   QueryDesc *save_ActiveQueryDesc;
+
+   save_ActiveQueryDesc = ActiveQueryDesc;
+   ActiveQueryDesc = queryDesc;
---------
-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Andrey Lepikhov

Date:

28 September 2023, 02:32:46

On 28/9/2023 09:04, torikoshia wrote:
> On 2023-09-25 18:49, Andrey Lepikhov wrote:
>> On 25/9/2023 14:21, torikoshia wrote:
>>> On 2023-09-20 14:39, Lepikhov Andrei wrote:
>>> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on 
>>> all CFI using 
>>> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and 
>>> then ran the following query but did not cause any problems.
>>>
>>> ```
>>> =# CREATE TABLE test();
>>> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
>>> BEGIN
>>>    EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>>>    PERFORM pg_sleep(5);
>>> END; $$ LANGUAGE plpgsql VOLATILE;
>>> =# SELECT ddl();
>>> ```
>>>
>>> Is this the case you're worrying about?
>>
>> I didn't find a problem either. I just feel uncomfortable if, at the
>> moment of interruption, we have a descriptor of another query than the
>> query have been executing and holding resources.
> 
> I think that "descriptor" here refers to ActiveQueryDesc, in which case
> it is updated at the beginning of ExecutorRun(), so I am wondering if
> the situation you're worried about would not occur.
As you can see, in my example we have the only DDL and no queries with 
plans. In this case postgres doesn't call ExecutorRun() just because it 
doesn't have a plan. But locks will be obtained.

-- 
regards,
Andrey Lepikhov
Postgres Professional

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

03 October 2023, 18:00:11

On Thu, Sep 7, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-09-06 11:17, James Coleman wrote:
>
> >> > I've never been able to reproduce it (haven't tested the new version,
> >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
> >> > (first buster and now bullseye).
> >> >
> >> > I'm attaching several stacktraces in the hope that they provide some
> >> > clues. These all match the ps output I sent earlier, though note in
> >> > that output there is both the regress instance and my test instance
> >> > (pid 3213249) running (different ports, of course, and they are from
> >> > the exact same compilation run). I've attached ps output for the
> >> > postgres processes under the make check process to simplify cross
> >> > referencing.
> >> >
> >> > A few interesting things:
> >> > - There's definitely a lock on a relation that seems to be what's
> >> > blocking the processes.
> >> > - When I try to connect with psql the process forks but then hangs
> >> > (see the ps output with task names stuck in "authentication"). I've
> >> > also included a trace from one of these.
> >>
> >> Thanks for sharing them!
> >>
> >> Many processes are waiting to acquire the LW lock, including the
> >> process
> >> trying to output the plan(select1.trace).
> >>
> >> I suspect that this is due to a lock that was acquired prior to being
> >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able
> >> to
> >> reproduce the same situation..
> >>
> >
> > I don't have time immediately to dig into this, but perhaps loading up
> > the core dumps would allow us to see what query is running in each
> > backend process (if it hasn't already been discarded by that point)
> > and thereby determine what point in each test process led to the error
> > condition?
>
> Thanks for the suggestion.
> I am concerned that core dumps may not be readable on different
> operating systems or other environments. (Unfortunately, I do not have
> Debian on hand)
>
> It seems that we can know what queries were running from the stack
> traces you shared.
> As described above, I suspect a lock which was acquired prior to
> ProcessLogQueryPlanInterrupt() caused the issue.
> I will try a little more to see if I can devise a way to create the same
> situation.
>
> > Alternatively we might be able to apply the same trick to the test
> > client instead...
> >
> > BTW, for my own easy reference in this thread: relid 1259 is pg_class
> > if I'm not mistaken.
>
> Yeah, and I think it's strange that the lock to 1259 appears twice and
> must be avoided.
>
>    #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at
> lmgr.c:117
>    ..
>    #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at
> explain.c:5158
>    ..
>    #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at
> lmgr.c:117

I chatted with Andres and David about this at PGConf.NYC, and I think
what we need to do is explicitly disallow running this code any time
we are inside of lock acquisition code.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

06 October 2023, 12:58:46

On 2023-10-04 03:00, James Coleman wrote:
> On Thu, Sep 7, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2023-09-06 11:17, James Coleman wrote:
>> 
>> >> > I've never been able to reproduce it (haven't tested the new version,
>> >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
>> >> > (first buster and now bullseye).
>> >> >
>> >> > I'm attaching several stacktraces in the hope that they provide some
>> >> > clues. These all match the ps output I sent earlier, though note in
>> >> > that output there is both the regress instance and my test instance
>> >> > (pid 3213249) running (different ports, of course, and they are from
>> >> > the exact same compilation run). I've attached ps output for the
>> >> > postgres processes under the make check process to simplify cross
>> >> > referencing.
>> >> >
>> >> > A few interesting things:
>> >> > - There's definitely a lock on a relation that seems to be what's
>> >> > blocking the processes.
>> >> > - When I try to connect with psql the process forks but then hangs
>> >> > (see the ps output with task names stuck in "authentication"). I've
>> >> > also included a trace from one of these.
>> >>
>> >> Thanks for sharing them!
>> >>
>> >> Many processes are waiting to acquire the LW lock, including the
>> >> process
>> >> trying to output the plan(select1.trace).
>> >>
>> >> I suspect that this is due to a lock that was acquired prior to being
>> >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able
>> >> to
>> >> reproduce the same situation..
>> >>
>> >
>> > I don't have time immediately to dig into this, but perhaps loading up
>> > the core dumps would allow us to see what query is running in each
>> > backend process (if it hasn't already been discarded by that point)
>> > and thereby determine what point in each test process led to the error
>> > condition?
>> 
>> Thanks for the suggestion.
>> I am concerned that core dumps may not be readable on different
>> operating systems or other environments. (Unfortunately, I do not have
>> Debian on hand)
>> 
>> It seems that we can know what queries were running from the stack
>> traces you shared.
>> As described above, I suspect a lock which was acquired prior to
>> ProcessLogQueryPlanInterrupt() caused the issue.
>> I will try a little more to see if I can devise a way to create the 
>> same
>> situation.
>> 
>> > Alternatively we might be able to apply the same trick to the test
>> > client instead...
>> >
>> > BTW, for my own easy reference in this thread: relid 1259 is pg_class
>> > if I'm not mistaken.
>> 
>> Yeah, and I think it's strange that the lock to 1259 appears twice and
>> must be avoided.
>> 
>>    #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) 
>> at
>> lmgr.c:117
>>    ..
>>    #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at
>> explain.c:5158
>>    ..
>>    #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) 
>> at
>> lmgr.c:117
> 
> I chatted with Andres and David about this at PGConf.NYC,
Thanks again for the discussion with hackers!

> and I think
> what we need to do is explicitly disallow running this code any time
> we are inside of lock acquisition code.

Yeah, I think it's a straightforward workaround.
And I'm also concerned that the condition of being in the process
of acquiring some kind of lock is too strict and will make it almost
impossible to output a running plan.

Anyway I'm going to implement it and run pg_log_query_plan()
while the regression test is running to see how successful the plan
output is.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

06 October 2023, 13:10:15

On Fri, Oct 6, 2023 at 8:58 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-10-04 03:00, James Coleman wrote:
> > On Thu, Sep 7, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com>
> > wrote:
> >>
> >> On 2023-09-06 11:17, James Coleman wrote:
> >>
> >> >> > I've never been able to reproduce it (haven't tested the new version,
> >> >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
> >> >> > (first buster and now bullseye).
> >> >> >
> >> >> > I'm attaching several stacktraces in the hope that they provide some
> >> >> > clues. These all match the ps output I sent earlier, though note in
> >> >> > that output there is both the regress instance and my test instance
> >> >> > (pid 3213249) running (different ports, of course, and they are from
> >> >> > the exact same compilation run). I've attached ps output for the
> >> >> > postgres processes under the make check process to simplify cross
> >> >> > referencing.
> >> >> >
> >> >> > A few interesting things:
> >> >> > - There's definitely a lock on a relation that seems to be what's
> >> >> > blocking the processes.
> >> >> > - When I try to connect with psql the process forks but then hangs
> >> >> > (see the ps output with task names stuck in "authentication"). I've
> >> >> > also included a trace from one of these.
> >> >>
> >> >> Thanks for sharing them!
> >> >>
> >> >> Many processes are waiting to acquire the LW lock, including the
> >> >> process
> >> >> trying to output the plan(select1.trace).
> >> >>
> >> >> I suspect that this is due to a lock that was acquired prior to being
> >> >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able
> >> >> to
> >> >> reproduce the same situation..
> >> >>
> >> >
> >> > I don't have time immediately to dig into this, but perhaps loading up
> >> > the core dumps would allow us to see what query is running in each
> >> > backend process (if it hasn't already been discarded by that point)
> >> > and thereby determine what point in each test process led to the error
> >> > condition?
> >>
> >> Thanks for the suggestion.
> >> I am concerned that core dumps may not be readable on different
> >> operating systems or other environments. (Unfortunately, I do not have
> >> Debian on hand)
> >>
> >> It seems that we can know what queries were running from the stack
> >> traces you shared.
> >> As described above, I suspect a lock which was acquired prior to
> >> ProcessLogQueryPlanInterrupt() caused the issue.
> >> I will try a little more to see if I can devise a way to create the
> >> same
> >> situation.
> >>
> >> > Alternatively we might be able to apply the same trick to the test
> >> > client instead...
> >> >
> >> > BTW, for my own easy reference in this thread: relid 1259 is pg_class
> >> > if I'm not mistaken.
> >>
> >> Yeah, and I think it's strange that the lock to 1259 appears twice and
> >> must be avoided.
> >>
> >>    #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1)
> >> at
> >> lmgr.c:117
> >>    ..
> >>    #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at
> >> explain.c:5158
> >>    ..
> >>    #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1)
> >> at
> >> lmgr.c:117
> >
> > I chatted with Andres and David about this at PGConf.NYC,
> Thanks again for the discussion with hackers!
>
> > and I think
> > what we need to do is explicitly disallow running this code any time
> > we are inside of lock acquisition code.
>
> Yeah, I think it's a straightforward workaround.
> And I'm also concerned that the condition of being in the process
> of acquiring some kind of lock is too strict and will make it almost
> impossible to output a running plan.

I was concerned about this too, but I was wondering if we might be
able to cheat a bit by making such a case not clear the flag but
instead just skip it for now.

Regards,
James

Re: RFC: Logging plan of the running query

From

Andres Freund

Date:

06 October 2023, 15:58:50

Hi,

On 2023-10-06 21:58:46 +0900, torikoshia wrote:
> Yeah, I think it's a straightforward workaround.
> And I'm also concerned that the condition of being in the process
> of acquiring some kind of lock is too strict and will make it almost
> impossible to output a running plan.

How so? We shouldn't commonly acquire relevant locks while executing a query?
With a few exceptions, they should instead be acquired t the start of query
processing. We do acquire a lot of lwlocks, obviously, but we don't process
interrupts during the acquisition / holding of lwlocks.

And presumably the interrupt would just be processed the next time interrupt
processing is happening?

Greetings,

Andres Freund

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

10 October 2023, 13:27:08

On 2023-10-04 03:00, James Coleman wrote:
> and I think
> what we need to do is explicitly disallow running this code any time
> we are inside of lock acquisition code.

Updated patch to check if any locks have already been acquired by 
examining MyProc->heldLocks.

I'm not sure this change can "disallow running this code `any time` we 
are inside of lock acquisition code", but as far as select1.trace, which 
you shared, I believe it can prevent running explain codes since it must 
have set MyProc->heldLocks in LockAcquireExtended() before WaitOnLock():

```
          /*
           * Set bitmask of locks this process already holds on this 
object.
           */
          MyProc->heldLocks = proclock->holdMask;

         ..(snip)..

          WaitOnLock(locallock, owner);
```

On 2023-10-07 00:58, Andres Freund wrote:

> How so? We shouldn't commonly acquire relevant locks while executing a 
> query?
> With a few exceptions, they should instead be acquired t the start of 
> query
> processing. We do acquire a lot of lwlocks, obviously, but we don't 
> process
> interrupts during the acquisition / holding of lwlocks.
> 
> And presumably the interrupt would just be processed the next time 
> interrupt
> processing is happening?

Thanks for your comments!

I tested v30 patch with 
v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch which makes 
CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that 
very few logging queries failed with v30 patch.

This is a result in line with your prediction.


```
  $ rg -c'ignored request for logging query plan due to lock confilcts' 
postmaster.log
  8
```

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v30-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

11 October 2023, 07:22:00

Hi,

On Tue, Oct 10, 2023 at 7:00 PM torikoshia <torikoshia@oss.nttdata.com> wrote:

> Thanks for your comments!
>
> I tested v30 patch with
> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch which makes
> CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that
> very few logging queries failed with v30 patch.
>
> This is a result in line with your prediction.
>
>
> ```
>   $ rg -c'ignored request for logging query plan due to lock confilcts'
> postmaster.log
>   8

Like many others I think this feature is useful to debug a long running query.

Sorry for jumping late into this.

I have a few of high level comments

There is a lot of similarity between what this feature does and what
auto explain does. I see the code is also duplicated. There is some
merit in avoiding this duplication
1. we will get all the features of auto_explain automatically like
choosing a format (this was expressed somebody earlier in this
thread), setings etc.
2. avoid bugs. E.g your code switches context after ExplainState has
been allocated. These states may leak depending upon when this
function gets called.
3. Building features on top as James envisions will be easier.

Considering the similarity with auto_explain I wondered whether this
function should be part of auto_explain contrib module itself? If we
do that users will need to load auto_explain extension and thus
install executor hooks when this function doesn't need those. So may
not be such a good idea. I didn't see any discussion on this.

I tried following query to pass PID of a non-client backend to this function.
#select pg_log_query_plan(pid), application_name, backend_type from
pg_stat_activity where backend_type = 'autovacuum launcher';
 pg_log_query_plan | application_name |    backend_type
-------------------+------------------+---------------------
 t                 |                  | autovacuum launcher
(1 row)
I see "LOG:  backend with PID 2733631 is not running a query or a
subtransaction is aborted" in server logs. That's ok. But may be we
should not send signal to these kinds of backends at all, thus
avoiding some system calls.

I am also wondering whether it's better to report the WARNING as
status column in the output. E.g. instead of
#select pg_log_query_plan(100);
WARNING:  PID 100 is not a PostgreSQL backend process
 pg_log_query_plan
-------------------
 f
(1 row)
we output
#select pg_log_query_plan(100);
 pg_log_query_plan |                   status
-------------------+---------------------------------------------
 f                 | PID 100 is not a PostgreSQL backend process
(1 row)

That looks neater and can easily be handled by scripts, applications
and such. But it will be inconsistent with other functions like
pg_terminate_backend() and pg_log_backend_memory_contexts().

I do share a concern that was discussed earlier. If a query is running
longer, there's something problematic with it. A diagnostic
intervention breaking it further would be unwelcome. James has run
experiments to shake this code for any loose breakages. He has not
found any. So may be we are good. And we wouldn't know about very rare
corner cases so easily without using it in the field. So fine with it.
If we could add some safety net that will be great but may not be
necessary for the first cut.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

12 October 2023, 13:21:26

On 2023-10-11 16:22, Ashutosh Bapat wrote:

> Like many others I think this feature is useful to debug a long running 
> query.
> 
> Sorry for jumping late into this.
> 
> I have a few of high level comments

Thanks for your comments!

> There is a lot of similarity between what this feature does and what
> auto explain does. I see the code is also duplicated. There is some
> merit in avoiding this duplication
> 1. we will get all the features of auto_explain automatically like
> choosing a format (this was expressed somebody earlier in this
> thread), setings etc.
> 2. avoid bugs. E.g your code switches context after ExplainState has
> been allocated. These states may leak depending upon when this
> function gets called.
> 3. Building features on top as James envisions will be easier.
> 
> Considering the similarity with auto_explain I wondered whether this
> function should be part of auto_explain contrib module itself? If we
> do that users will need to load auto_explain extension and thus
> install executor hooks when this function doesn't need those. So may
> not be such a good idea. I didn't see any discussion on this.

I once thought about adding this to auto_explain, but I left it asis for 
below reasons:

- One of the typical use case of pg_log_query_plan() would be analyzing 
slow query on customer environments. On such environments, We cannot 
always control what extensions to install.
   Of course auto_explain is a major extension and it is quite possible 
that they installed auto_explain, but but it is also possible they do 
not.
- It seems a bit counter-intuitive that pg_log_query_plan() is in an 
extension called auto_explain, since it `manually`` logs plans

> I tried following query to pass PID of a non-client backend to this 
> function.
> #select pg_log_query_plan(pid), application_name, backend_type from
> pg_stat_activity where backend_type = 'autovacuum launcher';
>  pg_log_query_plan | application_name |    backend_type
> -------------------+------------------+---------------------
>  t                 |                  | autovacuum launcher
> (1 row)
> I see "LOG:  backend with PID 2733631 is not running a query or a
> subtransaction is aborted" in server logs. That's ok. But may be we
> should not send signal to these kinds of backends at all, thus
> avoiding some system calls.

Agreed, it seems better.
Attached patch checks if the backendType of target process is 'client 
backend'.

   =# select pg_log_query_plan(pid), application_name, backend_type from  
pg_stat_activity where backend_type = 'autovacuum launcher';
   WARNING:  PID 63323 is not a PostgreSQL client backend process
    pg_log_query_plan | application_name |    backend_type
   -------------------+------------------+---------------------
    f                 |                  | autovacuum launcher


> I am also wondering whether it's better to report the WARNING as
> status column in the output. E.g. instead of
> #select pg_log_query_plan(100);
> WARNING:  PID 100 is not a PostgreSQL backend process
>  pg_log_query_plan
> -------------------
>  f
> (1 row)
> we output
> #select pg_log_query_plan(100);
>  pg_log_query_plan |                   status
> -------------------+---------------------------------------------
>  f                 | PID 100 is not a PostgreSQL backend process
> (1 row)
> 
> That looks neater and can easily be handled by scripts, applications
> and such. But it will be inconsistent with other functions like
> pg_terminate_backend() and pg_log_backend_memory_contexts().

It seems neater, but it might be inconvenient because we can no longer 
use it  in select list like the following query as you wrote:

   #select pg_log_query_plan(pid), application_name, backend_type from
   pg_stat_activity where backend_type = 'autovacuum launcher';

> I do share a concern that was discussed earlier. If a query is running
> longer, there's something problematic with it. A diagnostic
> intervention breaking it further would be unwelcome. James has run
> experiments to shake this code for any loose breakages. He has not
> found any. So may be we are good. And we wouldn't know about very rare
> corner cases so easily without using it in the field. So fine with it.
> If we could add some safety net that will be great but may not be
> necessary for the first cut.

If there are candidates for the safety net, I'm willing to add them.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v31-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

16 October 2023, 09:46:28

On Thu, Oct 12, 2023 at 6:51 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-10-11 16:22, Ashutosh Bapat wrote:
> >
> > Considering the similarity with auto_explain I wondered whether this
> > function should be part of auto_explain contrib module itself? If we
> > do that users will need to load auto_explain extension and thus
> > install executor hooks when this function doesn't need those. So may
> > not be such a good idea. I didn't see any discussion on this.
>
> I once thought about adding this to auto_explain, but I left it asis for
> below reasons:
>
> - One of the typical use case of pg_log_query_plan() would be analyzing
> slow query on customer environments. On such environments, We cannot
> always control what extensions to install.

The same argument applies to auto_explain functionality as well. But
it's not part of the core.

>    Of course auto_explain is a major extension and it is quite possible
> that they installed auto_explain, but but it is also possible they do
> not.
> - It seems a bit counter-intuitive that pg_log_query_plan() is in an
> extension called auto_explain, since it `manually`` logs plans
>

pg_log_query_plan() may not fit auto_explain but
pg_explain_backend_query() does. What we are logging is more than just
plan of the query, it might expand to be closer to explain output.
While auto in auto_explain would refer to its automatically logging
explain outputs, it can provide an additional function which provides
similar functionality by manually triggering it.

But we can defer this to a committer, if you want.

I am more interested in avoiding the duplication of code, esp. the
first comment in my reply
>> There is a lot of similarity between what this feature does and what
>> auto explain does. I see the code is also duplicated. There is some
>> merit in avoiding this duplication
>> 1. we will get all the features of auto_explain automatically like
>> choosing a format (this was expressed somebody earlier in this
>> thread), setings etc.
>> 2. avoid bugs. E.g your code switches context after ExplainState has
>> been allocated. These states may leak depending upon when this
>> function gets called.
>> 3. Building features on top as James envisions will be easier.

>
>    =# select pg_log_query_plan(pid), application_name, backend_type from
> pg_stat_activity where backend_type = 'autovacuum launcher';
>    WARNING:  PID 63323 is not a PostgreSQL client backend process
>     pg_log_query_plan | application_name |    backend_type
>    -------------------+------------------+---------------------
>     f                 |                  | autovacuum launcher
>
>
> > I am also wondering whether it's better to report the WARNING as
> > status column in the output. E.g. instead of
> > #select pg_log_query_plan(100);
> > WARNING:  PID 100 is not a PostgreSQL backend process
> >  pg_log_query_plan
> > -------------------
> >  f
> > (1 row)
> > we output
> > #select pg_log_query_plan(100);
> >  pg_log_query_plan |                   status
> > -------------------+---------------------------------------------
> >  f                 | PID 100 is not a PostgreSQL backend process
> > (1 row)
> >
> > That looks neater and can easily be handled by scripts, applications
> > and such. But it will be inconsistent with other functions like
> > pg_terminate_backend() and pg_log_backend_memory_contexts().
>
> It seems neater, but it might be inconvenient because we can no longer
> use it  in select list like the following query as you wrote:
>
>    #select pg_log_query_plan(pid), application_name, backend_type from
>    pg_stat_activity where backend_type = 'autovacuum launcher';

Why is that?

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

18 October 2023, 06:09:38

On 2023-10-16 18:46, Ashutosh Bapat wrote:
> On Thu, Oct 12, 2023 at 6:51 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On 2023-10-11 16:22, Ashutosh Bapat wrote:
>> >
>> > Considering the similarity with auto_explain I wondered whether this
>> > function should be part of auto_explain contrib module itself? If we
>> > do that users will need to load auto_explain extension and thus
>> > install executor hooks when this function doesn't need those. So may
>> > not be such a good idea. I didn't see any discussion on this.
>> 
>> I once thought about adding this to auto_explain, but I left it asis 
>> for
>> below reasons:
>> 
>> - One of the typical use case of pg_log_query_plan() would be 
>> analyzing
>> slow query on customer environments. On such environments, We cannot
>> always control what extensions to install.
> 
> The same argument applies to auto_explain functionality as well. But
> it's not part of the core.

Yeah, and when we have a situation where we want to run
pg_log_query_plan(), we can run it in any environment as long as it is
  bundled with the core.
On the other hand, if it is built into auto_explain, we need to start by
installing auto_explain if we do not have auto_explain, which is often
difficult to do in production environments.

>>    Of course auto_explain is a major extension and it is quite 
>> possible
>> that they installed auto_explain, but but it is also possible they do
>> not.
>> - It seems a bit counter-intuitive that pg_log_query_plan() is in an
>> extension called auto_explain, since it `manually`` logs plans
>> 
> 
> pg_log_query_plan() may not fit auto_explain but
> pg_explain_backend_query() does. What we are logging is more than just
> plan of the query, it might expand to be closer to explain output.
> While auto in auto_explain would refer to its automatically logging
> explain outputs, it can provide an additional function which provides
> similar functionality by manually triggering it.
> 
> But we can defer this to a committer, if you want.
> 
> I am more interested in avoiding the duplication of code, esp. the
> first comment in my reply

If there are no objections, I will try porting it to auto_explain and
see its feasibility.

>>> There is a lot of similarity between what this feature does and what
>>> auto explain does. I see the code is also duplicated. There is some
>>> merit in avoiding this duplication
>>> 1. we will get all the features of auto_explain automatically like
>>> choosing a format (this was expressed somebody earlier in this
>>> thread), setings etc.
>>> 2. avoid bugs. E.g your code switches context after ExplainState has
>>> been allocated. These states may leak depending upon when this
>>> function gets called.
>>> 3. Building features on top as James envisions will be easier.
> 
>> 
>>    =# select pg_log_query_plan(pid), application_name, backend_type 
>> from
>> pg_stat_activity where backend_type = 'autovacuum launcher';
>>    WARNING:  PID 63323 is not a PostgreSQL client backend process
>>     pg_log_query_plan | application_name |    backend_type
>>    -------------------+------------------+---------------------
>>     f                 |                  | autovacuum launcher
>> 
>> 
>> > I am also wondering whether it's better to report the WARNING as
>> > status column in the output. E.g. instead of
>> > #select pg_log_query_plan(100);
>> > WARNING:  PID 100 is not a PostgreSQL backend process
>> >  pg_log_query_plan
>> > -------------------
>> >  f
>> > (1 row)
>> > we output
>> > #select pg_log_query_plan(100);
>> >  pg_log_query_plan |                   status
>> > -------------------+---------------------------------------------
>> >  f                 | PID 100 is not a PostgreSQL backend process
>> > (1 row)
>> >
>> > That looks neater and can easily be handled by scripts, applications
>> > and such. But it will be inconsistent with other functions like
>> > pg_terminate_backend() and pg_log_backend_memory_contexts().
>> 
>> It seems neater, but it might be inconvenient because we can no longer
>> use it  in select list like the following query as you wrote:
>> 
>>    #select pg_log_query_plan(pid), application_name, backend_type from
>>    pg_stat_activity where backend_type = 'autovacuum launcher';
> 
> Why is that?

Sorry, I misunderstood and confirmed we can run queries like below:

```
=# CREATE OR REPLACE FUNCTION pg_log_query_plan_stab(i int)              
                            RETURNS TABLE(                                
                                                                          
  pg_log_query_plan bool,                                                 
                                               status text                
                                                                          
               ) AS $$                                                    
                                                            DECLARE       
                                                                          
                                BEGIN                                     
                                                                          
        RETURN QUERY SELECT false::bool, 'PID 100 is not a PostgreSQL 
backend process'::text;                             END;                 
                                                                          
                         $$ LANGUAGE plpgsql;

=# select pg_log_query_plan_stab(pid), application_name, backend_type 
from pg_stat_activity where backend_type = 'autovacuum launcher';
               pg_log_query_plan_stab               | application_name |  
   backend_type
---------------------------------------------------+------------------+---------------------
  (f,"PID 100 is not a PostgreSQL backend process") |                  | 
autovacuum launcher
```
-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

18 October 2023, 10:13:01

On Wed, Oct 18, 2023 at 11:39 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > I am more interested in avoiding the duplication of code, esp. the
> > first comment in my reply
>
> If there are no objections, I will try porting it to auto_explain and
> see its feasibility.

If you want it in core, you will need to port relevant parts of
auto_explain code to core.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

18 October 2023, 16:34:36

On Wed, Oct 18, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-10-16 18:46, Ashutosh Bapat wrote:
> > On Thu, Oct 12, 2023 at 6:51 PM torikoshia <torikoshia@oss.nttdata.com>
> > wrote:
> >>
> >> On 2023-10-11 16:22, Ashutosh Bapat wrote:
> >> >
> >> > Considering the similarity with auto_explain I wondered whether this
> >> > function should be part of auto_explain contrib module itself? If we
> >> > do that users will need to load auto_explain extension and thus
> >> > install executor hooks when this function doesn't need those. So may
> >> > not be such a good idea. I didn't see any discussion on this.
> >>
> >> I once thought about adding this to auto_explain, but I left it asis
> >> for
> >> below reasons:
> >>
> >> - One of the typical use case of pg_log_query_plan() would be
> >> analyzing
> >> slow query on customer environments. On such environments, We cannot
> >> always control what extensions to install.
> >
> > The same argument applies to auto_explain functionality as well. But
> > it's not part of the core.
>
> Yeah, and when we have a situation where we want to run
> pg_log_query_plan(), we can run it in any environment as long as it is
>   bundled with the core.
> On the other hand, if it is built into auto_explain, we need to start by
> installing auto_explain if we do not have auto_explain, which is often
> difficult to do in production environments.
>
> >>    Of course auto_explain is a major extension and it is quite
> >> possible
> >> that they installed auto_explain, but but it is also possible they do
> >> not.
> >> - It seems a bit counter-intuitive that pg_log_query_plan() is in an
> >> extension called auto_explain, since it `manually`` logs plans
> >>
> >
> > pg_log_query_plan() may not fit auto_explain but
> > pg_explain_backend_query() does. What we are logging is more than just
> > plan of the query, it might expand to be closer to explain output.
> > While auto in auto_explain would refer to its automatically logging
> > explain outputs, it can provide an additional function which provides
> > similar functionality by manually triggering it.
> >
> > But we can defer this to a committer, if you want.
> >
> > I am more interested in avoiding the duplication of code, esp. the
> > first comment in my reply
>
> If there are no objections, I will try porting it to auto_explain and
> see its feasibility.
>
> >>> There is a lot of similarity between what this feature does and what
> >>> auto explain does. I see the code is also duplicated. There is some
> >>> merit in avoiding this duplication
> >>> 1. we will get all the features of auto_explain automatically like
> >>> choosing a format (this was expressed somebody earlier in this
> >>> thread), setings etc.
> >>> 2. avoid bugs. E.g your code switches context after ExplainState has
> >>> been allocated. These states may leak depending upon when this
> >>> function gets called.
> >>> 3. Building features on top as James envisions will be easier.

In my view the fact that auto_explain is itself not part of core is a
mistake, and I know there are more prominent hackers than myself who
hold that view.

While that decision as regards auto_explain has long since been made
(and there would be work to undo it), I don't believe that we should
repeat that choice here. I'm -10 on moving this into auto_explain.

However perhaps there is still an opportunity for moving some of the
auto_explain code into core so as to enable deduplicating the code.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

Étienne BERSAC

Date:

24 October 2023, 07:12:46

> Hi,
>
> Yeah, and when we have a situation where we want to run
> pg_log_query_plan(), we can run it in any environment as long as it
> is bundled with the core.

Is it possible to get the plan of a backend programmatically without
access to the logs?

Something like pg_get_query_plan(pid, format) which output would be the
same as EXPLAIN.

Regards,
Étienne BERSAC
Dalibo

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

24 October 2023, 12:30:49

On 2023-10-24 16:12, Étienne BERSAC wrote:
>> Hi,
>> 
>> Yeah, and when we have a situation where we want to run
>> pg_log_query_plan(), we can run it in any environment as long as it
>> is bundled with the core.
> 
> Is it possible to get the plan of a backend programmatically without
> access to the logs?
> 
> Something like pg_get_query_plan(pid, format) which output would be the
> same as EXPLAIN.

Do you imagine a function like below?

   =# select pg_get_query_plan(100, 'plain');
                             QUERY PLAN
   -------------------------------------------------------------------
    Limit  (cost=0.00..0.04 rows=1 width=273)
      ->  Seq Scan on pg_class  (cost=0.00..17.14 rows=414 width=273)

If so, we once tried to implement such function for getting memory 
contexts.
However, this attempt didn't succeed because it could lead dead lock 
situation[1].

I think the same problem can occur when implementing 
pg_get_query_plan().

[1] 
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com
> 
> Regards,
> Étienne BERSAC
> Dalibo

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

25 October 2023, 03:40:06

On Wed, Oct 18, 2023 at 10:04 PM James Coleman <jtc331@gmail.com> wrote:
>
> While that decision as regards auto_explain has long since been made
> (and there would be work to undo it), I don't believe that we should
> repeat that choice here. I'm -10 on moving this into auto_explain.
>

Right.

> However perhaps there is still an opportunity for moving some of the
> auto_explain code into core so as to enable deduplicating the code.
>

Right. That's what I also think.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

26 October 2023, 06:53:22

On 2023-10-25 12:40, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 10:04 PM James Coleman <jtc331@gmail.com> 
> wrote:
>> 
>> While that decision as regards auto_explain has long since been made
>> (and there would be work to undo it), I don't believe that we should
>> repeat that choice here. I'm -10 on moving this into auto_explain.
>> 
> 
> Right.
> 
>> However perhaps there is still an opportunity for moving some of the
>> auto_explain code into core so as to enable deduplicating the code.
>> 
> 
> Right. That's what I also think.

Thanks for your comments.

Attached patch adds a new function which assembles es->str for logging
according to specified contents and format.
This is called from both auto_explain and pg_log_query_plan().

On 2023-10-11 16:22, Ashutosh Bapat wrote:
> I am also wondering whether it's better to report the WARNING as
> status column in the output.

Attached patch left as it was since the inconsistency with
pg_terminate_backend() and pg_log_backend_memory_contexts() as you
pointed out.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v32-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Étienne BERSAC

Date:

27 October 2023, 07:06:31

Hi Torikoshia,

> If so, we once tried to implement such function for getting memory
> contexts.
> However, this attempt didn't succeed because it could lead dead lock
> situation[1].

Thanks for the pointer. Why not use client log message to allow client
to get plan without locking backend memory context and without access
to server log ? I missed the rationnal for not sending the plan to
client.

Regards,
Étienne

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

27 October 2023, 12:30:02

On 2023-10-27 16:06, Étienne BERSAC wrote:
> Hi Torikoshia,
> 
>> If so, we once tried to implement such function for getting memory
>> contexts.
>> However, this attempt didn't succeed because it could lead dead lock
>> situation[1].
> 
> Thanks for the pointer. Why not use client log message to allow client
> to get plan without locking backend memory context and without access
> to server log ? I missed the rationnal for not sending the plan to
> client.

If we use client log message, that message is shown on the target 
process whose pid is specified by the parameter of pg_log_query_plan():

   (pid:1000)=# select pg_sleep(60);
   (pid:1001)=# select pg_log_query_plan(1000);
   (pid:1000)=# LOG:  query plan running on backend with PID 1000 is:
                Query Text: select pg_sleep(1000);
                Result  (cost=0.00..0.01 rows=1 width=4)
                  Output: pg_sleep('1000'::double precision)

I think this is not an expected behavior and we set elevel to 
LOG_SERVER_ONLY.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Étienne BERSAC

Date:

27 October 2023, 12:54:48

> Hi,
>

> If we use client log message, that message is shown on the target
> process whose pid is specified by the parameter of
> pg_log_query_plan():
>
>    (pid:1000)=# select pg_sleep(60);
>    (pid:1001)=# select pg_log_query_plan(1000);
>    (pid:1000)=# LOG:  query plan running on backend with PID 1000 is:
>                 Query Text: select pg_sleep(1000);
>                 Result  (cost=0.00..0.01 rows=1 width=4)
>                   Output: pg_sleep('1000'::double precision)
>
> I think this is not an expected behavior and we set elevel to
> LOG_SERVER_ONLY.


Makes sens. Thanks.

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

03 November 2023, 14:01:14

I have following questions related to the functionality. (Please point
me to the relevant discussion if this has been already discussed.)

1. When a backend is running nested queries, we will see the plan of
the innermost query. That query may not be the actual culprit if the
user query is running slowly. E.g a query being run as part of inner
side nested loop when the nested loop itself is the bottleneck. I
think it will be useful to print plans of all the whole query stack.

2. When a query is running in parallel worker do we want to print that
query? It may or may not be interesting given the situation. If the
overall plan itself is faulty, output of the parallel worker query is
not useful. If the plan is fine but a given worker's leg is running
slowly it may be interesting.

As a side note, you may want to fix the indentation in
ExplainAssembleLogOutput().

On Fri, Oct 27, 2023 at 6:24 PM Étienne BERSAC
<etienne.bersac@dalibo.com> wrote:
>
> > Hi,
> >
>
> > If we use client log message, that message is shown on the target
> > process whose pid is specified by the parameter of
> > pg_log_query_plan():
> >
> >    (pid:1000)=# select pg_sleep(60);
> >    (pid:1001)=# select pg_log_query_plan(1000);
> >    (pid:1000)=# LOG:  query plan running on backend with PID 1000 is:
> >                 Query Text: select pg_sleep(1000);
> >                 Result  (cost=0.00..0.01 rows=1 width=4)
> >                   Output: pg_sleep('1000'::double precision)
> >
> > I think this is not an expected behavior and we set elevel to
> > LOG_SERVER_ONLY.
>
>
> Makes sens. Thanks.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

06 November 2023, 06:32:12

On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> I have following questions related to the functionality. (Please point
> me to the relevant discussion if this has been already discussed.)
>
> 1. When a backend is running nested queries, we will see the plan of
> the innermost query. That query may not be the actual culprit if the
> user query is running slowly. E.g a query being run as part of inner
> side nested loop when the nested loop itself is the bottleneck. I
> think it will be useful to print plans of all the whole query stack.

To further explain this point, consider following scenario

-- A completely useless function which executes two SQL statements
which take small amount individually
#create function multstmt() returns int
language sql as $$
select count(*) from pg_class where reltype = 12345;
select count(*) from pg_type limit 10;
$$;

-- force a suboptimal plan
#set enable_hashjoin to false;
#set enable_mergejoin to false;

-- A completely useless but long running query
#select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid)
= multstmt(t.oid);

This take a few minutes on my laptop.

In a separate session query pg_stat_activity. We will see the original query
#select pid, query, backend_type from pg_stat_activity where pid = 349330;
  pid   |                                          query
                           |  backend_type
--------+-----------------------------------------------------------------------------------------+----------------
 349330 | select c.oid, t.oid from pg_class c, pg_type t where
multstmt(c.oid) = multstmt(t.oid); | client backend
(1 row)

Run the plan output function a few times
#select pg_log_query_plan(349330);

You will observe different plans based on which of the innermost query
is runnning
LOG:  query plan running on backend with PID 349330 is:
        Query Text:
        select count(*) from pg_class where reltype = typeid;
        select count(*) from pg_type where oid = typeid;

        Query Parameters: $1 = '600'
        Aggregate  (cost=18.18..18.19 rows=1 width=8)
          Output: count(*)
          ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 width=0)
                Output: oid, relname, relnamespace, reltype,
reloftype, relowner, relam, relfilenode, reltablespace, relpages,
reltuples, relallvisible, reltoastrelid, relhasindex, relisshared,
relpersistence, relkind, relnatts, relchecks, relhasrules,
relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity,
relispopulated, relreplident, relispartition, relrewrite,
relfrozenxid, relminmxid, relacl, reloptions, relpartbound
                Filter: (pg_class.reltype = $1)
        Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on
backend with PID 349330 is:
        Query Text:
        select count(*) from pg_class where reltype = typeid;
        select count(*) from pg_type where oid = typeid;

        Query Parameters: $1 = '2203'
        Aggregate  (cost=4.30..4.31 rows=1 width=4)
          Output: count(*)
          ->  Index Only Scan using pg_type_oid_index on
pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
                Output: oid
                Index Cond: (pg_type.oid = $1)
        Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

Individual pieces are confusing here since the query run by the
backend is not the one being EXPLAINed. A user may not know that the
queries being EXPLAINed arise from the function call multstmt(). So
they won't be able to stitch the pieces together unless they see plan
of the outermost query with loops and costs. What might help if we
explain each query in the hierarchy.

I think we can start with what auto_explain is doing. Always print the
plan of the outermost query; the query found in pg_stat_activity. In a
later version we might find a way to print plans of all the queries in
the stack and do so in a readable manner.

This makes tracking activeQueryDesc a bit tricky. My guess is that the
outermost query's descriptor will be available through ActivePortal
most of the time. But there are cases when ExecutorRun is called by
passing a queryDesc directly. So may be there are some cases where
that's not true.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

09 November 2023, 06:33:36

On 2023-11-06 15:32, Ashutosh Bapat wrote:

Thanks for the suggestion and example.

> On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>> 
>> I have following questions related to the functionality. (Please point
>> me to the relevant discussion if this has been already discussed.)
>> 
>> 1. When a backend is running nested queries, we will see the plan of
>> the innermost query. That query may not be the actual culprit if the
>> user query is running slowly. E.g a query being run as part of inner
>> side nested loop when the nested loop itself is the bottleneck. I
>> think it will be useful to print plans of all the whole query stack.

This was discussed in previous threads[1] and we thought it'd be useful 
but since it needed some extra works, we stopped widening the scope.

> To further explain this point, consider following scenario
> 
> -- A completely useless function which executes two SQL statements
> which take small amount individually
> #create function multstmt() returns int
> language sql as $$
> select count(*) from pg_class where reltype = 12345;
> select count(*) from pg_type limit 10;
> $$;
> 
> -- force a suboptimal plan
> #set enable_hashjoin to false;
> #set enable_mergejoin to false;
> 
> -- A completely useless but long running query
> #select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid)
> = multstmt(t.oid);
> 
> This take a few minutes on my laptop.
> 
> In a separate session query pg_stat_activity. We will see the original 
> query
> #select pid, query, backend_type from pg_stat_activity where pid = 
> 349330;
>   pid   |                                          query
>                            |  backend_type
> --------+-----------------------------------------------------------------------------------------+----------------
>  349330 | select c.oid, t.oid from pg_class c, pg_type t where
> multstmt(c.oid) = multstmt(t.oid); | client backend
> (1 row)
> 
> Run the plan output function a few times
> #select pg_log_query_plan(349330);
> 
> You will observe different plans based on which of the innermost query
> is runnning
> LOG:  query plan running on backend with PID 349330 is:
>         Query Text:
>         select count(*) from pg_class where reltype = typeid;
>         select count(*) from pg_type where oid = typeid;
> 
>         Query Parameters: $1 = '600'
>         Aggregate  (cost=18.18..18.19 rows=1 width=8)
>           Output: count(*)
>           ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 
> width=0)
>                 Output: oid, relname, relnamespace, reltype,
> reloftype, relowner, relam, relfilenode, reltablespace, relpages,
> reltuples, relallvisible, reltoastrelid, relhasindex, relisshared,
> relpersistence, relkind, relnatts, relchecks, relhasrules,
> relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity,
> relispopulated, relreplident, relispartition, relrewrite,
> relfrozenxid, relminmxid, relacl, reloptions, relpartbound
>                 Filter: (pg_class.reltype = $1)
>         Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
> 2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on
> backend with PID 349330 is:
>         Query Text:
>         select count(*) from pg_class where reltype = typeid;
>         select count(*) from pg_type where oid = typeid;
> 
>         Query Parameters: $1 = '2203'
>         Aggregate  (cost=4.30..4.31 rows=1 width=4)
>           Output: count(*)
>           ->  Index Only Scan using pg_type_oid_index on
> pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
>                 Output: oid
>                 Index Cond: (pg_type.oid = $1)
>         Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
> 
> Individual pieces are confusing here since the query run by the
> backend is not the one being EXPLAINed. A user may not know that the
> queries being EXPLAINed arise from the function call multstmt(). So
> they won't be able to stitch the pieces together unless they see plan
> of the outermost query with loops and costs. What might help if we
> explain each query in the hierarchy.
> 
> I think we can start with what auto_explain is doing. Always print the
> plan of the outermost query; the query found in pg_stat_activity. In a
> later version we might find a way to print plans of all the queries in
> the stack and do so in a readable manner.

Agreed there are cases printing plan of the outermost query is more 
useful.

> 
> This makes tracking activeQueryDesc a bit tricky. My guess is that the
> outermost query's descriptor will be available through ActivePortal
> most of the time. But there are cases when ExecutorRun is called by
> passing a queryDesc directly. So may be there are some cases where
> that's not true.

Yeah, actually the original version of the patch got the plan from 
ActivePortal, but it failed logging plan when the query was something 
like this[2]:

      DO $$
      BEGIN
      PERFORM pg_sleep(100);
      END$$;

> 2. When a query is running in parallel worker do we want to print that
> query? It may or may not be interesting given the situation. If the
> overall plan itself is faulty, output of the parallel worker query is
> not useful. If the plan is fine but a given worker's leg is running
> slowly it may be interesting.

I think it can be useful.
I'm wondering if we can add this after the first patch for this feature 
is committed.

> As a side note, you may want to fix the indentation in
> ExplainAssembleLogOutput().

Thanks, modified it.
Since the documentation was obsoleted, attached patch also updated it.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v33-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

09 November 2023, 07:11:49

On Thu, Nov 9, 2023 at 12:03 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >>
> >> 1. When a backend is running nested queries, we will see the plan of
> >> the innermost query. That query may not be the actual culprit if the
> >> user query is running slowly. E.g a query being run as part of inner
> >> side nested loop when the nested loop itself is the bottleneck. I
> >> think it will be useful to print plans of all the whole query stack.
>
> This was discussed in previous threads[1] and we thought it'd be useful
> but since it needed some extra works, we stopped widening the scope.
>
> >
> > I think we can start with what auto_explain is doing. Always print the
> > plan of the outermost query; the query found in pg_stat_activity. In a
> > later version we might find a way to print plans of all the queries in
> > the stack and do so in a readable manner.
>
> Agreed there are cases printing plan of the outermost query is more
> useful.
>

I am fine printing the plan of the outermost query. This will help
many cases. Printing plans of the whole query stack can be added as an
add on later.

> >
> > This makes tracking activeQueryDesc a bit tricky. My guess is that the
> > outermost query's descriptor will be available through ActivePortal
> > most of the time. But there are cases when ExecutorRun is called by
> > passing a queryDesc directly. So may be there are some cases where
> > that's not true.
>
> Yeah, actually the original version of the patch got the plan from
> ActivePortal, but it failed logging plan when the query was something
> like this[2]:
>
>       DO $$
>       BEGIN
>       PERFORM pg_sleep(100);
>       END$$;

References [1] and [2] are not listed in your email.

Is that because there was no ActivePortal created or the ActivePortal
pointed to DO block instead of PERFORM pg_sleep?

>
> > 2. When a query is running in parallel worker do we want to print that
> > query? It may or may not be interesting given the situation. If the
> > overall plan itself is faulty, output of the parallel worker query is
> > not useful. If the plan is fine but a given worker's leg is running
> > slowly it may be interesting.
>
> I think it can be useful.
> I'm wondering if we can add this after the first patch for this feature
> is committed.

With the current patches, it will print the query from a parallel
backend. If that's not desirable we should prohibit that case at
least.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

09 November 2023, 11:26:27

On 2023-11-09 16:11, Ashutosh Bapat wrote:
> On Thu, Nov 9, 2023 at 12:03 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> >>
>> >> 1. When a backend is running nested queries, we will see the plan of
>> >> the innermost query. That query may not be the actual culprit if the
>> >> user query is running slowly. E.g a query being run as part of inner
>> >> side nested loop when the nested loop itself is the bottleneck. I
>> >> think it will be useful to print plans of all the whole query stack.
>> 
>> This was discussed in previous threads[1] and we thought it'd be 
>> useful
>> but since it needed some extra works, we stopped widening the scope.
>> 
>> >
>> > I think we can start with what auto_explain is doing. Always print the
>> > plan of the outermost query; the query found in pg_stat_activity. In a
>> > later version we might find a way to print plans of all the queries in
>> > the stack and do so in a readable manner.
>> 
>> Agreed there are cases printing plan of the outermost query is more
>> useful.
>> 
> 
> I am fine printing the plan of the outermost query. This will help
> many cases. Printing plans of the whole query stack can be added as an
> add on later.
> 
>> >
>> > This makes tracking activeQueryDesc a bit tricky. My guess is that the
>> > outermost query's descriptor will be available through ActivePortal
>> > most of the time. But there are cases when ExecutorRun is called by
>> > passing a queryDesc directly. So may be there are some cases where
>> > that's not true.
>> 
>> Yeah, actually the original version of the patch got the plan from
>> ActivePortal, but it failed logging plan when the query was something
>> like this[2]:
>> 
>>       DO $$
>>       BEGIN
>>       PERFORM pg_sleep(100);
>>       END$$;
> 
> References [1] and [2] are not listed in your email.

Oops, sorry. Here are links:

[1] 
https://www.postgresql.org/message-id/64f716c44629e303b66e6c24502147cc%40oss.nttdata.com
[2] 
https://www.postgresql.org/message-id/632e99eb-8090-53e6-1b1a-101601904cbd%40oss.nttdata.com

> Is that because there was no ActivePortal created or the ActivePortal
> pointed to DO block instead of PERFORM pg_sleep?

ActivePortal is created but ActivePortal->queryDesc is null.

>> > 2. When a query is running in parallel worker do we want to print that
>> > query? It may or may not be interesting given the situation. If the
>> > overall plan itself is faulty, output of the parallel worker query is
>> > not useful. If the plan is fine but a given worker's leg is running
>> > slowly it may be interesting.
>> 
>> I think it can be useful.
>> I'm wondering if we can add this after the first patch for this 
>> feature
>> is committed.
> 
> With the current patches, it will print the query from a parallel
> backend. If that's not desirable we should prohibit that case at
> least.

Current patch prohibits printing plan if backend type is parallel worker 
as below:

   =# select pg_log_query_plan(pid), backend_type from pg_stat_activity 
where backend_type = 'parallel worker';

    pg_log_query_plan |  backend_type
   -------------------+-----------------
    f                 | parallel worker
    f                 | parallel worker
   (2 rows)

   WARNING:  PID 4618 is not a PostgreSQL client backend process
   WARNING:  PID 4617 is not a PostgreSQL client backend process

Is this the behavior you expect?

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

09 November 2023, 14:58:48

On Thu, Nov 9, 2023 at 4:56 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-11-09 16:11, Ashutosh Bapat wrote:
> > On Thu, Nov 9, 2023 at 12:03 PM torikoshia <torikoshia@oss.nttdata.com>
> > wrote:
> >> >>
> >> >> 1. When a backend is running nested queries, we will see the plan of
> >> >> the innermost query. That query may not be the actual culprit if the
> >> >> user query is running slowly. E.g a query being run as part of inner
> >> >> side nested loop when the nested loop itself is the bottleneck. I
> >> >> think it will be useful to print plans of all the whole query stack.
> >>
> >> This was discussed in previous threads[1] and we thought it'd be
> >> useful
> >> but since it needed some extra works, we stopped widening the scope.
> >>
> >> >
> >> > I think we can start with what auto_explain is doing. Always print the
> >> > plan of the outermost query; the query found in pg_stat_activity. In a
> >> > later version we might find a way to print plans of all the queries in
> >> > the stack and do so in a readable manner.
> >>
> >> Agreed there are cases printing plan of the outermost query is more
> >> useful.
> >>
> >
> > I am fine printing the plan of the outermost query. This will help
> > many cases. Printing plans of the whole query stack can be added as an
> > add on later.
> >
> >> >
> >> > This makes tracking activeQueryDesc a bit tricky. My guess is that the
> >> > outermost query's descriptor will be available through ActivePortal
> >> > most of the time. But there are cases when ExecutorRun is called by
> >> > passing a queryDesc directly. So may be there are some cases where
> >> > that's not true.
> >>
> >> Yeah, actually the original version of the patch got the plan from
> >> ActivePortal, but it failed logging plan when the query was something
> >> like this[2]:
> >>
> >>       DO $$
> >>       BEGIN
> >>       PERFORM pg_sleep(100);
> >>       END$$;
> >
> > References [1] and [2] are not listed in your email.
>
> Oops, sorry. Here are links:
>
> [1]
> https://www.postgresql.org/message-id/64f716c44629e303b66e6c24502147cc%40oss.nttdata.com
> [2]
> https://www.postgresql.org/message-id/632e99eb-8090-53e6-1b1a-101601904cbd%40oss.nttdata.com
>
> > Is that because there was no ActivePortal created or the ActivePortal
> > pointed to DO block instead of PERFORM pg_sleep?
>
> ActivePortal is created but ActivePortal->queryDesc is null.

Thanks.

>
> >> > 2. When a query is running in parallel worker do we want to print that
> >> > query? It may or may not be interesting given the situation. If the
> >> > overall plan itself is faulty, output of the parallel worker query is
> >> > not useful. If the plan is fine but a given worker's leg is running
> >> > slowly it may be interesting.
> >>
> >> I think it can be useful.
> >> I'm wondering if we can add this after the first patch for this
> >> feature
> >> is committed.
> >
> > With the current patches, it will print the query from a parallel
> > backend. If that's not desirable we should prohibit that case at
> > least.
>
> Current patch prohibits printing plan if backend type is parallel worker
> as below:
>
>    =# select pg_log_query_plan(pid), backend_type from pg_stat_activity
> where backend_type = 'parallel worker';
>
>     pg_log_query_plan |  backend_type
>    -------------------+-----------------
>     f                 | parallel worker
>     f                 | parallel worker
>    (2 rows)
>
>    WARNING:  PID 4618 is not a PostgreSQL client backend process
>    WARNING:  PID 4617 is not a PostgreSQL client backend process
>
> Is this the behavior you expect?
>

I misread then. Thanks for correcting me. We could consider plans from
parallel workers in v2 of this feature.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

Rafael Thofehrn Castro

Date:

06 December 2023, 23:33:56

Hello hackers,

Last Saturday I submitted a patch to the pgsql-hackers list with the title
"Proposal: In-flight explain logging" with a patch proposing a feature very
similar to the one being worked on in this thread. I should have done a better
search in the commitfest before implementing something from scratch.

So, as recommended by Ashutosh, I am sending an incremental patch containing
an additional feature I personally think we should include: logging the plan
with instrumentation details if enabled.

When targeting a query with instrumentation PG should log the complete
EXPLAIN ANALYZE plan with current row count and, if enabled, timing for each
node. This gives the user not only the ability to see what the plan is
but also what was executed so far, which is super useful when

troubleshooting queries that never finish.

Considering that the query is in progress the output will include the
statement (never executed) for nodes that weren't touched yet (or may
never be). This feature is already present in the current ExplainNode
implementation.

I added a new statement (in progress) for nodes currently being executed, ie,
InstrStartNode was called and clock is ticking there.

Back-reading this thread I saw the discussion about extending pg_log_query_plan
to parallel workers or not. I added it in my incremental patch as the new
capability of logging instrumentation makes it useful to be able to see
what parallel workers are currently doing.

# DEMONSTRATION:

postgres=# select pid, backend_type,pg_log_query_plan(pid)
postgres=# from pg_stat_activity
postgres=# where (backend_type = 'client backend' and pid != pg_backend_pid())
postgres=# or backend_type = 'parallel worker';
pid | backend_type | pg_log_query_plan
-------+-----------------+-------------------
33833 | client backend | t
47202 | parallel worker | t
47203 | parallel worker | t
(3 rows)

2023-12-06 23:14:41.756 UTC [33833] LOG: query plan running on backend with PID 33833 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Gather (cost=70894.63..202643.27 rows=1000000 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.914..2625.922 rows=250412 loops=1) (in progress)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=755 read=2175, temp read=1473 written=1860
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..20.904 rows=331492 loops=1)
Output: b.c1
Buffers: shared hit=602 read=865
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.107..1745.107 rows=330638 loops=1)
Output: a.c1
Buffers: shared hit=153 read=1310, temp written=868
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.042..27.695 rows=330638 loops=1)
Output: a.c1
Buffers: shared hit=153 read=1310
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.489..2450.489 rows=407941 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1141 read=1833, temp read=1938 written=2836
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1323.422..1575.245 rows=407941 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1141 read=1833, temp read=1938 written=1864
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.026..22.223 rows=336238 loops=1)
Output: c.c1
Buffers: shared hit=590 read=898
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.306..653.306 rows=335836 loops=1)
Output: d.c1
Buffers: shared hit=551 read=935, temp written=872
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..23.127 rows=335836 loops=1)
Output: d.c1
Buffers: shared hit=551 read=935
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=590.086..590.086 rows=343696 loops=1)
Output: e.c1
Buffers: shared hit=519 read=1002, temp written=896
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.066..21.797 rows=343696 loops=1)
Output: e.c1
Buffers: shared hit=519 read=1002
2023-12-06 23:14:41.757 UTC [47203] LOG: query plan running on backend with PID 47203 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2464.367..2628.476 rows=258293 loops=1)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=704 read=2213, temp read=1497 written=1860
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.247 rows=330412 loops=1)
Output: b.c1
Buffers: shared hit=594 read=868
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.093..1745.093 rows=328830 loops=1)
Output: a.c1
Buffers: shared hit=110 read=1345, temp written=868
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.141..30.128 rows=328830 loops=1)
Output: a.c1
Buffers: shared hit=110 read=1345
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2449.694..2449.694 rows=295462 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1161 read=1844, temp read=1971 written=2872
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1378.678..1577.182 rows=295462 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1161 read=1844, temp read=1971 written=1856
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.031..22.459 rows=336288 loops=1)
Output: c.c1
Buffers: shared hit=590 read=898
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=651.048..651.048 rows=328378 loops=1)
Output: d.c1
Buffers: shared hit=507 read=946, temp written=860
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.034..24.462 rows=328378 loops=1)
Output: d.c1
Buffers: shared hit=507 read=946
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=575.417..575.417 rows=327926 loops=1)
Output: e.c1
Buffers: shared hit=522 read=929, temp written=860
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.786..23.114 rows=327926 loops=1)
Output: e.c1
Buffers: shared hit=522 read=929
2023-12-06 23:14:41.758 UTC [47202] LOG: query plan running on backend with PID 47202 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.138..2631.855 rows=254125 loops=1)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=749 read=2254, temp read=1483 written=1896
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.027..22.224 rows=338096 loops=1)
Output: b.c1
Buffers: shared hit=628 read=868
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1744.616..1744.616 rows=340532 loops=1)
Output: a.c1
Buffers: shared hit=121 read=1386, temp written=892
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.138..30.003 rows=340532 loops=1)
Output: a.c1
Buffers: shared hit=121 read=1386
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.470..2450.470 rows=296597 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1186 read=1813, temp read=1610 written=2872
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1380.286..1561.412 rows=296597 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1186 read=1813, temp read=1610 written=1848
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.170 rows=327474 loops=1)
Output: c.c1
Buffers: shared hit=548 read=901
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.297..653.297 rows=335786 loops=1)
Output: d.c1
Buffers: shared hit=574 read=912, temp written=868
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..24.210 rows=335786 loops=1)
Output: d.c1
Buffers: shared hit=574 read=912
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=574.553..574.553 rows=328378 loops=1)
Output: e.c1
Buffers: shared hit=495 read=958, temp written=864
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.717..23.270 rows=328378 loops=1)
Output: e.c1
Buffers: shared hit=495 read=958

The thread of my initial patch contains all important implementation
details. Sharing the ones relevant to my incremental patch here:

- Safely printing signaled plans with instrumentation

A plan string is built in function ExplainNode here (https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/commands/explain.c#L1178)
which is called at the end of a query execution when EXPLAIN is used.
That function performs logic using a PlanState (part of QueryDesc) of
the running query and a ExplainState.

The main challenge there is that ExplainNode calls InstrEndLoop which
changes values in Instrumentation. This is ok for a regular EXPLAIN
where the query is already complete but not ok for the new feature with
signaled plan logging.

So the new code has custom logic to clone Instrumentation instance of
the current node. The cloned object can be safely written.

Function InstrEndLoop has a safety rule here (https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/executor/instrument.c#L148)
that prevents adjusting instrumentation details in a running node. This
never happens in the current code logic but with the new feature
it will happen very often.

I didn't want to remove this safety rule as InstrEndLoop gets called in
other places too (even in auto_explain) so the solution was to keep
InstrEndLoop and have a new InstrEndLoopForce for the signaled

plan logging with instrumentation. Both InstrEndLoop and InstrEndLoopForce

call a new internal InstrEndLoopInternal to avoid duplicating the code.

- Memory management

Considering that pg_log_query_plan() creates its own memory context before
calling ExplainAssembleLogOutput, the new logic that allocates memory
to clone instrumentation doesn't need to free anything.

# FINAL CONSIDERATIONS

Let me know what you think about this incremental patch. If you think it
is useful but needs adjustments I will be happy to change as needed.

Kind Regards,

Rafael Castro.

Attachment

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

11 December 2023, 05:46:23

On 2023-12-07 08:33, Rafael Thofehrn Castro wrote:
> Hello hackers,
> 
> Last Saturday I submitted a patch to the pgsql-hackers list with the
> title
> "Proposal: In-flight explain logging" with a patch proposing a feature
> very
> similar to the one being worked on in this thread. I should have done
> a better
> search in the commitfest before implementing something from scratch.
> 
> So, as recommended by Ashutosh, I am sending an incremental patch
> containing
> an additional feature I personally think we should include: logging
> the plan
> with instrumentation details if enabled.

Thanks for the proposal and making the patch!

> When targeting a query with instrumentation PG should log the complete
> EXPLAIN ANALYZE plan with current row count and, if enabled, timing
> for each
> node. This gives the user not only the ability to see what the plan is
> but also what was executed so far, which is super useful when
> troubleshooting queries that never finish.

I think showing the progress of the query execution would be useful.

OTOH it seems to at least need some modifications around Instrumentation 
as your patch.
As a first step, I think it would better to minimize the scope and focus 
on the fundamental function.
For the same reason, getting queries for parallel workers is also 
prohibited in the current patch as discussed here[1].

[1] 
https://www.postgresql.org/message-id/c25ae6015be96a1964eddd964657660b%40oss.nttdata.com

So I think below steps would be better than pushing all the 
functionalities to the 1st commit.

- First, develop function to enable output of query 
progress(v34-0001-Add-function-to-log-the-plan-of-the-query.patch).
- Then enhance the function
   - showing the progress of the query 
execution(v34-0002-Log-plan-along-with-instrumentation-details.patch), 
etc.

> --https://www.postgresql.org/message-id/CAG0ozMp3g3drnkDa6RZxXO_OmnisL2sD9vBrmpu5fOBoYpC-3w%40mail.gmail.com
> - ExplainState customization
> 
> A ExplainState is allocated and customized for the in-flight logging.
> Instrumentation related settings are enabled based on how the target
> query started, which is usually via EXPLAIN ANALYZE or with 
> auto_explain.

Does this mean the progress can be got only when the target query was 
run with EXPLAIN ANALYZE or auto_explain.log_analyze?

If so, there might be limited situations we can get the progress since I 
imagine EXPLAIN ANALYZE is used when user want to get the plan from the 
beginning and auto_explain.log_analyze can give negative impact on 
performance as described in the manual and there may not be many 
environments which enable it.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

29 January 2024, 13:02:29

Hi,

Updated the patch to fix typos and move 
ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.


BTW since the thread is getting long, I list the some points of the 
discussion so far:

# Safety concern
## Catalog access inside CFI
- it seems safe if the CFI call is inside an existing valid 
transaction/query state[1]

- We did some tests, for example calling ProcessLogQueryPlanInterrupt() 
in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env 
but was stucked on James's env, so I modified to exit 
ProcessLogQueryPlanInterrupt() when target process is inside of lock 
acquisition code[3]

## Risk of calling EXPLAIN code in CFI
- EXPLAIN is not a simple logic code, and there exists risk calling it 
from CFI. For example, if there is a bug, we may find ourselves in a 
situation where we can't cancel the query

- it's a trade-off that's worth making for the introspection benefits 
this patch would provide?[4]

# Design
- Although some suggested it should be in auto_explain, current patch 
introduces this feature to the core[5]

- When the target query is nested, only the most inner query's plan is 
explained. In future, all the nested queries' plans are expected to 
explained optionally like auto_explain.log_nested_statements[6]

- When the target process is a parallel worker, the plan is not shown[6]

- When the target query is nested and its subtransaction is aborted, 
pg_log_query_plan cannot log the parental query plan after the abort 
even parental query is running[7]

- The output corresponds to EXPLAIN with VERBOSE, COST, SETTINGS and 
FORMAT text. It doesn't do ANALYZE or show the progress of the query 
execution. Future work proposed by Rafael Thofehrn Castro may realize 
this[8]

- To prevent assertion error, this patch ensures no page lock is held by 
checking all the LocalLock entries before running explain code, but 
there is a discussion that ginInsertCleanup() should be modified[9]


It may be not so difficult to improve some of restrictions in "Design", 
but I'd like to limit the scope of the 1st patch to make it simpler.


[1] 
https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/0e0e7ca08dff077a625c27a5e0c2ef0a%40oss.nttdata.com
[4] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/CAAaqYe_1EuoTudAz8mr8-qtN5SoNtYRm4JM2J9CqeverpE3B2A%40mail.gmail.com
[6] 
https://www.postgresql.org/message-id/CAExHW5sh4ahrJgmMAGfptWVmESt1JLKCNm148XVxTunRr%2B-6gA%40mail.gmail.com
[7] 
https://www.postgresql.org/message-id/3d121ed5f81cef588bac836b43f5d1f9%40oss.nttdata.com
[8] 
https://www.postgresql.org/message-id/c161b5e7e1888eb9c9eb182a7d9dcf89%40oss.nttdata.com
[9] 
https://www.postgresql.org/message-id/20220201.172757.1480996662235658750.horikyota.ntt%40gmail.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v35-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

06 February 2024, 10:51:58

Hi Atsushi,

On Mon, Jan 29, 2024 at 6:32 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> Updated the patch to fix typos and move
> ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.
>
>
> BTW since the thread is getting long, I list the some points of the
> discussion so far:
>
> # Safety concern
> ## Catalog access inside CFI
> - it seems safe if the CFI call is inside an existing valid
> transaction/query state[1]
>
> - We did some tests, for example calling ProcessLogQueryPlanInterrupt()
> in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env
> but was stucked on James's env, so I modified to exit
> ProcessLogQueryPlanInterrupt() when target process is inside of lock
> acquisition code[3]
>
> ## Risk of calling EXPLAIN code in CFI
> - EXPLAIN is not a simple logic code, and there exists risk calling it
> from CFI. For example, if there is a bug, we may find ourselves in a
> situation where we can't cancel the query
>
> - it's a trade-off that's worth making for the introspection benefits
> this patch would provide?[4]
>
> # Design
> - Although some suggested it should be in auto_explain, current patch
> introduces this feature to the core[5]
>
> - When the target query is nested, only the most inner query's plan is
> explained. In future, all the nested queries' plans are expected to
> explained optionally like auto_explain.log_nested_statements[6]
>
> - When the target process is a parallel worker, the plan is not shown[6]
>
> - When the target query is nested and its subtransaction is aborted,
> pg_log_query_plan cannot log the parental query plan after the abort
> even parental query is running[7]
>
> - The output corresponds to EXPLAIN with VERBOSE, COST, SETTINGS and
> FORMAT text. It doesn't do ANALYZE or show the progress of the query
> execution. Future work proposed by Rafael Thofehrn Castro may realize
> this[8]
>
> - To prevent assertion error, this patch ensures no page lock is held by
> checking all the LocalLock entries before running explain code, but
> there is a discussion that ginInsertCleanup() should be modified[9]
>
>
> It may be not so difficult to improve some of restrictions in "Design",
> but I'd like to limit the scope of the 1st patch to make it simpler.

Thanks for the summary. It is helpful. I think patch is also getting better.

I have a few questions and suggestions
1. Prologue of GetLockMethodLocalHash() mentions
 * NOTE: When there are many entries in LockMethodLocalHash, calling this
 * function and looking into all of them can lead to performance problems.
 */
How bad this performance could be. Let's assume that a query is taking
time and pg_log_query_plan() is invoked to examine the plan of this
query. Is it possible that the looping over all the locks itself takes
a lot of time delaying the query execution further?

2. What happens if auto_explain is enabled in the backend and
pg_log_query_plan() is called on the same backend? Will they conflict?
I think we should add a test for the same.

Using injection point support we should be able to add tests for
testing pg_log_query_plan behaviour when there are page locks held or
when auto_explain (with instrumentation) and pg_log_query_plan() work
on the same query plan. Use injection point to make the backend
running query wait at a suitable point to delay its execution and fire
pg_log_query_plan() from other backend. May be the same test could
examine the server log file to see if the plan is indeed output to the
server log file.

Given that the feature will be used when the things have already gone
wrong, it should not make things more serious. So more testing and
especially automated would help.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

07 February 2024, 04:08:24

Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:

> Thanks for the summary. It is helpful. I think patch is also getting 
> better.
> 
> I have a few questions and suggestions

Thanks for your comments.

> 1. Prologue of GetLockMethodLocalHash() mentions
>  * NOTE: When there are many entries in LockMethodLocalHash, calling 
> this
>  * function and looking into all of them can lead to performance 
> problems.
>  */
> How bad this performance could be. Let's assume that a query is taking
> time and pg_log_query_plan() is invoked to examine the plan of this
> query. Is it possible that the looping over all the locks itself takes
> a lot of time delaying the query execution further?

I think it depends on the number of local locks, but I've measured cpu 
time for this page lock check by adding below codes and 
v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which 
calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just for 
your information:

   diff --git a/src/backend/commands/explain.c 
b/src/backend/commands/explain.c
   index 5f7d77d567..65b7cb4925 100644
   --- a/src/backend/commands/explain.c
   +++ b/src/backend/commands/explain.c
   @@ -44,6 +44,8 @@

   +#include "time.h"
   ...
   @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
            * we check all the LocalLock entries and when finding even 
one, give up
            * logging the plan.
            */
   +       start = clock();
           hash_seq_init(&status, GetLockMethodLocalHash());
           while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != 
NULL)
           {
           if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
               {
                   ereport(LOG_SERVER_ONLY,
                       errmsg("ignored request for logging query plan due 
to page lock conflicts"),
                       errdetail("You can try again in a moment."));
                   hash_seq_term(&status);

                   ProcessLogQueryPlanInterruptActive = false;
                   return;
               }
           }
   +       end = clock();
   +       cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
   +
   +       ereport(LOG,
   +               errmsg("all locallock entry search took: %f", 
cpu_time_used));
   +

There were about 3 million log lines which recorded the cpu time, and 
the duration was quite short:

   =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
   =# select max(d), min(d), avg(d) from cpu_time ;
      max    | min |          avg
   ----------+-----+-----------------------
    0.000116 |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these 
results seem to suggest that it will not take that long.


> 2. What happens if auto_explain is enabled in the backend and
> pg_log_query_plan() is called on the same backend? Will they conflict?
> I think we should add a test for the same.

Hmm, I think they don't conflict since they just refer QueryDesc and 
don't modify it and don't use same objects for locking.
(I imagine 'conflict' here is something like 'hard conflict' in 
replication[2].)

Actually using both auto_explain and pg_log_query_plan() output each 
logs separately:

   (pid:62835)=# select pg_sleep(10);
   (pid:70000)=# select pg_log_query_plan(62835);

   (pid:70000)=# \! cat data/log/postgres.log
   ...
   2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  00000: query plan 
running on backend with PID 62835 is:
         Query Text: select pg_sleep(10);
         Result  (cost=0.00..0.01 rows=1 width=4)
           Output: pg_sleep('10'::double precision)
         Query Identifier: 3506829283127886044
   2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:  
ProcessLogQueryPlanInterrupt, explain.c:5336
   2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  00000: duration: 
10000.868 ms  plan:
         Query Text: select pg_sleep(10);
         Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=10000.802..10000.804 rows=1 loops=1)

> Using injection point support we should be able to add tests for
> testing pg_log_query_plan behaviour when there are page locks held or
> when auto_explain (with instrumentation) and pg_log_query_plan() work
> on the same query plan. Use injection point to make the backend
> running query wait at a suitable point to delay its execution and fire
> pg_log_query_plan() from other backend. May be the same test could
> examine the server log file to see if the plan is indeed output to the
> server log file.
> 
> Given that the feature will be used when the things have already gone
> wrong, it should not make things more serious. So more testing and
> especially automated would help.

Thanks for the advice, it seems a good idea.
I'm going to try to add tests using injection point.


[1] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[2] 
https://www.postgresql.org/docs/devel/hot-standby.html#HOT-STANDBY-CONFLICT

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

07 February 2024, 04:58:18

On Wed, Feb 7, 2024 at 9:38 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi Ashutosh,
>
> On 2024-02-06 19:51, Ashutosh Bapat wrote:
>
> > Thanks for the summary. It is helpful. I think patch is also getting
> > better.
> >
> > I have a few questions and suggestions
>
> Thanks for your comments.
>
> > 1. Prologue of GetLockMethodLocalHash() mentions
> >  * NOTE: When there are many entries in LockMethodLocalHash, calling
> > this
> >  * function and looking into all of them can lead to performance
> > problems.
> >  */
> > How bad this performance could be. Let's assume that a query is taking
> > time and pg_log_query_plan() is invoked to examine the plan of this
> > query. Is it possible that the looping over all the locks itself takes
> > a lot of time delaying the query execution further?
>
> I think it depends on the number of local locks, but I've measured cpu
> time for this page lock check by adding below codes and
> v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
> calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just for
> your information:
>
>    diff --git a/src/backend/commands/explain.c
> b/src/backend/commands/explain.c
>    index 5f7d77d567..65b7cb4925 100644
>    --- a/src/backend/commands/explain.c
>    +++ b/src/backend/commands/explain.c
>    @@ -44,6 +44,8 @@
>
>    +#include "time.h"
>    ...
>    @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
>             * we check all the LocalLock entries and when finding even
> one, give up
>             * logging the plan.
>             */
>    +       start = clock();
>            hash_seq_init(&status, GetLockMethodLocalHash());
>            while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) !=
> NULL)
>            {
>            if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
>                {
>                    ereport(LOG_SERVER_ONLY,
>                        errmsg("ignored request for logging query plan due
> to page lock conflicts"),
>                        errdetail("You can try again in a moment."));
>                    hash_seq_term(&status);
>
>                    ProcessLogQueryPlanInterruptActive = false;
>                    return;
>                }
>            }
>    +       end = clock();
>    +       cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
>    +
>    +       ereport(LOG,
>    +               errmsg("all locallock entry search took: %f",
> cpu_time_used));
>    +
>
> There were about 3 million log lines which recorded the cpu time, and
> the duration was quite short:
>
>    =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
>    =# select max(d), min(d), avg(d) from cpu_time ;
>       max    | min |          avg
>    ----------+-----+-----------------------
>     0.000116 |   0 | 4.706274625332238e-07
>
> I'm not certain that this is valid for actual use cases, but these
> results seem to suggest that it will not take that long.

What load did you run? I don't think any query in make check would
take say thousands of locks. The prologue refers to a very populated
lock hash table. I think that will happen if thousands of tables are
queried in a single query OR a query runs on a partitioned table with
thousands of partitions. May be we want to try that scenario.

>
>
> > 2. What happens if auto_explain is enabled in the backend and
> > pg_log_query_plan() is called on the same backend? Will they conflict?
> > I think we should add a test for the same.
>
> Hmm, I think they don't conflict since they just refer QueryDesc and
> don't modify it and don't use same objects for locking.
> (I imagine 'conflict' here is something like 'hard conflict' in
> replication[2].)

By conflict, I mean the two features behave weird when used together
e.g give wrong results or crash etc.

>
> Actually using both auto_explain and pg_log_query_plan() output each
> logs separately:
>
>    (pid:62835)=# select pg_sleep(10);
>    (pid:70000)=# select pg_log_query_plan(62835);
>
>    (pid:70000)=# \! cat data/log/postgres.log
>    ...
>    2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  00000: query plan
> running on backend with PID 62835 is:
>          Query Text: select pg_sleep(10);
>          Result  (cost=0.00..0.01 rows=1 width=4)
>            Output: pg_sleep('10'::double precision)
>          Query Identifier: 3506829283127886044
>    2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:
> ProcessLogQueryPlanInterrupt, explain.c:5336
>    2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  00000: duration:
> 10000.868 ms  plan:
>          Query Text: select pg_sleep(10);
>          Result  (cost=0.00..0.01 rows=1 width=4) (actual
> time=10000.802..10000.804 rows=1 loops=1)
>
> > Using injection point support we should be able to add tests for
> > testing pg_log_query_plan behaviour when there are page locks held or
> > when auto_explain (with instrumentation) and pg_log_query_plan() work
> > on the same query plan. Use injection point to make the backend
> > running query wait at a suitable point to delay its execution and fire
> > pg_log_query_plan() from other backend. May be the same test could
> > examine the server log file to see if the plan is indeed output to the
> > server log file.
> >
> > Given that the feature will be used when the things have already gone
> > wrong, it should not make things more serious. So more testing and
> > especially automated would help.
>
> Thanks for the advice, it seems a good idea.
> I'm going to try to add tests using injection point.

Your test with pg_sleep() is a good basic test. But more involved
testing might need something like injection points.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

07 February 2024, 10:14:40

On 2024-02-07 13:58, Ashutosh Bapat wrote:
> On Wed, Feb 7, 2024 at 9:38 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> Hi Ashutosh,
>> 
>> On 2024-02-06 19:51, Ashutosh Bapat wrote:
>> 
>> > Thanks for the summary. It is helpful. I think patch is also getting
>> > better.
>> >
>> > I have a few questions and suggestions
>> 
>> Thanks for your comments.
>> 
>> > 1. Prologue of GetLockMethodLocalHash() mentions
>> >  * NOTE: When there are many entries in LockMethodLocalHash, calling
>> > this
>> >  * function and looking into all of them can lead to performance
>> > problems.
>> >  */
>> > How bad this performance could be. Let's assume that a query is taking
>> > time and pg_log_query_plan() is invoked to examine the plan of this
>> > query. Is it possible that the looping over all the locks itself takes
>> > a lot of time delaying the query execution further?
>> 
>> I think it depends on the number of local locks, but I've measured cpu
>> time for this page lock check by adding below codes and
>> v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
>> calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just 
>> for
>> your information:
>> 
>>    diff --git a/src/backend/commands/explain.c
>> b/src/backend/commands/explain.c
>>    index 5f7d77d567..65b7cb4925 100644
>>    --- a/src/backend/commands/explain.c
>>    +++ b/src/backend/commands/explain.c
>>    @@ -44,6 +44,8 @@
>> 
>>    +#include "time.h"
>>    ...
>>    @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
>>             * we check all the LocalLock entries and when finding even
>> one, give up
>>             * logging the plan.
>>             */
>>    +       start = clock();
>>            hash_seq_init(&status, GetLockMethodLocalHash());
>>            while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) 
>> !=
>> NULL)
>>            {
>>            if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
>>                {
>>                    ereport(LOG_SERVER_ONLY,
>>                        errmsg("ignored request for logging query plan 
>> due
>> to page lock conflicts"),
>>                        errdetail("You can try again in a moment."));
>>                    hash_seq_term(&status);
>> 
>>                    ProcessLogQueryPlanInterruptActive = false;
>>                    return;
>>                }
>>            }
>>    +       end = clock();
>>    +       cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
>>    +
>>    +       ereport(LOG,
>>    +               errmsg("all locallock entry search took: %f",
>> cpu_time_used));
>>    +
>> 
>> There were about 3 million log lines which recorded the cpu time, and
>> the duration was quite short:
>> 
>>    =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
>>    =# select max(d), min(d), avg(d) from cpu_time ;
>>       max    | min |          avg
>>    ----------+-----+-----------------------
>>     0.000116 |   0 | 4.706274625332238e-07
>> 
>> I'm not certain that this is valid for actual use cases, but these
>> results seem to suggest that it will not take that long.
> 
> What load did you run? I don't think any query in make check would
> take say thousands of locks.

Sorry, I forgot to write it but ran make check as you imagined.

> The prologue refers to a very populated
> lock hash table. I think that will happen if thousands of tables are
> queried in a single query OR a query runs on a partitioned table with
> thousands of partitions. May be we want to try that scenario.

OK, I'll try such cases.

>> > 2. What happens if auto_explain is enabled in the backend and
>> > pg_log_query_plan() is called on the same backend? Will they conflict?
>> > I think we should add a test for the same.
>> 
>> Hmm, I think they don't conflict since they just refer QueryDesc and
>> don't modify it and don't use same objects for locking.
>> (I imagine 'conflict' here is something like 'hard conflict' in
>> replication[2].)
> 
> By conflict, I mean the two features behave weird when used together
> e.g give wrong results or crash etc.
> 
>> 
>> Actually using both auto_explain and pg_log_query_plan() output each
>> logs separately:
>> 
>>    (pid:62835)=# select pg_sleep(10);
>>    (pid:70000)=# select pg_log_query_plan(62835);
>> 
>>    (pid:70000)=# \! cat data/log/postgres.log
>>    ...
>>    2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  00000: query plan
>> running on backend with PID 62835 is:
>>          Query Text: select pg_sleep(10);
>>          Result  (cost=0.00..0.01 rows=1 width=4)
>>            Output: pg_sleep('10'::double precision)
>>          Query Identifier: 3506829283127886044
>>    2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:
>> ProcessLogQueryPlanInterrupt, explain.c:5336
>>    2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  00000: duration:
>> 10000.868 ms  plan:
>>          Query Text: select pg_sleep(10);
>>          Result  (cost=0.00..0.01 rows=1 width=4) (actual
>> time=10000.802..10000.804 rows=1 loops=1)
>> 
>> > Using injection point support we should be able to add tests for
>> > testing pg_log_query_plan behaviour when there are page locks held or
>> > when auto_explain (with instrumentation) and pg_log_query_plan() work
>> > on the same query plan. Use injection point to make the backend
>> > running query wait at a suitable point to delay its execution and fire
>> > pg_log_query_plan() from other backend. May be the same test could
>> > examine the server log file to see if the plan is indeed output to the
>> > server log file.
>> >
>> > Given that the feature will be used when the things have already gone
>> > wrong, it should not make things more serious. So more testing and
>> > especially automated would help.
>> 
>> Thanks for the advice, it seems a good idea.
>> I'm going to try to add tests using injection point.
> 
> Your test with pg_sleep() is a good basic test. But more involved
> testing might need something like injection points.

It might be so, I will consider whether there are any subtle timing 
issues, etc.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

jian he

Date:

12 February 2024, 00:00:00

On Mon, Jan 29, 2024 at 9:02 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> Updated the patch to fix typos and move
> ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.
>

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_log_query_plan</primary>
+        </indexterm>
+        <function>pg_log_query_plan</function> (
<parameter>pid</parameter> <type>integer</type> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Requests to log the plan of the query currently running on the
+        backend with specified process ID.
+        It will be logged at <literal>LOG</literal> message level and
+        will appear in the server log based on the log
+        configuration set (See <xref linkend="runtime-config-logging"/>
+        for more information), but will not be sent to the client
+        regardless of <xref linkend="guc-client-min-messages"/>.
+        </para></entry>
+      </row>
it would be better to explain the meaning of return value TRUE/FALSE?

+# logging plan of the running query on the specified backend
+{ oid => '8000', descr => 'log plan of the running query on the
specified backend',
+  proname => 'pg_log_query_plan',
+  provolatile => 'v', prorettype => 'bool',
you can add
`proargnames => '{pid}'`

+ if (proc == NULL)
+ {
+ /*
+ * This is just a warning so a loop-through-resultset will not abort
+ * if one backend terminated on its own during the run.
+ */
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL backend process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ be_status = pgstat_get_beentry_by_backend_id(proc->backendId);
+ if (be_status->st_backendType != B_BACKEND)
+ {
+ ereport(WARNING,
+ (errmsg("PID %d is not a PostgreSQL client backend process", pid)));
+ PG_RETURN_BOOL(false);
+ }
+
+ if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0)
+ {
+ /* Again, just a warning to allow loops */
+ ereport(WARNING,
+ (errmsg("could not send signal to process %d: %m", pid)));
+ PG_RETURN_BOOL(false);
+ }

I found out that pg_log_query_plan's comments look like
pg_log_backend_memory_contexts.
pg_log_backend_memory_contexts will iterate through many memory contexts.
but pg_log_query_plan for one specific pid will only output one plan?
so I am a little bit confused by the comments.

+ /*
+ * Ensure no page lock is held on this process.
+ *
+ * If page lock is held at the time of the interrupt, we can't acquire any
+ * other heavyweight lock, which might be necessary for explaining the plan
+ * when retrieving column names.
+ *
+ * This may be overkill, but since page locks are held for a short duration
+ * we check all the LocalLock entries and when finding even one, give up
+ * logging the plan.
+ */
+ hash_seq_init(&status, GetLockMethodLocalHash());
+ while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
+ {
+ if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
maybe not that self evident, the above comments still not explained
why we need to ensure only
PAGE lock was held on this process?

In the commit message, can you add all the discussion links?
My gmail account doesn't have a previous discussion history.
I am not sure this
(https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru)
is the only discussion link?

I found a bug:
src8=# select *, pg_sleep(10)  from tenk1 for update;
2024-02-11 15:54:17.944 CST [48602] LOG:  query plan running on
backend with PID 48602 is:
        Query Text: select *, pg_sleep(10)  from tenk1 for update;
        LockRows  (cost=0.00..570.00 rows=10000 width=254)
          Output: unique1, unique2, two, four, ten, twenty, hundred,
thousand, twothousand, fivethous, tenthous, odd, even, stringu1,
stringu2, string4, (pg_sleep('10'::double precision)), ctid
          ->  Seq Scan on public.tenk1  (cost=0.00..470.00 rows=10000 width=254)
                Output: unique1, unique2, two, four, ten, twenty,
hundred, thousand, twothousand, fivethous, tenthous, odd, even,
stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid

another session (PID) executes `SELECT pg_log_query_plan(48602);` in
the meantime.
pg_log_query_plan returns true successfully, but PID 48602 was stuck.

I have problem using git apply:
error: patch failed: src/include/commands/explain.h:94
error: src/include/commands/explain.h: patch does not apply

`patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch` works

Re: RFC: Logging plan of the running query

From

jian he

Date:

12 February 2024, 00:00:00

On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> >
> > >  */
> > > How bad this performance could be. Let's assume that a query is taking
> > > time and pg_log_query_plan() is invoked to examine the plan of this
> > > query. Is it possible that the looping over all the locks itself takes
> > > a lot of time delaying the query execution further?
> >
corner case test:
pgbench  --initialize --partition-method=range --partitions=20000
Somehow my setup, the pg_bench didn't populate the data but there are
20000 partitions there.
(all my other settings are default)

some interesting things happened when a query touch so many partitions like:
select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1;

in another session, if you immediate call SELECT pg_log_query_plan(9482);
then output be
`
LOG:  backend with PID 9482 is not running a query or a subtransaction
is aborted
`
however if you delay a little bit of time (like 1 second), then
LOG will emit the plan with lots of text (not sure the plan is right).

I think the reason is that the `InitPlan` within
standard_ExecutorStart takes more time to finish
when your query touches a lot of partitions.

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

12 February 2024, 04:39:18

On Mon, Feb 12, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote:
>
> I found a bug:
> src8=# select *, pg_sleep(10)  from tenk1 for update;
> 2024-02-11 15:54:17.944 CST [48602] LOG:  query plan running on
> backend with PID 48602 is:
>         Query Text: select *, pg_sleep(10)  from tenk1 for update;
>         LockRows  (cost=0.00..570.00 rows=10000 width=254)
>           Output: unique1, unique2, two, four, ten, twenty, hundred,
> thousand, twothousand, fivethous, tenthous, odd, even, stringu1,
> stringu2, string4, (pg_sleep('10'::double precision)), ctid
>           ->  Seq Scan on public.tenk1  (cost=0.00..470.00 rows=10000 width=254)
>                 Output: unique1, unique2, two, four, ten, twenty,
> hundred, thousand, twothousand, fivethous, tenthous, odd, even,
> stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid
>
> another session (PID) executes `SELECT pg_log_query_plan(48602);` in
> the meantime.
> pg_log_query_plan returns true successfully, but PID 48602 was stuck.

What do you mean by PID 48602 was stuck?

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

12 February 2024, 04:42:21

On Mon, Feb 12, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > >
> > > >  */
> > > > How bad this performance could be. Let's assume that a query is taking
> > > > time and pg_log_query_plan() is invoked to examine the plan of this
> > > > query. Is it possible that the looping over all the locks itself takes
> > > > a lot of time delaying the query execution further?
> > >
> corner case test:
> pgbench  --initialize --partition-method=range --partitions=20000
> Somehow my setup, the pg_bench didn't populate the data but there are
> 20000 partitions there.
> (all my other settings are default)
>
> some interesting things happened when a query touch so many partitions like:
> select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1;
>
> in another session, if you immediate call SELECT pg_log_query_plan(9482);
> then output be
> `
> LOG:  backend with PID 9482 is not running a query or a subtransaction
> is aborted
> `
> however if you delay a little bit of time (like 1 second), then
> LOG will emit the plan with lots of text (not sure the plan is right).
>
> I think the reason is that the `InitPlan` within
> standard_ExecutorStart takes more time to finish
> when your query touches a lot of partitions.

That's probably expected unless we make the ActiveQueryDesc available
before ExecutorRun.

How much time did it took between issuing SELECT
pg_log_query_plan(9482); and plan getting output to the server error
logs? How does this time compare with say the same time difference for
a simple query and how much of that time can be attributed to Lock
table hash scan, if the difference between time difference is huge.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

jian he

Date:

12 February 2024, 07:08:00

On Mon, Feb 12, 2024 at 12:42 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, Feb 12, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote:
> >
> > On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > >
> > > >
> > > > >  */
> > > > > How bad this performance could be. Let's assume that a query is taking
> > > > > time and pg_log_query_plan() is invoked to examine the plan of this
> > > > > query. Is it possible that the looping over all the locks itself takes
> > > > > a lot of time delaying the query execution further?
> > > >
> > corner case test:
> > pgbench  --initialize --partition-method=range --partitions=20000
> > Somehow my setup, the pg_bench didn't populate the data but there are
> > 20000 partitions there.
> > (all my other settings are default)
> >
> > some interesting things happened when a query touch so many partitions like:
> > select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1;
> >
> > in another session, if you immediate call SELECT pg_log_query_plan(9482);
> > then output be
> > `
> > LOG:  backend with PID 9482 is not running a query or a subtransaction
> > is aborted
> > `
> > however if you delay a little bit of time (like 1 second), then
> > LOG will emit the plan with lots of text (not sure the plan is right).
> >
> > I think the reason is that the `InitPlan` within
> > standard_ExecutorStart takes more time to finish
> > when your query touches a lot of partitions.
>
> That's probably expected unless we make the ActiveQueryDesc available
> before ExecutorRun.

so the error message should be something like:
errmsg("backend with PID %d is not running a query or a subtransaction
is aborted or the plan is not generated",

>
> How much time did it took between issuing SELECT
> pg_log_query_plan(9482); and plan getting output to the server error
> logs?

it either says errmsg("backend with PID %d is not running a query,.....)
or outputs the plan immediately, if i wait one or two seconds to call
pg_log_query_plan.

because of previously mentioned: with lots of partitions, initplan
took longer to finish.
setup: 2 sessions, one runs the query (select abalance, aid from
public.pgbench_accounts,pg_sleep(4) where aid > 1;),
another one calls pg_log_query_plan.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 February 2024, 00:19:03

On 2024-02-07 19:14, torikoshia wrote:
> On 2024-02-07 13:58, Ashutosh Bapat wrote:

>> The prologue refers to a very populated
>> lock hash table. I think that will happen if thousands of tables are
>> queried in a single query OR a query runs on a partitioned table with
>> thousands of partitions. May be we want to try that scenario.
> 
> OK, I'll try such cases.

I measured this using partitioned pgbench_accounts with some 
modification to v36[1].
The results[2] show that CPU time increases in proportion to the number 
of partitions, and the increase is not that large.

However I've noticed that these ensuring no page lock logic would not be 
necessary anymore since cc32ec24fdf3b98 removed the assertion which 
caused an error[1].

   $ git show cc32ec24fdf3b98
   ..
   diff --git a/src/backend/storage/lmgr/lock.c 
b/src/backend/storage/lmgr/lock.c
   index 0a692ee0a6..f595bce31b 100644
   --- a/src/backend/storage/lmgr/lock.c
   +++ b/src/backend/storage/lmgr/lock.c
   @@ -186,18 +186,6 @@ static int FastPathLocalUseCount = 0;
     */
  static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = 
false;

   -       /*
   -        * We don't acquire any other heavyweight lock while holding 
the page lock
   -        * except for relation extension.
   -        */
   -       Assert(!IsPageLockHeld ||
   -                  (locktag->locktag_type == 
LOCKTAG_RELATION_EXTEND));

I'm going to remove ensuring no page lock logic after some testings.


[1]
$ git diff _submission/log_running_query-v36
+#include "time.h"
+
  bool ProcessLogQueryPlanInterruptActive = false;

  /* Hook for plugins to get control in ExplainOneQuery() */
@@ -5258,6 +5260,10 @@ ProcessLogQueryPlanInterrupt(void)
     MemoryContext old_cxt;
     LogQueryPlanPending = false;

+   clock_t start, end;
+   double cpu_time_used;
+   int num_hash_entry = 0;
+
     /* Cannot re-enter. */
     if (ProcessLogQueryPlanInterruptActive)
         return;
@@ -5287,9 +5293,11 @@ ProcessLogQueryPlanInterrupt(void)
      * we check all the LocalLock entries and when finding even one, 
give up
      * logging the plan.
      */
+   start = clock();
     hash_seq_init(&status, GetLockMethodLocalHash());
     while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
     {
+       num_hash_entry++;
         if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
         {
             ereport(LOG_SERVER_ONLY,
@@ -5301,6 +5309,12 @@ ProcessLogQueryPlanInterrupt(void)
             return;
         }
     }
+   end = clock();
+   cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+   ereport(LOG,
+       errmsg("locallock entry search took: %f for %d entries", 
cpu_time_used, num_hash_entry));

[2]
# partition number: 512
locallock entry search took: 0.000029 for 1026 entries
locallock entry search took: 0.000030 for 1026 entries
locallock entry search took: 0.000036 for 1026 entries

# partition number: 1024
locallock entry search took: 0.000070 for 2050 entries
locallock entry search took: 0.000059 for 2050 entries
locallock entry search took: 0.000049 for 2050 entries

# partition number: 2048
locallock entry search took: 0.000100 for 4098 entries
locallock entry search took: 0.000103 for 4098 entries
locallock entry search took: 0.000101 for 4098 entries

# partition number: 4096
locallock entry search took: 0.000197 for 8194 entries
locallock entry search took: 0.000193 for 8194 entries
locallock entry search took: 0.000192 for 8194 entries

[3] 
https://www.postgresql.org/message-id/0642712f-1298-960a-a3ba-e256d56040ac%40oss.nttdata.com


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 February 2024, 02:30:53

On 2024-02-12 09:00, jian he wrote:

Thanks for you comments.

> On Mon, Jan 29, 2024 at 9:02 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> Hi,
>> 
>> Updated the patch to fix typos and move
>> ProcessLogQueryPlanInterruptActive from errfinish() to 
>> AbortTransaction.
>> 
> 
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>pg_log_query_plan</primary>
> +        </indexterm>
> +        <function>pg_log_query_plan</function> (
> <parameter>pid</parameter> <type>integer</type> )
> +        <returnvalue>boolean</returnvalue>
> +       </para>
> +       <para>
> +        Requests to log the plan of the query currently running on the
> +        backend with specified process ID.
> +        It will be logged at <literal>LOG</literal> message level and
> +        will appear in the server log based on the log
> +        configuration set (See <xref 
> linkend="runtime-config-logging"/>
> +        for more information), but will not be sent to the client
> +        regardless of <xref linkend="guc-client-min-messages"/>.
> +        </para></entry>
> +      </row>
> it would be better to explain the meaning of return value TRUE/FALSE?

Yeah, but I've noticed that this should be located in 'Table Server 
Signaling Functions' not 'Table Control Data Functions'.
Since 'Table Server Signaling Functions' describes the return code as 
below, just relocation seems fine.

   Each of these functions returns true if the signal was successfully 
sent and false if sending the signal failed.

> +# logging plan of the running query on the specified backend
> +{ oid => '8000', descr => 'log plan of the running query on the
> specified backend',
> +  proname => 'pg_log_query_plan',
> +  provolatile => 'v', prorettype => 'bool',
> you can add
> `proargnames => '{pid}'`

Hmm, pg_log_query_plan() can take one argument, I'm not sure how much 
sense it makes.
Other functions which take one argument such as pg_cancel_backend() does 
not have proargnames.


> + if (proc == NULL)
> + {
> + /*
> + * This is just a warning so a loop-through-resultset will not abort
> + * if one backend terminated on its own during the run.
> + */
> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL backend process", pid)));
> + PG_RETURN_BOOL(false);
> + }
> +
> + be_status = pgstat_get_beentry_by_backend_id(proc->backendId);
> + if (be_status->st_backendType != B_BACKEND)
> + {
> + ereport(WARNING,
> + (errmsg("PID %d is not a PostgreSQL client backend process", pid)));
> + PG_RETURN_BOOL(false);
> + }
> +
> + if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0)
> + {
> + /* Again, just a warning to allow loops */
> + ereport(WARNING,
> + (errmsg("could not send signal to process %d: %m", pid)));
> + PG_RETURN_BOOL(false);
> + }
> 
> I found out that pg_log_query_plan's comments look like
> pg_log_backend_memory_contexts.
> pg_log_backend_memory_contexts will iterate through many memory 
> contexts.
> but pg_log_query_plan for one specific pid will only output one plan?
> so I am a little bit confused by the comments.

These "loop" mean backend can run pg_log_query_plan() repeatedly even 
when failing sending signals.
pg_signal_backend() also have such comments.

> + /*
> + * Ensure no page lock is held on this process.
> + *
> + * If page lock is held at the time of the interrupt, we can't acquire 
> any
> + * other heavyweight lock, which might be necessary for explaining the 
> plan
> + * when retrieving column names.
> + *
> + * This may be overkill, but since page locks are held for a short 
> duration
> + * we check all the LocalLock entries and when finding even one, give 
> up
> + * logging the plan.
> + */
> + hash_seq_init(&status, GetLockMethodLocalHash());
> + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
> + {
> + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
> maybe not that self evident, the above comments still not explained
> why we need to ensure only
> PAGE lock was held on this process?

This is for preventing assertion error and it seems not necessary 
anymore as described in [1].
I'm going remove them.

> 
> In the commit message, can you add all the discussion links?
> My gmail account doesn't have a previous discussion history.

Sure.

> I am not sure this
> (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru)
> is the only discussion link?

This is the original one:
https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com

> I found a bug:
> src8=# select *, pg_sleep(10)  from tenk1 for update;
> 2024-02-11 15:54:17.944 CST [48602] LOG:  query plan running on
> backend with PID 48602 is:
>         Query Text: select *, pg_sleep(10)  from tenk1 for update;
>         LockRows  (cost=0.00..570.00 rows=10000 width=254)
>           Output: unique1, unique2, two, four, ten, twenty, hundred,
> thousand, twothousand, fivethous, tenthous, odd, even, stringu1,
> stringu2, string4, (pg_sleep('10'::double precision)), ctid
>           ->  Seq Scan on public.tenk1  (cost=0.00..470.00 rows=10000 
> width=254)
>                 Output: unique1, unique2, two, four, ten, twenty,
> hundred, thousand, twothousand, fivethous, tenthous, odd, even,
> stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid
> 
> another session (PID) executes `SELECT pg_log_query_plan(48602);` in
> the meantime.
> pg_log_query_plan returns true successfully, but PID 48602 was stuck.

Hmm, it's not simply sleeping, is it?
I'm concerned a bit this because estimated rows of tenk1 is 10000.
If so, the query will take 10000 * 10 seconds.

> I have problem using git apply:
> error: patch failed: src/include/commands/explain.h:94
> error: src/include/commands/explain.h: patch does not apply
> 
> `patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch` 
> works

I'll update the patch including other points such as removing ensuring 
no page lock code.


[1] 
https://www.postgresql.org/message-id/1b2b247530f3ff3afab4ddc2df222e8b%40oss.nttdata.com

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

14 February 2024, 06:29:35

On 2024-02-13 11:30, torikoshia wrote:

> I'll update the patch including other points such as removing ensuring
> no page lock code.

Updated the patch.

> Using injection point support we should be able to add tests for
> testing pg_log_query_plan behaviour when there are page locks held or
> when auto_explain (with instrumentation) and pg_log_query_plan() work
> on the same query plan. Use injection point to make the backend
> running query wait at a suitable point to delay its execution and fire
> pg_log_query_plan() from other backend. May be the same test could
> examine the server log file to see if the plan is indeed output to the
> server log file.

Attached patch uses injection point as below:

- There may be more points to inject, but added an injection point at 
ExecutorRun(), which seems to be the first interruption point where 
plans can be reliably displayed.
- At injection point, it'd be possible to wait for some duration and 
fire pg_log_plan_query() as you suggested. However, I'm not sure how 
long duration is appropriate considering the variety of testing 
environments. Instead, attached patch calls 
HandleLogQueryPlanInterrupt() directly and set InterruptPending.
- Tests both pg_log_plan_query() and auto_explain logs for their output, 
and the logged plans are the same.


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v36-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

15 February 2024, 09:12:11

Hi,

I've just been catching up on this thread.

+ if (MyProc->heldLocks)
+ {
+ ereport(LOG_SERVER_ONLY,
+ errmsg("ignored request for logging query plan due to lock conflicts"),
+ errdetail("You can try again in a moment."));
+ return;
+ }

I don't like this for several reasons.

First, I think it's not nice to have a request just get ignored. A
user will expect that if we cannot immediately respond to some
request, we'll respond later at the first time that it's safe to do
so, rather than just ignoring it and telling them to retry.

Second, I don't think that the error message is very good. It talks
about lock conflicts, but we don't know that there is any real
problem. We know that, if we enter this block, the server is in the
middle of trying to acquire some lock, and we also know that we could
attempt to acquire a lock as part of generating the EXPLAIN output,
and maybe that's an issue. But that's not a lock conflict. That's a
re-entrancy problem. I don't know that we want to talk about
re-entrancy problems in an error message, and I don't really think
this error message should exist in the first place, but if we're going
to error out in this case surely we shouldn't do so with an error
message that describes a problem we don't have.

Third, I think that the re-entrancy problems with this patch may
extend well beyond lock acquisition. This is one example of how it can
be unsafe to do something as complicated as EXPLAIN at any arbitrary
CHECK_FOR_INTERRUPTS(). It is not correct to say, as
http://postgr.es/m/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw@mail.gmail.com
does, that the problems with running EXPLAIN at an arbitrary point are
specific to running this code in an aborted transaction. The existence
of this code shows that there is at least one hazard even if the
current transaction is not aborted, and I see no analysis on this
thread indicating whether there are any more such hazards, or how we
could go about finding them all.

I think the issue is very general. We have lots of subsystems that
both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS().
If we process an interrupt while that code is in the middle of
manipulating its global variables and then again re-enter that code,
bad things might happen. We could try to rule that out by analyzing
all such subsystems and all places where CHECK_FOR_INTERRUPTS() may
appear, but that's very difficult. Suppose we took the alternative
approach recommended by Andrey Lepikhov in
http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com
and instead set a flag that gets handled in some suitable place in the
executor code, like ExecProcNode(). If we did that, then we would know
that we're not in the middle of a syscache lookup, a catcache lookup,
a heavyweight lock acquisition, an ereport, or any other low-level
subsystem call that might create problems for the EXPLAIN code.

In that design, the hack shown above would go away, and we could be
much more certain that we don't need any other similar hacks for other
subsystems. The only downside is that we might experience a slightly
longer delay before a requested EXPLAIN plan actually shows up, but
that seems like a pretty small price to pay for being able to reason
about the behavior of the system. I don't *think* there are any cases
where we run in the executor for a particularly long time without a
new call to ExecProcNode().

I think this case is a bit like vacuum_delay_point(). You might think
that vacuum_delay_point() could be moved inside of
CHECK_FOR_INTERRUPTS(), but we've made the opposite decision: every
vacuum_delay_point() calls CHECK_FOR_INTERRUPTS() but not every
CHECK_FOR_INTERRUPTS() calls vacuum_delay_point(). That means that we
can allow vacuum_delay_point() only at cases where we know it's safe,
rather than at every CHECK_FOR_INTERRUPTS(). I think that's a pretty
smart decision, even for vacuum_delay_point(), and AFAICS the code
we're proposing to run here does things that are substantially more
complicated than what vacuum_delay_point() does. That code just does a
bit of reading of shared memory, reports a wait event, and sleeps.
That's really simple compared to code that could try to do relcache
builds, which can trigger syscache lookups, which can both trigger
heavyweight lock acquisition, lightweight lock acquisition, bringing
pages into shared_buffers possibly through physical I/O, processing of
invalidation messages, and a bunch of other stuff.

It's really hard for me to accept that the heavyweight lock problem
for which the patch contains a workaround is the only one that exists.
I can't see any reason why that should be true.

...Robert

Re: RFC: Logging plan of the running query

From

Andres Freund

Date:

15 February 2024, 18:59:11

Hi,

On 2024-02-15 14:42:11 +0530, Robert Haas wrote:
> I think the issue is very general. We have lots of subsystems that
> both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS().
> If we process an interrupt while that code is in the middle of
> manipulating its global variables and then again re-enter that code,
> bad things might happen. We could try to rule that out by analyzing
> all such subsystems and all places where CHECK_FOR_INTERRUPTS() may
> appear, but that's very difficult. Suppose we took the alternative
> approach recommended by Andrey Lepikhov in
> http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com
> and instead set a flag that gets handled in some suitable place in the
> executor code, like ExecProcNode(). If we did that, then we would know
> that we're not in the middle of a syscache lookup, a catcache lookup,
> a heavyweight lock acquisition, an ereport, or any other low-level
> subsystem call that might create problems for the EXPLAIN code.
> 
> In that design, the hack shown above would go away, and we could be
> much more certain that we don't need any other similar hacks for other
> subsystems. The only downside is that we might experience a slightly
> longer delay before a requested EXPLAIN plan actually shows up, but
> that seems like a pretty small price to pay for being able to reason
> about the behavior of the system.

I am very wary of adding overhead to ExecProcNode() - I'm quite sure that
adding code there would trigger visible overhead for query times.

If we went with something like tht approach, I think we'd have to do something
like redirecting node->ExecProcNode to a wrapper, presumably from within a
CFI. That wrapper could then implement the explain support, without slowing
down the normal execution path.


> I don't *think* there are any cases where we run in the executor for a
> particularly long time without a new call to ExecProcNode().

I guess it depends on what you call a long time. A large sort, for example,
could spend a fair amount of time inside tuplesort, similarly, a gather node
might need to wait for a worker for a while etc.


> It's really hard for me to accept that the heavyweight lock problem
> for which the patch contains a workaround is the only one that exists.
> I can't see any reason why that should be true.

I suspect you're right.

Greetings,

Andres Freund

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

16 February 2024, 14:42:36

On Thu, Feb 15, 2024 at 6:12 PM Robert Haas <robertmhaas@gmail.com> 
wrote:
> Hi,
> 
> I've just been catching up on this thread.
> 
> + if (MyProc->heldLocks)
> + {
> + ereport(LOG_SERVER_ONLY,
> + errmsg("ignored request for logging query plan due to lock 
> conflicts"),
> + errdetail("You can try again in a moment."));
> + return;
> + }
> 
> I don't like this for several reasons.
> 
> First, I think it's not nice to have a request just get ignored. A
> user will expect that if we cannot immediately respond to some
> request, we'll respond later at the first time that it's safe to do
> so, rather than just ignoring it and telling them to retry.
> 
> Second, I don't think that the error message is very good. It talks
> about lock conflicts, but we don't know that there is any real
> problem. We know that, if we enter this block, the server is in the
> middle of trying to acquire some lock, and we also know that we could
> attempt to acquire a lock as part of generating the EXPLAIN output,
> and maybe that's an issue. But that's not a lock conflict. That's a
> re-entrancy problem. I don't know that we want to talk about
> re-entrancy problems in an error message, and I don't really think
> this error message should exist in the first place, but if we're going
> to error out in this case surely we shouldn't do so with an error
> message that describes a problem we don't have.
> 
> Third, I think that the re-entrancy problems with this patch may
> extend well beyond lock acquisition. This is one example of how it can
> be unsafe to do something as complicated as EXPLAIN at any arbitrary
> CHECK_FOR_INTERRUPTS(). It is not correct to say, as
> http://postgr.es/m/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw@mail.gmail.com
> does, that the problems with running EXPLAIN at an arbitrary point are
> specific to running this code in an aborted transaction. The existence
> of this code shows that there is at least one hazard even if the
> current transaction is not aborted, and I see no analysis on this
> thread indicating whether there are any more such hazards, or how we
> could go about finding them all.
> 
> I think the issue is very general. We have lots of subsystems that
> both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS().
> If we process an interrupt while that code is in the middle of
> manipulating its global variables and then again re-enter that code,
> bad things might happen. We could try to rule that out by analyzing
> all such subsystems and all places where CHECK_FOR_INTERRUPTS() may
> appear, but that's very difficult. Suppose we took the alternative
> approach recommended by Andrey Lepikhov in
> http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com
> and instead set a flag that gets handled in some suitable place in the
> executor code, like ExecProcNode(). If we did that, then we would know
> that we're not in the middle of a syscache lookup, a catcache lookup,
> a heavyweight lock acquisition, an ereport, or any other low-level
> subsystem call that might create problems for the EXPLAIN code.
> 
> In that design, the hack shown above would go away, and we could be
> much more certain that we don't need any other similar hacks for other
> subsystems. The only downside is that we might experience a slightly
> longer delay before a requested EXPLAIN plan actually shows up, but
> that seems like a pretty small price to pay for being able to reason
> about the behavior of the system. I don't *think* there are any cases
> where we run in the executor for a particularly long time without a
> new call to ExecProcNode().
> 
> I think this case is a bit like vacuum_delay_point(). You might think
> that vacuum_delay_point() could be moved inside of
> CHECK_FOR_INTERRUPTS(), but we've made the opposite decision: every
> vacuum_delay_point() calls CHECK_FOR_INTERRUPTS() but not every
> CHECK_FOR_INTERRUPTS() calls vacuum_delay_point(). That means that we
> can allow vacuum_delay_point() only at cases where we know it's safe,
> rather than at every CHECK_FOR_INTERRUPTS(). I think that's a pretty
> smart decision, even for vacuum_delay_point(), and AFAICS the code
> we're proposing to run here does things that are substantially more
> complicated than what vacuum_delay_point() does. That code just does a
> bit of reading of shared memory, reports a wait event, and sleeps.
> That's really simple compared to code that could try to do relcache
> builds, which can trigger syscache lookups, which can both trigger
> heavyweight lock acquisition, lightweight lock acquisition, bringing
> pages into shared_buffers possibly through physical I/O, processing of
> invalidation messages, and a bunch of other stuff.
> 
> It's really hard for me to accept that the heavyweight lock problem
> for which the patch contains a workaround is the only one that exists.
> I can't see any reason why that should be true.

Thanks for the review and the very detailed explanation!

I'm convinced that it's unsafe to execute EXPLAIN codes during 
CHECK_FOR_INTERRUPTS() and we need to execute it in other safe place, as 
well as the first and second point.


On 2024-02-16 03:59, Andres Freund wrote:
> Hi,
> 
> On 2024-02-15 14:42:11 +0530, Robert Haas wrote:
>> I think the issue is very general. We have lots of subsystems that
>> both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS().
>> If we process an interrupt while that code is in the middle of
>> manipulating its global variables and then again re-enter that code,
>> bad things might happen. We could try to rule that out by analyzing
>> all such subsystems and all places where CHECK_FOR_INTERRUPTS() may
>> appear, but that's very difficult. Suppose we took the alternative
>> approach recommended by Andrey Lepikhov in
>> http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com
>> and instead set a flag that gets handled in some suitable place in the
>> executor code, like ExecProcNode(). If we did that, then we would know
>> that we're not in the middle of a syscache lookup, a catcache lookup,
>> a heavyweight lock acquisition, an ereport, or any other low-level
>> subsystem call that might create problems for the EXPLAIN code.
>> 
>> In that design, the hack shown above would go away, and we could be
>> much more certain that we don't need any other similar hacks for other
>> subsystems. The only downside is that we might experience a slightly
>> longer delay before a requested EXPLAIN plan actually shows up, but
>> that seems like a pretty small price to pay for being able to reason
>> about the behavior of the system.
> 
> I am very wary of adding overhead to ExecProcNode() - I'm quite sure 
> that
> adding code there would trigger visible overhead for query times.
> 
> If we went with something like tht approach, I think we'd have to do 
> something
> like redirecting node->ExecProcNode to a wrapper, presumably from 
> within a
> CFI. That wrapper could then implement the explain support, without 
> slowing
> down the normal execution path.

Thanks for the idea!
I'm not so sure about the implementation now, i.e. finding the next node 
to be executed from the planstate tree, but I'm going to try this 
approach.

>> I don't *think* there are any cases where we run in the executor for a
>> particularly long time without a new call to ExecProcNode().
> 
> I guess it depends on what you call a long time. A large sort, for 
> example,
> could spend a fair amount of time inside tuplesort, similarly, a gather 
> node
> might need to wait for a worker for a while etc.
> 
> 
>> It's really hard for me to accept that the heavyweight lock problem
>> for which the patch contains a workaround is the only one that exists.
>> I can't see any reason why that should be true.
> 
> I suspect you're right.
> 
> Greetings,
> 
> Andres Freund

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

20 February 2024, 04:53:04

On Fri, Feb 16, 2024 at 12:29 AM Andres Freund <andres@anarazel.de> wrote:
> If we went with something like tht approach, I think we'd have to do something
> like redirecting node->ExecProcNode to a wrapper, presumably from within a
> CFI. That wrapper could then implement the explain support, without slowing
> down the normal execution path.

That's an annoying complication; maybe there's some better way to
handle this. But I think we need to do something different than what
the patch does currently because...

> > It's really hard for me to accept that the heavyweight lock problem
> > for which the patch contains a workaround is the only one that exists.
> > I can't see any reason why that should be true.
>
> I suspect you're right.

...I think the current approach is just plain dead, because of this
issue. We can't take an approach that creates an unbounded number of
unclear reentrancy issues and then insert hacks one by one to cure
them (or hack around them, more to the point) as they're discovered.

The premise has to be that we only allow logging the query plan at
points where we know it's safe, rather than, as at present, allowing
it in places that are unsafe and then trying to compensate with code
elsewhere. That's not likely to ever be as stable as we want
PostgreSQL to be.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

22 February 2024, 00:54:57

On Mon, Feb 19, 2024 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Feb 16, 2024 at 12:29 AM Andres Freund <andres@anarazel.de> wrote:
> > If we went with something like tht approach, I think we'd have to do something
> > like redirecting node->ExecProcNode to a wrapper, presumably from within a
> > CFI. That wrapper could then implement the explain support, without slowing
> > down the normal execution path.
>
> That's an annoying complication; maybe there's some better way to
> handle this. But I think we need to do something different than what
> the patch does currently because...
>
> > > It's really hard for me to accept that the heavyweight lock problem
> > > for which the patch contains a workaround is the only one that exists.
> > > I can't see any reason why that should be true.
> >
> > I suspect you're right.
>
> ...I think the current approach is just plain dead, because of this
> issue. We can't take an approach that creates an unbounded number of
> unclear reentrancy issues and then insert hacks one by one to cure
> them (or hack around them, more to the point) as they're discovered.
>
> The premise has to be that we only allow logging the query plan at
> points where we know it's safe, rather than, as at present, allowing
> it in places that are unsafe and then trying to compensate with code
> elsewhere. That's not likely to ever be as stable as we want
> PostgreSQL to be.

This is potentially a bit of a wild idea, but I wonder if having some
kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
"normal" as opposed to "critical" (using that word differently than
the existing critical sections) would be worth it.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

22 February 2024, 04:06:06

On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > ...I think the current approach is just plain dead, because of this
> > issue. We can't take an approach that creates an unbounded number of
> > unclear reentrancy issues and then insert hacks one by one to cure
> > them (or hack around them, more to the point) as they're discovered.
> >
> > The premise has to be that we only allow logging the query plan at
> > points where we know it's safe, rather than, as at present, allowing
> > it in places that are unsafe and then trying to compensate with code
> > elsewhere. That's not likely to ever be as stable as we want
> > PostgreSQL to be.
>
> This is potentially a bit of a wild idea, but I wonder if having some
> kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> "normal" as opposed to "critical" (using that word differently than
> the existing critical sections) would be worth it.

My hunch is this will end up being a maintenance burden since every
caller has to decide (carefully) whether the call is under normal
condition or not. Developers will tend to take a safe approach and
flag calls as critical. But importantly, what's normal for one
interrupt action may be critical for another and vice versa. Approach
would be useful depending upon how easy it is to comprehend the
definition of "normal".

If a query executes for longer than a user defined threashold (session
level GUC? or same value as auto_explain parameter), the executor
proactively prepares an EXPLAIN output and keeps it handy in case
asked for. It can do so at a "known" safe place and time rather than
at any random time and location. Extra time spent in creating EXPLAIN
output may not be noticeable in a long running query. The EXPLAIN
output could be saved in pg_stat_activity or similar place. This will
avoid signaling the backend.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

23 February 2024, 04:52:32

On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote:
> This is potentially a bit of a wild idea, but I wonder if having some
> kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> "normal" as opposed to "critical" (using that word differently than
> the existing critical sections) would be worth it.

It's worth considering, but the definition of "normal" vs. "critical"
might be hard to pin down. Or, we might end up with a definition that
is specific to this particular case and not generalizable to others.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Julien Rouhaud

Date:

23 February 2024, 14:20:51

Hi,

On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote:
> > This is potentially a bit of a wild idea, but I wonder if having some
> > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> > "normal" as opposed to "critical" (using that word differently than
> > the existing critical sections) would be worth it.
>
> It's worth considering, but the definition of "normal" vs. "critical"
> might be hard to pin down. Or, we might end up with a definition that
> is specific to this particular case and not generalizable to others.

But it doesn't have to be all or nothing right?  I mean each call could say
what the situation is like in their context, like
CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
slowly tag calls as needed, similarly to how we add already CFI based on users
report.

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

23 February 2024, 15:23:22

On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote:
> > > This is potentially a bit of a wild idea, but I wonder if having some
> > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> > > "normal" as opposed to "critical" (using that word differently than
> > > the existing critical sections) would be worth it.
> >
> > It's worth considering, but the definition of "normal" vs. "critical"
> > might be hard to pin down. Or, we might end up with a definition that
> > is specific to this particular case and not generalizable to others.
>
> But it doesn't have to be all or nothing right?  I mean each call could say
> what the situation is like in their context, like
> CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
> slowly tag calls as needed, similarly to how we add already CFI based on users
> report.

Absolutely. My gut feeling is that it's going to be simpler to pick a
small number of places that are safe and sufficient for this
particular feature and add an extra call there, similar to how we do
vacuum_delay_point(). The reason I think that's likely to be better is
that it will likely require changing only a relatively small number of
places. If we instead start annotating CFIs, well, we've got hundreds
of those. That's a lot more to change, and it also inconveniences
third-party extension authors and people doing back-patching. I'm not
here to say it can't work; I just think it's likely not the easiest
path.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

24 February 2024, 13:56:41

On Fri, Feb 23, 2024 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
> > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote:
> > > > This is potentially a bit of a wild idea, but I wonder if having some
> > > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
> > > > "normal" as opposed to "critical" (using that word differently than
> > > > the existing critical sections) would be worth it.
> > >
> > > It's worth considering, but the definition of "normal" vs. "critical"
> > > might be hard to pin down. Or, we might end up with a definition that
> > > is specific to this particular case and not generalizable to others.
> >
> > But it doesn't have to be all or nothing right?  I mean each call could say
> > what the situation is like in their context, like
> > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
> > slowly tag calls as needed, similarly to how we add already CFI based on users
> > report.
>
> Absolutely. My gut feeling is that it's going to be simpler to pick a
> small number of places that are safe and sufficient for this
> particular feature and add an extra call there, similar to how we do
> vacuum_delay_point(). The reason I think that's likely to be better is
> that it will likely require changing only a relatively small number of
> places. If we instead start annotating CFIs, well, we've got hundreds
> of those. That's a lot more to change, and it also inconveniences
> third-party extension authors and people doing back-patching. I'm not
> here to say it can't work; I just think it's likely not the easiest
> path.

Yes, I suspect it's not the easiest path. I have a small hunch it
might end up paying more dividends in the future: there isn't just one
of these things that is regularly a thorny discussion for the same
reasons each time (basically "no way to trigger this safely from
another backend interrupting another one at an arbitrary point"), and
if we were able to generalize a solution we may have multiple wins (a
very obvious one in my mind is the inability of auto explain to run an
explain at the precise time it's most useful: when statement timeout
fires).

But it's also possible there are simply ways that get us more than
this scenario also, so I might be wrong; it's merely a hunch.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

Julien Rouhaud

Date:

25 February 2024, 11:30:37

On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote:
> On Fri, Feb 23, 2024 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > But it doesn't have to be all or nothing right?  I mean each call could say
> > > what the situation is like in their context, like
> > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
> > > slowly tag calls as needed, similarly to how we add already CFI based on users
> > > report.
> >
> > Absolutely. My gut feeling is that it's going to be simpler to pick a
> > small number of places that are safe and sufficient for this
> > particular feature and add an extra call there, similar to how we do
> > vacuum_delay_point(). The reason I think that's likely to be better is
> > that it will likely require changing only a relatively small number of
> > places. If we instead start annotating CFIs, well, we've got hundreds
> > of those. That's a lot more to change, and it also inconveniences
> > third-party extension authors and people doing back-patching. I'm not
> > here to say it can't work; I just think it's likely not the easiest
> > path.
>
> Yes, I suspect it's not the easiest path. I have a small hunch it
> might end up paying more dividends in the future: there isn't just one
> of these things that is regularly a thorny discussion for the same
> reasons each time (basically "no way to trigger this safely from
> another backend interrupting another one at an arbitrary point"), and
> if we were able to generalize a solution we may have multiple wins (a
> very obvious one in my mind is the inability of auto explain to run an
> explain at the precise time it's most useful: when statement timeout
> fires).

Yeah, trying to find a generalized solution seems like worth investing some
time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a
few years down the road.

I might be missing something, but since we already have a ton of macro hacks,
why not get another to allow CFI() overloading rather than modifying every
single call?  Something like that should do the trick (and CFIFlagHandler() is
just a naive example with a function call to avoid multiple evaluation, should
be replaced with anything that required more than 10s thinking):

#define CHECK_FOR_INTERRUPTS_0() \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
} while(0)

#define CHECK_FOR_INTERRUPTS_1(f) \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
    \
    CFIFlagHandler(f); \
} while(0)

#define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL

#define CHECK_FOR_INTERRUPTS(...) \
    CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
                            CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
                            CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
                        )

We would have to duplicate the current CFI body, but it should never really
change, and we shouldn't end up with more than 2 overloads anyway so I don't
see it being much of a problem.

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

26 February 2024, 06:49:42

On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote:
> > On Fri, Feb 23, 2024 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > But it doesn't have to be all or nothing right?  I mean each call could say
> > > > what the situation is like in their context, like
> > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
> > > > slowly tag calls as needed, similarly to how we add already CFI based on users
> > > > report.

That has some potential ...

>
> Yeah, trying to find a generalized solution seems like worth investing some
> time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a
> few years down the road.
>
> I might be missing something, but since we already have a ton of macro hacks,
> why not get another to allow CFI() overloading rather than modifying every
> single call?  Something like that should do the trick (and CFIFlagHandler() is
> just a naive example with a function call to avoid multiple evaluation, should
> be replaced with anything that required more than 10s thinking):
>
> #define CHECK_FOR_INTERRUPTS_0() \
> do { \
>         if (INTERRUPTS_PENDING_CONDITION()) \
>                 ProcessInterrupts(); \
> } while(0)
>
> #define CHECK_FOR_INTERRUPTS_1(f) \
> do { \
>         if (INTERRUPTS_PENDING_CONDITION()) \
>                 ProcessInterrupts(); \
>         \
>         CFIFlagHandler(f); \
> } while(0)

From your earlier description I thought you are talking about flags
that can be ORed. We need only two macros above. Why are we calling
CFIFLagHandler() after ProcessInterrupts()? Shouldn't we pass flags to
ProcessInterrupts() itself.

>
> #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
>
> #define CHECK_FOR_INTERRUPTS(...) \
>         CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
>                                                         CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
>                                                         CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
>                                                 )
>
> We would have to duplicate the current CFI body, but it should never really
> change, and we shouldn't end up with more than 2 overloads anyway so I don't
> see it being much of a problem.

Why do we need this complex macro?

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

Julien Rouhaud

Date:

26 February 2024, 07:55:04

On Mon, Feb 26, 2024 at 12:19:42PM +0530, Ashutosh Bapat wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > > >
> > > > > But it doesn't have to be all or nothing right?  I mean each call could say
> > > > > what the situation is like in their context, like
> > > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and
> > > > > slowly tag calls as needed, similarly to how we add already CFI based on users
> > > > > report.
>
> That has some potential ...
>
> >
> > I might be missing something, but since we already have a ton of macro hacks,
> > why not get another to allow CFI() overloading rather than modifying every
> > single call?  Something like that should do the trick (and CFIFlagHandler() is
> > just a naive example with a function call to avoid multiple evaluation, should
> > be replaced with anything that required more than 10s thinking):
> >
> > #define CHECK_FOR_INTERRUPTS_0() \
> > do { \
> >         if (INTERRUPTS_PENDING_CONDITION()) \
> >                 ProcessInterrupts(); \
> > } while(0)
> >
> > #define CHECK_FOR_INTERRUPTS_1(f) \
> > do { \
> >         if (INTERRUPTS_PENDING_CONDITION()) \
> >                 ProcessInterrupts(); \
> >         \
> >         CFIFlagHandler(f); \
> > } while(0)
>
> From your earlier description I thought you are talking about flags
> that can be ORed. We need only two macros above. Why are we calling
> CFIFLagHandler() after ProcessInterrupts()? Shouldn't we pass flags to
> ProcessInterrupts() itself.

Yes, I'm still talking about ORed flags passed to CFI().  That CFIFlagHandler
call is just an example for a generalized function that would act those flags,
rather than having it coded inside the macro.
>
> >
> > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
> >
> > #define CHECK_FOR_INTERRUPTS(...) \
> >         CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
> >                                                         CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
> >                                                         CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
> >                                                 )
> >
> > We would have to duplicate the current CFI body, but it should never really
> > change, and we shouldn't end up with more than 2 overloads anyway so I don't
> > see it being much of a problem.
>
> Why do we need this complex macro?

So that client code can use either CHECK_FOR_INTERRUPTS() or
CHECK_FOR_INTERRUPTS(flag) rather that transforming every single
CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's
complaint about this approach.

Re: RFC: Logging plan of the running query

From

Ashutosh Bapat

Date:

26 February 2024, 08:15:08

On Mon, Feb 26, 2024 at 1:25 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > >
> > > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL
> > >
> > > #define CHECK_FOR_INTERRUPTS(...) \
> > >         CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
> > >                                                         CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
> > >                                                         CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
> > >                                                 )
> > >
> > > We would have to duplicate the current CFI body, but it should never really
> > > change, and we shouldn't end up with more than 2 overloads anyway so I don't
> > > see it being much of a problem.
> >
> > Why do we need this complex macro?
>
> So that client code can use either CHECK_FOR_INTERRUPTS() or
> CHECK_FOR_INTERRUPTS(flag) rather that transforming every single
> CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's
> complaint about this approach.

It might be better to just create two marcos  (with names like
CHECK_FOR_INTERRUPTS() and CHECK_FOR_INTERRUPTS_SAFE()) to call
ProcessInterrupt() directly and modify ProcessInterrupts() to accept
the flag (and if required call CFIFlagHandler() additionally). Last
macro is hard to understand.

--
Best Wishes,
Ashutosh Bapat

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

26 February 2024, 08:26:44

On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Yeah, trying to find a generalized solution seems like worth investing some
> time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a
> few years down the road.

I just don't really see how to do it. I suspect that every task that
wants to run at some CFIs but not others is going to have slightly
different requirements, and we probably can't foresee what all of
those requirements are.

Said another way, if in the future we want to call
DoSomethingOrOther() from the CFI handler, well then we need to know
that we're not already in the middle of using any subsystem that
DoSomethingOrOther() might also try to use ... and we also need to
know that we're not in the middle of doing anything that's more
critical than DoSomethingOrOther(). But both of these are likely to
vary in each case.

EXPLAIN might be one member of a general class of things that require
catalog access (and thus might take locks or lwlocks, access the
catalogs, trigger invalidations, etc.) but it's not clear how general
that class really is. Also, I think if we try to do too many different
kinds of things at CFIs, the whole thing is going to fall apart.
You'll end up failing to foresee some interactions, or the stack will
get too deep, or ... something.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

26 February 2024, 12:01:45

On 2024-02-24 00:23, Robert Haas wrote:
> On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> 
> wrote:
>> On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote:
>> > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote:
>> > > This is potentially a bit of a wild idea, but I wonder if having some
>> > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in
>> > > "normal" as opposed to "critical" (using that word differently than
>> > > the existing critical sections) would be worth it.
>> >
>> > It's worth considering, but the definition of "normal" vs. "critical"
>> > might be hard to pin down. Or, we might end up with a definition that
>> > is specific to this particular case and not generalizable to others.
>> 
>> But it doesn't have to be all or nothing right?  I mean each call 
>> could say
>> what the situation is like in their context, like
>> CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | 
>> GUARANTEE_WHATEVER), and
>> slowly tag calls as needed, similarly to how we add already CFI based 
>> on users
>> report.
> 
> Absolutely. My gut feeling is that it's going to be simpler to pick a
> small number of places that are safe and sufficient for this
> particular feature and add an extra call there

Hmm, whether extending CHECK_FOR_INTERRUPTS() or adding extras call 
directly, currently I'm not sure where are the good 'places', which 
don't give performance impact.

As attached PoC patch, I experimentally added extra calls on 
ExecScanFetch() which would be less called than ExecProcNode()[1].
When running sequential scan on pgbench_accounts which is on the memory, 
there seems a performance degradation.

   - Executed "select * from pgbench_accounts" for 20 times
   - Compared the elapsed time between the patch applied and not applied 
on 874d817baa160ca7e68
   - there were no heap_blks_read during the query
   - pgbench_accounts has 3000000 rows

   patch NOT applied:
   - average: 335.88 ms
   - max: 367.313 ms
   - min: 309.609 ms

   patch applied:
   - average: 342.57 ms
   - max: 380.099 ms
   - min: 324.270 ms

It would be nice if there was a place accessed once every few seconds or 
so..

[1] 
https://www.postgresql.org/message-id/20240215185911.v4o6fo444md6a3w7%40awork3.anarazel.de

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v37-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Julien Rouhaud

Date:

26 February 2024, 13:54:41

On Mon, Feb 26, 2024 at 01:56:44PM +0530, Robert Haas wrote:
> On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Yeah, trying to find a generalized solution seems like worth investing some
> > time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a
> > few years down the road.
>
> I just don't really see how to do it. I suspect that every task that
> wants to run at some CFIs but not others is going to have slightly
> different requirements, and we probably can't foresee what all of
> those requirements are.
>
> Said another way, if in the future we want to call
> DoSomethingOrOther() from the CFI handler, well then we need to know
> that we're not already in the middle of using any subsystem that
> DoSomethingOrOther() might also try to use ... and we also need to
> know that we're not in the middle of doing anything that's more
> critical than DoSomethingOrOther(). But both of these are likely to
> vary in each case.
>
> EXPLAIN might be one member of a general class of things that require
> catalog access (and thus might take locks or lwlocks, access the
> catalogs, trigger invalidations, etc.) but it's not clear how general
> that class really is. Also, I think if we try to do too many different
> kinds of things at CFIs, the whole thing is going to fall apart.
> You'll end up failing to foresee some interactions, or the stack will
> get too deep, or ... something.

I still fail to understand your point.  So you say we might want to check for
safe condition to run explain or DoSomethingOrOther or even DoMeh, with all
different requirements.  IIUC what you're saying is that we should have
CHECK_FOR_EXPLAIN(), CHECK_FOR_DOSOMETHINGOROTHER() and CHECK_FOR_MEH()?

And so in some code path A we could have

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();

In another

CHECK_FOR_INTERRUPTS();
CHECK_FOR_DOSOMETHINGOROTHER();

and in one happy path

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();
CHECK_FOR_DOSOMETHINGOROTHER();
CHECK_FOR_MEH();

Or that we should still have all of those but they shouldn't be anywhere close
to a CHECK_FOR_INTERRUPTS(), or something totally different?

In the first case, I'm not sure why having CHECK_FOR_INTERRUPTS(EXPLAIN|...),
with a combination of flags and with the flag handling being done after
ProcessIterrupts(), would be any different apart from having less boilerplate
lines.  Similarly for the 2nd case, but relying on a single more general
CHECK_FOR_CONDITION(EXPLAIN | ...) rather than N CHECK_FOR_XXX?

If you just want something totally different then sure.

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

28 February 2024, 06:18:32

On Mon, Feb 26, 2024 at 5:31 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> It would be nice if there was a place accessed once every few seconds or
> so..

I think this comment earlier from Andres deserves close attention:

# If we went with something like tht approach, I think we'd have to do something
# like redirecting node->ExecProcNode to a wrapper, presumably from within a
# CFI. That wrapper could then implement the explain support, without slowing
# down the normal execution path.

If this is correctly implemented, the overhead in the case where the
feature isn't used should be essentially zero, I believe.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

James Coleman

Date:

02 March 2024, 15:46:36

On Wed, Feb 28, 2024 at 1:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Feb 26, 2024 at 5:31 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > It would be nice if there was a place accessed once every few seconds or
> > so..
>
> I think this comment earlier from Andres deserves close attention:
>
> # If we went with something like tht approach, I think we'd have to do something
> # like redirecting node->ExecProcNode to a wrapper, presumably from within a
> # CFI. That wrapper could then implement the explain support, without slowing
> # down the normal execution path.
>
> If this is correctly implemented, the overhead in the case where the
> feature isn't used should be essentially zero, I believe.

If I can rephrase this idea: it's basically "delay this interrupt
until inline to the next ExecProcNode execution".

That seems pretty promising to me as well.

Regards,
James Coleman

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

04 March 2024, 18:19:45

On Sat, Mar 2, 2024 at 10:46 AM James Coleman <jtc331@gmail.com> wrote:
> If I can rephrase this idea: it's basically "delay this interrupt
> until inline to the next ExecProcNode execution".

Yes, but it's not just that. It also means that the code which would
handle the interrupt doesn't need to be called at every ExecProcNode.
Only when the interrupt actually arrives do we enable the code that
handles it.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

13 March 2024, 05:28:42

On Fri, Feb 16, 2024 at 11:42 PM torikoshia <torikoshia@oss.nttdata.com> 
wrote:
> I'm not so sure about the implementation now, i.e. finding the next 
> node
> to be executed from the planstate tree, but I'm going to try this
> approach.

Attached a patch which takes this approach.

- I saw no way to find the next node to be executed from the planstate 
tree, so the patch wraps all the ExecProcNode of the planstate tree at 
CHECK_FOR_INTERRUPTS().
- To prevent overhead of this wrapped function call, unwrap it at the 
end of EXPLAIN code execution.
- I first tried to use ExecSetExecProcNode() for wrapping, but it 
'changes' ExecProcNodeMtd of nodes, not 'adds' some process to 
ExecProcNodeMtd. I'm not sure this is the right approach, but attached 
patch adds new member ExecProcNodeOriginal to PlanState to preserve 
original ExecProcNodeMtd.

Any comments are welcomed.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Attachment

v38-0001-Add-function-to-log-the-plan-of-the-query.patch

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

13 March 2024, 19:33:02

On Wed, Mar 13, 2024 at 1:28 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> - I saw no way to find the next node to be executed from the planstate
> tree, so the patch wraps all the ExecProcNode of the planstate tree at
> CHECK_FOR_INTERRUPTS().

I don't think it does this correctly, because some node types have
children other than the left and right node. See /* special child
plans */ in ExplainNode().

But also ... having to wrap the entire plan tree like this seems
pretty awful. I don't really like the idea of a large-scan plan
modification like this in the middle of the query. I also wonder
whether it interacts properly with JIT. But at the same time, I wonder
how you're supposed to avoid it.

Andres, did you have some clever idea for this feature that would
avoid the need to do this?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

jian he

Date:

14 March 2024, 01:02:47

On Wed, Mar 13, 2024 at 1:28 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On Fri, Feb 16, 2024 at 11:42 PM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
> > I'm not so sure about the implementation now, i.e. finding the next
> > node
> > to be executed from the planstate tree, but I'm going to try this
> > approach.
>
> Attached a patch which takes this approach.
>

one minor issue.
I understand the regress test, compare the expected outcome with
testrun outcome,
but can you enlighten me about how you check if the change you made in
contrib/auto_explain/t/001_auto_explain.pl is correct.
(i am not familiar with perl).

diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index cf195f1359..2d06bf297e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -17,6 +17,8 @@
 #include "lib/stringinfo.h"
 #include "parser/parse_node.h"
+extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 054dd2bf62..3c6bd1ea7c 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -167,6 +167,7 @@ struct Node;
 extern bool message_level_is_interesting(int elevel);
+extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;

utils/elog.h is already included in src/include/postgres.h.
you don't need to declare ProcessLogQueryPlanInterruptActive at
include/commands/explain.h?
generally a variable should only declare once?

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

18 March 2024, 05:39:48

On 2024-03-14 04:33, Robert Haas wrote:
Thanks for your review!

> On Wed, Mar 13, 2024 at 1:28 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> - I saw no way to find the next node to be executed from the planstate
>> tree, so the patch wraps all the ExecProcNode of the planstate tree at
>> CHECK_FOR_INTERRUPTS().
> 
> I don't think it does this correctly, because some node types have
> children other than the left and right node. See /* special child
> plans */ in ExplainNode().

Agreed.

> But also ... having to wrap the entire plan tree like this seems
> pretty awful. I don't really like the idea of a large-scan plan
> modification like this in the middle of the query.

Yeah, but I haven't come up with other ideas to select the appropriate 
node to wrap.

> Andres, did you have some clever idea for this feature that would
> avoid the need to do this?



On 2024-03-14 10:02, jian he wrote:
> On Wed, Mar 13, 2024 at 1:28 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> 
>> On Fri, Feb 16, 2024 at 11:42 PM torikoshia 
>> <torikoshia@oss.nttdata.com>
>> wrote:
>> > I'm not so sure about the implementation now, i.e. finding the next
>> > node
>> > to be executed from the planstate tree, but I'm going to try this
>> > approach.
>> 
>> Attached a patch which takes this approach.
>> 
> 
> one minor issue.
> I understand the regress test, compare the expected outcome with
> testrun outcome,
> but can you enlighten me about how you check if the change you made in
> contrib/auto_explain/t/001_auto_explain.pl is correct.
> (i am not familiar with perl).

$pg_log_plan_query_output saves the output plan of pg_log_query_plan() 
and $auto_explain_output saves the output plan of auto_explain.
The test checks the both are the same using cmp_ok().

Did I answer your question?

> diff --git a/src/include/commands/explain.h 
> b/src/include/commands/explain.h
> index cf195f1359..2d06bf297e 100644
> --- a/src/include/commands/explain.h
> +++ b/src/include/commands/explain.h
> @@ -17,6 +17,8 @@
>  #include "lib/stringinfo.h"
>  #include "parser/parse_node.h"
> +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;
> 
> diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
> index 054dd2bf62..3c6bd1ea7c 100644
> --- a/src/include/utils/elog.h
> +++ b/src/include/utils/elog.h
> @@ -167,6 +167,7 @@ struct Node;
>  extern bool message_level_is_interesting(int elevel);
> +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive;
> 
> utils/elog.h is already included in src/include/postgres.h.
> you don't need to declare ProcessLogQueryPlanInterruptActive at
> include/commands/explain.h?
> generally a variable should only declare once?

Yeah, this declaration is not necessary and we should add include 
commands/explain.h to src/backend/access/transam/xact.c.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: RFC: Logging plan of the running query

From

Andres Freund

Date:

26 March 2024, 02:34:47

Hi,

On 2024-03-13 15:33:02 -0400, Robert Haas wrote:
> But also ... having to wrap the entire plan tree like this seems
> pretty awful. I don't really like the idea of a large-scan plan
> modification like this in the middle of the query.

It's not great. But I also don't really see an alternative with this approach.

I guess we could invent a new CFI version that gets the current PlanState and
use that in all of src/backend/executor/node* and pass the PlanState to that -
but then we could just as well just directly process the interrupt there.

> I also wonder whether it interacts properly with JIT.

I don't think there's a problem unless somebody invests a lot of time in
JITing much more of the query. Which will require a lot more work, basically
redesigning the executor...

> Andres, did you have some clever idea for this feature that would
> avoid the need to do this?

No. I think it's acceptable though.

However it might be worth inventing an executor tree walker in a preliminary
step. We have already quite a few switches over all plan nodes, which we could
largely replace with a helper.

Greetings,

Andres Freund

Re: RFC: Logging plan of the running query

From

Akshat Jaimini

Date:

08 March, 18:42:21

Hi,
I think there is still some problem with the patch. I am not able to apply it to the master branch.

Can you please take another look at it?

Thanks,
Akshat Jaimini

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

10 March, 08:10:27

On 2025-03-09 00:42, Akshat Jaimini wrote:
> Hi,
> I think there is still some problem with the patch. I am not able to
> apply it to the master branch.
> 
> Can you please take another look at it?

Thanks for pointing it out!
Modified it.

BTW the patch adds about 400 lines to explain.c and it may be better to 
split the file as well as 9173e8b6046, but I leave it as it is for now.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment

v41-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

21 March, 15:40:00

Hi,

Rebased it again.

On 2025-03-10 14:10, torikoshia wrote:
> BTW the patch adds about 400 lines to explain.c and it may be better
> to split the file as well as 9173e8b6046, but I leave it as it is for
> now.

This part remains unchanged.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment

v42-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch

Re: RFC: Logging plan of the running query

From

Sadeq Dousti

Date:

31 March, 21:51:28

Hi,

Sergey and I reviewed this patch and have a few comments, listed below.

> BTW the patch adds about 400 lines to explain.c and it may be better
> to split the file as well as 9173e8b6046, but I leave it as it is for
> now.

1. As rightfully described by the OP above, explain.c has grown too big. In addition, explain.h is added to quite a number of other files.

2. We wanted to entertain the possibility of a GUC variable to enable the feature. That is, having it disabled by default, and the DBA can selectively enable it on the fly. The reasoning is that it can affect some workloads in an unforeseen way, so this choice can make the next Postgres release safer. In future releases, we can make the default "enabled", assuming enough real-world scenarios have proven the feature safe (i.e., not affecting the DB performance noticeably).

3. In the email thread for this patch, we saw some attempts to measure the performance overhead of the feature. We suggest a more rigorous study, including memory overhead in addition to time overhead. We came to the conclusion that analytical workloads - with complicated query plans - and a high query rate are the best to attempt such benchmarks.

4. In PGConf EU 2024, there was a talk by Rafael Castro titled "Debugging active queries" [1], and he also submitted a recent patch titled "Proposal: Progressive explain" [2]. We see a lot of similarities in the idea behind that patch and this one, and would like to suggest joining forces for an ideal outcome.

[1] https://www.youtube.com/watch?v=6ahTb-7C05c

[2] https://commitfest.postgresql.org/patch/5473/

Best Regards,

Sadeq Dousti & Sergey Dudoladov

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

31 March, 22:24:51

On Fri, Mar 21, 2025 at 8:40 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Rebased it again.

Hi,

I apologize for not having noticed this thread sooner. I just became
aware of it as a result of a discussion in the hacking Discord server.
I think this has got a lot over overlap with the progressive EXPLAIN
patch from Rafael Castro, which I have been reviewing, but I am not
sure why we have two different patch sets here. Why is that?

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

01 April, 04:43:10

On 2025-04-01 04:24, Robert Haas wrote:
> On Fri, Mar 21, 2025 at 8:40 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> Rebased it again.
> 
> Hi,
> 
> I apologize for not having noticed this thread sooner.

Thank you for your checking! No worries.

> I just became
> aware of it as a result of a discussion in the hacking Discord server.

> I think this has got a lot over overlap with the progressive EXPLAIN
> patch from Rafael Castro, which I have been reviewing, but I am not
> sure why we have two different patch sets here. Why is that?

This thread proposes a feature to log the results of a plain EXPLAIN 
(without ANALYZE). From Rafael Castro's presentation materials at 
pgconf.eu 2024 [1], I understand that he built a new patch on top of 
this one, adding functionality to track execution progress and display 
results in a view.
He also mentioned that he used the way of wrapping plan nodes from this 
patch during the development to solve a problem[2].
I think that's why there are overlaps between the two.

Previously, Rafael proposed a patch in this thread that added execution 
progress tracking. However, I felt that expanding the scope could make 
it harder to get the patch reviewed or committed. So, I suggested first 
completing a feature that only retrieves the execution plan of a running 
query, and then developing execution progress tracking afterward[3].

As far as I remember, he did not respond to this thread after my 
suggestion but instead started a new thread later[4]. Based on that, I 
assume he would not have agreed with my proposed approach.

Rafael, please let us know if there are any misunderstandings in the 
above.

In this thread, I aimed to output the plan without requiring prior 
configuration if possible. However, based on your comment on Rafael's 
thread[5], it seems that this approach would be difficult:

   One way in which this proposal seems safer than previous proposals is
   that previous proposals have involved session A poking session B and
   trying to get session B to emit an EXPLAIN on the fly with no prior
   setup. That would be very useful, but I think it's more difficult and
   more risky than this proposal, where all the configuration happens in
   the session that is going to emit the EXPLAIN output.

I was considering whether to introduce a GUC in this patch to allow for 
prior setup before outputting the plan or to switch to Rafael's patch 
after reviewing its details. However, since there isn’t much time left 
before the feature freeze, if you have already reviewed Rafael's patch 
and there is a chance it could be committed, it would be better to focus 
on that.

[1] https://www.youtube.com/watch?v=6ahTb-7C05c (mentions this patch 
after the 12-minute mark)
[2] 
https://www.postgresql.org/message-id/CAG0ozMo30smtXXOR8bSCbhaZAQHo4%3DezerLitpERk85Q0ga%2BFw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/c161b5e7e1888eb9c9eb182a7d9dcf89%40oss.nttdata.com
[4] 

https://www.postgresql.org/message-id/flat/CAG0ozMo30smtXXOR8bSCbhaZAQHo4%3DezerLitpERk85Q0ga%2BFw%40mail.gmail.com#4b261994a0d44649ded6e1434a00e134
[5] 
https://www.postgresql.org/message-id/CA%2BTgmoaD985%2BVLwR93c8PjSaoBqxw72Eu7pfBJcArzhjJ71aRw%40mail.gmail.com

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

01 April, 15:32:04

On Mon, Mar 31, 2025 at 9:43 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Previously, Rafael proposed a patch in this thread that added execution
> progress tracking. However, I felt that expanding the scope could make
> it harder to get the patch reviewed or committed. So, I suggested first
> completing a feature that only retrieves the execution plan of a running
> query, and then developing execution progress tracking afterward[3].

That's reasonable. Comparing the two patches, I think that you have
correctly solved a couple of key problems that Rafael did not handle
correctly. Specifically, I believe you have correctly implemented what
Andres described i.e. use CFI to get control to do the wrapping and
then from the ExecProcNode wrapper do the actual EXPLAIN, whereas I
believe Rafael was doing the wrapping directly from the signal
handler, which did not seem safe to me:

http://postgr.es/m/CA+TgmobrzeDep+Z1BPQqGNsCqTQ8M58wNNKJB_8Lwpwbqbz3GQ@mail.gmail.com

I also like your version of (sub)transaction abort handling much
better than the memory-context cleanup that Rafael initially chose.
He's since revised that.

That said, I think the view that Rafael proposes, with a
periodically-updating version of the EXPLAIN ANALYZE output up to that
point, could be extremely useful in some situations. To make that
work, he mostly just hacked InstrStopNode(), although as I'm thinking
about it, that probably doesn't handle all of the parallel query cases
correctly. I'm somewhat inclined to hope that we eventually end up
with both interfaces.

> I was considering whether to introduce a GUC in this patch to allow for
> prior setup before outputting the plan or to switch to Rafael's patch
> after reviewing its details. However, since there isn’t much time left
> before the feature freeze, if you have already reviewed Rafael's patch
> and there is a chance it could be committed, it would be better to focus
> on that.

I wasn't feeling very confident about my ability to get that patch
committed before feature freeze. I don't want to rush into something
that we might later regret. I'm going to spend a bit more time
studying your patch next.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

01 April, 16:57:44

On 2025-04-01 21:32, Robert Haas wrote:
Thanks for your comment.

> On Mon, Mar 31, 2025 at 9:43 PM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> Previously, Rafael proposed a patch in this thread that added 
>> execution
>> progress tracking. However, I felt that expanding the scope could make
>> it harder to get the patch reviewed or committed. So, I suggested 
>> first
>> completing a feature that only retrieves the execution plan of a 
>> running
>> query, and then developing execution progress tracking afterward[3].
> 
> That's reasonable. Comparing the two patches, I think that you have
> correctly solved a couple of key problems that Rafael did not handle
> correctly. Specifically, I believe you have correctly implemented what
> Andres described i.e. use CFI to get control to do the wrapping and
> then from the ExecProcNode wrapper do the actual EXPLAIN, whereas I
> believe Rafael was doing the wrapping directly from the signal
> handler, which did not seem safe to me:
> 
> http://postgr.es/m/CA+TgmobrzeDep+Z1BPQqGNsCqTQ8M58wNNKJB_8Lwpwbqbz3GQ@mail.gmail.com
> 
> I also like your version of (sub)transaction abort handling much
> better than the memory-context cleanup that Rafael initially chose.
> He's since revised that.
> 
> That said, I think the view that Rafael proposes, with a
> periodically-updating version of the EXPLAIN ANALYZE output up to that
> point, could be extremely useful in some situations.
+1.

> To make that
> work, he mostly just hacked InstrStopNode(), although as I'm thinking
> about it, that probably doesn't handle all of the parallel query cases
> correctly. I'm somewhat inclined to hope that we eventually end up
> with both interfaces.

>> I was considering whether to introduce a GUC in this patch to allow 
>> for
>> prior setup before outputting the plan or to switch to Rafael's patch
>> after reviewing its details. However, since there isn’t much time left
>> before the feature freeze, if you have already reviewed Rafael's patch
>> and there is a chance it could be committed, it would be better to 
>> focus
>> on that.
> 
> I wasn't feeling very confident about my ability to get that patch
> committed before feature freeze. I don't want to rush into something
> that we might later regret. I'm going to spend a bit more time
> studying your patch next.

That’s really appreciated!
I believe some of the comments in Rafael's thread should be reflected in 
this one as well, but I haven’t incorporated them yet. Apologies for 
that.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

01 April, 21:52:30

On Fri, Mar 21, 2025 at 8:40 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Rebased it again.
>
> On 2025-03-10 14:10, torikoshia wrote:
> > BTW the patch adds about 400 lines to explain.c and it may be better
> > to split the file as well as 9173e8b6046, but I leave it as it is for
> > now.
>
> This part remains unchanged.

Looking at ExplainAssembleLogOutput() is making me realize that
auto_explain is in serious need of some cleanup. That's not really the
fault of this patch, but the hack whereby we overwrite the [] that
would have surrounded the JSON output with {} is not very nice. I also
think that the auto_explain GUCs need rethinking. In theory, new
EXPLAIN options should be mirrored into auto_explain, but if you
compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
that are known to regular EXPLAIN aren't known to auto_explain, and
any customizable options that use explain_per_plan_hook won't be able
to work with auto_explain, either. Changing this is non-trivial
because SERIALIZE, for example, can't work the same way for
auto_explain as it does for EXPLAIN, and a full solution might also
require user-visible changes like replacing
auto_explain.log_<whatever> with auto_explain.explain, so I don't
really know. Maybe we should just live with it the way it is for now,
but it doesn't look very nice.

Rafael and I were just discussing off-list to what extent the parallel
query problems with his patch also apply to yours. His design makes
the problem easier to hit, I think, because setting
progressive_explain = on makes every backend attempt to dump a query
plan, whereas you'd have to hit the worker process with a signal and
just the right time. But the fundamental problem appears to be the
same. Another interesting wrinkle is that he settled on showing the
outermost running query whereas you settled on the innermost. If you
were showing the outermost, perhaps you could just prohibit
dump-the-query-plan on a parallel worker, but you don't really want to
prohibit it categorically, because the worker could be running an
inner query that could be dumped. So maybe you want to avoid setting
ActiveQueryDesc at the toplevel and then set/clear it for inner
queries. However, that's a bit weird, too. If we wanted to undertake a
bigger redesign here, we could try pushing the entire query plan
(including all subqueries) down to the worker, and just tell it which
part it's actually supposed to execute. However, that would have some
overhead and would arguably open up some risk of future bugs, since
passing subqueries as NULL is, I think, intended partly as a guard
against accidentally executing the wrong subqueries.

I don't think it's a good idea to put the logic to update
ActiveQueryDesc into ExecutorRun. I think that function should really
just call the hook, or standard_ExecutorRun. I don't know for sure
whether this work should be moved down into ExecutorRun or up into the
caller, but I don't think it should stay where it is.

My comment about the naming of WrapMultiExecProcNodesWithExplain() on
the other thread also applies here: MultiExecProcNode() is an
unrelated function. Likewise, WrapExecProcNodeWithExplain() misses
walking the node's subplan list. Also, I don't think an
ereport(DEBUG1, ...) is appropriate here.

Do we really need es->signaled? Couldn't we test es->analyze instead?

Do we really need ExecProcNodeOriginal? Can we find some way to reuse
ExecProcNodeReal instead of making the structure bigger?

I do think we should try to move some of this new code out into a
separate source file, but I'm not yet sure what we should call it. We
might want to share infrastructure with something what Rafael's patch,
which he called progressive EXPLAIN but which is really closer to a
general query progress facility, albeit perhaps too expensive to have
enabled by default.

Does the documentation typically mention when a function is
superuser-only? If so, it should do so here, too, using similar
wording.

It seems unnecessary to remark that you can't log a query plan after a
subtransaction abort, because the query wouldn't be running any more.
It's also true that you can't log a query after a toplevel abort, even
if you're still waiting for the aborted transaction to roll back. But
it also seems like once the aborted subtransaction is popped off the
stack and we return to the parent transaction, we should be able to
log any query running at the outer level. If this is meant to imply
that something doesn't work in this scenario, perhaps there is
something about the design that needs fixing.

Does ActiveQueryDesc really need to be exposed to the whole system?
Could it be a file-level variable?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Sami Imseih

Date:

01 April, 22:05:02

> Looking at ExplainAssembleLogOutput() is making me realize that
> auto_explain is in serious need of some cleanup. That's not really the
> fault of this patch, but the hack whereby we overwrite the [] that
> would have surrounded the JSON output with {} is not very nice. I also
> think that the auto_explain GUCs need rethinking. In theory, new
> EXPLAIN options should be mirrored into auto_explain, but if you
> compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
> that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
> that are known to regular EXPLAIN aren't known to auto_explain, and
> any customizable options that use explain_per_plan_hook won't be able
> to work with auto_explain, either. Changing this is non-trivial
> because SERIALIZE, for example, can't work the same way for
> auto_explain as it does for EXPLAIN, and a full solution might also
> require user-visible changes like replacing
> auto_explain.log_<whatever> with auto_explain.explain, so I don't
> really know. Maybe we should just live with it the way it is for now,
> but it doesn't look very nice.

FWIW, I have been thinking about auto_explain for another task,
remote plans for fdw [0], and perhaps there are now other good
reasons, some that you mention, that can be simplified if "auto_explain"
becomes a core feature. This could be a proposal taken up in 19.

[0]
https://www.postgresql.org/message-id/flat/CAP%2BB4TD%3Diy-C2EnsrJgjpwSc7_4pd3Xh-gFzA0bwsw3q8u860g%40mail.gmail.com


--
Sami Imseih
Amazon Web Services (AWS)

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

01 April, 22:29:27

On Tue, Apr 1, 2025 at 3:05 PM Sami Imseih <samimseih@gmail.com> wrote:
> > Looking at ExplainAssembleLogOutput() is making me realize that
> > auto_explain is in serious need of some cleanup. That's not really the
> > fault of this patch, but the hack whereby we overwrite the [] that
> > would have surrounded the JSON output with {} is not very nice. I also
> > think that the auto_explain GUCs need rethinking. In theory, new
> > EXPLAIN options should be mirrored into auto_explain, but if you
> > compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
> > that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
> > that are known to regular EXPLAIN aren't known to auto_explain, and
> > any customizable options that use explain_per_plan_hook won't be able
> > to work with auto_explain, either. Changing this is non-trivial
> > because SERIALIZE, for example, can't work the same way for
> > auto_explain as it does for EXPLAIN, and a full solution might also
> > require user-visible changes like replacing
> > auto_explain.log_<whatever> with auto_explain.explain, so I don't
> > really know. Maybe we should just live with it the way it is for now,
> > but it doesn't look very nice.
>
> FWIW, I have been thinking about auto_explain for another task,
> remote plans for fdw [0], and perhaps there are now other good
> reasons, some that you mention, that can be simplified if "auto_explain"
> becomes a core feature. This could be a proposal taken up in 19.

For what we're talking about here, I don't think we would need to go
that far -- maybe put a few functions in core but no real need to move
the whole module into core. However, I don't rule out that there are
other reasons to do as you suggest.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Sami Imseih

Date:

02 April, 19:22:54

> > FWIW, I have been thinking about auto_explain for another task,
> > remote plans for fdw [0], and perhaps there are now other good
> > reasons, some that you mention, that can be simplified if "auto_explain"
> > becomes a core feature. This could be a proposal taken up in 19.
>
> For what we're talking about here, I don't think we would need to go
> that far -- maybe put a few functions in core but no real need to move
> the whole module into core. However, I don't rule out that there are
> other reasons to do as you suggest.

This is the first core feature that will allow users to log explain plans for
a live workload. EXPLAIN is not really good for this purpose. So this
proposal is a step in the correct direction. I think the appetite for more
plan logging/visibility options will likely increase, and auto_explainlike
features in core will be desired IMO. We will see.

As far as this patch goes, I took a look and I have some comments:

1/
The name of ExplainAssembleLogOutput is not appropriate as
it does not really do anything with logs. Maybe ExplainStringAssemble
is more appropriate?

2/
It should be noted that the plan will not print to the log until
the plan begins executing the next plan node? depending on the
operation, that could take some time ( i.e.long seq scan of a table, etc.)
Does this behavior need to be called out in docs?

3/
+ * By default, only superusers are allowed to signal to log the plan because
+ * allowing any users to issue this request at an unbounded rate would

Only superuser allowed to do this is very restrictive. Many shops do
not, for good
reasons, want DBAs or monitoring tools to connect as superuser. Why not allow
this functionality with "pg_monitor" ?

4/
nit: line removed here unnecessarily

 extern PGDLLIMPORT Portal ActivePortal;
-
+extern PGDLLIMPORT QueryDesc *ActiveQueryDesc;

5/
WrapCustomPlanChildExecProcNodesWithExplain

This function name is too long, can the name be simplified?

6/
are such DEBUG1's really necessary, considering this is
a manually triggered function?

        case T_Append:
            ereport(DEBUG1, errmsg("wrapping Append"));

7/
Why do we only support TEXT format? I tried it with JSON
and it looks fine in the log as well. I can imagine automated
tools will want to be able to retrieve the plans using the
structured formats.

8/
+       es->format = EXPLAIN_FORMAT_TEXT;
+       es->settings = true;
+       es->verbose = true;
+       es->signaled = true;
+
+       ExplainAssembleLogOutput(es, ActiveQueryDesc,
EXPLAIN_FORMAT_TEXT, 0, -1);
+
Can we just pass es->format to ExplainAssembleLogOutput as the 3rd argument?


--
Sami Imseih
Amazon Web Services (AWS)

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

03 April, 08:32:16

On 2025-04-02 03:52, Robert Haas wrote:
Thank you for review!
> On Fri, Mar 21, 2025 at 8:40 AM torikoshia <torikoshia@oss.nttdata.com> 
> wrote:
>> Rebased it again.
>> 
>> On 2025-03-10 14:10, torikoshia wrote:
>> > BTW the patch adds about 400 lines to explain.c and it may be better
>> > to split the file as well as 9173e8b6046, but I leave it as it is for
>> > now.
>> 
>> This part remains unchanged.
> 
> Looking at ExplainAssembleLogOutput() is making me realize that
> auto_explain is in serious need of some cleanup. That's not really the
> fault of this patch, but the hack whereby we overwrite the [] that
> would have surrounded the JSON output with {} is not very nice. I also
> think that the auto_explain GUCs need rethinking. In theory, new
> EXPLAIN options should be mirrored into auto_explain, but if you
> compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
> that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
> that are known to regular EXPLAIN aren't known to auto_explain, and
> any customizable options that use explain_per_plan_hook won't be able
> to work with auto_explain, either. Changing this is non-trivial
> because SERIALIZE, for example, can't work the same way for
> auto_explain as it does for EXPLAIN, and a full solution might also
> require user-visible changes like replacing
> auto_explain.log_<whatever> with auto_explain.explain, so I don't
> really know. Maybe we should just live with it the way it is for now,
> but it doesn't look very nice.
> 
> Rafael and I were just discussing off-list to what extent the parallel
> query problems with his patch also apply to yours. His design makes
> the problem easier to hit, I think, because setting
> progressive_explain = on makes every backend attempt to dump a query
> plan, whereas you'd have to hit the worker process with a signal and
> just the right time. But the fundamental problem appears to be the
> same.

In this patch, plan output is restricted to processes with B_BACKEND.
When targeting parallel workers, the plan is not attempted to be output, 
as shown below:

   =# select pg_log_query_plan(pid) from pg_stat_activity where 
backend_type = 'parallel worker';

    pg_log_query_plan
   -------------------
    f
    f
   (2 rows)

   WARNING:  PID 22720 is not a PostgreSQL client backend process
   WARNING:  PID 22719 is not a PostgreSQL client backend process

Given that the goal of this thread is simply to output the plan, I think 
this is sufficient. Users can view the plan by accessing their leader 
process.
However, I do agree that retrieving this information is necessary for 
Rafael's work on tracking execution progress.

I think tracking execution progress involves more challenges to solve 
compared to simply outputting the plan.
For this reason, I believe an incremental approach -- first completing 
the basic plan output functionality in this thread and then extending it 
to support progress tracking -- would be the good way forward.

> Does ActiveQueryDesc really need to be exposed to the whole system?
> Could it be a file-level variable?

Until the latest version patch, my goal was to output the plan without 
requiring prior configuration, and I haven't seen any other viable 
approach.

However, for the next patch, I'm considering introducing a GUC to allow 
prior setup before outputting the plan, in response to the previously 
quoted comment:

    One way in which this proposal seems safer than previous proposals is
    that previous proposals have involved session A poking session B and
    trying to get session B to emit an EXPLAIN on the fly with no prior
    setup. That would be very useful, but I think it's more difficult and
    more risky than this proposal, where all the configuration happens in
    the session that is going to emit the EXPLAIN output.

With this change, it should be possible to use a file-level variable 
instead.

I'm going to try to improve other points you raised.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

03 April, 08:34:27

On 2025-04-01 03:51, Sadeq Dousti wrote:
> Hi,
> 
> Sergey and I reviewed this patch and have a few comments, listed
> below.
Thanks for your review!

>> BTW the patch adds about 400 lines to explain.c and it may be better
>> to split the file as well as 9173e8b6046, but I leave it as it is
> for
>> now.
> 
> 1. As rightfully described by the OP above, explain.c has grown too
> big. In addition, explain.h is added to quite a number of other files.
Agreed.

> 
> 2. We wanted to entertain the possibility of a GUC variable to enable
> the feature. That is, having it disabled by default, and the DBA can
> selectively enable it on the fly. The reasoning is that it can affect
> some workloads in an unforeseen way, so this choice can make the next
> Postgres release safer. In future releases, we can make the default
> "enabled", assuming enough real-world scenarios have proven the
> feature safe (i.e., not affecting the DB performance noticeably).
Agreed.

> 3. In the email thread for this patch, we saw some attempts to measure
> the performance overhead of the feature. We suggest a more rigorous
> study, including memory overhead in addition to time overhead. We came
> to the conclusion that analytical workloads - with complicated query
> plans - and a high query rate are the best to attempt such benchmarks.
Agreed.

> 4. In PGConf EU 2024, there was a talk by Rafael Castro titled
> "Debugging active queries" [1], and he also submitted a recent patch
> titled "Proposal: Progressive explain" [2]. We see a lot of
> similarities in the idea behind that patch and this one, and would
> like to suggest joining forces for an ideal outcome.
As we have been discussing in this thread recently, I believe that since 
this patch has a narrower scope, it makes sense to complete it first and 
then extend it to support progress tracking in a step-by-step manner.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

03 April, 08:59:02

On 2025-04-03 01:22, Sami Imseih wrote:
>> > FWIW, I have been thinking about auto_explain for another task,
>> > remote plans for fdw [0], and perhaps there are now other good
>> > reasons, some that you mention, that can be simplified if "auto_explain"
>> > becomes a core feature. This could be a proposal taken up in 19.
>> 
>> For what we're talking about here, I don't think we would need to go
>> that far -- maybe put a few functions in core but no real need to move
>> the whole module into core. However, I don't rule out that there are
>> other reasons to do as you suggest.
> 
> This is the first core feature that will allow users to log explain 
> plans for
> a live workload. EXPLAIN is not really good for this purpose. So this
> proposal is a step in the correct direction. I think the appetite for 
> more
> plan logging/visibility options will likely increase, and 
> auto_explainlike
> features in core will be desired IMO. We will see.
> 
> As far as this patch goes, I took a look and I have some comments:

Thanks for your comments!


> 2/
> It should be noted that the plan will not print to the log until
> the plan begins executing the next plan node? depending on the
> operation, that could take some time ( i.e.long seq scan of a table, 
> etc.)
> Does this behavior need to be called out in docs?

Seems reasonable, but long seq scan of a table would not cause the case 
since ExecProcNode() is called at least the number of the rows in a 
table, as far as I remember.
Of course there can be the case where long time can elapse before 
executing ExecProcNode(), so I agree with adding the doc about this.

I'm going to improve including other than this comment in the next 
patch.


-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

Sami Imseih

Date:

03 April, 15:40:14

> > 2/
> > It should be noted that the plan will not print to the log until
> > the plan begins executing the next plan node? depending on the
> > operation, that could take some time ( i.e.long seq scan of a table,
> > etc.)
> > Does this behavior need to be called out in docs?
>
> Seems reasonable, but long seq scan of a table would not cause the case
> since ExecProcNode() is called at least the number of the rows in a
> table, as far as I remember.
> Of course there can be the case where long time can elapse before
> executing ExecProcNode(), so I agree with adding the doc about this.

Correct. Seq Scan is a bad example since ExecProcNode is called for
every tuple as you mention. MultiExecProcNode, functions doing
time consuming computations, pg_sleep, etc. can all delay the
signal being sent.

I also realized that the extended query protocol may have its own caveats.
A query running under the extended protocol will alternate between
"active" and "idle in transaction"
as it transitions through parse, bind, and execute. If a user calls
pg_log_query_plan while
the state is "idle in transaction," it will result in a "backend with
PID ... is not running a query" log message.
This is more likely if the query repeatedly reissues an "execute"
message (i.e., JDBC fetchSize).
Of course, if the user executes pg_log_query_plan again, it will
(eventually) log the plan,
but it may be very confusing to see the "is not running a query"
message in the logs.
the chances of this behavior is low, but not 0, so it's probably worth
calling out
in documentation.

> 3/
> + * By default, only superusers are allowed to signal to log the plan because
> + * allowing any users to issue this request at an unbounded rate would

> Only superuser allowed to do this is very restrictive. Many shops do
> not, for good
> reasons, want DBAs or monitoring tools to connect as superuser. Why not allow
> this functionality with "pg_monitor" ?

I just realized that my comment above is unwarranted. A superuser can
just simply GRANT EXECUTE ON FUNCTION to pg_monitor, or whatever
monitoring role if they choose. You can ignore this.

--
Sami Imseih
Amazon Web Services (AWS)

Re: RFC: Logging plan of the running query

From

Robert Haas

Date:

03 April, 17:10:26

On Thu, Apr 3, 2025 at 1:32 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> I think tracking execution progress involves more challenges to solve
> compared to simply outputting the plan.
> For this reason, I believe an incremental approach -- first completing
> the basic plan output functionality in this thread and then extending it
> to support progress tracking -- would be the good way forward.

I think you might be right, although I am not totally certain yet.

> However, for the next patch, I'm considering introducing a GUC to allow
> prior setup before outputting the plan, in response to the previously
> quoted comment:
>
>     One way in which this proposal seems safer than previous proposals is
>     that previous proposals have involved session A poking session B and
>     trying to get session B to emit an EXPLAIN on the fly with no prior
>     setup. That would be very useful, but I think it's more difficult and
>     more risky than this proposal, where all the configuration happens in
>     the session that is going to emit the EXPLAIN output.

When I wrote this comment, I was unaware of Andres's proposal to use
ProcessInterrupts() to install the ExecProcNode() wrapper. With that
approach, which you have already implemented, I don't see a reason to
require prior configuration.

> With this change, it should be possible to use a file-level variable
> instead.

I think the question of whether ActiveQueryDesc can be file-level is
separate from whether prior configuration is needed. If it is
important to touch this from multiple source files, then it is fine
for it to be global. However, if we have a new source file, say
dynamic_explain.c, then you could have functions
ProcessDynamicExplainInterrupt() and DynamicExplainCleanup() in that
file to set, use, clear ActiveQueryDesc, and the rest of the system
might not need to know about it.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: RFC: Logging plan of the running query

From

Atsushi Torikoshi

Date:

05 April, 09:13:47

On Thu, Apr 3, 2025 at 11:10 PM Robert Haas <robertmhaas@gmail.com> wrote:

> Looking at ExplainAssembleLogOutput() is making me realize that
> auto_explain is in serious need of some cleanup. That's not really the
> fault of this patch, but the hack whereby we overwrite the [] that
> would have surrounded the JSON output with {} is not very nice. I also
> think that the auto_explain GUCs need rethinking. In theory, new
> EXPLAIN options should be mirrored into auto_explain, but if you
> compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
> that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
> that are known to regular EXPLAIN aren't known to auto_explain, and
> any customizable options that use explain_per_plan_hook won't be able
> to work with auto_explain, either. Changing this is non-trivial
> because SERIALIZE, for example, can't work the same way for
> auto_explain as it does for EXPLAIN, and a full solution might also
> require user-visible changes like replacing
> auto_explain.log_<whatever> with auto_explain.explain, so I don't
> really know. Maybe we should just live with it the way it is for now,
> but it doesn't look very nice.

I agree that the current state isn’t ideal, and that addressing it wouldn’t be trivial.
While I do think it would be better to tackle this before proceeding with the current patch, I don’t think it's a prerequisite.
Also, since I’m not yet sure what the best way to fix it would be, I’ve left it as-is for now.

> I don't think it's a good idea to put the logic to update
> ActiveQueryDesc into ExecutorRun. I think that function should really
> just call the hook, or standard_ExecutorRun. I don't know for sure
> whether this work should be moved down into ExecutorRun or up into the
> caller, but I don't think it should stay where it is.

Moved the logic from ExecutorRun() to standard_ExecutorRun(), since it performs the necessary setup for collecting information during plan execution i.e. calling InstrStartNode().

> My comment about the naming of WrapMultiExecProcNodesWithExplain() on
> the other thread also applies here: MultiExecProcNode() is an
> unrelated function.

Renamed WrapMultiExecProcNodesWithExplain to WrapPlanStatesWithExplain.

> Likewise, WrapExecProcNodeWithExplain() misses
> walking the node's subplan list.

Added logic for subplan to WrapExecProcNodeWithExplain() and UnwrapExecProcNodeWithExplain().

> Also, I don't think an
> ereport(DEBUG1, ...) is appropriate here.

Removed these ereport(DEBUG1).

> Do we really need es->signaled? Couldn't we test es->analyze instead?

I think it's necessary.
Although when implementing progressive EXPLAIN, I believe we can use es->analyze instead, this patch aims to retrieve plans for queries that have neither an EXPLAIN nor an ANALYZE specified.

> Do we really need ExecProcNodeOriginal? Can we find some way to reuse
> ExecProcNodeReal instead of making the structure bigger?

I also wanted to implement this without adding elements to PlanState if possible, but I haven't found a good solution, so the patch uses ExecSetExecProcNode.

> I do think we should try to move some of this new code out into a
> separate source file, but I'm not yet sure what we should call it. We
> might want to share infrastructure with something what Rafael's patch,
> which he called progressive EXPLAIN but which is really closer to a
> general query progress facility, albeit perhaps too expensive to have
> enabled by default.

Made new files dynamic_explain.h and dynamic_explain.c, which you named and it seems fine to me, and move most of the code to them.

> Does the documentation typically mention when a function is
> superuser-only? If so, it should do so here, too, using similar
> wording.

Added below explanation like other functions:

This function is restricted to superusers by default, but other
users can be granted EXECUTE to run the function.

> It seems unnecessary to remark that you can't log a query plan after a
> subtransaction abort, because the query wouldn't be running any more.

Removed the description.

> It's also true that you can't log a query after a toplevel abort, even
> if you're still waiting for the aborted transaction to roll back. But
> it also seems like once the aborted subtransaction is popped off the
> stack and we return to the parent transaction, we should be able to
> log any query running at the outer level.

It works as below:

session-1
begin;
savepoint s1;
err;
ERROR: syntax error at or near "err"
rollback to s1;
select pg_sleep(5);

session-2
select pg_log_query_plan(pid of session-1);

log
LOG: query plan running on backend with PID 40025 is:
Query Text: select pg_sleep(5);
Result (cost=0.00..0.01 rows=1 width=4)
Output: pg_sleep('5'::double precision)

> Does ActiveQueryDesc really need to be exposed to the whole system?
> Could it be a file-level variable?

On Thu, Apr 3, 2025 at 11:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I think the question of whether ActiveQueryDesc can be file-level is
> separate from whether prior configuration is needed. If it is
> important to touch this from multiple source files, then it is fine
> for it to be global. However, if we have a new source file, say
> dynamic_explain.c, then you could have functions
> ProcessDynamicExplainInterrupt() and DynamicExplainCleanup() in that
> file to set, use, clear ActiveQueryDesc, and the rest of the system
> might not need to know about it.

Thanks for the advice, I took this approach and made ActiveQueryDesc file-level variable.

On Thu, Apr 3, 2025 at 1:23 AM Sami Imseih <samimseih@gmail.com> wrote:
> 7/
> Why do we only support TEXT format? I tried it with JSON
> and it looks fine in the log as well. I can imagine automated
> tools will want to be able to retrieve the plans using the
> structured formats.

I agree that allowing the choice of format would be useful.
However, I believe implementing this would require introducing new GUC or creating a mechanism to pass the format from the executor of pg_log_query_plan() to the target process.
For now, I think it would be better to minimize the scope of the initial patch.

I think I have reflected your other comments in the attached patch.

Regards,

Atsushi Torikoshi

Attachment

v43-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

24 April, 15:48:57

Hi,

Attached a rebased version of the patch.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment

v44-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch

Re: RFC: Logging plan of the running query

From

Hannu Krosing

Date:

28 April, 02:55:31

Have you also checked out
https://github.com/postgrespro/pg_query_state which logs running query
plan AND collected counts and timings as a response to a signal?

Has this ever been discussed for inclusion in core ?

On Thu, Apr 24, 2025 at 2:49 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> Attached a rebased version of the patch.
>
> --
> Regards,
>
> --
> Atsushi Torikoshi
> Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

30 April, 12:53:40

Hi,

On 2025-04-28 08:55, Hannu Krosing wrote:
> Have you also checked out
> https://github.com/postgrespro/pg_query_state which logs running query
> plan AND collected counts and timings as a response to a signal?

Yes. For example, see the discussion:
https://www.postgresql.org/message-id/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a750dc8..e1b0be5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3492,6 +3492,8 @@ ProcessInterrupts(void)
         if (ParallelMessagePending)
                 HandleParallelMessages();

+       CheckAndHandleCustomSignals();

> Has this ever been discussed for inclusion in core ?

As far as I understand from reading a bit of pg_query_state, it 
registers custom interrupts to obtain the query state, including the 
output of EXPLAIN:

   -- pg_query_state/patches/custom_signals_17.0.patch
   diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
   index a750dc8..e1b0be5 100644
   --- a/src/backend/tcop/postgres.c
   +++ b/src/backend/tcop/postgres.c
   @@ -3492,6 +3492,8 @@ ProcessInterrupts(void)
           if (ParallelMessagePending)
                   HandleParallelMessages();

   +       CheckAndHandleCustomSignals();

However, we believe it is not safe to perform something as complex as 
EXPLAIN during an interrupt.
For more details, please refer to the discussion below:
https://www.postgresql.org/message-id/CA%2BTgmobH%2BUto9MCD%2BvWc71bVbOnd7d8zeYjRT8nXaeLe5hsNJQ%40mail.gmail.com

Previous patches in this thread also attempted a similar approach, but 
due to the safety concerns mentioned above, we decided to explore 
alternative solutions.
As a result, we are currently proposing an approach based on wrapping 
plan nodes instead.

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Re: RFC: Logging plan of the running query

From

torikoshia

Date:

02 June, 15:56:38

On 2025-05-23 17:50, Atsushi Torikoshi wrote:
> Thanks for the idea and the sample patch!
> Agreed. I’ll go ahead and implement a new patch based on this approach.

Updated the patch.

Here are some notes:

As with the previous patches, this one wraps not only the currently 
executing plan node but also recursively wraps the left, right, and 
child nodes' ExecProcNode with ExecProcNodeFirst.
This is because modifying only the currently executing node caused 
significant delays in plan logging when the left, right, or child nodes 
took a long time to execute.
I observed the situation with the following query:

   SELECT count(*) FROM pgbench_accounts a CROSS JOIN pgbench_accounts b;

Previous patches implemented unwrappers, but this one doesn’t.
This is because once the log is output, 
GetProcessLogQueryPlanInterruptActive() returns false, so LogQueryPlan() 
will no longer be called.

In the sample you previously provided, the LogQueryPlan function takes a 
PlanState as an argument, but in this patch, it’s defined as void.
I made this change under the assumption that the plan can be obtained 
from ActiveQueryDesc, and that PlanState is therefore unnecessary.
Please let me know if I’ve misunderstood anything.

Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

Attachment

v45-0001-Add-function-to-log-the-plan-of-the-currently-ru.patch