Thread: RFC: Logging plan of the running query
Hi,

During the discussion about memory contexts dumping[1], there was a comment that exposing not only memory contexts but also query plans and untruncated query string would be useful.

I also feel that it would be nice when thinking about situations such as troubleshooting a long-running query on production environments where we cannot use debuggers.

At that point of the above comment, I was considering exposing such information on the shared memory. However, since memory contexts are now exposed on the log by pg_log_backend_memory_contexts(PID), I'm thinking about defining a function that logs the plan of a running query and untruncated query string on the specified PID in the same way as below.

    postgres=# SELECT * FROM pg_log_current_plan(2155192);
     pg_log_current_plan
    ---------------------
     t
    (1 row)

    $ tail -f data/log/postgresql-2021-05-12.log

    2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of running query on PID 2155192
          Query Text: SELECT a.filler FROM pgbench_accounts a JOIN pgbench_accounts b ON a.aid = b.aid;
          Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
            Merge Cond: (a.aid = b.aid)
            ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts a  (cost=0.42..42377.43 rows=1000000 width=89)
            ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)

Attached a PoC patch.

Any thoughts?

[1] https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On Wed, 12 May 2021 at 13:24, torikoshia <torikoshia@oss.nttdata.com> wrote:
> Hi,
>
> During the discussion about memory contexts dumping[1], there
> was a comment that exposing not only memory contexts but also
> query plans and untruncated query string would be useful.
>
> I also feel that it would be nice when thinking about situations
> such as troubleshooting a long-running query on production
> environments where we cannot use debuggers.
>
> At that point of the above comment, I was considering exposing
> such information on the shared memory.
> However, since memory contexts are now exposed on the log by
> pg_log_backend_memory_contexts(PID), I'm thinking about
> defining a function that logs the plan of a running query and
> untruncated query string on the specified PID in the same way
> as below.
>
> postgres=# SELECT * FROM pg_log_current_plan(2155192);
>  pg_log_current_plan
> ---------------------
>  t
> (1 row)
>
> $ tail -f data/log/postgresql-2021-05-12.log
>
> 2021-05-12 17:37:19.481 JST [2155192] LOG:  logging the plan of
> running query on PID 2155192
>       Query Text: SELECT a.filler FROM pgbench_accounts a JOIN
> pgbench_accounts b ON a.aid = b.aid;
>       Merge Join  (cost=0.85..83357.85 rows=1000000 width=85)
>         Merge Cond: (a.aid = b.aid)
>         ->  Index Scan using pgbench_accounts_pkey on
> pgbench_accounts a  (cost=0.42..42377.43 rows=1000000 width=89)
>         ->  Index Only Scan using pgbench_accounts_pkey on
> pgbench_accounts b  (cost=0.42..25980.42 rows=1000000 width=4)
>
> Attached a PoC patch.
>
> Any thoughts?

+1

Pavel

> [1]
> https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com
>
> Regards,
>
> --
> Atsushi Torikoshi
> NTT DATA CORPORATION
On Wed, May 12, 2021 at 4:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> During the discussion about memory contexts dumping[1], there
> was a comment that exposing not only memory contexts but also
> query plans and untruncated query string would be useful.
> [...]
> Attached a PoC patch.
>
> Any thoughts?

+1 for the idea.

It looks like pg_log_current_plan is allowed to be run only by superusers. Since it also shows up the full query text and the plan in the server log as plain text, there are chances that sensitive information might be logged into the server log, which is a risky thing from a security standpoint.

There's another thread (see [1] below) which discusses this issue by having a separate role for all debugging purposes. Note that a final consensus has not been reached yet. We may want to use the same role for this patch as well.

[1] - https://www.postgresql.org/message-id/CA%2BTgmoZz%3DK1bQRp0Ug%3D6uMGFWg-6kaxdHe6VSWaxq0U-YkppYQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 12 May 2021 at 13:24, torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> [...]
> Attached a PoC patch.
>
> Any thoughts?

Great idea. One feature I'd suggest would be adding a 'format' option as well, if such a feature would be feasible.

With regards,

Matthias van de Meent
On Wed, May 12, 2021 at 08:24:04PM +0900, torikoshia wrote:
> [...]
> Attached a PoC patch.
>
> Any thoughts?

I didn't read the PoC patch yet, but +1 for having that feature.
On Wed, 2021-05-12 at 18:03 +0530, Bharath Rupireddy wrote:
> On Wed, May 12, 2021 at 4:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> > During the discussion about memory contexts dumping[1], there
> > was a comment that exposing not only memory contexts but also
> > query plans and untruncated query string would be useful.
> > [...]

I love the idea, but I didn't look at the patch.

> Since it also shows up the full query text and the plan
> in the server log as plain text, there are chances that the sensitive
> information might be logged into the server log which is a risky thing
> from security standpoint.

I think that is irrelevant.

A superuser can already set "log_statement = 'all'" to get this.
There is no protection from superusers, and it is pointless to require that.

Yours,
Laurenz Albe
Thank you all for your positive comments.

On 2021-05-12 21:55, Matthias van de Meent wrote:
> Great idea. One feature I'd suggest would be adding a 'format' option
> as well, if such feature would be feasable.

Thanks for the comment!

During the development of pg_log_backend_memory_contexts(), I tried to make the number of contexts to record configurable, either as a GUC variable or via shared memory, but the former seemed overkill and the latter introduced some ugly behaviors, so we decided to make it a static number[1]. I think we face the same difficulty here.

Allowing the format to be selected would be nicer, as auto_explain does with auto_explain.log_format, but I'm a bit doubtful that it is worth the cost.

[1] https://www.postgresql.org/message-id/flat/6738f309-a41b-cbe6-bb57-a1c58ce9f05a%40oss.nttdata.com#e687d583080c96563b2fa2e32dbf3fb6

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
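For comparison, the format selection mentioned above already exists in the auto_explain extension as ordinary GUCs. The fragment below is illustrative only (the parameter names are real auto_explain settings; the values are arbitrary examples):

```
# postgresql.conf -- auto_explain's existing knobs for plan logging
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '5s'     # log plans of statements running longer than this
auto_explain.log_format = 'json'         # text | xml | json | yaml
auto_explain.log_nested_statements = on  # also log plans of nested statements
```

The difference is that auto_explain decides up front which statements to log, whereas pg_log_current_plan() is invoked on demand against an already-running backend.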
On 2021-05-13 01:08, Laurenz Albe wrote:
> On Wed, 2021-05-12 at 18:03 +0530, Bharath Rupireddy wrote:
>> Since it also shows up the full query text and the plan
>> in the server log as plain text, there are chances that the sensitive
>> information might be logged into the server log which is a risky thing
>> from security standpoint.

Thanks for the notification!

> I think that is irrelevant.
>
> A superuser can already set "log_statement = 'all'" to get this.
> There is no protection from superusers, and it is pointless to require
> that.

AFAIU, since that discussion is about whether or not users other than superusers should be given the privilege to execute the backtrace printing function, I think it might be applicable to pg_log_current_plan().

Since restricting the privilege to superusers is the stricter option, I'm going to proceed with that for now, but depending on the above discussion, it may be better to change it.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
On Wed, May 12, 2021 at 4:54 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> [...]
> postgres=# SELECT * FROM pg_log_current_plan(2155192);
>  pg_log_current_plan
> ---------------------
>  t
> (1 row)

+1 for the idea. I did not read the complete patch, but while reading through it I noticed that you are using LOG as the elevel for printing the plan. But the client connected to the backend whose PID you have passed might not have superuser privileges, and if you use elevel LOG then that message will be sent to that connected client as well, which I don't think is secure. So can we use LOG_SERVER_ONLY so that we can prevent it from being sent to the client?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> +1 for the idea. [...] So
> can we use LOG_SERVER_ONLY so that we can prevent
> it from sending to the client.

True, we should use LOG_SERVER_ONLY and not send any logs to the client.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 2:57 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> [...]
> True, we should use LOG_SERVER_ONLY and not send any logs to the client.

I further tend to think: is it correct to log queries at LOG level when the log_statement GUC is set? Or should that also be LOG_SERVER_ONLY?

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 1:56 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> [...]
> Since restricting privilege to superusers is stricter, I'm going to
> proceed as it is for now, but depending on the above discussion, it may
> be better to change it.

Yeah, we can keep it as superuser-only for now.

Might be unrelated, but just for info - there's another thread, "Granting control of SUSET gucs to non-superusers" at [1], discussing the new roles.

[1] - https://www.postgresql.org/message-id/F9408A5A-B20B-42D2-9E7F-49CD3D1547BC%40enterprisedb.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 3:06 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> [...]
> I further tend to think that, is it correct to log queries with LOG
> level when log_statement GUC is set? Or should it also be
> LOG_SERVER_ONLY?

What is your argument behind logging it with LOG? I mean, we are sending the signal to a backend, and that backend might have a client that is not connected as a superuser, so sending the plan to those clients is not a good idea from a security perspective. Anyway, LOG_SERVER_ONLY is not an exposed logging level; it is used for internal purposes. So IMHO it should be logged at the LOG_SERVER_ONLY level.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 2021-05-13 18:36, Bharath Rupireddy wrote:
> On Thu, May 13, 2021 at 2:57 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> [...]
>> True, we should use LOG_SERVER_ONLY and not send any logs to the
>> client.

Thanks, I agree with changing it to LOG_SERVER_ONLY.

> I further tend to think that, is it correct to log queries with LOG
> level when log_statement GUC is set? Or should it also be
> LOG_SERVER_ONLY?

I feel it's OK to log with LOG_SERVER_ONLY, since the log from the log_statement GUC would be printed already and independently. ISTM people don't expect the log_statement GUC to apply to pg_log_current_plan(), do they?

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
On Thu, May 13, 2021 at 3:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> [...]
> What is your argument behind logging it with LOG? I mean we are
> sending the signal to all the backend and some backend might have the
> client who is not connected as a superuser so sending the plan to
> those clients is not a good idea from a security perspective.

I'm saying that - currently, queries are logged with LOG level when the log_statement GUC is set. The queries might be sent to the non-superuser clients. So, your point of "sending the plan to those clients is not a good idea from a security perspective" gets violated, right? Should the log level be changed (in the below code) from "LOG" to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not to sidetrack the main feature.

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> [...] Should the log level be changed(in the below code) from "LOG"
> to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
> to sidetrack the main feature.

Yes, that was my exact point: in this particular code, log with LOG_SERVER_ONLY.

Like this:

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG_SERVER_ONLY,
        .....

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> [...]
> Yes, that was my exact point, that in this particular code log with
> LOG_SERVER_ONLY.

Agree, but let's discuss that in a separate thread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> [...]
> Agree, but let's discuss that in a separate thread.

Did not understand why a separate thread? This is part of this thread, no? But anyway, now everyone has agreed that we will log with LOG_SERVER_ONLY.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 13, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> [...]
> Did not understand why separate thread? this is part of this thread
> no? but anyways now everyone agreed that we will log with
> LOG_SERVER_ONLY.

Bharath pointed out to me offlist that he was talking about another log there - the one logging the query - which is not specific to this patch, so let's not discuss that point here.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 2021-05-13 21:57, Dilip Kumar wrote:
> [...]
>> Did not understand why separate thread? this is part of this thread
>> no? but anyways now everyone agreed that we will log with
>> LOG_SERVER_ONLY.

Modified elevel from LOG to LOG_SERVER_ONLY.

I also modified the patch to log JIT Summary and GUC settings information. If there is other useful information to log, I would appreciate it if you could point it out.

> Bharat offlist pointed to me that here he was talking about another
> log that is logging the query and not specific to this patch, so let's
> not discuss this point here.

Thanks for sharing the situation!

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2021-05-28 15:51, torikoshia wrote: > On 2021-05-13 21:57, Dilip Kumar wrote: >> On Thu, May 13, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> >> wrote: >>> >>> On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy >>> <bharath.rupireddyforpostgres@gmail.com> wrote: >>> > >>> > On Thu, May 13, 2021 at 5:14 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: >>> > > >>> > > On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy >>> > > <bharath.rupireddyforpostgres@gmail.com> wrote: >>> > > > >>> > > > I'm saying that - currently, queries are logged with LOG level when >>> > > > the log_statement GUC is set. The queries might be sent to the >>> > > > non-superuser clients. So, your point of "sending the plan to those >>> > > > clients is not a good idea from a security perspective" gets violated >>> > > > right? Should the log level be changed(in the below code) from "LOG" >>> > > > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not >>> > > > to sidetrack the main feature. >>> > > > >>> > > > /* Log immediately if dictated by log_statement */ >>> > > > if (check_log_statement(parsetree_list)) >>> > > > { >>> > > > ereport(LOG, >>> > > > (errmsg("statement: %s", query_string), >>> > > > errhidestmt(true), >>> > > > errdetail_execute(parsetree_list))); >>> > > > >>> > > >>> > > Yes, that was my exact point, that in this particular code log with >>> > > LOG_SERVER_ONLY. >>> > > >>> > > Like this. >>> > > /* Log immediately if dictated by log_statement */ >>> > > if (check_log_statement(parsetree_list)) >>> > > { >>> > > ereport(LOG_SERVER_ONLY, >>> > >>> > Agree, but let's discuss that in a separate thread. >>> >>> Did not understand why separate thread? this is part of this thread >>> no? but anyways now everyone agreed that we will log with >>> LOG_SERVER_ONLY. > > Modified elevel from LOG to LOG_SERVER_ONLY. > > I also modified the patch to log JIT Summary and GUC settings > information. 
> If there is other useful information to log, I would appreciate it if
> you could point it out.

Updated the patch.

- reordered superuser check which was pointed out in another thread[1]
- added a regression test

[1] https://postgr.es/m/YLxw1uVGIAP5uMPl@paquier.xyz

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2021/06/09 16:44, torikoshia wrote: > Updated the patch. Thanks for updating the patch! auto_explain can log the plan of even nested statement if auto_explain.log_nested_statements is enabled. But ISTM that pg_log_current_plan() cannot log that plan. Is this intentional? I think that it's better to make pg_log_current_plan() log the plan of even nested statement. + es->format = EXPLAIN_FORMAT_TEXT; + es->settings = true; Since pg_log_current_plan() is usually used to investigate and trouble-shoot the long running queries, IMO it's better to enable es->verbose by default and get additional information about the queries. Thought? + * pg_log_current_plan + * Signal a backend process to log plan the of running query. "plan the of" is typo? + * Only superusers are allowed to signal to log plan because any users to + * issue this request at an unbounded rate would cause lots of log messages + * and which can lead to denial of service. "because any users" should be "because allowing any users" like the comment for pg_log_backend_memory_contexts()? + * All the actual work is deferred to ProcessLogExplainInterrupt(), "ProcessLogExplainInterrupt()" should be "ProcessLogCurrentPlanInterrupt()"? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021-06-09 23:04, Fujii Masao wrote: Thanks for your review! > auto_explain can log the plan of even nested statement > if auto_explain.log_nested_statements is enabled. > But ISTM that pg_log_current_plan() cannot log that plan. > Is this intentional? > I think that it's better to make pg_log_current_plan() log > the plan of even nested statement. +1. It would be better. But currently plan information is got from ActivePortal and ISTM there are no easy way to retrieve plan information of nested statements from ActivePortal. Anyway I'll do some more research. I think you are right about the following comments. I'll fix them. > + es->format = EXPLAIN_FORMAT_TEXT; > + es->settings = true; > > Since pg_log_current_plan() is usually used to investigate and > trouble-shoot the long running queries, IMO it's better to > enable es->verbose by default and get additional information > about the queries. Thought? > + * pg_log_current_plan > + * Signal a backend process to log plan the of running query. > > "plan the of" is typo? > > > + * Only superusers are allowed to signal to log plan because any users > to > + * issue this request at an unbounded rate would cause lots of log > messages > + * and which can lead to denial of service. > > "because any users" should be "because allowing any users" > like the comment for pg_log_backend_memory_contexts()? > > > + * All the actual work is deferred to ProcessLogExplainInterrupt(), > > "ProcessLogExplainInterrupt()" should be > "ProcessLogCurrentPlanInterrupt()"? > > Regards, -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Wed, Jun 9, 2021 at 1:14 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > Updated the patch. Thanks for the patch. Here are some comments on v3 patch: 1) We could just say "Requests to log query plan of the presently running query of a given backend along with an untruncated query string in the server logs." Instead of + They will be logged at <literal>LOG</literal> message level and + will appear in the server log based on the log + configuration set (See <xref linkend="runtime-config-logging"/> 2) It's better to do below, for reference see how pg_backend_pid, pg_terminate_backend, pg_relpages and so on are used in the tests. +SELECT pg_log_current_plan(pg_backend_pid()); rather than using the function in the FROM clause. +SELECT * FROM pg_log_current_plan(pg_backend_pid()); If okay, also change it for pg_log_backend_memory_contexts. 3) Since most of the code looks same in pg_log_backend_memory_contexts and pg_log_current_plan, I think we can have a common function something like below: bool SendProcSignalForLogInfo(ProcSignalReason reason) { Assert(reason == PROCSIG_LOG_MEMORY_CONTEXT || reason == PROCSIG_LOG_CURRENT_PLAN); if (!superuser()) { if (reason == PROCSIG_LOG_MEMORY_CONTEXT) errmsg("must be a superuser to log memory contexts") else if (reason == PROCSIG_LOG_CURRENT_PLAN) errmsg("must be a superuser to log plan of the running query") } if (SendProcSignal(pid, reason, proc->backendId) < 0) { } } Then we could just do: Datum pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) { PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_MEMORY_CONTEXT)); } Datum pg_log_current_plan(PG_FUNCTION_ARGS) { PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_CURRENT_PLAN)); } We can have SendProcSignalForLogInfo function defined in procsignal.c and procsignal.h 4) I think we can have a sample function usage and how it returns true value, how the plan looks for a simple query(select 1 or some other simple/complex generic query or simply select 
pg_log_current_plan(pg_backend_pid());) in the documentation, much like pg_log_backend_memory_contexts. 5) Instead of just showing the true return value of the function pg_log_current_plan in the sql test, which just shows that the signal is sent, but it doesn't mean that the backend has processed that signal and logged the plan. I think we can add the test using TAP framework, one 6) Do we unnecessarily need to signal the processes such as autovacuum launcher/workers, logical replication launcher/workers, wal senders, wal receivers and so on. only to emit a "PID %d is not executing queries now" message? Moreover, those processes will be waiting in loops for timeouts to occur, then as soon as they wake up do they need to process this extra uninformative signal? We could choose to not signal those processes at all which might or might not be possible. Otherwise, we could just emit messages like "XXXX process cannot run a query" in ProcessInterrupts. 7)Instead of (errmsg("logging plan of the running query on PID %d\n%s", how about below? (errmsg("plan of the query running on backend with PID %d is:\n%s", 8) Instead of errmsg("PID %d is not executing queries now") how about below? errmsg("Backend with PID %d is not running a query") 9) We could just do: void ProcessLogCurrentPlanInterrupt(void) { ExplainState *es; LogCurrentPlanPending = false; if (!ActivePortal || !ActivePortal->queryDesc) errmsg("PID %d is not executing queries now"); es = NewExplainState(); ExplainQueryText(); ExplainPrintPlan(); 10) How about renaming the function pg_log_current_plan to pg_log_query_plan or pg_log_current_query_plan? 11) What happens if pg_log_current_plan is called for a parallel worker? With Regards, Bharath Rupireddy.
On 2021-06-11 01:20, Bharath Rupireddy wrote: Thanks for your review! > On Wed, Jun 9, 2021 at 1:14 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> Updated the patch. > > Thanks for the patch. Here are some comments on v3 patch: > > 1) We could just say "Requests to log query plan of the presently > running query of a given backend along with an untruncated query > string in the server logs." > Instead of > + They will be logged at <literal>LOG</literal> message level > and > + will appear in the server log based on the log > + configuration set (See <xref > linkend="runtime-config-logging"/> Actually this explanation is almost the same as that of pg_log_backend_memory_contexts(). Do you think we should change both of them? I think it may be too detailed but accurate. > 2) It's better to do below, for reference see how pg_backend_pid, > pg_terminate_backend, pg_relpages and so on are used in the tests. > +SELECT pg_log_current_plan(pg_backend_pid()); > rather than using the function in the FROM clause. > +SELECT * FROM pg_log_current_plan(pg_backend_pid()); > If okay, also change it for pg_log_backend_memory_contexts. Agreed. > 3) Since most of the code looks same in pg_log_backend_memory_contexts > and pg_log_current_plan, I think we can have a common function > something like below: Agreed. I'll do some refactoring. 
> bool > SendProcSignalForLogInfo(ProcSignalReason reason) > { > Assert(reason == PROCSIG_LOG_MEMORY_CONTEXT || reason == > PROCSIG_LOG_CURRENT_PLAN); > > if (!superuser()) > { > if (reason == PROCSIG_LOG_MEMORY_CONTEXT) > errmsg("must be a superuser to log memory contexts") > else if (reason == PROCSIG_LOG_CURRENT_PLAN) > errmsg("must be a superuser to log plan of the running query") > } > > if (SendProcSignal(pid, reason, proc->backendId) < 0) > { > } > } > Then we could just do: > Datum > pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) > { > PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_MEMORY_CONTEXT)); > } > Datum > pg_log_current_plan(PG_FUNCTION_ARGS) > { > PG_RETURN_BOOL(SendProcSignalForLogInfo(PROCSIG_LOG_CURRENT_PLAN)); > } > We can have SendProcSignalForLogInfo function defined in procsignal.c > and procsignal.h > > 4) I think we can have a sample function usage and how it returns true > value, how the plan looks for a simple query(select 1 or some other > simple/complex generic query or simply select > pg_log_current_plan(pg_backend_pid());) in the documentation, much > like pg_log_backend_memory_contexts. +1. > 5) Instead of just showing the true return value of the function > pg_log_current_plan in the sql test, which just shows that the signal > is sent, but it doesn't mean that the backend has processed that > signal and logged the plan. I think we can add the test using TAP > framework, one I once made a tap test for pg_log_backend_memory_contexts(), but it seemed an overkill and we agreed that it was appropriate just ensuring the function working as below discussion. https://www.postgresql.org/message-id/bbecd722d4f8e261b350186ac4bf68a7%40oss.nttdata.com > 6) Do we unnecessarily need to signal the processes such as autovacuum > launcher/workers, logical replication launcher/workers, wal senders, > wal receivers and so on. only to emit a "PID %d is not executing > queries now" message? 
Moreover, those processes will be waiting in > loops for timeouts to occur, then as soon as they wake up do they need > to process this extra uninformative signal? > We could choose to not signal those processes at all which might or > might not be possible. > Otherwise, we could just emit messages like "XXXX process cannot run a > query" in ProcessInterrupts. Agreed. However it needs to distinguish backends which can execute queries and other processes such as autovacuum launcher, I don't come up with easy ways to do so. I'm going to think about it. > 7)Instead of > (errmsg("logging plan of the running query on PID %d\n%s", > how about below? > (errmsg("plan of the query running on backend with PID %d is:\n%s", +1. > 8) Instead of > errmsg("PID %d is not executing queries now") > how about below? > errmsg("Backend with PID %d is not running a query") +1. > > 9) We could just do: > void > ProcessLogCurrentPlanInterrupt(void) > { > ExplainState *es; > LogCurrentPlanPending = false; > if (!ActivePortal || !ActivePortal->queryDesc) > errmsg("PID %d is not executing queries now"); > es = NewExplainState(); > ExplainQueryText(); > ExplainPrintPlan(); > > 10) How about renaming the function pg_log_current_plan to > pg_log_query_plan or pg_log_current_query_plan? +1. > 11) What happens if pg_log_current_plan is called for a parallel > worker? AFAIU Parallel worker doesn't have ActivePortal, so it would always emit the message 'PID %d is not executing queries now'. As 6), it would be better to distinguish the parallel worker and normal backend. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Mon, Jun 14, 2021 at 5:48 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > 1) We could just say "Requests to log query plan of the presently > > running query of a given backend along with an untruncated query > > string in the server logs." > > Instead of > > + They will be logged at <literal>LOG</literal> message level > > and > > + will appear in the server log based on the log > > + configuration set (See <xref > > linkend="runtime-config-logging"/> > > Actually this explanation is almost the same as that of > pg_log_backend_memory_contexts(). > Do you think we should change both of them? > I think it may be too detailed but accurate. I withdraw my comment. We can keep the explanation similar to pg_log_backend_memory_contexts as it was agreed upon and committed text. If the wordings are similar, then it will be easier for users to understand the documentation. > > 5) Instead of just showing the true return value of the function > > pg_log_current_plan in the sql test, which just shows that the signal > > is sent, but it doesn't mean that the backend has processed that > > signal and logged the plan. I think we can add the test using TAP > > framework, one > > I once made a tap test for pg_log_backend_memory_contexts(), but it > seemed an overkill and we agreed that it was appropriate just ensuring > the function working as below discussion. > > https://www.postgresql.org/message-id/bbecd722d4f8e261b350186ac4bf68a7%40oss.nttdata.com Okay. I withdraw my comment. > > 6) Do we unnecessarily need to signal the processes such as autovacuum > > launcher/workers, logical replication launcher/workers, wal senders, > > wal receivers and so on. only to emit a "PID %d is not executing > > queries now" message? Moreover, those processes will be waiting in > > loops for timeouts to occur, then as soon as they wake up do they need > > to process this extra uninformative signal? 
> > We could choose to not signal those processes at all which might or > > might not be possible. > > Otherwise, we could just emit messages like "XXXX process cannot run a > > query" in ProcessInterrupts. > > Agreed. > > However it needs to distinguish backends which can execute queries and > other processes such as autovacuum launcher, I don't come up with > easy ways to do so. > I'm going to think about it. I'm not sure if there is any information in the shared memory accessible to all the backends/sessions that can say a PID is autovacuum launcher/worker, logical replication launcher/worker or any other background or parallel worker. If we were to invent a new mechanism just for addressing the above comment, I would rather choose to not do that as it seems like an overkill. We can leave it up to the user whether or not to unnecessarily signal those processes which are bound to print "PID XXX is not executing queries now" message. > > 11) What happens if pg_log_current_plan is called for a parallel > > worker? > > AFAIU Parallel worker doesn't have ActivePortal, so it would always > emit the message 'PID %d is not executing queries now'. > As 6), it would be better to distinguish the parallel worker and normal > backend. As I said, above, I think it will be a bit tough to do. If done, it seems like an overkill. With Regards, Bharath Rupireddy.
On 2021-06-15 13:27, Bharath Rupireddy wrote: > On Mon, Jun 14, 2021 at 5:48 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> > 1) We could just say "Requests to log query plan of the presently >> > running query of a given backend along with an untruncated query >> > string in the server logs." >> > Instead of >> > + They will be logged at <literal>LOG</literal> message level >> > and >> > + will appear in the server log based on the log >> > + configuration set (See <xref >> > linkend="runtime-config-logging"/> >> >> Actually this explanation is almost the same as that of >> pg_log_backend_memory_contexts(). >> Do you think we should change both of them? >> I think it may be too detailed but accurate. > > I withdraw my comment. We can keep the explanation similar to > pg_log_backend_memory_contexts as it was agreed upon and committed > text. If the wordings are similar, then it will be easier for users to > understand the documentation. > >> > 5) Instead of just showing the true return value of the function >> > pg_log_current_plan in the sql test, which just shows that the signal >> > is sent, but it doesn't mean that the backend has processed that >> > signal and logged the plan. I think we can add the test using TAP >> > framework, one >> >> I once made a tap test for pg_log_backend_memory_contexts(), but it >> seemed an overkill and we agreed that it was appropriate just ensuring >> the function working as below discussion. >> >> https://www.postgresql.org/message-id/bbecd722d4f8e261b350186ac4bf68a7%40oss.nttdata.com > > Okay. I withdraw my comment. > >> > 6) Do we unnecessarily need to signal the processes such as autovacuum >> > launcher/workers, logical replication launcher/workers, wal senders, >> > wal receivers and so on. only to emit a "PID %d is not executing >> > queries now" message? 
Moreover, those processes will be waiting in >> > loops for timeouts to occur, then as soon as they wake up do they need >> > to process this extra uninformative signal? >> > We could choose to not signal those processes at all which might or >> > might not be possible. >> > Otherwise, we could just emit messages like "XXXX process cannot run a >> > query" in ProcessInterrupts. >> >> Agreed. >> >> However it needs to distinguish backends which can execute queries and >> other processes such as autovacuum launcher, I don't come up with >> easy ways to do so. >> I'm going to think about it. > > I'm not sure if there is any information in the shared memory > accessible to all the backends/sessions that can say a PID is > autovacuum launcher/worker, logical replication launcher/worker or any > other background or parallel worker. As far as I looked around, there seems no easy ways to do so. > If we were to invent a new > mechanism just for addressing the above comment, I would rather choose > to not do that as it seems like an overkill. We can leave it up to the > user whether or not to unnecessarily signal those processes which are > bound to print "PID XXX is not executing queries now" message. +1. I'm going to proceed in this direction. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On 2021-06-16 20:36, torikoshia wrote: >> other background or parallel worker. > > As far as I looked around, there seems no easy ways to do so. > >> If we were to invent a new >> mechanism just for addressing the above comment, I would rather choose >> to not do that as it seems like an overkill. We can leave it up to the >> user whether or not to unnecessarily signal those processes which are >> bound to print "PID XXX is not executing queries now" message. > > +1. I'm going to proceed in this direction. Updated the patch. On Thu, Jun 10, 2021 at 11:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > On 2021-06-09 23:04, Fujii Masao wrote: > > auto_explain can log the plan of even nested statement > > if auto_explain.log_nested_statements is enabled. > > But ISTM that pg_log_current_plan() cannot log that plan. > > Is this intentional? > > > I think that it's better to make pg_log_current_plan() log > > the plan of even nested statement. > > +1. It would be better. > But currently plan information is got from ActivePortal and ISTM there > are no easy way to retrieve plan information of nested statements from > ActivePortal. > Anyway I'll do some more research. I haven't found a proper way yet but it seems necessary to use something other than ActivePortal and I'm now thinking this could be a separate patch in the future. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2021/06/22 11:30, torikoshia wrote:
> On Thu, Jun 10, 2021 at 11:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>> On 2021-06-09 23:04, Fujii Masao wrote:
>>
>> > auto_explain can log the plan of even nested statement
>> > if auto_explain.log_nested_statements is enabled.
>> > But ISTM that pg_log_current_plan() cannot log that plan.
>> > Is this intentional?
>>
>> > I think that it's better to make pg_log_current_plan() log
>> > the plan of even nested statement.
>>
>> +1. It would be better.
>> But currently plan information is got from ActivePortal and ISTM there
>> are no easy way to retrieve plan information of nested statements from
>> ActivePortal.
>> Anyway I'll do some more research.
>
> I haven't found a proper way yet but it seems necessary to use something
> other than ActivePortal and I'm now thinking this could be a separate
> patch in the future.

DO $$
BEGIN
PERFORM pg_sleep(100);
END$$;

When I called pg_log_current_query_plan() to send the signal to
the backend executing the above query, I got the following log message.
I think that this is not expected message. I guess this issue happened
because the information about query text and plan is retrieved
from ActivePortal. If this understanding is right, ISTM that we should
implement new mechanism so that we can retrieve those information
even while nested query is being executed.

LOG:  backend with PID 42449 is not running a query

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > Updated the patch. Thanks for the patch. Here are some comments on the v4 patch: 1) Can we do + ExplainState *es = NewExplainState(); and es assignments after if (!ActivePortal || !ActivePortal->queryDesc), just to avoid unnecessary call in case of error hit? Also note that, we can easily hit the error case. 2) It looks like there's an improper indentation. MyProcPid and es->str->data, should start from the ". + ereport(LOG_SERVER_ONLY, + (errmsg("backend with PID %d is not running a query", + MyProcPid))); + ereport(LOG_SERVER_ONLY, + (errmsg("plan of the query running on backend with PID %d is:\n%s", + MyProcPid, + es->str->data), For reference see errmsg("unrecognized value for EXPLAIN option \"%s\": \"%s\"", 3)I prefer to do this so that any new piece of code can be introduced in between easily and it will be more readable as well. +Datum +pg_log_current_query_plan(PG_FUNCTION_ARGS) +{ + pid_t pid; + bool result; + + pid = PG_GETARG_INT32(0); + result = SendProcSignalForLogInfo(pid, PROCSIG_LOG_CURRENT_PLAN); + + PG_RETURN_BOOL(result); +} If okay, please also change for the pg_log_backend_memory_contexts. 4) Extra whitespace before the second line i.e. 2nd line reason should be aligned with the 1st line reason. + Assert(reason == PROCSIG_LOG_MEMORY_CONTEXT || + reason == PROCSIG_LOG_CURRENT_PLAN); 5) How about "Requests to log the plan of the query currently running on the backend with specified process ID along with the untruncated query string"? + Requests to log the untruncated query string and its plan for + the query currently running on the backend with the specified + process ID. 6) A typo: it is "nested statements (..) are not" + Note that nested statements (statements executed inside a function) is not 7) I'm not sure what you mean by "Some functions output what you want to the log." --- Memory contexts are logged and they are not returned to the function. 
+-- Some functions output what you want to the log. Instead, can we say "These functions return true if the specified backend is successfully signaled, otherwise false. Upon receiving the signal, the backend will log the information to the server log." Regards, Bharath Rupireddy.
On 2021-07-02 23:21, Bharath Rupireddy wrote:
> On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>> Updated the patch.
>
> Thanks for the patch. Here are some comments on the v4 patch:

Thanks for your comments and suggestions!
I agree with you and updated the patch.

On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

> DO $$
> BEGIN
> PERFORM pg_sleep(100);
> END$$;
>
> When I called pg_log_current_query_plan() to send the signal to
> the backend executing the above query, I got the following log message.
> I think that this is not expected message. I guess this issue happened
> because the information about query text and plan is retrieved
> from ActivePortal. If this understanding is right, ISTM that we should
> implement new mechanism so that we can retrieve those information
> even while nested query is being executed.

I'm now working on this comment.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Updated the patch.

Hi, torikoshi-san

Thanks for your great work! I'd like to use this feature in v15.
I confirmed that it works with the queries I tried and make check-world has no error.

When I tried this feature, I realized two things. So, I share them.

(1) About output contents

> The format of the query plan is the same as when <literal>FORMAT TEXT</literal>
> and <literal>VEBOSE</literal> are used in the <command>EXPLAIN</command> command.
> For example:

I think the above needs to add COSTS and SETTINGS options too, and it's
better to use an example where the SETTINGS option works, like the following.

```
2021-07-13 21:59:56 JST 69757 [client backend] LOG:  plan of the query running on backend with PID 69757 is:
	Query Text: PREPARE query2 AS SELECT COUNT(*) FROM pgbench_accounts t1, pgbench_accounts t2;
	Aggregate  (cost=3750027242.84..3750027242.86 rows=1 width=8)
	  Output: count(*)
	  ->  Nested Loop  (cost=0.84..3125027242.84 rows=250000000000 width=0)
	        ->  Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts t1  (cost=0.42..12996.42 rows=500000 width=0)
	              Output: t1.aid
	        ->  Materialize  (cost=0.42..15496.42 rows=500000 width=0)
	              ->  Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts t2  (cost=0.42..12996.42 rows=500000 width=0)
	Settings: effective_cache_size = '8GB', work_mem = '16MB'
```

(2) About EXPLAIN "BUFFER" option

When I checked the EXPLAIN options, I found there is another option "BUFFER"
which can be used without the "ANALYZE" option.

I'm not sure it's useful because your target use-case is analyzing a
long-running query, not its planning phase. If so, the planning buffer
usage is not so much useful. But, since the overhead to output buffer
usages is not high and it's used for debugging use cases, I wonder it's
not a bad idea to output buffer usages too. Thought?

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION
On Tue, Jul 13, 2021 at 11:11 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > When I tried this feature, I realized two things. So, I share them. Thanks for your review! > (1) About output contents > > > The format of the query plan is the same as when <literal>FORMAT > > TEXT</literal> > > and <literal>VEBOSE</literal> are used in the > > <command>EXPLAIN</command> command. > > For example: > I think the above needs to add COSTS and SETTINGS options too, and it's > better to use an > example which the SETTINGS option works like the following. Agreed. Updated the patch. > (2) About EXPLAIN "BUFFER" option > > When I checked EXPLAIN option, I found there is another option "BUFFER" > which can be > used without the "ANALYZE" option. > > I'm not sure it's useful because your target use-case is analyzing a > long-running query, > not its planning phase. If so, the planning buffer usage is not so much > useful. But, since > the overhead to output buffer usages is not high and it's used for > debugging use cases, > I wonder it's not a bad idea to output buffer usages too. Thought? As you pointed out, I also think it would be useful when queries are taking a long time in the planning phase. However, as far as I read ExplainOneQuery(), the buffer usages in the planner phase are not retrieved by default. They are retrieved only when BUFFERS is specified in the EXPLAIN. If we change it to always get the buffer usages and expose them as a global variable, we can get them through pg_log_current_plan(), but I think it doesn't pay. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2021/07/19 11:28, torikoshia wrote:
> Agreed. Updated the patch.

Thanks for updating the patch!

+bool
+SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason)

I don't think that procsignal.c is proper place to check the permission
and check whether the specified PID indicates a PostgreSQL server process,
etc because procsignal.c just provides fundamental routines for
interprocess signaling. Isn't it better to move the function to
signalfuncs.c or elsewhere?

+		ExplainQueryText(es, ActivePortal->queryDesc);
+		ExplainPrintPlan(es, ActivePortal->queryDesc);
+		ExplainPrintJITSummary(es, ActivePortal->queryDesc);

When text format is used, ExplainBeginOutput() and ExplainEndOutput()
do nothing. So (I guess) you thought that they don't need to be called
and implemented the code in that way. But IMO it's better to comment
why they don't need to be called, or to just call both of them even if
they do nothing in text format.

+		ExplainPrintJITSummary(es, ActivePortal->queryDesc);

It's better to check es->costs before calling this function,
like explain_ExecutorEnd() and ExplainOnePlan() do?

+	result = SendProcSignalForLogInfo(pid, PROCSIG_LOG_CURRENT_PLAN);
+
+	PG_RETURN_BOOL(result);

Currently SendProcSignalForLogInfo() calls PG_RETURN_BOOL() in some cases,
but instead it should just return true/false because
pg_log_current_query_plan() expects that?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/07/09 14:05, torikoshia wrote:
> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>> Updated the patch.
>>
>> Thanks for the patch. Here are some comments on the v4 patch:
>
> Thanks for your comments and suggestions!
> I agree with you and updated the patch.
>
> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>> DO $$
>> BEGIN
>> PERFORM pg_sleep(100);
>> END$$;
>>
>> When I called pg_log_current_query_plan() to send the signal to
>> the backend executing the above query, I got the following log message.
>> I think that this is not expected message. I guess this issue happened
>> because the information about query text and plan is retrieved
>> from ActivePortal. If this understanding is right, ISTM that we should
>> implement new mechanism so that we can retrieve those information
>> even while nested query is being executed.
>
> I'm now working on this comment.

One idea is to define new global pointer, e.g., "QueryDesc *ActiveQueryDesc;".
This global pointer is set to queryDesc in ExecutorRun()
(also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd() and
when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
get the plan of the currently running query from that global pointer
instead of ActivePortal, and log it. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Jul 27, 2021 at 8:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
On 2021/07/09 14:05, torikoshia wrote:
> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>>> Updated the patch.
>>
>> Thanks for the patch. Here are some comments on the v4 patch:
>
> Thanks for your comments and suggestions!
> I agree with you and updated the patch.
>
> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>> DO $$
>> BEGIN
>> PERFORM pg_sleep(100);
>> END$$;
>>
>> When I called pg_log_current_query_plan() to send the signal to
>> the backend executing the above query, I got the following log message.
>> I think that this is not expected message. I guess this issue happened
>> because the information about query text and plan is retrieved
>> from ActivePortal. If this understanding is right, ISTM that we should
>> implement new mechanism so that we can retrieve those information
>> even while nested query is being executed.
>
> I'm now working on this comment.
One idea is to define new global pointer, e.g., "QueryDesc *ActiveQueryDesc;".
This global pointer is set to queryDesc in ExecutorRun()
(also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd() and
when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
get the plan of the currently running query from that global pointer
instead of ActivePortal, and log it. Thought?
It cannot work - there can be many nested queries, and at the end you cannot simply reset the pointer to NULL; you have to restore the pointer to the outer query.
Regards
Pavel
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
On 2021-07-28 03:45, Pavel Stehule wrote:
> út 27. 7. 2021 v 20:34 odesílatel Fujii Masao
> <masao.fujii@oss.nttdata.com> napsal:
>
>> On 2021/07/09 14:05, torikoshia wrote:
>>> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>>>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia
>> <torikoshia@oss.nttdata.com> wrote:
>>>>> Updated the patch.
>>>>
>>>> Thanks for the patch. Here are some comments on the v4 patch:
>>>
>>> Thanks for your comments and suggestions!
>>> I agree with you and updated the patch.
>>>
>>> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>>
>>>> DO $$
>>>> BEGIN
>>>> PERFORM pg_sleep(100);
>>>> END$$;
>>>>
>>>> When I called pg_log_current_query_plan() to send the signal to
>>>> the backend executing the above query, I got the following log
>> message.
>>>> I think that this is not expected message. I guess this issue
>> happened
>>>> because the information about query text and plan is retrieved
>>>> from ActivePortal. If this understanding is right, ISTM that we
>> should
>>>> implement new mechanism so that we can retrieve those information
>>>> even while nested query is being executed.
>>>
>>> I'm now working on this comment.
>>
>> One idea is to define new global pointer, e.g., "QueryDesc
>> *ActiveQueryDesc;".
>> This global pointer is set to queryDesc in ExecutorRun()
>> (also maybe ExecutorStart()). And this is reset to NULL in
>> ExecutorEnd() and
>> when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
>> get the plan of the currently running query from that global pointer
>> instead of ActivePortal, and log it. Thought?
>
> It cannot work - there can be a lot of nested queries, and at the end
> you cannot reset to null, but you should return back pointer to outer
> query.

Thanks for your comment!

I'm wondering if we can avoid this problem by saving one outer level
QueryDesc in addition to the current one.
I'm going to try it.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
On 2021-07-28 20:44, torikoshia wrote:
> On 2021-07-28 03:45, Pavel Stehule wrote:
>> út 27. 7. 2021 v 20:34 odesílatel Fujii Masao
>> <masao.fujii@oss.nttdata.com> napsal:
>>
>>> On 2021/07/09 14:05, torikoshia wrote:
>>>> On 2021-07-02 23:21, Bharath Rupireddy wrote:
>>>>> On Tue, Jun 22, 2021 at 8:00 AM torikoshia
>>> <torikoshia@oss.nttdata.com> wrote:
>>>>>> Updated the patch.
>>>>>
>>>>> Thanks for the patch. Here are some comments on the v4 patch:
>>>>
>>>> Thanks for your comments and suggestions!
>>>> I agree with you and updated the patch.
>>>>
>>>> On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao
>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>> DO $$
>>>>> BEGIN
>>>>> PERFORM pg_sleep(100);
>>>>> END$$;
>>>>>
>>>>> When I called pg_log_current_query_plan() to send the signal to
>>>>> the backend executing the above query, I got the following log
>>> message.
>>>>> I think that this is not expected message. I guess this issue
>>> happened
>>>>> because the information about query text and plan is retrieved
>>>>> from ActivePortal. If this understanding is right, ISTM that we
>>> should
>>>>> implement new mechanism so that we can retrieve those information
>>>>> even while nested query is being executed.
>>>>
>>>> I'm now working on this comment.
>>>
>>> One idea is to define new global pointer, e.g., "QueryDesc
>>> *ActiveQueryDesc;".
>>> This global pointer is set to queryDesc in ExecutorRun()
>>> (also maybe ExecutorStart()). And this is reset to NULL in
>>> ExecutorEnd() and
>>> when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
>>> get the plan of the currently running query from that global pointer
>>> instead of ActivePortal, and log it. Thought?
>>
>> It cannot work - there can be a lot of nested queries, and at the end
>> you cannot reset to null, but you should return back pointer to outer
>> query.
>
> Thanks for your comment!
>
> I'm wondering if we can avoid this problem by saving one outer level
> QueryDesc in addition to the current one.
> I'm going to try it.

I have updated the patch in this way.

In this patch, getting the plan of the DO statement works as follows.

---------------------------------
(pid:76608)=# DO $$
BEGIN
PERFORM pg_sleep(15);
END$$;

(pid:74482)=# SELECT pg_log_current_query_plan(76608);
LOG:  00000: plan of the query running on backend with PID 76608 is:
        Query Text: SELECT pg_sleep(15)
        Result  (cost=0.00..0.01 rows=1 width=4)
          Output: pg_sleep('15'::double precision)

-- pid:76608 finished DO statement:
(pid:74482)=# SELECT pg_log_current_query_plan(76608);
LOG:  00000: backend with PID 76608 is not running a query
---------------------------------

Any thoughts?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2021/08/10 21:22, torikoshia wrote:
> I have updated the patch in this way.

Thanks for updating the patch!

> In this patch, getting the plan to the DO statement is as follows.

Looks good to me.

> Any thoughts?

+	ereport(LOG_SERVER_ONLY,
+			(errmsg("plan of the query running on backend with PID %d is:\n%s",
+					MyProcPid, es->str->data),
+			 errhidestmt(true)));

Shouldn't we hide context information by calling errhidecontext(true)?

While "make installcheck" regression test was running, I repeated
executing pg_log_current_query_plan() and got the failure of join_hash test
with the following diff. This means that pg_log_current_query_plan() could
cause the query that should be completed successfully to fail with the error.
Isn't this a bug?

I *guess* that the cause of this issue is that ExplainNode() can call
InstrEndLoop() more than once unexpectedly.

------------------------------------------------------------------------------
$$
  select count(*) from simple r join simple s using (id);
$$);
- initially_multibatch | increased_batches
-----------------------+-------------------
- f                    | f
-(1 row)
-
+ERROR:  InstrEndLoop called on running node
+CONTEXT:  PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement
 rollback to settings;
 -- parallel with parallel-oblivious hash join
 savepoint settings;
@@ -687,11 +684,9 @@
   left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
   on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
$$);
- multibatch
------------
- t
-(1 row)
-
+ERROR:  InstrEndLoop called on running node
+CONTEXT:  parallel worker
+PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement
 rollback to settings;
 -- single-batch with rescan, parallel-aware
 savepoint settings;
------------------------------------------------------------------------------

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021-08-11 00:21, Fujii Masao wrote:
> On 2021/08/10 21:22, torikoshia wrote:
>> I have updated the patch in this way.
>
> Thanks for updating the patch!
>
>> In this patch, getting the plan to the DO statement is as follows.
>
> Looks good to me.
>
>> Any thoughts?
>
> +	ereport(LOG_SERVER_ONLY,
> +			(errmsg("plan of the query running on backend with PID %d is:\n%s",
> +					MyProcPid, es->str->data),
> +			 errhidestmt(true)));
>
> Shouldn't we hide context information by calling errhidecontext(true)?

Agreed.

> While "make installcheck" regression test was running, I repeated
> executing pg_log_current_query_plan() and got the failure of join_hash test
> with the following diff. This means that pg_log_current_query_plan() could
> cause the query that should be completed successfully to fail with the error.
> Isn't this a bug?

Thanks for finding the bug. I also reproduced it.

> I *guess* that the cause of this issue is that ExplainNode() can call
> InstrEndLoop() more than once unexpectedly.

As far as I have looked into it, pg_log_current_plan() can call
InstrEndLoop() through ExplainNode().
I added a flag to ExplainState to avoid calling InstrEndLoop() when
ExplainNode() is called from pg_log_current_plan().

> ------------------------------------------------------------------------------
> $$
>   select count(*) from simple r join simple s using (id);
> $$);
> - initially_multibatch | increased_batches
> -----------------------+-------------------
> - f                    | f
> -(1 row)
> -
> +ERROR:  InstrEndLoop called on running node
> +CONTEXT:  PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement
>  rollback to settings;
>  -- parallel with parallel-oblivious hash join
>  savepoint settings;
> @@ -687,11 +684,9 @@
>    left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
>    on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
> $$);
> - multibatch
> ------------
> - t
> -(1 row)
> -
> +ERROR:  InstrEndLoop called on running node
> +CONTEXT:  parallel worker
> +PL/pgSQL function hash_join_batches(text) line 6 at FOR over EXECUTE statement
>  rollback to settings;
>  -- single-batch with rescan, parallel-aware
>  savepoint settings;
>
> ------------------------------------------------------------------------------
>
> Regards,

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2021/08/11 21:14, torikoshia wrote:
> As far as I looked into, pg_log_current_plan() can call InstrEndLoop() through ExplainNode().
> I added a flag to ExplainState to avoid calling InstrEndLoop() when ExplainNode() is called from pg_log_current_plan().

Thanks for updating the patch!
I tried to test the patch again and encountered two issues.

(1)
The following WITH RECURSIVE query failed with the error
"ERROR: failed to find plan for CTE sg" when I ran
pg_log_current_query_plan() against the backend executing that query.
Is this a bug?

create table graph0( f int, t int, label text );
insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 -> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');

with recursive search_graph(f, t, label, i) as (
    select *, 1||pg_sleep(1)::text from graph0 g
    union distinct
    select g.*, 1||pg_sleep(1)::text
    from graph0 g, search_graph sg
    where g.f = sg.t
) search breadth first by f, t set seq
select * from search_graph order by seq;

(2)
When I ran pg_log_current_query_plan() while "make installcheck" test
was running, I got the following assertion failure.

TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)

0   postgres          0x000000010ec23557 ExceptionalCondition + 231
1   postgres          0x000000010e9eff15 LockAcquireExtended + 1461
2   postgres          0x000000010e9ed14d LockRelationOid + 61
3   postgres          0x000000010e41251b relation_open + 91
4   postgres          0x000000010e509679 table_open + 25
5   postgres          0x000000010ebf9462 SearchCatCacheMiss + 274
6   postgres          0x000000010ebf5979 SearchCatCacheInternal + 761
7   postgres          0x000000010ebf566c SearchCatCache + 60
8   postgres          0x000000010ec1a9e0 SearchSysCache + 144
9   postgres          0x000000010ec1ae03 SearchSysCacheExists + 51
10  postgres          0x000000010e58ce35 TypeIsVisible + 437
11  postgres          0x000000010ea98e4c format_type_extended + 1964
12  postgres          0x000000010ea9900e format_type_with_typemod + 30
13  postgres          0x000000010eb78d76 get_const_expr + 742
14  postgres          0x000000010eb79bc8 get_rule_expr + 232
15  postgres          0x000000010eb8140f get_func_expr + 1247
16  postgres          0x000000010eb79dcd get_rule_expr + 749
17  postgres          0x000000010eb81688 get_rule_expr_paren + 136
18  postgres          0x000000010eb7bf38 get_rule_expr + 9304
19  postgres          0x000000010eb72ad5 deparse_expression_pretty + 149
20  postgres          0x000000010eb73463 deparse_expression + 83
21  postgres          0x000000010e68eaf1 show_plan_tlist + 353
22  postgres          0x000000010e68adaf ExplainNode + 4991
23  postgres          0x000000010e688b4b ExplainPrintPlan + 283
24  postgres          0x000000010e68e1aa ProcessLogCurrentPlanInterrupt + 266
25  postgres          0x000000010ea133bb ProcessInterrupts + 3435
26  postgres          0x000000010e738c97 vacuum_delay_point + 55
27  postgres          0x000000010e42bb4b ginInsertCleanup + 1531
28  postgres          0x000000010e42d418 gin_clean_pending_list + 776
29  postgres          0x000000010e74955a ExecInterpExpr + 2522
30  postgres          0x000000010e7487e2 ExecInterpExprStillValid + 82
31  postgres          0x000000010e7ae83b ExecEvalExprSwitchContext + 59
32  postgres          0x000000010e7ae7be ExecProject + 78
33  postgres          0x000000010e7ae4e9 ExecResult + 345
34  postgres          0x000000010e764e42 ExecProcNodeFirst + 82
35  postgres          0x000000010e75ccb2 ExecProcNode + 50
36  postgres          0x000000010e758301 ExecutePlan + 193
37  postgres          0x000000010e7581d1 standard_ExecutorRun + 609
38  auto_explain.so   0x000000010f1df3a7 explain_ExecutorRun + 247
39  postgres          0x000000010e757f3b ExecutorRun + 91
40  postgres          0x000000010ea1cb49 PortalRunSelect + 313
41  postgres          0x000000010ea1c4dd PortalRun + 861
42  postgres          0x000000010ea17474 exec_simple_query + 1540
43  postgres          0x000000010ea164d4 PostgresMain + 2580
44  postgres          0x000000010e91d159 BackendRun + 89
45  postgres          0x000000010e91c6a5 BackendStartup + 565
46  postgres          0x000000010e91b3fe ServerLoop + 638
47  postgres          0x000000010e918b9d PostmasterMain + 6717
48  postgres          0x000000010e7efd43 main + 819
49  libdyld.dylib     0x00007fff6a46e3d5 start + 1
50  ???               0x0000000000000003 0x0 + 3

LOG:  server process (PID 61512) was terminated by signal 6: Abort trap: 6
DETAIL:  Failed process was running: select gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021-08-20 01:12, Fujii Masao wrote:
> On 2021/08/11 21:14, torikoshia wrote:
>> As far as I looked into, pg_log_current_plan() can call
>> InstrEndLoop() through ExplainNode().
>> I added a flag to ExplainState to avoid calling InstrEndLoop() when
>> ExplainNode() is called from pg_log_current_plan().
>
> Thanks for updating the patch!
> I tried to test the patch again and encountered two issues.

Thanks for finding these issues!

> (1)
> The following WITH RECURSIVE query failed with the error
> "ERROR: failed to find plan for CTE sg" when I ran
> pg_log_current_query_plan() against the backend executing that query.
> Is this a bug?
>
> create table graph0( f int, t int, label text );
> insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');
>
> with recursive search_graph(f, t, label, i) as (
>     select *, 1||pg_sleep(1)::text from graph0 g
>     union distinct
>     select g.*,1||pg_sleep(1)::text
>     from graph0 g, search_graph sg
>     where g.f = sg.t
> ) search breadth first by f, t set seq
> select * from search_graph order by seq;

This ERROR occurred without applying the patch, just by calling
EXPLAIN(VERBOSE) on a CTE with SEARCH BREADTH FIRST.

I'm going to start another thread to discuss it.

> (2)
> When I ran pg_log_current_query_plan() while "make installcheck" test
> was running, I got the following assertion failure.
>
> TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type ==
> LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)
>
> 0   postgres          0x000000010ec23557 ExceptionalCondition + 231
> 1   postgres          0x000000010e9eff15 LockAcquireExtended + 1461
> 2   postgres          0x000000010e9ed14d LockRelationOid + 61
> 3   postgres          0x000000010e41251b relation_open + 91
> 4   postgres          0x000000010e509679 table_open + 25
> 5   postgres          0x000000010ebf9462 SearchCatCacheMiss + 274
> 6   postgres          0x000000010ebf5979 SearchCatCacheInternal + 761
> 7   postgres          0x000000010ebf566c SearchCatCache + 60
> 8   postgres          0x000000010ec1a9e0 SearchSysCache + 144
> 9   postgres          0x000000010ec1ae03 SearchSysCacheExists + 51
> 10  postgres          0x000000010e58ce35 TypeIsVisible + 437
> 11  postgres          0x000000010ea98e4c format_type_extended + 1964
> 12  postgres          0x000000010ea9900e format_type_with_typemod + 30
> 13  postgres          0x000000010eb78d76 get_const_expr + 742
> 14  postgres          0x000000010eb79bc8 get_rule_expr + 232
> 15  postgres          0x000000010eb8140f get_func_expr + 1247
> 16  postgres          0x000000010eb79dcd get_rule_expr + 749
> 17  postgres          0x000000010eb81688 get_rule_expr_paren + 136
> 18  postgres          0x000000010eb7bf38 get_rule_expr + 9304
> 19  postgres          0x000000010eb72ad5 deparse_expression_pretty + 149
> 20  postgres          0x000000010eb73463 deparse_expression + 83
> 21  postgres          0x000000010e68eaf1 show_plan_tlist + 353
> 22  postgres          0x000000010e68adaf ExplainNode + 4991
> 23  postgres          0x000000010e688b4b ExplainPrintPlan + 283
> 24  postgres          0x000000010e68e1aa ProcessLogCurrentPlanInterrupt + 266
> 25  postgres          0x000000010ea133bb ProcessInterrupts + 3435
> 26  postgres          0x000000010e738c97 vacuum_delay_point + 55
> 27  postgres          0x000000010e42bb4b ginInsertCleanup + 1531
> 28  postgres          0x000000010e42d418 gin_clean_pending_list + 776
> 29  postgres          0x000000010e74955a ExecInterpExpr + 2522
> 30  postgres          0x000000010e7487e2 ExecInterpExprStillValid + 82
> 31  postgres          0x000000010e7ae83b ExecEvalExprSwitchContext + 59
> 32  postgres          0x000000010e7ae7be ExecProject + 78
> 33  postgres          0x000000010e7ae4e9 ExecResult + 345
> 34  postgres          0x000000010e764e42 ExecProcNodeFirst + 82
> 35  postgres          0x000000010e75ccb2 ExecProcNode + 50
> 36  postgres          0x000000010e758301 ExecutePlan + 193
> 37  postgres          0x000000010e7581d1 standard_ExecutorRun + 609
> 38  auto_explain.so   0x000000010f1df3a7 explain_ExecutorRun + 247
> 39  postgres          0x000000010e757f3b ExecutorRun + 91
> 40  postgres          0x000000010ea1cb49 PortalRunSelect + 313
> 41  postgres          0x000000010ea1c4dd PortalRun + 861
> 42  postgres          0x000000010ea17474 exec_simple_query + 1540
> 43  postgres          0x000000010ea164d4 PostgresMain + 2580
> 44  postgres          0x000000010e91d159 BackendRun + 89
> 45  postgres          0x000000010e91c6a5 BackendStartup + 565
> 46  postgres          0x000000010e91b3fe ServerLoop + 638
> 47  postgres          0x000000010e918b9d PostmasterMain + 6717
> 48  postgres          0x000000010e7efd43 main + 819
> 49  libdyld.dylib     0x00007fff6a46e3d5 start + 1
> 50  ???               0x0000000000000003 0x0 + 3
>
> LOG: server process (PID 61512) was terminated by signal 6: Abort trap: 6
> DETAIL: Failed process was running: select
> gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;

As far as I understand, since explaining a plan can acquire heavyweight
locks, for example to get column names, this assertion error occurs
when a page lock is held at the time of the interrupt.

The attached patch tries to avoid this by checking each LocalLock
entry and, when finding even one, giving up logging the plan.

Thoughts?

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2021-09-07 12:39, torikoshia wrote:
> On 2021-08-20 01:12, Fujii Masao wrote:
>> On 2021/08/11 21:14, torikoshia wrote:
>>> As far as I looked into, pg_log_current_plan() can call
>>> InstrEndLoop() through ExplainNode().
>>> I added a flag to ExplainState to avoid calling InstrEndLoop() when
>>> ExplainNode() is called from pg_log_current_plan().
>>
>> Thanks for updating the patch!
>> I tried to test the patch again and encountered two issues.
>
> Thanks for finding these issues!
>
>> (1)
>> The following WITH RECURSIVE query failed with the error
>> "ERROR: failed to find plan for CTE sg" when I ran
>> pg_log_current_query_plan() against the backend executing that query.
>> Is this a bug?
>>
>> create table graph0( f int, t int, label text );
>> insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 ->
>> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5');
>>
>> with recursive search_graph(f, t, label, i) as (
>>     select *, 1||pg_sleep(1)::text from graph0 g
>>     union distinct
>>     select g.*,1||pg_sleep(1)::text
>>     from graph0 g, search_graph sg
>>     where g.f = sg.t
>> ) search breadth first by f, t set seq
>> select * from search_graph order by seq;
>
> This ERROR occurred without applying the patch and just calling
> EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST.
>
> I'm going to make another thread to discuss it.
>
>> (2)
>> When I ran pg_log_current_query_plan() while "make installcheck" test
>> was running, I got the following assertion failure.
>>
>> TRAP: FailedAssertion("!IsPageLockHeld || (locktag->locktag_type ==
>> LOCKTAG_RELATION_EXTEND)", File: "lock.c", Line: 894, PID: 61512)
>>
>> 0   postgres          0x000000010ec23557 ExceptionalCondition + 231
>> 1   postgres          0x000000010e9eff15 LockAcquireExtended + 1461
>> 2   postgres          0x000000010e9ed14d LockRelationOid + 61
>> 3   postgres          0x000000010e41251b relation_open + 91
>> 4   postgres          0x000000010e509679 table_open + 25
>> 5   postgres          0x000000010ebf9462 SearchCatCacheMiss + 274
>> 6   postgres          0x000000010ebf5979 SearchCatCacheInternal + 761
>> 7   postgres          0x000000010ebf566c SearchCatCache + 60
>> 8   postgres          0x000000010ec1a9e0 SearchSysCache + 144
>> 9   postgres          0x000000010ec1ae03 SearchSysCacheExists + 51
>> 10  postgres          0x000000010e58ce35 TypeIsVisible + 437
>> 11  postgres          0x000000010ea98e4c format_type_extended + 1964
>> 12  postgres          0x000000010ea9900e format_type_with_typemod + 30
>> 13  postgres          0x000000010eb78d76 get_const_expr + 742
>> 14  postgres          0x000000010eb79bc8 get_rule_expr + 232
>> 15  postgres          0x000000010eb8140f get_func_expr + 1247
>> 16  postgres          0x000000010eb79dcd get_rule_expr + 749
>> 17  postgres          0x000000010eb81688 get_rule_expr_paren + 136
>> 18  postgres          0x000000010eb7bf38 get_rule_expr + 9304
>> 19  postgres          0x000000010eb72ad5 deparse_expression_pretty + 149
>> 20  postgres          0x000000010eb73463 deparse_expression + 83
>> 21  postgres          0x000000010e68eaf1 show_plan_tlist + 353
>> 22  postgres          0x000000010e68adaf ExplainNode + 4991
>> 23  postgres          0x000000010e688b4b ExplainPrintPlan + 283
>> 24  postgres          0x000000010e68e1aa ProcessLogCurrentPlanInterrupt + 266
>> 25  postgres          0x000000010ea133bb ProcessInterrupts + 3435
>> 26  postgres          0x000000010e738c97 vacuum_delay_point + 55
>> 27  postgres          0x000000010e42bb4b ginInsertCleanup + 1531
>> 28  postgres          0x000000010e42d418 gin_clean_pending_list + 776
>> 29  postgres          0x000000010e74955a ExecInterpExpr + 2522
>> 30  postgres          0x000000010e7487e2 ExecInterpExprStillValid + 82
>> 31  postgres          0x000000010e7ae83b ExecEvalExprSwitchContext + 59
>> 32  postgres          0x000000010e7ae7be ExecProject + 78
>> 33  postgres          0x000000010e7ae4e9 ExecResult + 345
>> 34  postgres          0x000000010e764e42 ExecProcNodeFirst + 82
>> 35  postgres          0x000000010e75ccb2 ExecProcNode + 50
>> 36  postgres          0x000000010e758301 ExecutePlan + 193
>> 37  postgres          0x000000010e7581d1 standard_ExecutorRun + 609
>> 38  auto_explain.so   0x000000010f1df3a7 explain_ExecutorRun + 247
>> 39  postgres          0x000000010e757f3b ExecutorRun + 91
>> 40  postgres          0x000000010ea1cb49 PortalRunSelect + 313
>> 41  postgres          0x000000010ea1c4dd PortalRun + 861
>> 42  postgres          0x000000010ea17474 exec_simple_query + 1540
>> 43  postgres          0x000000010ea164d4 PostgresMain + 2580
>> 44  postgres          0x000000010e91d159 BackendRun + 89
>> 45  postgres          0x000000010e91c6a5 BackendStartup + 565
>> 46  postgres          0x000000010e91b3fe ServerLoop + 638
>> 47  postgres          0x000000010e918b9d PostmasterMain + 6717
>> 48  postgres          0x000000010e7efd43 main + 819
>> 49  libdyld.dylib     0x00007fff6a46e3d5 start + 1
>> 50  ???               0x0000000000000003 0x0 + 3
>>
>> LOG: server process (PID 61512) was terminated by signal 6: Abort trap: 6
>> DETAIL: Failed process was running: select
>> gin_clean_pending_list('t_gin_test_tbl_i_j_idx') is not null;
>
> As far as I understand, since explaining plans can acquire heavyweight
> lock for example to get column names, when page lock is held at the
> time of the interrupt, this assertion error occurs.
>
> The attached patch tries to avoid this by checking each LocalLock
> entry and when finding even one, giving up logging the plan.
>
> Thoughts?

Regression tests failed on Windows. Updated patch attached.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
Hi, hackers!

• The latest version of the patch applies cleanly. It changes 8 files
from /src/backend, and 9 other files.

• I got 1 error and 1 warning during compilation on Mac.

explain.c:4985:25: error: implicit declaration of function 'GetLockMethodLocalHash' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        hash_seq_init(&status, GetLockMethodLocalHash());
explain.c:4985:25: warning: incompatible integer to pointer conversion passing 'int' to parameter of type 'HTAB *' (aka 'struct HTAB *') [-Wint-conversion]
        hash_seq_init(&status, GetLockMethodLocalHash());

This error doesn't appear on my second machine with Ubuntu.

I found the reason. You deleted the #ifdef USE_ASSERT_CHECKING around the
implementation of the function GetLockMethodLocalHash(void), but this
ifdef still exists around the function declaration. So there can be a
situation where the implementation exists without the declaration, and
files using the function produce errors. I created a new version of the
patch with a fix for this problem.

I agree that seeing the details of a query is a useful feature, but I
have several doubts:

1) There are lots of changes to the core code. But not all users need
this functionality, so adding it as an extension seems more reasonable.

2) There are many tools available to monitor the status of a query. How
much do we need another one? For example:

• pg_stat_progress_* is a set of views with the current status of
ANALYZE, CREATE INDEX, VACUUM, CLUSTER, COPY, and Base Backup. You can
find it in the PostgreSQL documentation [1].

• pg_query_state is a contrib module with 2 patches for the core (I hope
someday the Community will support adding these patches to PostgreSQL).
It contains a function that prints a table with the pid, full query
text, plan, and current progress of every node, like a momentary
EXPLAIN ANALYZE for SELECT, UPDATE, INSERT, DELETE. So it supports every
flag and format of EXPLAIN. You can find the current version of
pg_query_state on GitHub [2]. I also found an old discussion about its
first version in the Community [3].

3) Have you measured the overhead of your feature? It would be really
interesting to know the changes in speed and performance.

Thank you for working on this issue. I would be glad to continue to
follow the development of this issue.

Links above:
[1] https://www.postgresql.org/docs/current/progress-reporting.html
[2] https://github.com/postgrespro/pg_query_state
[3] https://www.postgresql.org/message-id/dbfb1a42-ee58-88fd-8d77-550498f52bc5@postgrespro.ru

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 2021-10-13 23:28, Ekaterina Sokolova wrote:
> Hi, hackers!
>
> • The last version of patch is correct applied. It changes 8 files
> from /src/backend, and 9 other files.
>
> • I have 1 error and 1 warning during compilation on Mac.
>
> explain.c:4985:25: error: implicit declaration of function
> 'GetLockMethodLocalHash' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>         hash_seq_init(&status, GetLockMethodLocalHash());
> explain.c:4985:25: warning: incompatible integer to pointer conversion
> passing 'int' to parameter of type 'HTAB *' (aka 'struct HTAB *')
> [-Wint-conversion]
>         hash_seq_init(&status, GetLockMethodLocalHash());
>
> This error doesn't appear at my second machine with Ubuntu.
>
> I found the reason. You delete #ifdef USE_ASSERT_CHECKING from
> implementation of function GetLockMethodLocalHash(void), but this
> ifdef exists around function declaration. There may be a situation,
> when implementation exists without declaration, so files with using of
> function produce errors. I create new version of patch with fix of
> this problem.

Thanks for fixing that!

> I'm agree that seeing the details of a query is a useful feature, but
> I have several doubts:
>
> 1) There are lots of changes of core's code. But not all users need
> this functionality. So adding this functionality like extension seemed
> more reasonable.

It would be good if we could implement this feature in an extension,
but since the pg_query_state extension needs patches applied to
PostgreSQL, I think this kind of feature requires PostgreSQL core
modification.
IMHO extensions which need core modification are not easy to use in
production environments.

> 2) There are many tools available to monitor the status of a query.
> How much do we need another one? For example:
> • pg_stat_progress_* is set of views with current status of
> ANALYZE, CREATE INDEX, VACUUM, CLUSTER, COPY, Base Backup. You can
> find it in PostgreSQL documentation [1].
> • pg_query_state is contrib with 2 patches for core (I hope
> someday Community will support adding this patches to PostgreSQL). It
> contains function with printing table with pid, full query text, plan
> and current progress of every node like momentary EXPLAIN ANALYSE for
> SELECT, UPDATE, INSERT, DELETE. So it supports every flags and formats
> of EXPLAIN. You can find current version of pg_query_state on github
> [2]. Also I found old discussion about first version of it in
> Community [3].

Thanks for introducing the extension!

I only took a quick look at pg_query_state, and I have some questions.

pg_query_state seems to use shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle the difficulties listed in that discussion?

It seems the caller of pg_query_state() has to wait until the target
process pushes the plan information into shared memory; can this lead
to deadlock situations?
I came up with this question because, when trying to make a view for
memory contexts of other backends, we encountered deadlock situations.
In the end, we gave up the view design and adopted sending a signal
and logging.

Some of the comments in [3] seem useful for my patch; I'm going to
consider them. Thanks!

> 3) Have you measured the overload of your feature? It would be really
> interesting to know the changes in speed and performance.

I haven't measured it yet, but I believe that the overhead for
backends on which pg_log_current_plan() is not called would be slight,
since the patch just adds the logic for saving the QueryDesc in
ExecutorRun().
The overhead for a backend on which pg_log_current_plan() is called
might not be slight, but since the target process is assumed to be
running a long-running query whose plan the user wants to know, the
overhead would be worth the cost.

> Thank you for working on this issue. I would be glad to continue to
> follow the development of this issue.

Thanks for your help!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
On 2021-10-15 15:17, torikoshia wrote:
> I only took a quick look at pg_query_state, I have some questions.
>
> pg_query_state seems using shm_mq to expose the plan information, but
> there was a discussion that this kind of architecture would be tricky
> to do properly [1].
> Does pg_query_state handle difficulties listed on the discussion?

Sorry, I forgot to add the URL.

[1] https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

> It seems the caller of the pg_query_state() has to wait until the
> target process pushes the plan information into shared memory, can it
> lead to deadlock situations?
> I came up with this question because when trying to make a view for
> memory contexts of other backends, we encountered deadlock situations.
> After all, we gave up view design and adopted sending signal and
> logging.

Discussion at the following URL:

https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Hi! I'm here to answer your questions about contrib/pg_query_state.

>> I only took a quick look at pg_query_state, I have some questions.
>> pg_query_state seems using shm_mq to expose the plan information, but
>> there was a discussion that this kind of architecture would be tricky
>> to do properly [1].
>> Does pg_query_state handle difficulties listed on the discussion?
> [1]
> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

I doubt that was the right link. But on the topic, I will say that the
extension really does use shared memory; the interaction is implemented
by sending and receiving messages. This architecture provides the
required reliability and convenience.

>> It seems the caller of the pg_query_state() has to wait until the
>> target process pushes the plan information into shared memory, can it
>> lead to deadlock situations?
>> I came up with this question because when trying to make a view for
>> memory contexts of other backends, we encountered deadlock situations.
>> After all, we gave up view design and adopted sending signal and
>> logging.
>
> Discussion at the following URL.
> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

Before extracting information about another process, we check its
state. Information will only be retrieved from a process willing to
provide it. Otherwise, we will receive an error message saying that the
query execution statistics and process status cannot be obtained. There
is also a check against extracting your own status. This is even
verified in the tests.

Thanks for your attention. Just in case, I am ready to discuss this
topic in more detail.

About overhead:

> I haven't measured it yet, but I believe that the overhead for backends
> which are not called pg_log_current_plan() would be slight since the
> patch just adds the logic for saving QueryDesc on ExecutorRun().
> The overhead for backends which is called pg_log_current_plan() might
> not slight, but since the target process are assumed dealing with
> long-running query and the user want to know its plan, its overhead
> would be worth the cost.

I think it would be useful for us to have a couple of examples with
different numbers of rows, compared to running without this
functionality.

Hope this helps.

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
This patch no longer applies on top of HEAD, please submit a rebased version. -- Daniel Gustafsson https://vmware.com/
On 2021-11-02 20:32, Ekaterina Sokolova wrote:

Thanks for your response!

> Hi!
>
> I'm here to answer your questions about contrib/pg_query_state.
>>> I only took a quick look at pg_query_state, I have some questions.
>
>>> pg_query_state seems using shm_mq to expose the plan information, but
>>> there was a discussion that this kind of architecture would be tricky
>>> to do properly [1].
>>> Does pg_query_state handle difficulties listed on the discussion?
>> [1]
>> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com
>
> I doubt that it was the right link.

Sorry for the confusion; here is the right link.
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

> But on the topic I will say that extension really use shared memory,
> interaction is implemented by sending / receiving messages. This
> architecture provides the required reliability and convenience.

As described in that link, using shared memory for this kind of work
would need DSM, and it would also be necessary to exchange information
between the requestor and the responder.
For example, when I looked a little at the pg_query_state code, it
seems the size of the queue is fixed at QUEUE_SIZE, and I wonder how
plans that exceed QUEUE_SIZE are handled.

>>> It seems the caller of the pg_query_state() has to wait until the
>>> target process pushes the plan information into shared memory, can it
>>> lead to deadlock situations?
>>> I came up with this question because when trying to make a view for
>>> memory contexts of other backends, we encountered deadlock
>>> situations.
>>> After all, we gave up view design and adopted sending signal and
>>> logging.
>>
>> Discussion at the following URL.
>> https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com
>
> Before extracting information about side process we check its state.
> Information will only be retrieved for a process willing to provide
> it. Otherwise, we will receive an error message about impossibility of
> getting query execution statistics + process status. Also checking
> fact of extracting your own status exists. This is even verified in
> tests.
>
> Thanks for your attention.
> Just in case, I am ready to discuss this topic in more detail.

I imagined the following procedure. Does it cause a deadlock in
pg_query_state?

- session1
  BEGIN; TRUNCATE t;

- session2
  BEGIN; TRUNCATE t; -- wait

- session1
  SELECT * FROM pg_query_state(<pid of session2>); -- wait and deadlock?

> About overhead:
>> I haven't measured it yet, but I believe that the overhead for
>> backends
>> which are not called pg_log_current_plan() would be slight since the
>> patch just adds the logic for saving QueryDesc on ExecutorRun().
>> The overhead for backends which is called pg_log_current_plan() might
>> not slight, but since the target process are assumed dealing with
>> long-running query and the user want to know its plan, its overhead
>> would be worth the cost.
> I think it would be useful for us to have couple of examples with a
> different number of rows compared to using without this functionality.

Do you have any expectation that the number of rows would affect the
performance of this functionality? This patch adds some code to
ExecutorRun(), but I don't think the number of rows would have an
impact on its performance.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
On Wed, Oct 13, 2021 at 05:28:30PM +0300, Ekaterina Sokolova wrote:
> Hi, hackers!
>
> • pg_query_state is contrib with 2 patches for core (I hope someday
> Community will support adding this patches to PostgreSQL). It contains

I reviewed this version of the patch - I have some language fixes.

I didn't know about pg_query_state, thanks.

> To improve this situation, this patch adds
> pg_log_current_query_plan() function that requests to log the
> plan of the specified backend process.

To me, "current plan" seems to mean "plan of *this* backend" (which
makes no sense to log). I think the user-facing function could be
called pg_log_query_plan(). It's true that the implementation is a
request to another backend to log its *own* query plan - but users
shouldn't need to know about the implementation.

> + Only superusers can request to log plan of the running query.

.. log the plan of a running query.

> + Note that nested statements (statements executed inside a function) are not
> + considered for logging. Only the deepest nesting query's plan is logged.

Only the plan of the most deeply nested query is logged.

> + (errmsg("backend with PID %d is not running a query",
> + MyProcPid)));

The extra parens around errmsg() are not needed since e3a87b499.

> + (errmsg("backend with PID %d is now holding a page lock. Try again",

remove "now"

> + (errmsg("plan of the query running on backend with PID %d is:\n%s",
> + MyProcPid, es->str->data),

Maybe this should say "query plan running on backend with PID 17793 is:"

> + * would cause lots of log messages and which can lead to denial of

remove "and"

> + errmsg("must be a superuser to log information about specified process")));

I think it should not say "specified", since that sounds like the user
might have access to log information about some other processes:
| must be a superuser to log information about processes

> +
> + proc = BackendPidGetProc(pid);
> +
> + /*
> + * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> + * we reach kill(), a process for which we get a valid proc here might
> + * have terminated on its own. There's no way to acquire a lock on an
> + * arbitrary process to prevent that. But since this mechanism is usually
> + * used to below purposes, it might end its own first and the information

used for below purposes,

--
Justin
On Wed, Oct 13, 2021 at 7:58 PM Ekaterina Sokolova
<e.sokolova@postgrespro.ru> wrote:
> Thank you for working on this issue. I would be glad to continue to
> follow the development of this issue.

Thanks for the patch. I'm not sure if v11 is the latest patch; if it
is, I have the following comments:

1) Firstly, the v11 patch isn't getting applied on master -
http://cfbot.cputube.org/patch_35_3142.log.

2) I think we are moving away from if (!superuser()) checks, see the
commit [1]. The goal is to let the GRANT-REVOKE system deal with who
is supposed to run these system functions. Since
pg_log_current_query_plan also writes the info to server logs, I think
it should do the same thing as commit [1] did for
pg_log_backend_memory_contexts.

With v11, you are re-introducing the superuser() check in
pg_log_backend_memory_contexts, which is wrong.

3) I think SendProcSignalForLogInfo can be more generic, meaning it
can also send a signal to auxiliary processes when asked to; this will
simplify things for pg_log_backend_memory_contexts and other patches
like pg_print_backtrace. I would imagine it as "bool
SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool
signal_aux_proc);".

[1] commit f0b051e322d530a340e62f2ae16d99acdbcb3d05
Author: Jeff Davis <jdavis@postgresql.org>
Date: Tue Oct 26 13:13:52 2021 -0700

    Allow GRANT on pg_log_backend_memory_contexts().

    Remove superuser check, allowing any user granted permissions on
    pg_log_backend_memory_contexts() to log the memory contexts of any
    backend.

    Note that this could allow a privileged non-superuser to log the
    memory contexts of a superuser backend, but as discussed, that does
    not seem to be a problem.

    Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier,
    Kyotaro Horiguchi, Andres Freund
    Discussion: https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com

Regards,
Bharath Rupireddy.
On 2021-11-13 22:29, Bharath Rupireddy wrote:

Thanks for your review!

> On Wed, Oct 13, 2021 at 7:58 PM Ekaterina Sokolova
> <e.sokolova@postgrespro.ru> wrote:
>> Thank you for working on this issue. I would be glad to continue to
>> follow the development of this issue.
>
> Thanks for the patch. I'm not sure if v11 is the latest patch, if yes,
> I have the following comments:
>
> 1) Firstly, v11 patch isn't getting applied on the master -
> http://cfbot.cputube.org/patch_35_3142.log.

Updated the patch.

> 2) I think we are moving away from if (!superuser()) checks, see the
> commit [1]. The goal is to let the GRANT-REVOKE system deal with who
> is supposed to run these system functions. Since
> pg_log_current_query_plan also writes the info to server logs, I think
> it should do the same thing as commit [1] did for
> pg_log_backend_memory_contexts.
>
> With v11, you are re-introducing the superuser() check in the
> pg_log_backend_memory_contexts which is wrong.

Yeah, I removed the superuser() check and made it possible for
non-superusers to execute it when they are granted permission to do so.

> 3) I think SendProcSignalForLogInfo can be more generic, meaning, it
> can also send signal to auxiliary processes if asked to do this will
> simplify the things for pg_log_backend_memory_contexts and other
> patches like pg_print_backtrace. I would imagine it to be "bool
> SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool
> signal_aux_proc);".

I agree with your idea.
Since sending signals to auxiliary processes for dumping memory
contexts and for pg_print_backtrace are still under discussion, IMHO it
would be better to refactor SendProcSignalForLogInfo after those
patches are committed.

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2021-11-13 03:37, Justin Pryzby wrote: > I reviewed this version of the patch - I have some language fixes. Thanks for your review! Attached patch that reflects your comments. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On Mon, Nov 15, 2021 at 6:29 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > 3) I think SendProcSignalForLogInfo can be more generic, meaning, it > > can also send signal to auxiliary processes if asked to do this will > > simplify the things for pg_log_backend_memory_contexts and other > > patches like pg_print_backtrace. I would imagine it to be "bool > > SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool > > signal_aux_proc);". > > I agree with your idea. > Since sending signals to auxiliary processes to dump memory contexts and > pg_print_backtrace is still under discussion, IMHO it would be better to > refactor SendProcSignalForLogInfo after these patches are commited. +1. I have another comment: isn't it a good idea that an overloaded version of the new function pg_log_query_plan can take the available explain command options as a text argument? I'm not sure if it is possible to get the stats like buffers, costs etc of a running query, if yes, something like pg_log_query_plan(pid, 'buffers', 'costs'....);? It looks to be an overkill at first sight, but these can be useful to know more detailed plan of the query. Thoughts? Regards, Bharath Rupireddy.
On 2021-11-15 23:15, Bharath Rupireddy wrote: > I have another comment: isn't it a good idea that an overloaded > version of the new function pg_log_query_plan can take the available > explain command options as a text argument? I'm not sure if it is > possible to get the stats like buffers, costs etc of a running query, > if yes, something like pg_log_query_plan(pid, 'buffers', > 'costs'....);? It looks to be an overkill at first sight, but these > can be useful to know more detailed plan of the query. I also think the overloaded version would be useful. However as discussed in [1], it seems to introduce other difficulties. I think it would be enough that the first version of pg_log_query_plan doesn't take any parameters. [1] https://www.postgresql.org/message-id/ce86e4f72f09d5497e8ad3a162861d33%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Hi!

You forgot my last fix to build correctly on Mac. I have added it.

About our discussion of pg_query_state:

torikoshia wrote 2021-11-04 15:49:
>> I doubt that it was the right link.
> Sorry for make you confused, here is the link.
> https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com

Thank you. I'll look at it soon.

> I imagined the following procedure.
> Does it cause dead lock in pg_query_state?
>
> - session1
> BEGIN; TRUNCATE t;
>
> - session2
> BEGIN; TRUNCATE t; -- wait
>
> - session1
> SELECT * FROM pg_query_state(<pid of session>); -- wait and dead
> locked?

As far as I know, pg_query_state uses non-blocking reads and writes. I
have written a few tests trying to deadlock it (on version 14), but all
of them finished correctly.

Have a nice day. Please feel free to contact me if you need any
further information.

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On 2021-11-17 22:44, Ekaterina Sokolova wrote: > Hi! > > You forgot my last fix to build correctly on Mac. I have added it. Thanks for the notification! Since the patch could not be applied to the HEAD anymore, I also updated it. > > About our discussion of pg_query_state: > > torikoshia писал 2021-11-04 15:49: >>> I doubt that it was the right link. >> Sorry for make you confused, here is the link. >> https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com > > Thank you. I'll see it soon. > >> I imagined the following procedure. >> Does it cause dead lock in pg_query_state? >> >> - session1 >> BEGIN; TRUNCATE t; >> >> - session2 >> BEGIN; TRUNCATE t; -- wait >> >> - session1 >> SELECT * FROM pg_query_state(<pid of session>); -- wait and dead >> locked? > > As I know, pg_query_state use non-blocking read and write. I have > wrote few tests trying to deadlock it (on 14 version), but all > finished correctly. > > Have a nice day. Please feel free to contact me if you need any > further information. Thanks for your information and help! -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2021-11-26 12:39, torikoshia wrote:
> Since the patch could not be applied to the HEAD anymore, I also
> updated it.

Updated the patch to fix a compiler warning about the format on
Windows.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachment
On 2022-01-07 14:30, torikoshia wrote: > Updated the patch for fixing compiler warning about the format on > windows. I got another compiler warning, updated the patch again. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2022/01/07 20:58, torikoshia wrote:
> On 2022-01-07 14:30, torikoshia wrote:
>
>> Updated the patch for fixing compiler warning about the format on windows.
>
> I got another compiler warning, updated the patch again.

Thanks for updating the patch!

I ran the following query every 0.1s by using the \watch psql command
from three different sessions while the make installcheck test was
running.

SELECT pg_log_query_plan(pid) FROM pg_stat_activity;

And then I got the segmentation fault as follows.

2022-01-07 21:40:32 JST [postmaster] LOG: server process (PID 51017) was terminated by signal 11: Segmentation fault: 11
2022-01-07 21:40:32 JST [postmaster] DETAIL: Failed process was running: select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;
2022-01-07 21:40:32 JST [postmaster] LOG: terminating any other active server processes
2022-01-07 21:40:32 JST [postmaster] LOG: all server processes terminated; reinitializing

The backtrace I got from the core file was:

(lldb) target create --core "/cores/core.51017"
Core file '/cores/core.51017' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff20484552 libsystem_platform.dylib`_platform_strlen + 18
    frame #1: 0x000000011076e36c postgres`dopr(target=0x00007ffedfd1b1d8, format="\n", args=0x00007ffedfd1b450) at snprintf.c:444:20
    frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query Text: \x7f\x7f\x7f..." (long run of \x7f junk bytes elided), count=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at snprintf.c:195:2
    frame #3: 0x000000011075e51d postgres`pvsnprintf(buf="Query Text: \x7f\x7f\x7f..." (long run of \x7f junk bytes elided), len=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at psprintf.c:110:13
    frame #4: 0x000000011076052d postgres`appendStringInfoVA(str=0x00007fe56804bec0, fmt="%s: %s\n", args=0x00007ffedfd1b450) at stringinfo.c:149:13
    frame #5: 0x0000000110760449 postgres`appendStringInfo(str=0x00007fe56804bec0, fmt="%s: %s\n") at stringinfo.c:103:12
    frame #6: 0x000000011016f6cf postgres`ExplainProperty(qlabel="Query Text", unit=0x0000000000000000, value="", numeric=false, es=0x00007fe56804be28) at explain.c:4390:5
    frame #7: 0x000000011016eace postgres`ExplainPropertyText(qlabel="Query Text", value="", es=0x00007fe56804be28) at explain.c:4435:2
    frame #8: 0x000000011016ea97 postgres`ExplainQueryText(es=0x00007fe56804be28, queryDesc=0x00007fe568022320) at explain.c:969:3
    frame #9: 0x000000011016faeb postgres`ProcessLogCurrentPlanInterrupt at explain.c:5009:2
    frame #10: 0x00000001104dda84 postgres`ProcessInterrupts at postgres.c:3373:3
    frame #11: 0x0000000110c81c57 plpgsql.so`exec_stmts(estate=0x00007ffedfd1bab8, stmts=0x00007fe568050e00) at pl_exec.c:1994:3
    frame #12: 0x0000000110c80a78 plpgsql.so`exec_stmt_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c570) at pl_exec.c:1777:9
    frame #13: 0x0000000110c81c91 plpgsql.so`exec_stmts(estate=0x00007ffedfd1bab8, stmts=0x00007fe56801c5c8) at pl_exec.c:1999:10
    frame #14: 0x0000000110c80f21 plpgsql.so`exec_stmt_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c678) at pl_exec.c:1926:8
    frame #15: 0x0000000110c7ed92 plpgsql.so`exec_toplevel_block(estate=0x00007ffedfd1bab8, block=0x00007fe56801c678) at pl_exec.c:1617:7
    frame #16: 0x0000000110c7d3e4 plpgsql.so`plpgsql_exec_function(func=0x00007fe56a00aa20, fcinfo=0x00007fe56803c888, simple_eval_estate=0x0000000000000000, simple_eval_resowner=0x0000000000000000, procedure_resowner=0x0000000000000000, atomic=true) at pl_exec.c:611:7
    frame #17: 0x0000000110c9bf12 plpgsql.so`plpgsql_call_handler(fcinfo=0x00007fe56803c888) at pl_handler.c:277:13
    frame #18: 0x0000000110228b06 postgres`ExecInterpExpr(state=0x00007fe56803bf88, econtext=0x00007fe56803ba70, isnull=0x00007ffedfd1bf7f) at execExprInterp.c:725:8
    frame #19: 0x000000011024858b postgres`ExecEvalExprSwitchContext(state=0x00007fe56803bf88, econtext=0x00007fe56803ba70, isNull=0x00007ffedfd1bf7f) at executor.h:339:13
    frame #20: 0x00000001102482aa postgres`ExecProject(projInfo=0x00007fe56803bf80) at executor.h:373:9
    frame #21: 0x0000000110247e2f postgres`ExecScan(node=0x00007fe56803b958, accessMtd=(postgres`SeqNext at nodeSeqscan.c:51), recheckMtd=(postgres`SeqRecheck at nodeSeqscan.c:90)) at execScan.c:238:12
    frame #22: 0x000000011028c575 postgres`ExecSeqScan(pstate=0x00007fe56803b958) at nodeSeqscan.c:112:9
    frame #23: 0x000000011023bbe2 postgres`ExecProcNode(node=0x00007fe56803b958) at executor.h:257:9
    frame #24: 0x0000000110237471 postgres`ExecutePlan(estate=0x00007fe56803b720, planstate=0x00007fe56803b958, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00007fe568046ac8, execute_once=true) at execMain.c:1561:10
    frame #25: 0x0000000110237320 postgres`standard_ExecutorRun(queryDesc=0x00007fe568037320, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:371:3
    frame #26: 0x00000001102370b8 postgres`ExecutorRun(queryDesc=0x00007fe568037320, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:313:3
    frame #27: 0x00000001104e6bdb postgres`PortalRunSelect(portal=0x00007fe567825920, forward=true, count=0, dest=0x00007fe568046ac8) at pquery.c:921:4
    frame #28: 0x00000001104e65ba postgres`PortalRun(portal=0x00007fe567825920, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007fe568046ac8, altdest=0x00007fe568046ac8, qc=0x00007ffedfd1c350) at pquery.c:765:18
    frame #29: 0x00000001104e1861 postgres`exec_simple_query(query_string="select description, (test_conv(inbytes, 'utf8', 'utf8')).* from utf8_verification_inputs;") at postgres.c:1216:10
    frame #30: 0x00000001104e099c postgres`PostgresMain(dbname="regression", username="postgres") at postgres.c:4505:7
    frame #31: 0x00000001103ee2a2 postgres`BackendRun(port=0x00007fe567704080) at postmaster.c:4594:2
    frame #32: 0x00000001103ed868 postgres`BackendStartup(port=0x00007fe567704080) at postmaster.c:4322:3
    frame #33: 0x00000001103ec64c postgres`ServerLoop at postmaster.c:1802:7
    frame #34: 0x00000001103e9e29 postgres`PostmasterMain(argc=3, argv=0x00007fe567405e70) at postmaster.c:1474:11
    frame #35: 0x00000001102c8b69 postgres`main(argc=3, argv=0x00007fe567405e70) at main.c:198:3
    frame #36: 0x00007fff2045cf3d libdyld.dylib`start + 1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2022-01-14 15:38, Julien Rouhaud wrote:
> Hi,
>
> On Fri, Jan 07, 2022 at 09:54:31PM +0900, Fujii Masao wrote:
>> I ran the following query every 0.1s by using \watch psql command from
>> three different sessions while make installcheck test was running.
>>
>> SELECT pg_log_query_plan(pid) FROM pg_stat_activity;
>>
>> And then I got the segmentation fault as follows.
>>
>> 2022-01-07 21:40:32 JST [postmaster] LOG: server process (PID 51017)
>> was terminated by signal 11: Segmentation fault: 11
>> 2022-01-07 21:40:32 JST [postmaster] DETAIL: Failed process was
>> running: select description, (test_conv(inbytes, 'utf8', 'utf8')).*
>> from utf8_verification_inputs;
>> 2022-01-07 21:40:32 JST [postmaster] LOG: terminating any other
>> active server processes
>> 2022-01-07 21:40:32 JST [postmaster] LOG: all server processes
>> terminated; reinitializing
>>
>> The backtrace I got from the core file was:
>>
>> (lldb) target create --core "/cores/core.51017"
>> Core file '/cores/core.51017' (x86_64) was loaded.
>> (lldb) bt
>> * thread #1, stop reason = signal SIGSTOP
>> * frame #0: 0x00007fff20484552 libsystem_platform.dylib`_platform_strlen + 18
>> frame #1: 0x000000011076e36c postgres`dopr(target=0x00007ffedfd1b1d8, format="\n", args=0x00007ffedfd1b450) at snprintf.c:444:20
>> frame #2: 0x000000011076e133 postgres`pg_vsnprintf(str="Query Text: \x7f\x7f\x7f\x7f\x7f\x7f\x7f\x7f [long run of 0x7f bytes, i.e. memory already clobbered by pfree] ..."..., count=1024, fmt="%s: %s\n", args=0x00007ffedfd1b450) at snprintf.c:195:2
Thanks for testing! I could reproduce the situation and confirmed almost the same backtrace. The cause seems to be that the previous patch didn't do cleanup when transactions were aborted. That led to segmentation faults when ActiveQueryDesc was referenced after the transaction had been aborted. The attached patch adds cleanups when transactions and subtransactions are aborted. BTW, the latest patch for pg_log_backtrace() added a new function that deduplicates code shared by the pg_log_*() functions[1]. It seems better to use it for pg_log_query_plan() after that patch is merged.
[1] https://www.postgresql.org/message-id/flat/20211115194251.GP17618%40telsasoft.com#4e681d04c78867b2cce44b3cb1791bf5 (it moves duplicate code to a new function CheckPostgresProcessId, which can be used by pg_log_backtrace) > On top of the visibly use-after-pfree memory issue, the patch doesn't > apply Thanks for letting me know! As mentioned above, I updated the patch. Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On 2022/01/24 14:33, torikoshia wrote: > As mentioned above, I updated the patch. Thanks for updating the patch! Here are more review comments: +LOG: plan of the query running on backend with PID 17793 is: This doesn't seem to be the same as what is actually logged. + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL server process", pid))); Like commit 7fa945b857 changed, this warning message should be "PID %d is not a PostgreSQL backend process"? + if (SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId) < 0) proc->backendId should be specified instead of InvalidBackendId, to speed up the processing in SendProcSignal()? + PROCSIG_LOG_CURRENT_PLAN, /* ask backend to log plan of the current query */ +volatile sig_atomic_t LogCurrentPlanPending = false; +extern void HandleLogCurrentPlanInterrupt(void); +extern void ProcessLogCurrentPlanInterrupt(void); Isn't it better to use names that are more consistent with the function name, i.e., pg_log_query_plan? For example, PROCSIG_LOG_QUERY_PLAN instead of PROCSIG_LOG_CURRENT_PLAN? + ereport(LOG_SERVER_ONLY, + errmsg("backend with PID %d is not running a query", + MyProcPid)); errhidestmt(true) and errhidecontext(true) need to be added, don't they? Otherwise, for example, if pg_log_query_plan() is executed after debug_query_string is set but before ActiveQueryDesc is set, a STATEMENT message would be output even though the message saying "not running a query" is output. Which seems confusing. + hash_seq_init(&status, GetLockMethodLocalHash()); + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) + { + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) Why did you use the search of the local lock hash instead of the IsPageLockHeld flag variable, to check whether a page lock is held or not? Because there is the corner case where the interrupt is processed after the local lock is registered into the hash but before IsPageLockHeld is enabled?
There is the case where the request to log a query plan is skipped even while the target backend is running a query. If this happens, users can just retry pg_log_query_plan(). Shouldn't these things be documented? + ereport(LOG_SERVER_ONLY, + errmsg("backend with PID %d is holding a page lock. Try again", + MyProcPid)); It seems more proper to output this message in DETAIL or HINT, instead. So how about something like the following messages? LOG: could not log the query plan DETAIL: query plan cannot be logged while page level lock is being held HINT: Try pg_log_query_plan() after a few .... Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2022-01-27 20:18, Fujii Masao wrote: > Here are another review comments: Thanks for reviewing! > +LOG: plan of the query running on backend with PID 17793 is: > > This seems not the same as what actually logged. Modified. > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL server process", pid))); > > Like commit 7fa945b857 changed, this warning message should be "PID %d > is not a PostgreSQL backend process"? Modified. > + if (SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId) < > 0) > > proc->backendId should be specified instead of InvalidBackendId, to > speed up the processing in SendProcSignal()? Agreed. Modified. > + PROCSIG_LOG_CURRENT_PLAN, /* ask backend to log plan of the current > query */ > +volatile sig_atomic_t LogCurrentPlanPending = false; > +extern void HandleLogCurrentPlanInterrupt(void); > +extern void ProcessLogCurrentPlanInterrupt(void); > > Isn't it better to use the names that are more consistent with the > function name, i.e., pg_log_query_plan? For example, > PROCSIG_LOG_QUERY_PLAN instead of PROCSIG_LOG_CURRENT_PLAN? Agreed. I removed 'current' from the variable and function names and used 'query' instead. > + ereport(LOG_SERVER_ONLY, > + errmsg("backend with PID %d is not running a query", > + MyProcPid)); > > errhidestmt(true) and errhidecontext(true) need to be added, don't > they? Otherwise, for example, if pg_log_query_plan() is executed after > debug_query_string is set but before ActiveQueryDesc is set, STATEMENT > message would be output even though the message saying "not running a > query" is output. Which seems confusing. Agreed. Added errhidestmt(true) and errhidecontext(true). > + hash_seq_init(&status, GetLockMethodLocalHash()); > + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) > + { > + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) > > Why did you use the search for local lock hash instead of > IsPageLockHeld flag variable, to check whether a page lock is held or > not? 
Because there is the corner case > where the interrupt is processed > after the local lock is registered into the hash but before > IsPageLockHeld is enabled? As far as I read CheckAndSetLockHeld(), IsPageLockHeld can be used only when USE_ASSERT_CHECKING is enabled. Since removing USE_ASSERT_CHECKING from CheckAndSetLockHeld() would have a performance impact on every granting/removing of a local lock, I used the search of the local lock hash. > There is the case where the request to log a query plan is skipped > even while the target backend is running a query. If this happens, > users can just retry pg_log_query_plan(). These things should be > documented? Agreed. Added following: + Note that there is the case where the request to log a query + plan is skipped even while the target backend is running a + query due to lock conflict avoidance. + If this happens, users can just retry pg_log_query_plan(). > + ereport(LOG_SERVER_ONLY, > + errmsg("backend with PID %d is holding a page lock. Try again", > + MyProcPid)); > > It seems more proper to output this message in DETAIL or HINT, > instead. So how about something like the following messages? > > LOG: could not log the query plan > DETAIL: query plan cannot be logged while page level lock is being held > HINT: Try pg_log_query_plan() after a few .... Agreed. I felt the HINT message 'after a few ...' is difficult to describe, and wrote as below. | HINT: Retrying pg_log_query_plan() might succeed since the lock durations of page level locks are usually short What do you think? -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On 2022/01/28 17:45, torikoshia wrote: >> There is the case where the request to log a query plan is skipped >> even while the target backend is running a query. If this happens, >> users can just retry pg_log_query_plan(). These things should be >> documented? > > Agreed. On second thought, this note is confusing rather than helpful? Because the users don't know when and what operation needs a page-level lock. So now I'm thinking it's better to remove this note. > As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan are called and it may be better to make another patch. OK. > You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR. > AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and make it call LockErrorCleanup() when ERROR occurs. As we discussed off-list, this treatment might not be necessary. Because when an ERROR or FATAL error happens, AbortTransaction() is called and LockErrorCleanup() is also called inside there. > Since I'm not sure how long it will take to discuss this point, the attached patch is based on the current HEAD at this time. Thanks for updating the patch! @@ -5048,6 +5055,12 @@ AbortSubTransaction(void) */ PG_SETMASK(&UnBlockSig); + /* + * When ActiveQueryDesc is referenced after abort, some of its elements + * are freed. To avoid accessing them, reset ActiveQueryDesc here. + */ + ActiveQueryDesc = NULL; AbortSubTransaction() should reset ActiveQueryDesc to the save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise ActiveQueryDesc of the top-level statement will be unavailable after a subtransaction is aborted in the nested statements.
+ <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_log_query_plan</primary> + </indexterm> This entry is placed before the one for pg_log_backend_memory_contexts(). But it should be *after* that since those entries seem to be placed in alphabetical order in the table? + Requests to log the plan of the query currently running on the + backend with specified process ID along with the untruncated + query string. Other descriptions about logging of the query string seem not to mention something like "untruncated query string". For example, auto_explain, log_statement, etc. Why do we need to mention "along with the untruncated query string" specially for pg_log_query_plan()? + Note that nested statements (statements executed inside a function) are not + considered for logging. Only the plan of the most deeply nested query is logged. Now the plan of even a nested statement can be logged. So this description needs to be updated? @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc) MemoryContextSwitchTo(oldcontext); + ActiveQueryDesc = NULL; This seems unnecessary. Why does ActiveQueryDesc need to be reset to NULL in standard_ExecutorFinish()? Currently even during ProcessLogQueryPlanInterrupt(), pg_log_query_plan() can be called and another ProcessLogQueryPlanInterrupt() can be executed. So repeated re-entrances to ProcessLogQueryPlanInterrupt() could cause a "stack depth limit exceeded" error. To avoid this, shouldn't we make ProcessLogQueryPlanInterrupt() do nothing and return immediately, if it's called during another ProcessLogQueryPlanInterrupt()? pg_log_backend_memory_contexts() also might have the same issue. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Tue, 1 Feb 2022 01:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2022/01/28 17:45, torikoshia wrote: > >> There is the case where the request to log a query plan is skipped > >> even while the target backend is running a query. If this happens, > >> users can just retry pg_log_query_plan(). These things should be > >> documented? > > Agreed. > > Added following: > > + Note that there is the case where the request to log a > > query > > + plan is skipped even while the target backend is running a > > + query due to lock conflict avoidance. > > + If this happens, users can just retry pg_log_query_plan(). > > This may cause users to misunderstand that pg_log_query_plan() fails > while the target backend is holding *any* locks? Isn't it better to > mention "page-level locks", instead? So how about the following? > > -------------------------- > Note that the request to log the query plan will be ignored if it's > received during a short period while the target backend is holding a > page-level lock. > -------------------------- > > > >> + ereport(LOG_SERVER_ONLY, > >> + errmsg("backend with PID %d is holding a page > >> lock. Try again", > >> + MyProcPid)); > >> > >> It seems more proper to output this message in DETAIL or HINT, > >> instead. So how about something like the following messages? > >> > >> LOG: could not log the query plan > >> DETAIL: query plan cannot be logged while page level lock is being > >> held > >> HINT: Try pg_log_query_plan() after a few .... > > Agreed. > > I felt the HINT message 'after a few ...' is difficult to describe, > > and wrote as below. > > | HINT: Retrying pg_log_query_plan() might succeed since the lock > > | duration of page level locks are usually short > > How do you think? > > Or we don't need HINT message? 
> > > + errmsg("could not log the query plan"), > + errdetail("query plan cannot be logged while page level lock is > being held"), > > In detail message, the first word of sentences should be > capitalized. How about "Cannot log the query plan while holding > page-level lock.", instead? > > > Thanks for updating the patch! Here are some review comments. > > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_log_query_plan</primary> > + </indexterm> > > This entry is placed before one for > pg_log_backend_memory_contexts(). But it should be *after* that since > those entries seem to be placed in alphabetical order in the table? > > > + Requests to log the plan of the query currently running on the > + backend with specified process ID along with the untruncated > + query string. > > Other descriptions about logging of query string seem not to mention > something like "untruncated query string". For example, auto_explain, > log_statement, etc. Why do we need to mention "along with the > untruncated query string" specially for pg_log_query_plan()? > > > + Note that nested statements (statements executed inside a function) > are not > + considered for logging. Only the plan of the most deeply nested > query is logged. > > Now the plan of even nested statement can be logged. So this > description needs to be updated? > > > @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc) > MemoryContextSwitchTo(oldcontext); > + ActiveQueryDesc = NULL; > > ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be > reset to NULL in standard_ExecutorFinish()? > > > Currently even during ProcessLogQueryPlanInterrupt(), > pg_log_query_plan() can be call and another > ProcessLogQueryPlanInterrupt() can be executed. So repeatable > re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack > depth limit exceeded" error. 
To avoid this, shouldn't we make > ProcessLogQueryPlanInterrupt() do nothing and return immediately, if > it's called during another ProcessLogQueryPlanInterrupt()? > > pg_log_backend_memory_contexts() also might have the same issue. Good catch. By the way, I'm anxious about the following part and I'd like to remove it. + * Ensure no page lock is held on this process. It seems to me what is wrong is ginInsertCleanup(), not this feature. As I read the comment for the assertion, I don't believe we want to allow CFI while holding a page lock. And AFAICS that function is the only point that does so. (It is the only user of LockPage()...) This is the assertion. lock.c: 902 > /* > * We don't acquire any other heavyweight lock while holding the page lock > * except for relation extension. > */ > Assert(!IsPageLockHeld || > (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)); It was added by a recent commit 72e78d831ab5550c39f2dcc7cc5d44c406ec3dc2. > * Similar to relation extension, page locks are also held for a short > * duration, so imposing such a restriction won't hurt. I don't believe a path involving vacuum_delay_point() calls is short in duration. https://www.postgresql.org/message-id/CAH2-WznzCPUKnOV%2Bre30_rHLCkqQWm2JTSVjTCAei9LySTc2pw%40mail.gmail.com > One thing that really bothers me about commit e2c79e14 is that > LockPage() is called, not LockBuffer(). GIN had no LockPage() calls > before that commit, and is now the only code in the entire system that > calls LockPage()/ConditionalLockPage() (the hash am no longer uses > page heavyweight locks following recent work there). I agree with that discussion. Can't we use some other mechanism here to get rid of LockPage()? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2022/02/01 17:27, Kyotaro Horiguchi wrote: >> * Similar to relation extension, page locks are also held for a short >> * duration, so imposing such a restriction won't hurt. > > I don't believe a path involving vacuum_delay_point() calls is > short-duration'ed. Yes. >> One thing that really bothers me about commit e2c79e14 is that >> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls >> before that commit, and is now the only code in the entire system that >> calls LockPage()/ConditionalLockPage() (the hash am no longer uses >> page heavyweight locks following recent work there). > > I agree to the discussion. Can't we use other mechanism here to get > rid of the Lockpage()? I have no good idea for that yet, but I agree it's better to get rid of the page-level lock. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Tue, 1 Feb 2022 23:11:03 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > >> One thing that really bothers me about commit e2c79e14 is that > >> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls > >> before that commit, and is now the only code in the entire system that > >> calls LockPage()/ConditionalLockPage() (the hash am no longer uses > >> page heavyweight locks following recent work there). > > I agree to the discussion. Can't we use other mechanism here to get > > rid of the Lockpage()? > > I have no good idea for that yet, but I agree it's better to get rid > of page level lock. It's my turn? The page lock is used to hold off simultaneous cleanups on the same index. A ShareUpdateExclusive lock on the index relation would work the same way. In that path it seems like we are always holding a RowExclusive lock, so it seems to me we can use ShareUpdateExclusive for our purpose. There might be a false blocking case when another backend is holding a conflicting lock on the index. They are Share, ShareRowExclusive, Exclusive and AccessExclusive. The last three cases don't seem worth discussing. I'm not sure about the Share and ShareRowExclusive cases. AFAICS a Share lock is taken on an index in ATExecReplicaIdentity, and there is no use of a ShareRowExclusive lock on an index. It's no use discussing explicit locking. So aren't we able to use a ShareUpdateExclusive lock for that? In the attached patch, ginInsertCleanup has an extra check for such stronger locks not being held. At least "make check" doesn't cause the extra assertion to fire. regards.
-- Kyotaro Horiguchi NTT Open Source Software Center

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 7409fdc165..1af9a69abb 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -791,20 +791,29 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
 	bool		fsm_vac = false;
 	Size		workMemory;
 
 	/*
 	 * We would like to prevent concurrent cleanup process. For that we will
 	 * lock metapage in exclusive mode using LockPage() call. Nobody other
 	 * will use that lock for metapage, so we keep possibility of concurrent
 	 * insertion into pending list
 	 */
+	/*
+	 * we use ShareUpdateExclusive lock on this relation to hold-off concurrent
+	 * cleanup
+	 */
+	Assert(!CheckRelationLockedByMe(index, ShareUpdateExclusiveLock, false));
+
+	/* tentative debug-purpose assertion for stronger locks */
+	Assert(!CheckRelationLockedByMe(index, ShareLock, true));
+
 	if (forceCleanup)
 	{
 		/*
 		 * We are called from [auto]vacuum/analyze or gin_clean_pending_list()
 		 * and we would like to wait concurrent cleanup to finish.
 		 */
-		LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+		LockRelation(index, ShareUpdateExclusiveLock);
 		workMemory =
 			(IsAutoVacuumWorkerProcess() && autovacuum_work_mem != -1) ?
 			autovacuum_work_mem : maintenance_work_mem;
@@ -816,7 +825,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
 		 * just exit in hope that concurrent process will clean up pending
 		 * list.
 		 */
-		if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
+		if (!ConditionalLockRelation(index, ShareUpdateExclusiveLock))
 			return;
 		workMemory = work_mem;
 	}
@@ -830,7 +839,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
 	{
 		/* Nothing to do */
 		UnlockReleaseBuffer(metabuffer);
-		UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+		UnlockRelation(index, ShareUpdateExclusiveLock);
 		return;
 	}
@@ -1002,7 +1011,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
 		page = BufferGetPage(buffer);
 	}
-	UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
+	UnlockRelation(index, ShareUpdateExclusiveLock);
 	ReleaseBuffer(metabuffer);
 
 	/*
At Wed, 02 Feb 2022 16:49:57 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > So aren't we able to use ShareUpdateExclusive lock for that? > > In the attached patch, ginInsertCleanup has an extra check for such > stronger locks not being held. At least "make check" doesn't cause > the extra assertion to fire. Actually, the discussion is a bit dubious. What we really need to check is whether such locks are not held on an index *elsewhere*. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2022-02-01 01:51, Fujii Masao wrote: Thanks for the review and suggestions! >> + Note that there is the case where the request to log a >> query >> + plan is skipped even while the target backend is running a >> + query due to lock conflict avoidance. >> + If this happens, users can just retry pg_log_query_plan(). > > This may cause users to misunderstand that pg_log_query_plan() fails > while the target backend is holding *any* locks? Isn't it better to > mention "page-level locks", instead? So how about the following? > > -------------------------- > Note that the request to log the query plan will be ignored if it's > received during a short period while the target backend is holding a > page-level lock. > -------------------------- Agreed. >>> + ereport(LOG_SERVER_ONLY, >>> + errmsg("backend with PID %d is holding a page lock. >>> Try again", >>> + MyProcPid)); >>> >>> It seems more proper to output this message in DETAIL or HINT, >>> instead. So how about something like the following messages? >>> >>> LOG: could not log the query plan >>> DETAIL: query plan cannot be logged while page level lock is being >>> held >>> HINT: Try pg_log_query_plan() after a few .... >> >> Agreed. >> I felt the HINT message 'after a few ...' is difficult to describe, >> and wrote as below. >> >> | HINT: Retrying pg_log_query_plan() might succeed since the lock >> duration of page level locks are usually short >> >> How do you think? > > Or we don't need HINT message? Removed the HINT message. > + errmsg("could not log the query plan"), > + errdetail("query plan cannot be logged > while page level lock is > being held"), > > In detail message, the first word of sentences should be capitalized. > How about "Cannot log the query plan while holding page-level lock.", > instead? Agreed. > Thanks for updating the patch! Here are some review comments.
> > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_log_query_plan</primary> > + </indexterm> > > This entry is placed before one for pg_log_backend_memory_contexts(). > But it should be *after* that since those entries seem to be placed in > alphabetical order in the table? Modified it. > + Requests to log the plan of the query currently running on the > + backend with specified process ID along with the untruncated > + query string. > > Other descriptions about logging of query string seem not to mention > something like "untruncated query string". For example, auto_explain, > log_statement, etc. Why do we need to mention "along with the > untruncated query string" specially for pg_log_query_plan()? Modified it as below: Requests to log the plan of the query currently running on the - backend with specified process ID along with the untruncated - query string. - They will be logged at <literal>LOG</literal> message level and + backend with specified process ID. + It will be logged at <literal>LOG</literal> message level and > + Note that nested statements (statements executed inside a > function) are not > + considered for logging. Only the plan of the most deeply nested > query is logged. > > Now the plan of even nested statement can be logged. So this > description needs to be updated? Modified it as below: - Note that nested statements (statements executed inside a function) are not - considered for logging. Only the plan of the most deeply nested query is logged. + Note that when the statements are executed inside a function, only the + plan of the most deeply nested query is logged. > @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc) > MemoryContextSwitchTo(oldcontext); > + ActiveQueryDesc = NULL; > > ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be > reset to NULL in standard_ExecutorFinish()? ActiveQueryDesc should not be reset in standard_ExecutorFinish(). 
Removed it. > Currently even during ProcessLogQueryPlanInterrupt(), > pg_log_query_plan() can be call and another > ProcessLogQueryPlanInterrupt() can be executed. So repeatable > re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack > depth limit exceeded" error. To avoid this, shouldn't we make > ProcessLogQueryPlanInterrupt() do nothing and return immediately, if > it's called during another ProcessLogQueryPlanInterrupt()? > > pg_log_backend_memory_contexts() also might have the same issue. As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan are called, and it may be better to address it in another patch. You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR. AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and made it call LockErrorCleanup() when ERROR occurs. On 2022-02-01 17:27, Kyotaro Horiguchi wrote: Thanks for reviewing Horiguchi-san! > By the way, I'm anxious about the following part and I'd like to > remove it. I also think it would be nice if it's possible. > > + * Ensure no page lock is held on this process. > > It seems to me what is wrong is ginInsertCleanup(), not this feature. > Actually, the discussion is a bit dubious. What we need really to check is wheter such locks are not held on an index *elsewhere*. Since I'm not sure how long it will take to discuss this point, the attached patch is based on the current HEAD at this time. I also think it may be better to discuss it on another thread. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Wed, Feb 2, 2022 at 7:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > 2022-02-01 01:51, Fujii Masao wrote: <snip> > > + Note that nested statements (statements executed inside a > > function) are not > > + considered for logging. Only the plan of the most deeply nested > > query is logged. > > > > Now the plan of even nested statement can be logged. So this > > description needs to be updated? > > Modified it as below: > > - Note that nested statements (statements executed inside a > function) are not > - considered for logging. Only the plan of the most deeply nested > query is logged. > + Note that when the statements are executed inside a function, > only the > + plan of the most deeply nested query is logged. > Minor nit, but I think the "the" is superfluous, i.e. "Note that when statements are executed inside a function, only the plan of the most deeply nested query is logged" <snip> > On 2022-02-01 17:27, Kyotaro Horiguchi wrote: > > Thanks for reviewing Horiguchi-san! > > > By the way, I'm anxious about the following part and I'd like to > > remove it. > > I also think it would be nice if it's possible. > > > > + * Ensure no page lock is held on this process. > > > > It seems to me what is wrong is ginInsertCleanup(), not this feature. > > > Actually, the discussion is a bit dubious. What we need really to > check is wheter such locks are not held on an index *elsewhere*. > > Since I'm not sure how long it will take to discuss this point, the > attached patch is based on the current HEAD at this time. > I also think it may be better to discuss it on another thread. > While I agree with the above points, IMHO I don't believe it should be a show-stopper for adding this functionality to v15, but we have a few more commitments before we get to that point. Robert Treat https://xzilla.net
On 2022/02/02 21:59, torikoshia wrote: >> This may cause users to misunderstand that pg_log_query_plan() fails >> while the target backend is holding *any* locks? Isn't it better to >> mention "page-level locks", instead? So how about the following? >> >> -------------------------- >> Note that the request to log the query plan will be ignored if it's >> received during a short period while the target backend is holding a >> page-level lock. >> -------------------------- > > Agreed. On second thought, this note is confusing rather than helpful? Because the users don't know when and what operation needs page-level lock. So now I'm thinking it's better to remove this note. > As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan are called and it may be better to make another patch. OK. > You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR. > AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and make it call LockErrorCleanup() when ERROR occurs. As we discussed off-list, this treatment might not be necessary. Because when ERROR or FATAL error happens, AbortTransaction() is called and LockErrorCleanup() is also called inside there. > Since I'm not sure how long it will take to discuss this point, the attached patch is based on the current HEAD at this time. Thanks for updating the patch! @@ -5048,6 +5055,12 @@ AbortSubTransaction(void) */ PG_SETMASK(&UnBlockSig); + /* + * When ActiveQueryDesc is referenced after abort, some of its elements + * are freed. To avoid accessing them, reset ActiveQueryDesc here. + */ + ActiveQueryDesc = NULL; AbortSubTransaction() should reset ActiveQueryDesc to save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? Otherwise ActiveQueryDesc of top-level statement will be unavailable after subtransaction is aborted in the nested statements.
For example, no plan is logged while the following "select pg_sleep(test())" is running, because the exception inside the test() function aborts the subtransaction and resets ActiveQueryDesc to NULL. create or replace function test () returns int as $$ begin perform 1/0; exception when others then return 30; end; $$ language plpgsql; select pg_sleep(test()); -CREATE ROLE regress_log_memory; +CREATE ROLE regress_log; Isn't this name too generic? How about regress_log_function, for example? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Tue, 8 Feb 2022 01:13:44 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2022/02/02 21:59, torikoshia wrote: > >> This may cause users to misunderstand that pg_log_query_plan() fails > >> while the target backend is holding *any* locks? Isn't it better to > >> mention "page-level locks", instead? So how about the following? > >> > >> -------------------------- > >> Note that the request to log the query plan will be ignored if it's > >> received during a short period while the target backend is holding a > >> page-level lock. > >> -------------------------- > > Agreed. > > On second thought, this note is confusing rather than helpful? Because > the users don't know when and what operation needs page-level lock. So > now I'm thinking it's better to remove this note. *I* agree to removing the note. And the following error message looks as mysterious as the note is, and the DETAIL doesn't help.. ereport(LOG_SERVER_ONLY, + errmsg("could not log the query plan"), + errdetail("Cannot log the query plan while holding page-level lock.")); + hash_seq_term(&status); We should tell the command can be retried soon, like this? "LOG: ignored request for logging query plan due to lock conflicts" "HINT: You can try again in a moment." regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2022-02-03 17:09, Robert Treat wrote: > On Wed, Feb 2, 2022 at 7:59 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> 2022-02-01 01:51, Fujii Masao wrote: > <snip> >> > + Note that nested statements (statements executed inside a >> > function) are not >> > + considered for logging. Only the plan of the most deeply nested >> > query is logged. >> > >> > Now the plan of even nested statement can be logged. So this >> > description needs to be updated? >> >> Modified it as below: >> >> - Note that nested statements (statements executed inside a >> function) are not >> - considered for logging. Only the plan of the most deeply >> nested >> query is logged. >> + Note that when the statements are executed inside a >> function, >> only the >> + plan of the most deeply nested query is logged. >> > > Minor nit, but I think the "the" is superfluous.. ie. > > "Note that when statements are executed inside a function, > only the plan of the most deeply nested query is logged" Thanks! Modified it. On 2022-02-08 01:13, Fujii Masao wrote: Thanks for the comments! > On 2022/02/02 21:59, torikoshia wrote: >>> This may cause users to misunderstand that pg_log_query_plan() fails >>> while the target backend is holding *any* locks? Isn't it better to >>> mention "page-level locks", instead? So how about the following? >>> >>> -------------------------- >>> Note that the request to log the query plan will be ignored if it's >>> received during a short period while the target backend is holding a >>> page-level lock. >>> -------------------------- >> >> Agreed. > > On second thought, this note is confusing rather than helpful? Because > the users don't know when and what operation needs page-level lock. So > now I'm thinking it's better to remove this note. Removed it. > > >> As you pointed out offlist, the issue could occur even when both >> pg_log_backend_memory_contexts and pg_log_query_plan are called and it >> may be better to make another patch. > > OK. 
> > >> You also pointed out offlist that it would be necessary to call >> LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur ERROR. >> AFAICS it can call ereport(ERROR), i.e., palloc0() in >> NewExplainState(), so I added PG_TRY/CATCH and make it call >> LockErrorCleanup() when ERROR occurs. > > As we discussed off-list, this treatment might not be necessary. > Because when ERROR or FATAL error happens, AbortTransaction() is > called and LockErrorCleanup() is also called inside there. Agreed. >> Since I'm not sure how long it will take to discuss this point, the >> attached patch is based on the current HEAD at this time. > > Thanks for updating the patch! > > @@ -5048,6 +5055,12 @@ AbortSubTransaction(void) > */ > PG_SETMASK(&UnBlockSig); > + /* > + * When ActiveQueryDesc is referenced after abort, some of its > elements > + * are freed. To avoid accessing them, reset ActiveQueryDesc here. > + */ > + ActiveQueryDesc = NULL; > > AbortSubTransaction() should reset ActiveQueryDesc to > save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? > Otherwise ActiveQueryDesc of top-level statement will be unavailable > after subtransaction is aborted in the nested statements. > > For example, no plan is logged while the following "select > pg_sleep(test())" is running, because the exception inside test() > function aborts the subtransaction and resets ActiveQueryDesc to NULL. > > create or replace function test () returns int as $$ > begin > perform 1/0; > exception when others then > return 30; > end; > $$ language plpgsql; > > select pg_sleep(test()); Agreed. BTW, since the above example results in calling ExecutorRun() only once, the output didn't differ even after ActiveQueryDesc is reset to save_ActiveQueryDesc. The below definition of test() worked as expected. 
create or replace function test () returns int as $$ begin perform 1; perform 1/0; exception when others then return 30; end; $$ language plpgsql; > > > -CREATE ROLE regress_log_memory; > +CREATE ROLE regress_log; > > Isn't this name too generic? How about regress_log_function, for > example? Agreed. On 2022-02-08 17:18, Kyotaro Horiguchi wrote: > At Tue, 8 Feb 2022 01:13:44 +0900, Fujii Masao > <masao.fujii@oss.nttdata.com> wrote in >> >> >> On 2022/02/02 21:59, torikoshia wrote: >> >> This may cause users to misunderstand that pg_log_query_plan() fails >> >> while the target backend is holding *any* locks? Isn't it better to >> >> mention "page-level locks", instead? So how about the following? >> >> >> >> -------------------------- >> >> Note that the request to log the query plan will be ignored if it's >> >> received during a short period while the target backend is holding a >> >> page-level lock. >> >> -------------------------- >> > Agreed. >> >> On second thought, this note is confusing rather than helpful? Because >> the users don't know when and what operation needs page-level lock. So >> now I'm thinking it's better to remove this note. > > *I* agree to removing the note. And the following error message looks > as mysterious as the note is, and the DETAIL doesn't help.. > > ereport(LOG_SERVER_ONLY, > + errmsg("could not log the query plan"), > + errdetail("Cannot log the query plan while holding page-level > lock.")); > + hash_seq_term(&status); > > We should tell the command can be retried soon, like this? > > "LOG: ignored request for logging query plan due to lock conflicts" > "HINT: You can try again in a moment." Thanks, it seems better. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2022/02/09 0:12, torikoshia wrote: > BTW, since the above example results in calling ExecutorRun() only once, the output didn't differ even after ActiveQueryDesc is reset to save_ActiveQueryDesc. > > The below definition of test() worked as expected. > > create or replace function test () returns int as $$ > begin > perform 1; > perform 1/0; > exception when others then > return 30; > end; > $$ language plpgsql; So in this case ActiveQueryDesc set by ExecutorRun() called only once is still valid even when AbortSubTransaction() is called. That is, that ActiveQueryDesc should NOT be reset to save_ActiveQueryDesc in this case, should it? OTOH, in your example, ActiveQueryDesc set by the second call to ExecutorRun() should be reset in AbortSubTransaction(). Then ActiveQueryDesc set by the first call should be valid. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2022-02-08 01:13, Fujii Masao wrote: > AbortSubTransaction() should reset ActiveQueryDesc to > save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? > Otherwise ActiveQueryDesc of top-level statement will be unavailable > after subtransaction is aborted in the nested statements. I once agreed with the above suggestion and made a v20 patch making save_ActiveQueryDesc a global variable, but it caused a segfault when calling pg_log_query_plan() after FreeQueryDesc(). OTOH, doing some kind of reset of ActiveQueryDesc seems necessary since it also caused a segfault when running pg_log_query_plan() during installcheck. There may be a better way, but resetting ActiveQueryDesc to NULL seems safe and simple. Of course it makes pg_log_query_plan() useless after a subtransaction is aborted. However, if it does not often happen that people want to know the plan of a running query whose subtransaction is aborted, resetting ActiveQueryDesc to NULL would be acceptable. Attached is a patch that sets ActiveQueryDesc to NULL when a subtransaction is aborted. What do you think? -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2022-03-09 19:04, torikoshia wrote: > On 2022-02-08 01:13, Fujii Masao wrote: >> AbortSubTransaction() should reset ActiveQueryDesc to >> save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? >> Otherwise ActiveQueryDesc of top-level statement will be unavailable >> after subtransaction is aborted in the nested statements. > > I once agreed above suggestion and made v20 patch making > save_ActiveQueryDesc a global variable, but it caused segfault when > calling pg_log_query_plan() after FreeQueryDesc(). > > OTOH, doing some kind of reset of ActiveQueryDesc seems necessary > since it also caused segfault when running pg_log_query_plan() during > installcheck. > > There may be a better way, but resetting ActiveQueryDesc to NULL seems > safe and simple. > Of course it makes pg_log_query_plan() useless after a subtransaction > is aborted. > However, if it does not often happen that people want to know the > running query's plan whose subtransaction is aborted, resetting > ActiveQueryDesc to NULL would be acceptable. > > Attached is a patch that sets ActiveQueryDesc to NULL when a > subtransaction is aborted. > > How do you think? Attached new patch to fix patch apply failures. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2022-05-16 17:02, torikoshia wrote: > On 2022-03-09 19:04, torikoshia wrote: >> On 2022-02-08 01:13, Fujii Masao wrote: >>> AbortSubTransaction() should reset ActiveQueryDesc to >>> save_ActiveQueryDesc that ExecutorRun() set, instead of NULL? >>> Otherwise ActiveQueryDesc of top-level statement will be unavailable >>> after subtransaction is aborted in the nested statements. >> >> I once agreed above suggestion and made v20 patch making >> save_ActiveQueryDesc a global variable, but it caused segfault when >> calling pg_log_query_plan() after FreeQueryDesc(). >> >> OTOH, doing some kind of reset of ActiveQueryDesc seems necessary >> since it also caused segfault when running pg_log_query_plan() during >> installcheck. >> >> There may be a better way, but resetting ActiveQueryDesc to NULL seems >> safe and simple. >> Of course it makes pg_log_query_plan() useless after a subtransaction >> is aborted. >> However, if it does not often happen that people want to know the >> running query's plan whose subtransaction is aborted, resetting >> ActiveQueryDesc to NULL would be acceptable. >> >> Attached is a patch that sets ActiveQueryDesc to NULL when a >> subtransaction is aborted. >> >> How do you think? Attached new patch to fix patch apply failures again. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 2022-09-19 17:47, Алена Рыбакина wrote: Thanks for your review and comments! > Hi, > > I'm sorry, if this message is duplicated previous this one, but I'm not > sure that the previous message is sent correctly. I sent it from email > address a.rybakina@postgrespro.ru and I couldn't send this one email > from those address. I've successfully received your mail from both a.rybakina@postgrespro.ru and lena.ribackina@yandex.ru. > I like idea to create patch for logging query plan. After reviewing > this code and notice some moments and I'd rather ask you some > questions. > > Firstly, I suggest some editing in the comment of commit. I think, it > is > turned out the more laconic and the same clear. I wrote it below since > I > can't think of any other way to add it. > ``` > Currently, we have to wait for finishing of the query execution to > check > its plan. > This is not so convenient in investigation long-running queries on > production > environments where we cannot use debuggers. > To improve this situation there is proposed the patch containing the > pg_log_query_plan() > function which request to log plan of the specified backend process. > By default, only superusers are allowed to request log of the plan > otherwise > allowing any users to issue this request could create cause lots of log > messages > and it can lead to denial of service. > > At the next requesting CHECK_FOR_INTERRUPTS(), the target backend logs > its plan at > LOG_SERVER_ONLY level and therefore this plan will appear in the server > log only, > not to be sent to the client. Thanks, I have incorporated your comments. Since the latter part of the original message comes from the commit message of pg_log_backend_memory_contexts (43620e328617c), I left it as it was for consistency. > Secondly, I have question about deleting USE_ASSERT_CHECKING in lock.h? > It supposed to have been checked in another placed of the code by > matching values.
I am worry about skipping errors due to untesting with > assert option in the places where it (GetLockMethodLocalHash) > participates and we won't able to get core file in segfault cases. I > might not understand something, then can you please explain to me? Since GetLockMethodLocalHash() is only used for assertions, this is only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for the assertion purpose, so I removed "ifdef USE_ASSERT_CHECKING" for this function. I believe it does not lead to skipping errors. > Thirdly, I have incomprehension of the point why save_ActiveQueryDesc > is > declared in the pquery.h? I am seemed to save_ActiveQueryDesc to be > used > in an once time in the ExecutorRun function and its declaration > superfluous. I added it in the attached patch. Exactly. > Fourthly, it seems to me there are not enough explanatory comments in > the code. I also added them in the attached patch. Thanks! | + /* | + * Save value of ActiveQueryDesc before having called | + * ExecutorRun_hook function due to having reset by | + * AbortSubTransaction. | + */ | + | save_ActiveQueryDesc = ActiveQueryDesc; | ActiveQueryDesc = queryDesc; | | @@ -314,6 +320,7 @@ ExecutorRun(QueryDesc *queryDesc, | else | standard_ExecutorRun(queryDesc, direction, count, execute_once); | | + /* We set the actual value of ActiveQueryDesc */ | ActiveQueryDesc = save_ActiveQueryDesc; Since these processes are needed for nested queries, not only for AbortSubTransaction[1], I added comments on it. | +/* Function to handle the signal to output the query plan. */ | extern void HandleLogQueryPlanInterrupt(void); I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments in it. > Lastly, I have incomprehension about handling signals since have been > unused it before. Could another signal disabled calling this signal to > log query plan?
I noticed this signal to be checked the latest in > procsignal_sigusr1_handler function. Are you concerned that one signal will not be processed when multiple signals are sent in succession? AFAIU both of them are processed since SendProcSignal flags ps_signalFlags for each signal. ``` SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId) { volatile ProcSignalSlot *slot; ...(snip)... 278 if (slot->pss_pid == pid) 279 { 280 /* Atomically set the proper flag */ 281 slot->pss_signalFlags[reason] = true; 282 /* Send signal */ 283 return kill(pid, SIGUSR1); ``` Comments of ProcSignalReason also say 'We can cope with concurrent signals for different reasons'. ```C /* * Reasons for signaling a Postgres child process (a backend or an auxiliary * process, like checkpointer). We can cope with concurrent signals for different * reasons. However, if the same reason is signaled multiple times in quick * succession, the process is likely to observe only one notification of it. * This is okay for the present uses. ... typedef enum { PROCSIG_CATCHUP_INTERRUPT, /* sinval catchup interrupt */ PROCSIG_NOTIFY_INTERRUPT, /* listen/notify interrupt */ PROCSIG_PARALLEL_MESSAGE, /* message from cooperating parallel backend */ PROCSIG_WALSND_INIT_STOPPING, /* ask walsenders to prepare for shutdown */ PROCSIG_BARRIER, /* global barrier interrupt */ PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */ PROCSIG_LOG_QUERY_PLAN, /* ask backend to log plan of the current query */ ... } ProcSignalReason; ``` [1] https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
Ok, I get it.
Since GetLockMethodLocalHash() is only used for assertions, this is only defined when USE_ASSERT_CHECKING is enabled. This patch uses GetLockMethodLocalHash() not only for the assertion purpose, so I removed "ifdef USE_ASSERT_CHECKING" for this function. I believe it does not lead to skipping errors.

Agree.
Since these processes are needed for nested queries, not only for AbortSubTransaction[1], added comments on it.
I also noticed it. However, I also discovered that the above function declarations are applied to the EXPLAIN command and used to print details of the executed query.
We have a similar task: to print the plan of a running process that is interrupted by a request for a specific PID.
In short, I think this task is different, and I added this comment to separate these parts.
I feel this comment is unnecessary since the explanation of HandleLogQueryPlanInterrupt() is written in explain.c and no functions in explain.h have comments in it.
Yes, I was worried about it. I understand it now; thanks for explaining.
On 2022-09-21 17:30, Alena Rybakina wrote: Thanks for your reply! > I also noticed it. However I also discovered that above function > declarations to be aplied for explain command and used to be printed > details of the executed query. > > We have a similar task to print the plan of an interrupted process > making a request for a specific pid. > > In short, I think, this task is different and for separating these > parts I added this comment. I'm not sure I understand your comment correctly, do you mean HandleLogQueryPlanInterrupt() should not be placed in explain.c? It may be so. However, given that ProcessLogMemoryContextInterrupt(), which similarly handles interrupts for pg_log_backend_memory_contexts(), is located in mcxt.c, I also think the current location might be acceptable. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Sorry, I wrote that confusingly. No, I suggested adding a comment with the explanation of HandleLogQueryPlanInterrupt() only in explain.h, not removing it from explain.c. It seemed to me necessary to separate the function declarations for the 'explain feature' of an executed query from our function that logs the plan of a running query on interrupt. But now I doubt it. > I'm not sure I understand your comment correctly, do you mean > HandleLogQueryPlanInterrupt() should not be placed in explain.c? Thank you for reminding me about this function; I looked at the ProcessLogMemoryContextInterrupt() declaration. I noticed the comments in memutils.h are missing too; the description of this function is written only in mcxt.c. > However, given that ProcessLogMemoryContextInterrupt(), which similarly > handles interrupts for pg_log_backend_memory_contexts(), is located in > mcxt.c, I also think current location might be acceptable. So I think you are right, and the comment explaining HandleLogQueryPlanInterrupt() written in explain.h is redundant. > I feel this comment is unnecessary since the explanation of > HandleLogQueryPlanInterrupt() is written in explain.c and no functions > in explain.h have comments in it. Regards, -- Alena Rybakina Postgres Professional
Hi, This patch does not currently build, due to a conflicting oid: https://cirrus-ci.com/task/4638460594618368?logs=build#L108 [17:26:44.602] /usr/bin/perl ../src/include/catalog/../../backend/catalog/genbki.pl --include-path=../src/include --set-version=16 --output=src/include/catalog ../src/include/catalog/pg_proc.h ../src/include/catalog/pg_type.h ../src/include/catalog/pg_attribute.h ../src/include/catalog/pg_class.h ../src/include/catalog/pg_attrdef.h ../src/include/catalog/pg_constraint.h ../src/include/catalog/pg_inherits.h ../src/include/catalog/pg_index.h ../src/include/catalog/pg_operator.h ../src/include/catalog/pg_opfamily.h ../src/include/catalog/pg_opclass.h ../src/include/catalog/pg_am.h ../src/include/catalog/pg_amop.h ../src/include/catalog/pg_amproc.h ../src/include/catalog/pg_language.h ../src/include/catalog/pg_largeobject_metadata.h ../src/include/catalog/pg_largeobject.h ../src/include/catalog/pg_aggregate.h ../src/include/catalog/pg_statistic.h ../src/include/catalog/pg_statistic_ext.h ../src/include/catalog/pg_statistic_ext_data.h ../src/include/catalog/pg_rewrite.h ../src/include/catalog/pg_trigger.h ../src/include/catalog/pg_event_trigger.h ../src/include/catalog/pg_description.h ../src/include/catalog/pg_cast.h ../src/include/catalog/pg_enum.h ../src/include/catalog/pg_namespace.h ../src/include/catalog/pg_conversion.h ../src/include/catalog/pg_depend.h ../src/include/catalog/pg_database.h ../src/include/catalog/pg_db_role_setting.h ../src/include/catalog/pg_tablespace.h ../src/include/catalog/pg_authid.h ../src/include/catalog/pg_auth_members.h ../src/include/catalog/pg_shdepend.h ../src/include/catalog/pg_shdescription.h ../src/include/catalog/pg_ts_config.h ../src/include/catalog/pg_ts_config_map.h ../src/include/catalog/pg_ts_dict.h ../src/include/catalog/pg_ts_parser.h ../src/include/catalog/pg_ts_template.h ../src/include/catalog/pg_extension.h ../src/include/catalog/pg_foreign_data_wrapper.h ../src/include/catalog/pg_foreign_server.h ../src/include/catalog/pg_user_mapping.h ../src/include/catalog/pg_foreign_table.h ../src/include/catalog/pg_policy.h ../src/include/catalog/pg_replication_origin.h ../src/include/catalog/pg_default_acl.h ../src/include/catalog/pg_init_privs.h ../src/include/catalog/pg_seclabel.h ../src/include/catalog/pg_shseclabel.h ../src/include/catalog/pg_collation.h ../src/include/catalog/pg_parameter_acl.h ../src/include/catalog/pg_partitioned_table.h ../src/include/catalog/pg_range.h ../src/include/catalog/pg_transform.h ../src/include/catalog/pg_sequence.h ../src/include/catalog/pg_publication.h ../src/include/catalog/pg_publication_namespace.h ../src/include/catalog/pg_publication_rel.h ../src/include/catalog/pg_subscription.h ../src/include/catalog/pg_subscription_rel.h [17:26:44.602] Duplicate OIDs detected: [17:26:44.602] 4550 [17:26:44.602] found 1 duplicate OID(s) in catalog data I suggest you choose a random oid out of the "development purposes" range: * OIDs 1-9999 are reserved for manual assignment (see .dat files in * src/include/catalog/). Of these, 8000-9999 are reserved for * development purposes (such as in-progress patches and forks); * they should not appear in released versions. Greetings, Andres Freund
On 2022-12-07 03:41, Andres Freund wrote: > Hi, > > This patch does not currently build, due to a conflicting oid: > > I suggest you choose a random oid out of the "development purposes" > range: Thanks for your advice! Attached updated patch. BTW, since this patch depends on ProcessInterrupts() and EXPLAIN codes which is used in auto_explain, I'm feeling that the following discussion also applies to this patch. > -- > https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com > > explaining a query is a pretty > complicated operation that involves catalog lookups and lots of > complicated stuff, and I don't think that it would be safe to do all > of that at any arbitrary point in the code where ProcessInterrupts() > happened to fire. If I can't come up with some workaround during the next Commitfest, I'm going to withdraw this proposal. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
Hello, Thanks for working on this patch! On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > BTW, since this patch depends on ProcessInterrupts() and EXPLAIN codes > which is used in auto_explain, I'm feeling that the following discussion > also applies to this patch. > > > -- > > https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com > > > > explaining a query is a pretty > > complicated operation that involves catalog lookups and lots of > > complicated stuff, and I don't think that it would be safe to do all > > of that at any arbitrary point in the code where ProcessInterrupts() > > happened to fire. > > If I can't come up with some workaround during the next Commitfest, I'm > going to withdraw this proposal. While at PGCon this week I'd brought up this idea with a few people without realizing you'd already worked on it previously, and then after I discovered this thread several of us (Greg, Ronan, David, Heikki, and myself -- all cc'd) discussed the safety concerns over dinner last night. Our conclusion was that all of the concerns we could come up with (for example, walking through the code for ExplainTargetRel() and discussing the relevant catalog and syscache accesses) all applied specifically to Robert's concerns about running explain inside an aborted transaction. After all, I'd originally started that thread to ask about running auto-explain after a query timeout. To put it positively: we believe that, for example, catalog accesses inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside an existing valid transaction/query state, as it would be for this patch -- are safe. If there were problems, then those problems are likely bugs we already have in other CFI cases. Another concern Robert raised may apply here: what if a person tries to cancel a query when we're explaining?
I believe that's a reasonable question to ask, but I believe it's a trade-off that's worth making for the (significant) introspection benefits this patch would provide. On to the patch itself: overall I think it looks like it's in pretty good shape. I also noticed we don't have any tests (I assume it'd have to be TAP tests) of the actual output happening, and I think it would be worth adding that. Are you interested in re-opening this patch? I'd be happy to provide further review and help to try to push this along. I've rebased the patch and attached as v26. Thanks, James Coleman
Attachment
On 2023-06-03 02:51, James Coleman wrote: > Hello, > > Thanks for working on this patch! > > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> BTW, since this patch depends on ProcessInterrupts() and EXPLAIN codes >> which is used in auto_explain, I'm feeling that the following >> discussion >> also applies to this patch. >> >> > -- >> > https://www.postgresql.org/message-id/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com >> > >> > explaining a query is a pretty >> > complicated operation that involves catalog lookups and lots of >> > complicated stuff, and I don't think that it would be safe to do all >> > of that at any arbitrary point in the code where ProcessInterrupts() >> > happened to fire. >> >> If I can't come up with some workaround during the next Commitfest, >> I'm >> going to withdraw this proposal. > > While at PGCon this week I'd brought up this idea with a few people > without realizing you'd already worked on it previously, and then > after I discovered this thread several of us (Greg, Ronan, David, > Heikki, and myself -- all cc'd) discussed the safety concerns over > dinner last night. > > Our conclusion was that all of the concerns we could come up with (for > example, walking though the code for ExplainTargetRel() and discussing > the relevant catalog and syscache accesses) all applied specifically > to Robert's concerns about running explain inside an aborted > transaction. After all, I'd originally started that thread to ask > about running auto-explain after a query timeout. > > To put it positively: we believe that, for example, catalog accesses > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside > an existing valid transaction/query state, as it would be for this > patch -- are safe. If there were problems, then those problems are > likely bugs we already have in other CFI cases. Thanks a lot for the discussion and sharing it! I really appreciate it. 
BTW, I'm not sure whether all the CFIs are called inside a valid transaction; do you think we should check each of them? > Another concern Robert raised may apply here: what if a person tries > to cancel a query when we're explaining? I believe that's a reasonable > question to ask, but I believe it's a trade-off that's worth making > for the (significant) introspection benefits this patch would provide. Is the concern here limited to the case where the explain code goes crazy, as Robert pointed out? If so, this may be a trade-off worth making. I am a little concerned about whether there will be cases where the explain code is not problematic but just takes a long time. > On to the patch itself: overall I think it looks like it's in pretty > good shape. I also noticed we don't have any tests (I assume it'd have > to be TAP tests) of the actual output happening, and I think it would > be worth adding that. Checking the log output may be better, but I didn't add it since there is only a little content that can be checked, and the similar function pg_log_backend_memory_contexts() does not have such tests. > Are you interested in re-opening this patch? I'd be happy to provide > further review and help to try to push this along. Sure, I'm going to re-open this. > I've rebased the patch and attached as v26. Thanks again for your work! -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-06-03 02:51, James Coleman wrote: > > Hello, > > > > Thanks for working on this patch! Sure thing! I'm *very interested* in seeing this available, and I think it paves the way for some additional features later on... > > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> > ... > > To put it positively: we believe that, for example, catalog accesses > > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside > > an existing valid transaction/query state, as it would be for this > > patch -- are safe. If there were problems, then those problems are > > likely bugs we already have in other CFI cases. > > Thanks a lot for the discussion and sharing it! > I really appreciate it. > > BTW I'm not sure whether all the CFI are called in valid transaction, > do you think we should check each of them? I kicked off the regression tests with a call to ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS() call. Several hours and 52 GB of logs later I have confirmed that (with the attached revision) at the very least the regression test suite can't trigger any kind of failures regardless of when we trigger this. The existing code in the patch handles that by only running the explain when there's an active query. > > Another concern Robert raised may apply here: what if a person tries > > to cancel a query when we're explaining? I believe that's a reasonable > > question to ask, but I believe it's a trade-off that's worth making > > for the (significant) introspection benefits this patch would provide. > > Is the concern here limited to the case where explain code goes crazy > as Robert pointed out? > If so, this may be a trade-off worth doing. > I am a little concerned about whether there will be cases where the > explain code is not problematic but just takes long time.
The explain code could take a long time in some rare cases (e.g., we discovered a bug a few years back with the planning code that actually descends an index to find the max value), but I think the trade-off is worth it. > > On to the patch itself: overall I think it looks like it's in pretty > > good shape. I also noticed we don't have any tests (I assume it'd have > > to be TAP tests) of the actual output happening, and I think it would > > be worth adding that. > > Checking the log output may be better, but I didn't add it since there > is only a little content that can be checked, and similar function > pg_log_backend_memory_contexts() does not do such type of tests. Fair enough. I still think that would be an improvement here, but others could also weigh in. > > Are you interested in re-opening this patch? I'd be happy to provide > > further review and help to try to push this along. > Sure, I'm going to re-open this. I've attached v27. The important change here in 0001 is that it guarantees the interrupt handler is re-entrant, since that was a bug exposed by my testing. I've also included 0002 which is only meant for testing (it attempts to log the plan in every CHECK_FOR_INTERRUPTS() call). Regards, James
Attachment
On 2023-06-06 03:26, James Coleman wrote: > On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On 2023-06-03 02:51, James Coleman wrote: >> > Hello, >> > >> > Thanks for working on this patch! > > Sure thing! I'm *very interested* in seeing this available, and I > think it paves the way for some additional features later on... > >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> >> ... >> > To put it positively: we believe that, for example, catalog accesses >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside >> > an existing valid transaction/query state, as it would be for this >> > patch -- are safe. If there were problems, then those problems are >> > likely bugs we already have in other CFI cases. >> >> Thanks a lot for the discussion and sharing it! >> I really appreciate it. >> >> BTW I'm not sure whether all the CFI are called in valid transaction, >> do you think we should check each of them? > > I kicked off the regressions tests with a call to > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS() > call. Several hours and 52 GB of logs later I have confirmed that > (with the attached revision) at the very least the regression test > suite can't trigger any kind of failures regardless of when we trigger > this. The existing code in the patch for only running the explain when > there's an active query handling that. Thanks for the testing! > I've attached v27. The important change here in 0001 is that it > guarantees the interrupt handler is re-entrant, since that was a bug > exposed by my testing. I've also included 0002 which is only meant for > testing (it attempts to log in the plan in every > CHECK_FOR_INTERRUPTS() call). When SIGINT is sent during ProcessLogQueryPlanInterrupt(), ProcessLogQueryPlanInterruptActive can remain true. After that, pg_log_query_plan() does nothing but just returns. 
AFAIU, v26 logs the plan for each pg_log_query_plan() even when another pg_log_query_plan() is running, but that doesn't cause errors or any critical problem. Considering the problem solved and the problem introduced by v27, I think v26 might be fine. What do you think? > > Regards, > James -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Sun, Jun 11, 2023 at 11:07 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-06-06 03:26, James Coleman wrote: > > On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia@oss.nttdata.com> > > wrote: > >> > >> On 2023-06-03 02:51, James Coleman wrote: > >> > Hello, > >> > > >> > Thanks for working on this patch! > > > > Sure thing! I'm *very interested* in seeing this available, and I > > think it paves the way for some additional features later on... > > > >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia@oss.nttdata.com> > >> ... > >> > To put it positively: we believe that, for example, catalog accesses > >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside > >> > an existing valid transaction/query state, as it would be for this > >> > patch -- are safe. If there were problems, then those problems are > >> > likely bugs we already have in other CFI cases. > >> > >> Thanks a lot for the discussion and sharing it! > >> I really appreciate it. > >> > >> BTW I'm not sure whether all the CFI are called in valid transaction, > >> do you think we should check each of them? > > > > I kicked off the regressions tests with a call to > > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS() > > call. Several hours and 52 GB of logs later I have confirmed that > > (with the attached revision) at the very least the regression test > > suite can't trigger any kind of failures regardless of when we trigger > > this. The existing code in the patch for only running the explain when > > there's an active query handling that. > > Thanks for the testing! > > > I've attached v27. The important change here in 0001 is that it > > guarantees the interrupt handler is re-entrant, since that was a bug > > exposed by my testing. I've also included 0002 which is only meant for > > testing (it attempts to log in the plan in every > > CHECK_FOR_INTERRUPTS() call). 
> > When SIGINT is sent during ProcessLogQueryPlanInterrupt(), > ProcessLogQueryPlanInterruptActive can remain true. > After that, pg_log_query_plan() does nothing but just returns. > > AFAIU, v26 logs plan for each pg_log_query_plan() even when another > pg_log_query_plan() is running, but it doesn't cause errors or critical > problem. > > Considering the problem solved and introduced by v27, I think v26 might > be fine. > How do you think? The testing I did with calling this during every CFI is what uncovered the re-entrancy problem. IIRC (without running that test again) the problem was a stack overflow. Now, to be sure this is a particularly degenerate case because in real-world usage it'd be impossible in practice, I think, to trigger that many calls to this function (and by extension the interrupt handler). If SIGINT is the only concern we could reset ProcessLogQueryPlanInterruptActive in error handling code. I admit that part of my thought process here is thinking ahead to an additional patch I'd like to see on top of this, which is logging a query plan before cleaning up when statement timeout occurs. The re-entrancy issue becomes more interesting then, I think, since we would then have automated calling of the logging code. BTW: I'd thought that would make a nice follow-up patch for this, but if you'd prefer I could add it as another patch in the series here. What do you think about resetting the flag versus just not having it? Regards, James Coleman
On 2023-06-13 00:52, James Coleman wrote: >> >> > I've attached v27. The important change here in 0001 is that it >> > guarantees the interrupt handler is re-entrant, since that was a bug >> > exposed by my testing. I've also included 0002 which is only meant for >> > testing (it attempts to log in the plan in every >> > CHECK_FOR_INTERRUPTS() call). >> >> When SIGINT is sent during ProcessLogQueryPlanInterrupt(), >> ProcessLogQueryPlanInterruptActive can remain true. >> After that, pg_log_query_plan() does nothing but just returns. >> >> AFAIU, v26 logs plan for each pg_log_query_plan() even when another >> pg_log_query_plan() is running, but it doesn't cause errors or >> critical >> problem. >> >> Considering the problem solved and introduced by v27, I think v26 >> might >> be fine. >> How do you think? > > The testing I did with calling this during every CFI is what uncovered > the re-entrancy problem. IIRC (without running that test again) the > problem was a stack overflow. Now, to be sure this is a particularly > degenerate case because in real-world usage it'd be impossible in > practice, I think, to trigger that many calls to this function (and by > extension the interrupt handler). Yeah. In addition, since only superusers are currently allowed to execute pg_log_query_plan(), I think we don't need to worry about malicious users. > If SIGINT is the only concern we could reset > ProcessLogQueryPlanInterruptActive in error handling code. I admit > that part of my thought process here is thinking ahead to an > additional patch I'd like to see on top of this, which is logging a > query plan before cleaning up when statement timeout occurs. I remember this is what you wanted to do. [1] > The > re-entrancy issue becomes more interesting then, I think, since we > would then have automated calling of the logging code. BTW: I'd > thought that would make a nice follow-up patch for this, but if you'd > prefer I could add it as another patch in the series here.
> > What do you think about resetting the flag versus just not having it? If I understand you correctly, adding the flag is not necessary for this proposal. To keep the patch simple, I prefer not having it. [1] https://www.postgresql.org/message-id/flat/CA%2BTgmoYW_rSOW4JMQ9_0Df9PKQ%3DsQDOKUGA4Gc9D8w4wui8fSA%40mail.gmail.com#b57432077f8045be8588049269f7a8dd -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Tue, Jun 13, 2023 at 11:22 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-06-13 00:52, James Coleman wrote: > >> > >> > I've attached v27. The important change here in 0001 is that it > >> > guarantees the interrupt handler is re-entrant, since that was a bug > >> > exposed by my testing. I've also included 0002 which is only meant for > >> > testing (it attempts to log in the plan in every > >> > CHECK_FOR_INTERRUPTS() call). > >> > >> When SIGINT is sent during ProcessLogQueryPlanInterrupt(), > >> ProcessLogQueryPlanInterruptActive can remain true. > >> After that, pg_log_query_plan() does nothing but just returns. > >> > >> AFAIU, v26 logs plan for each pg_log_query_plan() even when another > >> pg_log_query_plan() is running, but it doesn't cause errors or > >> critical > >> problem. > >> > >> Considering the problem solved and introduced by v27, I think v26 > >> might > >> be fine. > >> How do you think? > > > > The testing I did with calling this during every CFI is what uncovered > > the re-entrancy problem. IIRC (without running that test again) the > > problem was a stack overflow. Now, to be sure this is a particularly > > degenerate case because in real-world usage it'd be impossible in > > practice, I think, to trigger that many calls to this function (and by > > extension the interrupt handler). > > Yeah.In addition, currently only superusers are allowed to execute > pg_log_query_plan(), I think we don't need to think about cases > where users are malicious. > > > If SIGINT is the only concern we could reset > > ProcessLogQueryPlanInterruptActive in error handling code. I admit > > that part of my thought process here is thinking ahead to an > > additional patch I'd like to see on top of this, which is logging a > > query plan before cleaning up when statement timeout occurs. 
> > I remember this is what you wanted do.[1] > > > The > > re-entrancy issue becomes more interesting then, I think, since we > > would then have automated calling of the logging code. BTW: I'd > > thought that would make a nice follow-up patch for this, but if you'd > > prefer I could add it as another patch in the series here. > > > > What do you think about resetting the flag versus just not having it? > > If I understand you correctly, adding the flag is not necessary for this > proposal. > To keep the patch simple, I prefer not having it. > I'm going to re-run tests with my patch version + resetting the flag on SIGINT (and any other error condition) to be certain that the issue you uncovered (where backends get stuck after a SIGINT not responding to the requested plan logging) wasn't masking any other issues. As long as that run is clean also then I believe the patch is safe as-is even without the re-entrancy guard. I'll report back with the results of that testing. Regards, James Coleman
On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> wrote: > > ... > I'm going to re-run tests with my patch version + resetting the flag > on SIGINT (and any other error condition) to be certain that the issue > you uncovered (where backends get stuck after a SIGINT not responding > to the requested plan logging) wasn't masking any other issues. > > As long as that run is clean also then I believe the patch is safe > as-is even without the re-entrancy guard. > > I'll report back with the results of that testing. The tests have been running since last night, but have been apparently hung now for many hours. I haven't been able to fully look into it, but so far I know the hung (100% CPU) backend last logged this: 2023-06-14 02:00:30.045 UTC client backend[84461] pg_regress/updatable_views LOG: query plan running on backend with PID 84461 is: Query Text: SELECT table_name, column_name, is_updatable FROM information_schema.columns WHERE table_name LIKE E'r_\\_view%' ORDER BY table_name, ordinal_position; The last output from the regression test harness was: # parallel group (5 tests): index_including create_view index_including_gist create_index create_index_spgist ok 66 + create_index 36508 ms ok 67 + create_index_spgist 38588 ms ok 68 + create_view 1394 ms ok 69 + index_including 654 ms ok 70 + index_including_gist 1701 ms # parallel group (16 tests): errors create_cast drop_if_exists create_aggregate roleattributes constraints hash_func typed_table infinite_recurse Attaching gdb to the hung backend shows this: #0 0x00005601ab1f9529 in ProcLockWakeup (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655 #1 0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, proclock=proclock@entry=0x7f5325d40d60, lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 <default_lockmethod>, hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>) at lock.c:1673 #2 0x00005601ab1e9e21 
in LockRefindAndRelease (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 <default_lockmethod>, proc=<optimized out>, locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, decrement_strong_lock_count=decrement_strong_lock_count@entry=false) at lock.c:3150 #3 0x00005601ab1edb27 in LockReleaseAll (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 #4 0x00005601ab1f8599 in ProcReleaseLocks (isCommit=isCommit@entry=true) at proc.c:781 #5 0x00005601ab37f1f4 in ResourceOwnerReleaseInternal (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at resowner.c:618 #6 0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at resowner.c:494 #7 0x00005601aaec1d84 in CommitTransaction () at xact.c:2334 #8 0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067 #9 0x00005601ab200a66 in finish_xact_command () at postgres.c:2783 #10 0x00005601ab20338f in exec_simple_query ( query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, column_name, is_updatable\n FROM information_schema.columns\n WHERE table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name, ordinal_position;") at postgres.c:1300 I am unable to connect to the regression test Postgres instance -- psql just hangs, so the lock seems to have affected the postmaster also. I'm wondering if this might represent a bug in the current patch. Regards, James Coleman
On 2023-06-15 01:48, James Coleman wrote: > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> > wrote: >> >> ... >> I'm going to re-run tests with my patch version + resetting the flag >> on SIGINT (and any other error condition) to be certain that the issue >> you uncovered (where backends get stuck after a SIGINT not responding >> to the requested plan logging) wasn't masking any other issues. >> >> As long as that run is clean also then I believe the patch is safe >> as-is even without the re-entrancy guard. >> >> I'll report back with the results of that testing. > > The tests have been running since last night, but have been apparently > hung now for many hours. I haven't been able to fully look into it, > but so far I know the hung (100% CPU) backend last logged this: > > 2023-06-14 02:00:30.045 UTC client backend[84461] > pg_regress/updatable_views LOG: query plan running on backend with > PID 84461 is: > Query Text: SELECT table_name, column_name, is_updatable > FROM information_schema.columns > WHERE table_name LIKE E'r_\\_view%' > ORDER BY table_name, ordinal_position; > > The last output from the regression test harness was: > > # parallel group (5 tests): index_including create_view > index_including_gist create_index create_index_spgist > ok 66 + create_index 36508 ms > ok 67 + create_index_spgist 38588 ms > ok 68 + create_view 1394 ms > ok 69 + index_including 654 ms > ok 70 + index_including_gist 1701 ms > # parallel group (16 tests): errors create_cast drop_if_exists > create_aggregate roleattributes constraints hash_func typed_table > infinite_recurse > > Attaching gdb to the hung backend shows this: > > #0 0x00005601ab1f9529 in ProcLockWakeup > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655 > #1 0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, > proclock=proclock@entry=0x7f5325d40d60, > 
lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > <default_lockmethod>, > hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>) > at lock.c:1673 > #2 0x00005601ab1e9e21 in LockRefindAndRelease > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > <default_lockmethod>, proc=<optimized out>, > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, > > decrement_strong_lock_count=decrement_strong_lock_count@entry=false) > at lock.c:3150 > #3 0x00005601ab1edb27 in LockReleaseAll > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 > #4 0x00005601ab1f8599 in ProcReleaseLocks > (isCommit=isCommit@entry=true) at proc.c:781 > #5 0x00005601ab37f1f4 in ResourceOwnerReleaseInternal > (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > resowner.c:618 > #6 0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>, > phase=phase@entry=RESOURCE_RELEASE_LOCKS, > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > resowner.c:494 > #7 0x00005601aaec1d84 in CommitTransaction () at xact.c:2334 > #8 0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067 > #9 0x00005601ab200a66 in finish_xact_command () at postgres.c:2783 > #10 0x00005601ab20338f in exec_simple_query ( > query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, > column_name, is_updatable\n FROM information_schema.columns\n WHERE > table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name, > ordinal_position;") at postgres.c:1300 > > I am unable to connect to the regression test Postgres instance -- > psql just hangs, so the lock seems to have affected the postmaster > also. > > I'm wondering if this might represent a bug in the current patch. Thanks for running and analyzing the test! Could you share me how you are running the test? I imagined something like below, but currently couldn't reproduce it. 
- apply both v26-0001 and v27-0002 and build
- run PostgreSQL with default GUCs
- make installcheck-world
- run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' during make installcheck-world

-- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On Thu, Jun 15, 2023 at 9:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-06-15 01:48, James Coleman wrote: > > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> > > wrote: > >> > >> ... > >> I'm going to re-run tests with my patch version + resetting the flag > >> on SIGINT (and any other error condition) to be certain that the issue > >> you uncovered (where backends get stuck after a SIGINT not responding > >> to the requested plan logging) wasn't masking any other issues. > >> > >> As long as that run is clean also then I believe the patch is safe > >> as-is even without the re-entrancy guard. > >> > >> I'll report back with the results of that testing. > > > > The tests have been running since last night, but have been apparently > > hung now for many hours. I haven't been able to fully look into it, > > but so far I know the hung (100% CPU) backend last logged this: > > > > 2023-06-14 02:00:30.045 UTC client backend[84461] > > pg_regress/updatable_views LOG: query plan running on backend with > > PID 84461 is: > > Query Text: SELECT table_name, column_name, is_updatable > > FROM information_schema.columns > > WHERE table_name LIKE E'r_\\_view%' > > ORDER BY table_name, ordinal_position; > > > > The last output from the regression test harness was: > > > > # parallel group (5 tests): index_including create_view > > index_including_gist create_index create_index_spgist > > ok 66 + create_index 36508 ms > > ok 67 + create_index_spgist 38588 ms > > ok 68 + create_view 1394 ms > > ok 69 + index_including 654 ms > > ok 70 + index_including_gist 1701 ms > > # parallel group (16 tests): errors create_cast drop_if_exists > > create_aggregate roleattributes constraints hash_func typed_table > > infinite_recurse > > > > Attaching gdb to the hung backend shows this: > > > > #0 0x00005601ab1f9529 in ProcLockWakeup > > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > > <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at 
proc.c:1655 > > #1 0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, > > proclock=proclock@entry=0x7f5325d40d60, > > lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > > <default_lockmethod>, > > hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>) > > at lock.c:1673 > > #2 0x00005601ab1e9e21 in LockRefindAndRelease > > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 > > <default_lockmethod>, proc=<optimized out>, > > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, > > > > decrement_strong_lock_count=decrement_strong_lock_count@entry=false) > > at lock.c:3150 > > #3 0x00005601ab1edb27 in LockReleaseAll > > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 > > #4 0x00005601ab1f8599 in ProcReleaseLocks > > (isCommit=isCommit@entry=true) at proc.c:781 > > #5 0x00005601ab37f1f4 in ResourceOwnerReleaseInternal > > (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, > > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > > resowner.c:618 > > #6 0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>, > > phase=phase@entry=RESOURCE_RELEASE_LOCKS, > > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at > > resowner.c:494 > > #7 0x00005601aaec1d84 in CommitTransaction () at xact.c:2334 > > #8 0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067 > > #9 0x00005601ab200a66 in finish_xact_command () at postgres.c:2783 > > #10 0x00005601ab20338f in exec_simple_query ( > > query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, > > column_name, is_updatable\n FROM information_schema.columns\n WHERE > > table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name, > > ordinal_position;") at postgres.c:1300 > > > > I am unable to connect to the regression test Postgres instance -- > > psql just hangs, so the lock seems to have affected the postmaster > > also. > > > > I'm wondering if this might represent a bug in the current patch. 
> > Thanks for running and analyzing the test! Sure thing! > Could you share me how you are running the test? > > I imagined something like below, but currently couldn't reproduce it. > - apply both v26-0001 and v27-0002 and build > - run PostgreSQL with default GUCssaaa > - make installcheck-world > - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' > during make installcheck-world Apologies, I should have attached my updated patch (with the fix for the bug you'd reporting with the re-entrancy guard). Attached is v28 which sets ProcessLogQueryPlanInterruptActive to false in errfinish when necessary. Once built with those two patches I'm simply running `make check`. Regards, James Coleman
Attachment
On 2023-06-16 01:34, James Coleman wrote: > On Thu, Jun 15, 2023 at 9:00 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On 2023-06-15 01:48, James Coleman wrote: >> > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com> >> > wrote: >> >> >> >> ... >> >> I'm going to re-run tests with my patch version + resetting the flag >> >> on SIGINT (and any other error condition) to be certain that the issue >> >> you uncovered (where backends get stuck after a SIGINT not responding >> >> to the requested plan logging) wasn't masking any other issues. >> >> >> >> As long as that run is clean also then I believe the patch is safe >> >> as-is even without the re-entrancy guard. >> >> >> >> I'll report back with the results of that testing. >> > >> > The tests have been running since last night, but have been apparently >> > hung now for many hours. I haven't been able to fully look into it, >> > but so far I know the hung (100% CPU) backend last logged this: >> > >> > 2023-06-14 02:00:30.045 UTC client backend[84461] >> > pg_regress/updatable_views LOG: query plan running on backend with >> > PID 84461 is: >> > Query Text: SELECT table_name, column_name, is_updatable >> > FROM information_schema.columns >> > WHERE table_name LIKE E'r_\\_view%' >> > ORDER BY table_name, ordinal_position; >> > >> > The last output from the regression test harness was: >> > >> > # parallel group (5 tests): index_including create_view >> > index_including_gist create_index create_index_spgist >> > ok 66 + create_index 36508 ms >> > ok 67 + create_index_spgist 38588 ms >> > ok 68 + create_view 1394 ms >> > ok 69 + index_including 654 ms >> > ok 70 + index_including_gist 1701 ms >> > # parallel group (16 tests): errors create_cast drop_if_exists >> > create_aggregate roleattributes constraints hash_func typed_table >> > infinite_recurse >> > >> > Attaching gdb to the hung backend shows this: >> > >> > #0 0x00005601ab1f9529 in ProcLockWakeup >> > 
(lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 >> > <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655 >> > #1 0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0, >> > proclock=proclock@entry=0x7f5325d40d60, >> > lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 >> > <default_lockmethod>, >> > hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>) >> > at lock.c:1673 >> > #2 0x00005601ab1e9e21 in LockRefindAndRelease >> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0 >> > <default_lockmethod>, proc=<optimized out>, >> > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1, >> > >> > decrement_strong_lock_count=decrement_strong_lock_count@entry=false) >> > at lock.c:3150 >> > #3 0x00005601ab1edb27 in LockReleaseAll >> > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295 >> > #4 0x00005601ab1f8599 in ProcReleaseLocks >> > (isCommit=isCommit@entry=true) at proc.c:781 >> > #5 0x00005601ab37f1f4 in ResourceOwnerReleaseInternal >> > (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS, >> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at >> > resowner.c:618 >> > #6 0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>, >> > phase=phase@entry=RESOURCE_RELEASE_LOCKS, >> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at >> > resowner.c:494 >> > #7 0x00005601aaec1d84 in CommitTransaction () at xact.c:2334 >> > #8 0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067 >> > #9 0x00005601ab200a66 in finish_xact_command () at postgres.c:2783 >> > #10 0x00005601ab20338f in exec_simple_query ( >> > query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name, >> > column_name, is_updatable\n FROM information_schema.columns\n WHERE >> > table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name, >> > ordinal_position;") at postgres.c:1300 >> > >> > I am unable to connect to the regression test Postgres instance -- >> > psql 
just hangs, so the lock seems to have affected the postmaster >> > also. >> > >> > I'm wondering if this might represent a bug in the current patch. >> >> Thanks for running and analyzing the test! > > Sure thing! > >> Could you share me how you are running the test? >> >> I imagined something like below, but currently couldn't reproduce it. >> - apply both v26-0001 and v27-0002 and build >> - run PostgreSQL with default GUCssaaa >> - make installcheck-world >> - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1' >> during make installcheck-world > > Apologies, I should have attached my updated patch (with the fix for > the bug you'd reporting with the re-entrancy guard). Attached is v28 > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > when necessary. Once built with those two patches I'm simply running > `make check`. Thanks! However, I haven't succeeded in reproducing the problem as below: > The tests have been running since last night, but have been apparently > hung now for many hours. I haven't been able to fully look into it, > but so far I know the hung (100% CPU) backend last logged this: Did you do something like this? $ ./configure --prefix=/home/ubuntu/pgsql/master --enable-cassert $ git apply ../patch/v28-0001-Add-function-to-log-the-plan-of-the-query.patch ../patch/v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch $ make $ make check -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
On 2023-06-16 01:34, James Coleman wrote:
> Attached is v28
> which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> when necessary. Once built with those two patches I'm simply running
> `make check`.

With the v28-0001 and v28-0002 patches, I confirmed that backend processes
consume a huge amount of memory, and under some environments they were
terminated by the OOM killer.

This was because memory was allocated from existing memory contexts and
was not freed after ProcessLogQueryPlanInterrupt().
I have updated the patch to use a dedicated memory context for
ProcessLogQueryPlanInterrupt().

With the attached patch and the v28-0002 patch applied, `make check`
completed successfully after 20 minutes and 50GB of logs on my environment.

>>> On 2023-06-15 01:48, James Coleman wrote:
>>> > The tests have been running since last night, but have been apparently
>>> > hung now for many hours.

I don't know if this has anything to do with the hang you faced, but I
thought it might be possible that the large amount of memory usage resulted
in swapping, which caused a significant delay in processing.

If possible, I would be very grateful if you could try to reproduce this
with the v29 patch.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION
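[Editor's note: the dedicated-context fix described above follows the usual backend idiom of doing transient work in its own memory context and deleting it afterwards. A rough sketch of that pattern, not the actual patch — everything apart from the standard MemoryContext API calls is illustrative:]

```c
/* Sketch only: run the plan logging inside a dedicated context so that
 * everything it palloc'd can be released wholesale afterwards, instead
 * of accumulating in whatever context the interrupt happened to fire in. */
MemoryContext logging_cxt;
MemoryContext oldcxt;

logging_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                    "LogQueryPlanContext",
                                    ALLOCSET_DEFAULT_SIZES);
oldcxt = MemoryContextSwitchTo(logging_cxt);

/* ... build the EXPLAIN output for the active query and emit it ... */

MemoryContextSwitchTo(oldcxt);
MemoryContextDelete(logging_cxt);   /* frees all allocations made above */
```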
Attachment
On Thu, Aug 17, 2023 at 10:02 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-06-16 01:34, James Coleman wrote: > > Attached is v28 > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > > when necessary. Once built with those two patches I'm simply running > > `make check`. > > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > huge > amount of memory and under some environments they were terminated by OOM > killer. > > This was because memory was allocated from existing memory contexts and > they > were not freed after ProcessLogQueryPlanInterrupt(). > Updated the patch to use dedicated memory context for > ProcessLogQueryPlanInterrupt(). > > Applying attached patch and v28-0002 patch, `make check` successfully > completed after 20min and 50GB of logs on my environment. > > >>> On 2023-06-15 01:48, James Coleman wrote: > >>> > The tests have been running since last night, but have been apparently > >>> > hung now for many hours. > > I don't know if this has anything to do with the hung you faced, but I > thought > it might be possible that the large amount of memory usage resulted in > swapping, which caused a significant delay in processing. Ah, yes, I think that could be a possible explanation. I was delaying on this thread because I wasn't comfortable with having caused an issue once (even if I couldn't easily reproduce) without at least some theory as to the cause (and a fix). > If possible, I would be very grateful if you could try to reproduce this > with > the v29 patch. I'll kick off some testing. Thanks, James Coleman
On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote: > > On Thu, Aug 17, 2023 at 10:02 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > > > On 2023-06-16 01:34, James Coleman wrote: > > > Attached is v28 > > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > > > when necessary. Once built with those two patches I'm simply running > > > `make check`. > > > > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > > huge > > amount of memory and under some environments they were terminated by OOM > > killer. > > > > This was because memory was allocated from existing memory contexts and > > they > > were not freed after ProcessLogQueryPlanInterrupt(). > > Updated the patch to use dedicated memory context for > > ProcessLogQueryPlanInterrupt(). > > > > Applying attached patch and v28-0002 patch, `make check` successfully > > completed after 20min and 50GB of logs on my environment. > > > > >>> On 2023-06-15 01:48, James Coleman wrote: > > >>> > The tests have been running since last night, but have been apparently > > >>> > hung now for many hours. > > > > I don't know if this has anything to do with the hung you faced, but I > > thought > > it might be possible that the large amount of memory usage resulted in > > swapping, which caused a significant delay in processing. > > Ah, yes, I think that could be a possible explanation. I was delaying > on this thread because I wasn't comfortable with having caused an > issue once (even if I couldn't easily reproduce) without at least some > theory as to the cause (and a fix). > > > If possible, I would be very grateful if you could try to reproduce this > > with > > the v29 patch. > > I'll kick off some testing. > I don't have time to investigate what's happening here, but 24 hours later the first "make check" is still running, and at first glance it seems to have the same behavior I'd seen that first time. 
The test output is to this point:

# parallel group (5 tests): index_including create_view index_including_gist create_index create_index_spgist
ok 66 + create_index                26365 ms
ok 67 + create_index_spgist         27675 ms
ok 68 + create_view                  1235 ms
ok 69 + index_including              1102 ms
ok 70 + index_including_gist         1633 ms
# parallel group (16 tests): create_aggregate create_cast errors roleattributes drop_if_exists hash_func typed_table create_am infinite_recurse

and it hasn't progressed past that point since at least ~16 hours ago (the first several hours of the run I wasn't monitoring it).

I haven't connected up gdb yet, and won't be able to until maybe tomorrow, but here's the ps output for postgres processes that are running:

admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 /home/admin/postgresql-test/bin/postgres -D /home/admin/postgresql-test-data
admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 postgres: checkpointer
admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 postgres: background writer
admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 postgres: walwriter
admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 postgres: autovacuum launcher
admin 3213255 0.0 0.0 198412 5488 ? Ss Aug25 0:00 postgres: logical replication launcher
admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 /bin/sh -c echo "# +++ regress check in src/test/regress +++" && PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template ../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 --schedule=./parallel_schedule
admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c listen_addresses= -k /tmp/pg_regress-7mmGUa
admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 postgres: checkpointer
admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 postgres: background writer
admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 postgres: walwriter
admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 postgres: autovacuum launcher
admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 postgres: logical replication launcher
admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 postgres: admin regression [local] SELECT waiting
admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 postgres: admin regression [local] SELECT waiting
admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 postgres: admin regression [local] VACUUM
admin 3243655 0.0 0.1 212036 38504 ? Ss Aug25 0:00 postgres: admin regression [local] SELECT waiting
admin 3243656 0.0 0.0 206024 30892 ? Ss Aug25 0:00 postgres: admin regression [local] DELETE waiting
admin 3243657 0.0 0.1 205568 32232 ? Ss Aug25 0:00 postgres: admin regression [local] ALTER TABLE waiting
admin 3243658 0.0 0.0 203740 21532 ? Ss Aug25 0:00 postgres: admin regression [local] ANALYZE waiting
admin 3243798 0.0 0.0 199884 8464 ? Ss Aug25 0:00 postgres: autovacuum worker
admin 3244733 0.0 0.0 199492 5956 ? Ss Aug25 0:00 postgres: autovacuum worker
admin 3245652 0.0 0.0 199884 8468 ? Ss Aug25 0:00 postgres: autovacuum worker

As you can see there are a bunch of backends presumably waiting, and also the VACUUM process has been pegging a single CPU core for at least since that ~16 hour ago mark.

I hope to be able to do more investigation later, but I wanted to at least give you this information now.

James Coleman
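[Editor's note: once gdb is available, backtraces for all of the stuck backends could be collected in one go with something along these lines. This is a sketch: the anchored pgrep pattern matches the backend titles in the ps output above, and it assumes gdb is installed and permitted to ptrace the processes.]

```shell
# Dump a backtrace for every backend of the hung regression instance.
# Adjust the pattern to match the process titles shown by ps. Produces
# one bt.<pid>.txt file per backend; with no matching processes it is
# a no-op.
pattern='^postgres: admin regression'
count=0
for pid in $(pgrep -f "$pattern" || true); do
    gdb --batch -p "$pid" -ex 'bt full' > "bt.$pid.txt" 2>&1
    count=$((count + 1))
done
echo "dumped backtraces for $count backend(s)"
```

The `bt full` output of the VACUUM backend (the one pegging a CPU core) would be the most interesting one to compare against the earlier LockReleaseAll trace.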
On 2023-08-26 21:03, James Coleman wrote: > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote: >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia >> <torikoshia@oss.nttdata.com> wrote: >> > >> > On 2023-06-16 01:34, James Coleman wrote: >> > > Attached is v28 >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish >> > > when necessary. Once built with those two patches I'm simply running >> > > `make check`. >> > >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume >> > huge >> > amount of memory and under some environments they were terminated by OOM >> > killer. >> > >> > This was because memory was allocated from existing memory contexts and >> > they >> > were not freed after ProcessLogQueryPlanInterrupt(). >> > Updated the patch to use dedicated memory context for >> > ProcessLogQueryPlanInterrupt(). >> > >> > Applying attached patch and v28-0002 patch, `make check` successfully >> > completed after 20min and 50GB of logs on my environment. >> > >> > >>> On 2023-06-15 01:48, James Coleman wrote: >> > >>> > The tests have been running since last night, but have been apparently >> > >>> > hung now for many hours. >> > >> > I don't know if this has anything to do with the hung you faced, but I >> > thought >> > it might be possible that the large amount of memory usage resulted in >> > swapping, which caused a significant delay in processing. >> >> Ah, yes, I think that could be a possible explanation. I was delaying >> on this thread because I wasn't comfortable with having caused an >> issue once (even if I couldn't easily reproduce) without at least some >> theory as to the cause (and a fix). >> >> > If possible, I would be very grateful if you could try to reproduce this >> > with >> > the v29 patch. >> >> I'll kick off some testing. 
>> > > I don't have time to investigate what's happening here, but 24 hours > later the first "make check" is still running, and at first glance it > seems to have the same behavior I'd seen that first time. The test > output is to this point: > > # parallel group (5 tests): index_including create_view > index_including_gist create_index create_index_spgist > ok 66 + create_index 26365 ms > ok 67 + create_index_spgist 27675 ms > ok 68 + create_view 1235 ms > ok 69 + index_including 1102 ms > ok 70 + index_including_gist 1633 ms > # parallel group (16 tests): create_aggregate create_cast errors > roleattributes drop_if_exists hash_func typed_table create_am > infinite_recurse > > and it hasn't progressed past that point since at least ~16 hours ago > (the first several hours of the run I wasn't monitoring it). > > I haven't connected up gdb yet, and won't be able to until maybe > tomorrow, but here's the ps output for postgres processes that are > running: > > admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 > /home/admin/postgresql-test/bin/postgres -D > /home/admin/postgresql-test-data > admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 > postgres: checkpointer > admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 > postgres: background writer > admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 > postgres: walwriter > admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 > postgres: autovacuum launcher > admin 3213255 0.0 0.0 198412 5488 ? Ss Aug25 0:00 > postgres: logical replication launcher > admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check > --inputdir=. 
--bindir= --dlpath=. --max-concurrent-tests=20 > --schedule=./parallel_schedule > admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c > listen_addresses= -k /tmp/pg_regress-7mmGUa > admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 > postgres: checkpointer > admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 > postgres: background writer > admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 > postgres: walwriter > admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 > postgres: autovacuum launcher > admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 > postgres: logical replication launcher > admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 > postgres: admin regression [local] SELECT waiting > admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 > postgres: admin regression [local] SELECT waiting > admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 > postgres: admin regression [local] VACUUM > admin 3243655 0.0 0.1 212036 38504 ? Ss Aug25 0:00 > postgres: admin regression [local] SELECT waiting > admin 3243656 0.0 0.0 206024 30892 ? Ss Aug25 0:00 > postgres: admin regression [local] DELETE waiting > admin 3243657 0.0 0.1 205568 32232 ? Ss Aug25 0:00 > postgres: admin regression [local] ALTER TABLE waiting > admin 3243658 0.0 0.0 203740 21532 ? Ss Aug25 0:00 > postgres: admin regression [local] ANALYZE waiting > admin 3243798 0.0 0.0 199884 8464 ? Ss Aug25 0:00 > postgres: autovacuum worker > admin 3244733 0.0 0.0 199492 5956 ? Ss Aug25 0:00 > postgres: autovacuum worker > admin 3245652 0.0 0.0 199884 8468 ? Ss Aug25 0:00 > postgres: autovacuum worker > > As you can see there are a bunch of backends presumably waiting, and > also the VACUUM process has been pegging a single CPU core for at > least since that ~16 hour ago mark. > > I hope to be able to do more investigation later, but I wanted to at > least give you this information now. Thanks a lot for testing the patch! 
I really appreciate your cooperation.

Hmm, I also tested on the current HEAD (165d581f146b09) again on Ubuntu 22.04 and macOS, but unfortunately (fortunately?) they succeeded as below:

```
$ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch
$ git apply v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch
$ ./configure --enable-debug --enable-cassert
$ make
$ make check

...(snip)...

# parallel group (5 tests): index_including index_including_gist create_view create_index create_index_spgist
ok 66 + create_index                25033 ms
ok 67 + create_index_spgist         26144 ms
ok 68 + create_view                  3061 ms
ok 69 + index_including               976 ms
ok 70 + index_including_gist         2998 ms
# parallel group (16 tests): create_cast errors create_aggregate roleattributes drop_if_exists hash_func typed_table create_am select constraints updatable_views inherit triggers vacuum create_function_sql infinite_recurse
ok 71 + create_aggregate              225 ms
ok 72 + create_function_sql         18874 ms
ok 73 + create_cast                   168 ms

...(snip)...

# All 215 tests passed.
```

If you notice any difference, I would be grateful if you could let me know.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Mon, Aug 28, 2023 at 3:01 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-08-26 21:03, James Coleman wrote: > > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote: > >> > >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia > >> <torikoshia@oss.nttdata.com> wrote: > >> > > >> > On 2023-06-16 01:34, James Coleman wrote: > >> > > Attached is v28 > >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > >> > > when necessary. Once built with those two patches I'm simply running > >> > > `make check`. > >> > > >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > >> > huge > >> > amount of memory and under some environments they were terminated by OOM > >> > killer. > >> > > >> > This was because memory was allocated from existing memory contexts and > >> > they > >> > were not freed after ProcessLogQueryPlanInterrupt(). > >> > Updated the patch to use dedicated memory context for > >> > ProcessLogQueryPlanInterrupt(). > >> > > >> > Applying attached patch and v28-0002 patch, `make check` successfully > >> > completed after 20min and 50GB of logs on my environment. > >> > > >> > >>> On 2023-06-15 01:48, James Coleman wrote: > >> > >>> > The tests have been running since last night, but have been apparently > >> > >>> > hung now for many hours. > >> > > >> > I don't know if this has anything to do with the hung you faced, but I > >> > thought > >> > it might be possible that the large amount of memory usage resulted in > >> > swapping, which caused a significant delay in processing. > >> > >> Ah, yes, I think that could be a possible explanation. I was delaying > >> on this thread because I wasn't comfortable with having caused an > >> issue once (even if I couldn't easily reproduce) without at least some > >> theory as to the cause (and a fix). > >> > >> > If possible, I would be very grateful if you could try to reproduce this > >> > with > >> > the v29 patch. 
> >> > >> I'll kick off some testing. > >> > > > > I don't have time to investigate what's happening here, but 24 hours > > later the first "make check" is still running, and at first glance it > > seems to have the same behavior I'd seen that first time. The test > > output is to this point: > > > > # parallel group (5 tests): index_including create_view > > index_including_gist create_index create_index_spgist > > ok 66 + create_index 26365 ms > > ok 67 + create_index_spgist 27675 ms > > ok 68 + create_view 1235 ms > > ok 69 + index_including 1102 ms > > ok 70 + index_including_gist 1633 ms > > # parallel group (16 tests): create_aggregate create_cast errors > > roleattributes drop_if_exists hash_func typed_table create_am > > infinite_recurse > > > > and it hasn't progressed past that point since at least ~16 hours ago > > (the first several hours of the run I wasn't monitoring it). > > > > I haven't connected up gdb yet, and won't be able to until maybe > > tomorrow, but here's the ps output for postgres processes that are > > running: > > > > admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 > > /home/admin/postgresql-test/bin/postgres -D > > /home/admin/postgresql-test-data > > admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 > > postgres: checkpointer > > admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 > > postgres: background writer > > admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 > > postgres: walwriter > > admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 > > postgres: autovacuum launcher > > admin 3213255 0.0 0.0 198412 5488 ? 
Ss Aug25 0:00 > > postgres: logical replication launcher > > admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 > > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && > > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" > > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" > > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template > > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check > > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 > > --schedule=./parallel_schedule > > admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 > > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c > > listen_addresses= -k /tmp/pg_regress-7mmGUa > > admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 > > postgres: checkpointer > > admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 > > postgres: background writer > > admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 > > postgres: walwriter > > admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 > > postgres: autovacuum launcher > > admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 > > postgres: logical replication launcher > > admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 > > postgres: admin regression [local] SELECT waiting > > admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 > > postgres: admin regression [local] SELECT waiting > > admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 > > postgres: admin regression [local] VACUUM > > admin 3243655 0.0 0.1 212036 38504 ? Ss Aug25 0:00 > > postgres: admin regression [local] SELECT waiting > > admin 3243656 0.0 0.0 206024 30892 ? Ss Aug25 0:00 > > postgres: admin regression [local] DELETE waiting > > admin 3243657 0.0 0.1 205568 32232 ? Ss Aug25 0:00 > > postgres: admin regression [local] ALTER TABLE waiting > > admin 3243658 0.0 0.0 203740 21532 ? 
Ss Aug25 0:00 > > postgres: admin regression [local] ANALYZE waiting > > admin 3243798 0.0 0.0 199884 8464 ? Ss Aug25 0:00 > > postgres: autovacuum worker > > admin 3244733 0.0 0.0 199492 5956 ? Ss Aug25 0:00 > > postgres: autovacuum worker > > admin 3245652 0.0 0.0 199884 8468 ? Ss Aug25 0:00 > > postgres: autovacuum worker > > > > As you can see there are a bunch of backends presumably waiting, and > > also the VACUUM process has been pegging a single CPU core for at > > least since that ~16 hour ago mark. > > > > I hope to be able to do more investigation later, but I wanted to at > > least give you this information now. > > Thanks a lot for testing the patch! > I really appreciate your cooperation. > > Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu > 22.04 and macOS, but unfortunately(fortunately?) they succeeded as > below: > > ``` > $ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch > $ git apply v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch > $ ./configure --enable-debug --enable-cassert > $ make > $ make check > > ...(snip)... > > # parallel group (5 tests): index_including index_including_gist > create_view create_index create_index_spgist > ok 66 + create_index 25033 ms > ok 67 + create_index_spgist 26144 ms > ok 68 + create_view 3061 ms > ok 69 + index_including 976 ms > ok 70 + index_including_gist 2998 ms > # parallel group (16 tests): create_cast errors create_aggregate > roleattributes drop_if_exists hash_func typed_table > create_am select constraints updatable_views inherit triggers vacuum > create_function_sql infinite_recurse > ok 71 + create_aggregate 225 ms > ok 72 + create_function_sql 18874 ms > ok 73 + create_cast 168 ms > > ...(snip)... > > # All 215 tests passed. > ``` > > If you notice any difference, I would be grateful if you could let me > know. 
I've never been able to reproduce it (haven't tested the new version, but v28 at least) on my M1 Mac; where I've reproduced it is on Debian (first buster and now bullseye).

I'm attaching several stacktraces in the hope that they provide some clues. These all match the ps output I sent earlier, though note in that output there is both the regress instance and my test instance (pid 3213249) running (different ports, of course, and they are from the exact same compilation run). I've attached ps output for the postgres processes under the make check process to simplify cross referencing.

A few interesting things:
- There's definitely a lock on a relation that seems to be what's blocking the processes.
- When I try to connect with psql the process forks but then hangs (see the ps output with task names stuck in "authentication"). I've also included a trace from one of these.

If you think a core file for any of these processes would be helpful for debugging I'd be happy to try to figure out a way to get that to you.

Regards,
James Coleman
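[Editor's note: as a general diagnostic for this kind of hang — assuming a session can still be opened, which the "authentication" hang may prevent here — the heavyweight-lock dependencies can usually be seen with a query along these lines:]

```sql
-- Which backends are blocked, and by whom. pg_blocking_pids() covers
-- heavyweight (relation-level) locks only; LWLock waits instead show up
-- in wait_event_type/wait_event.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
   OR wait_event_type IN ('Lock', 'LWLock');
```

Cross-referencing the blocking pid against the ps output would show whether the long-running VACUUM backend is the one holding the relation lock the others are queued on.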
Attachment
On 2023-08-28 22:47, James Coleman wrote: > On Mon, Aug 28, 2023 at 3:01 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On 2023-08-26 21:03, James Coleman wrote: >> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote: >> >> >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia >> >> <torikoshia@oss.nttdata.com> wrote: >> >> > >> >> > On 2023-06-16 01:34, James Coleman wrote: >> >> > > Attached is v28 >> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish >> >> > > when necessary. Once built with those two patches I'm simply running >> >> > > `make check`. >> >> > >> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume >> >> > huge >> >> > amount of memory and under some environments they were terminated by OOM >> >> > killer. >> >> > >> >> > This was because memory was allocated from existing memory contexts and >> >> > they >> >> > were not freed after ProcessLogQueryPlanInterrupt(). >> >> > Updated the patch to use dedicated memory context for >> >> > ProcessLogQueryPlanInterrupt(). >> >> > >> >> > Applying attached patch and v28-0002 patch, `make check` successfully >> >> > completed after 20min and 50GB of logs on my environment. >> >> > >> >> > >>> On 2023-06-15 01:48, James Coleman wrote: >> >> > >>> > The tests have been running since last night, but have been apparently >> >> > >>> > hung now for many hours. >> >> > >> >> > I don't know if this has anything to do with the hung you faced, but I >> >> > thought >> >> > it might be possible that the large amount of memory usage resulted in >> >> > swapping, which caused a significant delay in processing. >> >> >> >> Ah, yes, I think that could be a possible explanation. I was delaying >> >> on this thread because I wasn't comfortable with having caused an >> >> issue once (even if I couldn't easily reproduce) without at least some >> >> theory as to the cause (and a fix). 
>> >> >> >> > If possible, I would be very grateful if you could try to reproduce this >> >> > with >> >> > the v29 patch. >> >> >> >> I'll kick off some testing. >> >> >> > >> > I don't have time to investigate what's happening here, but 24 hours >> > later the first "make check" is still running, and at first glance it >> > seems to have the same behavior I'd seen that first time. The test >> > output is to this point: >> > >> > # parallel group (5 tests): index_including create_view >> > index_including_gist create_index create_index_spgist >> > ok 66 + create_index 26365 ms >> > ok 67 + create_index_spgist 27675 ms >> > ok 68 + create_view 1235 ms >> > ok 69 + index_including 1102 ms >> > ok 70 + index_including_gist 1633 ms >> > # parallel group (16 tests): create_aggregate create_cast errors >> > roleattributes drop_if_exists hash_func typed_table create_am >> > infinite_recurse >> > >> > and it hasn't progressed past that point since at least ~16 hours ago >> > (the first several hours of the run I wasn't monitoring it). >> > >> > I haven't connected up gdb yet, and won't be able to until maybe >> > tomorrow, but here's the ps output for postgres processes that are >> > running: >> > >> > admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 >> > /home/admin/postgresql-test/bin/postgres -D >> > /home/admin/postgresql-test-data >> > admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 >> > postgres: checkpointer >> > admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 >> > postgres: background writer >> > admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 >> > postgres: walwriter >> > admin 3213254 0.0 0.0 198424 7316 ? Ss Aug25 0:00 >> > postgres: autovacuum launcher >> > admin 3213255 0.0 0.0 198412 5488 ? 
Ss Aug25 0:00 >> > postgres: logical replication launcher >> > admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 >> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && >> > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" >> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" >> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template >> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check >> > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 >> > --schedule=./parallel_schedule >> > admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 >> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c >> > listen_addresses= -k /tmp/pg_regress-7mmGUa >> > admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 >> > postgres: checkpointer >> > admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 >> > postgres: background writer >> > admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 >> > postgres: walwriter >> > admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 >> > postgres: autovacuum launcher >> > admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 >> > postgres: logical replication launcher >> > admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 >> > postgres: admin regression [local] SELECT waiting >> > admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 >> > postgres: admin regression [local] SELECT waiting >> > admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 >> > postgres: admin regression [local] VACUUM >> > admin 3243655 0.0 0.1 212036 38504 ? Ss Aug25 0:00 >> > postgres: admin regression [local] SELECT waiting >> > admin 3243656 0.0 0.0 206024 30892 ? Ss Aug25 0:00 >> > postgres: admin regression [local] DELETE waiting >> > admin 3243657 0.0 0.1 205568 32232 ? Ss Aug25 0:00 >> > postgres: admin regression [local] ALTER TABLE waiting >> > admin 3243658 0.0 0.0 203740 21532 ? 
Ss Aug25 0:00 >> > postgres: admin regression [local] ANALYZE waiting >> > admin 3243798 0.0 0.0 199884 8464 ? Ss Aug25 0:00 >> > postgres: autovacuum worker >> > admin 3244733 0.0 0.0 199492 5956 ? Ss Aug25 0:00 >> > postgres: autovacuum worker >> > admin 3245652 0.0 0.0 199884 8468 ? Ss Aug25 0:00 >> > postgres: autovacuum worker >> > >> > As you can see there are a bunch of backends presumably waiting, and >> > also the VACUUM process has been pegging a single CPU core for at >> > least since that ~16 hour ago mark. >> > >> > I hope to be able to do more investigation later, but I wanted to at >> > least give you this information now. >> >> Thanks a lot for testing the patch! >> I really appreciate your cooperation. >> >> Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu >> 22.04 and macOS, but unfortunately(fortunately?) they succeeded as >> below: >> >> ``` >> $ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch >> $ git apply >> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch >> $ ./configure --enable-debug --enable-cassert >> $ make >> $ make check >> >> ...(snip)... >> >> # parallel group (5 tests): index_including index_including_gist >> create_view create_index create_index_spgist >> ok 66 + create_index 25033 ms >> ok 67 + create_index_spgist 26144 ms >> ok 68 + create_view 3061 ms >> ok 69 + index_including 976 ms >> ok 70 + index_including_gist 2998 ms >> # parallel group (16 tests): create_cast errors create_aggregate >> roleattributes drop_if_exists hash_func typed_table >> create_am select constraints updatable_views inherit triggers vacuum >> create_function_sql infinite_recurse >> ok 71 + create_aggregate 225 ms >> ok 72 + create_function_sql 18874 ms >> ok 73 + create_cast 168 ms >> >> ...(snip)... >> >> # All 215 tests passed. >> ``` >> >> If you notice any difference, I would be grateful if you could let me >> know. 
>
> I've never been able to reproduce it (haven't tested the new version,
> but v28 at least) on my M1 Mac; where I've reproduced it is on Debian
> (first buster and now bullseye).
>
> I'm attaching several stacktraces in the hope that they provide some
> clues. These all match the ps output I sent earlier, though note in
> that output there is both the regress instance and my test instance
> (pid 3213249) running (different ports, of course, and they are from
> the exact same compilation run). I've attached ps output for the
> postgres processes under the make check process to simplify cross
> referencing.
>
> A few interesting things:
> - There's definitely a lock on a relation that seems to be what's
> blocking the processes.
> - When I try to connect with psql the process forks but then hangs
> (see the ps output with task names stuck in "authentication"). I've
> also included a trace from one of these.

Thanks for sharing them!

Many processes are waiting to acquire an LWLock, including the process
trying to output the plan (select1.trace).
I suspect that this is due to a lock that was acquired prior to being
interrupted by ProcessLogQueryPlanInterrupt(), but I have not been able
to reproduce the same situation.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Tue, Sep 5, 2023 at 9:59 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-08-28 22:47, James Coleman wrote: > > On Mon, Aug 28, 2023 at 3:01 AM torikoshia <torikoshia@oss.nttdata.com> > > wrote: > >> > >> On 2023-08-26 21:03, James Coleman wrote: > >> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman <jtc331@gmail.com> wrote: > >> >> > >> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia > >> >> <torikoshia@oss.nttdata.com> wrote: > >> >> > > >> >> > On 2023-06-16 01:34, James Coleman wrote: > >> >> > > Attached is v28 > >> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish > >> >> > > when necessary. Once built with those two patches I'm simply running > >> >> > > `make check`. > >> >> > > >> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume > >> >> > huge > >> >> > amount of memory and under some environments they were terminated by OOM > >> >> > killer. > >> >> > > >> >> > This was because memory was allocated from existing memory contexts and > >> >> > they > >> >> > were not freed after ProcessLogQueryPlanInterrupt(). > >> >> > Updated the patch to use dedicated memory context for > >> >> > ProcessLogQueryPlanInterrupt(). > >> >> > > >> >> > Applying attached patch and v28-0002 patch, `make check` successfully > >> >> > completed after 20min and 50GB of logs on my environment. > >> >> > > >> >> > >>> On 2023-06-15 01:48, James Coleman wrote: > >> >> > >>> > The tests have been running since last night, but have been apparently > >> >> > >>> > hung now for many hours. > >> >> > > >> >> > I don't know if this has anything to do with the hung you faced, but I > >> >> > thought > >> >> > it might be possible that the large amount of memory usage resulted in > >> >> > swapping, which caused a significant delay in processing. > >> >> > >> >> Ah, yes, I think that could be a possible explanation. 
I was delaying > >> >> on this thread because I wasn't comfortable with having caused an > >> >> issue once (even if I couldn't easily reproduce) without at least some > >> >> theory as to the cause (and a fix). > >> >> > >> >> > If possible, I would be very grateful if you could try to reproduce this > >> >> > with > >> >> > the v29 patch. > >> >> > >> >> I'll kick off some testing. > >> >> > >> > > >> > I don't have time to investigate what's happening here, but 24 hours > >> > later the first "make check" is still running, and at first glance it > >> > seems to have the same behavior I'd seen that first time. The test > >> > output is to this point: > >> > > >> > # parallel group (5 tests): index_including create_view > >> > index_including_gist create_index create_index_spgist > >> > ok 66 + create_index 26365 ms > >> > ok 67 + create_index_spgist 27675 ms > >> > ok 68 + create_view 1235 ms > >> > ok 69 + index_including 1102 ms > >> > ok 70 + index_including_gist 1633 ms > >> > # parallel group (16 tests): create_aggregate create_cast errors > >> > roleattributes drop_if_exists hash_func typed_table create_am > >> > infinite_recurse > >> > > >> > and it hasn't progressed past that point since at least ~16 hours ago > >> > (the first several hours of the run I wasn't monitoring it). > >> > > >> > I haven't connected up gdb yet, and won't be able to until maybe > >> > tomorrow, but here's the ps output for postgres processes that are > >> > running: > >> > > >> > admin 3213249 0.0 0.0 196824 20552 ? Ss Aug25 0:00 > >> > /home/admin/postgresql-test/bin/postgres -D > >> > /home/admin/postgresql-test-data > >> > admin 3213250 0.0 0.0 196964 7284 ? Ss Aug25 0:00 > >> > postgres: checkpointer > >> > admin 3213251 0.0 0.0 196956 4276 ? Ss Aug25 0:00 > >> > postgres: background writer > >> > admin 3213253 0.0 0.0 196956 8600 ? Ss Aug25 0:00 > >> > postgres: walwriter > >> > admin 3213254 0.0 0.0 198424 7316 ? 
Ss Aug25 0:00 > >> > postgres: autovacuum launcher > >> > admin 3213255 0.0 0.0 198412 5488 ? Ss Aug25 0:00 > >> > postgres: logical replication launcher > >> > admin 3237967 0.0 0.0 2484 516 pts/4 S+ Aug25 0:00 > >> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" && > >> > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH" > >> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib" > >> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template > >> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check > >> > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 > >> > --schedule=./parallel_schedule > >> > admin 3237973 0.0 0.0 197880 20688 pts/4 S+ Aug25 0:00 > >> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c > >> > listen_addresses= -k /tmp/pg_regress-7mmGUa > >> > admin 3237976 0.0 0.1 198332 44608 ? Ss Aug25 0:00 > >> > postgres: checkpointer > >> > admin 3237977 0.0 0.0 198032 4640 ? Ss Aug25 0:00 > >> > postgres: background writer > >> > admin 3237979 0.0 0.0 197880 8580 ? Ss Aug25 0:00 > >> > postgres: walwriter > >> > admin 3237980 0.0 0.0 199484 7608 ? Ss Aug25 0:00 > >> > postgres: autovacuum launcher > >> > admin 3237981 0.0 0.0 199460 5488 ? Ss Aug25 0:00 > >> > postgres: logical replication launcher > >> > admin 3243644 0.0 0.2 252400 74656 ? Ss Aug25 0:01 > >> > postgres: admin regression [local] SELECT waiting > >> > admin 3243645 0.0 0.1 205480 33992 ? Ss Aug25 0:00 > >> > postgres: admin regression [local] SELECT waiting > >> > admin 3243654 99.9 0.0 203156 31504 ? Rs Aug25 1437:49 > >> > postgres: admin regression [local] VACUUM > >> > admin 3243655 0.0 0.1 212036 38504 ? Ss Aug25 0:00 > >> > postgres: admin regression [local] SELECT waiting > >> > admin 3243656 0.0 0.0 206024 30892 ? 
Ss Aug25 0:00 > >> > postgres: admin regression [local] DELETE waiting > >> > admin 3243657 0.0 0.1 205568 32232 ? Ss Aug25 0:00 > >> > postgres: admin regression [local] ALTER TABLE waiting > >> > admin 3243658 0.0 0.0 203740 21532 ? Ss Aug25 0:00 > >> > postgres: admin regression [local] ANALYZE waiting > >> > admin 3243798 0.0 0.0 199884 8464 ? Ss Aug25 0:00 > >> > postgres: autovacuum worker > >> > admin 3244733 0.0 0.0 199492 5956 ? Ss Aug25 0:00 > >> > postgres: autovacuum worker > >> > admin 3245652 0.0 0.0 199884 8468 ? Ss Aug25 0:00 > >> > postgres: autovacuum worker > >> > > >> > As you can see there are a bunch of backends presumably waiting, and > >> > also the VACUUM process has been pegging a single CPU core for at > >> > least since that ~16 hour ago mark. > >> > > >> > I hope to be able to do more investigation later, but I wanted to at > >> > least give you this information now. > >> > >> Thanks a lot for testing the patch! > >> I really appreciate your cooperation. > >> > >> Hmm, I also tested on the current HEAD(165d581f146b09) again on Ubuntu > >> 22.04 and macOS, but unfortunately(fortunately?) they succeeded as > >> below: > >> > >> ``` > >> $ git apply v29-0001-Add-function-to-log-the-plan-of-the-query.patch > >> $ git apply > >> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch > >> $ ./configure --enable-debug --enable-cassert > >> $ make > >> $ make check > >> > >> ...(snip)... 
> >> > >> # parallel group (5 tests): index_including index_including_gist > >> create_view create_index create_index_spgist > >> ok 66 + create_index 25033 ms > >> ok 67 + create_index_spgist 26144 ms > >> ok 68 + create_view 3061 ms > >> ok 69 + index_including 976 ms > >> ok 70 + index_including_gist 2998 ms > >> # parallel group (16 tests): create_cast errors create_aggregate > >> roleattributes drop_if_exists hash_func typed_table > >> create_am select constraints updatable_views inherit triggers vacuum > >> create_function_sql infinite_recurse > >> ok 71 + create_aggregate 225 ms > >> ok 72 + create_function_sql 18874 ms > >> ok 73 + create_cast 168 ms > >> > >> ...(snip)... > >> > >> # All 215 tests passed. > >> ``` > >> > >> If you notice any difference, I would be grateful if you could let me > >> know. > > > > I've never been able to reproduce it (haven't tested the new version, > > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian > > (first buster and now bullseye). > > > > I'm attaching several stacktraces in the hope that they provide some > > clues. These all match the ps output I sent earlier, though note in > > that output there is both the regress instance and my test instance > > (pid 3213249) running (different ports, of course, and they are from > > the exact same compilation run). I've attached ps output for the > > postgres processes under the make check process to simplify cross > > referencing. > > > > A few interesting things: > > - There's definitely a lock on a relation that seems to be what's > > blocking the processes. > > - When I try to connect with psql the process forks but then hangs > > (see the ps output with task names stuck in "authentication"). I've > > also included a trace from one of these. > > Thanks for sharing them! > > Many processes are waiting to acquire the LW lock, including the process > trying to output the plan(select1.trace). 
> > I suspect that this is due to a lock that was acquired prior to being > interrupted by ProcessLogQueryPlanInterrupt(), but have not been able to > reproduce the same situation.. > I don't have time immediately to dig into this, but perhaps loading up the core dumps would allow us to see what query is running in each backend process (if it hasn't already been discarded by that point) and thereby determine what point in each test process led to the error condition? Alternatively we might be able to apply the same trick to the test client instead... BTW, for my own easy reference in this thread: relid 1259 is pg_class if I'm not mistaken. Thoughts? James Coleman
On 2023-09-06 11:17, James Coleman wrote: >> > I've never been able to reproduce it (haven't tested the new version, >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian >> > (first buster and now bullseye). >> > >> > I'm attaching several stacktraces in the hope that they provide some >> > clues. These all match the ps output I sent earlier, though note in >> > that output there is both the regress instance and my test instance >> > (pid 3213249) running (different ports, of course, and they are from >> > the exact same compilation run). I've attached ps output for the >> > postgres processes under the make check process to simplify cross >> > referencing. >> > >> > A few interesting things: >> > - There's definitely a lock on a relation that seems to be what's >> > blocking the processes. >> > - When I try to connect with psql the process forks but then hangs >> > (see the ps output with task names stuck in "authentication"). I've >> > also included a trace from one of these. >> >> Thanks for sharing them! >> >> Many processes are waiting to acquire the LW lock, including the >> process >> trying to output the plan(select1.trace). >> >> I suspect that this is due to a lock that was acquired prior to being >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able >> to >> reproduce the same situation.. >> > > I don't have time immediately to dig into this, but perhaps loading up > the core dumps would allow us to see what query is running in each > backend process (if it hasn't already been discarded by that point) > and thereby determine what point in each test process led to the error > condition? Thanks for the suggestion. I am concerned that core dumps may not be readable on different operating systems or other environments. (Unfortunately, I do not have Debian on hand) It seems that we can know what queries were running from the stack traces you shared. 
As described above, I suspect a lock which was acquired prior to
ProcessLogQueryPlanInterrupt() caused the issue.
I will try a little more to see if I can devise a way to create the same
situation.

> Alternatively we might be able to apply the same trick to the test
> client instead...
>
> BTW, for my own easy reference in this thread: relid 1259 is pg_class
> if I'm not mistaken.

Yeah, and I think it's strange that the lock on relid 1259 appears twice; that must be avoided.

#10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
..
#49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at explain.c:5158
..
#53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
> On 2023-09-06 11:17, James Coleman wrote:
> It seems that we can know what queries were running from the stack
> traces you shared.
> As described above, I suspect a lock which was acquired prior to
> ProcessLogQueryPlanInterrupt() caused the issue.
> I will try a little more to see if I can devise a way to create the
> same situation.

Hi,

I have explored this patch and, by and large, agree with the code. But
some questions I want to discuss:
1. Maybe add a hook to give a chance for extensions, like
pg_query_state, to do their job?
2. In this implementation, ProcessInterrupts does a lot of work during
the explain creation: memory allocations, a pass through the plan,
systable locks, syscache access, etc. I guess it can change the
semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
called - I can imagine a SELECT query which would call a stored
procedure, which executes some DDL and acquires a row exclusive lock at
the time of interruption. So, in my mind, it is too unpredictable to
build the explain at the place of interruption processing. Maybe it is
better to think about some hook at the end of ExecProcNode, where a
pending explain could be created?

Also, I suggest a minor code change with the diff in attachment.

--
Regards,
Andrei Lepikhov
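The "pending explain" alternative proposed above can be illustrated with a minimal, self-contained sketch. This is not PostgreSQL code: all names (handle_log_plan_signal, maybe_emit_pending_explain, etc.) are illustrative stand-ins. The point is that the signal handler only records the request, and the expensive work happens later at a known-safe point such as the end of an executor node callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the pending-explain idea: the interrupt handler
 * only sets a flag; the expensive work (building the EXPLAIN text)
 * happens later at a safe point. Names are illustrative, not actual
 * PostgreSQL symbols. */

static volatile bool pending_explain = false;
static int explains_emitted = 0;

/* What a signal handler would do: note the request, nothing more. */
static void handle_log_plan_signal(void)
{
    pending_explain = true;
}

/* What the per-node hook would do at a safe point. */
static void maybe_emit_pending_explain(void)
{
    if (pending_explain)
    {
        pending_explain = false;
        explains_emitted++;         /* stands in for emitting the plan */
    }
}

/* Simulated executor loop: the request arrives mid-node, but the
 * explain is only produced once the current node finishes. */
static int run_query_with_signal_at(int node_count, int signal_at)
{
    for (int node = 0; node < node_count; node++)
    {
        if (node == signal_at)
            handle_log_plan_signal();   /* "interrupt" arrives here */
        /* ... node does its work, possibly holding locks ... */
        maybe_emit_pending_explain();   /* safe point after the node */
    }
    return explains_emitted;
}
```

The design trade-off being debated is visible here: the explain is delayed until the node boundary, so the output is slightly less immediate, but it never runs inside an arbitrary CHECK_FOR_INTERRUPTS() call site.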
On 2023-09-15 15:21, Lepikhov Andrei wrote:
> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
>> On 2023-09-06 11:17, James Coleman wrote:
>> It seems that we can know what queries were running from the stack
>> traces you shared.
>> As described above, I suspect a lock which was acquired prior to
>> ProcessLogQueryPlanInterrupt() caused the issue.
>> I will try a little more to see if I can devise a way to create the
>> same situation.
> Hi,
> I have explored this patch and, by and large, agree with the code. But
> some questions I want to discuss:
> 1. Maybe add a hook to give a chance for extensions, like
> pg_query_state, to do their job?

Do you imagine adding a hook which enables adding custom interrupt codes
like below?

https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch

If so, that would be possible, but this patch doesn't require that
functionality, and I feel it'd be better done in an independent patch.

> 2. In this implementation, ProcessInterrupts does a lot of work during
> the explain creation: memory allocations, pass through the plan,
> systables locks, syscache access, etc. I guess it can change the
> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
> called - I can imagine a SELECT query, which would call a stored
> procedure, which executes some DDL and acquires row exclusive lock at
> the time of interruption. So, in my mind, it is too unpredictable to
> make the explain in the place of interruption processing. Maybe it is
> better to think about some hook at the end of ExecProcNode, where a
> pending explain could be created?
Yeah, I withdrew this patch once for that reason, but we are resuming
development in response to the results of a discussion by James and
others at this year's pgcon about the safety of this process in CFI:

https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com

BTW it seems pg_query_state also enables users to get running query
plans using CFI.
Does pg_query_state do something for this safety concern?

> Also, I suggest minor code change with the diff in attachment.

Thanks!

This might be a biased opinion and objections are welcome, but I feel
the original patch is easier to read since PG_RETURN_BOOL(true/false) is
located close to each case.
Also, the existing function pg_log_backend_memory_contexts(), which does
the same thing, has the same form as the original patch.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote:
> On 2023-09-15 15:21, Lepikhov Andrei wrote:
>> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
>> I have explored this patch and, by and large, agree with the code. But
>> some questions I want to discuss:
>> 1. Maybe add a hook to give a chance for extensions, like
>> pg_query_state, to do their job?
>
> Do you imagine adding a hook which enables adding custom interrupt
> codes like below?
>
> https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch

No, I am thinking about a hook which would allow us to rewrite the
pg_query_state extension without additional patches, by using the
functionality provided by your patch. I mean, an extension could provide
a console UI, not only server logging, and obtain from the target
backend as much information in the explain as the instrumentation level
of the current query can give.
It may make pg_query_state shorter and more stable.

>> 2. In this implementation, ProcessInterrupts does a lot of work during
>> the explain creation: memory allocations, pass through the plan,
>> systables locks, syscache access, etc. I guess it can change the
>> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be
>> called - I can imagine a SELECT query, which would call a stored
>> procedure, which executes some DDL and acquires row exclusive lock at
>> the time of interruption. So, in my mind, it is too unpredictable to
>> make the explain in the place of interruption processing. Maybe it is
>> better to think about some hook at the end of ExecProcNode, where a
>> pending explain could be created?
>
> Yeah, I withdrew this patch once for that reason, but we are resuming
> development in response to the results of a discussion by James and
> others at this year's pgcon about the safety of this process in CFI:
>
> https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com

I can't track the logic path of the decision provided at this
conference. But my anxiety relates to the specific place where
ActiveQueryDesc points to the top-level query and the interruption comes
during DDL, wrapped up in a stored procedure. For example:

CREATE TABLE test();
CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
BEGIN
  EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
  ...
END; $$ LANGUAGE plpgsql VOLATILE;
SELECT ddl(), ... FROM ...;

> BTW it seems pg_query_state also enables users to get running query
> plans using CFI.
> Does pg_query_state do something for this safety concern?

No, and I'm looking for a solution which could help to rewrite
pg_query_state as a clean extension, without patches.

>> Also, I suggest minor code change with the diff in attachment.
>
> Thanks!
>
> This might be biased opinion and objections are welcomed, but I feel
> the original patch is easier to read since PG_RETURN_BOOL(true/false)
> is located in near place to each cases.
> Also the existing function pg_log_backend_memory_contexts(), which does
> the same thing, has the same form as the original patch.

I got it, thank you.

--
Regards,
Andrei Lepikhov
On 2023-09-20 14:39, Lepikhov Andrei wrote: Thanks for your reply. > On Tue, Sep 19, 2023, at 8:39 PM, torikoshia wrote: >> On 2023-09-15 15:21, Lepikhov Andrei wrote: >>> On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote: >>> I have explored this patch and, by and large, agree with the code. >>> But >>> some questions I want to discuss: >>> 1. Maybe add a hook to give a chance for extensions, like >>> pg_query_state, to do their job? >> >> Do you imagine adding a hook which enables adding custom interrupt >> codes >> like below? >> >> https://github.com/postgrespro/pg_query_state/blob/master/patches/custom_signals_15.0.patch > > No, I think around the hook, which would allow us to rewrite the > pg_query_state extension without additional patches by using the > functionality provided by your patch. I mean, an extension could > provide console UI, not only server logging. And obtain from target > backend so much information in the explain as the instrumentation > level of the current query can give. > It may make pg_query_state shorter and more stable. > >>> 2. In this implementation, ProcessInterrupts does a lot of work >>> during >>> the explain creation: memory allocations, pass through the plan, >>> systables locks, syscache access, etc. I guess it can change the >>> semantic meaning of 'safe place' where CHECK_FOR_INTERRUPTS can be >>> called - I can imagine a SELECT query, which would call a stored >>> procedure, which executes some DDL and acquires row exclusive lock at >>> the time of interruption. So, in my mind, it is too unpredictable to >>> make the explain in the place of interruption processing. Maybe it is >>> better to think about some hook at the end of ExecProcNode, where a >>> pending explain could be created? 
>> >> Yeah, I withdrew this patch once for that reason, but we are resuming >> development in response to the results of a discussion by James and >> others at this year's pgcon about the safety of this process in CFI: >> >> https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com > > I can't track the logic path of the decision provided at this > conference. But my anxiety related to specific place, where > ActiveQueryDesc points top-level query and interruption comes during > DDL, wrapped up in stored procedure. For example: > CREATE TABLE test(); > CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$ > BEGIN > EXECUTE format('ALTER TABLE test ADD COLUMN x integer;'); > ... > END; $$ LANGUAGE plpgsql VOLATILE; > SELECT ddl(), ... FROM ...; Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on all CFI using v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then ran the following query but did not cause any problems. ``` =# CREATE TABLE test(); =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$ BEGIN EXECUTE format('ALTER TABLE test ADD COLUMN x integer;'); PERFORM pg_sleep(5); END; $$ LANGUAGE plpgsql VOLATILE; =# SELECT ddl(); ``` Is this the case you're worrying about? -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
On 25/9/2023 14:21, torikoshia wrote:
> On 2023-09-20 14:39, Lepikhov Andrei wrote:
> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on
> all CFI using
> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then
> ran the following query but did not cause any problems.
>
> ```
> =# CREATE TABLE test();
> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
> BEGIN
>   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>   PERFORM pg_sleep(5);
> END; $$ LANGUAGE plpgsql VOLATILE;
> =# SELECT ddl();
> ```
>
> Is this the case you're worrying about?

I didn't find a problem either. I just feel uncomfortable if, at the
moment of interruption, we have a descriptor of a different query than
the one that has been executing and holding resources.

--
regards,
Andrey Lepikhov
Postgres Professional
On 2023-09-25 18:49, Andrey Lepikhov wrote:
> On 25/9/2023 14:21, torikoshia wrote:
>> On 2023-09-20 14:39, Lepikhov Andrei wrote:
>> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on
>> all CFI using
>> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and then
>> ran the following query but did not cause any problems.
>>
>> ```
>> =# CREATE TABLE test();
>> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
>> BEGIN
>>   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>>   PERFORM pg_sleep(5);
>> END; $$ LANGUAGE plpgsql VOLATILE;
>> =# SELECT ddl();
>> ```
>>
>> Is this the case you're worrying about?
>
> I didn't find a problem either. I just feel uncomfortable if, at the
> moment of interruption, we have a descriptor of another query than the
> query have been executing and holding resources.

I think that "descriptor" here refers to ActiveQueryDesc, in which case
it is updated at the beginning of ExecutorRun(), so I am wondering if
the situation you're worried about would not occur.

---------
@@ -303,10 +306,21 @@
 ExecutorRun(QueryDesc *queryDesc, ScanDirection direction,
             uint64 count, bool execute_once)
 {
+    /*
+     * Update ActiveQueryDesc here to enable retrieval of the currently
+     * running queryDesc for nested queries.
+     */
+    QueryDesc *save_ActiveQueryDesc;
+
+    save_ActiveQueryDesc = ActiveQueryDesc;
+    ActiveQueryDesc = queryDesc;
---------

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
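The save/restore discipline in the quoted ExecutorRun() hunk can be sketched in isolation. This is a simplified, self-contained simulation, not PostgreSQL code: the QueryDesc type and executor_run() here are stand-ins that only model the bookkeeping, namely that each nesting level stashes the previous ActiveQueryDesc on entry and restores it on exit, so an interrupt always observes the innermost running query.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for PostgreSQL's QueryDesc. */
typedef struct QueryDesc
{
    const char *source_text;
} QueryDesc;

/* Stand-in for the global the patch introduces. */
static QueryDesc *ActiveQueryDesc = NULL;

/* Simulated ExecutorRun(): saves the previous ActiveQueryDesc, points it
 * at the current query, recurses for nested execution, and restores on
 * the way out. Returns what an "interrupt" arriving at the innermost
 * level would see. */
static const char *executor_run(QueryDesc *qd, int nested_levels,
                                QueryDesc *inner)
{
    QueryDesc  *save_ActiveQueryDesc = ActiveQueryDesc;
    const char *seen;

    ActiveQueryDesc = qd;
    if (nested_levels > 0)
        seen = executor_run(inner, nested_levels - 1, inner);
    else
        seen = ActiveQueryDesc->source_text;  /* "interrupt" fires here */
    ActiveQueryDesc = save_ActiveQueryDesc;   /* restore on exit */
    return seen;
}
```

As the follow-up message points out, this bookkeeping only helps when ExecutorRun() is actually entered; utility statements that never reach the executor are not covered by it.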
On 28/9/2023 09:04, torikoshia wrote:
> On 2023-09-25 18:49, Andrey Lepikhov wrote:
>> On 25/9/2023 14:21, torikoshia wrote:
>>> On 2023-09-20 14:39, Lepikhov Andrei wrote:
>>> Hmm, as a test, I made sure to call ProcessLogQueryPlanInterrupt() on
>>> all CFI using
>>> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, and
>>> then ran the following query but did not cause any problems.
>>>
>>> ```
>>> =# CREATE TABLE test();
>>> =# CREATE OR REPLACE FUNCTION ddl() RETURNS void AS $$
>>> BEGIN
>>>   EXECUTE format('ALTER TABLE test ADD COLUMN x integer;');
>>>   PERFORM pg_sleep(5);
>>> END; $$ LANGUAGE plpgsql VOLATILE;
>>> =# SELECT ddl();
>>> ```
>>>
>>> Is this the case you're worrying about?
>>
>> I didn't find a problem either. I just feel uncomfortable if, at the
>> moment of interruption, we have a descriptor of another query than the
>> query have been executing and holding resources.
>
> I think that "descriptor" here refers to ActiveQueryDesc, in which case
> it is updated at the beginning of ExecutorRun(), so I am wondering if
> the situation you're worried about would not occur.

As you can see, in my example we have only DDL and no queries with
plans. In this case postgres doesn't call ExecutorRun() at all, simply
because there is no plan. But locks will still be obtained.

--
regards,
Andrey Lepikhov
Postgres Professional
On Thu, Sep 7, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-09-06 11:17, James Coleman wrote: > > >> > I've never been able to reproduce it (haven't tested the new version, > >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian > >> > (first buster and now bullseye). > >> > > >> > I'm attaching several stacktraces in the hope that they provide some > >> > clues. These all match the ps output I sent earlier, though note in > >> > that output there is both the regress instance and my test instance > >> > (pid 3213249) running (different ports, of course, and they are from > >> > the exact same compilation run). I've attached ps output for the > >> > postgres processes under the make check process to simplify cross > >> > referencing. > >> > > >> > A few interesting things: > >> > - There's definitely a lock on a relation that seems to be what's > >> > blocking the processes. > >> > - When I try to connect with psql the process forks but then hangs > >> > (see the ps output with task names stuck in "authentication"). I've > >> > also included a trace from one of these. > >> > >> Thanks for sharing them! > >> > >> Many processes are waiting to acquire the LW lock, including the > >> process > >> trying to output the plan(select1.trace). > >> > >> I suspect that this is due to a lock that was acquired prior to being > >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able > >> to > >> reproduce the same situation.. > >> > > > > I don't have time immediately to dig into this, but perhaps loading up > > the core dumps would allow us to see what query is running in each > > backend process (if it hasn't already been discarded by that point) > > and thereby determine what point in each test process led to the error > > condition? > > Thanks for the suggestion. > I am concerned that core dumps may not be readable on different > operating systems or other environments. 
(Unfortunately, I do not have > Debian on hand) > > It seems that we can know what queries were running from the stack > traces you shared. > As described above, I suspect a lock which was acquired prior to > ProcessLogQueryPlanInterrupt() caused the issue. > I will try a little more to see if I can devise a way to create the same > situation. > > > Alternatively we might be able to apply the same trick to the test > > client instead... > > > > BTW, for my own easy reference in this thread: relid 1259 is pg_class > > if I'm not mistaken. > > Yeah, and I think it's strange that the lock to 1259 appears twice and > must be avoided. > > #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at > lmgr.c:117 > .. > #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at > explain.c:5158 > .. > #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) at > lmgr.c:117 I chatted with Andres and David about this at PGConf.NYC, and I think what we need to do is explicitly disallow running this code any time we are inside of lock acquisition code. Regards, James Coleman
On 2023-10-04 03:00, James Coleman wrote: > On Thu, Sep 7, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On 2023-09-06 11:17, James Coleman wrote: >> >> >> > I've never been able to reproduce it (haven't tested the new version, >> >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian >> >> > (first buster and now bullseye). >> >> > >> >> > I'm attaching several stacktraces in the hope that they provide some >> >> > clues. These all match the ps output I sent earlier, though note in >> >> > that output there is both the regress instance and my test instance >> >> > (pid 3213249) running (different ports, of course, and they are from >> >> > the exact same compilation run). I've attached ps output for the >> >> > postgres processes under the make check process to simplify cross >> >> > referencing. >> >> > >> >> > A few interesting things: >> >> > - There's definitely a lock on a relation that seems to be what's >> >> > blocking the processes. >> >> > - When I try to connect with psql the process forks but then hangs >> >> > (see the ps output with task names stuck in "authentication"). I've >> >> > also included a trace from one of these. >> >> >> >> Thanks for sharing them! >> >> >> >> Many processes are waiting to acquire the LW lock, including the >> >> process >> >> trying to output the plan(select1.trace). >> >> >> >> I suspect that this is due to a lock that was acquired prior to being >> >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able >> >> to >> >> reproduce the same situation.. >> >> >> > >> > I don't have time immediately to dig into this, but perhaps loading up >> > the core dumps would allow us to see what query is running in each >> > backend process (if it hasn't already been discarded by that point) >> > and thereby determine what point in each test process led to the error >> > condition? >> >> Thanks for the suggestion. 
>> I am concerned that core dumps may not be readable on different >> operating systems or other environments. (Unfortunately, I do not have >> Debian on hand) >> >> It seems that we can know what queries were running from the stack >> traces you shared. >> As described above, I suspect a lock which was acquired prior to >> ProcessLogQueryPlanInterrupt() caused the issue. >> I will try a little more to see if I can devise a way to create the >> same >> situation. >> >> > Alternatively we might be able to apply the same trick to the test >> > client instead... >> > >> > BTW, for my own easy reference in this thread: relid 1259 is pg_class >> > if I'm not mistaken. >> >> Yeah, and I think it's strange that the lock to 1259 appears twice and >> must be avoided. >> >> #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) >> at >> lmgr.c:117 >> .. >> #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at >> explain.c:5158 >> .. >> #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) >> at >> lmgr.c:117 > > I chatted with Andres and David about this at PGConf.NYC, Thanks again for the discussion with hackers! > and I think > what we need to do is explicitly disallow running this code any time > we are inside of lock acquisition code. Yeah, I think it's a straightforward workaround. And I'm also concerned that the condition of being in the process of acquiring some kind of lock is too strict and will make it almost impossible to output a running plan. Anyway I'm going to implement it and run pg_log_query_plan() while the regression test is running to see how successful the plan output is. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
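The interlock James describes (refusing to run the plan-logging code while inside lock acquisition) can be sketched with a self-contained simulation. None of these names are PostgreSQL's; they are illustrative stand-ins. The idea is that lock-acquisition code bumps a depth counter, and the interrupt processor leaves the request pending whenever that counter is nonzero, which also shows the trade-off torikoshia raises: requests arriving at a bad moment are deferred, not lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch, not PostgreSQL code. */

static int  in_lock_acquisition = 0;   /* nesting depth of lock code */
static bool log_plan_pending = false;  /* a pg_log_query_plan() request */
static int  plans_logged = 0;

/* The interrupt processor refuses to run inside lock acquisition,
 * leaving the request pending for the next safe interrupt check. */
static void process_log_plan_interrupt(void)
{
    if (in_lock_acquisition > 0)
        return;                 /* unsafe here: keep the request pending */
    log_plan_pending = false;
    plans_logged++;             /* stands in for emitting the plan */
}

/* Stand-in for CHECK_FOR_INTERRUPTS(). */
static void check_for_interrupts(void)
{
    if (log_plan_pending)
        process_log_plan_interrupt();
}

/* Stand-in for lock acquisition code, which may itself reach an
 * interrupt check; the guard prevents re-entering lock-taking code
 * from the plan logger, avoiding the double LockRelationOid() seen
 * in the stack trace above. */
static void lock_relation_oid(void)
{
    in_lock_acquisition++;
    check_for_interrupts();     /* must NOT recurse into plan logging */
    in_lock_acquisition--;
}
```

Because the request merely stays pending, a plan is still produced at the next interrupt check outside lock code, which is one answer to the worry that the guard might make output nearly impossible.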
On Fri, Oct 6, 2023 at 8:58 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-10-04 03:00, James Coleman wrote: > > On Thu, Sep 7, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> > > wrote: > >> > >> On 2023-09-06 11:17, James Coleman wrote: > >> > >> >> > I've never been able to reproduce it (haven't tested the new version, > >> >> > but v28 at least) on my M1 Mac; where I've reproduced it is on Debian > >> >> > (first buster and now bullseye). > >> >> > > >> >> > I'm attaching several stacktraces in the hope that they provide some > >> >> > clues. These all match the ps output I sent earlier, though note in > >> >> > that output there is both the regress instance and my test instance > >> >> > (pid 3213249) running (different ports, of course, and they are from > >> >> > the exact same compilation run). I've attached ps output for the > >> >> > postgres processes under the make check process to simplify cross > >> >> > referencing. > >> >> > > >> >> > A few interesting things: > >> >> > - There's definitely a lock on a relation that seems to be what's > >> >> > blocking the processes. > >> >> > - When I try to connect with psql the process forks but then hangs > >> >> > (see the ps output with task names stuck in "authentication"). I've > >> >> > also included a trace from one of these. > >> >> > >> >> Thanks for sharing them! > >> >> > >> >> Many processes are waiting to acquire the LW lock, including the > >> >> process > >> >> trying to output the plan(select1.trace). > >> >> > >> >> I suspect that this is due to a lock that was acquired prior to being > >> >> interrupted by ProcessLogQueryPlanInterrupt(), but have not been able > >> >> to > >> >> reproduce the same situation.. 
> >> >> > >> > > >> > I don't have time immediately to dig into this, but perhaps loading up > >> > the core dumps would allow us to see what query is running in each > >> > backend process (if it hasn't already been discarded by that point) > >> > and thereby determine what point in each test process led to the error > >> > condition? > >> > >> Thanks for the suggestion. > >> I am concerned that core dumps may not be readable on different > >> operating systems or other environments. (Unfortunately, I do not have > >> Debian on hand) > >> > >> It seems that we can know what queries were running from the stack > >> traces you shared. > >> As described above, I suspect a lock which was acquired prior to > >> ProcessLogQueryPlanInterrupt() caused the issue. > >> I will try a little more to see if I can devise a way to create the > >> same > >> situation. > >> > >> > Alternatively we might be able to apply the same trick to the test > >> > client instead... > >> > > >> > BTW, for my own easy reference in this thread: relid 1259 is pg_class > >> > if I'm not mistaken. > >> > >> Yeah, and I think it's strange that the lock to 1259 appears twice and > >> must be avoided. > >> > >> #10 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) > >> at > >> lmgr.c:117 > >> .. > >> #49 0x0000559d61b4989d in ProcessLogQueryPlanInterrupt () at > >> explain.c:5158 > >> .. > >> #53 0x0000559d61d8ee6e in LockRelationOid (relid=1259, lockmode=1) > >> at > >> lmgr.c:117 > > > > I chatted with Andres and David about this at PGConf.NYC, > Thanks again for the discussion with hackers! > > > and I think > > what we need to do is explicitly disallow running this code any time > > we are inside of lock acquisition code. > > Yeah, I think it's a straightforward workaround. > And I'm also concerned that the condition of being in the process > of acquiring some kind of lock is too strict and will make it almost > impossible to output a running plan. 
I was concerned about this too, but I was wondering if we might be able to cheat a bit by making such a case not clear the flag but instead just skip it for now. Regards, James
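The "skip it for now" idea can be sketched in a few lines of C. This is not PostgreSQL source and every name here is invented for illustration: a signal handler sets a pending flag, and at each safe interrupt-processing point the backend either handles the request or, if it is in an unsafe state such as lock acquisition, leaves the flag set so the request is retried later instead of being dropped.

```c
#include <stdbool.h>

/* Illustrative sketch only -- names are made up, not PostgreSQL code. */
bool log_plan_pending = false;        /* set asynchronously by a signal handler */
bool inside_lock_acquisition = false; /* stand-in for the real "unsafe" state */

/* Returns true if the plan was logged now, false otherwise. */
bool
maybe_log_query_plan(void)
{
    if (!log_plan_pending)
        return false;
    if (inside_lock_acquisition)
        return false;           /* skip for now; flag stays set for a retry */
    log_plan_pending = false;   /* safe point reached: consume the request */
    return true;                /* ...and the plan would be logged here */
}
```

The point of the sketch is that an unsafe window never loses the request; it is simply deferred to the next safe check.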
Hi,

On 2023-10-06 21:58:46 +0900, torikoshia wrote:
> Yeah, I think it's a straightforward workaround.
> And I'm also concerned that the condition of being in the process
> of acquiring some kind of lock is too strict and will make it almost
> impossible to output a running plan.

How so? We shouldn't commonly acquire relevant locks while executing a query? With a few exceptions, they should instead be acquired at the start of query processing. We do acquire a lot of lwlocks, obviously, but we don't process interrupts during the acquisition / holding of lwlocks.

And presumably the interrupt would just be processed the next time interrupt processing is happening?

Greetings,

Andres Freund
On 2023-10-04 03:00, James Coleman wrote:
> and I think
> what we need to do is explicitly disallow running this code any time
> we are inside of lock acquisition code.

Updated the patch to check whether any locks have already been acquired, by examining MyProc->heldLocks.
I'm not sure this change can "disallow running this code `any time` we are inside of lock acquisition code", but as far as the select1.trace you shared goes, I believe it can prevent running the explain code, since LockAcquireExtended() must have set MyProc->heldLocks before WaitOnLock():

```
/*
 * Set bitmask of locks this process already holds on this object.
 */
MyProc->heldLocks = proclock->holdMask;
..(snip)..
WaitOnLock(locallock, owner);
```

On 2023-10-07 00:58, Andres Freund wrote:
> How so? We shouldn't commonly acquire relevant locks while executing a
> query?
> With a few exceptions, they should instead be acquired t the start of
> query processing. We do acquire a lot of lwlocks, obviously, but we
> don't process interrupts during the acquisition / holding of lwlocks.
>
> And presumably the interrupt would just be processed the next time
> interrupt processing is happening?

Thanks for your comments!

I tested the v30 patch with v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch, which makes CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that very few logging queries failed with the v30 patch.

This is a result in line with your prediction.

```
$ rg -c'ignored request for logging query plan due to lock confilcts' postmaster.log
8
```

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
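The heldLocks guard described above can be reduced to a tiny sketch. This is not the patch's actual code; the struct and function names are invented: if the process has already recorded any heavyweight-lock bits (as LockAcquireExtended() does in MyProc->heldLocks before calling WaitOnLock()), the interrupt handler refuses to run the explain machinery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented stand-in for the relevant field of PGPROC. */
typedef struct FakePGPROC
{
    uint32_t heldLocks;         /* bitmask of lock modes already held */
} FakePGPROC;

/* Refuse to log the plan whenever any lock bits are already set,
 * since that may mean we are inside lock acquisition code. */
bool
safe_to_log_plan(const FakePGPROC *proc)
{
    return proc->heldLocks == 0;
}
```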
Attachment
Hi,

On Tue, Oct 10, 2023 at 7:00 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Thanks for your comments!
>
> I tested v30 patch with
> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch which makes
> CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that
> very few logging queries failed with v30 patch.
>
> This is a result in line with your prediction.
>
> ```
> $ rg -c'ignored request for logging query plan due to lock confilcts'
> postmaster.log
> 8

Like many others, I think this feature is useful to debug a long running query.

Sorry for jumping late into this.

I have a few high-level comments.

There is a lot of similarity between what this feature does and what auto_explain does. I see the code is also duplicated. There is some merit in avoiding this duplication:
1. we will get all the features of auto_explain automatically, like choosing a format (this was expressed by somebody earlier in this thread), settings, etc.
2. avoid bugs. E.g. your code switches context after ExplainState has been allocated. These states may leak depending upon when this function gets called.
3. Building features on top as James envisions will be easier.

Considering the similarity with auto_explain, I wondered whether this function should be part of the auto_explain contrib module itself. If we do that, users will need to load the auto_explain extension and thus install executor hooks, which this function doesn't need. So that may not be such a good idea. I didn't see any discussion on this.

I tried the following query to pass the PID of a non-client backend to this function.

#select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher';
 pg_log_query_plan | application_name |    backend_type
-------------------+------------------+---------------------
 t                 |                  | autovacuum launcher
(1 row)

I see "LOG: backend with PID 2733631 is not running a query or a subtransaction is aborted" in server logs. That's ok.
But maybe we should not send a signal to these kinds of backends at all, thus avoiding some system calls.

I am also wondering whether it's better to report the WARNING as a status column in the output. E.g. instead of

#select pg_log_query_plan(100);
WARNING:  PID 100 is not a PostgreSQL backend process
 pg_log_query_plan
-------------------
 f
(1 row)

we output

#select pg_log_query_plan(100);
 pg_log_query_plan |                   status
-------------------+---------------------------------------------
 f                 | PID 100 is not a PostgreSQL backend process
(1 row)

That looks neater and can easily be handled by scripts, applications and such. But it will be inconsistent with other functions like pg_terminate_backend() and pg_log_backend_memory_contexts().

I do share a concern that was discussed earlier. If a query is running longer, there's something problematic with it. A diagnostic intervention breaking it further would be unwelcome. James has run experiments to shake this code for any loose breakages. He has not found any. So maybe we are good. And we wouldn't know about very rare corner cases so easily without using it in the field. So I am fine with it. If we could add some safety net, that would be great, but it may not be necessary for the first cut.

--
Best Wishes,
Ashutosh Bapat
On 2023-10-11 16:22, Ashutosh Bapat wrote:
> Like many others I think this feature is useful to debug a long running
> query.
>
> Sorry for jumping late into this.
>
> I have a few of high level comments

Thanks for your comments!

> There is a lot of similarity between what this feature does and what
> auto explain does. I see the code is also duplicated. There is some
> merit in avoiding this duplication
> 1. we will get all the features of auto_explain automatically like
> choosing a format (this was expressed somebody earlier in this
> thread), setings etc.
> 2. avoid bugs. E.g your code switches context after ExplainState has
> been allocated. These states may leak depending upon when this
> function gets called.
> 3. Building features on top as James envisions will be easier.
>
> Considering the similarity with auto_explain I wondered whether this
> function should be part of auto_explain contrib module itself? If we
> do that users will need to load auto_explain extension and thus
> install executor hooks when this function doesn't need those. So may
> not be such a good idea. I didn't see any discussion on this.

I once thought about adding this to auto_explain, but I left it as is for the reasons below:

- One of the typical use cases of pg_log_query_plan() would be analyzing a slow query on customer environments. In such environments, we cannot always control what extensions to install.
  Of course auto_explain is a major extension and it is quite possible that they installed auto_explain, but it is also possible they did not.
- It seems a bit counter-intuitive that pg_log_query_plan() is in an extension called auto_explain, since it `manually` logs plans.

> I tried following query to pass PID of a non-client backend to this
> function.
> #select pg_log_query_plan(pid), application_name, backend_type from
> pg_stat_activity where backend_type = 'autovacuum launcher';
> pg_log_query_plan | application_name | backend_type
> -------------------+------------------+---------------------
> t | | autovacuum launcher
> (1 row)
> I see "LOG: backend with PID 2733631 is not running a query or a
> subtransaction is aborted" in server logs. That's ok. But may be we
> should not send signal to these kinds of backends at all, thus
> avoiding some system calls.

Agreed, it seems better.
The attached patch checks whether the backendType of the target process is 'client backend'.

=# select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher';
WARNING:  PID 63323 is not a PostgreSQL client backend process
 pg_log_query_plan | application_name |    backend_type
-------------------+------------------+---------------------
 f                 |                  | autovacuum launcher

> I am also wondering whether it's better to report the WARNING as
> status column in the output. E.g. instead of
> #select pg_log_query_plan(100);
> WARNING: PID 100 is not a PostgreSQL backend process
> pg_log_query_plan
> -------------------
> f
> (1 row)
> we output
> #select pg_log_query_plan(100);
> pg_log_query_plan | status
> -------------------+---------------------------------------------
> f | PID 100 is not a PostgreSQL backend process
> (1 row)
>
> That looks neater and can easily be handled by scripts, applications
> and such. But it will be inconsistent with other functions like
> pg_terminate_backend() and pg_log_backend_memory_contexts().

It seems neater, but it might be inconvenient because we can no longer use it in a select list like the following query, as you wrote:

#select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher';

> I do share a concern that was discussed earlier.
If a query is running > longer, there's something problematic with it. A diagnostic > intervention breaking it further would be unwelcome. James has run > experiments to shake this code for any loose breakages. He has not > found any. So may be we are good. And we wouldn't know about very rare > corner cases so easily without using it in the field. So fine with it. > If we could add some safety net that will be great but may not be > necessary for the first cut. If there are candidates for the safety net, I'm willing to add them. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Attachment
On Thu, Oct 12, 2023 at 6:51 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-10-11 16:22, Ashutosh Bapat wrote: > > > > Considering the similarity with auto_explain I wondered whether this > > function should be part of auto_explain contrib module itself? If we > > do that users will need to load auto_explain extension and thus > > install executor hooks when this function doesn't need those. So may > > not be such a good idea. I didn't see any discussion on this. > > I once thought about adding this to auto_explain, but I left it asis for > below reasons: > > - One of the typical use case of pg_log_query_plan() would be analyzing > slow query on customer environments. On such environments, We cannot > always control what extensions to install. The same argument applies to auto_explain functionality as well. But it's not part of the core. > Of course auto_explain is a major extension and it is quite possible > that they installed auto_explain, but but it is also possible they do > not. > - It seems a bit counter-intuitive that pg_log_query_plan() is in an > extension called auto_explain, since it `manually`` logs plans > pg_log_query_plan() may not fit auto_explain but pg_explain_backend_query() does. What we are logging is more than just plan of the query, it might expand to be closer to explain output. While auto in auto_explain would refer to its automatically logging explain outputs, it can provide an additional function which provides similar functionality by manually triggering it. But we can defer this to a committer, if you want. I am more interested in avoiding the duplication of code, esp. the first comment in my reply >> There is a lot of similarity between what this feature does and what >> auto explain does. I see the code is also duplicated. There is some >> merit in avoiding this duplication >> 1. 
we will get all the features of auto_explain automatically like >> choosing a format (this was expressed somebody earlier in this >> thread), setings etc. >> 2. avoid bugs. E.g your code switches context after ExplainState has >> been allocated. These states may leak depending upon when this >> function gets called. >> 3. Building features on top as James envisions will be easier. > > =# select pg_log_query_plan(pid), application_name, backend_type from > pg_stat_activity where backend_type = 'autovacuum launcher'; > WARNING: PID 63323 is not a PostgreSQL client backend process > pg_log_query_plan | application_name | backend_type > -------------------+------------------+--------------------- > f | | autovacuum launcher > > > > I am also wondering whether it's better to report the WARNING as > > status column in the output. E.g. instead of > > #select pg_log_query_plan(100); > > WARNING: PID 100 is not a PostgreSQL backend process > > pg_log_query_plan > > ------------------- > > f > > (1 row) > > we output > > #select pg_log_query_plan(100); > > pg_log_query_plan | status > > -------------------+--------------------------------------------- > > f | PID 100 is not a PostgreSQL backend process > > (1 row) > > > > That looks neater and can easily be handled by scripts, applications > > and such. But it will be inconsistent with other functions like > > pg_terminate_backend() and pg_log_backend_memory_contexts(). > > It seems neater, but it might be inconvenient because we can no longer > use it in select list like the following query as you wrote: > > #select pg_log_query_plan(pid), application_name, backend_type from > pg_stat_activity where backend_type = 'autovacuum launcher'; Why is that? -- Best Wishes, Ashutosh Bapat
On 2023-10-16 18:46, Ashutosh Bapat wrote: > On Thu, Oct 12, 2023 at 6:51 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On 2023-10-11 16:22, Ashutosh Bapat wrote: >> > >> > Considering the similarity with auto_explain I wondered whether this >> > function should be part of auto_explain contrib module itself? If we >> > do that users will need to load auto_explain extension and thus >> > install executor hooks when this function doesn't need those. So may >> > not be such a good idea. I didn't see any discussion on this. >> >> I once thought about adding this to auto_explain, but I left it asis >> for >> below reasons: >> >> - One of the typical use case of pg_log_query_plan() would be >> analyzing >> slow query on customer environments. On such environments, We cannot >> always control what extensions to install. > > The same argument applies to auto_explain functionality as well. But > it's not part of the core. Yeah, and when we have a situation where we want to run pg_log_query_plan(), we can run it in any environment as long as it is bundled with the core. On the other hand, if it is built into auto_explain, we need to start by installing auto_explain if we do not have auto_explain, which is often difficult to do in production environments. >> Of course auto_explain is a major extension and it is quite >> possible >> that they installed auto_explain, but but it is also possible they do >> not. >> - It seems a bit counter-intuitive that pg_log_query_plan() is in an >> extension called auto_explain, since it `manually`` logs plans >> > > pg_log_query_plan() may not fit auto_explain but > pg_explain_backend_query() does. What we are logging is more than just > plan of the query, it might expand to be closer to explain output. > While auto in auto_explain would refer to its automatically logging > explain outputs, it can provide an additional function which provides > similar functionality by manually triggering it. 
> > But we can defer this to a committer, if you want. > > I am more interested in avoiding the duplication of code, esp. the > first comment in my reply If there are no objections, I will try porting it to auto_explain and see its feasibility. >>> There is a lot of similarity between what this feature does and what >>> auto explain does. I see the code is also duplicated. There is some >>> merit in avoiding this duplication >>> 1. we will get all the features of auto_explain automatically like >>> choosing a format (this was expressed somebody earlier in this >>> thread), setings etc. >>> 2. avoid bugs. E.g your code switches context after ExplainState has >>> been allocated. These states may leak depending upon when this >>> function gets called. >>> 3. Building features on top as James envisions will be easier. > >> >> =# select pg_log_query_plan(pid), application_name, backend_type >> from >> pg_stat_activity where backend_type = 'autovacuum launcher'; >> WARNING: PID 63323 is not a PostgreSQL client backend process >> pg_log_query_plan | application_name | backend_type >> -------------------+------------------+--------------------- >> f | | autovacuum launcher >> >> >> > I am also wondering whether it's better to report the WARNING as >> > status column in the output. E.g. instead of >> > #select pg_log_query_plan(100); >> > WARNING: PID 100 is not a PostgreSQL backend process >> > pg_log_query_plan >> > ------------------- >> > f >> > (1 row) >> > we output >> > #select pg_log_query_plan(100); >> > pg_log_query_plan | status >> > -------------------+--------------------------------------------- >> > f | PID 100 is not a PostgreSQL backend process >> > (1 row) >> > >> > That looks neater and can easily be handled by scripts, applications >> > and such. But it will be inconsistent with other functions like >> > pg_terminate_backend() and pg_log_backend_memory_contexts(). 
>> >> It seems neater, but it might be inconvenient because we can no longer >> use it in select list like the following query as you wrote: >> >> #select pg_log_query_plan(pid), application_name, backend_type from >> pg_stat_activity where backend_type = 'autovacuum launcher'; > > Why is that? Sorry, I misunderstood and confirmed we can run queries like below: ``` =# CREATE OR REPLACE FUNCTION pg_log_query_plan_stab(i int) RETURNS TABLE( pg_log_query_plan bool, status text ) AS $$ DECLARE BEGIN RETURN QUERY SELECT false::bool, 'PID 100 is not a PostgreSQL backend process'::text; END; $$ LANGUAGE plpgsql; =# select pg_log_query_plan_stab(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher'; pg_log_query_plan_stab | application_name | backend_type ---------------------------------------------------+------------------+--------------------- (f,"PID 100 is not a PostgreSQL backend process") | | autovacuum launcher ``` -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
On Wed, Oct 18, 2023 at 11:39 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > > > I am more interested in avoiding the duplication of code, esp. the > > first comment in my reply > > If there are no objections, I will try porting it to auto_explain and > see its feasibility. If you want it in core, you will need to port relevant parts of auto_explain code to core. -- Best Wishes, Ashutosh Bapat
On Wed, Oct 18, 2023 at 2:09 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2023-10-16 18:46, Ashutosh Bapat wrote: > > On Thu, Oct 12, 2023 at 6:51 PM torikoshia <torikoshia@oss.nttdata.com> > > wrote: > >> > >> On 2023-10-11 16:22, Ashutosh Bapat wrote: > >> > > >> > Considering the similarity with auto_explain I wondered whether this > >> > function should be part of auto_explain contrib module itself? If we > >> > do that users will need to load auto_explain extension and thus > >> > install executor hooks when this function doesn't need those. So may > >> > not be such a good idea. I didn't see any discussion on this. > >> > >> I once thought about adding this to auto_explain, but I left it asis > >> for > >> below reasons: > >> > >> - One of the typical use case of pg_log_query_plan() would be > >> analyzing > >> slow query on customer environments. On such environments, We cannot > >> always control what extensions to install. > > > > The same argument applies to auto_explain functionality as well. But > > it's not part of the core. > > Yeah, and when we have a situation where we want to run > pg_log_query_plan(), we can run it in any environment as long as it is > bundled with the core. > On the other hand, if it is built into auto_explain, we need to start by > installing auto_explain if we do not have auto_explain, which is often > difficult to do in production environments. > > >> Of course auto_explain is a major extension and it is quite > >> possible > >> that they installed auto_explain, but but it is also possible they do > >> not. > >> - It seems a bit counter-intuitive that pg_log_query_plan() is in an > >> extension called auto_explain, since it `manually`` logs plans > >> > > > > pg_log_query_plan() may not fit auto_explain but > > pg_explain_backend_query() does. What we are logging is more than just > > plan of the query, it might expand to be closer to explain output. 
> > While auto in auto_explain would refer to its automatically logging > > explain outputs, it can provide an additional function which provides > > similar functionality by manually triggering it. > > > > But we can defer this to a committer, if you want. > > > > I am more interested in avoiding the duplication of code, esp. the > > first comment in my reply > > If there are no objections, I will try porting it to auto_explain and > see its feasibility. > > >>> There is a lot of similarity between what this feature does and what > >>> auto explain does. I see the code is also duplicated. There is some > >>> merit in avoiding this duplication > >>> 1. we will get all the features of auto_explain automatically like > >>> choosing a format (this was expressed somebody earlier in this > >>> thread), setings etc. > >>> 2. avoid bugs. E.g your code switches context after ExplainState has > >>> been allocated. These states may leak depending upon when this > >>> function gets called. > >>> 3. Building features on top as James envisions will be easier. In my view the fact that auto_explain is itself not part of core is a mistake, and I know there are more prominent hackers than myself who hold that view. While that decision as regards auto_explain has long since been made (and there would be work to undo it), I don't believe that we should repeat that choice here. I'm -10 on moving this into auto_explain. However perhaps there is still an opportunity for moving some of the auto_explain code into core so as to enable deduplicating the code. Regards, James Coleman
> Hi,
>
> Yeah, and when we have a situation where we want to run
> pg_log_query_plan(), we can run it in any environment as long as it
> is bundled with the core.

Is it possible to get the plan of a backend programmatically, without access to the logs?

Something like pg_get_query_plan(pid, format) whose output would be the same as EXPLAIN's.

Regards,
Étienne BERSAC
Dalibo
On 2023-10-24 16:12, Étienne BERSAC wrote:
>> Hi,
>>
>> Yeah, and when we have a situation where we want to run
>> pg_log_query_plan(), we can run it in any environment as long as it
>> is bundled with the core.
>
> Is it possible to get the plan of a backend programmatically without
> access to the logs?
>
> Something like pg_get_query_plan(pid, format) which output would be the
> same as EXPLAIN.

Do you imagine a function like below?

=# select pg_get_query_plan(100, 'plain');
                            QUERY PLAN
-------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=273)
   ->  Seq Scan on pg_class  (cost=0.00..17.14 rows=414 width=273)

If so, we once tried to implement such a function for getting memory contexts.
However, this attempt didn't succeed because it could lead to a deadlock situation[1].
I think the same problem can occur when implementing pg_get_query_plan().

[1] https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

> Regards,
> Étienne BERSAC
> Dalibo

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Wed, Oct 18, 2023 at 10:04 PM James Coleman <jtc331@gmail.com> wrote: > > While that decision as regards auto_explain has long since been made > (and there would be work to undo it), I don't believe that we should > repeat that choice here. I'm -10 on moving this into auto_explain. > Right. > However perhaps there is still an opportunity for moving some of the > auto_explain code into core so as to enable deduplicating the code. > Right. That's what I also think. -- Best Wishes, Ashutosh Bapat
On 2023-10-25 12:40, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 10:04 PM James Coleman <jtc331@gmail.com>
> wrote:
>>
>> While that decision as regards auto_explain has long since been made
>> (and there would be work to undo it), I don't believe that we should
>> repeat that choice here. I'm -10 on moving this into auto_explain.
>>
>
> Right.
>
>> However perhaps there is still an opportunity for moving some of the
>> auto_explain code into core so as to enable deduplicating the code.
>>
>
> Right. That's what I also think.

Thanks for your comments.

The attached patch adds a new function which assembles es->str for logging according to the specified contents and format. It is called from both auto_explain and pg_log_query_plan().

On 2023-10-11 16:22, Ashutosh Bapat wrote:
> I am also wondering whether it's better to report the WARNING as
> status column in the output.

The attached patch leaves this as it was, because of the inconsistency with pg_terminate_backend() and pg_log_backend_memory_contexts() that you pointed out.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
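The deduplication idea can be illustrated with a minimal sketch: one helper assembles the log text from explicit options, so both callers share a single code path. The function name and parameters here are invented for illustration and are not the patch's actual API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical shared helper: builds the log text once so that both
 * auto_explain and pg_log_query_plan() could call it instead of
 * duplicating the assembly logic. */
void
assemble_log_output(char *buf, size_t buflen,
                    const char *query_text, const char *plan_text,
                    bool include_query_text)
{
    if (include_query_text)
        snprintf(buf, buflen, "Query Text: %s\n%s", query_text, plan_text);
    else
        snprintf(buf, buflen, "%s", plan_text);
}
```

Each caller then differs only in how it obtains the ExplainState contents and in the elevel it logs at, not in how the text is put together.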
Attachment
Hi Torikoshia,

> If so, we once tried to implement such function for getting memory
> contexts.
> However, this attempt didn't succeed because it could lead dead lock
> situation[1].

Thanks for the pointer. Why not use a client log message to allow the client to get the plan without locking the backend memory context and without access to the server log? I missed the rationale for not sending the plan to the client.

Regards,
Étienne
On 2023-10-27 16:06, Étienne BERSAC wrote:
> Hi Torikoshia,
>
>> If so, we once tried to implement such function for getting memory
>> contexts.
>> However, this attempt didn't succeed because it could lead dead lock
>> situation[1].
>
> Thanks for the pointer. Why not use client log message to allow client
> to get plan without locking backend memory context and without access
> to server log ? I missed the rationnal for not sending the plan to
> client.

If we use a client log message, that message is shown in the session of the target process whose pid is specified by the parameter of pg_log_query_plan():

(pid:1000)=# select pg_sleep(60);
(pid:1001)=# select pg_log_query_plan(1000);
(pid:1000)=# LOG:  query plan running on backend with PID 1000 is:
        Query Text: select pg_sleep(1000);
        Result  (cost=0.00..0.01 rows=1 width=4)
          Output: pg_sleep('1000'::double precision)

I think this is not the expected behavior, so we set elevel to LOG_SERVER_ONLY.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
> Hi,
>
> If we use client log message, that message is shown on the target
> process whose pid is specified by the parameter of
> pg_log_query_plan():
>
> (pid:1000)=# select pg_sleep(60);
> (pid:1001)=# select pg_log_query_plan(1000);
> (pid:1000)=# LOG: query plan running on backend with PID 1000 is:
> Query Text: select pg_sleep(1000);
> Result (cost=0.00..0.01 rows=1 width=4)
> Output: pg_sleep('1000'::double precision)
>
> I think this is not an expected behavior and we set elevel to
> LOG_SERVER_ONLY.

Makes sense. Thanks.
I have the following questions related to the functionality. (Please point me to the relevant discussion if this has been already discussed.)

1. When a backend is running nested queries, we will see the plan of the innermost query. That query may not be the actual culprit if the user query is running slowly, e.g. a query being run as part of the inner side of a nested loop when the nested loop itself is the bottleneck. I think it will be useful to print the plans of the whole query stack.

2. When a query is running in a parallel worker, do we want to print that query? It may or may not be interesting given the situation. If the overall plan itself is faulty, the output of the parallel worker query is not useful. If the plan is fine but a given worker's leg is running slowly, it may be interesting.

As a side note, you may want to fix the indentation in ExplainAssembleLogOutput().

On Fri, Oct 27, 2023 at 6:24 PM Étienne BERSAC <etienne.bersac@dalibo.com> wrote:
>
> > Hi,
> >
> > If we use client log message, that message is shown on the target
> > process whose pid is specified by the parameter of
> > pg_log_query_plan():
> >
> > (pid:1000)=# select pg_sleep(60);
> > (pid:1001)=# select pg_log_query_plan(1000);
> > (pid:1000)=# LOG: query plan running on backend with PID 1000 is:
> > Query Text: select pg_sleep(1000);
> > Result (cost=0.00..0.01 rows=1 width=4)
> > Output: pg_sleep('1000'::double precision)
> >
> > I think this is not an expected behavior and we set elevel to
> > LOG_SERVER_ONLY.
>
> Makes sens. Thanks.

--
Best Wishes,
Ashutosh Bapat
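Reporting the whole query stack presupposes that the backend tracks the nesting of active queries. The following is an illustrative sketch only, with all names invented: the executor's entry and exit paths would push and pop, and a plan-logging request could then walk from the outermost to the innermost level instead of seeing only the current one.

```c
#include <stddef.h>

/* Invented stand-in for per-backend nested-query tracking. */
#define MAX_NEST_LEVEL 16

const char *query_stack[MAX_NEST_LEVEL];
int query_depth = 0;

/* Called when a (possibly nested) query starts executing. */
void
push_query(const char *query_text)
{
    if (query_depth < MAX_NEST_LEVEL)
        query_stack[query_depth++] = query_text;
}

/* Called when that query finishes. */
void
pop_query(void)
{
    if (query_depth > 0)
        query_depth--;
}

/* A logging request could report this many levels, outermost first. */
int
count_loggable_levels(void)
{
    return query_depth;
}
```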
On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> I have following questions related to the functionality. (Please point
> me to the relevant discussion if this has been already discussed.)
>
> 1. When a backend is running nested queries, we will see the plan of
> the innermost query. That query may not be the actual culprit if the
> user query is running slowly. E.g a query being run as part of inner
> side nested loop when the nested loop itself is the bottleneck. I
> think it will be useful to print plans of all the whole query stack.

To further explain this point, consider the following scenario:

-- A completely useless function which executes two SQL statements
-- which take a small amount of time individually
#create function multstmt(typeid oid) returns bigint
language sql as $$
select count(*) from pg_class where reltype = typeid;
select count(*) from pg_type where oid = typeid;
$$;

-- force a suboptimal plan
#set enable_hashjoin to false;
#set enable_mergejoin to false;

-- A completely useless but long running query
#select c.oid, t.oid from pg_class c, pg_type t
where multstmt(c.oid) = multstmt(t.oid);

This takes a few minutes on my laptop.

In a separate session query pg_stat_activity. We will see the original
query:

#select pid, query, backend_type from pg_stat_activity where pid = 349330;
  pid   |                                          query                                          |  backend_type
--------+-----------------------------------------------------------------------------------------+----------------
 349330 | select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid) = multstmt(t.oid); | client backend
(1 row)

Run the plan output function a few times:

#select pg_log_query_plan(349330);

You will observe different plans based on which of the innermost
queries is running:

LOG:  query plan running on backend with PID 349330 is:
        Query Text:
        select count(*) from pg_class where reltype = typeid;
        select count(*) from pg_type where oid = typeid;

        Query Parameters: $1 = '600'
        Aggregate  (cost=18.18..18.19 rows=1 width=8)
          Output: count(*)
          ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 width=0)
                Output: oid, relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, relispopulated, relreplident, relispartition, relrewrite, relfrozenxid, relminmxid, relacl, reloptions, relpartbound
                Filter: (pg_class.reltype = $1)
        Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on backend with PID 349330 is:
        Query Text:
        select count(*) from pg_class where reltype = typeid;
        select count(*) from pg_type where oid = typeid;

        Query Parameters: $1 = '2203'
        Aggregate  (cost=4.30..4.31 rows=1 width=4)
          Output: count(*)
          ->  Index Only Scan using pg_type_oid_index on pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
                Output: oid
                Index Cond: (pg_type.oid = $1)
        Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'

Individual pieces are confusing here since the query run by the
backend is not the one being EXPLAINed. A user may not know that the
queries being EXPLAINed arise from the function call multstmt(). So
they won't be able to stitch the pieces together unless they see the
plan of the outermost query with loops and costs. What might help is
if we explain each query in the hierarchy.

I think we can start with what auto_explain is doing. Always print the
plan of the outermost query; the query found in pg_stat_activity. In a
later version we might find a way to print plans of all the queries in
the stack and do so in a readable manner.

This makes tracking activeQueryDesc a bit tricky. My guess is that the
outermost query's descriptor will be available through ActivePortal
most of the time. But there are cases when ExecutorRun is called by
passing a queryDesc directly. So there may be some cases where that's
not true.

--
Best Wishes,
Ashutosh Bapat
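The outermost-descriptor bookkeeping suggested above can be sketched as follows. This is a Python analogy, not the patch's C code; all names here (query_desc_stack, executor_run, log_query_plan) are invented for illustration, assuming descriptors are pushed and popped as executors nest.

```python
# Toy model: track query descriptors in a stack as executors nest,
# and have the logging hook report the outermost entry instead of
# the innermost one.

query_desc_stack = []  # outermost query sits at index 0


def executor_run(query_desc, run_query):
    """Push on entry, pop on exit, mirroring nested executor calls."""
    query_desc_stack.append(query_desc)
    try:
        return run_query(query_desc)
    finally:
        query_desc_stack.pop()


def log_query_plan():
    """Triggered mid-execution: log the outermost plan, if any."""
    if not query_desc_stack:
        return None
    return f"plan of outermost query: {query_desc_stack[0]}"


def run_innermost(desc):
    # the innermost query asks the hook what would be logged right now
    return log_query_plan()


# even while the innermost statement runs, the outermost query is reported
print(executor_run("select c.oid, t.oid ...",
                   lambda d: executor_run("select count(*) ...",
                                          run_innermost)))
```

The point of the sketch is only the design choice under discussion: the hook reads the bottom of the stack (the query visible in pg_stat_activity) rather than the top.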
On 2023-11-06 15:32, Ashutosh Bapat wrote:

Thanks for the suggestion and example.

> On Fri, Nov 3, 2023 at 7:31 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>>
>> I have following questions related to the functionality. (Please point
>> me to the relevant discussion if this has been already discussed.)
>>
>> 1. When a backend is running nested queries, we will see the plan of
>> the innermost query. That query may not be the actual culprit if the
>> user query is running slowly. E.g a query being run as part of inner
>> side nested loop when the nested loop itself is the bottleneck. I
>> think it will be useful to print plans of all the whole query stack.

This was discussed in previous threads[1] and we thought it'd be useful
but since it needed some extra work, we stopped widening the scope.

> To further explain this point, consider following scenario
>
> -- A completely useless function which executes two SQL statements
> which take small amount individually
> #create function multstmt() returns int
> language sql as $$
> select count(*) from pg_class where reltype = 12345;
> select count(*) from pg_type limit 10;
> $$;
>
> -- force a suboptimal plan
> #set enable_hashjoin to false;
> #set enable_mergejoin to false;
>
> -- A completely useless but long running query
> #select c.oid, t.oid from pg_class c, pg_type t where multstmt(c.oid)
> = multstmt(t.oid);
>
> This take a few minutes on my laptop.
>
> In a separate session query pg_stat_activity. We will see the original
> query
> #select pid, query, backend_type from pg_stat_activity where pid =
> 349330;
>   pid   |                  query                  |  backend_type
> --------+-----------------------------------------+----------------
>  349330 | select c.oid, t.oid from pg_class c,
>           pg_type t where multstmt(c.oid)
>           = multstmt(t.oid);                      | client backend
> (1 row)
>
> Run the plan output function a few times
> #select pg_log_query_plan(349330);
>
> You will observe different plans based on which of the innermost query
> is running
> LOG:  query plan running on backend with PID 349330 is:
>         Query Text:
>         select count(*) from pg_class where reltype = typeid;
>         select count(*) from pg_type where oid = typeid;
>
>         Query Parameters: $1 = '600'
>         Aggregate  (cost=18.18..18.19 rows=1 width=8)
>           Output: count(*)
>           ->  Seq Scan on pg_catalog.pg_class  (cost=0.00..18.18 rows=2 width=0)
>                 Output: oid, relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, reltoastrelid, relhasindex, relisshared, relpersistence, relkind, relnatts, relchecks, relhasrules, relhastriggers, relhassubclass, relrowsecurity, relforcerowsecurity, relispopulated, relreplident, relispartition, relrewrite, relfrozenxid, relminmxid, relacl, reloptions, relpartbound
>                 Filter: (pg_class.reltype = $1)
>         Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
> 2023-11-06 11:52:25.610 IST [349330] LOG:  query plan running on
> backend with PID 349330 is:
>         Query Text:
>         select count(*) from pg_class where reltype = typeid;
>         select count(*) from pg_type where oid = typeid;
>
>         Query Parameters: $1 = '2203'
>         Aggregate  (cost=4.30..4.31 rows=1 width=4)
>           Output: count(*)
>           ->  Index Only Scan using pg_type_oid_index on
> pg_catalog.pg_type  (cost=0.28..4.29 rows=1 width=0)
>                 Output: oid
>                 Index Cond: (pg_type.oid = $1)
>         Settings: enable_hashjoin = 'off', enable_mergejoin = 'off'
>
> Individual pieces are confusing here since the query run by the
> backend is not the one being EXPLAINed. A user may not know that the
> queries being EXPLAINed arise from the function call multstmt(). So
> they won't be able to stitch the pieces together unless they see plan
> of the outermost query with loops and costs. What might help if we
> explain each query in the hierarchy.
>
> I think we can start with what auto_explain is doing. Always print the
> plan of the outermost query; the query found in pg_stat_activity. In a
> later version we might find a way to print plans of all the queries in
> the stack and do so in a readable manner.

Agreed there are cases printing the plan of the outermost query is more
useful.

> This makes tracking activeQueryDesc a bit tricky. My guess is that the
> outermost query's descriptor will be available through ActivePortal
> most of the time. But there are cases when ExecutorRun is called by
> passing a queryDesc directly. So may be there are some cases where
> that's not true.

Yeah, actually the original version of the patch got the plan from
ActivePortal, but it failed logging the plan when the query was
something like this[2]:

DO $$
BEGIN
PERFORM pg_sleep(100);
END$$;

>> 2. When a query is running in parallel worker do we want to print that
>> query? It may or may not be interesting given the situation. If the
>> overall plan itself is faulty, output of the parallel worker query is
>> not useful. If the plan is fine but a given worker's leg is running
>> slowly it may be interesting.

I think it can be useful.
I'm wondering if we can add this after the first patch for this feature
is committed.

>> As a side note, you may want to fix the indentation in
>> ExplainAssembleLogOutput().

Thanks, modified it.

Since the documentation was obsoleted, attached patch also updated it.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Thu, Nov 9, 2023 at 12:03 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> >> 1. When a backend is running nested queries, we will see the plan of
> >> the innermost query. That query may not be the actual culprit if the
> >> user query is running slowly. E.g a query being run as part of inner
> >> side nested loop when the nested loop itself is the bottleneck. I
> >> think it will be useful to print plans of all the whole query stack.
>
> This was discussed in previous threads[1] and we thought it'd be useful
> but since it needed some extra works, we stopped widening the scope.
>
> >
> > I think we can start with what auto_explain is doing. Always print the
> > plan of the outermost query; the query found in pg_stat_activity. In a
> > later version we might find a way to print plans of all the queries in
> > the stack and do so in a readable manner.
>
> Agreed there are cases printing plan of the outermost query is more
> useful.
>

I am fine printing the plan of the outermost query. This will help
many cases. Printing plans of the whole query stack can be added as an
add-on later.

> >
> > This makes tracking activeQueryDesc a bit tricky. My guess is that the
> > outermost query's descriptor will be available through ActivePortal
> > most of the time. But there are cases when ExecutorRun is called by
> > passing a queryDesc directly. So may be there are some cases where
> > that's not true.
>
> Yeah, actually the original version of the patch got the plan from
> ActivePortal, but it failed logging plan when the query was something
> like this[2]:
>
> DO $$
> BEGIN
> PERFORM pg_sleep(100);
> END$$;

References [1] and [2] are not listed in your email. Is that because
there was no ActivePortal created, or because the ActivePortal pointed
to the DO block instead of PERFORM pg_sleep?

> >> 2. When a query is running in parallel worker do we want to print that
> >> query? It may or may not be interesting given the situation. If the
> >> overall plan itself is faulty, output of the parallel worker query is
> >> not useful. If the plan is fine but a given worker's leg is running
> >> slowly it may be interesting.
>
> I think it can be useful.
> I'm wondering if we can add this after the first patch for this feature
> is committed.

With the current patches, it will print the query from a parallel
backend. If that's not desirable we should prohibit that case at
least.

--
Best Wishes,
Ashutosh Bapat
On 2023-11-09 16:11, Ashutosh Bapat wrote:
> On Thu, Nov 9, 2023 at 12:03 PM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
>>
>> >> 1. When a backend is running nested queries, we will see the plan of
>> >> the innermost query. That query may not be the actual culprit if the
>> >> user query is running slowly. E.g a query being run as part of inner
>> >> side nested loop when the nested loop itself is the bottleneck. I
>> >> think it will be useful to print plans of all the whole query stack.
>>
>> This was discussed in previous threads[1] and we thought it'd be
>> useful
>> but since it needed some extra works, we stopped widening the scope.
>>
>> >
>> > I think we can start with what auto_explain is doing. Always print the
>> > plan of the outermost query; the query found in pg_stat_activity. In a
>> > later version we might find a way to print plans of all the queries in
>> > the stack and do so in a readable manner.
>>
>> Agreed there are cases printing plan of the outermost query is more
>> useful.
>>
>
> I am fine printing the plan of the outermost query. This will help
> many cases. Printing plans of the whole query stack can be added as an
> add on later.
>
>> >
>> > This makes tracking activeQueryDesc a bit tricky. My guess is that the
>> > outermost query's descriptor will be available through ActivePortal
>> > most of the time. But there are cases when ExecutorRun is called by
>> > passing a queryDesc directly. So may be there are some cases where
>> > that's not true.
>>
>> Yeah, actually the original version of the patch got the plan from
>> ActivePortal, but it failed logging plan when the query was something
>> like this[2]:
>>
>> DO $$
>> BEGIN
>> PERFORM pg_sleep(100);
>> END$$;
>
> References [1] and [2] are not listed in your email.

Oops, sorry. Here are links:

[1] https://www.postgresql.org/message-id/64f716c44629e303b66e6c24502147cc%40oss.nttdata.com
[2] https://www.postgresql.org/message-id/632e99eb-8090-53e6-1b1a-101601904cbd%40oss.nttdata.com

> Is that because there was no ActivePortal created or the ActivePortal
> pointed to DO block instead of PERFORM pg_sleep?

ActivePortal is created but ActivePortal->queryDesc is null.

>> > 2. When a query is running in parallel worker do we want to print that
>> > query? It may or may not be interesting given the situation. If the
>> > overall plan itself is faulty, output of the parallel worker query is
>> > not useful. If the plan is fine but a given worker's leg is running
>> > slowly it may be interesting.
>>
>> I think it can be useful.
>> I'm wondering if we can add this after the first patch for this
>> feature
>> is committed.
>
> With the current patches, it will print the query from a parallel
> backend. If that's not desirable we should prohibit that case at
> least.

The current patch prohibits printing the plan if the backend type is a
parallel worker, as below:

=# select pg_log_query_plan(pid), backend_type from pg_stat_activity
   where backend_type = 'parallel worker';

 pg_log_query_plan |  backend_type
-------------------+-----------------
 f                 | parallel worker
 f                 | parallel worker
(2 rows)

WARNING:  PID 4618 is not a PostgreSQL client backend process
WARNING:  PID 4617 is not a PostgreSQL client backend process

Is this the behavior you expect?

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Thu, Nov 9, 2023 at 4:56 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-11-09 16:11, Ashutosh Bapat wrote:
> > On Thu, Nov 9, 2023 at 12:03 PM torikoshia <torikoshia@oss.nttdata.com>
> > wrote:
> >>
> >> >> 1. When a backend is running nested queries, we will see the plan of
> >> >> the innermost query. That query may not be the actual culprit if the
> >> >> user query is running slowly. E.g a query being run as part of inner
> >> >> side nested loop when the nested loop itself is the bottleneck. I
> >> >> think it will be useful to print plans of all the whole query stack.
> >>
> >> This was discussed in previous threads[1] and we thought it'd be
> >> useful
> >> but since it needed some extra works, we stopped widening the scope.
> >>
> >> >
> >> > I think we can start with what auto_explain is doing. Always print the
> >> > plan of the outermost query; the query found in pg_stat_activity. In a
> >> > later version we might find a way to print plans of all the queries in
> >> > the stack and do so in a readable manner.
> >>
> >> Agreed there are cases printing plan of the outermost query is more
> >> useful.
> >>
> >
> > I am fine printing the plan of the outermost query. This will help
> > many cases. Printing plans of the whole query stack can be added as an
> > add on later.
> >
> >> >
> >> > This makes tracking activeQueryDesc a bit tricky. My guess is that the
> >> > outermost query's descriptor will be available through ActivePortal
> >> > most of the time. But there are cases when ExecutorRun is called by
> >> > passing a queryDesc directly. So may be there are some cases where
> >> > that's not true.
> >>
> >> Yeah, actually the original version of the patch got the plan from
> >> ActivePortal, but it failed logging plan when the query was something
> >> like this[2]:
> >>
> >> DO $$
> >> BEGIN
> >> PERFORM pg_sleep(100);
> >> END$$;
> >
> > References [1] and [2] are not listed in your email.
>
> Oops, sorry. Here are links:
>
> [1]
> https://www.postgresql.org/message-id/64f716c44629e303b66e6c24502147cc%40oss.nttdata.com
> [2]
> https://www.postgresql.org/message-id/632e99eb-8090-53e6-1b1a-101601904cbd%40oss.nttdata.com
>
> > Is that because there was no ActivePortal created or the ActivePortal
> > pointed to DO block instead of PERFORM pg_sleep?
>
> ActivePortal is created but ActivePortal->queryDesc is null.

Thanks.

> >> > 2. When a query is running in parallel worker do we want to print that
> >> > query? It may or may not be interesting given the situation. If the
> >> > overall plan itself is faulty, output of the parallel worker query is
> >> > not useful. If the plan is fine but a given worker's leg is running
> >> > slowly it may be interesting.
> >>
> >> I think it can be useful.
> >> I'm wondering if we can add this after the first patch for this
> >> feature
> >> is committed.
> >
> > With the current patches, it will print the query from a parallel
> > backend. If that's not desirable we should prohibit that case at
> > least.
>
> Current patch prohibits printing plan if backend type is parallel worker
> as below:
>
> =# select pg_log_query_plan(pid), backend_type from pg_stat_activity
> where backend_type = 'parallel worker';
>
>  pg_log_query_plan |  backend_type
> -------------------+-----------------
>  f                 | parallel worker
>  f                 | parallel worker
> (2 rows)
>
> WARNING:  PID 4618 is not a PostgreSQL client backend process
> WARNING:  PID 4617 is not a PostgreSQL client backend process
>
> Is this the behavior you expect?

I misread then. Thanks for correcting me. We could consider plans from
parallel workers in v2 of this feature.

--
Best Wishes,
Ashutosh Bapat
Hello hackers,
Last Saturday I submitted a patch to the pgsql-hackers list titled
"Proposal: In-flight explain logging", proposing a feature very
similar to the one being worked on in this thread. I should have done a better
search in the commitfest before implementing something from scratch.
So, as recommended by Ashutosh, I am sending an incremental patch containing
an additional feature I personally think we should include: logging the plan
with instrumentation details if enabled.
When targeting a query with instrumentation, PG should log the complete
EXPLAIN ANALYZE plan with current row counts and, if enabled, timing for each
node. This gives the user not only the ability to see what the plan is
but also what was executed so far, which is super useful when
troubleshooting queries that never finish.
Considering that the query is in progress, the output will include the
annotation (never executed) for nodes that weren't touched yet (or may
never be). This behavior is already present in the current ExplainNode
implementation.
I added a new annotation (in progress) for nodes currently being executed, i.e.,
InstrStartNode was called and the clock is ticking there.
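The three node states described above (never reached, currently executing, finished) could be classified roughly as in the following sketch. This is Python with invented names, not the patch itself; the real decision is made in C from the node's Instrumentation state.

```python
# Illustrative sketch: pick the annotation a plan node would get when a
# plan is dumped mid-execution. The flags stand in for "InstrStartNode
# was ever called", "the node's clock is currently ticking", and the
# number of completed loops.

def node_annotation(started: bool, running: bool, nloops: int) -> str:
    if not started and nloops == 0:
        return "(never executed)"   # node was never reached (yet)
    if running:
        return "(in progress)"      # clock is ticking on this node
    return ""                        # finished at least once: show actuals

print(node_annotation(started=False, running=False, nloops=0))
print(node_annotation(started=True, running=True, nloops=0))
print(node_annotation(started=True, running=False, nloops=1))
```

The first two calls print "(never executed)" and "(in progress)"; the third returns an empty annotation, i.e. the normal actual-time output is shown.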
Back-reading this thread I saw the discussion about whether to extend
pg_log_query_plan to parallel workers. I added it in my incremental patch,
as the new capability of logging instrumentation makes it useful to see
what parallel workers are currently doing.
# DEMONSTRATION:
postgres=# select pid, backend_type, pg_log_query_plan(pid)
postgres-# from pg_stat_activity
postgres-# where (backend_type = 'client backend' and pid != pg_backend_pid())
postgres-# or backend_type = 'parallel worker';
pid | backend_type | pg_log_query_plan
-------+-----------------+-------------------
33833 | client backend | t
47202 | parallel worker | t
47203 | parallel worker | t
(3 rows)
2023-12-06 23:14:41.756 UTC [33833] LOG: query plan running on backend with PID 33833 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Gather (cost=70894.63..202643.27 rows=1000000 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.914..2625.922 rows=250412 loops=1) (in progress)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=755 read=2175, temp read=1473 written=1860
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..20.904 rows=331492 loops=1)
Output: b.c1
Buffers: shared hit=602 read=865
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.107..1745.107 rows=330638 loops=1)
Output: a.c1
Buffers: shared hit=153 read=1310, temp written=868
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.042..27.695 rows=330638 loops=1)
Output: a.c1
Buffers: shared hit=153 read=1310
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.489..2450.489 rows=407941 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1141 read=1833, temp read=1938 written=2836
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1323.422..1575.245 rows=407941 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1141 read=1833, temp read=1938 written=1864
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.026..22.223 rows=336238 loops=1)
Output: c.c1
Buffers: shared hit=590 read=898
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.306..653.306 rows=335836 loops=1)
Output: d.c1
Buffers: shared hit=551 read=935, temp written=872
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..23.127 rows=335836 loops=1)
Output: d.c1
Buffers: shared hit=551 read=935
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=590.086..590.086 rows=343696 loops=1)
Output: e.c1
Buffers: shared hit=519 read=1002, temp written=896
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.066..21.797 rows=343696 loops=1)
Output: e.c1
Buffers: shared hit=519 read=1002
2023-12-06 23:14:41.757 UTC [47203] LOG: query plan running on backend with PID 47203 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2464.367..2628.476 rows=258293 loops=1)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=704 read=2213, temp read=1497 written=1860
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.247 rows=330412 loops=1)
Output: b.c1
Buffers: shared hit=594 read=868
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.093..1745.093 rows=328830 loops=1)
Output: a.c1
Buffers: shared hit=110 read=1345, temp written=868
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.141..30.128 rows=328830 loops=1)
Output: a.c1
Buffers: shared hit=110 read=1345
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2449.694..2449.694 rows=295462 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1161 read=1844, temp read=1971 written=2872
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1378.678..1577.182 rows=295462 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1161 read=1844, temp read=1971 written=1856
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.031..22.459 rows=336288 loops=1)
Output: c.c1
Buffers: shared hit=590 read=898
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=651.048..651.048 rows=328378 loops=1)
Output: d.c1
Buffers: shared hit=507 read=946, temp written=860
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.034..24.462 rows=328378 loops=1)
Output: d.c1
Buffers: shared hit=507 read=946
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=575.417..575.417 rows=327926 loops=1)
Output: e.c1
Buffers: shared hit=522 read=929, temp written=860
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.786..23.114 rows=327926 loops=1)
Output: e.c1
Buffers: shared hit=522 read=929
2023-12-06 23:14:41.758 UTC [47202] LOG: query plan running on backend with PID 47202 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.138..2631.855 rows=254125 loops=1)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=749 read=2254, temp read=1483 written=1896
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.027..22.224 rows=338096 loops=1)
Output: b.c1
Buffers: shared hit=628 read=868
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1744.616..1744.616 rows=340532 loops=1)
Output: a.c1
Buffers: shared hit=121 read=1386, temp written=892
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.138..30.003 rows=340532 loops=1)
Output: a.c1
Buffers: shared hit=121 read=1386
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.470..2450.470 rows=296597 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1186 read=1813, temp read=1610 written=2872
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1380.286..1561.412 rows=296597 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1186 read=1813, temp read=1610 written=1848
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.170 rows=327474 loops=1)
Output: c.c1
Buffers: shared hit=548 read=901
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.297..653.297 rows=335786 loops=1)
Output: d.c1
Buffers: shared hit=574 read=912, temp written=868
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..24.210 rows=335786 loops=1)
Output: d.c1
Buffers: shared hit=574 read=912
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=574.553..574.553 rows=328378 loops=1)
Output: e.c1
Buffers: shared hit=495 read=958, temp written=864
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.717..23.270 rows=328378 loops=1)
Output: e.c1
Buffers: shared hit=495 read=958
The thread of my initial patch contains all important implementation
details. Sharing the ones relevant to my incremental patch here:
- Safely printing signaled plans with instrumentation
A plan string is built in function ExplainNode here (https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/commands/explain.c#L1178)
which is called at the end of a query execution when EXPLAIN is used.
That function performs its logic using the PlanState (part of QueryDesc) of
the running query and an ExplainState.
The main challenge there is that ExplainNode calls InstrEndLoop which
changes values in Instrumentation. This is ok for a regular EXPLAIN
where the query is already complete but not ok for the new feature with
signaled plan logging.
So the new code has custom logic to clone the Instrumentation instance of
the current node. The cloned object can be safely written to.
Function InstrEndLoop has a safety rule here (https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/executor/instrument.c#L148)
that prevents adjusting instrumentation details in a running node. This
never happens in the current code logic but with the new feature
it will happen very often.
I didn't want to remove this safety rule, as InstrEndLoop gets called in
other places too (even in auto_explain), so the solution was to keep
InstrEndLoop and add a new InstrEndLoopForce for the signaled plan logging.
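The clone-then-finalize approach can be illustrated with a small Python analogy (Instr, instr_end_loop and snapshot_for_logging are toy names, not PostgreSQL symbols): finalize the counters on a copy so the live executor state is never mutated.

```python
import copy
from dataclasses import dataclass

@dataclass
class Instr:
    """Toy stand-in for a node's Instrumentation counter block."""
    running: bool = True      # current loop still in progress
    tuplecount: float = 0.0   # tuples seen in the current loop
    ntuples: float = 0.0      # tuples accumulated over finished loops
    nloops: int = 0

def instr_end_loop(instr: Instr) -> None:
    """Fold the current loop into the totals (what end-of-loop does)."""
    instr.ntuples += instr.tuplecount
    instr.nloops += 1
    instr.tuplecount = 0.0
    instr.running = False

def snapshot_for_logging(live: Instr) -> Instr:
    """Finalize a *copy*, leaving the live counters untouched."""
    clone = copy.deepcopy(live)
    instr_end_loop(clone)
    return clone

live = Instr(running=True, tuplecount=250412.0)
snap = snapshot_for_logging(live)
print(snap.ntuples, snap.nloops)      # totals visible in the dumped plan
print(live.tuplecount, live.running)  # live state unchanged
```

Since only the clone is finalized, the executor can keep accumulating into the live counters after the plan has been logged, which is the property the safety rule in InstrEndLoop exists to protect.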
Output: d.c1
Buffers: shared hit=551 read=935, temp written=872
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..23.127 rows=335836 loops=1)
Output: d.c1
Buffers: shared hit=551 read=935
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=590.086..590.086 rows=343696 loops=1)
Output: e.c1
Buffers: shared hit=519 read=1002, temp written=896
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.066..21.797 rows=343696 loops=1)
Output: e.c1
Buffers: shared hit=519 read=1002
2023-12-06 23:14:41.757 UTC [47203] LOG: query plan running on backend with PID 47203 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2464.367..2628.476 rows=258293 loops=1)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=704 read=2213, temp read=1497 written=1860
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.247 rows=330412 loops=1)
Output: b.c1
Buffers: shared hit=594 read=868
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1745.093..1745.093 rows=328830 loops=1)
Output: a.c1
Buffers: shared hit=110 read=1345, temp written=868
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.141..30.128 rows=328830 loops=1)
Output: a.c1
Buffers: shared hit=110 read=1345
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2449.694..2449.694 rows=295462 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1161 read=1844, temp read=1971 written=2872
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1378.678..1577.182 rows=295462 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1161 read=1844, temp read=1971 written=1856
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.031..22.459 rows=336288 loops=1)
Output: c.c1
Buffers: shared hit=590 read=898
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=651.048..651.048 rows=328378 loops=1)
Output: d.c1
Buffers: shared hit=507 read=946, temp written=860
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.034..24.462 rows=328378 loops=1)
Output: d.c1
Buffers: shared hit=507 read=946
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=575.417..575.417 rows=327926 loops=1)
Output: e.c1
Buffers: shared hit=522 read=929, temp written=860
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.786..23.114 rows=327926 loops=1)
Output: e.c1
Buffers: shared hit=522 read=929
2023-12-06 23:14:41.758 UTC [47202] LOG: query plan running on backend with PID 47202 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Parallel Hash Join (cost=69894.63..101643.27 rows=416667 width=20) (never executed)
Output: a.c1, b.c1, c.c1, d.c1, e.c1
Hash Cond: (a.c1 = e.c1)
-> Parallel Hash Join (cost=54466.62..77218.65 rows=416667 width=16) (never executed)
Output: a.c1, b.c1, c.c1, d.c1
Hash Cond: (a.c1 = c.c1)
-> Parallel Hash Join (cost=15428.00..29997.42 rows=416667 width=8) (actual time=2500.138..2631.855 rows=254125 loops=1)
Output: a.c1, b.c1
Inner Unique: true
Hash Cond: (b.c1 = a.c1)
Buffers: shared hit=749 read=2254, temp read=1483 written=1896
-> Parallel Seq Scan on public.t1 b (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.027..22.224 rows=338096 loops=1)
Output: b.c1
Buffers: shared hit=628 read=868
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=1744.616..1744.616 rows=340532 loops=1)
Output: a.c1
Buffers: shared hit=121 read=1386, temp written=892
-> Parallel Seq Scan on public.t2 a (cost=0.00..8591.67 rows=416667 width=4) (actual time=1.138..30.003 rows=340532 loops=1)
Output: a.c1
Buffers: shared hit=121 read=1386
-> Parallel Hash (cost=32202.28..32202.28 rows=416667 width=8) (actual time=2450.470..2450.470 rows=296597 loops=1)
Output: c.c1, d.c1
Buffers: shared hit=1186 read=1813, temp read=1610 written=2872
-> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=8) (actual time=1380.286..1561.412 rows=296597 loops=1)
Output: c.c1, d.c1
Hash Cond: (c.c1 = d.c1)
Buffers: shared hit=1186 read=1813, temp read=1610 written=1848
-> Parallel Seq Scan on public.t1 c (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..22.170 rows=327474 loops=1)
Output: c.c1
Buffers: shared hit=548 read=901
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=653.297..653.297 rows=335786 loops=1)
Output: d.c1
Buffers: shared hit=574 read=912, temp written=868
-> Parallel Seq Scan on public.t1 d (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..24.210 rows=335786 loops=1)
Output: d.c1
Buffers: shared hit=574 read=912
-> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=574.553..574.553 rows=328378 loops=1)
Output: e.c1
Buffers: shared hit=495 read=958, temp written=864
-> Parallel Seq Scan on public.t1 e (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.717..23.270 rows=328378 loops=1)
Output: e.c1
Buffers: shared hit=495 read=958
The thread of my initial patch contains all important implementation
details. Sharing the ones relevant to my incremental patch here:
- Safely printing signaled plans with instrumentation
A plan string is built in the function ExplainNode (https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/commands/explain.c#L1178),
which is called at the end of query execution when EXPLAIN is used.
That function operates on the PlanState (part of the QueryDesc) of the
running query and an ExplainState.
The main challenge there is that ExplainNode calls InstrEndLoop, which
modifies values in the Instrumentation struct. That is fine for a regular
EXPLAIN, where the query is already complete, but not for the new feature
with signaled plan logging.
So the new code has custom logic to clone the Instrumentation instance of
the current node; the cloned object can then be written to safely.
The function InstrEndLoop has a safety rule (https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/executor/instrument.c#L148)
that prevents adjusting instrumentation details in a running node. This
never happens in the current code paths, but with the new feature it will
happen very often.
I didn't want to remove this safety rule, since InstrEndLoop is called in
other places too (even in auto_explain), so the solution was to keep
InstrEndLoop and add a new InstrEndLoopForce for signaled plan logging
with instrumentation. Both InstrEndLoop and InstrEndLoopForce call a new
internal InstrEndLoopInternal to avoid duplicating the code.
- Memory management
Since pg_log_query_plan() creates its own memory context before calling
ExplainAssembleLogOutput, the new logic that allocates memory to clone
instrumentation doesn't need to free anything.
# FINAL CONSIDERATIONS
Let me know what you think about this incremental patch. If you think it
is useful but needs adjustments I will be happy to change as needed.
Kind Regards,
Rafael Castro.
Attachment
On 2023-12-07 08:33, Rafael Thofehrn Castro wrote:
> Hello hackers,
>
> Last Saturday I submitted a patch to the pgsql-hackers list with the title
> "Proposal: In-flight explain logging" with a patch proposing a feature very
> similar to the one being worked on in this thread. I should have done a better
> search in the commitfest before implementing something from scratch.
>
> So, as recommended by Ashutosh, I am sending an incremental patch containing
> an additional feature I personally think we should include: logging the plan
> with instrumentation details if enabled.

Thanks for the proposal and making the patch!

> When targeting a query with instrumentation PG should log the complete
> EXPLAIN ANALYZE plan with current row count and, if enabled, timing for each
> node. This gives the user not only the ability to see what the plan is
> but also what was executed so far, which is super useful when
> troubleshooting queries that never finish.

I think showing the progress of the query execution would be useful.
OTOH it seems to need at least some modifications around Instrumentation,
as in your patch. As a first step, I think it would be better to minimize
the scope and focus on the fundamental function. For the same reason,
getting the plans of parallel workers is also prohibited in the current
patch, as discussed here[1].

[1] https://www.postgresql.org/message-id/c25ae6015be96a1964eddd964657660b%40oss.nttdata.com

So I think the steps below would be better than pushing all the
functionality into the 1st commit:
- First, develop the function to enable output of the query plan
  (v34-0001-Add-function-to-log-the-plan-of-the-query.patch).
- Then enhance the function, e.g. by showing the progress of the query
  execution (v34-0002-Log-plan-along-with-instrumentation-details.patch).

> -- https://www.postgresql.org/message-id/CAG0ozMp3g3drnkDa6RZxXO_OmnisL2sD9vBrmpu5fOBoYpC-3w%40mail.gmail.com
>
> - ExplainState customization
>
> A ExplainState is allocated and customized for the in-flight logging.
> Instrumentation related settings are enabled based on how the target
> query started, which is usually via EXPLAIN ANALYZE or with auto_explain.

Does this mean the progress can be obtained only when the target query was
run with EXPLAIN ANALYZE or with auto_explain.log_analyze enabled?
If so, there might be limited situations in which we can get the progress:
I imagine EXPLAIN ANALYZE is used when the user wants the plan from the
beginning, and auto_explain.log_analyze can have a negative impact on
performance, as described in the manual, so there may not be many
environments that enable it.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Hi,

Updated the patch to fix typos and move ProcessLogQueryPlanInterruptActive
from errfinish() to AbortTransaction.

BTW, since the thread is getting long, I list some of the points of the
discussion so far:

# Safety concern
## Catalog access inside CFI
- It seems safe if the CFI call is inside an existing valid
  transaction/query state[1]
- We did some tests, for example calling ProcessLogQueryPlanInterrupt() in
  every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env but
  got stuck on James's env, so I modified the patch to exit
  ProcessLogQueryPlanInterrupt() when the target process is inside lock
  acquisition code[3]

## Risk of calling EXPLAIN code in CFI
- EXPLAIN is not simple logic, and there is a risk in calling it from CFI.
  For example, if there is a bug, we may find ourselves in a situation
  where we can't cancel the query
- It's a trade-off that's worth making for the introspection benefits this
  patch would provide?[4]

# Design
- Although some suggested it should be in auto_explain, the current patch
  introduces this feature into the core[5]
- When the target query is nested, only the innermost query's plan is
  explained. In the future, all the nested queries' plans are expected to
  be explained optionally, like auto_explain.log_nested_statements[6]
- When the target process is a parallel worker, the plan is not shown[6]
- When the target query is nested and its subtransaction is aborted,
  pg_log_query_plan cannot log the parent query's plan after the abort,
  even if the parent query is still running[7]
- The output corresponds to EXPLAIN with VERBOSE, COSTS, SETTINGS and
  FORMAT TEXT. It doesn't do ANALYZE or show the progress of the query
  execution. Future work proposed by Rafael Thofehrn Castro may realize
  this[8]
- To prevent an assertion error, this patch ensures no page lock is held
  by checking all the LocalLock entries before running the explain code,
  but there is a discussion that ginInsertCleanup() should be modified
  instead[9]

It may not be so difficult to lift some of the restrictions under
"Design", but I'd like to limit the scope of the 1st patch to keep it
simpler.

[1] https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/0e0e7ca08dff077a625c27a5e0c2ef0a%40oss.nttdata.com
[4] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAAaqYe_1EuoTudAz8mr8-qtN5SoNtYRm4JM2J9CqeverpE3B2A%40mail.gmail.com
[6] https://www.postgresql.org/message-id/CAExHW5sh4ahrJgmMAGfptWVmESt1JLKCNm148XVxTunRr%2B-6gA%40mail.gmail.com
[7] https://www.postgresql.org/message-id/3d121ed5f81cef588bac836b43f5d1f9%40oss.nttdata.com
[8] https://www.postgresql.org/message-id/c161b5e7e1888eb9c9eb182a7d9dcf89%40oss.nttdata.com
[9] https://www.postgresql.org/message-id/20220201.172757.1480996662235658750.horikyota.ntt%40gmail.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
Attachment
Hi Atsushi,

On Mon, Jan 29, 2024 at 6:32 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi,
>
> Updated the patch to fix typos and move
> ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction.
>
> BTW since the thread is getting long, I list the some points of the
> discussion so far:
>
> # Safety concern
> ## Catalog access inside CFI
> - it seems safe if the CFI call is inside an existing valid
> transaction/query state[1]
>
> - We did some tests, for example calling ProcessLogQueryPlanInterrupt()
> in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env
> but was stucked on James's env, so I modified to exit
> ProcessLogQueryPlanInterrupt() when target process is inside of lock
> acquisition code[3]
>
> ## Risk of calling EXPLAIN code in CFI
> - EXPLAIN is not a simple logic code, and there exists risk calling it
> from CFI. For example, if there is a bug, we may find ourselves in a
> situation where we can't cancel the query
>
> - it's a trade-off that's worth making for the introspection benefits
> this patch would provide?[4]
>
> # Design
> - Although some suggested it should be in auto_explain, current patch
> introduces this feature to the core[5]
>
> - When the target query is nested, only the most inner query's plan is
> explained. In future, all the nested queries' plans are expected to
> explained optionally like auto_explain.log_nested_statements[6]
>
> - When the target process is a parallel worker, the plan is not shown[6]
>
> - When the target query is nested and its subtransaction is aborted,
> pg_log_query_plan cannot log the parental query plan after the abort
> even parental query is running[7]
>
> - The output corresponds to EXPLAIN with VERBOSE, COST, SETTINGS and
> FORMAT text. It doesn't do ANALYZE or show the progress of the query
> execution. Future work proposed by Rafael Thofehrn Castro may realize
> this[8]
>
> - To prevent assertion error, this patch ensures no page lock is held by
> checking all the LocalLock entries before running explain code, but
> there is a discussion that ginInsertCleanup() should be modified[9]
>
> It may be not so difficult to improve some of restrictions in "Design",
> but I'd like to limit the scope of the 1st patch to make it simpler.

Thanks for the summary. It is helpful. I think the patch is also getting
better.

I have a few questions and suggestions.

1. The prologue of GetLockMethodLocalHash() mentions:

 * NOTE: When there are many entries in LockMethodLocalHash, calling this
 * function and looking into all of them can lead to performance problems.
 */

How bad could this performance be? Let's assume that a query is taking
time and pg_log_query_plan() is invoked to examine the plan of this
query. Is it possible that looping over all the locks itself takes a lot
of time, delaying the query execution further?

2. What happens if auto_explain is enabled in the backend and
pg_log_query_plan() is called on the same backend? Will they conflict?
I think we should add a test for the same.

Using injection point support we should be able to add tests for testing
pg_log_query_plan behaviour when there are page locks held or when
auto_explain (with instrumentation) and pg_log_query_plan() work on the
same query plan. Use an injection point to make the backend running the
query wait at a suitable point to delay its execution, and fire
pg_log_query_plan() from another backend. Maybe the same test could
examine the server log file to see if the plan is indeed output to the
server log file.

Given that the feature will be used when things have already gone wrong,
it should not make things more serious. So more testing, especially
automated, would help.

--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:
> Thanks for the summary. It is helpful. I think patch is also getting
> better.
>
> I have a few questions and suggestions

Thanks for your comments.

> 1. Prologue of GetLockMethodLocalHash() mentions
>  * NOTE: When there are many entries in LockMethodLocalHash, calling this
>  * function and looking into all of them can lead to performance problems.
>  */
> How bad this performance could be. Let's assume that a query is taking
> time and pg_log_query_plan() is invoked to examine the plan of this
> query. Is it possible that the looping over all the locks itself takes
> a lot of time delaying the query execution further?

I think it depends on the number of local locks, but I've measured the cpu
time of this page lock check by adding the code below together with
v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
calls ProcessLogQueryPlanInterrupt() in every CFI, on my laptop, just for
your information:

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5f7d77d567..65b7cb4925 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -44,6 +44,8 @@

+#include "time.h"
...
@@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
      * we check all the LocalLock entries and when finding even one, give up
      * logging the plan.
      */
+    start = clock();
     hash_seq_init(&status, GetLockMethodLocalHash());
     while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
     {
         if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
         {
             ereport(LOG_SERVER_ONLY,
                 errmsg("ignored request for logging query plan due to page lock conflicts"),
                 errdetail("You can try again in a moment."));
             hash_seq_term(&status);

             ProcessLogQueryPlanInterruptActive = false;
             return;
         }
     }
+    end = clock();
+    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+    ereport(LOG,
+            errmsg("all locallock entry search took: %f", cpu_time_used));
+

There were about 3 million log lines which recorded the cpu time, and the
duration was quite short:

=# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
=# select max(d), min(d), avg(d) from cpu_time ;
   max    | min |          avg
----------+-----+-----------------------
 0.000116 |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these results
seem to suggest that it will not take that long.

> 2. What happens if auto_explain is enabled in the backend and
> pg_log_query_plan() is called on the same backend? Will they conflict?
> I think we should add a test for the same.

Hmm, I think they don't conflict, since they just refer to the QueryDesc,
don't modify it, and don't use the same objects for locking.
(I imagine 'conflict' here means something like a 'hard conflict' in
replication[2].)

Actually, using both auto_explain and pg_log_query_plan() outputs each log
separately:

(pid:62835)=# select pg_sleep(10);
(pid:70000)=# select pg_log_query_plan(62835);

(pid:70000)=# \! cat data/log/postgres.log
...
2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  00000: query plan running on backend with PID 62835 is:
        Query Text: select pg_sleep(10);
        Result  (cost=0.00..0.01 rows=1 width=4)
          Output: pg_sleep('10'::double precision)
        Query Identifier: 3506829283127886044
2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:  ProcessLogQueryPlanInterrupt, explain.c:5336
2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  00000: duration: 10000.868 ms  plan:
        Query Text: select pg_sleep(10);
        Result  (cost=0.00..0.01 rows=1 width=4) (actual time=10000.802..10000.804 rows=1 loops=1)

> Using injection point support we should be able to add tests for
> testing pg_log_query_plan behaviour when there are page locks held or
> when auto_explain (with instrumentation) and pg_log_query_plan() work
> on the same query plan. Use injection point to make the backend
> running query wait at a suitable point to delay its execution and fire
> pg_log_query_plan() from other backend. May be the same test could
> examine the server log file to see if the plan is indeed output to the
> server log file.
>
> Given that the feature will be used when the things have already gone
> wrong, it should not make things more serious. So more testing and
> especially automated would help.

Thanks for the advice, it seems a good idea.
I'm going to try to add tests using injection points.

[1] https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[2] https://www.postgresql.org/docs/devel/hot-standby.html#HOT-STANDBY-CONFLICT

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Wed, Feb 7, 2024 at 9:38 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Hi Ashutosh,
>
> On 2024-02-06 19:51, Ashutosh Bapat wrote:
>
> > Thanks for the summary. It is helpful. I think patch is also getting
> > better.
> >
> > I have a few questions and suggestions
>
> Thanks for your comments.
>
> > 1. Prologue of GetLockMethodLocalHash() mentions
> >  * NOTE: When there are many entries in LockMethodLocalHash, calling this
> >  * function and looking into all of them can lead to performance problems.
> >  */
> > How bad this performance could be. Let's assume that a query is taking
> > time and pg_log_query_plan() is invoked to examine the plan of this
> > query. Is it possible that the looping over all the locks itself takes
> > a lot of time delaying the query execution further?
>
> I think it depends on the number of local locks, but I've measured cpu
> time for this page lock check by adding below codes and
> v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
> calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just for
> your information:
>
> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 5f7d77d567..65b7cb4925 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -44,6 +44,8 @@
>
> +#include "time.h"
> ...
> @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
>       * we check all the LocalLock entries and when finding even one, give up
>       * logging the plan.
>       */
> +    start = clock();
>      hash_seq_init(&status, GetLockMethodLocalHash());
>      while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
>      {
>          if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
>          {
>              ereport(LOG_SERVER_ONLY,
>                  errmsg("ignored request for logging query plan due to page lock conflicts"),
>                  errdetail("You can try again in a moment."));
>              hash_seq_term(&status);
>
>              ProcessLogQueryPlanInterruptActive = false;
>              return;
>          }
>      }
> +    end = clock();
> +    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
> +
> +    ereport(LOG,
> +            errmsg("all locallock entry search took: %f", cpu_time_used));
> +
>
> There were about 3 million log lines which recorded the cpu time, and
> the duration was quite short:
>
> =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
> =# select max(d), min(d), avg(d) from cpu_time ;
>    max    | min |          avg
> ----------+-----+-----------------------
>  0.000116 |   0 | 4.706274625332238e-07
>
> I'm not certain that this is valid for actual use cases, but these
> results seem to suggest that it will not take that long.

What load did you run? I don't think any query in make check would
take say thousands of locks. The prologue refers to a very populated
lock hash table. I think that will happen if thousands of tables are
queried in a single query OR a query runs on a partitioned table with
thousands of partitions. May be we want to try that scenario.

> > 2. What happens if auto_explain is enabled in the backend and
> > pg_log_query_plan() is called on the same backend? Will they conflict?
> > I think we should add a test for the same.
>
> Hmm, I think they don't conflict since they just refer QueryDesc and
> don't modify it and don't use same objects for locking.
> (I imagine 'conflict' here is something like 'hard conflict' in
> replication[2].)

By conflict, I mean the two features behave weird when used together
e.g give wrong results or crash etc.

> Actually using both auto_explain and pg_log_query_plan() output each
> logs separately:
>
> (pid:62835)=# select pg_sleep(10);
> (pid:70000)=# select pg_log_query_plan(62835);
>
> (pid:70000)=# \! cat data/log/postgres.log
> ...
> 2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  00000: query plan running on backend with PID 62835 is:
>         Query Text: select pg_sleep(10);
>         Result  (cost=0.00..0.01 rows=1 width=4)
>           Output: pg_sleep('10'::double precision)
>         Query Identifier: 3506829283127886044
> 2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:  ProcessLogQueryPlanInterrupt, explain.c:5336
> 2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  00000: duration: 10000.868 ms  plan:
>         Query Text: select pg_sleep(10);
>         Result  (cost=0.00..0.01 rows=1 width=4) (actual time=10000.802..10000.804 rows=1 loops=1)
>
> > Using injection point support we should be able to add tests for
> > testing pg_log_query_plan behaviour when there are page locks held or
> > when auto_explain (with instrumentation) and pg_log_query_plan() work
> > on the same query plan. Use injection point to make the backend
> > running query wait at a suitable point to delay its execution and fire
> > pg_log_query_plan() from other backend. May be the same test could
> > examine the server log file to see if the plan is indeed output to the
> > server log file.
> >
> > Given that the feature will be used when the things have already gone
> > wrong, it should not make things more serious. So more testing and
> > especially automated would help.
>
> Thanks for the advice, it seems a good idea.
> I'm going to try to add tests using injection point.

Your test with pg_sleep() is a good basic test. But more involved
testing might need something like injection points.

--
Best Wishes,
Ashutosh Bapat
On 2024-02-07 13:58, Ashutosh Bapat wrote:
> On Wed, Feb 7, 2024 at 9:38 AM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
>>
>> Hi Ashutosh,
>>
>> On 2024-02-06 19:51, Ashutosh Bapat wrote:
>>
>> > Thanks for the summary. It is helpful. I think patch is also getting
>> > better.
>> >
>> > I have a few questions and suggestions
>>
>> Thanks for your comments.
>>
>> > 1. Prologue of GetLockMethodLocalHash() mentions
>> >  * NOTE: When there are many entries in LockMethodLocalHash, calling this
>> >  * function and looking into all of them can lead to performance problems.
>> >  */
>> > How bad this performance could be. Let's assume that a query is taking
>> > time and pg_log_query_plan() is invoked to examine the plan of this
>> > query. Is it possible that the looping over all the locks itself takes
>> > a lot of time delaying the query execution further?
>>
>> I think it depends on the number of local locks, but I've measured cpu
>> time for this page lock check by adding below codes and
>> v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
>> calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just for
>> your information:
>>
>> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
>> index 5f7d77d567..65b7cb4925 100644
>> --- a/src/backend/commands/explain.c
>> +++ b/src/backend/commands/explain.c
>> @@ -44,6 +44,8 @@
>>
>> +#include "time.h"
>> ...
>> @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
>>       * we check all the LocalLock entries and when finding even one, give up
>>       * logging the plan.
>>       */
>> +    start = clock();
>>      hash_seq_init(&status, GetLockMethodLocalHash());
>>      while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
>>      {
>>          if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
>>          {
>>              ereport(LOG_SERVER_ONLY,
>>                  errmsg("ignored request for logging query plan due to page lock conflicts"),
>>                  errdetail("You can try again in a moment."));
>>              hash_seq_term(&status);
>>
>>              ProcessLogQueryPlanInterruptActive = false;
>>              return;
>>          }
>>      }
>> +    end = clock();
>> +    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
>> +
>> +    ereport(LOG,
>> +            errmsg("all locallock entry search took: %f", cpu_time_used));
>> +
>>
>> There were about 3 million log lines which recorded the cpu time, and
>> the duration was quite short:
>>
>> =# -- Extracted cpu_time_used from log and loaded it to cpu_time.d.
>> =# select max(d), min(d), avg(d) from cpu_time ;
>>    max    | min |          avg
>> ----------+-----+-----------------------
>>  0.000116 |   0 | 4.706274625332238e-07
>>
>> I'm not certain that this is valid for actual use cases, but these
>> results seem to suggest that it will not take that long.
>
> What load did you run? I don't think any query in make check would
> take say thousands of locks.

Sorry, I forgot to write it, but I ran make check, as you imagined.

> The prologue refers to a very populated
> lock hash table. I think that will happen if thousands of tables are
> queried in a single query OR a query runs on a partitioned table with
> thousands of partitions. May be we want to try that scenario.

OK, I'll try such cases.

>> > 2. What happens if auto_explain is enabled in the backend and
>> > pg_log_query_plan() is called on the same backend? Will they conflict?
>> > I think we should add a test for the same.
>>
>> Hmm, I think they don't conflict since they just refer QueryDesc and
>> don't modify it and don't use same objects for locking.
>> (I imagine 'conflict' here is something like 'hard conflict' in
>> replication[2].)
>
> By conflict, I mean the two features behave weird when used together
> e.g give wrong results or crash etc.
>
>> Actually using both auto_explain and pg_log_query_plan() output each
>> logs separately:
>>
>> (pid:62835)=# select pg_sleep(10);
>> (pid:70000)=# select pg_log_query_plan(62835);
>>
>> (pid:70000)=# \! cat data/log/postgres.log
>> ...
>> 2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  00000: query plan running on backend with PID 62835 is:
>>         Query Text: select pg_sleep(10);
>>         Result  (cost=0.00..0.01 rows=1 width=4)
>>           Output: pg_sleep('10'::double precision)
>>         Query Identifier: 3506829283127886044
>> 2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:  ProcessLogQueryPlanInterrupt, explain.c:5336
>> 2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  00000: duration: 10000.868 ms  plan:
>>         Query Text: select pg_sleep(10);
>>         Result  (cost=0.00..0.01 rows=1 width=4) (actual time=10000.802..10000.804 rows=1 loops=1)
>>
>> > Using injection point support we should be able to add tests for
>> > testing pg_log_query_plan behaviour when there are page locks held or
>> > when auto_explain (with instrumentation) and pg_log_query_plan() work
>> > on the same query plan. Use injection point to make the backend
>> > running query wait at a suitable point to delay its execution and fire
>> > pg_log_query_plan() from other backend. May be the same test could
>> > examine the server log file to see if the plan is indeed output to the
>> > server log file.
>> >
>> > Given that the feature will be used when the things have already gone
>> > wrong, it should not make things more serious. So more testing and
>> > especially automated would help.
>>
>> Thanks for the advice, it seems a good idea.
>> I'm going to try to add tests using injection point.
>
> Your test with pg_sleep() is a good basic test. But more involved
> testing might need something like injection points.

It might be so; I will consider whether there are any subtle timing
issues, etc.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation
On Mon, Jan 29, 2024 at 9:02 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > Hi, > > Updated the patch to fix typos and move > ProcessLogQueryPlanInterruptActive from errfinish() to AbortTransaction. > + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_log_query_plan</primary> + </indexterm> + <function>pg_log_query_plan</function> ( <parameter>pid</parameter> <type>integer</type> ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Requests to log the plan of the query currently running on the + backend with specified process ID. + It will be logged at <literal>LOG</literal> message level and + will appear in the server log based on the log + configuration set (See <xref linkend="runtime-config-logging"/> + for more information), but will not be sent to the client + regardless of <xref linkend="guc-client-min-messages"/>. + </para></entry> + </row> it would be better to explain the meaning of return value TRUE/FALSE? +# logging plan of the running query on the specified backend +{ oid => '8000', descr => 'log plan of the running query on the specified backend', + proname => 'pg_log_query_plan', + provolatile => 'v', prorettype => 'bool', you can add `proargnames => '{pid}'` + if (proc == NULL) + { + /* + * This is just a warning so a loop-through-resultset will not abort + * if one backend terminated on its own during the run. 
+ */ + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL backend process", pid))); + PG_RETURN_BOOL(false); + } + + be_status = pgstat_get_beentry_by_backend_id(proc->backendId); + if (be_status->st_backendType != B_BACKEND) + { + ereport(WARNING, + (errmsg("PID %d is not a PostgreSQL client backend process", pid))); + PG_RETURN_BOOL(false); + } + + if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0) + { + /* Again, just a warning to allow loops */ + ereport(WARNING, + (errmsg("could not send signal to process %d: %m", pid))); + PG_RETURN_BOOL(false); + } I found out that pg_log_query_plan's comments look like pg_log_backend_memory_contexts. pg_log_backend_memory_contexts will iterate through many memory contexts. but pg_log_query_plan for one specific pid will only output one plan? so I am a little bit confused by the comments. + /* + * Ensure no page lock is held on this process. + * + * If page lock is held at the time of the interrupt, we can't acquire any + * other heavyweight lock, which might be necessary for explaining the plan + * when retrieving column names. + * + * This may be overkill, but since page locks are held for a short duration + * we check all the LocalLock entries and when finding even one, give up + * logging the plan. + */ + hash_seq_init(&status, GetLockMethodLocalHash()); + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) + { + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) maybe not that self evident, the above comments still not explained why we need to ensure only PAGE lock was held on this process? In the commit message, can you add all the discussion links? My gmail account doesn't have a previous discussion history. I am not sure this (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru) is the only discussion link? 
I found a bug: src8=# select *, pg_sleep(10) from tenk1 for update; 2024-02-11 15:54:17.944 CST [48602] LOG: query plan running on backend with PID 48602 is: Query Text: select *, pg_sleep(10) from tenk1 for update; LockRows (cost=0.00..570.00 rows=10000 width=254) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, (pg_sleep('10'::double precision)), ctid -> Seq Scan on public.tenk1 (cost=0.00..470.00 rows=10000 width=254) Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid another session (PID) executes `SELECT pg_log_query_plan(48602);` in the meantime. pg_log_query_plan returns true successfully, but PID 48602 was stuck. I have problem using git apply: error: patch failed: src/include/commands/explain.h:94 error: src/include/commands/explain.h: patch does not apply `patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch` works
On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > */ > > > How bad this performance could be. Let's assume that a query is taking > > > time and pg_log_query_plan() is invoked to examine the plan of this > > > query. Is it possible that the looping over all the locks itself takes > > > a lot of time delaying the query execution further? > > corner case test: pgbench --initialize --partition-method=range --partitions=20000 Somehow my setup, the pg_bench didn't populate the data but there are 20000 partitions there. (all my other settings are default) some interesting things happened when a query touch so many partitions like: select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1; in another session, if you immediate call SELECT pg_log_query_plan(9482); then output be ` LOG: backend with PID 9482 is not running a query or a subtransaction is aborted ` however if you delay a little bit of time (like 1 second), then LOG will emit the plan with lots of text (not sure the plan is right). I think the reason is that the `InitPlan` within standard_ExecutorStart takes more time to finish when your query touches a lot of partitions.
On Mon, Feb 12, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote: > > I found a bug: > src8=# select *, pg_sleep(10) from tenk1 for update; > 2024-02-11 15:54:17.944 CST [48602] LOG: query plan running on > backend with PID 48602 is: > Query Text: select *, pg_sleep(10) from tenk1 for update; > LockRows (cost=0.00..570.00 rows=10000 width=254) > Output: unique1, unique2, two, four, ten, twenty, hundred, > thousand, twothousand, fivethous, tenthous, odd, even, stringu1, > stringu2, string4, (pg_sleep('10'::double precision)), ctid > -> Seq Scan on public.tenk1 (cost=0.00..470.00 rows=10000 width=254) > Output: unique1, unique2, two, four, ten, twenty, > hundred, thousand, twothousand, fivethous, tenthous, odd, even, > stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid > > another session (PID) executes `SELECT pg_log_query_plan(48602);` in > the meantime. > pg_log_query_plan returns true successfully, but PID 48602 was stuck. What do you mean by PID 48602 was stuck? -- Best Wishes, Ashutosh Bapat
On Mon, Feb 12, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote: > > On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > > */ > > > > How bad this performance could be. Let's assume that a query is taking > > > > time and pg_log_query_plan() is invoked to examine the plan of this > > > > query. Is it possible that the looping over all the locks itself takes > > > > a lot of time delaying the query execution further? > > > > corner case test: > pgbench --initialize --partition-method=range --partitions=20000 > Somehow my setup, the pg_bench didn't populate the data but there are > 20000 partitions there. > (all my other settings are default) > > some interesting things happened when a query touch so many partitions like: > select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1; > > in another session, if you immediate call SELECT pg_log_query_plan(9482); > then output be > ` > LOG: backend with PID 9482 is not running a query or a subtransaction > is aborted > ` > however if you delay a little bit of time (like 1 second), then > LOG will emit the plan with lots of text (not sure the plan is right). > > I think the reason is that the `InitPlan` within > standard_ExecutorStart takes more time to finish > when your query touches a lot of partitions. That's probably expected unless we make the ActiveQueryDesc available before ExecutorRun. How much time did it took between issuing SELECT pg_log_query_plan(9482); and plan getting output to the server error logs? How does this time compare with say the same time difference for a simple query and how much of that time can be attributed to Lock table hash scan, if the difference between time difference is huge. -- Best Wishes, Ashutosh Bapat
On Mon, Feb 12, 2024 at 12:42 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Mon, Feb 12, 2024 at 5:31 AM jian he <jian.universality@gmail.com> wrote: > > > > On Wed, Feb 7, 2024 at 12:58 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > > > > > */ > > > > > How bad this performance could be. Let's assume that a query is taking > > > > > time and pg_log_query_plan() is invoked to examine the plan of this > > > > > query. Is it possible that the looping over all the locks itself takes > > > > > a lot of time delaying the query execution further? > > > > > > corner case test: > > pgbench --initialize --partition-method=range --partitions=20000 > > Somehow my setup, the pg_bench didn't populate the data but there are > > 20000 partitions there. > > (all my other settings are default) > > > > some interesting things happened when a query touch so many partitions like: > > select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1; > > > > in another session, if you immediate call SELECT pg_log_query_plan(9482); > > then output be > > ` > > LOG: backend with PID 9482 is not running a query or a subtransaction > > is aborted > > ` > > however if you delay a little bit of time (like 1 second), then > > LOG will emit the plan with lots of text (not sure the plan is right). > > > > I think the reason is that the `InitPlan` within > > standard_ExecutorStart takes more time to finish > > when your query touches a lot of partitions. > > That's probably expected unless we make the ActiveQueryDesc available > before ExecutorRun. so the error message should be something like: errmsg("backend with PID %d is not running a query or a subtransaction is aborted or the plan is not generated", > > How much time did it took between issuing SELECT > pg_log_query_plan(9482); and plan getting output to the server error > logs? it either says errmsg("backend with PID %d is not running a query,.....) 
or outputs the plan immediately if I wait one or two seconds before calling pg_log_query_plan. As mentioned previously, with lots of partitions InitPlan takes longer to finish. Setup: 2 sessions, one runs the query (select abalance, aid from public.pgbench_accounts,pg_sleep(4) where aid > 1;), the other calls pg_log_query_plan.
On 2024-02-07 19:14, torikoshia wrote: > On 2024-02-07 13:58, Ashutosh Bapat wrote: >> The prologue refers to a very populated >> lock hash table. I think that will happen if thousands of tables are >> queried in a single query OR a query runs on a partitioned table with >> thousands of partitions. May be we want to try that scenario. > > OK, I'll try such cases. I measured this using partitioned pgbench_accounts with some modification to v36[1]. The results[2] show that CPU time increases in proportion to the number of partitions, and the increase is not that large. However I've noticed that these ensuring no page lock logic would not be necessary anymore since cc32ec24fdf3b98 removed the assertion which caused an error[1]. $ git show cc32ec24fdf3b98 .. diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 0a692ee0a6..f595bce31b 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -186,18 +186,6 @@ static int FastPathLocalUseCount = 0; */ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false; - /* - * We don't acquire any other heavyweight lock while holding the page lock - * except for relation extension. - */ - Assert(!IsPageLockHeld || - (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)); I'm going to remove ensuring no page lock logic after some testings. [1] $ git diff _submission/log_running_query-v36 +#include "time.h" + bool ProcessLogQueryPlanInterruptActive = false; /* Hook for plugins to get control in ExplainOneQuery() */ @@ -5258,6 +5260,10 @@ ProcessLogQueryPlanInterrupt(void) MemoryContext old_cxt; LogQueryPlanPending = false; + clock_t start, end; + double cpu_time_used; + int num_hash_entry = 0; + /* Cannot re-enter. */ if (ProcessLogQueryPlanInterruptActive) return; @@ -5287,9 +5293,11 @@ ProcessLogQueryPlanInterrupt(void) * we check all the LocalLock entries and when finding even one, give up * logging the plan. 
*/ + start = clock(); hash_seq_init(&status, GetLockMethodLocalHash()); while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) { + num_hash_entry++; if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) { ereport(LOG_SERVER_ONLY, @@ -5301,6 +5309,12 @@ ProcessLogQueryPlanInterrupt(void) return; } } + end = clock(); + cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC; + + ereport(LOG, + errmsg("locallock entry search took: %f for %d entries", cpu_time_used, num_hash_entry)); [2] # partition number: 512 locallock entry search took: 0.000029 for 1026 entries locallock entry search took: 0.000030 for 1026 entries locallock entry search took: 0.000036 for 1026 entries # partition number: 1024 locallock entry search took: 0.000070 for 2050 entries locallock entry search took: 0.000059 for 2050 entries locallock entry search took: 0.000049 for 2050 entries # partition number: 2048 locallock entry search took: 0.000100 for 4098 entries locallock entry search took: 0.000103 for 4098 entries locallock entry search took: 0.000101 for 4098 entries # partition number: 4096 locallock entry search took: 0.000197 for 8194 entries locallock entry search took: 0.000193 for 8194 entries locallock entry search took: 0.000192 for 8194 entries [3] https://www.postgresql.org/message-id/0642712f-1298-960a-a3ba-e256d56040ac%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
On 2024-02-12 09:00, jian he wrote: Thanks for you comments. > On Mon, Jan 29, 2024 at 9:02 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> Hi, >> >> Updated the patch to fix typos and move >> ProcessLogQueryPlanInterruptActive from errfinish() to >> AbortTransaction. >> > > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_log_query_plan</primary> > + </indexterm> > + <function>pg_log_query_plan</function> ( > <parameter>pid</parameter> <type>integer</type> ) > + <returnvalue>boolean</returnvalue> > + </para> > + <para> > + Requests to log the plan of the query currently running on the > + backend with specified process ID. > + It will be logged at <literal>LOG</literal> message level and > + will appear in the server log based on the log > + configuration set (See <xref > linkend="runtime-config-logging"/> > + for more information), but will not be sent to the client > + regardless of <xref linkend="guc-client-min-messages"/>. > + </para></entry> > + </row> > it would be better to explain the meaning of return value TRUE/FALSE? Yeah, but I've noticed that this should be located in 'Table Server Signaling Functions' not 'Table Control Data Functions'. Since 'Table Server Signaling Functions' describes the return code as below, just relocation seems fine. Each of these functions returns true if the signal was successfully sent and false if sending the signal failed. > +# logging plan of the running query on the specified backend > +{ oid => '8000', descr => 'log plan of the running query on the > specified backend', > + proname => 'pg_log_query_plan', > + provolatile => 'v', prorettype => 'bool', > you can add > `proargnames => '{pid}'` Hmm, pg_log_query_plan() can take one argument, I'm not sure how much sense it makes. Other functions which take one argument such as pg_cancel_backend() does not have proargnames. 
> + if (proc == NULL) > + { > + /* > + * This is just a warning so a loop-through-resultset will not abort > + * if one backend terminated on its own during the run. > + */ > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL backend process", pid))); > + PG_RETURN_BOOL(false); > + } > + > + be_status = pgstat_get_beentry_by_backend_id(proc->backendId); > + if (be_status->st_backendType != B_BACKEND) > + { > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL client backend process", pid))); > + PG_RETURN_BOOL(false); > + } > + > + if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0) > + { > + /* Again, just a warning to allow loops */ > + ereport(WARNING, > + (errmsg("could not send signal to process %d: %m", pid))); > + PG_RETURN_BOOL(false); > + } > > I found out that pg_log_query_plan's comments look like > pg_log_backend_memory_contexts. > pg_log_backend_memory_contexts will iterate through many memory > contexts. > but pg_log_query_plan for one specific pid will only output one plan? > so I am a little bit confused by the comments. These "loop" mean backend can run pg_log_query_plan() repeatedly even when failing sending signals. pg_signal_backend() also have such comments. > + /* > + * Ensure no page lock is held on this process. > + * > + * If page lock is held at the time of the interrupt, we can't acquire > any > + * other heavyweight lock, which might be necessary for explaining the > plan > + * when retrieving column names. > + * > + * This may be overkill, but since page locks are held for a short > duration > + * we check all the LocalLock entries and when finding even one, give > up > + * logging the plan. 
> + */ > + hash_seq_init(&status, GetLockMethodLocalHash()); > + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) > + { > + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) > maybe not that self evident, the above comments still not explained > why we need to ensure only > PAGE lock was held on this process? This is for preventing assertion error and it seems not necessary anymore as described in [1]. I'm going remove them. > > In the commit message, can you add all the discussion links? > My gmail account doesn't have a previous discussion history. Sure. > I am not sure this > (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru) > is the only discussion link? This is the original one: https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com > I found a bug: > src8=# select *, pg_sleep(10) from tenk1 for update; > 2024-02-11 15:54:17.944 CST [48602] LOG: query plan running on > backend with PID 48602 is: > Query Text: select *, pg_sleep(10) from tenk1 for update; > LockRows (cost=0.00..570.00 rows=10000 width=254) > Output: unique1, unique2, two, four, ten, twenty, hundred, > thousand, twothousand, fivethous, tenthous, odd, even, stringu1, > stringu2, string4, (pg_sleep('10'::double precision)), ctid > -> Seq Scan on public.tenk1 (cost=0.00..470.00 rows=10000 > width=254) > Output: unique1, unique2, two, four, ten, twenty, > hundred, thousand, twothousand, fivethous, tenthous, odd, even, > stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid > > another session (PID) executes `SELECT pg_log_query_plan(48602);` in > the meantime. > pg_log_query_plan returns true successfully, but PID 48602 was stuck. Hmm, it's not simply sleeping, is it? I'm concerned a bit this because estimated rows of tenk1 is 10000. If so, the query will take 10000 * 10 seconds. 
> I have problem using git apply: > error: patch failed: src/include/commands/explain.h:94 > error: src/include/commands/explain.h: patch does not apply > > `patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch` > works I'll update the patch including other points such as removing ensuring no page lock code. [1] https://www.postgresql.org/message-id/1b2b247530f3ff3afab4ddc2df222e8b%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
On 2024-02-13 11:30, torikoshia wrote: > I'll update the patch including other points such as removing ensuring > no page lock code. Updated the patch. > Using injection point support we should be able to add tests for > testing pg_log_query_plan behaviour when there are page locks held or > when auto_explain (with instrumentation) and pg_log_query_plan() work > on the same query plan. Use injection point to make the backend > running query wait at a suitable point to delay its execution and fire > pg_log_query_plan() from other backend. May be the same test could > examine the server log file to see if the plan is indeed output to the > server log file. Attached patch uses injection point as below: - There may be more points to inject, but added an injection point at ExecutorRun(), which seems to be the first interruption point where plans can be reliably displayed. - At injection point, it'd be possible to wait for some duration and fire pg_log_plan_query() as you suggested. However, I'm not sure how long duration is appropriate considering the variety of testing environments. Instead, attached patch calls HandleLogQueryPlanInterrupt() directly and set InterruptPending. - Tests both pg_log_plan_query() and auto_explain logs for their output, and the logged plans are the same. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Hi, I've just been catching up on this thread. + if (MyProc->heldLocks) + { + ereport(LOG_SERVER_ONLY, + errmsg("ignored request for logging query plan due to lock conflicts"), + errdetail("You can try again in a moment.")); + return; + } I don't like this for several reasons. First, I think it's not nice to have a request just get ignored. A user will expect that if we cannot immediately respond to some request, we'll respond later at the first time that it's safe to do so, rather than just ignoring it and telling them to retry. Second, I don't think that the error message is very good. It talks about lock conflicts, but we don't know that there is any real problem. We know that, if we enter this block, the server is in the middle of trying to acquire some lock, and we also know that we could attempt to acquire a lock as part of generating the EXPLAIN output, and maybe that's an issue. But that's not a lock conflict. That's a re-entrancy problem. I don't know that we want to talk about re-entrancy problems in an error message, and I don't really think this error message should exist in the first place, but if we're going to error out in this case surely we shouldn't do so with an error message that describes a problem we don't have. Third, I think that the re-entrancy problems with this patch may extend well beyond lock acquisition. This is one example of how it can be unsafe to do something as complicated as EXPLAIN at any arbitrary CHECK_FOR_INTERRUPTS(). It is not correct to say, as http://postgr.es/m/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw@mail.gmail.com does, that the problems with running EXPLAIN at an arbitrary point are specific to running this code in an aborted transaction. The existence of this code shows that there is at least one hazard even if the current transaction is not aborted, and I see no analysis on this thread indicating whether there are any more such hazards, or how we could go about finding them all. 
I think the issue is very general. We have lots of subsystems that both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). If we process an interrupt while that code is in the middle of manipulating its global variables and then again re-enter that code, bad things might happen. We could try to rule that out by analyzing all such subsystems and all places where CHECK_FOR_INTERRUPTS() may appear, but that's very difficult. Suppose we took the alternative approach recommended by Andrey Lepikhov in http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com and instead set a flag that gets handled in some suitable place in the executor code, like ExecProcNode(). If we did that, then we would know that we're not in the middle of a syscache lookup, a catcache lookup, a heavyweight lock acquisition, an ereport, or any other low-level subsystem call that might create problems for the EXPLAIN code. In that design, the hack shown above would go away, and we could be much more certain that we don't need any other similar hacks for other subsystems. The only downside is that we might experience a slightly longer delay before a requested EXPLAIN plan actually shows up, but that seems like a pretty small price to pay for being able to reason about the behavior of the system. I don't *think* there are any cases where we run in the executor for a particularly long time without a new call to ExecProcNode(). I think this case is a bit like vacuum_delay_point(). You might think that vacuum_delay_point() could be moved inside of CHECK_FOR_INTERRUPTS(), but we've made the opposite decision: every vacuum_delay_point() calls CHECK_FOR_INTERRUPTS() but not every CHECK_FOR_INTERRUPTS() calls vacuum_delay_point(). That means that we can allow vacuum_delay_point() only at cases where we know it's safe, rather than at every CHECK_FOR_INTERRUPTS(). 
I think that's a pretty smart decision, even for vacuum_delay_point(), and AFAICS the code we're proposing to run here does things that are substantially more complicated than what vacuum_delay_point() does. That code just does a bit of reading of shared memory, reports a wait event, and sleeps. That's really simple compared to code that could try to do relcache builds, which can trigger syscache lookups, which can both trigger heavyweight lock acquisition, lightweight lock acquisition, bringing pages into shared_buffers possibly through physical I/O, processing of invalidation messages, and a bunch of other stuff. It's really hard for me to accept that the heavyweight lock problem for which the patch contains a workaround is the only one that exists. I can't see any reason why that should be true. ...Robert
Hi, On 2024-02-15 14:42:11 +0530, Robert Haas wrote: > I think the issue is very general. We have lots of subsystems that > both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). > If we process an interrupt while that code is in the middle of > manipulating its global variables and then again re-enter that code, > bad things might happen. We could try to rule that out by analyzing > all such subsystems and all places where CHECK_FOR_INTERRUPTS() may > appear, but that's very difficult. Suppose we took the alternative > approach recommended by Andrey Lepikhov in > http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com > and instead set a flag that gets handled in some suitable place in the > executor code, like ExecProcNode(). If we did that, then we would know > that we're not in the middle of a syscache lookup, a catcache lookup, > a heavyweight lock acquisition, an ereport, or any other low-level > subsystem call that might create problems for the EXPLAIN code. > > In that design, the hack shown above would go away, and we could be > much more certain that we don't need any other similar hacks for other > subsystems. The only downside is that we might experience a slightly > longer delay before a requested EXPLAIN plan actually shows up, but > that seems like a pretty small price to pay for being able to reason > about the behavior of the system. I am very wary of adding overhead to ExecProcNode() - I'm quite sure that adding code there would trigger visible overhead for query times. If we went with something like tht approach, I think we'd have to do something like redirecting node->ExecProcNode to a wrapper, presumably from within a CFI. That wrapper could then implement the explain support, without slowing down the normal execution path. > I don't *think* there are any cases where we run in the executor for a > particularly long time without a new call to ExecProcNode(). I guess it depends on what you call a long time. 
A large sort, for example, could spend a fair amount of time inside tuplesort, similarly, a gather node might need to wait for a worker for a while etc. > It's really hard for me to accept that the heavyweight lock problem > for which the patch contains a workaround is the only one that exists. > I can't see any reason why that should be true. I suspect you're right. Greetings, Andres Freund
On Thu, Feb 15, 2024 at 6:12 PM Robert Haas <robertmhaas@gmail.com> wrote: > Hi, > > I've just been catching up on this thread. > > + if (MyProc->heldLocks) > + { > + ereport(LOG_SERVER_ONLY, > + errmsg("ignored request for logging query plan due to lock > conflicts"), > + errdetail("You can try again in a moment.")); > + return; > + } > > I don't like this for several reasons. > > First, I think it's not nice to have a request just get ignored. A > user will expect that if we cannot immediately respond to some > request, we'll respond later at the first time that it's safe to do > so, rather than just ignoring it and telling them to retry. > > Second, I don't think that the error message is very good. It talks > about lock conflicts, but we don't know that there is any real > problem. We know that, if we enter this block, the server is in the > middle of trying to acquire some lock, and we also know that we could > attempt to acquire a lock as part of generating the EXPLAIN output, > and maybe that's an issue. But that's not a lock conflict. That's a > re-entrancy problem. I don't know that we want to talk about > re-entrancy problems in an error message, and I don't really think > this error message should exist in the first place, but if we're going > to error out in this case surely we shouldn't do so with an error > message that describes a problem we don't have. > > Third, I think that the re-entrancy problems with this patch may > extend well beyond lock acquisition. This is one example of how it can > be unsafe to do something as complicated as EXPLAIN at any arbitrary > CHECK_FOR_INTERRUPTS(). It is not correct to say, as > http://postgr.es/m/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw@mail.gmail.com > does, that the problems with running EXPLAIN at an arbitrary point are > specific to running this code in an aborted transaction. 
The existence > of this code shows that there is at least one hazard even if the > current transaction is not aborted, and I see no analysis on this > thread indicating whether there are any more such hazards, or how we > could go about finding them all. > > I think the issue is very general. We have lots of subsystems that > both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). > If we process an interrupt while that code is in the middle of > manipulating its global variables and then again re-enter that code, > bad things might happen. We could try to rule that out by analyzing > all such subsystems and all places where CHECK_FOR_INTERRUPTS() may > appear, but that's very difficult. Suppose we took the alternative > approach recommended by Andrey Lepikhov in > http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com > and instead set a flag that gets handled in some suitable place in the > executor code, like ExecProcNode(). If we did that, then we would know > that we're not in the middle of a syscache lookup, a catcache lookup, > a heavyweight lock acquisition, an ereport, or any other low-level > subsystem call that might create problems for the EXPLAIN code. > > In that design, the hack shown above would go away, and we could be > much more certain that we don't need any other similar hacks for other > subsystems. The only downside is that we might experience a slightly > longer delay before a requested EXPLAIN plan actually shows up, but > that seems like a pretty small price to pay for being able to reason > about the behavior of the system. I don't *think* there are any cases > where we run in the executor for a particularly long time without a > new call to ExecProcNode(). > > I think this case is a bit like vacuum_delay_point(). 
You might think > that vacuum_delay_point() could be moved inside of > CHECK_FOR_INTERRUPTS(), but we've made the opposite decision: every > vacuum_delay_point() calls CHECK_FOR_INTERRUPTS() but not every > CHECK_FOR_INTERRUPTS() calls vacuum_delay_point(). That means that we > can allow vacuum_delay_point() only at cases where we know it's safe, > rather than at every CHECK_FOR_INTERRUPTS(). I think that's a pretty > smart decision, even for vacuum_delay_point(), and AFAICS the code > we're proposing to run here does things that are substantially more > complicated than what vacuum_delay_point() does. That code just does a > bit of reading of shared memory, reports a wait event, and sleeps. > That's really simple compared to code that could try to do relcache > builds, which can trigger syscache lookups, which can both trigger > heavyweight lock acquisition, lightweight lock acquisition, bringing > pages into shared_buffers possibly through physical I/O, processing of > invalidation messages, and a bunch of other stuff. > > It's really hard for me to accept that the heavyweight lock problem > for which the patch contains a workaround is the only one that exists. > I can't see any reason why that should be true. Thanks for the review and the very detailed explanation! I'm convinced that it's unsafe to execute the EXPLAIN code during CHECK_FOR_INTERRUPTS() and that we need to execute it in some other, safe place; I also agree with your first and second points. On 2024-02-16 03:59, Andres Freund wrote: > Hi, > > On 2024-02-15 14:42:11 +0530, Robert Haas wrote: >> I think the issue is very general. We have lots of subsystems that >> both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS(). >> If we process an interrupt while that code is in the middle of >> manipulating its global variables and then again re-enter that code, >> bad things might happen. 
We could try to rule that out by analyzing >> all such subsystems and all places where CHECK_FOR_INTERRUPTS() may >> appear, but that's very difficult. Suppose we took the alternative >> approach recommended by Andrey Lepikhov in >> http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com >> and instead set a flag that gets handled in some suitable place in the >> executor code, like ExecProcNode(). If we did that, then we would know >> that we're not in the middle of a syscache lookup, a catcache lookup, >> a heavyweight lock acquisition, an ereport, or any other low-level >> subsystem call that might create problems for the EXPLAIN code. >> >> In that design, the hack shown above would go away, and we could be >> much more certain that we don't need any other similar hacks for other >> subsystems. The only downside is that we might experience a slightly >> longer delay before a requested EXPLAIN plan actually shows up, but >> that seems like a pretty small price to pay for being able to reason >> about the behavior of the system. > > I am very wary of adding overhead to ExecProcNode() - I'm quite sure > that > adding code there would trigger visible overhead for query times. > > If we went with something like tht approach, I think we'd have to do > something > like redirecting node->ExecProcNode to a wrapper, presumably from > within a > CFI. That wrapper could then implement the explain support, without > slowing > down the normal execution path. Thanks for the idea! I'm not so sure about the implementation now, i.e. finding the next node to be executed from the planstate tree, but I'm going to try this approach. >> I don't *think* there are any cases where we run in the executor for a >> particularly long time without a new call to ExecProcNode(). > > I guess it depends on what you call a long time. 
A large sort, for > example, > could spend a fair amount of time inside tuplesort, similarly, a gather > node > might need to wait for a worker for a while etc. > > >> It's really hard for me to accept that the heavyweight lock problem >> for which the patch contains a workaround is the only one that exists. >> I can't see any reason why that should be true. > > I suspect you're right. > > Greetings, > > Andres Freund -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
On Fri, Feb 16, 2024 at 12:29 AM Andres Freund <andres@anarazel.de> wrote: > If we went with something like tht approach, I think we'd have to do something > like redirecting node->ExecProcNode to a wrapper, presumably from within a > CFI. That wrapper could then implement the explain support, without slowing > down the normal execution path. That's an annoying complication; maybe there's some better way to handle this. But I think we need to do something different than what the patch does currently because... > > It's really hard for me to accept that the heavyweight lock problem > > for which the patch contains a workaround is the only one that exists. > > I can't see any reason why that should be true. > > I suspect you're right. ...I think the current approach is just plain dead, because of this issue. We can't take an approach that creates an unbounded number of unclear reentrancy issues and then insert hacks one by one to cure them (or hack around them, more to the point) as they're discovered. The premise has to be that we only allow logging the query plan at points where we know it's safe, rather than, as at present, allowing it in places that are unsafe and then trying to compensate with code elsewhere. That's not likely to ever be as stable as we want PostgreSQL to be. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Feb 19, 2024 at 11:53 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Feb 16, 2024 at 12:29 AM Andres Freund <andres@anarazel.de> wrote: > > If we went with something like tht approach, I think we'd have to do something > > like redirecting node->ExecProcNode to a wrapper, presumably from within a > > CFI. That wrapper could then implement the explain support, without slowing > > down the normal execution path. > > That's an annoying complication; maybe there's some better way to > handle this. But I think we need to do something different than what > the patch does currently because... > > > > It's really hard for me to accept that the heavyweight lock problem > > > for which the patch contains a workaround is the only one that exists. > > > I can't see any reason why that should be true. > > > > I suspect you're right. > > ...I think the current approach is just plain dead, because of this > issue. We can't take an approach that creates an unbounded number of > unclear reentrancy issues and then insert hacks one by one to cure > them (or hack around them, more to the point) as they're discovered. > > The premise has to be that we only allow logging the query plan at > points where we know it's safe, rather than, as at present, allowing > it in places that are unsafe and then trying to compensate with code > elsewhere. That's not likely to ever be as stable as we want > PostgreSQL to be. This is potentially a bit of a wild idea, but I wonder if having some kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in "normal" as opposed to "critical" (using that word differently than the existing critical sections) would be worth it. Regards, James Coleman
On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote: > > > > ...I think the current approach is just plain dead, because of this > > issue. We can't take an approach that creates an unbounded number of > > unclear reentrancy issues and then insert hacks one by one to cure > > them (or hack around them, more to the point) as they're discovered. > > > > The premise has to be that we only allow logging the query plan at > > points where we know it's safe, rather than, as at present, allowing > > it in places that are unsafe and then trying to compensate with code > > elsewhere. That's not likely to ever be as stable as we want > > PostgreSQL to be. > > This is potentially a bit of a wild idea, but I wonder if having some > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > "normal" as opposed to "critical" (using that word differently than > the existing critical sections) would be worth it. My hunch is this will end up being a maintenance burden, since every caller has to decide (carefully) whether the call happens under normal conditions or not. Developers will tend to take the safe approach and flag calls as critical. But importantly, what's normal for one interrupt action may be critical for another and vice versa. The approach would be useful depending upon how easy it is to comprehend the definition of "normal". Alternatively: if a query executes for longer than a user-defined threshold (a session-level GUC? or the same value as the auto_explain parameter), the executor proactively prepares an EXPLAIN output and keeps it handy in case it is asked for. It can do so at a "known" safe place and time rather than at any random time and location. The extra time spent creating the EXPLAIN output may not be noticeable in a long-running query. The EXPLAIN output could be saved in pg_stat_activity or a similar place. This would avoid signaling the backend. -- Best Wishes, Ashutosh Bapat
On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote: > This is potentially a bit of a wild idea, but I wonder if having some > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > "normal" as opposed to "critical" (using that word differently than > the existing critical sections) would be worth it. It's worth considering, but the definition of "normal" vs. "critical" might be hard to pin down. Or, we might end up with a definition that is specific to this particular case and not generalizable to others. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote: > > This is potentially a bit of a wild idea, but I wonder if having some > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > > "normal" as opposed to "critical" (using that word differently than > > the existing critical sections) would be worth it. > > It's worth considering, but the definition of "normal" vs. "critical" > might be hard to pin down. Or, we might end up with a definition that > is specific to this particular case and not generalizable to others. But it doesn't have to be all or nothing right? I mean each call could say what the situation is like in their context, like CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and slowly tag calls as needed, similarly to how we add already CFI based on users report.
On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote: > > > This is potentially a bit of a wild idea, but I wonder if having some > > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > > > "normal" as opposed to "critical" (using that word differently than > > > the existing critical sections) would be worth it. > > > > It's worth considering, but the definition of "normal" vs. "critical" > > might be hard to pin down. Or, we might end up with a definition that > > is specific to this particular case and not generalizable to others. > > But it doesn't have to be all or nothing right? I mean each call could say > what the situation is like in their context, like > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and > slowly tag calls as needed, similarly to how we add already CFI based on users > report. Absolutely. My gut feeling is that it's going to be simpler to pick a small number of places that are safe and sufficient for this particular feature and add an extra call there, similar to how we do vacuum_delay_point(). The reason I think that's likely to be better is that it will likely require changing only a relatively small number of places. If we instead start annotating CFIs, well, we've got hundreds of those. That's a lot more to change, and it also inconveniences third-party extension authors and people doing back-patching. I'm not here to say it can't work; I just think it's likely not the easiest path. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 23, 2024 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: > > > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote: > > > > This is potentially a bit of a wild idea, but I wonder if having some > > > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in > > > > "normal" as opposed to "critical" (using that word differently than > > > > the existing critical sections) would be worth it. > > > > > > It's worth considering, but the definition of "normal" vs. "critical" > > > might be hard to pin down. Or, we might end up with a definition that > > > is specific to this particular case and not generalizable to others. > > > > But it doesn't have to be all or nothing right? I mean each call could say > > what the situation is like in their context, like > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and > > slowly tag calls as needed, similarly to how we add already CFI based on users > > report. > > Absolutely. My gut feeling is that it's going to be simpler to pick a > small number of places that are safe and sufficient for this > particular feature and add an extra call there, similar to how we do > vacuum_delay_point(). The reason I think that's likely to be better is > that it will likely require changing only a relatively small number of > places. If we instead start annotating CFIs, well, we've got hundreds > of those. That's a lot more to change, and it also inconveniences > third-party extension authors and people doing back-patching. I'm not > here to say it can't work; I just think it's likely not the easiest > path. Yes, I suspect it's not the easiest path. 
I have a small hunch it might end up paying more dividends in the future: there isn't just one of these things that is regularly a thorny discussion for the same reasons each time (basically "no way to trigger this safely from one backend interrupting another at an arbitrary point"), and if we were able to generalize a solution we may have multiple wins (a very obvious one in my mind is the inability of auto_explain to run an explain at the precise time it's most useful: when statement_timeout fires). But it's also possible there are simpler ways that get us more than this scenario, so I might be wrong; it's merely a hunch. Regards, James Coleman
On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote: > On Fri, Feb 23, 2024 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > But it doesn't have to be all or nothing right? I mean each call could say > > > what the situation is like in their context, like > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and > > > slowly tag calls as needed, similarly to how we add already CFI based on users > > > report. > > > > Absolutely. My gut feeling is that it's going to be simpler to pick a > > small number of places that are safe and sufficient for this > > particular feature and add an extra call there, similar to how we do > > vacuum_delay_point(). The reason I think that's likely to be better is > > that it will likely require changing only a relatively small number of > > places. If we instead start annotating CFIs, well, we've got hundreds > > of those. That's a lot more to change, and it also inconveniences > > third-party extension authors and people doing back-patching. I'm not > > here to say it can't work; I just think it's likely not the easiest > > path. > > Yes, I suspect it's not the easiest path. I have a small hunch it > might end up paying more dividends in the future: there isn't just one > of these things that is regularly a thorny discussion for the same > reasons each time (basically "no way to trigger this safely from > another backend interrupting another one at an arbitrary point"), and > if we were able to generalize a solution we may have multiple wins (a > very obvious one in my mind is the inability of auto explain to run an > explain at the precise time it's most useful: when statement timeout > fires). Yeah, trying to find a generalized solution seems like worth investing some time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a few years down the road. 
I might be missing something, but since we already have a ton of macro hacks, why not get another one to allow CFI() overloading rather than modifying every single call? Something like this should do the trick (CFIFlagHandler() is just a naive example using a function call to avoid multiple evaluation; it should be replaced with anything that requires more than 10s of thinking):

#define CHECK_FOR_INTERRUPTS_0() \
do { \
	if (INTERRUPTS_PENDING_CONDITION()) \
		ProcessInterrupts(); \
} while(0)

#define CHECK_FOR_INTERRUPTS_1(f) \
do { \
	if (INTERRUPTS_PENDING_CONDITION()) \
		ProcessInterrupts(); \
	\
	CFIFlagHandler(f); \
} while(0)

#define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL

#define CHECK_FOR_INTERRUPTS(...) \
	CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \
						   CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \
						   CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \
	)

We would have to duplicate the current CFI body, but it should never really change, and we shouldn't end up with more than 2 overloads anyway, so I don't see it being much of a problem.
On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sat, Feb 24, 2024 at 08:56:41AM -0500, James Coleman wrote: > > On Fri, Feb 23, 2024 at 10:23 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > But it doesn't have to be all or nothing right? I mean each call could say > > > > what the situation is like in their context, like > > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and > > > > slowly tag calls as needed, similarly to how we add already CFI based on users > > > > report. That has some potential ... > > Yeah, trying to find a generalized solution seems like worth investing some > time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a > few years down the road. > > I might be missing something, but since we already have a ton of macro hacks, > why not get another to allow CFI() overloading rather than modifying every > single call? Something like that should do the trick (and CFIFlagHandler() is > just a naive example with a function call to avoid multiple evaluation, should > be replaced with anything that required more than 10s thinking): > > #define CHECK_FOR_INTERRUPTS_0() \ > do { \ > if (INTERRUPTS_PENDING_CONDITION()) \ > ProcessInterrupts(); \ > } while(0) > > #define CHECK_FOR_INTERRUPTS_1(f) \ > do { \ > if (INTERRUPTS_PENDING_CONDITION()) \ > ProcessInterrupts(); \ > \ > CFIFlagHandler(f); \ > } while(0) From your earlier description I thought you are talking about flags that can be ORed. We need only two macros above. Why are we calling CFIFLagHandler() after ProcessInterrupts()? Shouldn't we pass flags to ProcessInterrupts() itself. > > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL > > #define CHECK_FOR_INTERRUPTS(...) 
\ > CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \ > CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \ > CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \ > ) > > We would have to duplicate the current CFI body, but it should never really > change, and we shouldn't end up with more than 2 overloads anyway so I don't > see it being much of a problem. Why do we need this complex macro? -- Best Wishes, Ashutosh Bapat
On Mon, Feb 26, 2024 at 12:19:42PM +0530, Ashutosh Bapat wrote: > On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > > > But it doesn't have to be all or nothing right? I mean each call could say > > > > > what the situation is like in their context, like > > > > > CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | GUARANTEE_WHATEVER), and > > > > > slowly tag calls as needed, similarly to how we add already CFI based on users > > > > > report. > > That has some potential ... > > > > > I might be missing something, but since we already have a ton of macro hacks, > > why not get another to allow CFI() overloading rather than modifying every > > single call? Something like that should do the trick (and CFIFlagHandler() is > > just a naive example with a function call to avoid multiple evaluation, should > > be replaced with anything that required more than 10s thinking): > > > > #define CHECK_FOR_INTERRUPTS_0() \ > > do { \ > > if (INTERRUPTS_PENDING_CONDITION()) \ > > ProcessInterrupts(); \ > > } while(0) > > > > #define CHECK_FOR_INTERRUPTS_1(f) \ > > do { \ > > if (INTERRUPTS_PENDING_CONDITION()) \ > > ProcessInterrupts(); \ > > \ > > CFIFlagHandler(f); \ > > } while(0) > > From your earlier description I thought you are talking about flags > that can be ORed. We need only two macros above. Why are we calling > CFIFLagHandler() after ProcessInterrupts()? Shouldn't we pass flags to > ProcessInterrupts() itself. Yes, I'm still talking about ORed flags passed to CFI(). That CFIFlagHandler call is just an example for a generalized function that would act those flags, rather than having it coded inside the macro. > > > > > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL > > > > #define CHECK_FOR_INTERRUPTS(...) 
\ > > CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \ > > CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \ > > CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \ > > ) > > > > We would have to duplicate the current CFI body, but it should never really > > change, and we shouldn't end up with more than 2 overloads anyway so I don't > > see it being much of a problem. > > Why do we need this complex macro? So that client code can use either CHECK_FOR_INTERRUPTS() or CHECK_FOR_INTERRUPTS(flag) rather than transforming every single CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's complaint about this approach.
On Mon, Feb 26, 2024 at 1:25 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > #define CHECK_FOR_INTERRUPTS_X(x, f, CFI_IMPL, ...) CFI_IMPL > > > > > > #define CHECK_FOR_INTERRUPTS(...) \ > > > CHECK_FOR_INTERRUPTS_X(, ##__VA_ARGS__, \ > > > CHECK_FOR_INTERRUPTS_1(__VA_ARGS__), \ > > > CHECK_FOR_INTERRUPTS_0(__VA_ARGS__) \ > > > ) > > > > > > We would have to duplicate the current CFI body, but it should never really > > > change, and we shouldn't end up with more than 2 overloads anyway so I don't > > > see it being much of a problem. > > > > Why do we need this complex macro? > > So that client code can use either CHECK_FOR_INTERRUPTS() or > CHECK_FOR_INTERRUPTS(flag) rather than transforming every single > CHECK_FOR_INTERRUPTS() to CHECK_FOR_INTERRUPTS(0), which was Robert's > complaint about this approach. It might be better to just create two macros (with names like CHECK_FOR_INTERRUPTS() and CHECK_FOR_INTERRUPTS_SAFE()) that call ProcessInterrupts() directly, and to modify ProcessInterrupts() to accept the flag (and, if required, call CFIFlagHandler() additionally). The last macro is hard to understand. -- Best Wishes, Ashutosh Bapat
On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > Yeah, trying to find a generalized solution seems like worth investing some > time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a > few years down the road. I just don't really see how to do it. I suspect that every task that wants to run at some CFIs but not others is going to have slightly different requirements, and we probably can't foresee what all of those requirements are. Said another way, if in the future we want to call DoSomethingOrOther() from the CFI handler, well then we need to know that we're not already in the middle of using any subsystem that DoSomethingOrOther() might also try to use ... and we also need to know that we're not in the middle of doing anything that's more critical than DoSomethingOrOther(). But both of these are likely to vary in each case. EXPLAIN might be one member of a general class of things that require catalog access (and thus might take locks or lwlocks, access the catalogs, trigger invalidations, etc.) but it's not clear how general that class really is. Also, I think if we try to do too many different kinds of things at CFIs, the whole thing is going to fall apart. You'll end up failing to foresee some interactions, or the stack will get too deep, or ... something. -- Robert Haas EDB: http://www.enterprisedb.com
On 2024-02-24 00:23, Robert Haas wrote: > On Fri, Feb 23, 2024 at 7:50 PM Julien Rouhaud <rjuju123@gmail.com> > wrote: >> On Fri, Feb 23, 2024 at 10:22:32AM +0530, Robert Haas wrote: >> > On Thu, Feb 22, 2024 at 6:25 AM James Coleman <jtc331@gmail.com> wrote: >> > > This is potentially a bit of a wild idea, but I wonder if having some >> > > kind of argument to CHECK_FOR_INTERRUPTS() signifying we're in >> > > "normal" as opposed to "critical" (using that word differently than >> > > the existing critical sections) would be worth it. >> > >> > It's worth considering, but the definition of "normal" vs. "critical" >> > might be hard to pin down. Or, we might end up with a definition that >> > is specific to this particular case and not generalizable to others. >> >> But it doesn't have to be all or nothing right? I mean each call >> could say >> what the situation is like in their context, like >> CHECK_FOR_INTERRUPTS(GUARANTEE_NO_HEAVYWEIGHT_LOCK | >> GUARANTEE_WHATEVER), and >> slowly tag calls as needed, similarly to how we add already CFI based >> on users >> report. > > Absolutely. My gut feeling is that it's going to be simpler to pick a > small number of places that are safe and sufficient for this > particular feature and add an extra call there Hmm, whether we extend CHECK_FOR_INTERRUPTS() or add extra calls directly, I'm currently not sure where the good 'places' are, i.e. ones that don't cause a performance impact. As in the attached PoC patch, I experimentally added extra calls in ExecScanFetch(), which should be called less often than ExecProcNode()[1]. When running a sequential scan on pgbench_accounts that fits entirely in memory, there seems to be a performance degradation. 
- Executed "select * from pgbench_accounts" 20 times
- Compared the elapsed time with and without the patch applied, on 874d817baa160ca7e68
- There were no heap_blks_read during the query
- pgbench_accounts has 3000000 rows

patch NOT applied:
- average: 335.88 ms
- max: 367.313 ms
- min: 309.609 ms

patch applied:
- average: 342.57 ms
- max: 380.099 ms
- min: 324.270 ms

It would be nice if there was a place accessed once every few seconds or so...

[1] https://www.postgresql.org/message-id/20240215185911.v4o6fo444md6a3w7%40awork3.anarazel.de -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Attachment
On Mon, Feb 26, 2024 at 01:56:44PM +0530, Robert Haas wrote: > On Sun, Feb 25, 2024 at 5:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Yeah, trying to find a generalized solution seems like worth investing some > > time to avoid having a bunch of CHECK_FOR_XXX() calls scattered in the code a > > few years down the road. > > I just don't really see how to do it. I suspect that every task that > wants to run at some CFIs but not others is going to have slightly > different requirements, and we probably can't foresee what all of > those requirements are. > > Said another way, if in the future we want to call > DoSomethingOrOther() from the CFI handler, well then we need to know > that we're not already in the middle of using any subsystem that > DoSomethingOrOther() might also try to use ... and we also need to > know that we're not in the middle of doing anything that's more > critical than DoSomethingOrOther(). But both of these are likely to > vary in each case. > > EXPLAIN might be one member of a general class of things that require > catalog access (and thus might take locks or lwlocks, access the > catalogs, trigger invalidations, etc.) but it's not clear how general > that class really is. Also, I think if we try to do too many different > kinds of things at CFIs, the whole thing is going to fall apart. > You'll end up failing to foresee some interactions, or the stack will > get too deep, or ... something. I still fail to understand your point. So you say we might want to check for safe condition to run explain or DoSomethingOrOther or even DoMeh, with all different requirements. IIUC what you're saying is that we should have CHECK_FOR_EXPLAIN(), CHECK_FOR_DOSOMETHINGOROTHER() and CHECK_FOR_MEH()? 
And so in some code path A we could have

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();

In another

CHECK_FOR_INTERRUPTS();
CHECK_FOR_DOSOMETHINGOROTHER();

and in one happy path

CHECK_FOR_INTERRUPTS();
CHECK_FOR_EXPLAIN();
CHECK_FOR_DOSOMETHINGOROTHER();
CHECK_FOR_MEH();

Or that we should still have all of those but they shouldn't be anywhere close to a CHECK_FOR_INTERRUPTS(), or something totally different? In the first case, I'm not sure why having CHECK_FOR_INTERRUPTS(EXPLAIN|...), with a combination of flags and with the flag handling being done after ProcessInterrupts(), would be any different apart from having less boilerplate lines. Similarly for the 2nd case, but relying on a single more general CHECK_FOR_CONDITION(EXPLAIN | ...) rather than N CHECK_FOR_XXX? If you just want something totally different then sure.
On Mon, Feb 26, 2024 at 5:31 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > It would be nice if there was a place accessed once every few seconds or > so.. I think this comment earlier from Andres deserves close attention: # If we went with something like tht approach, I think we'd have to do something # like redirecting node->ExecProcNode to a wrapper, presumably from within a # CFI. That wrapper could then implement the explain support, without slowing # down the normal execution path. If this is correctly implemented, the overhead in the case where the feature isn't used should be essentially zero, I believe. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Feb 28, 2024 at 1:18 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Feb 26, 2024 at 5:31 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > It would be nice if there was a place accessed once every few seconds or > > so.. > > I think this comment earlier from Andres deserves close attention: > > # If we went with something like that approach, I think we'd have to do something > # like redirecting node->ExecProcNode to a wrapper, presumably from within a > # CFI. That wrapper could then implement the explain support, without slowing > # down the normal execution path. > > If this is correctly implemented, the overhead in the case where the > feature isn't used should be essentially zero, I believe. If I can rephrase this idea: it's basically "delay this interrupt until inline to the next ExecProcNode execution". That seems pretty promising to me as well. Regards, James Coleman
On Sat, Mar 2, 2024 at 10:46 AM James Coleman <jtc331@gmail.com> wrote: > If I can rephrase this idea: it's basically "delay this interrupt > until inline to the next ExecProcNode execution". Yes, but it's not just that. It also means that the code which would handle the interrupt doesn't need to be called at every ExecProcNode. Only when the interrupt actually arrives do we enable the code that handles it. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Feb 16, 2024 at 11:42 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > I'm not so sure about the implementation now, i.e. finding the next > node > to be executed from the planstate tree, but I'm going to try this > approach. Attached is a patch which takes this approach. - I saw no way to find the next node to be executed from the planstate tree, so the patch wraps the ExecProcNode of every node in the planstate tree at CHECK_FOR_INTERRUPTS(). - To avoid the ongoing overhead of the wrapped function call, the wrapping is undone at the end of the EXPLAIN code execution. - I first tried to use ExecSetExecProcNode() for the wrapping, but it 'changes' the ExecProcNodeMtd of a node rather than 'adding' some processing to it. I'm not sure this is the right approach, but the attached patch adds a new member, ExecProcNodeOriginal, to PlanState to preserve the original ExecProcNodeMtd. Any comments are welcome. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
On Wed, Mar 13, 2024 at 1:28 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > - I saw no way to find the next node to be executed from the planstate > tree, so the patch wraps all the ExecProcNode of the planstate tree at > CHECK_FOR_INTERRUPTS(). I don't think it does this correctly, because some node types have children other than the left and right node. See /* special child plans */ in ExplainNode(). But also ... having to wrap the entire plan tree like this seems pretty awful. I don't really like the idea of a large-scale plan modification like this in the middle of the query. I also wonder whether it interacts properly with JIT. But at the same time, I wonder how you're supposed to avoid it. Andres, did you have some clever idea for this feature that would avoid the need to do this? -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 13, 2024 at 1:28 PM torikoshia <torikoshia@oss.nttdata.com> wrote: > > On Fri, Feb 16, 2024 at 11:42 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: > > I'm not so sure about the implementation now, i.e. finding the next > > node > > to be executed from the planstate tree, but I'm going to try this > > approach. > > Attached a patch which takes this approach. > One minor issue. I understand that the regress test compares the expected outcome with the test run outcome, but can you enlighten me about how you check whether the change you made in contrib/auto_explain/t/001_auto_explain.pl is correct? (I am not familiar with Perl.) diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h index cf195f1359..2d06bf297e 100644 --- a/src/include/commands/explain.h +++ b/src/include/commands/explain.h @@ -17,6 +17,8 @@ #include "lib/stringinfo.h" #include "parser/parse_node.h" +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive; diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h index 054dd2bf62..3c6bd1ea7c 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -167,6 +167,7 @@ struct Node; extern bool message_level_is_interesting(int elevel); +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive; utils/elog.h is already included in src/include/postgres.h, so you don't need to declare ProcessLogQueryPlanInterruptActive in include/commands/explain.h? Generally a variable should only be declared once?
On 2024-03-14 04:33, Robert Haas wrote: Thanks for your review! > On Wed, Mar 13, 2024 at 1:28 AM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> - I saw no way to find the next node to be executed from the planstate >> tree, so the patch wraps all the ExecProcNode of the planstate tree at >> CHECK_FOR_INTERRUPTS(). > > I don't think it does this correctly, because some node types have > children other than the left and right node. See /* special child > plans */ in ExplainNode(). Agreed. > But also ... having to wrap the entire plan tree like this seems > pretty awful. I don't really like the idea of a large-scan plan > modification like this in the middle of the query. Yeah, but I haven't come up with other ideas to select the appropriate node to wrap. > Andres, did you have some clever idea for this feature that would > avoid the need to do this? On 2024-03-14 10:02, jian he wrote: > On Wed, Mar 13, 2024 at 1:28 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> On Fri, Feb 16, 2024 at 11:42 PM torikoshia >> <torikoshia@oss.nttdata.com> >> wrote: >> > I'm not so sure about the implementation now, i.e. finding the next >> > node >> > to be executed from the planstate tree, but I'm going to try this >> > approach. >> >> Attached a patch which takes this approach. >> > > one minor issue. > I understand the regress test, compare the expected outcome with > testrun outcome, > but can you enlighten me about how you check if the change you made in > contrib/auto_explain/t/001_auto_explain.pl is correct. > (i am not familiar with perl). $pg_log_plan_query_output saves the output plan of pg_log_query_plan() and $auto_explain_output saves the output plan of auto_explain. The test checks that both are the same using cmp_ok(). Did I answer your question?
> diff --git a/src/include/commands/explain.h > b/src/include/commands/explain.h > index cf195f1359..2d06bf297e 100644 > --- a/src/include/commands/explain.h > +++ b/src/include/commands/explain.h > @@ -17,6 +17,8 @@ > #include "lib/stringinfo.h" > #include "parser/parse_node.h" > +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive; > > diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h > index 054dd2bf62..3c6bd1ea7c 100644 > --- a/src/include/utils/elog.h > +++ b/src/include/utils/elog.h > @@ -167,6 +167,7 @@ struct Node; > extern bool message_level_is_interesting(int elevel); > +extern PGDLLIMPORT bool ProcessLogQueryPlanInterruptActive; > > utils/elog.h is already included in src/include/postgres.h. > you don't need to declare ProcessLogQueryPlanInterruptActive at > include/commands/explain.h? > generally a variable should only declare once? Yeah, this declaration is not necessary; we should instead include commands/explain.h in src/backend/access/transam/xact.c. -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Hi, On 2024-03-13 15:33:02 -0400, Robert Haas wrote: > But also ... having to wrap the entire plan tree like this seems > pretty awful. I don't really like the idea of a large-scan plan > modification like this in the middle of the query. It's not great. But I also don't really see an alternative with this approach. I guess we could invent a new CFI variant that gets passed the current PlanState and use that in all of src/backend/executor/node* - but then we could just as well process the interrupt directly there. > I also wonder whether it interacts properly with JIT. I don't think there's a problem unless somebody invests a lot of time in JITing much more of the query. Which will require a lot more work, basically redesigning the executor... > Andres, did you have some clever idea for this feature that would > avoid the need to do this? No. I think it's acceptable though. However, it might be worth inventing an executor tree walker as a preliminary step. We already have quite a few switches over all plan nodes, which we could largely replace with a helper. Greetings, Andres Freund