Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: RFC: Logging plan of the running query
Date
Msg-id CAExHW5ujL=mzbjxf6xtyU-Ft6LMBMEmt8H4ZSDiiZre0Fef0jg@mail.gmail.com
Whole thread Raw
In response to Re: RFC: Logging plan of the running query  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: RFC: Logging plan of the running query
List pgsql-hackers
Hi,

On Tue, Oct 10, 2023 at 7:00 PM torikoshia <torikoshia@oss.nttdata.com> wrote:

> Thanks for your comments!
>
> I tested v30 patch with
> v28-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch which makes
> CFI() call ProcessLogQueryPlanInterrupt() internally, and confirmed that
> very few logging queries failed with v30 patch.
>
> This is a result in line with your prediction.
>
>
> ```
>   $ rg -c'ignored request for logging query plan due to lock confilcts'
> postmaster.log
>   8

Like many others I think this feature is useful to debug a long running query.

Sorry for jumping late into this.

I have a few of high level comments

There is a lot of similarity between what this feature does and what
auto explain does. I see the code is also duplicated. There is some
merit in avoiding this duplication
1. we will get all the features of auto_explain automatically like
choosing a format (this was expressed somebody earlier in this
thread), setings etc.
2. avoid bugs. E.g your code switches context after ExplainState has
been allocated. These states may leak depending upon when this
function gets called.
3. Building features on top as James envisions will be easier.

Considering the similarity with auto_explain I wondered whether this
function should be part of auto_explain contrib module itself? If we
do that users will need to load auto_explain extension and thus
install executor hooks when this function doesn't need those. So may
not be such a good idea. I didn't see any discussion on this.

I tried following query to pass PID of a non-client backend to this function.
#select pg_log_query_plan(pid), application_name, backend_type from
pg_stat_activity where backend_type = 'autovacuum launcher';
 pg_log_query_plan | application_name |    backend_type
-------------------+------------------+---------------------
 t                 |                  | autovacuum launcher
(1 row)
I see "LOG:  backend with PID 2733631 is not running a query or a
subtransaction is aborted" in server logs. That's ok. But may be we
should not send signal to these kinds of backends at all, thus
avoiding some system calls.

I am also wondering whether it's better to report the WARNING as
status column in the output. E.g. instead of
#select pg_log_query_plan(100);
WARNING:  PID 100 is not a PostgreSQL backend process
 pg_log_query_plan
-------------------
 f
(1 row)
we output
#select pg_log_query_plan(100);
 pg_log_query_plan |                   status
-------------------+---------------------------------------------
 f                 | PID 100 is not a PostgreSQL backend process
(1 row)

That looks neater and can easily be handled by scripts, applications
and such. But it will be inconsistent with other functions like
pg_terminate_backend() and pg_log_backend_memory_contexts().

I do share a concern that was discussed earlier. If a query is running
longer, there's something problematic with it. A diagnostic
intervention breaking it further would be unwelcome. James has run
experiments to shake this code for any loose breakages. He has not
found any. So may be we are good. And we wouldn't know about very rare
corner cases so easily without using it in the field. So fine with it.
If we could add some safety net that will be great but may not be
necessary for the first cut.

--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Mingyu Li
Date:
Subject: Re: [PoC] run SQL over ciphertext
Next
From: Mingyu Li
Date:
Subject: Re: [PoC] run SQL over ciphertext