Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: RFC: Logging plan of the running query
Date
Msg-id 20220201.172757.1480996662235658750.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: RFC: Logging plan of the running query  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: RFC: Logging plan of the running query  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
At Tue, 1 Feb 2022 01:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>
>
> On 2022/01/28 17:45, torikoshia wrote:
> >> There is the case where the request to log a query plan is skipped
> >> even while the target backend is running a query. If this happens,
> >> users can just retry pg_log_query_plan(). These things should be
> >> documented?
> > Agreed.
> > Added following:
> >    +        Note that there is the case where the request to log a
> >  query
> >    +        plan is skipped even while the target backend is running a
> >    +        query due to lock conflict avoidance.
> >    +        If this happens, users can just retry pg_log_query_plan().
>
> This may cause users to misunderstand that pg_log_query_plan() fails
> while the target backend is holding *any* locks? Isn't it better to
> mention "page-level locks", instead? So how about the following?
>
> --------------------------
> Note that the request to log the query plan will be ignored if it's
> received during a short period while the target backend is holding a
> page-level lock.
> --------------------------
>
>
> >> +            ereport(LOG_SERVER_ONLY,
> >> +                errmsg("backend with PID %d is holding a page
> >> lock. Try again",
> >> +                    MyProcPid));
> >>
> >> It seems more proper to output this message in DETAIL or HINT,
> >> instead. So how about something like the following messages?
> >>
> >> LOG: could not log the query plan
> >> DETAIL: query plan cannot be logged while page level lock is being
> >> held
> >> HINT: Try pg_log_query_plan() after a few ....
> > Agreed.
> > I felt the HINT message 'after a few ...' is difficult to describe,
> > and wrote as below.
> > | HINT: Retrying pg_log_query_plan() might succeed since the lock
> > | duration of page level locks are usually short
> > How do you think?
>
> Or we don't need HINT message?
>
>
> +                errmsg("could not log the query plan"),
> + errdetail("query plan cannot be logged while page level lock is
> being held"),
>
> In detail message, the first word of sentences should be
> capitalized. How about "Cannot log the query plan while holding
> page-level lock.", instead?
>
>
> Thanks for updating the patch! Here are some review comments.
>
> +      <row>
> +       <entry role="func_table_entry"><para role="func_signature">
> +        <indexterm>
> +         <primary>pg_log_query_plan</primary>
> +        </indexterm>
>
> This entry is placed before one for
> pg_log_backend_memory_contexts(). But it should be *after* that since
> those entries seem to be placed in alphabetical order in the table?
>
>
> + Requests to log the plan of the query currently running on the
> +        backend with specified process ID along with the untruncated
> +        query string.
>
> Other descriptions about logging of query string seem not to mention
> something like "untruncated query string". For example, auto_explain,
> log_statement, etc. Why do we need to mention "along with the
> untruncated query string" specially for pg_log_query_plan()?
>
>
> + Note that nested statements (statements executed inside a function)
> are not
> + considered for logging. Only the plan of the most deeply nested
> query is logged.
>
> Now the plan of even nested statement can be logged. So this
> description needs to be updated?
>
>
> @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
>        MemoryContextSwitchTo(oldcontext);
>  +    ActiveQueryDesc = NULL;
>
> ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be
> reset to NULL in standard_ExecutorFinish()?
>
>
> Currently even during ProcessLogQueryPlanInterrupt(),
> pg_log_query_plan() can be call and another
> ProcessLogQueryPlanInterrupt() can be executed. So repeatable
> re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack
> depth limit exceeded" error. To avoid this, shouldn't we make
> ProcessLogQueryPlanInterrupt() do nothing and return immediately, if
> it's called during another ProcessLogQueryPlanInterrupt()?
>
> pg_log_backend_memory_contexts() also might have the same issue.

Good catch.

By the way, I'm anxious about the following part and I'd like to
remove it.

+     * Ensure no page lock is held on this process.

It seems to me what is wrong is ginInsertCleanup(), not this feature.
As I read the comment for the assertion, I don't believe we want to
allow CFI while holding a page lock.  And AFAICS the function is the
only point where doing that.  (It is the alone user of LockPage()...)

This is the assertion.

lock.c: 902
>     /*
>      * We don't acquire any other heavyweight lock while holding the page lock
>      * except for relation extension.
>      */
>     Assert(!IsPageLockHeld ||
>            (locktag->locktag_type == LOCKTAG_RELATION_EXTEND));

It is added by a recent commit 72e78d831ab5550c39f2dcc7cc5d44c406ec3dc2.

> * Similar to relation extension, page locks are also held for a short
> * duration, so imposing such a restriction won't hurt.

I don't believe a path involving vacuum_delay_point() calls is
short-duration'ed.

https://www.postgresql.org/message-id/CAH2-WznzCPUKnOV%2Bre30_rHLCkqQWm2JTSVjTCAei9LySTc2pw%40mail.gmail.com

> One thing that really bothers me about commit e2c79e14 is that
> LockPage() is called, not LockBuffer(). GIN had no LockPage() calls
> before that commit, and is now the only code in the entire system that
> calls LockPage()/ConditionalLockPage() (the hash am no longer uses
> page heavyweight locks following recent work there).

I agree to the discussion.  Can't we use other mechanism here to get
rid of the Lockpage()?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: row filtering for logical replication
Next
From: Pavel Borisov
Date:
Subject: Re: Make mesage at end-of-recovery less scary.