Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From James Coleman
Subject Re: RFC: Logging plan of the running query
Date
Msg-id CAAaqYe9QB_CUjXrvNh-cpzW6Hz7S4o5pZnR1U1TujLGptUrS1A@mail.gmail.com
In response to Re: RFC: Logging plan of the running query  (torikoshia <torikoshia@oss.nttdata.com>)
Responses Re: RFC: Logging plan of the running query  (torikoshia <torikoshia@oss.nttdata.com>)
List pgsql-hackers
On Thu, Jun 15, 2023 at 9:00 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2023-06-15 01:48, James Coleman wrote:
> > On Tue, Jun 13, 2023 at 11:53 AM James Coleman <jtc331@gmail.com>
> > wrote:
> >>
> >> ...
> >> I'm going to re-run tests with my patch version + resetting the flag
> >> on SIGINT (and any other error condition) to be certain that the issue
> >> you uncovered (where backends get stuck after a SIGINT not responding
> >> to the requested plan logging) wasn't masking any other issues.
> >>
> >> As long as that run is clean also then I believe the patch is safe
> >> as-is even without the re-entrancy guard.
> >>
> >> I'll report back with the results of that testing.
> >
> > The tests have been running since last night, but have apparently been
> > hung for many hours now. I haven't been able to fully look into it,
> > but so far I know the hung (100% CPU) backend last logged this:
> >
> > 2023-06-14 02:00:30.045 UTC client backend[84461]
> > pg_regress/updatable_views LOG:  query plan running on backend with
> > PID 84461 is:
> >         Query Text: SELECT table_name, column_name, is_updatable
> >           FROM information_schema.columns
> >          WHERE table_name LIKE E'r_\\_view%'
> >          ORDER BY table_name, ordinal_position;
> >
> > The last output from the regression test harness was:
> >
> > # parallel group (5 tests):  index_including create_view
> > index_including_gist create_index create_index_spgist
> > ok 66        + create_index                            36508 ms
> > ok 67        + create_index_spgist                     38588 ms
> > ok 68        + create_view                              1394 ms
> > ok 69        + index_including                           654 ms
> > ok 70        + index_including_gist                     1701 ms
> > # parallel group (16 tests):  errors create_cast drop_if_exists
> > create_aggregate roleattributes constraints hash_func typed_table
> > infinite_recurse
> >
> > Attaching gdb to the hung backend shows this:
> >
> > #0  0x00005601ab1f9529 in ProcLockWakeup
> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> > <default_lockmethod>, lock=lock@entry=0x7f5325c913f0) at proc.c:1655
> > #1  0x00005601ab1e99dc in CleanUpLock (lock=lock@entry=0x7f5325c913f0,
> > proclock=proclock@entry=0x7f5325d40d60,
> > lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> > <default_lockmethod>,
> >     hashcode=hashcode@entry=573498161, wakeupNeeded=<optimized out>)
> > at lock.c:1673
> > #2  0x00005601ab1e9e21 in LockRefindAndRelease
> > (lockMethodTable=lockMethodTable@entry=0x5601ab6484e0
> > <default_lockmethod>, proc=<optimized out>,
> > locktag=locktag@entry=0x5601ac3d7998, lockmode=lockmode@entry=1,
> >
> > decrement_strong_lock_count=decrement_strong_lock_count@entry=false)
> > at lock.c:3150
> > #3  0x00005601ab1edb27 in LockReleaseAll
> > (lockmethodid=lockmethodid@entry=1, allLocks=false) at lock.c:2295
> > #4  0x00005601ab1f8599 in ProcReleaseLocks
> > (isCommit=isCommit@entry=true) at proc.c:781
> > #5  0x00005601ab37f1f4 in ResourceOwnerReleaseInternal
> > (owner=<optimized out>, phase=phase@entry=RESOURCE_RELEASE_LOCKS,
> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
> > resowner.c:618
> > #6  0x00005601ab37f7b7 in ResourceOwnerRelease (owner=<optimized out>,
> > phase=phase@entry=RESOURCE_RELEASE_LOCKS,
> > isCommit=isCommit@entry=true, isTopLevel=isTopLevel@entry=true) at
> > resowner.c:494
> > #7  0x00005601aaec1d84 in CommitTransaction () at xact.c:2334
> > #8  0x00005601aaec2b22 in CommitTransactionCommand () at xact.c:3067
> > #9  0x00005601ab200a66 in finish_xact_command () at postgres.c:2783
> > #10 0x00005601ab20338f in exec_simple_query (
> >     query_string=query_string@entry=0x5601ac3b0858 "SELECT table_name,
> > column_name, is_updatable\n  FROM information_schema.columns\n WHERE
> > table_name LIKE E'r_\\\\_view%'\n ORDER BY table_name,
> > ordinal_position;") at postgres.c:1300
> >
> > I am unable to connect to the regression test Postgres instance --
> > psql just hangs, so the lock seems to have affected the postmaster
> > also.
> >
> > I'm wondering if this might represent a bug in the current patch.
>
> Thanks for running and analyzing the test!

Sure thing!

> Could you share how you are running the test?
>
> I imagined something like the following, but so far haven't been able
> to reproduce it.
> - apply both v26-0001 and v27-0002 and build
> - run PostgreSQL with default GUCs
> - make installcheck-world
> - run 'SELECT pg_log_query_plan(pid) FROM pg_stat_activity \watch 0.1'
> during make installcheck-world
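
As an aside, a minimal variation of that monitoring loop (using the
pg_log_query_plan() function the patch adds) that skips the monitoring
session's own backend would be:

    -- ask every other client backend to log its current plan, repeatedly
    SELECT pg_log_query_plan(pid)
      FROM pg_stat_activity
     WHERE pid <> pg_backend_pid()
       AND backend_type = 'client backend';
    \watch 0.1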

Apologies, I should have attached my updated patch (with the fix for
the bug you'd reported with the re-entrancy guard). Attached is v28,
which sets ProcessLogQueryPlanInterruptActive to false in errfinish
when necessary. Once built with those two patches, I'm simply running
`make check`.
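
For clarity, the shape of that change is roughly the following (a minimal
sketch only, not the patch itself; the handler name and the elided bodies
are assumptions):

    /* Re-entrancy guard for the plan-logging interrupt handler. */
    static bool ProcessLogQueryPlanInterruptActive = false;

    static void
    ProcessLogQueryPlanInterrupt(void)
    {
        if (ProcessLogQueryPlanInterruptActive)
            return;             /* already logging a plan; don't recurse */
        ProcessLogQueryPlanInterruptActive = true;

        /* ... locate the active plan and emit it via ereport(LOG, ...) ... */

        ProcessLogQueryPlanInterruptActive = false;
    }

    void
    errfinish(const char *filename, int lineno, const char *funcname)
    {
        /*
         * If an ERROR (e.g. a SIGINT/statement cancel) escapes while the
         * plan-logging code is running, the reset above never executes, so
         * clear the guard here; otherwise the backend would ignore all
         * future pg_log_query_plan() requests.
         */
        ProcessLogQueryPlanInterruptActive = false;

        /* ... existing error reporting / PG_RE_THROW handling ... */
    }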

Regards,
James Coleman

Attachment
