Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: RFC: Logging plan of the running query |
Date | |
Msg-id | 3d5fd6d955944b5e8e4b8d1917f14885@oss.nttdata.com Whole thread Raw |
In response to | Re: RFC: Logging plan of the running query (jian he <jian.universality@gmail.com>) |
Responses |
Re: RFC: Logging plan of the running query
|
List | pgsql-hackers |
On 2024-02-12 09:00, jian he wrote: Thanks for you comments. > On Mon, Jan 29, 2024 at 9:02 PM torikoshia <torikoshia@oss.nttdata.com> > wrote: >> >> Hi, >> >> Updated the patch to fix typos and move >> ProcessLogQueryPlanInterruptActive from errfinish() to >> AbortTransaction. >> > > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_log_query_plan</primary> > + </indexterm> > + <function>pg_log_query_plan</function> ( > <parameter>pid</parameter> <type>integer</type> ) > + <returnvalue>boolean</returnvalue> > + </para> > + <para> > + Requests to log the plan of the query currently running on the > + backend with specified process ID. > + It will be logged at <literal>LOG</literal> message level and > + will appear in the server log based on the log > + configuration set (See <xref > linkend="runtime-config-logging"/> > + for more information), but will not be sent to the client > + regardless of <xref linkend="guc-client-min-messages"/>. > + </para></entry> > + </row> > it would be better to explain the meaning of return value TRUE/FALSE? Yeah, but I've noticed that this should be located in 'Table Server Signaling Functions' not 'Table Control Data Functions'. Since 'Table Server Signaling Functions' describes the return code as below, just relocation seems fine. Each of these functions returns true if the signal was successfully sent and false if sending the signal failed. > +# logging plan of the running query on the specified backend > +{ oid => '8000', descr => 'log plan of the running query on the > specified backend', > + proname => 'pg_log_query_plan', > + provolatile => 'v', prorettype => 'bool', > you can add > `proargnames => '{pid}'` Hmm, pg_log_query_plan() can take one argument, I'm not sure how much sense it makes. Other functions which take one argument such as pg_cancel_backend() does not have proargnames. > + if (proc == NULL) > + { > + /* > + * This is just a warning so a loop-through-resultset will not abort > + * if one backend terminated on its own during the run. > + */ > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL backend process", pid))); > + PG_RETURN_BOOL(false); > + } > + > + be_status = pgstat_get_beentry_by_backend_id(proc->backendId); > + if (be_status->st_backendType != B_BACKEND) > + { > + ereport(WARNING, > + (errmsg("PID %d is not a PostgreSQL client backend process", pid))); > + PG_RETURN_BOOL(false); > + } > + > + if (SendProcSignal(pid, PROCSIG_LOG_QUERY_PLAN, proc->backendId) < 0) > + { > + /* Again, just a warning to allow loops */ > + ereport(WARNING, > + (errmsg("could not send signal to process %d: %m", pid))); > + PG_RETURN_BOOL(false); > + } > > I found out that pg_log_query_plan's comments look like > pg_log_backend_memory_contexts. > pg_log_backend_memory_contexts will iterate through many memory > contexts. > but pg_log_query_plan for one specific pid will only output one plan? > so I am a little bit confused by the comments. These "loop" mean backend can run pg_log_query_plan() repeatedly even when failing sending signals. pg_signal_backend() also have such comments. > + /* > + * Ensure no page lock is held on this process. > + * > + * If page lock is held at the time of the interrupt, we can't acquire > any > + * other heavyweight lock, which might be necessary for explaining the > plan > + * when retrieving column names. > + * > + * This may be overkill, but since page locks are held for a short > duration > + * we check all the LocalLock entries and when finding even one, give > up > + * logging the plan. > + */ > + hash_seq_init(&status, GetLockMethodLocalHash()); > + while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL) > + { > + if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE) > maybe not that self evident, the above comments still not explained > why we need to ensure only > PAGE lock was held on this process? This is for preventing assertion error and it seems not necessary anymore as described in [1]. I'm going remove them. > > In the commit message, can you add all the discussion links? > My gmail account doesn't have a previous discussion history. Sure. > I am not sure this > (https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru) > is the only discussion link? This is the original one: https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com > I found a bug: > src8=# select *, pg_sleep(10) from tenk1 for update; > 2024-02-11 15:54:17.944 CST [48602] LOG: query plan running on > backend with PID 48602 is: > Query Text: select *, pg_sleep(10) from tenk1 for update; > LockRows (cost=0.00..570.00 rows=10000 width=254) > Output: unique1, unique2, two, four, ten, twenty, hundred, > thousand, twothousand, fivethous, tenthous, odd, even, stringu1, > stringu2, string4, (pg_sleep('10'::double precision)), ctid > -> Seq Scan on public.tenk1 (cost=0.00..470.00 rows=10000 > width=254) > Output: unique1, unique2, two, four, ten, twenty, > hundred, thousand, twothousand, fivethous, tenthous, odd, even, > stringu1, stringu2, string4, pg_sleep('10'::double precision), ctid > > another session (PID) executes `SELECT pg_log_query_plan(48602);` in > the meantime. > pg_log_query_plan returns true successfully, but PID 48602 was stuck. Hmm, it's not simply sleeping, is it? I'm concerned a bit this because estimated rows of tenk1 is 10000. If so, the query will take 10000 * 10 seconds. > I have problem using git apply: > error: patch failed: src/include/commands/explain.h:94 > error: src/include/commands/explain.h: patch does not apply > > `patch -p1 < /v35-0001-Add-function-to-log-the-plan-of-the-query.patch` > works I'll update the patch including other points such as removing ensuring no page lock code. [1] https://www.postgresql.org/message-id/1b2b247530f3ff3afab4ddc2df222e8b%40oss.nttdata.com -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
pgsql-hackers by date: