Re: Proposal: Progressive explain - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: Proposal: Progressive explain
Date
Msg-id d61f6093-5210-42c3-9f82-33f21ac6e2db@app.fastmail.com
Whole thread Raw
In response to Re: Proposal: Progressive explain  (Rafael Thofehrn Castro <rafaelthca@gmail.com>)
Responses Re: Proposal: Progressive explain
List pgsql-hackers
On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote:
 Did additional benchmarks and found issues with the patch that doesn't do execProcNode
wrapping. There are sporadic crashes with double free or corruption (top)

So making the patch that uses the wrapper the current one. Again, giving the credits to
torikoshia as being the owner of that section of the code.


Rafael, thanks for working on it. It is a step forward in observability. I
started with some performance tests and the last improvements seem to fix the
overhead imposed in the initial patch version. I didn't notice any of these new
function in the perf report while executing fast queries.

I found a crash. It is simple to reproduce.

Session A:

select * from pg_stat_progress_explain;
\watch 2

Session B:

explain select pg_sleep(30);
\watch 2

8<--------------------------------------------------------------------8<

Backtrace:

Core was generated by `postgres: euler postgres [lo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
5401 if (ps->ExecProcNodeOriginal != NULL)
#0  WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401
#1  0x00005624173829aa in handle_sig_alarm (postgres_signal_arg=<optimized out>) at timeout.c:414
#2  0x00005624173ba02c in wrapper_handler (postgres_signal_arg=14) at pqsignal.c:110
#3  <signal handler called>
#4  0x00007f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#5  0x00005624171fb02f in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at waiteventset.c:1190
#6  WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffdd9e62080, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663296) at waiteventset.c:1138
#7  0x000056241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 <PqRecvBuffer>, len=8192) at be-secure.c:218
#8  0x000056241709bf2e in pq_recvbuf () at pqcomm.c:924
#9  0x000056241709ceb5 in pq_getbyte () at pqcomm.c:970
#10 0x000056241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361
#11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484
#12 PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4625
#13 0x00005624172167ed in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000056241717519b in postmaster_child_launch (child_type=<optimized out>, child_slot=2, startup_data=startup_data@entry=0x7ffdd9e6253c, startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffdd9e62540) at launch_backend.c:274
#15 0x0000562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at postmaster.c:3519
#16 ServerLoop () at postmaster.c:1688
#17 0x000056241717a6da in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x56244eeb81b0) at postmaster.c:1386
#18 0x0000562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230

8<--------------------------------------------------------------------8<

You call this feature "progressive explain". My first impression is that it
will only provide the execution plans for EXPLAIN commands. Instead of
"progressive explain", I would suggest "query progress" that is a general
database terminology. It seems natural to use "progressive explain" since you
are using the explain infrastructure (including the same options -- format,
settings, wal, ...) -- to print the execution plan.

+CREATE VIEW pg_stat_progress_explain AS
+    SELECT
+            *
+    FROM pg_stat_progress_explain(true);
+

There is no use for the function argument. If you decide to keep this function,
remove it.

Why don't you use the pgstat_progress_XXX() API? Since you are using a
pg_stat_progress_XXX view name I would expect using the command progress
reporting infrastructure (see backend_progress.c).

Maybe you could include datid and datname as the other progress reporting
views. It would avoid a join to figure out what the database is.

+static const struct config_enum_entry explain_format_options[] = {
+   {"text", EXPLAIN_FORMAT_TEXT, false},
+   {"xml", EXPLAIN_FORMAT_XML, false},
+   {"json", EXPLAIN_FORMAT_JSON, false},
+   {"yaml", EXPLAIN_FORMAT_YAML, false},
+   {NULL, 0, false}
+};

Isn't it the same definition as in auto_explain.c? Use only one definition for
auto_explain and this feature. You can put this struct into explain.c, use an
extern definition for guc_tables.c and put a extern PGDLLIMPORT defintion into
guc.h. See wal_level_options, for an example.

+static const struct config_enum_entry progressive_explain_options[] = {
+   {"off", PROGRESSIVE_EXPLAIN_NONE, false},
+   {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false},
+   {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false},
+   {"false", PROGRESSIVE_EXPLAIN_NONE, true},
+   {NULL, 0, false}
+};

The "analyze" is a separate option in auto_explain. Should we have 2 options?
One that enable/disable this feature and another one that enable/disable
analyze option.

Don't the other EXPLAIN options make sense here? Like serialize and summary.

    TupleDesc   tupDesc;        /* descriptor for result tuples */
    EState     *estate;         /* executor's query-wide state */
    PlanState  *planstate;      /* tree of per-plan-node state */
+   struct ExplainState *pe_es; /* progressive explain state if enabled */

Should you use the same name pattern here? pestate, for example.

PG_LWLOCK(52, SerialControl)
+PG_LWLOCK(53, ExplainHash)

Could you use a specific name? Even if you keep the proposed name, you should
use ProgressiveExplainHash, ProgressiveExplain or QueryProgress.

+$node->init;
+# Configure progressive explain to be logged immediatelly
+$node->append_conf('postgresql.conf', 'progressive_explain_min_duration = 0');
+$node->start;

s/immediatelly/immediately/

+typedef struct progressiveExplainHashKey
+{
+   int         pid;            /* PID */
+} progressiveExplainHashKey;
+    
+typedef struct progressiveExplainHashEntry
+{
+   progressiveExplainHashKey key;  /* hash key of entry - MUST BE FIRST */
+   dsa_handle  h;
+   dsa_pointer p;
+} progressiveExplainHashEntry;

You don't need progressiveExplainHashKey. Use pid as key directly.

+           /* Update shared memory with new data */
+           strcpy(pe_data->plan, es->str->data);
+           pe_data->last_print = GetCurrentTimestamp();

I don't think last_print is accurate because it is not the time the execution plan
is printed but the time it was updated. I suggest last_updated_time.

+/* Flag set by timeouts to control when to print progressive explains */
+bool       ProgressiveExplainPending = false;

s/print/update/

There are other point that you use "print" but it is better to replace it with
"update".

+   progressiveExplainArray = ShmemInitHash("progressive explain hash",
+                                           50, 50,
+                                           &info,
+                                           HASH_ELEM | HASH_BLOBS);

I'm wondering why you use "array" in the name. ProgressiveExplainHash is a
better name.

+               entry->p = dsa_allocate(es->pe_a,
+                                       add_size(sizeof(progressiveExplainData),
+                                                add_size(strlen(es->str->data),
+                                                         PROGRESSIVE_EXPLAIN_ALLOC_SIZE)));

I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because it
doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or
PROGRESSIVE_EXPLAIN_AVAIL_SIZE?

Maybe you can use dshash.

There are some comments that still refers to the wrong function name.

+/*
+ * ExecProcNodeWithExplain
+ * ExecProcNode wrapper that initializes progressive explain
+ * and uwraps ExecProdNode to the original function.
+ */
+static TupleTableSlot *
+ExecProcNodeExplain(PlanState *node)

and

+/*
+ * ExecProcNodeWithExplain
+ * Responsible for initialization of all structures related to progressive
+ * explains.
+ *

+   /* state related to progressive explains */
+   struct PlanState *pe_curr_node; 
+   struct Instrumentation *pe_local_instr;
+   dsa_area   *pe_a;

Could you add some comments saying what each of these variables are for?

I didn't experiment but I was wondering if there is a way to avoid the
duplicates that you added to avoid the overhead.


--
Euler Taveira

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Index AM API cleanup
Next
From: Yura Sokolov
Date:
Subject: Re: Implement waiting for wal lsn replay: reloaded