Re: Proposal: Progressive explain - Mailing list pgsql-hackers
From | torikoshia |
---|---|
Subject | Re: Proposal: Progressive explain |
Date | |
Msg-id | 78cff2165abb881244376988893a87b8@oss.nttdata.com Whole thread Raw |
In response to | Re: Proposal: Progressive explain (Rafael Thofehrn Castro <rafaelthca@gmail.com>) |
Responses |
Re: Proposal: Progressive explain
|
List | pgsql-hackers |
On 2025-04-01 15:23, Rafael Thofehrn Castro wrote: > Hello again, > >> ERROR: could not attach to dynamic shared area > > In addition to that refactoring issue, the current patch had a race > condition in pg_stat_progress_explain to access the DSA of a process > running a query that gets aborted. > > While discussing with Robert we agreed that it would be wiser to take > a step back and change the strategy used to share progressive explain > data in shared memory. > > Instead of using per backend's DSAs shared via a hash structure I now > define a dsa_pointer and a LWLock in each backend's PGPROC. > > A global DSA is created by the first backend that attempts to use > the progressive explain feature. After the DSA is created, subsequent > uses of the feature will just allocate memory there and reference > via PGPROC's dsa_pointer. > > This solves the race condition reported by Torikoshi and improves > concurrency performance as now we don't have a global LWLock > controlling shared memory access, but a per-backend LWLock. > > Performed the same tests done by Torikoshi and it looks like we are > good now. Even with more frequent inspects in pg_stat_progress_explain > (\watch 0.01). Thanks for updating the patch! Have you tested enabling progressive_explain? When I ran the 'make installcheck' test again setting progressive_explain to on, there was the same assertion failure: TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File: "ruleutils.c", Line: 8802, PID: 116832 postgres: parallel worker for PID 116822 (ExceptionalCondition+0x98)[0xb7311ea8bf80] postgres: parallel worker for PID 116822 (+0x89de6c)[0xb7311e9ede6c] postgres: parallel worker for PID 116822 (+0x89eb68)[0xb7311e9eeb68] postgres: parallel worker for PID 116822 (+0x89e78c)[0xb7311e9ee78c] postgres: parallel worker for PID 116822 (+0x8a1d10)[0xb7311e9f1d10] postgres: parallel worker for PID 116822 (+0x89ed80)[0xb7311e9eed80] postgres: parallel worker for PID 116822 (+0x89e78c)[0xb7311e9ee78c] postgres: parallel worker for PID 116822 (+0x89f174)[0xb7311e9ef174] postgres: parallel worker for PID 116822 (+0x89e78c)[0xb7311e9ee78c] postgres: parallel worker for PID 116822 (+0x89f0b8)[0xb7311e9ef0b8] postgres: parallel worker for PID 116822 (+0x8928dc)[0xb7311e9e28dc] postgres: parallel worker for PID 116822 (deparse_expression+0x34)[0xb7311e9e2834] postgres: parallel worker for PID 116822 (+0x347870)[0xb7311e497870] postgres: parallel worker for PID 116822 (+0x3478e4)[0xb7311e4978e4] postgres: parallel worker for PID 116822 (+0x347970)[0xb7311e497970] ... TRAP: failed Assert("param->paramkind == PARAM_EXTERN"), File: "ruleutils.c", Line: 8802, PID: 116831 [115650] LOG: 00000: background worker "parallel worker" (PID 116831) was terminated by signal 6: Aborted [115650] DETAIL: Failed process was running: explain (analyze, costs off, summary off, timing off, buffers off) select count(*) from ab where (a = (select 1) or a = (select 3)) and b = 2 [115650] LOCATION: LogChildExit, postmaster.c:2846 We can reproduce it as follows: show progressive_explain; progressive_explain --------------------- on create table ab (a int not null, b int not null) partition by list (a); create table ab_a2 partition of ab for values in(2) partition by list (b); create table ab_a2_b1 partition of ab_a2 for values in (1); create table ab_a2_b2 partition of ab_a2 for values in (2); create table ab_a2_b3 partition of ab_a2 for values in (3); create table ab_a1 partition of ab for values in(1) partition by list (b); create table ab_a1_b1 partition of ab_a1 for values in (1); create table ab_a1_b2 partition of ab_a1 for values in (2); create table ab_a1_b3 partition of ab_a1 for values in (3); create table ab_a3 partition of ab for values in(3) partition by list (b); create table ab_a3_b1 partition of ab_a3 for values in (1); create table ab_a3_b2 partition of ab_a3 for values in (2); create table ab_a3_b3 partition of ab_a3 for values in (3); set parallel_setup_cost = 0; set parallel_tuple_cost = 0; set min_parallel_table_scan_size = 0; set max_parallel_workers_per_gather = 2; explain (analyze, costs off, summary off, timing off, buffers off) select count(*) from ab where (a = (select 1) or a = (select 3)) and b = 2; WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Note that there is no need to access pg_stat_progress_explain. Could you please check if you can reproduce this? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
pgsql-hackers by date: