Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

| From | Srinath Reddy Sadipiralla |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Msg-id | CAFC+b6o2yzA80YmfEhmMO9puN8qvGRvr-15BBLn3UmJxPfpr2w@mail.gmail.com |
| In response to | Re: Adding REPACK [concurrently] (Antonin Houska <ah@cybertec.at>) |
| Responses | Re: Adding REPACK [concurrently] |
| List | pgsql-hackers |
Hello,
I did stress testing on the v35 patches: a concurrency test using pgbench with
50 concurrent clients and 4 threads, running the pgbench script below
(dual_chaos.sql) against the table setup in setup.sql.
I ran pgbench multiple times, with 5M rows for 10 minutes and with 50M rows for
~45 minutes. REPACK (concurrently) ran successfully except once (see below).
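For reference, the run described above would look roughly like this; setup.sql and dual_chaos.sql are the attached scripts (their contents are not reproduced here), and the database name is only an example:

```shell
# Sketch of the stress-test run (5M-row case): the exact scripts are in
# the attachments; flags reconstructed from the numbers stated above.
psql -f setup.sql postgres                                  # create and populate tables
pgbench -n -c 50 -j 4 -T 600 -f dual_chaos.sql postgres     # 50 clients, 4 threads, 10 min
```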
I created a shadow/clone table to use for checking the correctness after doing
the concurrency test. I used four checks to verify that the data is intact and
that REPACK (concurrently) ran successfully:
1) Was the table's relation file (relfilenode) swapped?
2) Is the bloat gone? The victim relation's size should be less than the
shadow relation's size.
3) Using FULL JOIN logic (borrowed from repack.spec, with a small change)
against the shadow table, which undergoes the same concurrent operations as the
victim table, basically doing dual writes (see dual_chaos.sql), to verify
table data integrity.
4) Physical Index Integrity (amcheck) (borrowed from Mihail's tests)
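The four checks above can be sketched roughly as follows. This is only a sketch: the table names stress_shadow, the join column id, and the index name stress_victim_pkey are assumptions (the victim table name stress_victim comes from the backtrace below; the exact queries are in the attached scripts):

```sql
-- 1) relfilenode swap: compare against the value saved before REPACK ran
SELECT relfilenode FROM pg_class WHERE relname = 'stress_victim';

-- 2) bloat gone: the repacked victim should now be smaller than the
--    dual-written (still bloated) shadow table
SELECT pg_relation_size('stress_victim') < pg_relation_size('stress_shadow');

-- 3) data integrity: a FULL JOIN between victim and shadow must yield no
--    row that exists on only one side (logic borrowed from repack.spec)
SELECT count(*)
FROM stress_victim v
FULL JOIN stress_shadow s USING (id)
WHERE v.id IS NULL OR s.id IS NULL;   -- expect 0

-- 4) physical index integrity via amcheck
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_parent_check('stress_victim_pkey', heapallindexed => true);
```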
The concurrency test failed once. I tried to reproduce the scenario below, but
with no luck. I think the assert failure happened because, after a speculative
insert, there might have been no spec CONFIRM or ABORT record. Thoughts?
TRAP: failed Assert("!specinsert"), File: "reorderbuffer.c", Line: 2610, PID: 3956168
postgres: REPACK decoding worker for relation "stress_victim" (ExceptionalCondition+0x98)[0xaaaab1251188]
postgres: REPACK decoding worker for relation "stress_victim" (+0x67b1cc)[0xaaaab0f4b1cc]
postgres: REPACK decoding worker for relation "stress_victim" (+0x67b86c)[0xaaaab0f4b86c]
postgres: REPACK decoding worker for relation "stress_victim" (ReorderBufferCommit+0x74)[0xaaaab0f4b8f0]
postgres: REPACK decoding worker for relation "stress_victim" (+0x66229c)[0xaaaab0f3229c]
postgres: REPACK decoding worker for relation "stress_victim" (xact_decode+0x1a0)[0xaaaab0f312bc]
postgres: REPACK decoding worker for relation "stress_victim" (LogicalDecodingProcessRecord+0xd4)[0xaaaab0f30e60]
postgres: REPACK decoding worker for relation "stress_victim" (+0x3372e4)[0xaaaab0c072e4]
postgres: REPACK decoding worker for relation "stress_victim" (+0x339634)[0xaaaab0c09634]
postgres: REPACK decoding worker for relation "stress_victim" (RepackWorkerMain+0x1ac)[0xaaaab0c094e8]
postgres: REPACK decoding worker for relation "stress_victim" (BackgroundWorkerMain+0x2b0)[0xaaaab0efc440]
postgres: REPACK decoding worker for relation "stress_victim" (postmaster_child_launch+0x1f0)[0xaaaab0f00398]
postgres: REPACK decoding worker for relation "stress_victim" (+0x639ca4)[0xaaaab0f09ca4]
postgres: REPACK decoding worker for relation "stress_victim" (+0x639f94)[0xaaaab0f09f94]
postgres: REPACK decoding worker for relation "stress_victim" (+0x638714)[0xaaaab0f08714]
postgres: REPACK decoding worker for relation "stress_victim" (+0x635978)[0xaaaab0f05978]
postgres: REPACK decoding worker for relation "stress_victim" (PostmasterMain+0x160c)[0xaaaab0f050c8]
postgres: REPACK decoding worker for relation "stress_victim" (main+0x3dc)[0xaaaab0d974d4]
/lib/aarch64-linux-gnu/libc.so.6(+0x284c4)[0xffff867584c4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff86758598]
postgres: REPACK decoding worker for relation "stress_victim" (_start+0x30)[0xaaaab09bc1f0]
2026-02-19 18:20:56.088 IST [3905812] LOG: checkpoint starting: wal
2026-02-19 18:21:10.683 IST [3905808] LOG: background worker "REPACK decoding worker" (PID 3956168) was terminated by signal 6: Aborted
Crash Test:
I did a crash test using a debugger, with a breakpoint inside apply_concurrent_changes
to simulate a crash while concurrent changes are being applied. After a few concurrent
changes were applied, I crashed the server using "pg_ctl -m immediate stop" and then
restarted it. I observed that REPACK (concurrently) did not complete (expected): the
files were not swapped, and the data in the victim table is intact (checked using a
FULL JOIN with the shadow table). However, there are some leftovers from the transient
table used by REPACK (concurrently), such as
1) the transient table's relation files - these consume extra space. I think this was
the case with VACUUM FULL previously, so they have to be removed manually, but
I think this time we have a "leverage" which we can use to reclaim the extra space.
2) the transient table's WAL - this is generated by the concurrent changes that are
applied, via logical decoding, to the new transient table. I think this won't be a
problem as long as those WAL segments only get recycled, but if they get archived they
are of no use; instead they consume extra space and time during the archival process.
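The crash test above can be sketched as below. This is only an illustration: the REPACK invocation syntax, the way the worker PID is found, and the assumption that apply_concurrent_changes (from the v35 patches) runs in that process are all hypothetical:

```shell
# In one session, start the repack (hypothetical invocation):
psql -c "REPACK (CONCURRENTLY) stress_victim;" postgres &

# Attach a debugger and stop inside the change-application loop:
gdb -p "$(pgrep -f 'REPACK.*stress_victim')" \
    -ex 'break apply_concurrent_changes' -ex 'continue'

# After letting a few concurrent changes apply, simulate a hard crash:
pg_ctl -D "$PGDATA" stop -m immediate
pg_ctl -D "$PGDATA" start   # crash recovery runs; transient-table leftovers remain
```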
"Leverage" idea:
I think we can reuse the transient table's relation files and WAL during crash
recovery, so that the user doesn't have to re-run REPACK (concurrently) after the
server has recovered. For this we might need to write a WAL record for
REPACK (concurrently) to let the startup process know that a REPACK (concurrently)
was in progress, which sets a flag. By the end of the startup process, all the WAL
for the transient table has already been applied, so the transient table is complete;
at that point, after checking the flag, we can do the swap (finish_heap_swap).
These are my initial thoughts on reusing the "residue" files of the transient table.
I could be totally wrong :) Please correct me if I am.
I think we need to update this statement in repack.sgml regarding wal_level:
<listitem>
<para>
The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
configuration parameter is less than <literal>logical</literal>.
</para>
</listitem>
because of this commit: "POC: enable logical decoding when wal_level = 'replica' without a server restart" (67c2097).