Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Srinath Reddy Sadipiralla
Subject Re: Adding REPACK [concurrently]
Date
Msg-id CAFC+b6pK9ogeSpMA8hg18XhC1eNPcsKWBwoC5OySXi4iTxwtRw@mail.gmail.com
In response to Re: Adding REPACK [concurrently]  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Hello Alvaro,

On Wed, Mar 25, 2026 at 4:02 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Many thanks for the review.  I have applied fixes for these, so here's
> v44.

Thanks for the patches.

> - 0004 is Antonin's bugfix from the crash reported by Srinath.

I think it should be "0004 is Srinath's bugfix for the crash reported by Srinath." ;-)
After I provided the analysis and fix for the crash [1], Antonin tried to reproduce
the crash using the isolation tester; for this he even proposed changes to the
isolation tester (so cool... by the way, I reviewed it) [2].

I have done another round of stress testing on v43, this time with more tests.
As I did previously [3], I ran the concurrency test, which went well.

Crash test:
After a crash I observed that the repack worker files are cleaned up during
server restart by RemovePgTempFiles, but the transient table relation files
are not; maybe we can clean those up during server restart as well. I will
think about this more.

Physical replication test:
I ran REPACK (CONCURRENTLY) on the primary and checked on the replica that the
data is intact, using the four verifications I did here [3]; this went well.

Then, as suggested by Alvaro off-list, I checked the lock upgrade behavior
during the table swap phase. I observed that if another transaction holds a
conflicting lock on the table when the swap is attempted, a manual cancellation
or a lock_timeout abort discards the transient table's data. When
REPACK (CONCURRENTLY) waits for a conflicting lock to be released and
eventually hits lock_timeout (or is cancelled via Ctrl+C), the transaction
aborts. During the abort, cleanup triggers smgrDoPendingDeletes, which removes
all transient table relfiles and decoder worker files created during the
process. This effectively wipes out the work done building the transient table
before the swap could complete. It happens because, during transient table
creation, we add the table to the PendingRelDelete list:


rebuild_relation -> make_new_heap -> heap_create_with_catalog -> heap_create -> table_relation_set_new_filelocator -> RelationCreateStorage
    /*
     * Add the relation to the list of stuff to delete at abort, if we are
     * asked to do so.
     */
    if (register_delete)
    {
        PendingRelDelete *pending;

        pending = (PendingRelDelete *)
            MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
        pending->rlocator = rlocator;
        pending->procNumber = procNumber;
        pending->atCommit = false;  /* delete if abort */
        pending->nestLevel = GetCurrentTransactionNestLevel();
        pending->next = pendingDeletes;
        pendingDeletes = pending;
    }
The files under base/5/pgsql_tmp/ also get unlinked during the decoding worker
cleanup. I think this cleanup of the transient table relfiles and the decoder
files makes sense, because there is no resume operation in which we could
re-use the transient table's files; please correct me if I am not getting your
point here.

Test scenario:

Session 1:
postgres=# repack (concurrently) stress_victim;
With lock_timeout = 5s, I set a breakpoint in rebuild_relation_finish_concurrent
at LockRelationOid(old_table_oid, AccessExclusiveLock), just before the
exclusive lock is taken.

Session 2:
postgres=# BEGIN;
BEGIN
postgres=# SELECT * FROM stress_victim LIMIT 1;
 id  | balance |               payload
-----+---------+-------------------------------------
 170 |      65 | d12f400c4d0d3c49818f88597e16cf29... (payload trimmed)
(1 row)
-- left the transaction open; this holds a conflicting lock (AccessShareLock)
-- on the same table REPACK (CONCURRENTLY) is running on.

Session 1:
I released the breakpoint; the backend now waits for the conflicting lock to be
released. If lock_timeout expires in the meantime, the transaction aborts:

postgres=# repack (concurrently) stress_victim;
ERROR:  canceling statement due to lock timeout
CONTEXT:  waiting for AccessExclusiveLock on relation 16637 of database 5

Now we can see the transient table relfiles and the decoder worker files being cleaned up.

[1] - https://www.postgresql.org/message-id/CAFC%2Bb6qk3-DQTi43QMqvVLP%2BsudPV4vsLQm5iHfcCeObrNaVyA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/flat/4703.1774250534%40localhost
[3] - https://www.postgresql.org/message-id/CAFC%2Bb6o2yzA80YmfEhmMO9puN8qvGRvr-15BBLn3UmJxPfpr2w%40mail.gmail.com

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
