Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Srinath Reddy Sadipiralla
Subject Re: Adding REPACK [concurrently]
Date
Msg-id CAFC+b6of6_poBQ6EgK8N49VQwAUYX=uLkHiU-TP7y+DamBD5TQ@mail.gmail.com
In response to Re: Adding REPACK [concurrently]  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Hi Alvaro,

On Thu, Mar 26, 2026 at 1:42 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> As for lock upgrade, I wonder if the best way to handle this isn't to
> hack the deadlock detector so that it causes any *other* process to die,
> if they detect that they would block on REPACK.  Arguably there's
> nothing that you can do to a table while it's undergoing REPACK
> CONCURRENTLY; any alterations would have to wait until the repacking is
> completed.  We can implement that idea simply enough, as shown in this
> crude prototype.

After testing this, I observed that it solves the scenario where a query is
waiting on REPACK. For example, if a DROP TABLE requests an AccessExclusiveLock
(AEL) and queues behind REPACK's ShareUpdateExclusiveLock, the deadlock detector
kicks in when REPACK tries to upgrade to AEL and kills the DROP, preventing the
circular queue deadlock. But the case I originally mentioned [1] was the reverse:
what happens if a transaction already holds a lock that conflicts with the upcoming
AEL upgrade (e.g., an analytical SELECT or an idle-in-transaction session holding
an AccessShareLock), but isn't waiting on REPACK at all?

In this case, there's no circular wait, so the deadlock detector never fires. REPACK
simply queues behind the SELECT, eventually hits its lock_timeout, aborts, and
cleans up. Initially, I thought this cleanup was expected behavior. But after seeing
your solution to protect REPACK from losing its transient table work, I now think it is not.
If the goal is to prevent REPACK's work from being wasted, should we error out
the backend that is making REPACK wait during the final swap phase? I am thinking
of something conceptually similar to ResolveRecoveryConflictWithLock, actively
cancelling the conflicting session to allow the AEL upgrade to proceed. Thoughts?
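
To make the difference between the two cases concrete, here is a toy wait-for-graph model in Python. It is only an illustration, not how PostgreSQL's deadlock detector is implemented: the function name `find_cycle` and the session labels are made up for this sketch, and each edge simply means "this session cannot proceed until that one does" (for REPACK in the DROP case, that is its AEL request queued behind the already-waiting DROP).

```python
from typing import Dict, List, Set


def find_cycle(waits_for: Dict[str, List[str]], start: str) -> List[str]:
    """Return the sessions on a wait cycle through `start`, or [] if none exists."""
    path: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> bool:
        # Arriving back at the start with a non-empty path closes a cycle.
        if node == start and path:
            return True
        if node in visited:
            return False
        visited.add(node)
        path.append(node)
        for blocker in waits_for.get(node, []):
            if visit(blocker):
                return True
        path.pop()
        return False

    return path if visit(start) else []


# Case 1: DROP waits on REPACK's ShareUpdateExclusiveLock, and REPACK's AEL
# request queues behind the waiting DROP. The waits form a cycle.
drop_case = {"REPACK": ["DROP"], "DROP": ["REPACK"]}

# Case 2: REPACK waits on the SELECT's AccessShareLock, but the SELECT session
# waits on nothing. There is no cycle, so only lock_timeout ends the wait.
select_case = {"REPACK": ["SELECT"]}

print(find_cycle(drop_case, "REPACK"))    # ['REPACK', 'DROP']
print(find_cycle(select_case, "REPACK"))  # []
```

In the second case the graph has a single edge and nothing for a deadlock check to act on, which is why some other mechanism would be needed if the conflicting session is to be cancelled.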



Test scenario:

session 1:
postgres=# repack (concurrently) stress_victim;
Run with lock_timeout = 5s and a breakpoint in rebuild_relation_finish_concurrent,
at the call to LockRelationOid(old_table_oid, AccessExclusiveLock), just before
it acquires the exclusive lock.



session 2:
postgres=# BEGIN;
SELECT * FROM stress_victim LIMIT 1;
-- left it open
BEGIN
 id  | balance | payload
-----+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 170 |      65 | d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818f88597e16cf29
(1 row)
-- This session now holds an AccessShareLock on the same table that
-- REPACK (CONCURRENTLY) is running on, which conflicts with the upcoming AEL.



session 1:
Release the breakpoint; the backend now waits for the conflicting lock
to be released.
If lock_timeout expires in the meantime, the transaction aborts:
postgres=# repack (concurrently) stress_victim;
ERROR:  canceling statement due to lock timeout
CONTEXT:  waiting for AccessExclusiveLock on relation 16637 of database 5
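
The reason session 2 only becomes a problem at the final swap can be read off the lock conflict rules. The sketch below is a toy Python model, not PostgreSQL code: `CONFLICTS` is a hand-copied subset of the documented table-level lock conflict matrix, limited to the three modes that appear in this scenario, and `blockers` is a hypothetical helper.

```python
from typing import Dict, List

# Subset of PostgreSQL's table-level lock conflict matrix, restricted to the
# three modes used in this scenario (hand-copied for illustration).
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock",
        "AccessExclusiveLock",
    },
    "AccessExclusiveLock": {
        "AccessShareLock",
        "ShareUpdateExclusiveLock",
        "AccessExclusiveLock",
    },
}


def blockers(requested: str, held_by_others: Dict[str, str]) -> List[str]:
    """Return the other sessions whose held lock conflicts with `requested`."""
    return sorted(
        session
        for session, mode in held_by_others.items()
        if mode in CONFLICTS[requested]
    )


# Locks held by sessions other than REPACK in the test scenario.
others = {"session 2": "AccessShareLock"}

# While REPACK works under ShareUpdateExclusiveLock, the open SELECT
# transaction does not block it.
print(blockers("ShareUpdateExclusiveLock", others))  # []

# At the final swap REPACK needs AccessExclusiveLock, and session 2 blocks it.
print(blockers("AccessExclusiveLock", others))       # ['session 2']
```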


--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
