Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers
| From | Robert Treat |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Msg-id | CAJSLCQ2R9uUfP-1kdCBvHYhU_iuKjVpCByViZQ+Qnwan4nDU3w@mail.gmail.com |
| In response to | Re: Adding REPACK [concurrently] (Andres Freund <andres@anarazel.de>) |
| Responses | Re: Adding REPACK [concurrently] |
| List | pgsql-hackers |
On Thu, Apr 9, 2026 at 10:20 AM Andres Freund <andres@anarazel.de> wrote:
> On 2026-04-09 10:43:14 +0200, Antonin Houska wrote:
> > What Andres proposed (AFAIU) should help to avoid this problem because
> > REPACK's request for AEL would get in front of the VACUUM's request for SUEL
> > in the queue.
>
> Note that that already happens today.
>
> This works today (without the error triggering patch):
>
> S1: REPACK starts
> S2: LOCK TABLE / VACUUM / ... starts waiting
> S1: REPACK tries to get AEL
> S1: REPACK's lock request gets reordered in the wait queue to be before S2 and
>     just gets the lock
> S1: REPACK finishes
> S2: lock acquisition completes
>
> That's because we do already have this "jumping the wait queue" logic, which I
> had forgotten about.

You know, I was wondering how this wasn't already a problem for
pg_repack/pg_squeeze, and I guess this explains it :-P

> What does *not* work is this:
>
> S1: REPACK starts
> S2: BEGIN; SELECT 1 FROM table LIMIT 1;
> S2: LOCK TABLE / VACUUM / ... starts waiting
> S1: REPACK tries to get AEL
> S1: lock is not granted and can't be reordered to be before S2, because S2
>     holds a conflicting lock; deadlock detector triggers
> S2: lock acquisition completes
>
> But with my proposal to properly teach the deadlock detector about assuming
> there's a wait edge for the eventual lock upgrade by S1, the first example
> would still work, because the lock upgrade would not be considered a hard
> cycle, and the second example would have S2 error out.

In the above, S2 will error out if you try to run a VACUUM, but the point
still stands that calling an explicit LOCK or similar could lead to this
issue. In the current repack world, we document the need for lock escalation
at the end of the repacking and caution that doing things like DDL or
explicit LOCKing could cause trouble, so don't do that. What you're
proposing above would be an improvement, though, IMHO.
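For anyone wanting to see the second (failing) interleaving without a REPACK
patch, it can be approximated today with plain commands by simulating the
final lock upgrade with an explicit LOCK TABLE. This is only a sketch of how
the deadlock arises; the table name `t` is arbitrary, and the lock modes are
chosen to mirror the scenario above (a weak lock held during the copy, then
an upgrade to ACCESS EXCLUSIVE at the end):

```sql
-- Session 1: stand-in for REPACK's initial weak lock, held while copying rows.
BEGIN;
LOCK TABLE t IN SHARE UPDATE EXCLUSIVE MODE;

-- Session 2: take a weak lock first, then queue behind session 1.
BEGIN;
SELECT 1 FROM t LIMIT 1;                      -- holds ACCESS SHARE until commit
LOCK TABLE t IN SHARE UPDATE EXCLUSIVE MODE;  -- blocks: conflicts with S1's lock

-- Session 1: attempt the final lock upgrade. It cannot jump ahead of
-- session 2 in the wait queue, because S2 already holds ACCESS SHARE,
-- which conflicts with ACCESS EXCLUSIVE. S1 waits on S2 and S2 waits on
-- S1, so after deadlock_timeout the deadlock detector aborts one session.
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
```

Without the second session's initial SELECT, session 1's upgrade request would
be reordered ahead of session 2 and succeed, matching the first (working)
interleaving.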
> > Anti-wraparound (failsafe) VACUUM is a bit different case [1] (i.e. it
> > should possibly have higher priority than REPACK), but I think this
> > prioritization should be implemented in other way than just letting it get
> > in the way of REPACK (at the time REPACK is nearly finished).
>
> Yea, it makes no sense to interrupt the long running repack, given that the
> new relation will have much less stuff for vacuum to do.

We might be talking about two different scenarios. In the case where we are
at the point of lock escalation, you would probably want the repack to get
priority over a waiting vacuum, even a failsafe vacuum. But outside of that
scenario, we can't know that the repack is the better option (and
statistically it probably isn't), since a repack that is actively copying
rows might still need to rebuild a large number of indexes (or just one
really expensive index), which could take significantly longer than a
failsafe vacuum would need to ensure wraparound avoidance.

I don't think we'd go as far as saying the failsafe vacuum should cancel the
repack, but ideally the failsafe vacuum would not be canceled either: leaving
it queued increases the likelihood that a DBA or monitoring picks up on the
situation, and if the REPACK fails for some reason, the failsafe vacuum could
immediately start working without having to go through any additional hoops.

Robert Treat
https://xzilla.net