On Fri, May 8, 2026 at 2:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Hello James,
>
> On 2026-May-08, James Locke wrote:
>
> > Attached is a POC to enable userland table compaction: A top-level COMPACT
> > command that performs the relocation directly in the server, with a
> > stripped-down heap_relocate primitive instead of full UPDATE, and a
> > built-in prune-and-truncate pass so it runs to a useful end state in one
> > command.
>
> How does this implementation handle the case of a seqscan in the middle
> of scanning the table, which has already skipped the destination page
> and not yet the page from where the table is to be removed? There needs
> to be a way to distinguish which of these to show (it must be exactly
> one), and you didn't mention this in your description.

It relies on the same invariant as a cross-page UPDATE, and heap_relocate inherits it because its on-disk changes and WAL record are identical to those of a regular update.
heap_relocate sets the source tuple's xmax and the new tuple's xmin to the same xid, the relocating transaction's (call it R), and both changes go into a single log_heap_update WAL record. So when HeapTupleSatisfiesMVCC checks the visibility of either tuple, it ends up asking the same XidInMVCCSnapshot(R, snapshot) question against the seqscan's snapshot: once for the destination's xmin and once for the source's xmax. Same xid, same answer.
Take your scenario with the destination on block 5 and the source on block 200. The seqscan reads block 5 first and sees no live tuple there, either because the relocation hasn't happened yet, or because it has but R is still in the snapshot's xip list, so the new tuple's xmin reads as in progress. Then COMPACT commits. When the seqscan reaches block 200 it is still using the snapshot it took at scan start, which treats R exactly as it did at block 5; the snapshot doesn't change mid-scan. More generally, whatever the snapshot says about R holds at both pages: either both treat R as committed (block 5 already returned the row, and block 200 now sees the source as dead), or both treat it as running (block 5 saw nothing, and block 200 returns the source). Exactly one copy is returned.
The page-level atomicity comes from log_heap_update registering both buffers in one record, and from the modifications happening inside one critical section (START_CRIT_SECTION/END_CRIT_SECTION) with exclusive content locks held on both pages, so concurrent readers holding share locks can't see a half-applied state.
James