Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers
| From | David Klika |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Date | |
| Msg-id | 0ac6540e-c9f7-4918-913b-21288f6436cf@atlas.cz |
| In response to | Re: Adding REPACK [concurrently] (Álvaro Herrera <alvherre@alvh.no-ip.org>) |
| List | pgsql-hackers |
Hello Álvaro,

Thank you for the detailed analysis.

On 04.12.2025 at 16:43, Álvaro Herrera wrote:

> Hello David,
>
> Thanks for your interest in this.
>
> On 2025-Dec-04, David Klika wrote:
>
>> Let's consider a large table where 80% of the blocks are fine (filled
>> enough by live tuples). The table could be scanned from the beginning
>> (left side) to identify "not filled enough" blocks, and also from the
>> end (right side) to process live tuples by moving them to the blocks
>> identified by the left-side scan. The work is over when both scans
>> reach the same position.
>
> If you only have a small number of pages that have this problem, then
> you don't actually need to do anything -- the pages will be marked free
> by regular vacuuming, and future inserts or updates can make use of
> those pages. It's not a problem to have a small number of pages in
> empty state for some time.
>
> So if you're trying to do this, the number of problematic pages must be
> large.

I agree; I had in mind about 20-40% of the table, which could amount to
tens of GB.

> Now, the issue with what you propose is that you need to make either the
> old tuples or the new tuples visible to concurrent transactions. If at
> any point they are both visible, or neither of them is visible, then you
> have potentially corrupted the results that would be obtained by a query
> that's scanning the table and is halfway through.

When moving a tuple from a (right) page to a (left) page, both pages must
be held in shared buffers. I suppose the other processes scanning the
table also access the table data through the shared buffers, so the
movement could be handled at that level. If the tuple movement does not
change its xid, it would not even have to conflict with other
transactions that locked or modified the tuple (again in the buffer
cache, just changing the physical location). Looks like something
dirty...

> The other point is that you need to keep indexes updated.
> That is, you need to make the indexes point to both the old and the new
> tuples, until you remove the old tuples from the table, and then remove
> those index pointers. This process bloats the indexes, which is not
> insignificant, considering that the number of tuples to process is
> large. If there are several indexes, this makes your process take even
> longer.
>
> You can fix the concurrency problem by holding a lock on the table that
> ensures nobody is reading the table until you've finished. But we don't
> want to have to hold such a lock for long! And we already established
> that the number of pages to check is large, which means you're going to
> work for a long time.
>
> So, I'm not really sure that it's practical to implement what you
> suggest.

I agree. The proposed tuple shuffle might work better than the current
VACUUM FULL (i.e. blocking non-clustered maintenance), but I understand
that you prefer a universal method of data file maintenance (the
concurrent variant will be amazing).

Regards,
David
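For readers of this thread, the two-pointer scan described in the quoted message can be sketched as a toy model. This is not PostgreSQL code: pages are modeled as plain Python lists of live tuples, and `PAGE_CAPACITY` and `FILL_THRESHOLD` are invented parameters for illustration only.

```python
# Toy sketch of the proposed two-pointer compaction (assumptions, not
# PostgreSQL internals): a "page" is a list of live tuples.

PAGE_CAPACITY = 4    # max tuples per page (invented for the example)
FILL_THRESHOLD = 3   # pages with fewer live tuples are "not filled enough"

def compact(pages):
    """Move live tuples from tail pages into underfilled head pages.

    The left pointer scans forward looking for underfilled pages; the
    right pointer scans backward, draining its tuples into them. The
    work is over when both pointers meet, after which the emptied
    trailing pages can be truncated.
    """
    left, right = 0, len(pages) - 1
    while left < right:
        if len(pages[left]) >= FILL_THRESHOLD:
            left += 1            # page is fine, skip it
            continue
        if not pages[right]:
            right -= 1           # tail page already drained
            continue
        # Move one tuple from the tail page into the underfilled page.
        pages[left].append(pages[right].pop())
        if len(pages[left]) >= PAGE_CAPACITY:
            left += 1            # page is now full
    # Trailing empty pages can now be truncated from the relation.
    while pages and not pages[-1]:
        pages.pop()
    return pages
```

For example, `compact([[1, 2, 3, 4], [5], [], [6, 7], [8, 9, 10]])` fills the underfilled head pages from the tail and truncates the two emptied trailing pages, returning `[[1, 2, 3, 4], [5, 10, 9], [8, 7, 6]]`. The sketch deliberately ignores the hard parts the thread discusses: visibility of moved tuples to concurrent scans, buffer locking, and index maintenance.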