Re: why there is not VACUUM FULL CONCURRENTLY? - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: why there is not VACUUM FULL CONCURRENTLY?
Date
Msg-id 178741.1743514291@localhost
Whole thread Raw
In response to Re: why there is not VACUUM FULL CONCURRENTLY?  (Antonin Houska <ah@cybertec.at>)
Responses Re: why there is not VACUUM FULL CONCURRENTLY?
Re: why there is not VACUUM FULL CONCURRENTLY?
Re: why there is not VACUUM FULL CONCURRENTLY?
List pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote:

> Antonin Houska <ah@cybertec.at> wrote:
> 
> > Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > 
> > > On 2025-Mar-22, Antonin Houska wrote:
> > > 
> > > > Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > > 
> > > > > I rebased this patch series; here's v09.  No substantive changes from v08.
> > > > > I made sure the tree still compiles after each commit.
> > > 
> > > I rebased again, fixing a compiler warning reported by CI and applying
> > > pgindent to each individual patch.  I'm slowly starting to become more
> > > familiar with the whole of this new code.
> > 
> > I'm trying to reflect Robert's suggestions about locking [1]. The next version
> > should be a bit simpler, so maybe wait for it before you continue studying the
> > code.
> 
> This is it.

One more version, hopefully to make cfbot happy (I missed the bug because I
did not set the RELCACHE_FORCE_RELEASE macro in my environment.)

Besides that, it occurred to me that 0005 ("Preserve visibility information of
the concurrent data changes.") will probably introduce significant
overhead. The problem is that the table we're repacking is treated like a
catalog, for reorderbuffer.c to generate snapshots that we need to replay
UPDATE / DELETE commands on the new table.

contrib/test_decoding can be used to demonstrate the difference between
ordinary and catalog tables:

CREATE TABLE t(i int);
SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
INSERT INTO t SELECT n FROM generate_series(1, 1000000) g(n);
DELETE FROM t;
EXPLAIN ANALYZE SELECT * FROM pg_logical_slot_get_binary_changes('test_slot', null, null);
...
Execution Time: 3521.190 ms

ALTER TABLE t SET (user_catalog_table = true);

INSERT INTO t SELECT n FROM generate_series(1, 1000000) g(n);
DELETE FROM t;
EXPLAIN ANALYZE SELECT * FROM pg_logical_slot_get_binary_changes('test_slot', null, null);
...
Execution Time: 6561.634 ms

I wanted to avoid the "MVCC unsafety" [1], so that both REPACK and REPACK
CONCURRENTLY both work "cleanly". We can try to optimize the logical decoding
for REPACK CONCURRENTLY, or implement 0005 in a different way, but not sure
how much effort that would require. Or implement REPACK CONCURRENTLY as
MVCC-unsafe for now? (The pg_squeeze extension also is not MVCC-safe.)


[1] https://www.postgresql.org/docs/17/mvcc-caveats.html

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Attachment

pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Next
From: Robert Haas
Date:
Subject: Re: Statistics Import and Export