Tomas, Andreus, Andrey, hello!
> A ~170kB patch really should present some numbers
> quantifying the expected benefit. It doesn't need to be a real workload
> from production, but something plausible enough. Even some basic
> back-of-the-envelope calculations might be enough to show the promise.
The patch results in reduction of WAL total size by:
81% during vacuuming a table having no index,
and by 55% during vacuuming a table having an index.
The numbers are the next:
=== VACUUM (table with no index) ===
-------------------- ----------------- ----------------- -----------
DFOR off, bytes DFOR on, bytes Reduction
-------------------- ----------------- ----------------- -----------
WAL total size 6743149 1184446 82%
Prune records size 6710185 1159723 5.8x
-------------------- ----------------- ----------------- -----------
=== VACUUM (table with index) ===
-------------------- ----------------- ----------------- -----------
DFOR off, bytes DFOR on, bytes Reduction
-------------------- ----------------- ----------------- -----------
WAL total size 20394208 8907090 56%
Prune records size 6812850 1225944 5.6x
-------------------- ----------------- ----------------- -----------
The logic of the tests is based on the technique from [1] and is the
next:
-- SQL
CREATE TABLE t_prune ( id int, val text )
WITH (fillfactor = 100, autovacuum_enabled = false);
INSERT INTO t_prune
SELECT g, 'x' FROM generate_series(1,3000000) g;
CREATE INDEX ON t_prune(id); -- for the test using an indexed table
DELETE FROM t_prune WHERE id % 500 <> 0;
SELECT pg_current_wal_flush_lsn(); -- get start_lsn here
VACUUM FREEZE t_prune; -- 3 times
SELECT pg_current_wal_flush_lsn(); -- get end_lsn here
# BASH
# stop cluster
pg_waldump -p $wal_dir -s $start_lsn -e $end_lsn 2>/dev/null;
The test is implemented in 052_prune_dfor_compression.pl, therefore
the presented results can be refetched by restarting this test script.
> Also, I find it somewhat unlikely we'd import a GPLv3 library like
> this, even if it's just a testing framework. Even ignoring the
> question of having a different license for some of the code, it'd mean
> maintenance burden (maybe libtap is stable/mature, no idea). I don't
> see why this would be better than "write a SQL callable test module".
I am ready to rework it once there is consensus on the core of the
patch.
Best regards,
Evgeny.
P.s. rebased onto a1643d40b30.
[1]
https://www.postgresql.org/message-id/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com