Re: Compress prune/freeze records with Delta Frame of Reference algorithm - Mailing list pgsql-hackers

From Evgeny Voropaev
Subject Re: Compress prune/freeze records with Delta Frame of Reference algorithm
Date
Msg-id 3fc9f40b-2c4f-49c7-bf1c-570c5b9e6e9e@tantorlabs.com
Whole thread Raw
In response to Re: Compress prune/freeze records with Delta Frame of Reference algorithm  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Compress prune/freeze records with Delta Frame of Reference algorithm
Re: Compress prune/freeze records with Delta Frame of Reference algorithm
List pgsql-hackers
Tomas, Andreus, Andrey, hello!

 > A ~170kB patch really should present some numbers
 > quantifying the expected benefit. It doesn't need to be a real workload
 > from production, but something plausible enough. Even some basic
 > back-of-the-envelope calculations might be enough to show the promise.

The patch results in reduction of WAL total size by:
     81% during vacuuming a table having no index,
     and by 55% during vacuuming a table having an index.

The numbers are the next:

=== VACUUM (table with no index) ===
-------------------- ----------------- ----------------- -----------
                        DFOR off, bytes    DFOR on, bytes   Reduction
-------------------- ----------------- ----------------- -----------
WAL total size                 6743149           1184446         82%
Prune records size             6710185           1159723        5.8x
-------------------- ----------------- ----------------- -----------

=== VACUUM (table with index) ===
-------------------- ----------------- ----------------- -----------
                        DFOR off, bytes    DFOR on, bytes   Reduction
-------------------- ----------------- ----------------- -----------
WAL total size                20394208           8907090         56%
Prune records size             6812850           1225944        5.6x
-------------------- ----------------- ----------------- -----------

The logic of the tests is based on the technique from [1] and is the
next:

    -- SQL
    CREATE TABLE t_prune ( id int, val text )
        WITH (fillfactor = 100, autovacuum_enabled = false);

    INSERT INTO t_prune
        SELECT g, 'x' FROM generate_series(1,3000000) g;

    CREATE INDEX ON t_prune(id); -- for the test using an indexed table

    DELETE FROM t_prune WHERE id % 500 <> 0;
    SELECT pg_current_wal_flush_lsn(); -- get start_lsn here
    VACUUM FREEZE t_prune; -- 3 times
    SELECT pg_current_wal_flush_lsn(); -- get end_lsn here

    # BASH
    # stop cluster
    pg_waldump -p $wal_dir -s $start_lsn -e $end_lsn 2>/dev/null;

The test is implemented in 052_prune_dfor_compression.pl, therefore
the presented results can be refetched by restarting this test script.

 > Also, I find it somewhat unlikely we'd import a GPLv3 library like
 > this, even if it's just a testing framework. Even ignoring the
 > question of having a different license for some of the code, it'd mean
 > maintenance burden (maybe libtap is stable/mature, no idea). I don't
 > see why this would be better than "write a SQL callable test module".

I am ready to rework it once there is consensus on the core of the
patch.

Best regards,
Evgeny.

P.s. rebased onto a1643d40b30.

[1] 
https://www.postgresql.org/message-id/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: PG 19 release notes and authors
Next
From: Andrew Dunstan
Date:
Subject: Re: Buildfarm misses running some contrib TAP tests