Re: Compress prune/freeze records with Delta Frame of Reference algorithm - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Compress prune/freeze records with Delta Frame of Reference algorithm
Date
Msg-id 5a2f3df2-a736-4ada-8aa3-aa6e20b2e067@vondra.me
In response to Re: Compress prune/freeze records with Delta Frame of Reference algorithm  (Evgeny Voropaev <evgeny.voropaev@tantorlabs.com>)
List pgsql-hackers
On 3/24/26 15:28, Evgeny Voropaev wrote:
> Hello Andres,
> 
>> I'm unconvinced that this is a serious problem - typically the
>> overhead of WAL
>> volume due to pruning / freezing is due to the full page images
>> emitted, not
>> the raw size of the records. Once an FPI is emitted, this doesn't matter.
>>
>> What gains have you measured in somewhat realistic workloads?
> 
> So far, we have had no tests in any real production environment.
> Moreover, the load in the new test (recovery/
> t/052_prune_dfor_compression.pl) is quite contrived. However, it
> demonstrates a compression ratio of more than 5, and it is measured for
> an overall size of all prune/freeze records with no filtering.
> 
> Further development is the implementation of compression of unsorted
> sequences. This is going to allow PostgreSQL to compress also the
> 'frozen' and the 'redirected' offset sequences, which should result in a
> greater compression ratio.
> 
> But I agree with you, Andres, we need practical results to estimate a
> profit. I wish we would test it on some real load soon.
> 
> Also I hope, independently of its usage in prune/freeze records, the
> DFoR itself might be used for compression sequences in other places of PG.
> 

IMHO Andres is right. A ~170kB patch really should present some numbers
quantifying the expected benefit. It doesn't need to be a real workload
from production, but something plausible enough. Even some basic
back-of-the-envelope calculations might be enough to show the promise.
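
To illustrate what such a back-of-the-envelope calculation might look like,
here is a minimal sketch that compares a sorted offset sequence stored as
plain 16-bit values with a simple delta + frame-of-reference estimate. The
layout (a 5-byte header followed by bit-packed residuals) and the function
names are assumptions made for this illustration, not the format used by the
patch. It also ignores record headers and full page images, so it says
nothing about total WAL volume.

```python
# Hypothetical layout, assumed for illustration only (not the patch's format):
#   2 bytes first offset + 2 bytes minimum delta + 1 byte bit width,
#   followed by the bit-packed residuals (delta - minimum delta).
HEADER_BYTES = 2 + 2 + 1


def raw_size_bytes(offsets):
    """Size of the sequence stored as plain 16-bit offsets."""
    return 2 * len(offsets)


def dfor_size_bytes(offsets):
    """Estimated size of a sorted sequence under delta + frame of reference."""
    if len(offsets) < 2:
        # Nothing to delta-encode; assume the sequence is stored as-is.
        return raw_size_bytes(offsets)
    # Differences between consecutive offsets.
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    # The frame of reference is the smallest delta.
    base = min(deltas)
    # Bits needed for the largest residual (0 if all deltas are equal).
    width = max(d - base for d in deltas).bit_length()
    # Round the bit-packed residuals up to whole bytes.
    packed = (width * len(deltas) + 7) // 8
    return HEADER_BYTES + packed


if __name__ == "__main__":
    for name, seq in [
        ("200 consecutive offsets", list(range(1, 201))),
        ("10 irregular offsets", [1, 2, 4, 7, 8, 12, 13, 15, 20, 21]),
    ]:
        print(name, raw_size_bytes(seq), dfor_size_bytes(seq))
```

With these assumptions, 200 consecutive offsets shrink from 400 bytes to 5,
while the 10 irregular offsets shrink from 20 bytes to 9. The ratio depends
heavily on how regular the sequence is, which is why numbers from a plausible
workload matter.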

Without this, the cost/benefit is so unclear that most senior contributors
will probably choose to review something else. You need to make the case
for why this is worth it.

I only quickly skimmed the patches, for exactly this reason. I'm a bit
confused why this needs to add the whole libtap thing in 0001, instead
of just testing this through the SQL interface (same as test_aio etc.).

Also, I find it somewhat unlikely we'd import a GPLv3 library like this,
even if it's just a testing framework. Even ignoring the question of
having a different license for some of the code, it'd mean maintenance
burden (maybe libtap is stable/mature, no idea). I don't see why this
would be better than "write a SQL callable test module".


regards

-- 
Tomas Vondra



