Home > mailing lists

Re: pglz compression performance, take two - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: pglz compression performance, take two
Date	February 6, 2023 19:57:04
Msg-id	0ecaec9d-861b-c74f-846b-b64d929507ef@enterprisedb.com Whole thread Raw
In response to	Re: pglz compression performance, take two (Andrey Borodin <amborodin86@gmail.com>)
Responses	Re: pglz compression performance, take two
List	pgsql-hackers

Tree view


On 2/6/23 03:00, Andrey Borodin wrote:
> On Sun, Feb 5, 2023 at 5:51 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 2/5/23 19:36, Andrey Borodin wrote:
>>> On Fri, Jan 6, 2023 at 10:02 PM Andrey Borodin <amborodin86@gmail.com> wrote:
>>>>
>>>> Hello! Please find attached v8.
>>>
>>> I got some interesting feedback from some patch users.
>>> There was an oversight that frequently yielded results that are 1,2 or
>>> 3 bytes longer than expected.
>>> Looking closer I found that the correctness of the last 3-byte tail is
>>> checked in two places. PFA fix for this. Previously compressed data
>>> was correct, however in some cases few bytes longer than the result of
>>> current pglz implementation.
>>>
>>
>> Thanks. What were the consequences of the issue? Lower compression
>> ratio, or did we then fail to decompress the data (or would current pglz
>> implementation fail to decompress it)?
>>
> The data was decompressed fine. But extension tests (Citus's columnar
> engine) hard-coded a lot of compression ratio stuff.

OK. Not sure I'd blame the patch for these failures, as long as long as
the result is still correct and can be decompressed. I'm not aware of a
specification of what the compression must (not) produce.

> And there is still 1 more test where optimized version produces 1 byte
> longer output. I'm trying to find it, but with no success yet.
> 
> There are known and documented cases when optimized pglz version would
> do so. good_match without 10-division and memcmp by 4 bytes. But even
> disabling this, still observing 1-byte longer compression results
> persists... The problem is the length is changed after deleting some
> data, so compression of that particular sequence seems to be somewhere
> far away.
> It was funny at the beginning - to hunt for 1 byte. But the weekend is
> ending, and it seems that byte slipped from me again...
> 

I wonder what that means for the patch. I haven't investigated this at
all, but it seems as if the optimization means we fail to find a match,
producing a tad larger output. That may be still be a good tradeoff, as
long as the output is correct (assuming it doesn't break some promise
regarding expected output).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Robert Haas
Date: 06 February 2023, 19:40:18
Subject: Re: Non-superuser subscription owners

From: Matthias van de Meent
Date: 06 February 2023, 20:11:45
Subject: Re: Pluggable toaster

Re: pglz compression performance, take two - Mailing list pgsql-hackers

Previous

Next