Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id 8d4c70d8-3212-45fe-ef8e-6b6bb511ace4@enterprisedb.com
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers

On 3/12/23 11:07, Peter Eisentraut wrote:
> On 11.03.23 07:00, Alexander Lakhin wrote:
>> Hello,
>> 23.02.2023 23:24, Tomas Vondra wrote:
>>> On 2/23/23 16:26, Tomas Vondra wrote:
>>>> Thanks for v30 with the updated commit messages. I've pushed 0001 after
>>>> fixing a comment typo and removing (I think) an unnecessary change
>>>> in an
>>>> error message.
>>>>
>>>> I'll give the buildfarm a bit of time before pushing 0002 and 0003.
>>>>
>>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>>> and marked the CF entry as committed. Thanks for the patch!
>>>
>>> I wonder how difficult would it be to add the zstd compression, so that
>>> we don't have the annoying "unsupported" cases.
>>
>> With the patch 0003 committed, a single warning -Wtype-limits appeared
>> in the
>> master branch:
>> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
>> compress_lz4.c: In function ‘LZ4File_gets’:
>> compress_lz4.c:492:19: warning: comparison of unsigned expression in
>> ‘< 0’ is always false [-Wtype-limits]
>>    492 |         if (dsize < 0)
>>        |
>> (I wonder, is it accidental that there no other places that triggers
>> the warning, or some buildfarm animals had this check enabled before?)
> 
> I think there is an underlying problem in this code that it dances back
> and forth between size_t and int in an unprincipled way.
> 
> In the code that triggers the warning, dsize is size_t.  dsize is the
> return from LZ4File_read_internal(), which is declared to return int.
> The variable that LZ4File_read_internal() returns in the success case is
> size_t, but in case of an error it returns -1.  (So the code that is
> warning is meaning to catch this error case, but it won't ever work.)
> Further below LZ4File_read_internal() calls LZ4File_read_overflow(),
> which is declared to return int, but in some cases it returns
> fs->overflowlen, which is size_t.
> 

I agree. I just got home so I looked at this only very briefly, but I
think it's clearly wrong to assign the LZ4File_read_internal() result to
a size_t variable (and it seems to me LZ4File_gets does the same mistake
with LZ4File_read_internal() result).

I'll get this fixed early next week, I'm too tired to do that now
without likely causing further issues.

> This should be cleaned up.
> 
> AFAICT, the upstream API in lz4.h uses int for size values, but
> lz4frame.h uses size_t, so I don't know what the correct approach is.

Yeah, that's a good point. I think Justin is right we should be using
the LZ4F stuff, so ultimately we'll probably switch to size_t. But IMO
it's definitely better to correct the current code first, and only then
switch to LZ4F (from one correct state to another).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Stéphane Tachoires
Date:
Subject: Re: [Proposal] Allow pg_dump to include all child tables with the root table
Next
From: Tom Lane
Date:
Subject: Re: pg_dump versus hash partitioning