Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id 6ab89d42-0345-b96e-02d4-68e171cdbe03@enterprisedb.com
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (gkokolatos@pm.me)
Responses Re: Add LZ4 compression in pg_dump
List pgsql-hackers
On 1/18/23 20:05, gkokolatos@pm.me wrote:
> 
> 
> 
> 
> 
> ------- Original Message -------
> On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> 
> 
>>
>>
>> Hi,
>>
>> On 1/16/23 16:14, gkokolatos@pm.me wrote:
>>
>>> Hi,
>>>
>>> I admit I am completely at lost as to what is expected from me anymore.
>>
> <snip>
>>
>> Unfortunately, this plays against this patch - I'm certainly in favor of
>> adding lz4 (and other compression algos) into pg_dump, but if I commit
>> 0001 we get little benefit, and the other parts actually adding lz4/zstd
>> are treated as "WIP / for completeness" so it's unclear when we'd get to
>> commit them.
> 
> Thank you for your kindness and for taking the time to explain.
>  
>> So if I could recommend one thing, it'd be to get at least one of those
>> WIP patches into a shape that's likely committable right after 0001.
> 
> This was clearly my fault. I misunderstood a suggestion upthread to focus
> on the first patch of the series and ignore documentation and comments on
> the rest.
> 
> Please find v21 to contain 0002 and 0003 in a state which I no longer consider
> as WIP but worthy of proper consideration. Some guidance on where is best to add
> documentation in 0002 for the function pointers in CompressFileHandle will
> be welcomed.
> 

This is internal-only API, not meant for use by regular users and/or
extension authors, so I don't think we need sgml docs. I'd just add
regular code-level documentation to compress_io.h.

For inspiration see docs for "struct ReorderBuffer" in reorderbuffer.h,
or "struct _archiveHandle" in pg_backup_archiver.h.

Or what other kind of documentation you had in mind?

>>
>>> I had posted v19-0001 for a committer's consideration and v19-000{2,3} for completeness.
>>> Please find a rebased v20 attached.
>>
>>
>> I took a quick look at 0001, so a couple comments (sorry if some of this
>> was already discussed in the thread):
> 
> Much appreciated!
> 
>>
>> 1) I don't think a "refactoring" patch should reference particular
>> compression algorithms (lz4/zstd), and in particular I don't think we
>> should have "not yet implemented" messages. We only have a couple other
>> places doing that, when we didn't have a better choice. But here we can
>> simply reject the algorithm when parsing the options, we don't need to
>> do that in a dozen other places.
> 
> I have now removed lz4/zstd from where they were present with the exception
> of pg_dump.c which is responsible for parsing.
> 

I'm not sure I understand why leave the lz4/zstd in this place?

>> 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
>> "none" at the end. It might make backpatches harder.
> 
> Agreed. However a 'default' is needed in order to avoid compilation warnings.
> Also note that 0002 completely does away with cases within WriteDataToArchive.
> 

OK, although that's also a consequence of using a "switch" instead of
plan "if" branches.

Furthermore, I'm not sure we really need the pg_fatal() about invalid
compression method in these default blocks. I mean, how could we even
get to these places when the build does not support the algorithm? All
of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...)
happens looong after the compressor was initialized and the method
checked, no? So maybe either this should simply do Assert(false) or use
a different error message.

>> 3) While building, I get bunch of warnings about missing cfdopen()
>> prototype and pg_backup_archiver.c not knowing about cfdopen() and
>> adding an implicit prototype (so I doubt it actually works).
> 
> Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got removed
> in 69fb29d1af. v20 failed to properly take 69fb29d1af in consideration. Note
> that cfdopen is removed in 0002 which explains why cfbot didn't complain.
>  

OK.

>> 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
>> FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
>> better to have a "union" of correct types?
> 
> Please find and updated comment and a union in place of the void *. Also
> note that 0002 completely does away with cfp in favour of a new struct
> CompressFileHandle. I maintained the void * there because it is used by
> private methods of the compressors. 0003 contains such an example with
> LZ4CompressorState.
> 

I wonder if this (and also the previous item) makes sense to keep 0001
and 0002 or to combine them. The "intermediate" state is a bit annoying.

>> 5) cfopen/cfdopen are missing comments. cfopen_internal has an updated
>> comment, but that's a static function while cfopen/cfdopen are the
>> actual API.
> 
> Added comments to cfopen/cfdopen.
> 

OK.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: almost-super-user problems that we haven't fixed yet
Next
From: Robert Haas
Date:
Subject: Re: Non-superuser subscription owners