Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add LZ4 compression in pg_dump
Msg-id b0ced9ea-e92d-f9d1-7a2f-075881b755b2@enterprisedb.com
In response to Re: Add LZ4 compression in pg_dump  (gkokolatos@pm.me)
Responses Re: Add LZ4 compression in pg_dump
List pgsql-hackers

On 3/1/23 14:39, gkokolatos@pm.me wrote:
>
> ------- Original Message -------
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs. pg_dump succeeds, but then the restore fails:
>>
>> postgres=# SELECT lo_create(1234);
>> lo_create | 1234
>>
>> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v
>> pg_restore: implied data-only restore
>> pg_restore: executing BLOB 1234
>> pg_restore: processing BLOBS
>> pg_restore: restoring large object with OID 1234
>> pg_restore: error: could not uncompress data: (null)
>>
> 
> Thank you for looking. This was an untested case.
> 

Yeah :-(

>> The inline patch below fixes it, but you won't be able to apply it
>> directly, as it's on top of other patches which rename the functions
>> back to "Zlib" and rearrange the functions to their original order, to
>> allow running:
>>
>> git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
>>
> 
> Please find a patch attached that can be applied directly.
> 
>> The current function order avoids 3 lines of declarations, but it's
>> obviously pretty useful to be able to run that diff command. I already
>> argued for not calling the functions "Gzip" on the grounds that the name
>> was inaccurate.
> 
> I have no idea why we are back on the naming issue. I stand by the name
> because, in my humble opinion, it helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match as the following code sample shows.
> 
>     if (compression_spec.algorithm == PG_COMPRESSION_NONE)
>         InitCompressorNone(cs, compression_spec);
>     else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
>         InitCompressorGzip(cs, compression_spec);
>     else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
>         InitCompressorLZ4(cs, compression_spec);
>
> When readers want to see what happens when PG_COMPRESSION_XXX is set,
> they simply have to search for the XXX part. I think that this is
> justification enough for the use of the names.
> 

I don't recall the previous discussion about the naming, but I'm not
sure why it would be inaccurate. We call it 'gzip' pretty much
everywhere, and I agree with Georgios that it helps to make this
consistent with the PG_COMPRESSION_ stuff.

The one thing that concerned me while reviewing it earlier was that it
might make backpatching harder. But that's mostly irrelevant given
all the other changes, I think.

>>
>> I'd want to create an empty large object in src/test/sql/largeobject.sql
>> so that this gets exercised during pg_upgrade testing. But unfortunately
>> that doesn't use -Fc, so this isn't hit. Empty input is an important
>> enough test case to justify a TAP test, if there's no better way.
> 
> Please find in the attached a test case that exercises this codepath.
> 

Thanks. That seems correct to me, but I find it somewhat confusing,
because we now have

 DeflateCompressorInit vs. InitCompressorGzip

 DeflateCompressorEnd vs. EndCompressorGzip

 DeflateCompressorData - The name doesn't really say what it does (would
                         be better to have a verb in there, I think).

I wonder if we can make this somehow clearer?

Also, InitCompressorGzip says this:

   /*
    * If the caller has defined a write function, prepare the necessary
    * state. Avoid initializing during the first write call, because End
    * may be called without ever writing any data.
    */
    if (cs->writeF)
        DeflateCompressorInit(cs);

Does it actually make sense to not have writeF defined in some cases?
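
To illustrate the concern behind that comment, here is a toy model in C. It is not the pg_dump code: ToyCompressor and the toy_* functions are hypothetical names invented for this sketch, and the "[HDR]" and "[TRL]" markers merely stand in for a real stream header and trailer. It assumes that End only emits a trailer for a stream that was initialized, and shows that deferring initialization to the first write leaves nothing at all in the output when End is called without any data, whereas initializing up front yields a well-formed empty stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for per-stream compressor state (not pg_dump's). */
typedef struct ToyCompressor
{
    bool   initialized;   /* has the stream header been emitted? */
    char   out[64];       /* bytes "written to the archive" so far */
    size_t outlen;
} ToyCompressor;

/* Append a NUL-terminated marker or payload to the output buffer. */
void toy_emit(ToyCompressor *c, const char *s)
{
    size_t n = strlen(s);

    assert(c->outlen + n < sizeof(c->out));
    memcpy(c->out + c->outlen, s, n);
    c->outlen += n;
    c->out[c->outlen] = '\0';
}

/* Emit the header and mark the stream as started. */
void toy_init(ToyCompressor *c)
{
    toy_emit(c, "[HDR]");
    c->initialized = true;
}

/* eager = true initializes up front; false defers to the first write. */
void toy_start(ToyCompressor *c, bool eager)
{
    memset(c, 0, sizeof(*c));
    if (eager)
        toy_init(c);
}

void toy_write(ToyCompressor *c, const char *data)
{
    if (!c->initialized)
        toy_init(c);      /* lazy path: only reached if data is written */
    toy_emit(c, data);
}

/* End emits the trailer only for a stream that was ever initialized. */
void toy_end(ToyCompressor *c)
{
    if (c->initialized)
        toy_emit(c, "[TRL]");
}

/* A reader in this model needs both a header and a trailer. */
bool toy_stream_is_valid(const ToyCompressor *c)
{
    return c->outlen >= 10 &&
        strncmp(c->out, "[HDR]", 5) == 0 &&
        strcmp(c->out + c->outlen - 5, "[TRL]") == 0;
}
```

Calling toy_start(&c, false) followed directly by toy_end(&c) leaves c.outlen at zero, which the model's reader rejects; with toy_start(&c, true) the same sequence produces a header plus trailer. Whether the real failure follows exactly this mechanism is not something this sketch establishes.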


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


