Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | b0ced9ea-e92d-f9d1-7a2f-075881b755b2@enterprisedb.com |
In response to | Re: Add LZ4 compression in pg_dump (gkokolatos@pm.me) |
Responses | Re: Add LZ4 compression in pg_dump |
List | pgsql-hackers |
On 3/1/23 14:39, gkokolatos@pm.me wrote:
> ------- Original Message -------
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs. pg_dump succeeds, but then the restore fails:
>>
>> postgres=# SELECT lo_create(1234);
>> lo_create | 1234
>>
>> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v
>> pg_restore: implied data-only restore
>> pg_restore: executing BLOB 1234
>> pg_restore: processing BLOBS
>> pg_restore: restoring large object with OID 1234
>> pg_restore: error: could not uncompress data: (null)
>>
>
> Thank you for looking. This was an untested case.
>

Yeah :-(

>> The inline patch below fixes it, but you won't be able to apply it
>> directly, as it's on top of other patches which rename the functions
>> back to "Zlib" and rearranges the functions to their original order, to
>> allow running:
>>
>> git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
>>
>
> Please find a patch attached that can be applied directly.
>
>> The current function order avoids 3 lines of declarations, but it's
>> obviously pretty useful to be able to run that diff command. I already
>> argued for not calling the functions "Gzip" on the grounds that the name
>> was inaccurate.
>
> I have no idea why we are back on the naming issue. I stand by the name
> because in my humble opinion helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match as the following code sample shows.
>
>     if (compression_spec.algorithm == PG_COMPRESSION_NONE)
>         InitCompressorNone(cs, compression_spec);
>     else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
>         InitCompressorGzip(cs, compression_spec);
>     else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
>         InitCompressorLZ4(cs, compression_spec);
>
> When the reader wants to see what happens when the PG_COMPRESSION_XXX
> is set, has to simply search for the XXX part. I think that this is
> justification enough for the use of the names.
>

I don't recall the previous discussion about the naming, but I'm not
sure why it would be inaccurate. We call it 'gzip' pretty much
everywhere, and I agree with Georgios that it helps to make this
consistent with the PG_COMPRESSION_ stuff.

The one thing that concerned me while reviewing it earlier was that it
might make the backpatching harder. But that's mostly irrelevant due to
all the other changes I think.

>>
>> I'd want to create an empty large object in src/test/sql/largeobject.sql
>> to exercise this tested during pgupgrade. But unfortunately that
>> doesn't use -Fc, so this isn't hit. Empty input is an important enough
>> test case to justify a tap test, if there's no better way.
>
> Please find in the attached a test case that exercises this codepath.
>

Thanks. That seems correct to me, but I find it somewhat confusing,
because we now have

    DeflateCompressorInit vs. InitCompressorGzip

    DeflateCompressorEnd vs. EndCompressorGzip

    DeflateCompressorData - The name doesn't really say what it does
    (would be better to have a verb in there, I think).

I wonder if we can make this somehow clearer?

Also, InitCompressorGzip says this:

    /*
     * If the caller has defined a write function, prepare the necessary
     * state. Avoid initializing during the first write call, because End
     * may be called without ever writing any data.
     */
    if (cs->writeF)
        DeflateCompressorInit(cs);

Does it actually make sense to not have writeF defined in some cases?
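For illustration only, here is a toy model in C of the situation that comment describes. It is not the pg_dump code and does not use zlib: ToyStream, ToyCompressor and all toy_* functions are hypothetical names invented for this sketch, and "emitting a frame" merely stands in for finishing a compressed stream. It assumes that an End call can only finish a stream that was already set up, and shows why setting it up in Init rather than on the first write matters for an empty large object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a compression stream; NOT zlib and NOT pg_dump code. */
typedef struct ToyStream
{
    size_t      bytes_in;       /* payload bytes accepted so far */
} ToyStream;

typedef struct ToyCompressor
{
    bool        has_write_func; /* models "cs->writeF != NULL" */
    ToyStream  *stream;         /* NULL until the stream is set up */
    size_t      frames_out;     /* finished frames emitted so far */
} ToyCompressor;

/* Allocate the stream state; abort on allocation failure. */
static void
toy_stream_init(ToyCompressor *cs)
{
    cs->stream = calloc(1, sizeof(ToyStream));
    if (cs->stream == NULL)
        abort();
}

/* Eager variant: set up the stream in Init whenever a writer exists. */
void
toy_init_eager(ToyCompressor *cs, bool has_write_func)
{
    cs->has_write_func = has_write_func;
    cs->stream = NULL;
    cs->frames_out = 0;
    if (cs->has_write_func)
        toy_stream_init(cs);
}

/* Lazy variant: defer stream setup until the first write. */
void
toy_init_lazy(ToyCompressor *cs, bool has_write_func)
{
    cs->has_write_func = has_write_func;
    cs->stream = NULL;
    cs->frames_out = 0;
}

/* Accept len payload bytes, setting up the stream first if needed. */
void
toy_write(ToyCompressor *cs, size_t len)
{
    assert(cs->has_write_func);
    if (cs->stream == NULL)
        toy_stream_init(cs);
    cs->stream->bytes_in += len;
}

/*
 * Finish the stream. A frame is emitted only if a stream exists, even
 * when it received zero bytes. With lazy setup and no writes there is
 * no stream, so nothing is emitted for the empty object.
 */
void
toy_end(ToyCompressor *cs)
{
    if (cs->stream == NULL)
        return;
    cs->frames_out++;
    free(cs->stream);
    cs->stream = NULL;
}
```

In this model, calling toy_init_eager followed directly by toy_end still emits one (empty) frame, while toy_init_lazy followed by toy_end emits none, which is the kind of gap a reader of the archive could trip over. Whether the real code can ever reach Init without a write function is exactly the open question above; the sketch takes no position on it.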

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company