Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id ZFhCyn4Gm2eu60rB@paquier.xyz
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (gkokolatos@pm.me)
Responses Re: Add LZ4 compression in pg_dump
List pgsql-hackers
On Fri, May 05, 2023 at 02:13:28PM +0000, gkokolatos@pm.me wrote:
> Good point. I thought about it before submitting the patch. I
> concluded that given the complexity and operations involved in
> LZ4Stream_read_internal() and the rest of t he pg_dump/pg_restore
> code, the memset() call will be negligible. However from the
> readability point of view, the function is a bit cleaner with the
> memset().
>
> I will not object to any suggestion though, as this is a very
> trivial point. Please find attached a v2 of the patch following the
> suggested approach.

Hmm.  I was looking at this patch, and what you are trying to do
sounds rather right to keep a parallel with the gzip and zstd code
paths.

Looking at the code of gzread.c, gzgets() enforces a null-termination
on the string read.  Still, isn't that something we'd better enforce
in read_none() as well?  compress_io.h lists this as a requirement of
the callback, and Zstd_gets() does so already.  read_none() does not
enforce that, unfortunately.

+   /* No work needs to be done for a zero-sized output buffer */
+   if (size <= 0)
+       return 0;

Indeed.  This should be OK.

-   ret = LZ4Stream_read_internal(state, ptr, size, true);
+   Assert(size > 1);

The addition of this assertion is a bit surprising, and this is
inconsistent with Zstd_gets where a length of 1 is authorized.  We
should be more consistent across all the callbacks, IMO, not less, so
as we apply the same API contract across all the compression methods.

While testing this patch, I have triggered an error pointing out that
the decompression path of LZ4 is broken for table data.  I can
reproduce that with a dump of the regression database, as of:
make installcheck
pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
createdb regress_lz4
pg_restore --format=d -d regress_lz4 dump_lz4
pg_restore: error: COPY failed for table "clstr_tst": ERROR:  extra data after last expected column
CONTEXT:  COPY clstr_tst, line 15: "32    6    seis
xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy..."
pg_restore: warning: errors ignored on restore: 1

This does not show up with gzip or zstd, and the patch does not
influence the result.  In short it shows up with and without the
patch, on HEAD.  That does not look really stable :/
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: PGDOCS - Replica Identity quotes
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: 2023-05-11 release announcement draft