Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id 20230228235834.GC30529@telsasoft.com
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump
List pgsql-hackers
On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> On 2/23/23 16:26, Tomas Vondra wrote:
> > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > fixing a comment typo and removing (I think) an unnecessary change in an
> > error message.
> > 
> > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> > 
> 
> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> and marked the CF entry as committed. Thanks for the patch!

I found that e9960732a broke writing of empty gzip-compressed data,
specifically LOs.  pg_dump succeeds, but then the restore fails:

postgres=# SELECT lo_create(1234);
lo_create | 1234

$ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v 
pg_restore: implied data-only restore
pg_restore: executing BLOB 1234
pg_restore: processing BLOBS
pg_restore: restoring large object with OID 1234
pg_restore: error: could not uncompress data: (null)

The inline patch below fixes it, but you won't be able to apply it
directly, as it's on top of other patches which rename the functions
back to "Zlib" and rearranges the functions to their original order, to
allow running:

git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c

The current function order avoids 3 lines of declarations, but it's
obviously pretty useful to be able to run that diff command.  I already
argued for not calling the functions "Gzip" on the grounds that the name
was inaccurate.

I'd want to create an empty large object in src/test/sql/largeobject.sql
to exercise this tested during pgupgrade.  But unfortunately that
doesn't use -Fc, so this isn't hit.  Empty input is an important enough
test case to justify a tap test, if there's no better way.

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index f3f5e87c9a8..68f3111b2fe 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -55,6 +55,32 @@ InitCompressorZlib(CompressorState *cs,
     gzipcs = (ZlibCompressorState *) pg_malloc0(sizeof(ZlibCompressorState));
 
     cs->private_data = gzipcs;
+
+    if (cs->writeF)
+    {
+        z_streamp    zp;
+        zp = gzipcs->zp = (z_streamp) pg_malloc0(sizeof(z_stream));
+        zp->zalloc = Z_NULL;
+        zp->zfree = Z_NULL;
+        zp->opaque = Z_NULL;
+
+        /*
+         * outsize is the buffer size we tell zlib it can output to.  We
+         * actually allocate one extra byte because some routines want to append a
+         * trailing zero byte to the zlib output.
+         */
+
+        gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
+        gzipcs->outsize = ZLIB_OUT_SIZE;
+
+        if (deflateInit(gzipcs->zp, cs->compression_spec.level) != Z_OK)
+            pg_fatal("could not initialize compression library: %s",
+                    zp->msg);
+
+        /* Just be paranoid - maybe End is called after Start, with no Write */
+        zp->next_out = gzipcs->outbuf;
+        zp->avail_out = gzipcs->outsize;
+    }
 }
 
 static void
@@ -63,7 +89,7 @@ EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs)
     ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
     z_streamp    zp;
 
-    if (gzipcs->zp)
+    if (cs->writeF != NULL)
     {
         zp = gzipcs->zp;
         zp->next_in = NULL;
@@ -131,29 +157,6 @@ WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState *cs,
                        const void *data, size_t dLen)
 {
     ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
-    z_streamp    zp;
-
-    if (!gzipcs->zp)
-    {
-        zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
-        zp->zalloc = Z_NULL;
-        zp->zfree = Z_NULL;
-        zp->opaque = Z_NULL;
-
-        /*
-         * outsize is the buffer size we tell zlib it can output to.  We
-         * actually allocate one extra byte because some routines want to
-         * append a trailing zero byte to the zlib output.
-         */
-        gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
-        gzipcs->outsize = ZLIB_OUT_SIZE;
-
-        if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
-            pg_fatal("could not initialize compression library: %s", zp->msg);
-
-        zp->next_out = gzipcs->outbuf;
-        zp->avail_out = gzipcs->outsize;
-    }
 
     gzipcs->zp->next_in = (void *) unconstify(void *, data);
     gzipcs->zp->avail_in = dLen;

Attachment

pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
Next
From: Michael Paquier
Date:
Subject: Re: Doc update for pg_stat_statements normalization