On Sat, Feb 25, 2023 at 08:05:53AM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
> > I have some fixes (attached) and questions while polishing the patch for
> > zstd compression. The fixes are small and could be integrated with the
> > patch for zstd, but could be applied independently.
>
> One more - WriteDataToArchiveGzip() says:
One more again.
The LZ4 path is using non-streaming mode, which compresses each block
without persistent state, giving poor compression for -Fc compared with
-Fp. If the data is highly compressible, the difference can be orders
of magnitude.
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fp |wc -c
12351763
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
21890708
That's not true for gzip:
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fc |wc -c
2118869
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z gzip -Fp |wc -c
2115832
The function ought to at least use streaming mode, so each block/row
isn't compressed in isolation. 003 is a simple patch to use
streaming mode, which improves the -Fc case:
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -Z lz4 -Fc |wc -c
15178283
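
To illustrate why compressing each block without persistent state does so
poorly, here is a minimal Python sketch. It is not pg_dump's code and it
does not use LZ4: zlib from the standard library stands in for the
compressor, and the rows are made-up data. It only shows the general
effect of sharing (or not sharing) compressor state across rows.

```python
import zlib

# Hypothetical rows, roughly shaped like COPY output lines (made-up data).
rows = [f"{i}\tsome repetitive payload text\n".encode() for i in range(10000)]


def compress_independently(rows):
    """Compress every row on its own; no state is shared between rows."""
    return sum(len(zlib.compress(r)) for r in rows)


def compress_streaming(rows):
    """Feed all rows through one compressor, so that later rows can
    reference data already seen in earlier rows."""
    c = zlib.compressobj()
    total = sum(len(c.compress(r)) for r in rows)
    return total + len(c.flush())


raw_size = sum(len(r) for r in rows)
independent_size = compress_independently(rows)
streaming_size = compress_streaming(rows)
```

Printing raw_size, independent_size and streaming_size shows the gap; the
exact numbers depend on the data and on the compressor.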
However, that still flushes the compression buffer, writing a block
header, for every row. With a single-column table, pg_dump -Fc -Z lz4
still outputs ~10% *more* data than with no compression at all. And
that's for compressible data.
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z lz4 |wc -c
12890296
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Fc -Z none |wc -c
11890296
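
The cost of flushing after every row can be sketched the same way. Again
this is only an illustration under assumptions: zlib stands in for LZ4
(its sync-flush marker plays the role of the per-row block header), and
the single-column rows are made up. Whether the flushed output ends up
larger than the uncompressed input depends on the data, so no particular
result is claimed here.

```python
import zlib

# Hypothetical single-column table: one short value per row (made-up data).
rows = [f"{i}\n".encode() for i in range(20000)]


def stream(rows, flush_each_row):
    """Compress all rows through one stream and return the output size.

    With flush_each_row=True the compressor is forced to emit its pending
    data after every row, which adds a fixed marker to the output each time.
    """
    c = zlib.compressobj()
    total = 0
    for r in rows:
        total += len(c.compress(r))
        if flush_each_row:
            total += len(c.flush(zlib.Z_SYNC_FLUSH))
    return total + len(c.flush())


raw_size = sum(len(r) for r in rows)
flushed_size = stream(rows, flush_each_row=True)
buffered_size = stream(rows, flush_each_row=False)
```

Comparing flushed_size with buffered_size shows how much of the output is
per-row overhead rather than compressed data.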
I think this should use the LZ4F API with frames, which are buffered to
avoid outputting a header for every single row. The LZ4F format isn't
compatible with the LZ4 format, so (unlike changing to the streaming
API) that's not something we can change in a bugfix release. I consider
this an Open Item.
With the LZ4F API in 004, -Fp and -Fc are essentially the same size
(like gzip). (Oh, and the output is about a third of the size, too.)
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fp |wc -c
4155448
$ ./src/bin/pg_dump/pg_dump -h /tmp postgres -t t1 -Z lz4 -Fc |wc -c
4156548
--
Justin