Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id 20230108194524.GA27637@telsasoft.com
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Add LZ4 compression in pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Thu, Dec 22, 2022 at 11:08:59AM -0600, Justin Pryzby wrote:
> There's a couple of lz4 bits which shouldn't be present in 002: file
> extension and comments.

There were "LZ4" comments and file extension stuff in the preparatory
commit.  But now it seems like you *removed* them in the LZ4 commit
(where it actually belongs) rather than *moving* it from the
prior/parent commit *to* the lz4 commit.  I recommend to run something
like "git diff @{1}" whenever doing this kind of patch surgery.

+   if (AH->compression_spec.algorithm != PG_COMPRESSION_NONE &&
+       AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&

This looks wrong/redundant.  The gzip part should be removed, right ?

Maybe other places that check if (compression==PG_COMPRESSION_GZIP)
should maybe change to say compression!=NONE?

_PrepParallelRestore() references ".gz", so I think it needs to be
retrofitted to handle .lz4.  Ideally, that's built into a struct or list
of file extensions to try.  Maybe compression.h should have a function
to return the file extension of a given algorithm.  I'm planning to send
a patch for zstd, and hoping its changes will be minimized by these
preparatory commits.

+ errno = errno ? : ENOSPC;

"?:" is a GNU extension (not the ternary operator, but the ternary
operator with only 2 args).  It's not in use anywhere else in postgres.
You could instead write it with 3 "errno"s or as "if (errno==0):
errno=ENOSPC"

You wrote "eol_flag == false" and "eol_flag == 0" and true.  But it's
cleaner to test it as a boolean: if (eol_flag) / if (!eol_flag).

Both LZ4File_init() and its callers check "inited".  Better to do it in
one place than 3.  It's a static function, so I think there's no
performance concern.

Gzip_close() still has a useless save_errno (or rebase issue?).

I think it's confusing to have two functions, one named
InitCompressLZ4() and InitCompressorLZ4().

pg_compress_specification is being passed by value, but I think it
should be passed as a pointer, as is done everywhere else.

pg_compress_algorithm is being writen directly into the pg_dump header.
Currently, I think that's not an externally-visible value (it could be
renumbered, theoretically even in a minor release).  Maybe there should
be a "private" enum for encoding the pg_dump header, similar to
WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ?  Or else a comment there
should warn that the values are encoded in pg_dump, and must never be
changed.

+ Verify that data files where compressed
typo: s/where/were/

Also:
s/occurance/occurrence/
s/begining/beginning/
s/Verfiy/Verify/
s/nessary/necessary/

BTW I noticed that cfdopen() was accidentally committed to compress_io.h
in master without being defined anywhere.

-- 
Justin



pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Next
From: Alexander Korotkov
Date:
Subject: Re: Fix gin index cost estimation