On Fri, Feb 25, 2022 at 12:05:31PM +0000, Georgios wrote:
> The first commit does the heavy lifting required for additional compression methods.
> It expands testing coverage for the already supported gzip compression. Commit
> bf9aa490db introduced cfp in compress_io.{c,h} with the intent of unifying
> compression related code and allow for the introduction of additional archive
> formats. However, pg_backup_archiver.c was not using that API. This commit
> teaches pg_backup_archiver.c about cfp and is using it through out.
Thanks for the patch. I have a few high-level comments.
+ # Do not use --no-sync to give test coverage for data sync.
+ compression_gzip_directory_format => {
+ test_key => 'compression',
The tests for GZIP had better be split into their own commit, as
that's a coverage improvement for the existing code.
I was assuming that this was going to be much larger :)
+/* Routines that support LZ4 compressed data I/O */
+#ifdef HAVE_LIBLZ4
+static void InitCompressorLZ4(CompressorState *cs);
+static void ReadDataFromArchiveLZ4(ArchiveHandle *AH, ReadFunc
readF);
+static void WriteDataToArchiveLZ4(ArchiveHandle *AH, CompressorState *cs,
+ const char *data, size_t dLen);
+static void EndCompressorLZ4(ArchiveHandle *AH, CompressorState *cs);
+#endif
Hmm. This is the same set of APIs as ZLIB and NONE to init, read,
write and end, but for the LZ4 compressor (NONE has no init/end).
Wouldn't it be better to refactor the existing pg_dump code to have a
central structure holding all the function definitions in a common
structure so as all those function signatures are set in stone in the
shape of a catalog of callbacks, making the addition of more
compression formats easier? I would imagine that we'd split the code
of each compression method into their own file with their own context
data. This would lead to a removal of compress_io.c, with its entry
points ReadDataFromArchive(), WriteDataToArchive() & co replaced by
pointers to each per-compression callback.
> Furthermore, compression was chosen based on the value of the level passed
> as an argument during the invocation of pg_dump or some hardcoded defaults. This
> does not scale for more than one compression methods. Now the method used for
> compression can be explicitly requested during command invocation, or set during
> hardcoded defaults. Then it is stored in the relevant structs and passed in the
> relevant functions, along side compression level which has lost it's special
> meaning. The method for compression is not yet stored in the actual archive.
> This is done in the next commit which does introduce a new method.
That's one thing Robert was arguing about with pg_basebackup, so that
would be consistent, and the option set is backward-compatible as far
as I get it by reading the code.
--
Michael