Re: zstd compression for pg_dump - Mailing list pgsql-hackers

From Andreas Karlsson
Subject Re: zstd compression for pg_dump
Date
Msg-id ced272a6-2f02-73cf-b588-03107a257c29@proxel.se
Whole thread Raw
In response to Re: zstd compression for pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On 1/4/21 3:53 AM, Justin Pryzby wrote:
>> About 89% smaller.

Did a quick code review of the patch. I have not yet taken it for a spin 
yet and there are parts of the code I have not read yet.

## Is there any reason for this diff?

-    cfp        *fp = pg_malloc(sizeof(cfp));
+    cfp        *fp = pg_malloc0(sizeof(cfp));

## Since we know have multiple returns in cfopen() I am not sure that 
setting fp to NULL is still clearer than just returning NULL.

## I do not like that this pretends to support r+, w+ and a+ but does 
not actually do so since it does not create an input stream in those cases.

else if (mode[0] == 'w' || mode[0] == 'a' ||
    strchr(mode, '+') != NULL)
[...]
else if (strchr(mode, 'r'))

## Wouldn't cfread(), cfwrite(), cfgetc(), cfgets(), cfclose() and 
cfeof() be cleaner with sitch statments similar to cfopen()?

## "/* Should be called "method" or "library" ? */"

Maybe, but personally I think algorithm is fine too.

## "Is a nondefault level set ?"

The PostgreSQL project does not use space before question mark (at least 
not in English).

## Why isn't level_set just a local variable in parse_compression()? It 
does not seem to be used elsewhere.

## Shouldn't we call the Compression variable in OpenArchive() 
nocompress to match with the naming convention in other places.

And in general I wonder if we should not write "nocompression = 
{COMPR_ALG_NONE}" rather than "nocompression = {0}".

## Why not use const on the pointers to Compression for functions like 
cfopen()? As far as I can see several of them could be const.

## Shouldn't "AH->compression.alg = Z_DEFAULT_COMPRESSION" in ReadHead() 
be "AH->compression.alg = COMPR_ALG_DEFAULT"?

Additionally I am not convinced that returning COMPR_ALG_DEFAULT will 
even work but I have not had the time to test that theory yet. And in 
general I am quite sceptical of that we really need of COMPR_ALG_DEFAULT.

## Some white space issues

Add spaces around plus in "atoi(1+eq)" and "pg_log_error("unknown 
compression algorithm: %s", 1+eq)".

Add spaces around plus in parse_compression(), e.g. in "strlen(1+eq)".

## Shouldn't hasSuffix() take the current compression algorithm as a 
parameter? Or alternatively look up which compression algorithm to use 
from the suffix?

## Why support multiple ways to write zlib on the command line? I do not 
see any advatange of being able to write it as libz.

## I feel renaming SaveOutput() to GetOutput() would make it more clear 
what it does now that you have changed the return type.

## You have accidentally committed "-runstatedir" in configure. I have 
no idea why we do not have it (maybe it is something Debian specific) 
but even if we are going to add it it should not be in this patch. Same 
with the parenthesis changes to LARGE_OFF_T.

## This is probably out of scope of your patch but I am not a fan of the 
fallback logic in cfopen_read(). I feel ideally we should always know if 
there is a suffix or not and not try to guess file names and do 
pointless syscalls.

## COMPR_ALG_DEFAULT looks like it would error out for archive and 
directory if someone has neither zlib nor zstandard. It feels like it 
should default to uncompressed if we have neither. Or at least give a 
better error message.

> Note, there's currently several "compression" patches in CF app.  This patch
> seems to be independent of the others, but probably shouldn't be totally
> uncoordinated (like adding lz4 in one and ztsd in another might be poor
> execution).

A thought here is that maybe we want to use the same values for the 
enums in all patches. Especially if we write the numeric value to pg 
dump files.

Andreas




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Terminate the idle sessions
Next
From: Noah Misch
Date:
Subject: Re: Inconsistent "" use