Re: refactoring basebackup.c (zstd workers) - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: refactoring basebackup.c (zstd workers)
Date
Msg-id 20220317194130.GM28503@telsasoft.com
Whole thread Raw
In response to Re: refactoring basebackup.c (zstd workers)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
+        <para>
+          For <literal>gzip</literal> the compression level should be an

gzip comma

+++ b/src/backend/replication/basebackup.c
@@ -18,6 +18,7 @@
 
 #include "access/xlog_internal.h"    /* for pg_start/stop_backup */
 #include "common/file_perm.h"
+#include "common/backup_compression.h"

alphabetical

-                                                errmsg("unrecognized compression algorithm: \"%s\"",
+                                                errmsg("unrecognized compression algorithm \"%s\"",

Most other places seem to say "compression method".  So I'd suggest to change
that here, and in doc/src/sgml/ref/pg_basebackup.sgml.

-    if (o_compression_level && !o_compression)
+    if (o_compression_detail && !o_compression)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
                  errmsg("compression level requires compression")));

s/level/detail/

 /*
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that all the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
..
  */
 static void
+parse_compress_options(char *option, char **algorithm, char **detail,
+                       CompressionLocation *locationres)

It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
support at least level=N.  And eventually pg_dump.  But those clients shouldn't
accept a client/server prefix.  Maybe the way to handle that is for those tools
to check locationres and reject it if it was specified.

+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwhich, the requested algorithm is "turkey"
+ * and the detail string is "sandwhich". We'll sort out whether that's legal

sp: sandwich

+        WalCompressionMethod    wal_compress_method;

This is confusingly similar to src/include/access/xlog.h:WalCompression.
I think someone else mentioned this before ?

+ * A compression specification specifies the parameters that should be used
+ * when * performing compression with a specific algorithm. The simplest

star

+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+char *
+get_bc_algorithm_name(bc_algorithm algorithm)

should be const ?

+    /* As a special case, the specification can be a bare integer. */
+    bare_level = strtol(specification, &bare_level_endp, 10);

Should this call expect_integer_value()?
See below.

+            result->parse_error =
+                pstrdup("found empty string where a compression option was expected");

Needs to be localized with _() ?
Also, document that it's pstrdup'd.

+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)

-1 isn't great, since it's also an integer, and, also a valid compression level
for zstd (did you see my message about that?).  Maybe INT_MIN is ok.

+{
+    int        ivalue;
+    char   *ivalue_endp;
+
+    ivalue = strtol(value, &ivalue_endp, 10);

Should this also set/check errno ?
And check if value != ivalue_endp ?
See strtol(3)

+char *
+validate_bc_specification(bc_specification *spec)
...
+    /*
+     * If a compression level was specified, check that the algorithm expects
+     * a compression level and that the level is within the legal range for
+     * the algorithm.

It would be nice if this could be shared with wal_compression and pg_dump.
We shouldn't need multiple places with structures giving the algorithms and
range of compression levels.

+    unsigned    options;        /* OR of BACKUP_COMPRESSION_OPTION constants */

Should be "unsigned int" or "bits32" ?

The server crashes if I send an unknown option - you should hit that in the
regression tests.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest
--compress=server-lz4:a|wc -c
 
TRAP: FailedAssertion("pointer != NULL", File: "../../../../src/include/utils/memutils.h", Line: 123, PID: 8627)
postgres: walsender pryzbyj [local] BASE_BACKUP(ExceptionalCondition+0xa0)[0x560b45d7b64b]
postgres: walsender pryzbyj [local] BASE_BACKUP(pfree+0x5d)[0x560b45dad1ea]
postgres: walsender pryzbyj [local] BASE_BACKUP(parse_bc_specification+0x154)[0x560b45dc5d4f]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x43d56c)[0x560b45bc556c]
postgres: walsender pryzbyj [local] BASE_BACKUP(SendBaseBackup+0x2d)[0x560b45bc85ca]
postgres: walsender pryzbyj [local] BASE_BACKUP(exec_replication_command+0x3a2)[0x560b45bdddb2]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostgresMain+0x6b2)[0x560b45c39131]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x40530e)[0x560b45b8d30e]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x408572)[0x560b45b90572]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x4087b9)[0x560b45b907b9]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostmasterMain+0x1135)[0x560b45b91d9b]
postgres: walsender pryzbyj [local] BASE_BACKUP(main+0x229)[0x560b45ad0f78]

This is interpreted like client-gzip-1; should multiple specifications of
compress be prohibited ?

| src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4
--compress=1



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: ICU for global collation
Next
From: Greg Stark
Date:
Subject: Re: Proposal for internal Numeric to Uint64 conversion function.