Re: refactoring basebackup.c - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: refactoring basebackup.c
Date
Msg-id CE497CA6-D19C-4AF9-9954-4EEDBFDB65CD@enterprisedb.com
Whole thread Raw
In response to Re: refactoring basebackup.c  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: refactoring basebackup.c
List pgsql-hackers

> On Jul 8, 2021, at 8:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> The interesting
> patches in terms of functionality are 0006 and 0007;

The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence
ofnot sufficiently abstracting out the handling of the tar format.  If the bbsink and bbstreamer abstractions fully
encapsulateda set of parsing callbacks, then pg_basebackup wouldn't contain things like: 

    streamer = bbstreamer_tar_parser_new(streamer);

but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever.
Itjust seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by
knowingtoo much about the format.  It might even be a useful exercise to write cpio support into this patch set rather
thanwaiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over. 


    printf(_("  -F, --format=p|t       output format (plain (default), tar)\n"));

    printf(_("  -z, --gzip             compress tar output\n"));
    printf(_("  -Z, --compress=0-9     compress tar output with given compression level\n"));

This is the pre-existing --help output, not changed by your patch, but if you anticipate that other output formats will
besupported in future releases, perhaps it's better not to write the --help output in such a way as to imply that -z
and-Z are somehow connected with the choice of tar format?  Would changing the --help now make for less confusion
later? I'm just asking... 

The new options to pg_basebackup should have test coverage in src/bin/pg_basebackup/t/010_pg_basebackup.pl, though I
expectyou are waiting to hammer out the interface before writing the tests. 

> the rest is
> preparatory refactoring.

patch v3-0001:

The new function AppendPlainCommandOption writes too many spaces, which does no harm, but seems silly, resulting in
lineslike: 

  LOG:  received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base backup',  PROGRESS,  WAIT 0,  MANIFEST
'yes')


patch v3-0003:

The introduction of the sink abstraction seems incomplete, as basebackup.c still has knowledge of things like tar
headers. Calls like _tarWriteHeader(sink, ...) feel like an abstraction violation.  I expected perhaps this would get
addressedin later patches, but it doesn't. 

+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. It must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.

The length must be a multiple of BLCKSZ, not the pointer.


patch-v3-0005:

+ * 'copystream' sends a starts a single COPY OUT operation and transmits

too many verbs.

+ * Regardless of which method is used, we sent a result set with

"is used" vs. "sent" verb tense mismatch.

+ * So we only check it after the number of bytes sine the last check reaches

typo.  s/sine/since/

-    * (2) we need to inject backup_manifest or recovery configuration into it.
+    * (2) we need to inject backup_manifest or recovery configuration into
+    * it.

src/bin/pg_basebackup/pg_basebackup.c contains word wrap changes like the above which would better be left to a
differentcommit, if done at all. 

+   if (state.manifest_file !=NULL)

Need a space after !=


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Next
From: Jacob Champion
Date:
Subject: Re: Support for NSS as a libpq TLS backend