Re: refactoring basebackup.c - Mailing list pgsql-hackers

From Robert Haas
Subject Re: refactoring basebackup.c
Date
Msg-id CA+TgmobWAD+b+-rA9zdwhaOxS0quK9iJjD4eq2N8qzUsPYD+CA@mail.gmail.com
Whole thread Raw
In response to Re: refactoring basebackup.c  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: refactoring basebackup.c  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Mon, Jul 19, 2021 at 2:51 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence
ofnot sufficiently abstracting out the handling of the tar format.  If the bbsink and bbstreamer abstractions fully
encapsulateda set of parsing callbacks, then pg_basebackup wouldn't contain things like: 
>
>     streamer = bbstreamer_tar_parser_new(streamer);
>
> but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever.
Itjust seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by
knowingtoo much about the format.  It might even be a useful exercise to write cpio support into this patch set rather
thanwaiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over. 

Well, I had a patch in an earlier patch set that tried to get
knowledge of tar out of basebackup.c, but it couldn't use the bbsink
abstraction; it needed a whole separate abstraction layer which I had
called bbarchiver with a different API. So I dropped it, for fear of
being told, not without some justification, that I was just changing
things for the sake of changing them, and also because having exactly
one implementation of some interface is really not great. I do
conceptually like the idea of making the whole thing flexible enough
to generate cpio or zip archives, because like you I think that having
tar-specific stuff all over the place is grotty, but I have a feeling
there's little market demand for having pg_basebackup produce cpio,
pax, zip, iso, etc. archives. On the other hand, server-side
compression and server-side backup seem like functionality with real
utility. Still, if you or others want to vote for resurrecting
bbarchiver on the grounds that general code cleanup is worthwhile for
its own sake, I'm OK with that, too.

I don't really understand what your problem is with how the patch set
leaves pg_basebackup. On the server side, because I dropped the
bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff
about tar. pg_basebackup.c, however, really doesn't know anything much
about tar any more. It knows that if it's getting a tar file and needs
to parse a tar file then it had better call the tar parsing code, but
that seems difficult to avoid. What we can avoid, and I think the
patch set does, is pg_basebackup.c having any real knowledge of what
the tar parser is doing under the hood.

Thanks also for the detailed comments. I'll try to the right number of
verbs in each sentence in the next version of the patch. I will also
look into the issues mentioned by Dilip and Tushar.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: something is wonky with pgbench pipelining
Next
From: Robert Haas
Date:
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety