Re: refactoring basebackup.c - Mailing list pgsql-hackers

From Robert Haas
Subject Re: refactoring basebackup.c
Date
Msg-id CA+Tgmoa7ND2gTqMnaXO=m8eVQakGcLnw4BJyjzT=91ee=Vt8kQ@mail.gmail.com
In response to Re: refactoring basebackup.c  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Jul 31, 2020 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:
> Have you tested whether this still works against older servers? Or do
> you think we should not have that as a goal?

I haven't tested that recently but I intended to keep it working. I'll
make sure to nail that down before I get to the point of committing
anything, but I don't expect big problems. It's kind of annoying to
have so much backward compatibility stuff here but I think ripping any
of that out should wait for another time.

> Hm. I don't think I terribly like the idea of things like -R having to
> be processed server side. That'll be awfully annoying to keep working
> across versions, for one. But perhaps the config file should just not be
> in the main tar file going forward?

That'd be a user-visible change, though, whereas what I'm proposing
isn't. Instead of directly injecting stuff, the client can just send
it to the server and have the server inject it, provided the server is
new enough. Cross-version issues don't seem to be any worse than now.
That being said, I don't love it, either. We could just suggest to
people that using -R together with server compression is

> I think we should eventually be able to use one archive for multiple
> purposes, e.g. to set up a standby as well as using it for a base
> backup. Or multiple standbys with different tablespace remappings.

I don't think I understand your point here.

> ISTM that that can help to some degree, but things like tablespace
> remapping etc IMO aren't best done server side, so I think the client
> will continue to need to know about the contents to a significant
> degree?

If I'm not mistaken, those mappings are only applied with -Fp, i.e. if
we're extracting. And it's no problem to jigger things in that case;
we can only do this if we understand the archive in the first place.
The problem is when you have to decompress and recompress to jigger
things.
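
As a rough illustration of why compression is the sticking point, here is a minimal Python sketch. It is not pg_basebackup's code (which is C), and the helper name inject_file is made up for this example. Appending one member to an uncompressed tar touches only the tail of the archive, whereas a gzip-compressed tar has to be decompressed in full, modified, and compressed again.

```python
import gzip
import io
import tarfile


def inject_file(archive: bytes, name: str, data: bytes, gzipped: bool) -> bytes:
    """Return a copy of `archive` with one extra member appended.

    Hypothetical helper, for illustration only.
    """
    # A gzip-compressed tar cannot be appended to directly:
    # the whole archive has to be decompressed first.
    raw = gzip.decompress(archive) if gzipped else archive

    buf = io.BytesIO(raw)
    # Mode "a" skips past the existing members and appends before the
    # end-of-archive marker. It is only available without compression.
    with tarfile.open(fileobj=buf, mode="a") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    out = buf.getvalue()
    # If the archive arrived compressed, all of it is compressed again.
    return gzip.compress(out) if gzipped else out
```

In the uncompressed case the existing members are never rewritten; in the gzip case every byte of the archive passes through the decompressor and the compressor just to add one small file.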

> Wonder if there's a way to get this to be less stateful. It seems a bit
> ugly that the client would know what the last 'a' was for a 'p'? Perhaps
> we could actually make 'a' include an identifier for each archive, and
> then 'p' would append to a specific archive? Which would then also
> allow for concurrent processing of those archives on the server side.

...says the guy working on asynchronous I/O. I don't know, it's not a
bad idea, but I think we'd have to change a LOT of code to make it
actually do something useful. I feel like this could be added as a
later extension of the protocol, rather than being something that we
necessarily need to do now.
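
For concreteness, the less stateful variant suggested in the quoted text could look roughly like the Python sketch below. Everything about the layout is an assumption made for illustration: the 4-byte identifier, its position after the type byte, and the names Demux, encode_archive_start and encode_payload are hypothetical and are not part of any patch in this thread. The point is only that once each 'p' names its archive, the receiver no longer has to remember which 'a' came last, so messages for different archives can be interleaved.

```python
import struct
from typing import Dict


def encode_archive_start(archive_id: int, name: str) -> bytes:
    # Hypothetical layout: type byte 'a', 4-byte big-endian id, archive name.
    return b"a" + struct.pack("!I", archive_id) + name.encode()


def encode_payload(archive_id: int, data: bytes) -> bytes:
    # Hypothetical layout: type byte 'p', 4-byte big-endian id, raw data.
    return b"p" + struct.pack("!I", archive_id) + data


class Demux:
    """Routes each payload to its archive using the id carried in the message."""

    def __init__(self) -> None:
        self.names: Dict[int, str] = {}
        self.contents: Dict[int, bytearray] = {}

    def handle(self, msg: bytes) -> None:
        kind = msg[:1]
        (archive_id,) = struct.unpack("!I", msg[1:5])
        body = msg[5:]
        if kind == b"a":
            self.names[archive_id] = body.decode()
            self.contents[archive_id] = bytearray()
        elif kind == b"p":
            if archive_id not in self.contents:
                raise ValueError(f"payload for unknown archive {archive_id}")
            self.contents[archive_id] += body
        else:
            raise ValueError(f"unexpected message type {kind!r}")
```

A receiver built this way accepts 'p' messages for archive 2 between two 'p' messages for archive 1, which is what concurrent processing on the sending side would require.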

> I'd personally rather have a separate message type for progress and
> payload. Seems odd to have to send payload messages with 0 payload just
> because we want to update progress (in case of uploading to
> e.g. S3). And I think it'd be nice if we could have a more extensible
> progress measurement approach than a fixed length prefix. E.g. it might
> be nice to allow it to report both the overall progress, as well as a
> per archive progress. Or we might want to send progress when uploading
> to S3, even when not having pre-calculated the total size of the data
> directory.

I don't mind a separate message type here, but if you want merging of
short messages with adjacent longer messages to generate a minimal
number of system calls, that might have some implications for the
other thread where we're talking about how to avoid extra memory
copies when generating protocol messages. If you don't mind them going
out as separate network packets, then it doesn't matter.
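
A small Python sketch of the trade-off, under stated assumptions: the framing (one type byte followed by a 4-byte length that counts itself) follows the general PostgreSQL wire-protocol convention, but the CoalescingWriter class and the message type bytes used with it are invented here for illustration. Queued messages are joined and handed to the write callback in a single call, so a short progress message rides along with the adjacent payload message instead of costing its own system call.

```python
import struct
from typing import Callable, List


def frame(msg_type: bytes, body: bytes) -> bytes:
    # Type byte, then a big-endian int32 length that includes itself.
    return msg_type + struct.pack("!I", len(body) + 4) + body


class CoalescingWriter:
    """Buffers framed messages and emits them with one write per flush."""

    def __init__(self, write: Callable[[bytes], None]) -> None:
        self._write = write          # e.g. a socket's sendall method
        self._pending: List[bytes] = []

    def queue(self, msg_type: bytes, body: bytes) -> None:
        self._pending.append(frame(msg_type, body))

    def flush(self) -> None:
        if self._pending:
            # One call for everything queued so far. Note that the join
            # itself copies the data once more.
            self._write(b"".join(self._pending))
            self._pending.clear()
```

The join is exactly the kind of extra memory copy the other thread is trying to avoid; an implementation that cares about that would presumably pass the separate buffers to a scatter-gather call such as writev() rather than concatenating them.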

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


