Re: Streaming base backups - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Streaming base backups
Date
Msg-id 4D2648C9.2090906@enterprisedb.com
Whole thread Raw
In response to Streaming base backups  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Streaming base backups  (Magnus Hagander <magnus@hagander.net>)
Re: Streaming base backups  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On 05.01.2011 15:54, Magnus Hagander wrote:
> Attached is an updated streaming base backup patch, based off the work
> that Heikki started.
> ...
> I've implemented a frontend for this in pg_streamrecv, based on the assumption
> that we wanted to include this in bin/ for 9.1 - and that it seems like a
> reasonable place to put it. This can obviously be moved elsewhere if we want to.

Hmm, is there any point in keeping the two functionalities in the same 
binary, taking the base backup and streaming WAL to an archive 
directory? Looks like the only common option between the two modes is 
passing the connection string, and the verbose flag. A separate 
pg_basebackup binary would probably make more sense.

> That code needs a lot more cleanup, but I wanted to make sure I got the backend
> patch out for review quickly. You can find the current WIP branch for
> pg_streamrecv on my github page at https://github.com/mhagander/pg_streamrecv,
> in the branch "baserecv". I'll be posting that as a separate patch once it's
> been a bit more cleaned up (it does work now if you want to test it, though).

Looks like pg_streamrecv creates the pg_xlog and pg_tblspc directories, 
because they're not included in the streamed tar. Wouldn't it be better 
to include them in the tar as empty directories at the server-side? 
Otherwise if you write the tar file to disk and untar it later, you have 
to manually create them.

It would be nice to have an option in pg_streamrecv to specify the 
backup label to use.

An option to stream the tar to stdout instead of a file would be very 
handy too, so that you could pipe it directly to gzip for example. I 
realize you get multiple tar files if tablespaces are used, but even if 
you just throw an error in that case, it would be handy.

> * Suggestion from Heikki: perhaps at some point we're going to need a full
>    bison grammar for walsender commands.

Maybe we should at least start using the lexer; we're not quite there to 
need a full-blown grammar yet, but even a lexer might help.


BTW, looking at the WAL-streaming side of pg_streamrecv, if you start it 
from scratch with an empty target directory, it needs to connect to 
"postgres" database, to run pg_current_xlog_location(), and then 
reconnect in replication mode. That's a bit awkward, there might not be 
a "postgres" database, and even if there is, you might not have the 
permission to connect to it. It would be much better to have a variant 
of the START_REPLICATION command at the server-side that begins 
streaming from the current location. Maybe just by leaving out the 
start-location parameter.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: DISCARD ALL ; stored procedures
Next
From: Magnus Hagander
Date:
Subject: Re: Streaming base backups