base backup client as auxiliary backend process - Mailing list pgsql-hackers
From: Peter Eisentraut
Subject: base backup client as auxiliary backend process
Msg-id: 61b8d18d-c922-ac99-b990-a31ba63cdcbb@2ndquadrant.com
Responses: Re: base backup client as auxiliary backend process
List: pgsql-hackers
Setting up a standby instance is still quite complicated. You need to run pg_basebackup with all the right options. You need to make sure pg_basebackup has the right permissions for the target directories. The created instance has to be integrated into the operating system's start scripts. There is this slightly awkward business of the --recovery-conf option and how it interacts with other features. And you should probably run pg_basebackup under screen. And then, how do you get notified when it's done? And when it's done, you have to log back in and finish up. Too many steps.

My idea is that the postmaster can launch a base backup worker, wait till it's done, then proceed with the rest of the startup. initdb gets a special option to create a "minimal" data directory with only a few files, directories, and the usual configuration files. Then you create $PGDATA/basebackup.signal and start the postmaster as normal. It sees the signal file, launches an auxiliary process that runs the base backup, then proceeds with normal startup in standby mode.

This makes a whole bunch of things much nicer:

- The connection information for where to get the base backup from comes from postgresql.conf, so you only need to specify it in one place.

- pg_basebackup is completely out of the picture; no need to deal with command-line options, --recovery-conf, screen, monitoring for completion, etc.

- If something fails, the base backup process can automatically be restarted (maybe).

- Operating system integration is much easier: you only call initdb and then pg_ctl or postgres, as you are already doing.

- Automated deployment systems don't need to wait for pg_basebackup to finish: you only call initdb, then start the server, and then you're done -- waiting for the base backup to finish can be done by the regular monitoring system.

Attached is a very hackish patch to implement this. It works like this:

# (assuming you have a primary already running somewhere)
initdb -D data2 --minimal
$EDITOR data2/postgresql.conf    # set primary_conninfo
pg_ctl -D data2 start

(Curious side note: If you don't set primary_conninfo in these steps, then libpq defaults apply, so the default behavior might end up being that a given instance attempts to replicate from itself.)

It works for basic cases. It's missing tablespace support, proper fsyncing, progress reporting, and probably more. Those would be pretty straightforward, I think.

The interesting bit is the delicate ordering of the postmaster startup: Normally, the pg_control file is read quite early, but when starting from a minimal data directory, we need to wait until the base backup is done. (A schematic sketch of that ordering is below.)

There is also the question of what to do if the base backup fails halfway through. Currently you probably need to delete the whole data directory and start again with initdb. Better might be a way to start again and overwrite any existing files, but that can clearly also be dangerous. All this needs some careful analysis, but I think it's doable.
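To make the "specify it in one place" point concrete, here is what the relevant bit of data2/postgresql.conf might look like for the example above; the host, port, and user values are just placeholders, not anything the patch supplies:

# connection info, used first for the base backup and afterwards by the
# walreceiver for streaming replication (values are placeholders)
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'

Because the walreceiver later reads the same setting, the instance streams from the same place it cloned from, with nothing specified twice.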
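And here is the promised sketch of the startup ordering. This is a schematic model, not the patch itself (that is attached), and all names in it are made up; it only shows the shape of "base backup first, pg_control afterwards":

/*
 * Schematic model of the proposed startup ordering (not the actual
 * patch; all names are made up for illustration): if the data
 * directory contains basebackup.signal, run the base backup in a
 * child process and wait for it before touching pg_control.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

static int
has_basebackup_signal(const char *datadir)
{
    char        path[1024];
    struct stat st;

    snprintf(path, sizeof(path), "%s/basebackup.signal", datadir);
    return stat(path, &st) == 0;
}

int
main(int argc, char **argv)
{
    const char *datadir = (argc > 1) ? argv[1] : "data2";

    if (has_basebackup_signal(datadir))
    {
        pid_t       pid = fork();
        int         status;

        if (pid < 0)
        {
            perror("fork");
            return 1;
        }
        if (pid == 0)
        {
            /* child: stands in for the base backup auxiliary process */
            printf("fetching base backup into %s ...\n", datadir);
            _exit(0);
        }

        /* parent: pg_control does not exist yet, so just wait */
        waitpid(pid, &status, 0);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        {
            fprintf(stderr, "base backup failed; data directory is incomplete\n");
            return 1;
        }
    }

    /* only now is it safe to read pg_control and proceed */
    printf("proceeding with normal startup in standby mode\n");
    return 0;
}

The essential property being modeled is that nothing depending on pg_control runs until the auxiliary child has exited cleanly; if it has not, the data directory is in the half-done state discussed above.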
Any thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment