On 11.01.2011 22:16, Jeff Davis wrote:
> On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:
>> So, this patch modifies the internal do_pg_start/stop_backup functions
>> so that in addition to the traditional mode of operation, where a
>> backup_label file is created in the data directory where it's backed up
>> along with all other files, the backup label file can be returned to the
>> caller, and the caller is responsible for including it in the backup.
>> The code in replication/basebackup.c includes it in the tar file that's
>> streamed to the client, as "backup_label".
>
> Perhaps we can use this more intelligent form of base backup to
> differentiate between:
> a. a primary that has crashed while a backup was in progress; and
> b. an online backup that is being restored.
>
> Allowing the user to do an unrestricted file copy as a base backup
> doesn't allow us to make that differentiation. That led to the two bugs
> that we fixed in StartupXLOG(). And right now there are still two
> problems remaining (albeit less severe):
>
> 1. If it's a primary recovering from a crash, and there is a
> backup_label file, and the WAL referenced in the backup_label exists,
> then it does a bunch of extra work during recovery; and
> 2. In the same situation, if the WAL referenced in the backup_label
> does not exist, then it PANICs with a HINT to tell you to remove the
> backup_label.
>
> Is this an opportunity to solve these problems and simplify the code?

It won't change the situation for pg_start_backup(), but with the patch
the base backups done via streaming won't have those issues, because
backup_label is not created (with that name) in the master.
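
To spell out the cases, here is a toy model in Python. It is only a sketch
of the situations described in this thread, not the actual StartupXLOG()
logic, and every name in it (StartupOutcome, startup_outcome,
label_in_master_datadir) is made up for illustration.

```python
from enum import Enum


class StartupOutcome(Enum):
    # Descriptions paraphrase the cases discussed in this thread.
    NORMAL_CRASH_RECOVERY = "ordinary crash recovery"
    EXTRA_RECOVERY_WORK = "extra work during recovery"
    PANIC = "PANIC with a HINT to remove backup_label"


def startup_outcome(has_backup_label: bool,
                    referenced_wal_exists: bool) -> StartupOutcome:
    """Hypothetical model of what a crashed primary does at startup."""
    if not has_backup_label:
        # No backup_label in the data directory: nothing special happens.
        return StartupOutcome.NORMAL_CRASH_RECOVERY
    if referenced_wal_exists:
        # Problem 1: backup_label present and its WAL is still there.
        return StartupOutcome.EXTRA_RECOVERY_WORK
    # Problem 2: backup_label present but the referenced WAL is gone.
    return StartupOutcome.PANIC


def label_in_master_datadir(streaming: bool) -> bool:
    """pg_start_backup() writes backup_label into the master's data
    directory; with the patch, a streamed base backup only puts it in
    the tar stream sent to the client."""
    return not streaming
```

In this model, a master that crashes during a streamed base backup always
lands in the first branch, while one that crashes after pg_start_backup()
can still hit either of the two problem cases.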
--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com