Re: Allowing multiple concurrent base backups - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Allowing multiple concurrent base backups
Msg-id AANLkTi=q5=x_WoOccf5dqHoAWXyDc7qXG0cpbnsNKUCB@mail.gmail.com
In response to Re: Allowing multiple concurrent base backups  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Allowing multiple concurrent base backups  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, good point. It's harmless, but creating the history file in the first
> place sure seems like a waste of time.

The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested multiple concurrent backups, I found that they can all end
up with the same checkpoint location and the same history file name.

--------------------
$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i &
done

$ cat test0/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/000000010000000000000002.000000B0.backup
--------------------

This would cause a serious problem, because the backup-end record, which
carries the same "START WAL LOCATION" for all of these backups, can be
written by the first backup before the others finish. During recovery we
might then wrongly conclude that we've already reached a consistent state
upon reading the backup-end record written by the first backup, before
reading the last WAL file that our own backup requires.
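
To make the failure mode concrete, here is a toy model in C of a recovery
scan that matches backup-end records by start location only. It is a sketch
under stated assumptions, not PostgreSQL's actual recovery code: the
SketchWalRecord struct and first_matching_backup_end() are hypothetical
names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified WAL record; not PostgreSQL's real layout. */
typedef struct
{
    uint64_t pos;           /* position of this record in the WAL stream */
    bool     is_backup_end; /* true for a backup-end record */
    uint64_t backup_start;  /* START WAL LOCATION the record refers to */
} SketchWalRecord;

/*
 * Scan records in WAL order and return the position of the first
 * backup-end record whose start location equals the one taken from
 * backup_label. Returns 0 if no such record exists.
 */
uint64_t
first_matching_backup_end(const SketchWalRecord *recs, size_t n,
                          uint64_t label_start)
{
    assert(recs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].is_backup_end && recs[i].backup_start == label_start)
            return recs[i].pos;
    }
    return 0;
}
```

If two backups share the start location 0/20000B0 and write their backup-end
records at two different, made-up positions, this scan stops at the earlier
record for both of them, so the backup that finished later would be treated
as consistent too early.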

        /*
         * Force a CHECKPOINT.    Aside from being necessary to prevent torn
         * page problems, this guarantees that two successive backup runs will
         * have different checkpoint positions and hence different history
         * file names, even if nothing happened in between.
         *
         * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
         * fast = true).  Otherwise this can take awhile.
         */
        RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                          (fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups have different checkpoint
locations. ISTM that we should change the above, or something elsewhere,
to ensure that. Alternatively, we could include the backup label name in
the backup-end record, to prevent recovery from acting on a backup-end
record that is not its own.
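
A minimal sketch of the second idea, in C: the backup-end record also
carries the label, and the scan accepts a record only when both the start
location and the label match. The SketchLabeledRecord struct, its label
field, and own_backup_end() are assumptions made up for illustration; the
real record format is not shown here. The sketch also assumes that
concurrent backups use distinct labels, which it does not enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LABEL_MAX 64

/* Hypothetical backup-end record extended with the backup label. */
typedef struct
{
    uint64_t pos;                     /* position in the WAL stream */
    bool     is_backup_end;           /* true for a backup-end record */
    uint64_t backup_start;            /* START WAL LOCATION referred to */
    char     label[SKETCH_LABEL_MAX]; /* assumed extra field: backup label */
} SketchLabeledRecord;

/*
 * Return the position of the first backup-end record that matches both
 * the start location and the label from backup_label, or 0 if none does.
 */
uint64_t
own_backup_end(const SketchLabeledRecord *recs, size_t n,
               uint64_t label_start, const char *label)
{
    assert(label != NULL);
    assert(recs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].is_backup_end &&
            recs[i].backup_start == label_start &&
            strcmp(recs[i].label, label) == 0)
            return recs[i].pos;
    }
    return 0;
}
```

With labels such as test0 and test1 from the test above, each backup would
find only its own backup-end record even though the start locations are
identical.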

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
