Re: Incomplete description of pg_start_backup? - Mailing list pgsql-hackers

From Dmitry Koterov
Subject Re: Incomplete description of pg_start_backup?
Msg-id CA+CZih6L2w+BcLH4_EmhthdJDiiygGH5oApLuB2UvzSb6bCeag@mail.gmail.com
In response to Re: Incomplete description of pg_start_backup?  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Incomplete description of pg_start_backup?  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
I still don't get it.

Suppose we have a data file with blocks with important (non-empty) data:

A B C D

1. I call pg_start_backup().
2. Tar starts copying block A to the destination archive...
3. While A is being copied, somebody deletes table data stored in block B. That data becomes eligible for vacuuming, and the block is marked as free space.
4. Somebody writes new data to a table, and it is placed in the free space, that is, in block B. The write is also recorded in the WAL (so the new data is stored in two places: in block B and in the WAL).
5. Tar (at last!) finishes copying block A and begins copying block B.
6. It finishes B, then copies C and D to the archive too.
7. Then we call pg_stop_backup() and also archive the collected WAL (which contains the new data of block B, as we saw above).
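
The steps above can be sketched as a toy model. This is only an illustration of the scenario, not PostgreSQL code: the function name, the dictionary standing in for the data file, and the block contents are all made up, and each WAL record is assumed to carry the complete new contents of the block.

```python
# Toy model of steps 1-7. Everything here is hypothetical illustration;
# it does not reflect how PostgreSQL stores blocks or WAL records.

def run_scenario():
    data_file = {"A": "a-data", "B": "b-old", "C": "c-data", "D": "d-data"}
    wal = []      # records written after pg_start_backup() (step 1)
    archive = {}  # what tar ends up holding

    # Step 2 (completed in step 5): tar copies block A.
    archive["A"] = data_file["A"]

    # Steps 3-4: while tar is busy with A, the old contents of B are
    # removed and new data is written into B and into the WAL.
    data_file["B"] = "b-new"
    wal.append(("B", "b-new"))

    # Steps 5-6: tar copies B, then C and D.
    for block in ("B", "C", "D"):
        archive[block] = data_file[block]

    # Step 7: pg_stop_backup(); the collected WAL is archived as well.
    return archive, wal


archive, wal = run_scenario()
print(archive["B"])  # the copy of B holds the new data
print(wal)           # the WAL also holds the new data
```

In this model the string "b-old" appears nowhere in the archive or in the WAL, which is exactly the situation the question below asks about.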

The question is: where is the OLD data of block B in this scheme? It seems it is NOT in the backup, so it cannot be restored. (If we never overwrote blocks between pg_start_backup and pg_stop_backup, but only appended new data, this would not be a problem.) It seems to me this is not documented at all! That is what my initial e-mail was about.

(I have one hypothesis about this, but I am not sure. Here it is: does vacuum save ALL the deleted data of block B to the WAL in step 3, prior to deletion? If so, it is, of course, part of the backup. But that wastes a lot of space...)




On Tue, May 14, 2013 at 6:05 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, May 13, 2013 at 4:31 PM, Dmitry Koterov <dmitry@koterov.ru> wrote:
Could you please provide a bit more detailed explanation on how it works? 

And how could postgres write in the middle of archiving files during an active pg_start_backup? If it could, there might be a case when a part of the archived data file contains overwritten information "from the future",

The data files cannot contain information from the future.  If the backup is restored, it must be restored to the time of pg_stop_backup (at least), which means the data would at that point be from the past/present, not the future.

Cheers,

Jeff
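
Jeff's point, that a restore always runs forward to at least the time of pg_stop_backup, can be illustrated with a small sketch. It is a toy model under a stated assumption, not PostgreSQL's recovery code: each WAL record is assumed to hold the complete new contents of one block, and the `restore` function and block names are hypothetical.

```python
# Toy model of restoring a backup by replaying WAL on top of the copied
# blocks. Assumption: each WAL record holds a block's full new contents.

def restore(archive, wal):
    """Apply every WAL record, in order, to a copy of the archived blocks."""
    blocks = dict(archive)
    for block, contents in wal:
        blocks[block] = contents
    return blocks


# WAL collected between pg_start_backup() and pg_stop_backup().
wal = [("B", "b-new")]

# Two copies tar might have produced, depending on whether it read
# block B before or after the block was rewritten.
copied_before = {"A": "a-data", "B": "b-old", "C": "c-data", "D": "d-data"}
copied_after = {"A": "a-data", "B": "b-new", "C": "c-data", "D": "d-data"}

print(restore(copied_before, wal) == restore(copied_after, wal))  # prints True
```

Under that assumption the restored state is the same whichever version of B tar happened to copy, and it is the state as of pg_stop_backup, in which the old contents of B had already been removed.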
