Thread: [HACKERS] question: data file update when pg_basebackup in progress

From: Rui Hai Jiang

Hello,
I'm looking into how pg_basebackup works, and I have a question (maybe this is not actually an issue):

When pg_basebackup is launched, a checkpoint is created first, and then all files are transferred to the pg_basebackup client. Is it possible that a data page (say, page N) in a data file is changed after the checkpoint but before pg_basebackup has finished?

If this happens, is it possible that only part of the changed page is transferred to the pg_basebackup client? That is, the client gets page N with part of the old content and part of the new content. How does PostgreSQL handle this kind of data page?

Thanks,
Rui Hai

Re: [HACKERS] question: data file update when pg_basebackup in progress

From: "David G. Johnston"
On Tue, Apr 25, 2017 at 9:08 AM, Rui Hai Jiang <ruihaijiang@msn.com> wrote:
> When pg_basebackup is launched, a checkpoint is created first, and then all files are transferred to the pg_basebackup client. Is it possible that a data page (say, page N) in a data file is changed after the checkpoint but before pg_basebackup has finished?

I believe so.
> If this happens, is it possible that only part of the changed page is transferred to the pg_basebackup client? That is, the client gets page N with part of the old content and part of the new content. How does PostgreSQL handle this kind of data page?

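The first write to a page after a checkpoint is always recorded in the WAL as a full-page write. Every WAL file since the checkpoint must also be copied to the backed-up system. Replaying those WAL files is what brings the remote and local systems into sync with respect to all changes since the backup checkpoint. Because replay restores the full-page image before applying later changes to that page, a copy of the page that is part old and part new gets overwritten with a consistent version.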
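The mechanism above can be illustrated with a toy model. This is a minimal Python sketch under simplifying assumptions, not PostgreSQL's real WAL format: pages are 8 bytes, the WAL is a Python list, the record kinds "FPI" and "DELTA" are simplifications, and ToyPrimary, copy_with_torn_page and replay are hypothetical names. It shows a page copied half-old and half-new, then repaired by replay because the first change after the checkpoint was logged as a full page image.

```python
PAGE_SIZE = 8  # toy page size (assumption); real PostgreSQL pages are 8 kB by default


class ToyPrimary:
    """Hypothetical model of a primary: a data file of pages plus a WAL list."""

    def __init__(self, npages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]
        # records: ("FPI", page_no, image) or ("DELTA", page_no, (offset, data))
        self.wal = []
        self.logged_since_ckpt = set()

    def checkpoint(self):
        # After a checkpoint, the next write to every page logs a full page image.
        self.logged_since_ckpt.clear()
        return len(self.wal)  # redo start position in this toy WAL

    def write(self, page_no, offset, data):
        page = self.pages[page_no]
        page[offset:offset + len(data)] = data
        if page_no not in self.logged_since_ckpt:
            # First change since the checkpoint: log the whole (already modified) page.
            self.wal.append(("FPI", page_no, bytes(page)))
            self.logged_since_ckpt.add(page_no)
        else:
            # Later changes: log only the bytes that changed.
            self.wal.append(("DELTA", page_no, (offset, bytes(data))))


def copy_with_torn_page(primary, torn_page):
    """Copy pages one by one; concurrent updates land halfway through torn_page."""
    backup = []
    half = PAGE_SIZE // 2
    for n in range(len(primary.pages)):
        if n == torn_page:
            first = bytes(primary.pages[n][:half])   # read before the updates
            primary.write(n, 0, b"NEW!")             # concurrent updates on the primary
            primary.write(n, half, b"new!")
            second = bytes(primary.pages[n][half:])  # read after the updates
            backup.append(first + second)
        else:
            backup.append(bytes(primary.pages[n]))
    return backup


def replay(backup, wal, redo_start):
    """Apply WAL records from the redo start onward to the copied pages."""
    pages = [bytearray(p) for p in backup]
    for kind, page_no, payload in wal[redo_start:]:
        if kind == "FPI":
            pages[page_no][:] = payload  # overwrite the whole page
        else:
            offset, data = payload
            pages[page_no][offset:offset + len(data)] = data
    return pages


primary = ToyPrimary(npages=3)
primary.write(1, 0, b"old-")       # change made before the checkpoint
redo = primary.checkpoint()        # the backup's starting checkpoint
backup = copy_with_torn_page(primary, torn_page=1)
print(backup[1])                   # b'old-new!': neither the old nor the new page
restored = replay(backup, primary.wal, redo)
print(restored[1] == primary.pages[1])  # True
```

Replay starts at the position returned by checkpoint(), so the record for the pre-checkpoint change is skipped; the full page image logged after the checkpoint supersedes whatever the copy happened to contain.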

David J.

Re: [HACKERS] question: data file update when pg_basebackup in progress

From: Michael Paquier
On Wed, Apr 26, 2017 at 1:45 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> The first write to a page after a checkpoint is always recorded in the WAL
> as a full page write.  Every WAL file since the checkpoint must also be
> copied to the backed up system.  The replay of those WAL files is what
> brings the remote and local system into sync with respect to all changes
> since the backup checkpoint.

This brings us to the point that the presence of backup_label in a
backup is critical, as it tells Postgres from which WAL position it
should begin recovery to bring the system up to a consistent state.
pg_basebackup also makes sure that the last WAL segment needed is
archived before the backup completes, so that recovery can be done
completely.
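To make the role of backup_label more concrete, here is a small Python sketch that reads the START WAL LOCATION line and maps an LSN to the name of the WAL segment file containing it. The sample label text and the stop position are made-up values for illustration, the helper names (parse_lsn, wal_file_name, redo_start_from_label) are hypothetical, and the default 16 MB segment size is assumed.

```python
import re

# Assumed default WAL segment size (16 MB).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN written as 'X/Y' (two hex halves) to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains the given LSN."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


def redo_start_from_label(label_text):
    """Return (lsn, file name) from the START WAL LOCATION line of a backup_label."""
    m = re.search(
        r"^START WAL LOCATION: ([0-9A-F]+/[0-9A-F]+) \(file ([0-9A-F]{24})\)$",
        label_text,
        re.MULTILINE,
    )
    if m is None:
        raise ValueError("no START WAL LOCATION line found")
    return parse_lsn(m.group(1)), m.group(2)


# Illustrative label contents (made-up values).
SAMPLE_LABEL = """START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2017-04-26 10:00:00 JST
LABEL: pg_basebackup base backup
"""

start_lsn, start_file = redo_start_from_label(SAMPLE_LABEL)
print(start_file, wal_file_name(1, start_lsn))  # both 000000010000000000000009

stop_lsn = parse_lsn("0/A0000F8")  # hypothetical stop position of the backup
print(wal_file_name(1, stop_lsn))  # 00000001000000000000000A
```

The segment name combines the timeline ID with the segment number split into two 32-bit halves, which is why the file listed in the label can be recomputed from the LSN. Applying the same mapping to the backup's stop position identifies the last segment that recovery needs.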
-- 
Michael