Thread: pg_start_backup - backups

pg_start_backup - backups

From
droland
Date:
Attempting to run a backup with the database online. I first issue
'PG_START_BACKUP', then copy the contents of the postgresql data folder,
then issue 'PG_STOP_BACKUP'. The database service remains running during the
copy. If queries continue to hit the database after the PG_START_BACKUP
command is issued changes to the data folder are possible, which can cause
copy errors (for instance, a file not found exception). The file copy method
first creates a string array of the data folder contents (all folders and
files) and then processes the array, copying one file at a time. What am I
doing wrong?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-start-backup-backups-tp4289135p4289135.html
Sent from the PostgreSQL - admin mailing list archive at Nabble.com.

Re: pg_start_backup - backups

From
Ian Lea
Date:
> Attempting to run a backup with the database online. I first issue
> 'PG_START_BACKUP', then copy the contents of the postgresql data folder,
> then issue 'PG_STOP_BACKUP'. The database service remains running during the
> copy. If queries continue to hit the database after the PG_START_BACKUP
> command is issued changes to the data folder are possible, which can cause
> copy errors (for instance, a file not found exception). The file copy method
> first creates a string array of the data folder contents (all folders and
> files) and then processes the array, copying one file at a time. What am I
> doing wrong?

You need to use a program to do the copying that can accept changes to
files as it runs.  rsync is a common choice and has the massive
advantage that it only takes changes and is therefore, after the first
run, quick.

See http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-BASE-BACKUP
or use your favourite search engine to find more info.


--
Ian.

Re: pg_start_backup - backups

From
Ian Lea
Date:
> First of all, thanks so much for the reply. To be clear, as stated in my
> original post, if a user is interacting with the PostgreSQL database,
> changes to the data folder continue even after the PG_START_BACKUP
> command has been issued. This implies to me that the contents of any
> copy of the data folder may be unreliable. i.e. the copy may not reflect
> the state of the data folder either before the copy started or after the
> copy has finished. It may reflect the state of the data folder in some
> transient form. Assuming this is true, is the copy still usable for
> restoration? If so, how does PostgreSQL get the data folder to a stable
> state? Is it by the use of the WAL files that may be created during the
> backup process and the restore.config file?

The copy is not useful by itself, but is when used in conjunction with
the WAL archives.

It is all explained, much better than I could do it, in
http://www.postgresql.org/docs/9.0/interactive/continuous-archiving.html

There are other backup strategies too.
http://www.postgresql.org/docs/9.0/interactive/backup.html


--
Ian.

> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Friday, April 08, 2011 3:55 AM
> To: David Roland
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] pg_start_backup - backups
>
>> Attempting to run a backup with the database online. I first issue
>> 'PG_START_BACKUP', then copy the contents of the postgresql data
>> folder, then issue 'PG_STOP_BACKUP'. The database service remains
>> running during the copy. If queries continue to hit the database after
>
>> the PG_START_BACKUP command is issued changes to the data folder are
>> possible, which can cause copy errors (for instance, a file not found
>> exception). The file copy method first creates a string array of the
>> data folder contents (all folders and
>> files) and then processes the array, copying one file at a time. What
>> am I doing wrong?
>
> You need to use a program to do the copying that can accept changes to
> files as it runs.  rsync is a common choice and has the massive
> advantage that it only takes changes and is therefore, after the first
> run, quick.
>
> See
> http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACK
> UP-BASE-BACKUP
> or use your favourite search engine to find more info.
>
>
> --
> Ian.
>

Re: pg_start_backup - backups

From
"David Roland"
Date:
First of all, thanks so much for the reply. To be clear, as stated in my
original post, if a user is interacting with the PostgreSQL database,
changes to the data folder continue even after the PG_START_BACKUP
command has been issued. This implies to me that the contents of any
copy of the data folder may be unreliable. i.e. the copy may not reflect
the state of the data folder either before the copy started or after the
copy has finished. It may reflect the state of the data folder in some
transient form. Assuming this is true, is the copy still usable for
restoration? If so, how does PostgreSQL get the data folder to a stable
state? Is it by the use of the WAL files that may be created during the
backup process and the restore.config file?

-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com]
Sent: Friday, April 08, 2011 3:55 AM
To: David Roland
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_start_backup - backups

> Attempting to run a backup with the database online. I first issue
> 'PG_START_BACKUP', then copy the contents of the postgresql data
> folder, then issue 'PG_STOP_BACKUP'. The database service remains
> running during the copy. If queries continue to hit the database after

> the PG_START_BACKUP command is issued changes to the data folder are
> possible, which can cause copy errors (for instance, a file not found
> exception). The file copy method first creates a string array of the
> data folder contents (all folders and
> files) and then processes the array, copying one file at a time. What
> am I doing wrong?

You need to use a program to do the copying that can accept changes to
files as it runs.  rsync is a common choice and has the massive
advantage that it only takes changes and is therefore, after the first
run, quick.

See
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACK
UP-BASE-BACKUP
or use your favourite search engine to find more info.


--
Ian.

Re: pg_start_backup - backups

From
"Kevin Grittner"
Date:
"David Roland" <david.roland@soapware.com> wrote:

> changes to the data folder continue even after the PG_START_BACKUP
> command has been issued. This implies to me that the contents of
> any copy of the data folder may be unreliable. i.e. the copy may
> not reflect the state of the data folder either before the copy
> started or after the copy has finished. It may reflect the state
> of the data folder in some transient form.

Right.

> Assuming this is true, is the copy still usable for restoration?

Yes.

> If so, how does PostgreSQL get the data folder to a stable
> state? Is it by the use of the WAL files that may be created
> during the backup process and the restore.config file?

Exactly.  Simplifying somewhat:

- The pg_start_backup causes the WAL position to be remembered.  You
  will need to start WAL replay at this point.

- Every significant change to a page is WAL-logged.

- Changes after you record the restart point may or may not be in
  the base backup.

- After the base backup copying is complete, pg_stop_backup records
  the WAL position.  You will need to replay WAL *at least* to this
  point to have a consistent database.

- WAL replay will bring every page modified after the "start" to a
  valid state, whether or not any or all direct modifications to
  that page made it into the base backup.

- This actually goes beyond page fix-ups to file creation and
  deletion, etc.

-Kevin

Re: pg_start_backup - backups

From
"David Roland"
Date:
Thanks Kevin and Ian for your replies. Your information confirms
activity I was noticing, but without indepth knowledge of what goes on
"behind the current" I was left to speculation - a situation I am
extremely uncomfortable with. I have read and re-read the existing
documentation on backup and restore for postgresql and I believe we have
all of the pieces in place. You both have been a big help.

Thanks,
David

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: Monday, April 11, 2011 10:47 AM
To: Ian Lea; David Roland
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_start_backup - backups

"David Roland" <david.roland@soapware.com> wrote:

> changes to the data folder continue even after the PG_START_BACKUP
> command has been issued. This implies to me that the contents of any
> copy of the data folder may be unreliable. i.e. the copy may not
> reflect the state of the data folder either before the copy started or

> after the copy has finished. It may reflect the state of the data
> folder in some transient form.

Right.

> Assuming this is true, is the copy still usable for restoration?

Yes.

> If so, how does PostgreSQL get the data folder to a stable state? Is
> it by the use of the WAL files that may be created during the backup
> process and the restore.config file?

Exactly.  Simplifying somewhat:

- The pg_start_backup causes the WAL position to be remembered.  You
  will need to start WAL replay at this point.

- Every significant change to a page is WAL-logged.

- Changes after you record the restart point may or may not be in
  the base backup.

- After the base backup copying is complete, pg_stop_backup records
  the WAL position.  You will need to replay WAL *at least* to this
  point to have a consistent database.

- WAL replay will bring every page modified after the "start" to a
  valid state, whether or not any or all direct modifications to
  that page made it into the base backup.

- This actually goes beyond page fix-ups to file creation and
  deletion, etc.

-Kevin