Thread: pg_start_backup - backups
Attempting to run a backup with the database online. I first issue 'PG_START_BACKUP', then copy the contents of the postgresql data folder, then issue 'PG_STOP_BACKUP'. The database service remains running during the copy. If queries continue to hit the database after the PG_START_BACKUP command is issued changes to the data folder are possible, which can cause copy errors (for instance, a file not found exception). The file copy method first creates a string array of the data folder contents (all folders and files) and then processes the array, copying one file at a time. What am I doing wrong? -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-start-backup-backups-tp4289135p4289135.html Sent from the PostgreSQL - admin mailing list archive at Nabble.com.
> Attempting to run a backup with the database online. I first issue > 'PG_START_BACKUP', then copy the contents of the postgresql data folder, > then issue 'PG_STOP_BACKUP'. The database service remains running during the > copy. If queries continue to hit the database after the PG_START_BACKUP > command is issued changes to the data folder are possible, which can cause > copy errors (for instance, a file not found exception). The file copy method > first creates a string array of the data folder contents (all folders and > files) and then processes the array, copying one file at a time. What am I > doing wrong? You need to use a program to do the copying that can accept changes to files as it runs. rsync is a common choice and has the massive advantage that it only takes changes and is therefore, after the first run, quick. See http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACKUP-BASE-BACKUP or use your favourite search engine to find more info. -- Ian.
> First of all, thanks so much for the reply. To be clear, as stated in my > original post, if a user is interacting with the PostgreSQL database, > changes to the data folder continue even after the PG_START_BACKUP > command has been issued. This implies to me that the contents of any > copy of the data folder may be unreliable. i.e. the copy may not reflect > the state of the data folder either before the copy started or after the > copy has finished. It may reflect the state of the data folder in some > transient form. Assuming this is true, is the copy still usable for > restoration? If so, how does PostgreSQL get the data folder to a stable > state? Is it by the use of the WAL files that may be created during the > backup process and the restore.config file? The copy is not useful by itself, but is when used in conjunction with the WAL archives. It is all explained, much better than I could do it, in http://www.postgresql.org/docs/9.0/interactive/continuous-archiving.html There are other backup strategies too. http://www.postgresql.org/docs/9.0/interactive/backup.html -- Ian. > -----Original Message----- > From: Ian Lea [mailto:ian.lea@gmail.com] > Sent: Friday, April 08, 2011 3:55 AM > To: David Roland > Cc: pgsql-admin@postgresql.org > Subject: Re: [ADMIN] pg_start_backup - backups > >> Attempting to run a backup with the database online. I first issue >> 'PG_START_BACKUP', then copy the contents of the postgresql data >> folder, then issue 'PG_STOP_BACKUP'. The database service remains >> running during the copy. If queries continue to hit the database after > >> the PG_START_BACKUP command is issued changes to the data folder are >> possible, which can cause copy errors (for instance, a file not found >> exception). The file copy method first creates a string array of the >> data folder contents (all folders and >> files) and then processes the array, copying one file at a time. What >> am I doing wrong? > > You need to use a program to do the copying that can accept changes to > files as it runs. rsync is a common choice and has the massive > advantage that it only takes changes and is therefore, after the first > run, quick. > > See > http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACK > UP-BASE-BACKUP > or use your favourite search engine to find more info. > > > -- > Ian. >
First of all, thanks so much for the reply. To be clear, as stated in my original post, if a user is interacting with the PostgreSQL database, changes to the data folder continue even after the PG_START_BACKUP command has been issued. This implies to me that the contents of any copy of the data folder may be unreliable. i.e. the copy may not reflect the state of the data folder either before the copy started or after the copy has finished. It may reflect the state of the data folder in some transient form. Assuming this is true, is the copy still usable for restoration? If so, how does PostgreSQL get the data folder to a stable state? Is it by the use of the WAL files that may be created during the backup process and the restore.config file? -----Original Message----- From: Ian Lea [mailto:ian.lea@gmail.com] Sent: Friday, April 08, 2011 3:55 AM To: David Roland Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] pg_start_backup - backups > Attempting to run a backup with the database online. I first issue > 'PG_START_BACKUP', then copy the contents of the postgresql data > folder, then issue 'PG_STOP_BACKUP'. The database service remains > running during the copy. If queries continue to hit the database after > the PG_START_BACKUP command is issued changes to the data folder are > possible, which can cause copy errors (for instance, a file not found > exception). The file copy method first creates a string array of the > data folder contents (all folders and > files) and then processes the array, copying one file at a time. What > am I doing wrong? You need to use a program to do the copying that can accept changes to files as it runs. rsync is a common choice and has the massive advantage that it only takes changes and is therefore, after the first run, quick. See http://www.postgresql.org/docs/9.0/static/continuous-archiving.html#BACK UP-BASE-BACKUP or use your favourite search engine to find more info. -- Ian.
"David Roland" <david.roland@soapware.com> wrote: > changes to the data folder continue even after the PG_START_BACKUP > command has been issued. This implies to me that the contents of > any copy of the data folder may be unreliable. i.e. the copy may > not reflect the state of the data folder either before the copy > started or after the copy has finished. It may reflect the state > of the data folder in some transient form. Right. > Assuming this is true, is the copy still usable for restoration? Yes. > If so, how does PostgreSQL get the data folder to a stable > state? Is it by the use of the WAL files that may be created > during the backup process and the restore.config file? Exactly. Simplifying somewhat: - The pg_start_backup causes the WAL position to be remembered. You will need to start WAL replay at this point. - Every significant change to a page is WAL-logged. - Changes after you record the restart point may or may not be in the base backup. - After the base backup copying is complete, pg_stop_backup records the WAL position. You will need to replay WAL *at least* to this point to have a consistent database. - WAL replay will bring every page modified after the "start" to a valid state, whether or not any or all direct modifications to that page made it into the base backup. - This actually goes beyond page fix-ups to file creation and deletion, etc. -Kevin
Thanks Kevin and Ian for your replies. Your information confirms activity I was noticing, but without indepth knowledge of what goes on "behind the current" I was left to speculation - a situation I am extremely uncomfortable with. I have read and re-read the existing documentation on backup and restore for postgresql and I believe we have all of the pieces in place. You both have been a big help. Thanks, David -----Original Message----- From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov] Sent: Monday, April 11, 2011 10:47 AM To: Ian Lea; David Roland Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] pg_start_backup - backups "David Roland" <david.roland@soapware.com> wrote: > changes to the data folder continue even after the PG_START_BACKUP > command has been issued. This implies to me that the contents of any > copy of the data folder may be unreliable. i.e. the copy may not > reflect the state of the data folder either before the copy started or > after the copy has finished. It may reflect the state of the data > folder in some transient form. Right. > Assuming this is true, is the copy still usable for restoration? Yes. > If so, how does PostgreSQL get the data folder to a stable state? Is > it by the use of the WAL files that may be created during the backup > process and the restore.config file? Exactly. Simplifying somewhat: - The pg_start_backup causes the WAL position to be remembered. You will need to start WAL replay at this point. - Every significant change to a page is WAL-logged. - Changes after you record the restart point may or may not be in the base backup. - After the base backup copying is complete, pg_stop_backup records the WAL position. You will need to replay WAL *at least* to this point to have a consistent database. - WAL replay will bring every page modified after the "start" to a valid state, whether or not any or all direct modifications to that page made it into the base backup. - This actually goes beyond page fix-ups to file creation and deletion, etc. -Kevin