Thread: PostgreSQL File System Based Backup Restartability
After executing pg_start_backup, we will generate a file 'Backup file' which will contain a list of all the folders and files under the DATA directory.
Then we read entries from the 'Backup file' sequentially and use a copy command/script to move each one to the backup media.
When a network error is encountered, the copy script will fail, stating that it couldn't back up the file.
Say the copy command throws an error at the 20th file entry. When the backup is resumed, the copy command resumes from the 20th file entry.
Thanks
Girish
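The resume logic described above can be sketched roughly as follows. This is only an illustration, not anything PostgreSQL provides: the manifest ('Backup file') is assumed to be a text file with one path per line, relative to the data directory, and the progress-file name is made up for the example.

```python
import os
import shutil

def run_backup(manifest, src_root, dst_root, progress_file):
    """Copy every file named in the manifest, resuming from the last
    entry recorded in progress_file after an interruption."""
    start = 0
    if os.path.exists(progress_file):
        with open(progress_file) as f:
            start = int(f.read().strip() or 0)
    with open(manifest) as f:
        entries = [line.strip() for line in f if line.strip()]
    for i, rel in enumerate(entries[start:], start):
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        try:
            shutil.copyfile(src, dst)  # may raise OSError on network loss
        except FileNotFoundError:
            # File vanished since the manifest was built (e.g. a dropped
            # table); WAL replay copes with that, so just move on.
            pass
        # Record progress only after the file is fully copied, so an
        # interrupted file gets re-copied on the next run.
        with open(progress_file, "w") as f:
            f.write(str(i + 1))
```

Because progress is recorded per completed file, rerunning the script after a failure picks up at the interrupted entry, matching the "resume from the 20th file" behaviour described above.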
girish R G peetle wrote:
> Is PostgreSQL file system based backup restart-ability possible ?
>
> Steps
> - Database size is 1 TB
> - I'll execute pg_start_backup('label');
> - I will traverse through the DATA directory to get the list of directories and files.
> - I will start moving the files to backup media. ( over network )
> - When I'm half way (500GB done), network error happens. So movement of files is halted.
> - Network issue is resolved after 30 minutes (In this time files might have got deleted or added with
> creation/deletion of database/tables )
> - I'll resume moving the files to backup media from where it had halted.
> - When movement of files is done, I'll execute pg_stop_backup()
> - I'll move all the transaction logs that got archived to backup media.
>
> Is PostgreSQL server recovery to a consistent state possible with backup content obtained from above
> steps ?
That should work, but I guess it may depend on what the copy program does if it encounters the
network error. Were there any messages?
Yours,
Laurenz Albe
"Be careful that when you resume after such an interruption you do
not skip any files and that you complete or re-copy any files that
were partially copied before the problems."
Will I have to back up the entire contents of the DATA directory again?
Girish
girish R G peetle <giri.anamika0@gmail.com> wrote:
> Steps
> - Database size is 1 TB
> - I'll execute pg_start_backup('label');
> - I will traverse through the DATA directory to get the list of
>   directories and files.
> - I will start moving the files to backup media. ( over network )
> - When I'm half way (500GB done), network error happens. So
>   movement of files is halted.
> - Network issue is resolved after 30 minutes (In this time files
>   might have got deleted or added with creation/deletion of
>   database/tables )
> - I'll resume moving the files to backup media from where it had
>   halted.
> - When movement of files is done, I'll execute pg_stop_backup()
> - I'll move all the transaction logs that got archived to backup
>   media.
>
> Is PostgreSQL server recovery to a consistent state possible with
> backup content obtained from above steps ?
As long as you follow all the rules for what files to copy (and not
copy), you should have a good backup. Some things to be careful
about that I've seen people get wrong with some frequency:
- Be sure to exclude all files in and under the pg_xlog directory.
  The backup should contain the directory itself, and it is best to
  include its subdirectory, but copying the files can cause
  corruption.
- Exclude the postmaster.pid file.
- Be sure *not* to exclude the backup_label file. Without this
  file, restoring from the backup may or may not have corruption,
  which may or may not be initially apparent.
Be careful that when you resume after such an interruption you do
not skip any files and that you complete or re-copy any files that
were partially copied before the problems.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
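Kevin's exclusion rules could be expressed as a simple filter over the manifest, something like the sketch below. Paths are assumed to be relative to the data directory; the pg_wal name (used instead of pg_xlog from PostgreSQL 10 on) is an addition of this sketch, not part of Kevin's list.

```python
def should_copy(relpath):
    """Decide whether a file (path relative to the data directory)
    belongs in a file-level base backup."""
    parts = relpath.replace("\\", "/").split("/")
    # Exclude all files in and under pg_xlog (pg_wal on 10+); the
    # directory entry itself is kept so it can be recreated on restore.
    if parts[0] in ("pg_xlog", "pg_wal") and len(parts) > 1:
        return False
    # The pid file of the running postmaster must not be restored.
    if relpath == "postmaster.pid":
        return False
    # Note: backup_label is deliberately NOT excluded here; per Kevin's
    # warning, it must be present in the backup.
    return True
```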
girish R G peetle <giri.anamika0@gmail.com> writes:
> Thanks Kevin.
>
> "Be careful that when you resume after such an interruption you do
> not skip any files and that you complete or re-copy any files that
> were partially copied before the problems."
>
> Here you mean, we should not skip any files that was already backed up before interruption ?
> I will have to backup entire content under DATA directory again ?
False. Any file already copied and known copied thoroughly should
not require copying again. The file under copy when the network
drop occurred needs copying again and/or syncing, if your copy
program knows how to do that without moving all bits... and of
course all missing files need copying also.
HTH
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
girish R G peetle wrote:
> After executing pg_start_backup, we will generate a file 'Backup file' which will contain a list of
> all the folders and files under the DATA directory.
>
> Then we read entries from 'Backup file' sequentially and use a copy command/script to move each one
> to the backup media.
>
> When a network error is encountered, the copy script will fail stating it couldn't back up the file.
>
> Say the copy command throws an error at the 20th file entry. When the backup is resumed, the copy
> command resumes from the 20th file entry.
That should work just fine.
I'd still test the result by restoring such a backup and running
"pg_dump -f /dev/null" on it, just to be sure and because you
cannot test your backups often enough.
Yours,
Laurenz Albe
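Laurenz's verification step could be wrapped in a small helper like the one below. The port and database name are assumptions about a scratch instance the backup has been restored onto, not anything from the thread.

```python
import subprocess

def pg_dump_check_cmd(port=5433, dbname="postgres"):
    # pg_dump has to read every table to produce a dump; sending the
    # output to /dev/null discards it while still proving the restored
    # cluster is readable end to end.
    return ["pg_dump", "-p", str(port), "-f", "/dev/null", dbname]

def verify_restore(port=5433, dbname="postgres"):
    """Return True if pg_dump completes cleanly against the restored server."""
    return subprocess.run(pg_dump_check_cmd(port, dbname)).returncode == 0
```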
Thanks a lot Kevin for the detailed explanation.
In response to your concerns
1) I am using the original list and won't create a new list, so we are fine as per your explanation.
2) I have a question about a partially copied file. What if a file that got partially copied disappears (say it belonged to a table and the table was dropped) by the time I resume the backup? Should I mark the partially copied file invalid on the backup media and continue with the next entry in the list?
Thanks
Girish
girish R G peetle <giri.anamika0@gmail.com> wrote:
> "Be careful that when you resume after such an interruption you
> do not skip any files and that you complete or re-copy any files
> that were partially copied before the problems."
>
> Here you mean, we should not skip any files that was already
> backed up before interruption ?
> I will have to backup entire content under DATA directory again ?
No.
Think of it this way: for every file that existed both when
pg_start_backup() and pg_stop_backup() were run, every OS-level
page must represent the state of that page at some point between
when those functions were run. *Which* point in time each page
represents is not important, and it is not expected that all files
(or all pages within a file) represent the same point in time. WAL
replay is guaranteed to fix up all pages modified between those
function executions. The backup_label file specifies which WAL
records are needed to do that.
There were two concerns I had with what you described.
(1) That when you resume with the 20th file, that is not an
ordinal position in a new list which might have fewer files ahead
of the 20th position, resulting in skipping some files. If you're
continuing to use the original list, using the position in that
list is fine.
(2) That if the error occurred part-way through reading a file,
leaving a portion uncopied, that the missing portion be copied.
(Of course re-copying the whole file works, too; but you could
safely resume just past the last page successfully copied before
the network problems.)
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
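Kevin's point (2) above, resuming just past the last part successfully copied, can be sketched as an append-from-offset copy. This assumes the destination already holds a correct prefix of the source file, which is only safe if the copy program writes sequentially and the earlier transfer was not corrupted.

```python
import os

def resume_copy(src, dst, chunk=8192):
    """Finish copying src to dst, starting from the bytes dst already has
    rather than re-copying the whole file."""
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as s, open(dst, "ab") as d:
        s.seek(offset)  # skip the portion that was copied before the failure
        while True:
            buf = s.read(chunk)
            if not buf:
                break
            d.write(buf)
```

Re-copying the whole file works too, as Kevin notes; this is only an optimization for large relation segments.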
girish R G peetle <giri.anamika0@gmail.com> writes:
> Thanks a lot Kevin for the detailed explanation.
> In response to your concerns
>
> 1) I am using the original list, won't create a new list. So we are fine as per your explanation.
>
> 2) I have a question about a partially copied file. What if a file that got partially copied
> disappears (say it belonged to a table and the table was dropped) when I resume the backup?
> Should I mark this partially copied file invalid in backup media ? And continue with the next
> entry in the list ?
Stop complicating all of this. Don't rescan the origin server.
Just restart copying your files with the one that was incomplete
when the failure occurred and then proceed through the list till
you reach the end.
WAL replay will account for whatever changes took place on the
origin since pg_start_backup() was run.
This assumes that any files completely copied are not themselves
corrupt for some reason. If your copy process is untrustworthy in
that regard then all of this is pointless anyway.
I'd be a lot more comfortable using rsync rather than a home-grown
solution or other archiver not commonly used for this purpose.
HTH.
--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
On Feb 18, 2015, at 12:57 PM, Jerry Sievers <gsievers19@comcast.net> wrote:
> I'd be a lot more comfortable using rsync rather than a home-grown
> solution or other archiver not commonly used for this purpose.
And if your network connection is prone to dropping, the --partial
option with rsync would be handy.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
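The rsync invocation Scott suggests might look like the command built below; the source and destination paths are purely illustrative.

```python
import subprocess

def rsync_cmd(src, dst):
    # -a preserves the directory tree and file metadata; --partial keeps
    # a half-transferred file in place after a dropped connection so a
    # retried run continues it instead of starting over.
    return ["rsync", "-a", "--partial", src, dst]

def run_rsync(src, dst):
    """Invoke rsync, returning True on a clean exit."""
    return subprocess.run(rsync_cmd(src, dst)).returncode == 0
```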
girish R G peetle <giri.anamika0@gmail.com> wrote:
> What if a file that got partially copied disappears (say it
> belonged to a table and the table was dropped) when I resume backup.
> Should I mark this partially copied file invalid in backup media ?
It doesn't much matter. WAL replay will delete the file if found,
and must tolerate a missing file.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company