Thread: PostgreSQL File System Based Backup Restartability

PostgreSQL File System Based Backup Restartability

From
girish R G peetle
Date:
Hi 
Is PostgreSQL file system based backup restart-ability possible ?

Steps
- Database size is 1 TB
- I'll execute pg_start_backup('label');
- I will traverse through the DATA directory to get the list of directories and files. 
- I will start moving the files to backup media. ( over network )
- When I'm half way (500GB done), network error happens. So movement of files is halted.
- Network issue is resolve after 30 minutes (In this time files might have got deleted or added with creation/deletion of database/tables )
- I'll resume moving the files to backup media from where it had halted.
- When movement of files is done, I'll execute pg_stop_backup()
- I'll move all the transaction logs that got archived to backup media.

Is PostgreSQL server recovery to a consistent state possible with backup content obtained from above steps ?


Thanks
Girish

Re: PostgreSQL File System Based Backup Restartability

From
Albe Laurenz
Date:
girish R G peetle wrote:
> Is PostgreSQL file system based backup restart-ability possible ?
> 
> Steps
> - Database size is 1 TB
> - I'll execute pg_start_backup('label');
> - I will traverse through the DATA directory to get the list of directories and files.
> - I will start moving the files to backup media. ( over network )
> - When I'm half way (500GB done), network error happens. So movement of files is halted.
> - Network issue is resolve after 30 minutes (In this time files might have got deleted or added with
> creation/deletion of database/tables )
> - I'll resume moving the files to backup media from where it had halted.
> - When movement of files is done, I'll execute pg_stop_backup()
> - I'll move all the transaction logs that got archived to backup media.
> 
> Is PostgreSQL server recovery to a consistent state possible with backup content obtained from above
> steps ?

That should work, but I guess it may depend on what the copy program does if it encounters the
network error.  Were there any messages?

Yours,
Laurenz Albe

Re: PostgreSQL File System Based Backup Restartability

From
girish R G peetle
Date:

After executing pg_start_backup, we will generate a file 'Backup file' which will contain list  of all the folders n files under DATA directory.

Then we read entry from 'Backup file' sequentially and use copy command/script to move it to backup media.

When network encounters copy script will fail stating couldn't backup the file.

Say copy command throws error at 20th file entry. When backup is resumed copy command resumes  from 20th file entry.

Thanks
Girish

On Feb 17, 2015 4:26 PM, "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
girish R G peetle wrote:
> Is PostgreSQL file system based backup restart-ability possible ?
>
> Steps
> - Database size is 1 TB
> - I'll execute pg_start_backup('label');
> - I will traverse through the DATA directory to get the list of directories and files.
> - I will start moving the files to backup media. ( over network )
> - When I'm half way (500GB done), network error happens. So movement of files is halted.
> - Network issue is resolve after 30 minutes (In this time files might have got deleted or added with
> creation/deletion of database/tables )
> - I'll resume moving the files to backup media from where it had halted.
> - When movement of files is done, I'll execute pg_stop_backup()
> - I'll move all the transaction logs that got archived to backup media.
>
> Is PostgreSQL server recovery to a consistent state possible with backup content obtained from above
> steps ?

That should work, but I guess it may depend on what the copy program does if it encounters the
network error.  Were there any messages?

Yours,
Laurenz Albe

Re: PostgreSQL File System Based Backup Restartability

From
Kevin Grittner
Date:
girish R G peetle <giri.anamika0@gmail.com> wrote:

> Steps
> - Database size is 1 TB
> - I'll execute pg_start_backup('label');
> - I will traverse through the DATA directory to get the list of
>   directories and files.
> - I will start moving the files to backup media. ( over network )
> - When I'm half way (500GB done), network error happens. So
>   movement of files is halted.
> - Network issue is resolve after 30 minutes (In this time files
>   might have got deleted or added with creation/deletion of
>   database/tables )
> - I'll resume moving the files to backup media from where it had
>   halted.
> - When movement of files is done, I'll execute pg_stop_backup()
> - I'll move all the transaction logs that got archived to backup
>   media.
>
> Is PostgreSQL server recovery to a consistent state possible with
> backup content obtained from above steps ?

As long as you follow all the rules for what files to copy (and not
copy), you should have a good backup.  Some things to be careful
about that I've seen people get wrong with some frequency:

- Be sure to exclude all files in and under the pg_xlog directory.
  The backup should contain the directory itself, and it is best to
  include its subdirectory, but copying the files can cause
  corruption.
- Exclude the postmaster.pid file.
- Be sure *not* to exclude the backup_label file.  Without this
  file, restoring from the backup may or may not have corruption,
  which may or may not be initially apparent.

Be careful that when you resume after such an interruption you do
not skip any files and that you complete or re-copy any files that
were partially copied before the problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PostgreSQL File System Based Backup Restartability

From
girish R G peetle
Date:
Thanks Kevin.

"Be careful that when you resume after such an interruption you do
not skip any files and that you complete or re-copy any files that
were partially copied before the problems."

Here you mean, we should not skip any files that was already backed up before interruption ?
 I will have to backup entire content under DATA directory again ?

Thanks
Girish


On Tue, Feb 17, 2015 at 10:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
girish R G peetle <giri.anamika0@gmail.com> wrote:

> Steps
> - Database size is 1 TB
> - I'll execute pg_start_backup('label');
> - I will traverse through the DATA directory to get the list of
>   directories and files.
> - I will start moving the files to backup media. ( over network )
> - When I'm half way (500GB done), network error happens. So
>   movement of files is halted.
> - Network issue is resolve after 30 minutes (In this time files
>   might have got deleted or added with creation/deletion of
>   database/tables )
> - I'll resume moving the files to backup media from where it had
>   halted.
> - When movement of files is done, I'll execute pg_stop_backup()
> - I'll move all the transaction logs that got archived to backup
>   media.
>
> Is PostgreSQL server recovery to a consistent state possible with
> backup content obtained from above steps ?

As long as you follow all the rules for what files to copy (and not
copy), you should have a good backup.  Some things to be careful
about that I've seen people get wrong with some frequency:

- Be sure to exclude all files in and under the pg_xlog directory.
  The backup should contain the directory itself, and it is best to
  include its subdirectory, but copying the files can cause
  corruption.
- Exclude the postmaster.pid file.
- Be sure *not* to exclude the backup_label file.  Without this
  file, restoring from the backup may or may not have corruption,
  which may or may not be initially apparent.

Be careful that when you resume after such an interruption you do
not skip any files and that you complete or re-copy any files that
were partially copied before the problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PostgreSQL File System Based Backup Restartability

From
Jerry Sievers
Date:
girish R G peetle <giri.anamika0@gmail.com> writes:

> Thanks Kevin.
>
> "Be careful that when you resume after such an interruption you do
> not skip any files and that you complete or re-copy any files that
> were partially copied before the problems."
>
> Here you mean, we should not skip any files that was already backed up before interruption ?
>  I will have to backup entire content under DATA directory again ?

False.

Any file already copied and known copied thoroughly should not require
copying again.

The file under copy when the network drop occurred needs copying again
and/or  syncing if your copy program knows how to do that without moving
all bits... and of course all missing files  need copying also.

HTH

> Thanks
> Girish
>
> On Tue, Feb 17, 2015 at 10:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>
>     girish R G peetle <giri.anamika0@gmail.com> wrote:
>
>     > Steps
>     > - Database size is 1 TB
>     > - I'll execute pg_start_backup('label');
>     > - I will traverse through the DATA directory to get the list of
>     >   directories and files.
>     > - I will start moving the files to backup media. ( over network )
>     > - When I'm half way (500GB done), network error happens. So
>     >   movement of files is halted.
>     > - Network issue is resolve after 30 minutes (In this time files
>     >   might have got deleted or added with creation/deletion of
>     >   database/tables )
>     > - I'll resume moving the files to backup media from where it had
>     >   halted.
>     > - When movement of files is done, I'll execute pg_stop_backup()
>     > - I'll move all the transaction logs that got archived to backup
>     >   media.
>     >
>     > Is PostgreSQL server recovery to a consistent state possible with
>     > backup content obtained from above steps ?
>
>     As long as you follow all the rules for what files to copy (and not
>     copy), you should have a good backup.  Some things to be careful
>     about that I've seen people get wrong with some frequency:
>
>     - Be sure to exclude all files in and under the pg_xlog directory.
>       The backup should contain the directory itself, and it is best to
>       include its subdirectory, but copying the files can cause
>       corruption.
>     - Exclude the postmaster.pid file.
>     - Be sure *not* to exclude the backup_label file.  Without this
>       file, restoring from the backup may or may not have corruption,
>       which may or may not be initially apparent.
>
>     Be careful that when you resume after such an interruption you do
>     not skip any files and that you complete or re-copy any files that
>     were partially copied before the problems.
>
>     --
>     Kevin Grittner
>     EDB: http://www.enterprisedb.com
>     The Enterprise PostgreSQL Company
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: PostgreSQL File System Based Backup Restartability

From
Albe Laurenz
Date:
girish R G peetle wrote:
> After executing pg_start_backup, we will generate a file 'Backup file' which will contain list  of all
> the folders n files under DATA directory.
> 
> Then we read entry from 'Backup file' sequentially and use copy command/script to move it to backup
> media.
> 
> When network encounters copy script will fail stating couldn't backup the file.
> 
> Say copy command throws error at 20th file entry. When backup is resumed copy command resumes  from
> 20th file entry.

That should work just fine.

I'd still test the result by restoring such a backup and running "pg_dump -f /dev/null" on it,
just to be sure and because you cannot test your backups often enough.

Yours,
Laurenz Albe

Re: PostgreSQL File System Based Backup Restartability

From
Kevin Grittner
Date:
girish R G peetle <giri.anamika0@gmail.com> wrote:

> "Be careful that when you resume after such an interruption you
> do not skip any files and that you complete or re-copy any files
> that were partially copied before the problems."
>
> Here you mean, we should not skip any files that was already
> backed up before interruption ?
> I will have to backup entire content under DATA directory again ?

No.

Think of it this way: for every file that existed both when
pg_start_backup() and pg_stop_backup() were run, every OS-level
page must represent the state of that page at some point between
when those functions were run.  *Which* point in time each page
represents is not important, and it is not expected that all files
(or all pages within a file) represent the same point in time.  WAL
replay is guaranteed to fix up all pages modified between those
function executions.  The backup_label file specifies which WAL
records are needed to do that.

There were two concerns I had with what you described.

(1)  That when you resume with the 20th file, that is not an
ordinal position in a new list which might have fewer files ahead
of the 20th position, resulting in skipping some files.  If you're
continuing to use the original list, using the position in that
list is fine.

(2)  That if the error occurred part-way through reading a file,
leaving a portion uncopied, that the missing portion be copied.
(Of course re-copying the whole file works, too; but you could
safely resume just past the last page successfully copied before
the network problems.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: PostgreSQL File System Based Backup Restartability

From
girish R G peetle
Date:

Thanks a lot Kevin for the detailed explanation.
In response to your concerns

1) I am using the original list, won't create a new list. So we are fine as per your explanation.

2) I have a question about partially copied file. What if a file that got partially copied disappears (say it belonged to a table n table was dropped ) when I resume backup. Should I mark this partially copied file invalid in backup media ? And continue with the next entry in the list ?

Thanks
Girish

On Feb 18, 2015 9:00 PM, "Kevin Grittner" <kgrittn@ymail.com> wrote:
girish R G peetle <giri.anamika0@gmail.com> wrote:

> "Be careful that when you resume after such an interruption you
> do not skip any files and that you complete or re-copy any files
> that were partially copied before the problems."
>
> Here you mean, we should not skip any files that was already
> backed up before interruption ?
> I will have to backup entire content under DATA directory again ?

No.

Think of it this way: for every file that existed both when
pg_start_backup() and pg_stop_backup() were run, every OS-level
page must represent the state of that page at some point between
when those functions were run.  *Which* point in time each page
represents is not important, and it is not expected that all files
(or all pages within a file) represent the same point in time.  WAL
replay is guaranteed to fix up all pages modified between those
function executions.  The backup_label file specifies which WAL
records are needed to do that.

There were two concerns I had with what you described.

(1)  That when you resume with the 20th file, that is not an
ordinal position in a new list which might have fewer files ahead
of the 20th position, resulting in skipping some files.  If you're
continuing to use the original list, using the position in that
list is fine.

(2)  That if the error occurred part-way through reading a file,
leaving a portion uncopied, that the missing portion be copied.
(Of course re-copying the whole file works, too; but you could
safely resume just past the last page successfully copied before
the network problems.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PostgreSQL File System Based Backup Restartability

From
Jerry Sievers
Date:
girish R G peetle <giri.anamika0@gmail.com> writes:

> Thanks a lot Kevin for the detailed explanation.
> In response to your concerns
>
> 1) I am using the original list, won't create a new list. So we are fine as per your explanation.
>
> 2) I have a question about partially copied file. What if a file that got partially copied disappears (say it
belongedto a table n table was dropped ) when I resume 
> backup. Should I mark this partially copied file invalid in backup media ? And continue with the next entry in the
list? 

Stop complicating all of this.  Don't rescan the origin server.

Just restart copying your files with the one that was incomplete when
the failure occurred and then proceed through  the list till you reach
the end.

WAL replay will account for whatever changes took place on the origin
since pg_start_backup() was run.

This assumes that any files completely copied  are not themselves
corrupt for some reason.  If your copy process is untrustworth in that
regard then all of this is pointless anyway.

I'd be a lot more comfortable using rsync rather than a home-grown
solution or other archiver not commonly used for this porpose.

HTH.


> Girish
>
> On Feb 18, 2015 9:00 PM, "Kevin Grittner" <kgrittn@ymail.com> wrote:
>
>     girish R G peetle <giri.anamika0@gmail.com> wrote:
>
>     > "Be careful that when you resume after such an interruption you
>     > do not skip any files and that you complete or re-copy any files
>     > that were partially copied before the problems."
>     >
>     > Here you mean, we should not skip any files that was already
>     > backed up before interruption ?
>     > I will have to backup entire content under DATA directory again ?
>
>     No.
>
>     Think of it this way: for every file that existed both when
>     pg_start_backup() and pg_stop_backup() were run, every OS-level
>     page must represent the state of that page at some point between
>     when those functions were run.  *Which* point in time each page
>     represents is not important, and it is not expected that all files
>     (or all pages within a file) represent the same point in time.  WAL
>     replay is guaranteed to fix up all pages modified between those
>     function executions.  The backup_label file specifies which WAL
>     records are needed to do that.
>
>     There were two concerns I had with what you described.
>
>     (1)  That when you resume with the 20th file, that is not an
>     ordinal position in a new list which might have fewer files ahead
>     of the 20th position, resulting in skipping some files.  If you're
>     continuing to use the original list, using the position in that
>     list is fine.
>
>     (2)  That if the error occurred part-way through reading a file,
>     leaving a portion uncopied, that the missing portion be copied.
>     (Of course re-copying the whole file works, too; but you could
>     safely resume just past the last page successfully copied before
>     the network problems.)
>
>     --
>     Kevin Grittner
>     EDB: http://www.enterprisedb.com
>     The Enterprise PostgreSQL Company
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: PostgreSQL File System Based Backup Restartability

From
Scott Ribe
Date:
On Feb 18, 2015, at 12:57 PM, Jerry Sievers <gsievers19@comcast.net> wrote:
>
> I'd be a lot more comfortable using rsync rather than a home-grown
> solution or other archiver not commonly used for this porpose.

And if your network connection is prone to dropping, the --partial option with rsync would be handy.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice






Re: PostgreSQL File System Based Backup Restartability

From
Kevin Grittner
Date:
girish R G peetle <giri.anamika0@gmail.com> wrote:

> What if a file that got partially copied disappears (say it
> belonged to a table n table was dropped ) when I resume backup.
> Should I mark this partially copied file invalid in backup media ?

It doesn't much matter.  WAL replay will delete the file if found,
and must tolerate a missing file.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company