Thread: How safe is pg_basebackup + continuous archiving?

How safe is pg_basebackup + continuous archiving?

From: Kaixi Luo
Hello,

We use PostgreSQL at work and we do daily backups with pg_dump. After that, we pg_restore the dump and check the database to make sure there isn't any data corruption. As the database grows, the whole pg_dump / pg_restore cycle time is quickly approaching 24h, so we need to change strategies.
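
(For context, our current routine is roughly the following; database and file names are just placeholders:)

    pg_dump -Fc mydb > /backups/mydb.dump        # nightly logical backup
    createdb mydb_check                          # scratch database for verification
    pg_restore -d mydb_check /backups/mydb.dump  # restore; pg_restore reports any errors
    # ...followed by application-level sanity queries against mydb_check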

We've thought about using pg_basebackup + continuous archiving as an alternative backup process, but I have doubts regarding the safety of such a procedure. As far as I know, pg_basebackup is done via rsync (and we also archive WALs using rsync), so if by any chance disk corruption occurs on the master server, the corruption would be carried over to our backup server.
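
(The kind of setup we have in mind looks roughly like this; host names, user and paths are made up:)

    # base backup of the primary, taken over the replication protocol
    pg_basebackup -h primary.example.com -U replicator -D /backups/base -X stream -P

    # postgresql.conf on the primary: ship each completed WAL segment with rsync
    archive_mode = on
    archive_command = 'rsync -a %p backup.example.com:/backups/wal/%f'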

How can we check for backup corruption in this case? Thank you very much.

Kaixi Luo

Re: How safe is pg_basebackup + continuous archiving?

From: Michael Paquier
On Wed, Jun 29, 2016 at 11:51 PM, Kaixi Luo <kaixiluo@gmail.com> wrote:
> We use PostgreSQL at work and we do daily backups with pg_dump. After that
> we pg_restore the dump and check the database that there isn't any data
> corruption. As the database grows, the whole pg_dump / pg_restore cycle time
> is quickly approaching 24h, so we need to change strategies.

That's a mature solution, and without a doubt many systems in
production use it, abuse it, and rely on it.

> We've thought about using pg_basebackup + continuous archiving as an
> alternative backup process, but I have doubts regarding the safety of such
> procedure. As far as I know, pg_basebackup is done via rsync (and we also
> archive wals using rsync), so if by any chance  disk corruption occurs on
> the master server, the corruption would be carried over to our backup
> server.

pg_basebackup speaks the replication protocol and uses it to receive a
base backup from the server in the shape of a tar stream. It then
decides whether to untar the contents or write them to disk as-is.
Note though that the contents of a backup are not fsync'd to disk
after pg_basebackup finishes (fixing that is in the works), so you had
better do that yourself to ensure that the data actually stays there.
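
For example, something along these lines (host and paths are made up) takes a
tar-format backup and makes sure it actually reaches disk:

    # tar-format base backup, compressed, one tar file per tablespace
    pg_basebackup -h primary.example.com -D /backups/base -Ft -z -P
    # crude but effective: flush everything, including the new backup files, to disk
    sync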

> How can we check for backup corruption in this case? Thanks you very much.

Before replaying a backup on a production system, you would need a
pre-production setup where the backup is replayed and checked.
Honestly, you can only be sure that a backup is working correctly
after reusing it. You could always do some validation of the raw
backup contents, but in the end you need the WAL applied on top of it
to be able to check the status of a server that has reached a
consistent point.
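
A rough sketch of such a replay, assuming archive-based recovery and made-up
paths, could be:

    # on the test machine: unpack the base backup and replay the archived WAL
    tar -xf /backups/base/base.tar -C /restore/data
    cat > /restore/data/recovery.conf <<'EOF'
    restore_command = 'cp /backups/wal/%f "%p"'
    EOF
    pg_ctl -D /restore/data start
    # once the logs report a consistent recovery state, run your checks there
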
--
Michael


Re: How safe is pg_basebackup + continuous archiving?

From: Alex Ignatov
On 29.06.2016 17:51, Kaixi Luo wrote:
> Hello,
>
> We use PostgreSQL at work and we do daily backups with pg_dump. After
> that we pg_restore the dump and check the database that there isn't
> any data corruption. As the database grows, the whole pg_dump /
> pg_restore cycle time is quickly approaching 24h, so we need to change
> strategies.
>
> We've thought about using pg_basebackup + continuous archiving as an
> alternative backup process, but I have doubts regarding the safety of
> such procedure. As far as I know, pg_basebackup is done via rsync (and
> we also archive wals using rsync), so if by any chance  disk
> corruption occurs on the master server, the corruption would be
> carried over to our backup server.
>
> How can we check for backup corruption in this case? Thanks you very much.
>
> Kaixi Luo
Hello!
Only pg_dump plus data checksums turned on can make you "sure" there is
no corruption in a backup. But! If you have somehow lost a data file
belonging to a relation, or it was zeroed out by, say, a power loss
(there were some issues with xfs in the past), you will never know it,
even with pg_dump.
There is no other method in PG to check a database for corruption.
Also, PG has no checksums on the clog files, so if any corruption
happens there you will never know it either.
So right now pg_basebackup + WAL archiving is like walking across a
minefield with your eyes tightly closed: you never know when it will
blow up!
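
For what it's worth, data checksums have to be chosen when the cluster is
created, and you can check an existing cluster like this (the data directory
path is just an example):

    # checksums are only available if the cluster was initialized with them
    initdb --data-checksums -D /var/lib/postgresql/data
    # on an existing cluster: a non-zero "Data page checksum version" means they are on
    pg_controldata /var/lib/postgresql/data | grep -i checksum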


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: How safe is pg_basebackup + continuous archiving?

From: Kaixi Luo
> Before replaying a backup on a production system, you would need a
> pre-production setup where the backup is replayed and checked.
> Honestly, you can only be sure that a backup is working correctly
> after reusing it. You could always do some validation of the raw
> backup contents, but you need at the end the WAL applied on top of it
> to be able to check the status of a server that has reached a
> consistent point.

Could you elaborate a bit more on this last part? If a PostgreSQL recovery from a pg_basebackup reaches a consistent point and is able to start up fully, that is not a guarantee that there hasn't been any underlying corruption, as far as I know. Am I correct on this? 

Thanks once again.

Kaixi

Re: How safe is pg_basebackup + continuous archiving?

From: Michael Paquier
On Thu, Jun 30, 2016 at 9:00 PM, Kaixi Luo <kaixiluo@gmail.com> wrote:
>> Before replaying a backup on a production system, you would need a
>> pre-production setup where the backup is replayed and checked.
>> Honestly, you can only be sure that a backup is working correctly
>> after reusing it. You could always do some validation of the raw
>> backup contents, but you need at the end the WAL applied on top of it
>> to be able to check the status of a server that has reached a
>> consistent point.
>
>
> Could you elaborate a bit more on this last part? If a PostgreSQL recovery
> from a pg_basebackup reaches a consistent point and is able to start up
> fully, that is not a guarantee that there hasn't been any underlying
> corruption, as far as I know. Am I correct on this?

You are correct, and that's why you need a copycat of the production
system, a pre-stage of production, where the backups are replayed and
checked with an application that replays the access patterns of the
production application. Applying extra checks on top of that is good
as well: pg_dump, data checksums, index consistency checks (this makes
me regret that we don't have pg_amcheck in core yet, actually), etc.
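
As a sketch, on the replayed cluster (database name is a placeholder) that
could look like this, using the out-of-core amcheck extension for the btree
checks:

    # read every table once; damaged heap pages or checksum failures show up as errors
    pg_dump mydb > /dev/null
    # check all btree indexes with the external amcheck extension
    psql -d mydb -c "CREATE EXTENSION amcheck;"
    psql -d mydb -c "SELECT bt_index_check(c.oid::regclass)
                       FROM pg_class c JOIN pg_am am ON c.relam = am.oid
                      WHERE c.relkind = 'i' AND am.amname = 'btree';"
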
--
Michael


Re: How safe is pg_basebackup + continuous archiving?

From: Stephen Frost
Greetings,

* Kaixi Luo (kaixiluo@gmail.com) wrote:
> We use PostgreSQL at work and we do daily backups with pg_dump. After that
> we pg_restore the dump and check the database that there isn't any data
> corruption. As the database grows, the whole pg_dump / pg_restore cycle
> time is quickly approaching 24h, so we need to change strategies.

I've found this to be a good strategy also, but it's far from perfect.
Corruption can still occur, for example, in indexes on the primary
system.  Generally speaking, pg_dump doesn't exercise indexes and
therefore you won't notice if an index is corrupt.

> We've thought about using pg_basebackup + continuous archiving as an
> alternative backup process, but I have doubts regarding the safety of such
> procedure. As far as I know, pg_basebackup is done via rsync (and we also
> archive wals using rsync), so if by any chance  disk corruption occurs on
> the master server, the corruption would be carried over to our backup
> server.

This is correct, but checksums are now available in modern versions of
PG, which will detect disk corruption.  Those checksums would be carried
over to the backup server and could be verified there by using pg_dump
(note that this still wouldn't help with indexes, but you don't have
coverage there today anyway).
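
For instance, on the server where the backup has been restored, something as
simple as this (database name is a placeholder) reads every table and will
error out on a checksum failure:

    # a dump to /dev/null forces every heap page to be read, verifying its checksum
    pg_dump mydb > /dev/null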

> How can we check for backup corruption in this case? Thanks you very much.

There has been some discussion about a specific tool for checking the
checksums throughout the entire system.  I don't know of anyone actively
working on that, unfortunately.

There are a number of tools available to help with online backups and
continuous archiving beyond pg_basebackup and hand-rolled scripts.
I'm personally biased towards and prefer pgBackRest, as I helped start
that project, but there are other tools, such as Barman and WAL-E,
which would still be better than trying to implement everything
correctly on your own.

Thanks!

Stephen


Re: How safe is pg_basebackup + continuous archiving?

From: Jim Nasby
On 6/30/16 8:30 AM, Stephen Frost wrote:
>> > How can we check for backup corruption in this case? Thanks you very much.
> There has been some discussion about a specific tool for checking the
> checksums throughout the entire system.  I don't know of anyone activly
> working on that, unfortunately.

If someone did want that though, it could probably be done as an
extension. I believe you just have to pull all of each relation into
shared buffers for the checksums to be verified.
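
Something vaguely like this, using the pg_prewarm contrib module (database
name is a placeholder), might already get close: it reads each relation into
shared buffers, so checksums get verified as the pages come in from disk
(pages that are already cached obviously aren't re-checked):

    psql -d mydb -c "CREATE EXTENSION pg_prewarm;"
    # read every ordinary table into shared buffers, verifying checksums along the way
    psql -d mydb -c "SELECT c.relname, pg_prewarm(c.oid::regclass)
                       FROM pg_class c WHERE c.relkind = 'r';"
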
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


Re: How safe is pg_basebackup + continuous archiving?

From: Kaixi Luo
Thank you all very much for the detailed answers! :)

Kaixi
