Thread: Pgbasebackup help

Pgbasebackup help

From
"Yelai, Ramkumar IN BLR STS"
Date:
Hi All,
 
I need some help in postgresql base backup.
 
We are currently using multiple DBMS in our project and postgresql is one of them. Our private DBMS keeps the online data and postgresql keeps online as well as historical data.
 
At present, we are doing the backup/restore process for our project. So we planned to use Pg_basebackup instead of  pg_dump. 
 
Below is the backup steps.
 
Time-T1   = Start the backup of private DBMS.
Time-T2   = Finished the private backup DBMS.
Time-T3   = Start the pg_basebackup.
Time-T4   = End the Pg_basebackup.
 
Here the requirement is we don’t want to restore the data after Time-T3.  But when I followed this approach https://opensourcedbms.com/dbms/point-in-time-recovery-pitr-using-pg_basebackup-with-postgresql-9-2/, I am still getting the information’s archived from Time-T3 to TimeT4.
 
Seems, WAL archives are holding all the transactions, which are happened between Time T3 – Time T4.
 
Also, I don’t want enable archive_mode = on as it needs to maintain archives files.
 
So I decided the enable only these parameters.
 
Postgresql.conf
---------------------------------
wal_level = hot_standby
max_wal_senders = 1
 
And added replication permissions for the current user in pg_hba.conf.
 
It does, what I need it. In the backup I did not have the data between T3-T4.
 
Is this correct or is there anything I missing it.
 
Please let me know.
 
With best regards,
Ramkumar Yelai
 
Siemens Technology and Services Private Limited
CT DC AA I HOUSE DEV GL4
84, Hosur Road
Bengaluru 560100, Indien
Tel.: +91 80 33136494
Fax: +91 80 33133389
Mobil: +91 9886182031
 
Registered Office: 130, Pandurang Budhkar Marg, Worli, Mumbai 400 018. Telephone +91 22 39677000. Fax +91 22 39677075. Other Offices: Bangalore, Chennai, Gurgaon, Noida, Kolkata, Pune. Corporate Identity number:U99999MH1986PLC093854
 
 

Re: Pgbasebackup help

From
David Steele
Date:
On 11/30/15 6:28 AM, Yelai, Ramkumar IN BLR STS wrote:
> Hi All,
>
> I need some help in postgresql base backup.
>
> We are currently using multiple DBMS in our project and postgresql is
> one of them. Our private DBMS keeps the online data and postgresql keeps
> online as well as historical data.
>
> At present, we are doing the backup/restore process for our project. So
> we planned to use Pg_basebackup instead of  pg_dump.
>
> Below is the backup steps.
>
> Time-T1   = Start the backup of private DBMS.
> Time-T2   = Finished the private backup DBMS.
> Time-T3   = Start the pg_basebackup.
> Time-T4   = End the Pg_basebackup.
>
> Here the requirement is we don’t want to restore the data after
> Time-T3.  But when I followed this approach
> _https://opensourcedbms.com/dbms/point-in-time-recovery-pitr-using-pg_basebackup-with-postgresql-9-2/_,
> I am still getting the information’s archived from Time-T3 to TimeT4.
>
> Seems, WAL archives are holding all the transactions, which are happened
> between Time T3 – Time T4.

This is the expected behavior.  The WAL generated during the backup must
be replayed to make the database consistent so T4 is the earliest you
can possibly stop recovery.

> Also, I don’t want enable archive_mode = on as it needs to maintain
> archives files.

As it turns out, archiving would be the solution to your problem.  If
you were archiving you could restore a *previous* backup and then replay
WAL to exactly T3.  There might be some jitter from clock differences
but it should have the desired effect.

I've done this to have development database reasonably in sync with each
other and in practice it works quite well.

> So I decided the enable only these parameters.
>
> Postgresql.conf
> ---------------------------------
> wal_level = hot_standby
> max_wal_senders = 1
>
> And added replication permissions for the current user in pg_hba.conf.
>
> It does, what I need it. In the backup I did not have the data between
> T3-T4.

There's not not enough detail here for me to make out what you are
doing.  Is there still a pg_basebackup going on or are you just copying
files?

If archive_mode is not enabled then wal_level = hot_standby is likely
ignored.

What you end up with may start, but I doubt it's consistent.  I don't
see how you could use pg_basebackup without archiving and end up at T3
with a consistent cluster.

Here's a tutorial I wrote for pgBackRest that covers point in time
recovery and goes into a bit more detail than the article you cited:

http://www.pgbackrest.org/user-guide.html#pitr

This method requires archive_mode to be enabled, which I believe is the
correct way to achieve the desired result.

--
-David
david@pgmasters.net


Attachment

Re: Pgbasebackup help

From
Jim Nasby
Date:
On 12/2/15 1:56 PM, David Steele wrote:
>> >Also, I don’t want enable archive_mode = on as it needs to maintain
>> >archives files.
> As it turns out, archiving would be the solution to your problem.  If
> you were archiving you could restore a*previous*  backup and then replay
> WAL to exactly T3.  There might be some jitter from clock differences
> but it should have the desired effect.

And in this case previous could be a PG backup taken immediately before
the backup of the private DBMS.

I don't remember off-hand if pg_basebackup has an option for delaying
the pg_stop_backup() call, but if it does then I think the simplest
thing is to just delay that until after your private DBMS backup is
complete.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Pgbasebackup help

From
"Yelai, Ramkumar IN BLR STS"
Date:
HI,

Thanks David.

What I wanted to achieve is simple copy of Data folder. I can't shutdown the database during the backup and unable to
usefile system copy of data folder as it creates inconsistency and don't want to use pg_dump.  

Hence I decided to use Pg_basebackup for copying the base backup and don't want to replay the wal.

Anyway, pg_basebackup put checkpoint before copying the data folder. For me it is enough to restore till checkpoint.

I saw this link http://blog.veritech.io/2014/10/automated-backup-for-postgresql-cluster.html.

In this link also, I have not seen they have enabled archive_mode. Archive mode is not necessary as long as you
streamingthe your wal files to pg_xlog.  

By using this option in pg_basebackup and backup will become standalone backup.

--xlog-method = stream
"Includes the required transaction log files (WAL files) in the backup. This will include all transaction logs
generatedduring the backup. If this option is specified, it is possible to start a postmaster directly in the extracted
directorywithout the need to consult the log archive, thus making this a completely standalone backup." 

Also, even if I have all wal files , how do I  restore till time T3. I am analyzing at pgbackrest to know how to
restorebackup till time T3.  

Our private DBMS backup does not have any data after Time T3, hence I don't want postgresql backup contains any
transactionafter Time T3. 

Please let me know is my understand correct? And am I doing the correct way to backup.

Regards,
Ramkumar.

-----Original Message-----
From: David Steele [mailto:david@pgmasters.net]
Sent: Thursday, December 03, 2015 1:27 AM
To: pgsql-general@postgresql.org; Yelai, Ramkumar IN BLR STS
Subject: Re: [GENERAL] Pgbasebackup help

On 11/30/15 6:28 AM, Yelai, Ramkumar IN BLR STS wrote:
> Hi All,
>
> I need some help in postgresql base backup.
>
> We are currently using multiple DBMS in our project and postgresql is
> one of them. Our private DBMS keeps the online data and postgresql
> keeps online as well as historical data.
>
> At present, we are doing the backup/restore process for our project.
> So we planned to use Pg_basebackup instead of  pg_dump.
>
> Below is the backup steps.
>
> Time-T1   = Start the backup of private DBMS.
> Time-T2   = Finished the private backup DBMS.
> Time-T3   = Start the pg_basebackup.
> Time-T4   = End the Pg_basebackup.
>
> Here the requirement is we don't want to restore the data after
> Time-T3.  But when I followed this approach
> _https://opensourcedbms.com/dbms/point-in-time-recovery-pitr-using-pg_
> basebackup-with-postgresql-9-2/_, I am still getting the information's
> archived from Time-T3 to TimeT4.
>
> Seems, WAL archives are holding all the transactions, which are
> happened between Time T3 - Time T4.

This is the expected behavior.  The WAL generated during the backup must be replayed to make the database consistent so
T4is the earliest you can possibly stop recovery. 

> Also, I don't want enable archive_mode = on as it needs to maintain
> archives files.

As it turns out, archiving would be the solution to your problem.  If you were archiving you could restore a *previous*
backupand then replay WAL to exactly T3.  There might be some jitter from clock differences but it should have the
desiredeffect. 

I've done this to have development database reasonably in sync with each other and in practice it works quite well.

> So I decided the enable only these parameters.
>
> Postgresql.conf
> ---------------------------------
> wal_level = hot_standby
> max_wal_senders = 1
>
> And added replication permissions for the current user in pg_hba.conf.
>
> It does, what I need it. In the backup I did not have the data between
> T3-T4.

There's not not enough detail here for me to make out what you are doing.  Is there still a pg_basebackup going on or
areyou just copying files? 

If archive_mode is not enabled then wal_level = hot_standby is likely ignored.

What you end up with may start, but I doubt it's consistent.  I don't see how you could use pg_basebackup without
archivingand end up at T3 with a consistent cluster. 

Here's a tutorial I wrote for pgBackRest that covers point in time recovery and goes into a bit more detail than the
articleyou cited: 

http://www.pgbackrest.org/user-guide.html#pitr

This method requires archive_mode to be enabled, which I believe is the correct way to achieve the desired result.

--
-David
david@pgmasters.net



Re: Pgbasebackup help

From
John R Pierce
Date:
On 12/2/2015 9:59 PM, Yelai, Ramkumar IN BLR STS wrote:
>   I can't shutdown the database during the backup and unable to use file system copy of data folder as it creates
inconsistency

If you invoke pg_start_backup()  before the copy, and pg_stop_backup()
after the copy, then a file system copy is consistent.

see
http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP
for the explanation of those functions.


--
john r pierce, recycling bits in santa cruz



Re: Pgbasebackup help

From
Andreas Kretschmer
Date:

> John R Pierce <pierce@hogranch.com> hat am 3. Dezember 2015 um 21:38
> geschrieben:
>
>
> On 12/2/2015 9:59 PM, Yelai, Ramkumar IN BLR STS wrote:
> >   I can't shutdown the database during the backup and unable to use file
> > system copy of data folder as it creates inconsistency
>
> If you invoke pg_start_backup()  before the copy, and pg_stop_backup()
> after the copy, then a file system copy is consistent.

file system snapshot PLUS transaction logs => consistent backup.


Re: Pgbasebackup help

From
Jim Nasby
Date:
Maybe I'm just being picky here, but...

On 12/3/15 3:04 PM, Andreas Kretschmer wrote:
>> If you invoke pg_start_backup()  before the copy, and pg_stop_backup()
>> >after the copy, then a file system copy is consistent.

The filesystem copy is not consistent, but because of how
pg_start/stop_backup works it doesn't matter that it's inconsistent.
Postgres uses the archived WAL files to recover from the inconsistency.

> file system snapshot PLUS transaction logs => consistent backup.

A real filesystem snapshot would always be consistent, and as long as it
included all Postgres files that's all you need. Start Postgres against
that snapshot and it will think the database had crashed and perform a
normal recovery.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Pgbasebackup help

From
David Steele
Date:
On 12/2/15 6:25 PM, Jim Nasby wrote:
> On 12/2/15 1:56 PM, David Steele wrote:
>>> >Also, I don’t want enable archive_mode = on as it needs to maintain
>>> >archives files.
>> As it turns out, archiving would be the solution to your problem.  If
>> you were archiving you could restore a*previous*  backup and then replay
>> WAL to exactly T3.  There might be some jitter from clock differences
>> but it should have the desired effect.
>
> And in this case previous could be a PG backup taken immediately before
> the backup of the private DBMS.

Exactly.

> I don't remember off-hand if pg_basebackup has an option for delaying
> the pg_stop_backup() call, but if it does then I think the simplest
> thing is to just delay that until after your private DBMS backup is
> complete.

There's no option to delay it in pg_basebackup but it can be called
whenever you like when using pg_start/stop_backup.  So, yeah, if the
postgres backup were started first then pg_stop_backup() could be called
right after the private backup stops.  That would make them pretty much
in sync.

Of course, it's a lot of work to get that going.  pg_basebackup does a
lot of work for you.

--
-David
david@pgmasters.net


Re: Pgbasebackup help

From
David Steele
Date:
On 12/3/15 12:59 AM, Yelai, Ramkumar IN BLR STS wrote:
> What I wanted to achieve is simple copy of Data folder. I can't shutdown the database during the backup and unable to
usefile system copy of data folder as it creates inconsistency and don't want to use pg_dump. 
>
> Hence I decided to use Pg_basebackup for copying the base backup and don't want to replay the wal.

Replaying WAL is *not* optional.  Each restore will have to replay at
least one WAL segment to become consistent, depending on write volume
during the backup.

> Anyway, pg_basebackup put checkpoint before copying the data folder. For me it is enough to restore till checkpoint.

This won't work - the database keeps running and making changes after
the checkpoint.

> I saw this link http://blog.veritech.io/2014/10/automated-backup-for-postgresql-cluster.html.
>
> In this link also, I have not seen they have enabled archive_mode. Archive mode is not necessary as long as you
streamingthe your wal files to pg_xlog. 

These instructions are for bringing up a replica.  Even if this is OK
for your purposes, it still would not get you a database at time T3.
You are supposing that because this method does not use archiving that

> Also, even if I have all wal files , how do I  restore till time T3. I am analyzing at pgbackrest to know how to
restorebackup till time T3. 

To restore to time T3 you would select a backup that ended *before* T3
then using point-in-time recovery to play forward to T3.

That should be explained pretty clearly in the user guide - if there's
something you don't understand then it would be helpful to know so I can
improve the guide.

--
-David
david@pgmasters.net


Re: Pgbasebackup help

From
"Yelai, Ramkumar IN BLR STS"
Date:
Thanks David,

This helped me to understand the WAL importance.

Finally, we decided to use "stream" option to copy the WAL file during the backup as mentioned in the help.

Enabled this options in postgres.conf

wal_level = hot_standby
max_wal_senders = 2

Though I get the information, which are archived during the backup process. I myself deleting archived records the
aftertime T3, while restoring the backup by using our sql procedures.  

Since I am not using archive recovery or standy replica (I am restoring the data folder in to the same server not in
standbyserver), I can't use the recovery.conf options to recover till Time T3.  Hence I forcefully deleted as mentioned
earlier.

Please let me know If any way to replay the WAL till Time T3 then I am interested to use it.

Thanks once again.

Regards,
Ramkumar.

-----Original Message-----
From: David Steele [mailto:david@pgmasters.net]
Sent: Friday, December 04, 2015 6:26 PM
To: Yelai, Ramkumar IN BLR STS; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Pgbasebackup help

On 12/3/15 12:59 AM, Yelai, Ramkumar IN BLR STS wrote:
> What I wanted to achieve is simple copy of Data folder. I can't shutdown the database during the backup and unable to
usefile system copy of data folder as it creates inconsistency and don't want to use pg_dump. 
>
> Hence I decided to use Pg_basebackup for copying the base backup and don't want to replay the wal.

Replaying WAL is *not* optional.  Each restore will have to replay at least one WAL segment to become consistent,
dependingon write volume during the backup. 

> Anyway, pg_basebackup put checkpoint before copying the data folder. For me it is enough to restore till checkpoint.

This won't work - the database keeps running and making changes after the checkpoint.

> I saw this link http://blog.veritech.io/2014/10/automated-backup-for-postgresql-cluster.html.
>
> In this link also, I have not seen they have enabled archive_mode. Archive mode is not necessary as long as you
streamingthe your wal files to pg_xlog. 

These instructions are for bringing up a replica.  Even if this is OK for your purposes, it still would not get you a
databaseat time T3.  
You are supposing that because this method does not use archiving that

> Also, even if I have all wal files , how do I  restore till time T3. I am analyzing at pgbackrest to know how to
restorebackup till time T3. 

To restore to time T3 you would select a backup that ended *before* T3 then using point-in-time recovery to play
forwardto T3. 

That should be explained pretty clearly in the user guide - if there's something you don't understand then it would be
helpfulto know so I can improve the guide. 

--
-David
david@pgmasters.net