Thread: PITR

PITR

From
Murthy Nunna
Date:

All,

 

I am testing PITR…. I am looking for recovery.conf parameters where you can recovery the WALs available in the restore_command but do not complete recovery. I want to be able to connect to the database and check database in read only and if I am not there yet, I will feed more WALs in the archive directory and resume recovery. I would like to prevent multiple base restorations. I want to roll forward with WALs but check in between. Also, I do not want to set up replication standby.

 

Is it possible? If so, could you tell me what are the relevant recovery.conf params?

 

Thanks,

Murthy

Re: PITR

From
desmodemone
Date:



2014-02-22 17:06 GMT+01:00 Murthy Nunna <mnunna@fnal.gov>:

All,

 

I am testing PITR…. I am looking for recovery.conf parameters where you can recovery the WALs available in the restore_command but do not complete recovery. I want to be able to connect to the database and check database in read only and if I am not there yet, I will feed more WALs in the archive directory and resume recovery. I would like to prevent multiple base restorations. I want to roll forward with WALs but check in between. Also, I do not want to set up replication standby.

 

Is it possible? If so, could you tell me what are the relevant recovery.conf params?

 

Thanks,

Murthy


Hi Murthy,      
               look at parameter "pause_at_recovery_target"  and "recovery_target_time" with these parameters in the recovery.conf you could reach a point in the timeline to recover the database and then see if it's all ok and then finish the recovery with pg_xlog_replay_resume. If you want instead continue the recovery you have to stop the backend and change the parameter of recovery_target_time in the recovery.conf and restart the postmaster.

Mat DBA


Re: PITR

From
Murthy Nunna
Date:

Mat,

 

Thank you for your quick response….

 

The documentation says  for pause_at_recovery_target:

 

This setting has no effect if hot_standby is not enabled, or if no recovery target is set.

 

In my case hot_standby is not enabled.

 

Thanks,

Murthy

 

 

 

 

From: desmodemone [mailto:desmodemone@gmail.com]
Sent: Saturday, February 22, 2014 10:34 AM
To: Murthy Nunna
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

 

 

2014-02-22 17:06 GMT+01:00 Murthy Nunna <mnunna@fnal.gov>:

All,

 

I am testing PITR…. I am looking for recovery.conf parameters where you can recovery the WALs available in the restore_command but do not complete recovery. I want to be able to connect to the database and check database in read only and if I am not there yet, I will feed more WALs in the archive directory and resume recovery. I would like to prevent multiple base restorations. I want to roll forward with WALs but check in between. Also, I do not want to set up replication standby.

 

Is it possible? If so, could you tell me what are the relevant recovery.conf params?

 

Thanks,

Murthy

 

Hi Murthy,      
               look at parameter "pause_at_recovery_target"  and "recovery_target_time" with these parameters in the recovery.conf you could reach a point in the timeline to recover the database and then see if it's all ok and then finish the recovery with pg_xlog_replay_resume. If you want instead continue the recovery you have to stop the backend and change the parameter of recovery_target_time in the recovery.conf and restart the postmaster.

Mat DBA

 

Re: PITR

From
Julien Rouhaud
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Le 22/02/2014 18:03, Murthy Nunna a écrit :
> Mat,
>
>
>
> Thank you for your quick response….
>
>
>
> The documentation says  for pause_at_recovery_target:
>
>
>
> This setting has no effect if hot_standby
> <http://www.postgresql.org/docs/9.2/static/runtime-config-replication.html#GUC-HOT-STANDBY>
> is not enabled, or if no recovery target is set.
>
>
>
> In my case hot_standby is not enabled.
>
>
Hi,

If you want to connect to your database in read only to check the
recovery point, you have to enable it. It doesn't have any impact when
the cluster is not in recovery mode.

Regards.

>
> Thanks,
>
> Murthy
>
>
>
>
>
>
>
>
>
> *From:*desmodemone [mailto:desmodemone@gmail.com] *Sent:* Saturday,
> February 22, 2014 10:34 AM *To:* Murthy Nunna *Cc:*
> pgsql-admin@postgresql.org *Subject:* Re: [ADMIN] PITR
>
>
>
>
>
>
>
> 2014-02-22 17:06 GMT+01:00 Murthy Nunna <mnunna@fnal.gov
> <mailto:mnunna@fnal.gov>>:
>
> All,
>
>
>
> I am testing PITR…. I am looking for recovery.conf parameters where
> you can recovery the WALs available in the restore_command but do
> not complete recovery. I want to be able to connect to the database
> and check database in read only and if I am not there yet, I will
> feed more WALs in the archive directory and resume recovery. I
> would like to prevent multiple base restorations. I want to roll
> forward with WALs but check in between. Also, I do not want to set
> up replication standby.
>
>
>
> Is it possible? If so, could you tell me what are the relevant
> recovery.conf params?
>
>
>
> Thanks,
>
> Murthy
>
>
>
> Hi Murthy, look at parameter "pause_at_recovery_target"  and
> "recovery_target_time" with these parameters in the recovery.conf
> you could reach a point in the timeline to recover the database and
> then see if it's all ok and then finish the recovery with
> pg_xlog_replay_resume. If you want instead continue the recovery
> you have to stop the backend and change the parameter of
> recovery_target_time in the recovery.conf and restart the
> postmaster.
>
> Mat DBA
>
>
>


- --
Julien Rouhaud
http://dalibo.com - http://dalibo.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTCN8GAAoJELGaJ8vfEpOqynAH/1Vu2gUwdDww26qbbDreXRid
kpWGUEI2YXBw4e6D3SFiFDG77aPwFF7aGDXs/3Rjck3erLaYrtj/70ZFEwXJ2d4K
W2hDS8KFt8cx6YsNMI2epL/FnDYZKudU2Qcceixlzf2gGSDeexyy/ZLdTOQMgXZF
D4ktuEOdpIxUDipbWe7af2TVzaiTqHEu64RtcmtgPWIlwJHdtShEYFo6Go0BryY9
ToU+8DU45x4tuH6msk94D/7/NQJnkMpPNztxAXXZsDp4Xwtrq3nA20187y69NVu8
uWSpJVFzfPZIV0Cl6vupyw2k6G9ObvTBWcpSPrixVjgcpiHh+15j49z+OTfSB1I=
=FR3m
-----END PGP SIGNATURE-----


Re: PITR

From
desmodemone
Date:



2014-02-22 18:31 GMT+01:00 Julien Rouhaud <julien.rouhaud@dalibo.com>:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Le 22/02/2014 18:03, Murthy Nunna a écrit :
> Mat,
>
>
>
> Thank you for your quick response….
>
>
>
> The documentation says  for pause_at_recovery_target:
>
>
>
> This setting has no effect if hot_standby
> <http://www.postgresql.org/docs/9.2/static/runtime-config-replication.html#GUC-HOT-STANDBY>
> is not enabled, or if no recovery target is set.
>
>
>
> In my case hot_standby is not enabled.
>
>
Hi,

If you want to connect to your database in read only to check the
recovery point, you have to enable it. It doesn't have any impact when
the cluster is not in recovery mode.

Regards.

>
> Thanks,
>
> Murthy
>
>
>
>
>
>
>
>
>
> *From:*desmodemone [mailto:desmodemone@gmail.com] *Sent:* Saturday,
> February 22, 2014 10:34 AM *To:* Murthy Nunna *Cc:*
> pgsql-admin@postgresql.org *Subject:* Re: [ADMIN] PITR
>
>
>
>
>
>
>
> 2014-02-22 17:06 GMT+01:00 Murthy Nunna <mnunna@fnal.gov
> <mailto:mnunna@fnal.gov>>:
>
> All,
>
>
>
> I am testing PITR…. I am looking for recovery.conf parameters where
> you can recovery the WALs available in the restore_command but do
> not complete recovery. I want to be able to connect to the database
> and check database in read only and if I am not there yet, I will
> feed more WALs in the archive directory and resume recovery. I
> would like to prevent multiple base restorations. I want to roll
> forward with WALs but check in between. Also, I do not want to set
> up replication standby.
>
>
>
> Is it possible? If so, could you tell me what are the relevant
> recovery.conf params?
>
>
>
> Thanks,
>
> Murthy
>
>
>
> Hi Murthy, look at parameter "pause_at_recovery_target"  and
> "recovery_target_time" with these parameters in the recovery.conf
> you could reach a point in the timeline to recover the database and
> then see if it's all ok and then finish the recovery with
> pg_xlog_replay_resume. If you want instead continue the recovery
> you have to stop the backend and change the parameter of
> recovery_target_time in the recovery.conf and restart the
> postmaster.
>
> Mat DBA
>
>
>


- --
Julien Rouhaud
http://dalibo.com - http://dalibo.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTCN8GAAoJELGaJ8vfEpOqynAH/1Vu2gUwdDww26qbbDreXRid
kpWGUEI2YXBw4e6D3SFiFDG77aPwFF7aGDXs/3Rjck3erLaYrtj/70ZFEwXJ2d4K
W2hDS8KFt8cx6YsNMI2epL/FnDYZKudU2Qcceixlzf2gGSDeexyy/ZLdTOQMgXZF
D4ktuEOdpIxUDipbWe7af2TVzaiTqHEu64RtcmtgPWIlwJHdtShEYFo6Go0BryY9
ToU+8DU45x4tuH6msk94D/7/NQJnkMpPNztxAXXZsDp4Xwtrq3nA20187y69NVu8
uWSpJVFzfPZIV0Cl6vupyw2k6G9ObvTBWcpSPrixVjgcpiHh+15j49z+OTfSB1I=
=FR3m
-----END PGP SIGNATURE-----


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

+1 for Julien

Mat DBA


Re: PITR

From
Murthy Nunna
Date:

Hi Mat,

 

Thank you for the pointers on pause_at_recovery_target and recovery_target_time. It worked but I encountered an unexpected situation.

 

I wanted to test three recovery times by checking data at each point and then proceed to the next. It worked as expected first 2 recovery times but the last one did not give me an opportunity to check data. It simply completed recovery and switched timeline. This means I cannot rollforward anymore unless I restore the database again. Do you think I did something wrong?

 

Thanks,

Murthy

 

Following are my recovery.conf settings:

 

pause_at_recovery_target = true

#recovery_target_time = '2014-02-22 19:15:00'

#recovery_target_time = '2014-02-22 19:44:00'

recovery_target_time = '2014-02-22 19:50:00'

restore_command = 'cp /pgdata/backups/xlogs/minerva_ecl_test/%f %p'

 

Following is the pg_log of my 3rd recovery:

 

,2014-02-22 22:04:16 CSTLOG:  database system was shut down in recovery at 2014-02-22 22:02:42 CST

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004.history" from archive

,2014-02-22 22:04:16 CSTLOG:  starting point-in-time recovery to 2014-02-22 19:50:00-06

,2014-02-22 22:04:16 CSTLOG:  restored log file "000000040000000800000004" from archive

,2014-02-22 22:04:16 CSTLOG:  redo starts at 8/4000020

,2014-02-22 22:04:16 CSTLOG:  restored log file "000000040000000800000005" from archive

,2014-02-22 22:04:16 CSTLOG:  consistent recovery state reached at 8/5F51EA8

,2014-02-22 22:04:16 CSTLOG:  database system is ready to accept read only connections

,2014-02-22 22:04:16 CSTLOG:  restored log file "000000040000000800000006" from archive

,2014-02-22 22:04:16 CSTLOG:  restored log file "000000040000000800000007" from archive

,2014-02-22 22:04:16 CSTLOG:  restored log file "000000040000000800000008" from archive

,2014-02-22 22:04:16 CSTLOG:  restored log file "000000040000000800000009" from archive

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000A" from archive

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000B" from archive

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000C" from archive

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000004000000080000000D': No such file or directory

,2014-02-22 22:04:16 CSTLOG:  unexpected pageaddr 7/BC000000 in log file 8, segment 13, offset 0

,2014-02-22 22:04:16 CSTLOG:  redo done at 8/C0000B8

,2014-02-22 22:04:16 CSTLOG:  last completed transaction was at log time 2014-02-22 19:48:29.205898-06

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000C" from archive

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000005.history': No such file or directory

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000006.history': No such file or directory

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000007.history': No such file or directory

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000008.history': No such file or directory

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000009.history': No such file or directory

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/0000000A.history': No such file or directory

,2014-02-22 22:04:16 CSTLOG:  selected new timeline ID: 10

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004.history" from archive

,2014-02-22 22:04:17 CSTLOG:  archive recovery complete

,2014-02-22 22:04:17 CSTLOG:  database system is ready to accept connections

,2014-02-22 22:04:17 CSTLOG:  autovacuum launcher started

 

 

 

 

 

From: desmodemone [mailto:desmodemone@gmail.com]
Sent: Saturday, February 22, 2014 10:34 AM
To: Murthy Nunna
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

 

 

2014-02-22 17:06 GMT+01:00 Murthy Nunna <mnunna@fnal.gov>:

All,

 

I am testing PITR…. I am looking for recovery.conf parameters where you can recovery the WALs available in the restore_command but do not complete recovery. I want to be able to connect to the database and check database in read only and if I am not there yet, I will feed more WALs in the archive directory and resume recovery. I would like to prevent multiple base restorations. I want to roll forward with WALs but check in between. Also, I do not want to set up replication standby.

 

Is it possible? If so, could you tell me what are the relevant recovery.conf params?

 

Thanks,

Murthy

 

Hi Murthy,      
               look at parameter "pause_at_recovery_target"  and "recovery_target_time" with these parameters in the recovery.conf you could reach a point in the timeline to recover the database and then see if it's all ok and then finish the recovery with pg_xlog_replay_resume. If you want instead continue the recovery you have to stop the backend and change the parameter of recovery_target_time in the recovery.conf and restart the postmaster.

Mat DBA

 

Re: PITR

From
Raghavendra
Date:
On Sun, Feb 23, 2014 at 9:57 AM, Murthy Nunna <mnunna@fnal.gov> wrote:

Hi Mat,

 

Thank you for the pointers on pause_at_recovery_target and recovery_target_time. It worked but I encountered an unexpected situation.

 

I wanted to test three recovery times by checking data at each point and then proceed to the next. It worked as expected first 2 recovery times but the last one did not give me an opportunity to check data. It simply completed recovery and switched timeline. This means I cannot rollforward anymore unless I restore the database again. Do you think I did something wrong?


You did right, for the first two time targets you have some wal files pending hence pause is allowed. Whereas for the last time target there are no archive files to pause so it has just opened the database as completion of PITR.

I am able to reproduce this on my local as well.

This is from my logs:

2014-02-06 18:54:54 PST-11634---[] LOG:  restored log file "0000000100000001000000B8" from archive
2014-02-06 18:54:55 PST-11634---[] LOG:  restored log file "0000000100000001000000B9" from archive
2014-02-06 18:54:55 PST-11634---[] LOG:  consistent recovery state reached at 1/B90029C8
2014-02-06 18:54:55 PST-11632---[] LOG:  database system is ready to accept read only connections
cp: cannot stat `/opt/PostgreSQL/9.3/archives93/0000000100000001000000BA': No such file or directory

B8,B9 applied and looking for next file BA which is not there and no point of pausing without any files, hence it has opened the database. But my guess is, what it has done is right, there no more files to pause and allow you to query, its like completion of PITR.

Same in your case, there may not be any files in your archives directory. You can check for '00000004000000080000000D' file in your archive directory.

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000C" from archive

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000004000000080000000D': No such file or directory


---
Regards,
Raghavendra
EnterpriseDB Corporation

Re: PITR

From
desmodemone
Date:



2014-02-23 7:08 GMT+01:00 Raghavendra <raghavendra.rao@enterprisedb.com>:
On Sun, Feb 23, 2014 at 9:57 AM, Murthy Nunna <mnunna@fnal.gov> wrote:

Hi Mat,

 

Thank you for the pointers on pause_at_recovery_target and recovery_target_time. It worked but I encountered an unexpected situation.

 

I wanted to test three recovery times by checking data at each point and then proceed to the next. It worked as expected first 2 recovery times but the last one did not give me an opportunity to check data. It simply completed recovery and switched timeline. This means I cannot rollforward anymore unless I restore the database again. Do you think I did something wrong?


You did right, for the first two time targets you have some wal files pending hence pause is allowed. Whereas for the last time target there are no archive files to pause so it has just opened the database as completion of PITR.

I am able to reproduce this on my local as well.

This is from my logs:

2014-02-06 18:54:54 PST-11634---[] LOG:  restored log file "0000000100000001000000B8" from archive
2014-02-06 18:54:55 PST-11634---[] LOG:  restored log file "0000000100000001000000B9" from archive
2014-02-06 18:54:55 PST-11634---[] LOG:  consistent recovery state reached at 1/B90029C8
2014-02-06 18:54:55 PST-11632---[] LOG:  database system is ready to accept read only connections
cp: cannot stat `/opt/PostgreSQL/9.3/archives93/0000000100000001000000BA': No such file or directory

B8,B9 applied and looking for next file BA which is not there and no point of pausing without any files, hence it has opened the database. But my guess is, what it has done is right, there no more files to pause and allow you to query, its like completion of PITR.

Same in your case, there may not be any files in your archives directory. You can check for '00000004000000080000000D' file in your archive directory.

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000C" from archive

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000004000000080000000D': No such file or directory


---
Regards,
Raghavendra
EnterpriseDB Corporation


Raghavendra is right,  the recovery applied successful your archived wal segment to the database and ended the loop in the xlog.c when it not found anymore records in the wal segments. Moreover the backend is  telling you that the recovery phase ended at "2014-02-22 19:48:29.205898-06" while your target timeline was "2014-02-22 19:50:00" and it's interesting.
Did you create transactions and you committed them at that time or after ?

Normally, when you have to do a restore,  probably you had a crash and your filesystem is not ok, so  you use a new filesystem on another storage. So your last transactions are "lost", because those transactions was not still archived, infact those transactions were in the last wal segment of the crashed filesystem. If that file is not OK, you could not use it for complete the recovery.


If you have a RPO policy , you have to look at parameter archive_timeout , so you are sure your wal segments will be archived, and if you do that and your database does not create so much transactions , think about to compress them or use data deduplication at storage level or filesystem level , or you will have a lot of wasted space ( the wal segment even if is not full will write 16Mb of <data>+<00000..000>)


Have a nice day



Mat Dba

Re: PITR

From
Murthy Nunna
Date:

Raghavendra,

 

Thanks for testing and confirming the behavior of “pause” setting.

 

While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using “pause” setting, no matter what, I expect the recovery to wait for manual intervention. I myself can come up with number of reasons for doing so… e.g I may be purposely “hiding” some WALs somewhere else, or maybe I have several thousands of WALs that I want to parallelize the process of applying some logs while I recall some from tapes.

 

Let me know what you think.

 

Mat, This is a test database so I purposely lowered checkpoint/archive timeouts. Thank-you, I’ll follow your advice for production systems to conserve space.

 

Thanks,

Murthy

 

 

 

From: desmodemone [mailto:desmodemone@gmail.com]
Sent: Sunday, February 23, 2014 4:13 AM
To: Raghavendra
Cc: Murthy Nunna; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

 

 

2014-02-23 7:08 GMT+01:00 Raghavendra <raghavendra.rao@enterprisedb.com>:

On Sun, Feb 23, 2014 at 9:57 AM, Murthy Nunna <mnunna@fnal.gov> wrote:

Hi Mat,

 

Thank you for the pointers on pause_at_recovery_target and recovery_target_time. It worked but I encountered an unexpected situation.

 

I wanted to test three recovery times by checking data at each point and then proceed to the next. It worked as expected first 2 recovery times but the last one did not give me an opportunity to check data. It simply completed recovery and switched timeline. This means I cannot rollforward anymore unless I restore the database again. Do you think I did something wrong?

 

You did right, for the first two time targets you have some wal files pending hence pause is allowed. Whereas for the last time target there are no archive files to pause so it has just opened the database as completion of PITR.

 

I am able to reproduce this on my local as well.

 

This is from my logs:

 

2014-02-06 18:54:54 PST-11634---[] LOG:  restored log file "0000000100000001000000B8" from archive

2014-02-06 18:54:55 PST-11634---[] LOG:  restored log file "0000000100000001000000B9" from archive

2014-02-06 18:54:55 PST-11634---[] LOG:  consistent recovery state reached at 1/B90029C8

2014-02-06 18:54:55 PST-11632---[] LOG:  database system is ready to accept read only connections

cp: cannot stat `/opt/PostgreSQL/9.3/archives93/0000000100000001000000BA': No such file or directory

 

B8,B9 applied and looking for next file BA which is not there and no point of pausing without any files, hence it has opened the database. But my guess is, what it has done is right, there no more files to pause and allow you to query, its like completion of PITR.

 

Same in your case, there may not be any files in your archives directory. You can check for '00000004000000080000000D' file in your archive directory.

 

,2014-02-22 22:04:16 CSTLOG:  restored log file "00000004000000080000000C" from archive

cp: cannot stat `/pgdata/backups/xlogs/minerva_ecl_test/00000004000000080000000D': No such file or directory

 

---

Regards,

Raghavendra

EnterpriseDB Corporation



Raghavendra is right,  the recovery applied successful your archived wal segment to the database and ended the loop in the xlog.c when it not found anymore records in the wal segments. Moreover the backend is  telling you that the recovery phase ended at "
2014-02-22 19:48:29.205898-06" while your target timeline was "2014-02-22 19:50:00" and it's interesting.

Did you create transactions and you committed them at that time or after ?

Normally, when you have to do a restore,  probably you had a crash and your filesystem is not ok, so  you use a new filesystem on another storage. So your last transactions are "lost", because those transactions was not still archived, infact those transactions were in the last wal segment of the crashed filesystem. If that file is not OK, you could not use it for complete the recovery.

If you have a RPO policy , you have to look at parameter archive_timeout , so you are sure your wal segments will be archived, and if you do that and your database does not create so much transactions , think about to compress them or use data deduplication at storage level or filesystem level , or you will have a lot of wasted space ( the wal segment even if is not full will write 16Mb of <data>+<00000..000>)

 

Have a nice day


Mat Dba

Re: PITR

From
bricklen
Date:

On Sun, Feb 23, 2014 at 7:20 AM, Murthy Nunna <mnunna@fnal.gov> wrote:
While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using “pause” setting, no matter what, I expect the recovery to wait for manual intervention.

Have you explored using "pg_xlog_replay_pause()" and "pg_xlog_replay_resume()"?  See http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL-TABLE

Re: PITR

From
Raghavendra
Date:
On Sun, Feb 23, 2014 at 8:50 PM, Murthy Nunna <mnunna@fnal.gov> wrote:

Raghavendra,

 

Thanks for testing and confirming the behavior of “pause” setting.

 

While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using “pause” setting, no matter what, I expect the recovery to wait for manual intervention.


I very much agree with your point that it has to pause when you ask for it, however, as per design (some other might comment on this well) am guessing it will open the database if no wals are there though you intentionally hide them. 

You can use (HOT STANDBY) standby_mode=on which does the same thing, it just waits for the WAL files but it won't open the database until you pass the trigger file. In hot standby, it apply the existing wals fed and wait for coming wals and it won't come out of recovery.  This you can try with below link.

 

I myself can come up with number of reasons for doing so… e.g I may be purposely “hiding” some WALs somewhere else, or maybe I have several thousands of WALs that I want to parallelize the process of applying some logs while I recall some from tapes.

 

Let me know what you think.


Agreed it might be possible of not having wals at the moment and waiting for them to copy, however, I prefer in that case to use hot_standby. Pause just works in case if it sees some pending file in Arch.. location.

My explanation might not reach to your expectation, but I am sure few other's here might share their inputs.
--Raghav

Re: PITR

From
Murthy Nunna
Date:

Hi Raghavendra,

 

I used standby_mode=on and it worked. I can put checkpoints (not database checkpoint J) in between and still be in recovery state. This is what I wanted.

 

Thanks for your help!

 

Murthy

 

 

From: Raghavendra [mailto:raghavendra.rao@enterprisedb.com]
Sent: Monday, February 24, 2014 6:31 AM
To: Murthy Nunna
Cc: desmodemone; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

On Sun, Feb 23, 2014 at 8:50 PM, Murthy Nunna <mnunna@fnal.gov> wrote:

Raghavendra,

 

Thanks for testing and confirming the behavior of “pause” setting.

 

While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using “pause” setting, no matter what, I expect the recovery to wait for manual intervention.

 

I very much agree with your point that it has to pause when you ask for it, however, as per design (some other might comment on this well) am guessing it will open the database if no wals are there though you intentionally hide them. 

 

You can use (HOT STANDBY) standby_mode=on which does the same thing, it just waits for the WAL files but it won't open the database until you pass the trigger file. In hot standby, it apply the existing wals fed and wait for coming wals and it won't come out of recovery.  This you can try with below link.

 

 

I myself can come up with number of reasons for doing so… e.g I may be purposely “hiding” some WALs somewhere else, or maybe I have several thousands of WALs that I want to parallelize the process of applying some logs while I recall some from tapes.

 

Let me know what you think.

 

Agreed it might be possible of not having wals at the moment and waiting for them to copy, however, I prefer in that case to use hot_standby. Pause just works in case if it sees some pending file in Arch.. location.

 

My explanation might not reach to your expectation, but I am sure few other's here might share their inputs.

--Raghav

 

Re: PITR

From
Murthy Nunna
Date:

All,

 

My database is in continuous recovery but it is not a traditional standby setup…. I am manually copying WALs (several thousands) from the source server to the server I am recovering. This means recovery is being done while WALs are being copied. It could so happen that a particular WAL could be open for recovery while it is still being copied (and not closed). Is this ok? I am asking because, I am seeing lots of LOG messages “unexpected pageaddr” as below.

I can do this differently if this method is not supported. I don’t want to end up with a corrupted database.

 

Thanks in advance for your advice.

 

2014-02-25 23:24:27 CST []LOG:  unexpected pageaddr 15D1/6E000000 in log file 5587, segment 22, offset 0

cp: cannot stat `/data1/pg_archlogs/ifb_prd/00000003000015D300000016': No such file or directory

2014-02-25 23:24:31 CST []LOG:  restored log file "00000003000015D300000016" from archive

2014-02-25 23:24:31 CST []LOG:  restored log file "00000003000015D300000017" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D300000018" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D300000019" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001A" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001B" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001C" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001D" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001E" from archive

2014-02-25 23:24:33 CST []LOG:  unexpected pageaddr 15D1/77000000 in log file 5587, segment 31, offset 0

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D30000001F" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000020" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000021" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000022" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000023" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000024" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000025" from archive

2014-02-25 23:24:37 CST []LOG:  unexpected pageaddr 15D1/7E000000 in log file 5587, segment 38, offset 0

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000026" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000027" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000028" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000029" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D30000002A" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D30000002B" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D30000002C" from archive

2014-02-25 23:24:42 CST []LOG:  restored log file "00000003000015D30000002D" from archive

2014-02-25 23:24:42 CST []LOG:  restored log file "00000003000015D30000002E" from archive

2014-02-25 23:24:42 CST []LOG:  unexpected pageaddr 15D1/26000000 in log file 5587, segment 47, offset 0

 

 

 

 

 

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Murthy Nunna
Sent: Monday, February 24, 2014 12:14 PM
To: Raghavendra
Cc: desmodemone; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

Hi Raghavendra,

 

I used standby_mode=on and it worked. I can put checkpoints (not database checkpoint J) in between and still be in recovery state. This is what I wanted.

 

Thanks for your help!

 

Murthy

 

 

From: Raghavendra [mailto:raghavendra.rao@enterprisedb.com]
Sent: Monday, February 24, 2014 6:31 AM
To: Murthy Nunna
Cc: desmodemone; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

On Sun, Feb 23, 2014 at 8:50 PM, Murthy Nunna <mnunna@fnal.gov> wrote:

Raghavendra,

 

Thanks for testing and confirming the behavior of “pause” setting.

 

While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using “pause” setting, no matter what, I expect the recovery to wait for manual intervention.

 

I very much agree with your point that it has to pause when you ask for it, however, as per design (some other might comment on this well) am guessing it will open the database if no wals are there though you intentionally hide them. 

 

You can use (HOT STANDBY) standby_mode=on which does the same thing, it just waits for the WAL files but it won't open the database until you pass the trigger file. In hot standby, it apply the existing wals fed and wait for coming wals and it won't come out of recovery.  This you can try with below link.

 

 

I myself can come up with number of reasons for doing so… e.g I may be purposely “hiding” some WALs somewhere else, or maybe I have several thousands of WALs that I want to parallelize the process of applying some logs while I recall some from tapes.

 

Let me know what you think.

 

Agreed it might be possible of not having wals at the moment and waiting for them to copy, however, I prefer in that case to use hot_standby. Pause just works in case if it sees some pending file in Arch.. location.

 

My explanation might not reach to your expectation, but I am sure few other's here might share their inputs.

--Raghav

 

Re: PITR

From
Murthy Nunna
Date:

Could someone please comment on this? Thanks.

 

From: Murthy Nunna
Sent: Tuesday, February 25, 2014 11:33 PM
To: Murthy Nunna; Raghavendra
Cc: desmodemone; pgsql-admin@postgresql.org
Subject: RE: [ADMIN] PITR

 

All,

 

My database is in continuous recovery but it is not a traditional standby setup…. I am manually copying WALs (several thousands) from the source server to the server I am recovering. This means recovery is being done while WALs are being copied. It could so happen that a particular WAL could be open for recovery while it is still being copied (and not closed). Is this ok? I am asking because, I am seeing lots of LOG messages “unexpected pageaddr” as below.

I can do this differently if this method is not supported. I don’t want to end up with a corrupted database.

 

Thanks in advance for your advice.

 

2014-02-25 23:24:27 CST []LOG:  unexpected pageaddr 15D1/6E000000 in log file 5587, segment 22, offset 0

cp: cannot stat `/data1/pg_archlogs/ifb_prd/00000003000015D300000016': No such file or directory

2014-02-25 23:24:31 CST []LOG:  restored log file "00000003000015D300000016" from archive

2014-02-25 23:24:31 CST []LOG:  restored log file "00000003000015D300000017" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D300000018" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D300000019" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001A" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001B" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001C" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001D" from archive

2014-02-25 23:24:32 CST []LOG:  restored log file "00000003000015D30000001E" from archive

2014-02-25 23:24:33 CST []LOG:  unexpected pageaddr 15D1/77000000 in log file 5587, segment 31, offset 0

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D30000001F" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000020" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000021" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000022" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000023" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000024" from archive

2014-02-25 23:24:36 CST []LOG:  restored log file "00000003000015D300000025" from archive

2014-02-25 23:24:37 CST []LOG:  unexpected pageaddr 15D1/7E000000 in log file 5587, segment 38, offset 0

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000026" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000027" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000028" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D300000029" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D30000002A" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D30000002B" from archive

2014-02-25 23:24:41 CST []LOG:  restored log file "00000003000015D30000002C" from archive

2014-02-25 23:24:42 CST []LOG:  restored log file "00000003000015D30000002D" from archive

2014-02-25 23:24:42 CST []LOG:  restored log file "00000003000015D30000002E" from archive

2014-02-25 23:24:42 CST []LOG:  unexpected pageaddr 15D1/26000000 in log file 5587, segment 47, offset 0

 

 

 

 

 

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Murthy Nunna
Sent: Monday, February 24, 2014 12:14 PM
To: Raghavendra
Cc: desmodemone; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

Hi Raghavendra,

 

I used standby_mode=on and it worked. I can put checkpoints (not database checkpoint J) in between and still be in recovery state. This is what I wanted.

 

Thanks for your help!

 

Murthy

 

 

From: Raghavendra [mailto:raghavendra.rao@enterprisedb.com]
Sent: Monday, February 24, 2014 6:31 AM
To: Murthy Nunna
Cc: desmodemone; pgsql-admin@postgresql.org
Subject: Re: [ADMIN] PITR

 

On Sun, Feb 23, 2014 at 8:50 PM, Murthy Nunna <mnunna@fnal.gov> wrote:

Raghavendra,

 

Thanks for testing and confirming the behavior of “pause” setting.

 

While I understand your explanation, I feel I am still missing something. IMHO, when I say pause using “pause” setting, no matter what, I expect the recovery to wait for manual intervention.

 

I very much agree with your point that it has to pause when you ask for it, however, as per design (some other might comment on this well) am guessing it will open the database if no wals are there though you intentionally hide them. 

 

You can use (HOT STANDBY) standby_mode=on which does the same thing, it just waits for the WAL files but it won't open the database until you pass the trigger file. In hot standby, it apply the existing wals fed and wait for coming wals and it won't come out of recovery.  This you can try with below link.

 

 

I myself can come up with number of reasons for doing so… e.g I may be purposely “hiding” some WALs somewhere else, or maybe I have several thousands of WALs that I want to parallelize the process of applying some logs while I recall some from tapes.

 

Let me know what you think.

 

Agreed it might be possible of not having wals at the moment and waiting for them to copy, however, I prefer in that case to use hot_standby. Pause just works in case if it sees some pending file in Arch.. location.

 

My explanation might not reach to your expectation, but I am sure few other's here might share their inputs.

--Raghav

 

Re: PITR

From
Geoff Winkless
Date:
On 26 February 2014 13:55, Murthy Nunna <mnunna@fnal.gov> wrote:
Could someone please comment on this? Thanks.

Sounds like it's not doing what you want it to do, huh?

Geoff