Thread: standby waiting for what?
Testing pg_standby in 8.3.6. I've gotten this standby into some sort of bind. It seems like it may be waiting for some WAL. How can I tell what it is waiting on? I don't really know how this works, so I may
On Wed, 2009-03-04 at 15:06 -0500, Ray Stell wrote:
> Testing pg_standby in 8.3.6. I've gotten this standby into some sort of
> bind. It seems like it may be waiting for some WAL. How can I tell
> what it is waiting on? I don't really know how this works, so I may

Looks like you were cut off a bit. What do the logs say, and what does your ps output on the standby show?

Joshua D. Drake
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Wed, Mar 04, 2009 at 03:06:12PM -0500, Ray Stell wrote:
> Testing pg_standby in 8.3.6. I've gotten this standby into some sort of
> bind. It seems like it may be waiting for some WAL. How can I tell
> what it is waiting on? I don't really know how this works, so I may

say something silly. The standby log says:

,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG: database system was interrupted; last known up at 2009-03-04 12:20:29 EST
,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG: starting archive recovery
,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG: restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'

alerts_oamp]$ cat postmaster.pid
2510
/data/pgsql/alerts_oamp
5498001 4194312

alerts_oamp]$ ps -ef | grep 1005
1005 903 901 0 10:10 ? 00:00:00 sshd: postgresql@pts/0
1005 904 903 0 10:10 pts/0 00:00:00 -bash
1005 1016 1013 0 10:21 ? 00:00:00 sshd: postgresql@pts/1
1005 1017 1016 0 10:21 pts/1 00:00:00 -bash
1005 2510 1 0 12:23 pts/0 00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
1005 2511 2510 0 12:23 ? 00:00:00 postgres: logger process
1005 2512 2510 0 12:23 ? 00:00:00 postgres: startup process
1005 2520 2512 0 12:23 ? 00:00:00 sh -c /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log
1005 2521 2520 0 12:23 ? 00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000
1005 2615 1017 0 12:27 pts/1 00:00:00 tail -f alerts_oamp-2009-03-04_122301.log
1005 3271 904 0 15:11 pts/0 00:00:00 ps -ef
1005 3272 904 0 15:11 pts/0 00:00:00 grep 1005

alerts_oamp]$ ls -l /data/pgsql/wals/alerts_oamp/
total 114828
-rw------- 1 postgresql postgresql 16777216 Mar 4 11:28 00000002000000000000001A
-rw------- 1 postgresql postgresql 16777216 Mar 4 11:29 00000002000000000000001B
-rw------- 1 postgresql postgresql 16777216 Mar 4 12:24 00000002000000000000001C
-rw------- 1 postgresql postgresql 16777216 Mar 4 12:25 00000002000000000000001D
-rw------- 1 postgresql postgresql 16777216 Mar 4 12:26 00000002000000000000001E
-rw------- 1 postgresql postgresql 16777216 Mar 4 14:45 00000002000000000000001F
-rw------- 1 postgresql postgresql 16777216 Mar 4 14:45 000000020000000000000020

Any ideas what this guy is hurt by?
On Wed, 2009-03-04 at 15:14 -0500, Ray Stell wrote:
> say something silly. The standby log says:
>
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG: database system was interrupted; last known up at 2009-03-04 12:20:29 EST
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG: starting archive recovery
> ,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG: restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'

You've set archive_timeout?

http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Wed, Mar 04, 2009 at 08:31:16PM +0000, Simon Riggs wrote:
> You've set archive_timeout?

No, but new WAL files seem to be getting created and replicated to the standby; it is just the pg_standby command that seems snagged on something. I have probably done something dumb: ready, fire, aim. I'm more interested in how to analyze the state. Thanks.
For some reason it is looking for 00000002000000000000001C.00512178.backup file which is not the WAL file.

Are you sure that you made initial recovery properly?

Yauheni Labko (Eugene Lobko)
Junior System Administrator
Chapdelaine & Co.
(212)208-9150
On Wed, Mar 04, 2009 at 03:41:06PM -0500, Yauheni Labko wrote:
> For some reason it is looking for 00000002000000000000001C.00512178.backup
> file which is not the WAL file.
>
> Are you sure that you made initial recovery properly?

I could have fouled this in any number of ways. Like I said, I'm trying to understand how to analyze the situation and maybe learn something.

OK, so my recovery.conf is set like this:

restore_command='/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'

So, the %f arg sent to the pg_standby command has a value of 00000002000000000000001C.00512178.backup, right? Is that wrong? If so, where could that have come from, or how could I have trashed the thing?

I love fishing.
No. %f is the WAL filename which is needed by the server to start recovery.

00000002000000000000001C.00512178.backup is a backup history file. It gives you the start and end WAL locations of the backup, the name of the WAL file containing the start location, and your label to identify the backup. That's why I asked you about your backup.

What is the archive_command for the primary server?

Yauheni Labko (Eugene Lobko)
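The file name discussed above can be taken apart mechanically. The following is a minimal sketch, not PostgreSQL code: `parse_backup_history_name` is a hypothetical helper, and it assumes the 8.3 naming scheme (24 hex digits of WAL segment name made of timeline, log id and segment number, then an 8 hex digit offset, then `.backup`) together with the default 16 MB segment size seen in the directory listing.

```python
import re

# <timeline><log id><segment>.<offset>.backup, all fields 8 hex digits
# (assumed 8.3-style naming; the helper name is made up for illustration).
_BACKUP_RE = re.compile(
    r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})\.([0-9A-F]{8})\.backup$"
)


def parse_backup_history_name(name):
    """Split a backup history file name into its parts, or return None."""
    m = _BACKUP_RE.match(name)
    if m is None:
        return None
    timeline, log_id, segment, offset = (int(g, 16) for g in m.groups())
    return {
        # The WAL segment that contains the start of the backup.
        "wal_segment": name[:24],
        "timeline": timeline,
        "log_id": log_id,
        "segment": segment,
        "offset": offset,
        # "log/offset" notation as used in server log messages,
        # assuming 16 MB (1 << 24 bytes) segments.
        "start_location": "%X/%X" % (log_id, (segment << 24) | offset),
    }


info = parse_backup_history_name("00000002000000000000001C.00512178.backup")
print(info["wal_segment"], info["segment"], info["start_location"])
```

For the name from this thread, the segment part decodes to log file 0, segment 28, which is the segment 00000002000000000000001C already present in the archive directory.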
BTW, I think you may remove %r from the restore command.

Yauheni Labko (Eugene Lobko)
On Wed, Mar 04, 2009 at 05:05:19PM -0500, Yauheni Labko wrote:
> No. %f is the WAL filename which is needed by the server to start recovery.

My recovery command is:

restore_command='/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'

The process that is running is, from the ps command:

sh -c /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log

There are 4 args to pg_standby:

1. /data/pgsql/wals/alerts_oamp = the archive dir
2. 00000002000000000000001C.00512178.backup = %f
3. pg_xlog/RECOVERYHISTORY = %p
4. 000000000000000000000000 = %r

Looks like %f to me. Right? So, how could it get set to a funky value like that?

> 00000002000000000000001C.00512178.backup will give you start and end of WAL
> segment, the WAL filename containing this segment and your label to identify
> where it might be. That's why I asked you about your backup.

I don't follow you here. How did you know that the string 00000002000000000000001C.00512178.backup will provide the start and end of the WAL segment? Please explain how you determined this.

I suspect the standby is waiting for a WAL file named 00000002000000000000001C, and that file exists on the standby:

ls -l /data/pgsql/wals/alerts_oamp/
total 114828
-rw------- 1 postgresql postgresql 16777216 Mar 4 11:28 00000002000000000000001A
-rw------- 1 postgresql postgresql 16777216 Mar 4 11:29 00000002000000000000001B
-rw------- 1 postgresql postgresql 16777216 Mar 4 12:24 00000002000000000000001C
-rw------- 1 postgresql postgresql 16777216 Mar 4 12:25 00000002000000000000001D
-rw------- 1 postgresql postgresql 16777216 Mar 4 12:26 00000002000000000000001E
-rw------- 1 postgresql postgresql 16777216 Mar 4 14:45 00000002000000000000001F
-rw------- 1 postgresql postgresql 16777216 Mar 4 14:45 000000020000000000000020

But somehow I confused pg, and the name being passed to pg_standby is wrong. I have a suspicion that if I mv /data/pgsql/wals/alerts_oamp/00000002000000000000001C to /data/pgsql/wals/alerts_oamp/00000002000000000000001C.00512178.backup, the recovery would pick up. If there is no understanding to be gained here, I might do that and then throw this out with the bath water and start over.

> What is the archive_command for primary server?

The archive command is working fine, as you see above. It just runs a local script that does an rsync of the pg_xlog WAL file over to the standby.

I agree with you, something about the backup must have caused this confusion. I'm just rsyncing the cluster dir from the primary to the standby, removing the pg_xlog files, and setting recovery.conf. Is there a better way?
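The argument mapping worked out by hand above can also be done programmatically. This is a rough sketch under stated assumptions: `split_pg_standby_args` and `classify_request` are hypothetical helpers, and the command line is assumed to have exactly the shape shown in the ps output (no pg_standby option flags, four positional arguments).

```python
import shlex


def split_pg_standby_args(cmdline):
    """Map a pg_standby command line, as shown by ps, to its four arguments.

    Assumes the form used in this thread: no option flags, and the
    positional arguments are the archive dir, %f, %p and %r.
    """
    words = shlex.split(cmdline)
    # Drop the shell redirection, present in the "sh -c" form.
    if ">>" in words:
        words = words[:words.index(">>")]
    # Find the pg_standby executable and take the four words after it.
    start = next(i for i, w in enumerate(words) if w.endswith("pg_standby"))
    archive_dir, f, p, r = words[start + 1:start + 5]
    return {"archive_dir": archive_dir, "%f": f, "%p": p, "%r": r}


def classify_request(requested):
    """Guess what kind of file the server asked for, based on its suffix."""
    if requested.endswith(".backup"):
        return "backup history file"
    if requested.endswith(".history"):
        return "timeline history file"
    return "WAL segment"


cmd = (
    "sh -c /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp "
    "00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY "
    "000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log"
)
args = split_pg_standby_args(cmd)
print(args["%f"], "->", classify_request(args["%f"]))
```

Read this way, the running process is waiting for a backup history file rather than for the WAL segment 00000002000000000000001C itself.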
> my recovery command is:
> restore_command='/usr/local/pgsql/bin/pg_standby
> /data/pgsql/wals/alerts_oamp %f %p %r >>
> /home/postgresql/log/alerts_oamp/recovery.log'

You gave the recovery command, not the archive command. Please also give a piece of the log file where the primary server archives WAL files.

> I don't follow you here. How did you know that the string,
> 00000002000000000000001C.00512178.backup will provide the start and
> end of WAL segment? Please explain how you determined this?

*.backup files are ASCII files. They're described in section 24.3.2 of the postgres documentation.

> I have a suspicion that if I mv
> /data/pgsql/wals/alerts_oamp/00000002000000000000001C to
> /data/pgsql/wals/alerts_oamp/00000002000000000000001C.00512178.backup that
> the recovery would pick up. If there is no understanding to be gained
> here, I might do that and then throw this out with the bath water and start
> over.

Maybe that will fix the problem at this point. But you will never know why the database server wants to grab this file.

> confusion. I'm just rsyncing the cluster dir from the primary to the
> standby, removing pg_xlogs files, and setting recovery.conf. Is there
> a better way?

I think your problem is at this point. Did you switch the primary database server into backup mode? Where are the archived WAL files? Did you switch the primary database server back into normal operation mode after rsyncing?

Yauheni Labko (Eugene Lobko)
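The three questions above correspond to the usual base backup sequence: enter backup mode, copy the cluster, leave backup mode. The sketch below only builds the commands as argv lists and runs nothing; `base_backup_steps` is a hypothetical helper, and the host name `standby`, the label `standby_base` and the rsync exclusions are assumptions rather than details taken from this thread.

```python
def base_backup_steps(data_dir, standby_target, label):
    """Return the ordered commands for a base backup as argv lists.

    Nothing is executed here; the caller decides how to run them.
    """
    return [
        # 1. Put the primary into backup mode (this writes backup_label).
        ["psql", "-c", "SELECT pg_start_backup('%s');" % label],
        # 2. Copy the cluster directory, skipping WAL segments and the
        #    pid file of the running primary.
        ["rsync", "-a", "--exclude=pg_xlog/*", "--exclude=postmaster.pid",
         data_dir.rstrip("/") + "/", standby_target],
        # 3. Return the primary to normal operation. This step is what
        #    produces the backup history (*.backup) file for archiving.
        ["psql", "-c", "SELECT pg_stop_backup();"],
    ]


steps = base_backup_steps(
    "/data/pgsql/alerts_oamp",            # data directory from the thread
    "standby:/data/pgsql/alerts_oamp/",   # hypothetical rsync target
    "standby_base",                       # hypothetical backup label
)
for argv in steps:
    print(" ".join(argv))
```

If the third step is skipped, backup_label stays behind on the primary and no backup history file reaches the archive.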
On Thu, Mar 05, 2009 at 10:04:44AM -0500, Yauheni Labko wrote:
> You gave the recovery command, not the archive command.

I'm curious why you are focused on the archive on the primary. Is there some logic in the archive process that could cause the recovery to be looking for /data/pgsql/wals/alerts_oamp/00000002000000000000001C.00512178.backup? That doesn't make any sense to me. Are they not two completely separate worlds, or does the archive command somehow determine what args go to pg_standby?

The archive command is set to:

archive_command = '/home/postgresql/bin/archive.sh %p %f alerts_oamp'

That script in essence echoes some debug values and rsyncs the WAL file over:

echo "src - $SRC"
echo "destname - $NAME"
rsync --links -e 'ssh -c blowfish -ax' -avz "${SRC}" "${STBY}:${STBYDIR}/${NAME}"

Again, those files look fine on the standby; the names are identical to those in the primary pg_xlog. So, please explain your reasoning.

> And give a piece of
> log file where primary server performs archiving WAL files.

src - /data/pgsql/alerts_oamp/pg_xlog/00000002000000000000001F
destname - 00000002000000000000001F
Authorized use only! All sessions may be monitored or recorded.
building file list ... done
00000002000000000000001F
sent 4211135 bytes received 42 bytes 935817.11 bytes/sec
total size is 16777216 speedup is 3.98

src - /data/pgsql/alerts_oamp/pg_xlog/000000020000000000000020
destname - 000000020000000000000020
Authorized use only! All sessions may be monitored or recorded.
building file list ... done
000000020000000000000020
sent 4750728 bytes received 42 bytes 863776.36 bytes/sec
total size is 16777216 speedup is 3.53
> I'm curious why you are focused on the archive on the primary?

The Postgres docs do not say that you need a backup history file during initial recovery, but it is required during initial recovery. That's why I am focused on it. You lost 00000002000000000000001C.00512178.backup. That means your archiving is working wrong.

I simulated your situation.

Standby server (the WAL files location):

# ls -l /pg_restore/pg_xlog/backup_03012009_0200/
total 49216
-rw------- 1 postgres postgres 16777216 2009-03-01 02:03 0000000100000005000000B7
-rw------- 1 postgres postgres 256 2009-03-01 02:03 __0000000100000005000000B7.00718400.backup
-rw------- 1 postgres postgres 16777216 2009-03-01 12:02 0000000100000005000000B8
-rw------- 1 postgres postgres 16777216 2009-03-01 22:01 0000000100000005000000B9

My restore command (debug is on):

# cat recovery.conf
restore_command = '/opt/postgresplus/8.3/bin/pg_standby -d -r 3 -w 0 -s 10 -t /opt/postgresplus/8.3/data/pg_trigger /pg_restore/pg_xlog/backup_03012009_0200 %f %p'

The piece of serverlog (pay attention to the backup history filename and compare it with the ls -l output above):

Trigger file : /opt/postgresplus/8.3/data/pg_trigger
Waiting for WAL file : 0000000100000005000000B7.00718400.backup
WAL file path : /pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 10 seconds
Max wait interval : 0 forever
Command for restore : cp "/pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history : No cleanup required
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...

Now I renamed __0000000100000005000000B7.00718400.backup to 0000000100000005000000B7.00718400.backup and restarted the standby server:

Trigger file : /opt/postgresplus/8.3/data/pg_trigger
Waiting for WAL file : 0000000100000005000000B7.00718400.backup
WAL file path : /pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 10 seconds
Max wait interval : 0 forever
Command for restore : cp "/pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history : No cleanup required
running restore : OK
LOG: restored log file "0000000100000005000000B7.00718400.backup" from archive
Trigger file : /opt/postgresplus/8.3/data/pg_trigger
Waiting for WAL file : 0000000100000005000000B7
WAL file path : /pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 10 seconds
Max wait interval : 0 forever
Command for restore : cp "/pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7" "pg_xlog/RECOVERYXLOG"
Keep archive history : No cleanup required
running restore : OK
LOG: restored log file "0000000100000005000000B7" from archive
LOG: automatic recovery in progress
LOG: redo starts at 5/B7718440
.... (recovering from other WAL files)
WAL file not present yet. Checking for trigger file...

You can see that the database server first looks for a history file, which is not necessary, and the next file it looks for is the backup history file. That's why your standby server is waiting for it.

Yauheni Labko (Eugene Lobko)
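The rename experiment above suggests a simple way to answer "what is it waiting on?": take the file name pg_standby was asked for and compare it with the archive directory. A minimal sketch, assuming the requested name has already been read from the ps output or the debug log; `explain_wait` is a hypothetical helper and the temporary directory only imitates the simulated archive.

```python
import os
import tempfile


def explain_wait(archive_dir, requested):
    """Report whether the requested file is in the archive directory,
    and list other files that mention the same WAL segment name."""
    present = os.path.exists(os.path.join(archive_dir, requested))
    segment = requested[:24]  # the leading 24 hex digits name the segment
    related = sorted(
        name for name in os.listdir(archive_dir)
        if segment in name and name != requested
    )
    return {"requested": requested, "present": present, "related": related}


wanted = "0000000100000005000000B7.00718400.backup"
with tempfile.TemporaryDirectory() as archive:
    # Recreate the simulated archive: the segment plus a misnamed .backup file.
    for name in ("0000000100000005000000B7", "__" + wanted):
        open(os.path.join(archive, name), "w").close()
    before = explain_wait(archive, wanted)
    # Rename the backup history file to the name the server asks for.
    os.rename(os.path.join(archive, "__" + wanted),
              os.path.join(archive, wanted))
    after = explain_wait(archive, wanted)

print(before["present"], before["related"])
print(after["present"], after["related"])
```

Before the rename the requested file is reported as absent even though its WAL segment is there, which matches the "WAL file not present yet" loop in the first log.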
On Wed, Mar 04, 2009 at 03:14:51PM -0500, Ray Stell wrote:
> any ideas what this guy is hurt by?

I stumbled into the source of the problem. I hope somebody who knows the code can explain. I decided to bounce the primary just to see if it would make a difference in the standby. The primary would not restart:

,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,2,2009-03-06 10:34:01 EST,0, LOG: could not open file "pg_xlog/00000002000000000000001C" (log file 0, segment 28): No such file or directory
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,3,2009-03-06 10:34:01 EST,0, LOG: invalid checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,4,2009-03-06 10:34:01 EST,0, PANIC: could not locate required checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,5,2009-03-06 10:34:01 EST,0, HINT: If you are not restoring from a backup, try removing the file "/data/pgsql/alerts_oamp/backup_label".
,3093,,2009-03-06 10:34:01.910 EST,49b14269.c15,1,2009-03-06 10:34:01 EST,0, LOG: startup process (PID 3095) was terminated by signal 6: Aborted

So, I removed that file and restarted. Rebuilt the standby and all is well. So, why did that file muck up the standby and change the value pg was passing to pg_standby?

Thanks, looking forward to 8.4!
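Since the leftover backup_label turned out to be the trigger, a quick check for that file in a data directory may help spot the same situation earlier. This is a sketch only: `read_backup_label` is a hypothetical helper, the `(file ...)` pattern is an assumption about the label's START WAL LOCATION line, and the sample line written in the demo is illustrative rather than copied from the thread.

```python
import os
import re
import tempfile


def read_backup_label(data_dir):
    """Return details of backup_label in data_dir, or None if it is absent."""
    path = os.path.join(data_dir, "backup_label")
    if not os.path.isfile(path):
        return None
    with open(path) as fh:
        text = fh.read()
    # Assumed format: "START WAL LOCATION: <loc> (file <24 hex digits>)".
    m = re.search(r"\(file ([0-9A-F]{24})\)", text)
    return {"path": path, "start_wal_file": m.group(1) if m else None}


with tempfile.TemporaryDirectory() as demo_dir:
    before = read_backup_label(demo_dir)  # no label yet
    # Illustrative contents only, built from the segment name in the thread.
    with open(os.path.join(demo_dir, "backup_label"), "w") as fh:
        fh.write("START WAL LOCATION: 0/1C512178 (file 00000002000000000000001C)\n")
    after = read_backup_label(demo_dir)

print(before)
print(after["start_wal_file"])
```

A label found on a server that is not in the middle of a base backup would be worth investigating before the next restart.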