Thread: standby waiting for what?

From
Ray Stell
Date:
Testing pg_standby in 8.3.6.  I've gotten this standby into some sort of
bind.  It seems like it may be waiting for some WAL.   How can I tell
what it is waiting on?  I don't really know how this works, so I may

Re: standby waiting for what?

From
"Joshua D. Drake"
Date:
On Wed, 2009-03-04 at 15:06 -0500, Ray Stell wrote:
> Testing pg_standby in 8.3.6.  I've gotten this standby into some sort of
> bind.  It seems like it may be waiting for some WAL.   How can I tell
> what it is waiting on?  I don't really know how this works, so I may


Looks like you were cut off a bit. What do the logs say and your ps
output on the standby?

Joshua D. Drake


--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: standby waiting for what?

From
Ray Stell
Date:
On Wed, Mar 04, 2009 at 03:06:12PM -0500, Ray Stell wrote:
> Testing pg_standby in 8.3.6.  I've gotten this standby into some sort of
> bind.  It seems like it may be waiting for some WAL.   How can I tell
> what it is waiting on?  I don't really know how this works, so I may


say something silly.  The standby log says:

,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG:  database system was interrupted; last known up at 2009-03-04 12:20:29 EST
,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG:  starting archive recovery
,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG:  restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'


alerts_oamp]$ cat postmaster.pid
2510
/data/pgsql/alerts_oamp
  5498001   4194312

alerts_oamp]$ ps -ef | grep 1005
1005       903   901  0 10:10 ?        00:00:00 sshd: postgresql@pts/0
1005       904   903  0 10:10 pts/0    00:00:00 -bash
1005      1016  1013  0 10:21 ?        00:00:00 sshd: postgresql@pts/1
1005      1017  1016  0 10:21 pts/1    00:00:00 -bash
1005      2510     1  0 12:23 pts/0    00:00:00 /usr/local/pgsql836/bin/postgres -D /data/pgsql/alerts_oamp
1005      2511  2510  0 12:23 ?        00:00:00 postgres: logger process
1005      2512  2510  0 12:23 ?        00:00:00 postgres: startup process
1005      2520  2512  0 12:23 ?        00:00:00 sh -c /usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log
1005      2521  2520  0 12:23 ?        00:00:00 /usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000
1005      2615  1017  0 12:27 pts/1    00:00:00 tail -f alerts_oamp-2009-03-04_122301.log
1005      3271   904  0 15:11 pts/0    00:00:00 ps -ef
1005      3272   904  0 15:11 pts/0    00:00:00 grep 1005

alerts_oamp]$ ls -l /data/pgsql/wals/alerts_oamp/
total 114828
-rw------- 1 postgresql postgresql 16777216 Mar  4 11:28 00000002000000000000001A
-rw------- 1 postgresql postgresql 16777216 Mar  4 11:29 00000002000000000000001B
-rw------- 1 postgresql postgresql 16777216 Mar  4 12:24 00000002000000000000001C
-rw------- 1 postgresql postgresql 16777216 Mar  4 12:25 00000002000000000000001D
-rw------- 1 postgresql postgresql 16777216 Mar  4 12:26 00000002000000000000001E
-rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 00000002000000000000001F
-rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 000000020000000000000020

any ideas what this guy is hurt by?

Re: standby waiting for what?

From
Simon Riggs
Date:
On Wed, 2009-03-04 at 15:14 -0500, Ray Stell wrote:
> On Wed, Mar 04, 2009 at 03:06:12PM -0500, Ray Stell wrote:
> > Testing pg_standby in 8.3.6.  I've gotten this standby into some sort of
> > bind.  It seems like it may be waiting for some WAL.   How can I tell
> > what it is waiting on?  I don't really know how this works, so I may
>
>
> say something silly.  The standby log says:
>
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,1,2009-03-04 12:23:01 EST,0, LOG:  database system was interrupted; last known up at 2009-03-04 12:20:29 EST
> ,2512,,2009-03-04 12:23:01.483 EST,49aeb8f5.9d0,2,2009-03-04 12:23:01 EST,0, LOG:  starting archive recovery
> ,2512,,2009-03-04 12:23:01.484 EST,49aeb8f5.9d0,3,2009-03-04 12:23:01 EST,0, LOG:  restore_command = '/usr/local/pgsql/bin/pg_standby /data/pgsql/wals/alerts_oamp %f %p %r >> /home/postgresql/log/alerts_oamp/recovery.log'

You've set archive_timeout?

http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: standby waiting for what?

From
Ray Stell
Date:
On Wed, Mar 04, 2009 at 08:31:16PM +0000, Simon Riggs wrote:
> You've set archive_timeout?
>

No, but new WAL files seem to be getting created and replicated to the standby; it is just the
pg_standby command that seems snagged on something.  I have probably done something dumb (ready,
fire, aim).  I'm more interested in how to analyze the state.  Thanks.

Re: standby waiting for what?

From
Yauheni Labko
Date:
For some reason it is looking for 00000002000000000000001C.00512178.backup,
which is not a WAL file.

Are you sure that you performed the initial recovery properly?

Yauheni Labko (Eugene Lobko)
Junior System Administrator
Chapdelaine & Co.
(212)208-9150





Re: standby waiting for what?

From
Ray Stell
Date:
On Wed, Mar 04, 2009 at 03:41:06PM -0500, Yauheni Labko wrote:
> For some reason it is looking for 00000002000000000000001C.00512178.backup
> file which is not the WAL file.
>
> Are you sure that you made initial recovery properly?

I could have fouled this in any number of ways.  Like I said
I'm trying to understand how to analyze the situation and maybe
learn something.

OK, so my recovery.conf is set like this:

restore_command='/usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp %f %p %r >>
/home/postgresql/log/alerts_oamp/recovery.log'

So, the %f arg sent to the pg_standby command has a value of
00000002000000000000001C.00512178.backup, right?  Is that wrong?
If so, where could that have come from, or how could I have trashed
the thing?

I love fishing.

Re: standby waiting for what?

From
Yauheni Labko
Date:
No. %f is the name of the WAL file that the server needs to start recovery.
00000002000000000000001C.00512178.backup will give you the start and end of the
WAL segment, the name of the WAL file containing this segment, and your label to
identify where it might be. That's why I asked you about your backup.

What is the archive_command for primary server?


Yauheni Labko (Eugene Lobko)
Junior System Administrator
Chapdelaine & Co.
(212)208-9150





Re: standby waiting for what?

From
Yauheni Labko
Date:
By the way, I think you can remove %r from the restore command.

Yauheni Labko (Eugene Lobko)
Junior System Administrator
Chapdelaine & Co.
(212)208-9150

Re: standby waiting for what?

From
Ray Stell
Date:
On Wed, Mar 04, 2009 at 05:05:19PM -0500, Yauheni Labko wrote:
> No. %f is the WAL filename which is needed by the server to start recovery.

my recovery command is:
restore_command='/usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp %f %p %r >>
/home/postgresql/log/alerts_oamp/recovery.log'

the process that is running is, from the ps command:

sh -c /usr/local/pgsql/bin/pg_standby  /data/pgsql/wals/alerts_oamp 00000002000000000000001C.00512178.backup pg_xlog/RECOVERYHISTORY 000000000000000000000000 >> /home/postgresql/log/alerts_oamp/recovery.log

there are 4 args to pg_standby:

1. /data/pgsql/wals/alerts_oamp the archive dir
2. 00000002000000000000001C.00512178.backup = %f
3. pg_xlog/RECOVERYHISTORY = %p
4. 000000000000000000000000 = %r

Looks like %f to me.  Right?  So, how could it get set to a funky value like that?
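
The mapping above can also be checked mechanically. Below is a minimal sketch in Python, assuming the command line is taken from ps without the leading "sh -c" and that no extra pg_standby options precede the archive directory. The function names are made up for illustration and are not part of PostgreSQL; the name patterns follow the usual layout of WAL segment, backup history, and timeline history file names.

```python
import re

# Name patterns for files the server may request through restore_command.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")
BACKUP_HISTORY = re.compile(r"^[0-9A-F]{24}\.[0-9A-F]{8}\.backup$")
TIMELINE_HISTORY = re.compile(r"^[0-9A-F]{8}\.history$")


def classify_restore_request(filename):
    """Return a rough label for the %f value passed to restore_command."""
    if WAL_SEGMENT.match(filename):
        return "wal segment"
    if BACKUP_HISTORY.match(filename):
        return "backup history file"
    if TIMELINE_HISTORY.match(filename):
        return "timeline history file"
    return "unknown"


def split_pg_standby_args(cmdline):
    """Map a pg_standby command line (as shown by ps) to its four arguments."""
    # Drop the shell redirection, then split on whitespace.
    words = cmdline.split(">>")[0].split()
    # words[0] is the pg_standby path; the rest are archive dir, %f, %p, %r.
    archive_dir, f, p, r = words[1:5]
    return {"archive_dir": archive_dir, "%f": f, "%p": p, "%r": r}
```

Fed the pg_standby line from the ps output, this labels %f as a backup history file rather than a WAL segment, which is at least consistent with the RECOVERYHISTORY target in %p.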


> 00000002000000000000001C.00512178.backup will give you start and end of WAL
> segment, the WAL filename containing this segment and your label to identify
> where it might be. That's why I asked you about your backup.


I don't follow you here.  How did you know that the string
00000002000000000000001C.00512178.backup will provide the start and
end of the WAL segment?  Please explain how you determined this.

I suspect the standby is waiting for a WAL file named 00000002000000000000001C,
and that file exists on the standby:

ls -l /data/pgsql/wals/alerts_oamp/
total 114828
-rw------- 1 postgresql postgresql 16777216 Mar  4 11:28 00000002000000000000001A
-rw------- 1 postgresql postgresql 16777216 Mar  4 11:29 00000002000000000000001B
-rw------- 1 postgresql postgresql 16777216 Mar  4 12:24 00000002000000000000001C
-rw------- 1 postgresql postgresql 16777216 Mar  4 12:25 00000002000000000000001D
-rw------- 1 postgresql postgresql 16777216 Mar  4 12:26 00000002000000000000001E
-rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 00000002000000000000001F
-rw------- 1 postgresql postgresql 16777216 Mar  4 14:45 000000020000000000000020

but, somehow I confused pg and the name being passed to pg_standby is wrong.

I have a suspicion that if I mv /data/pgsql/wals/alerts_oamp/00000002000000000000001C
to /data/pgsql/wals/alerts_oamp/00000002000000000000001C.00512178.backup that the
recovery would pick up.  If there is no understanding to be gained here, I might do
that and then throw this out with the bath water and start over.

> What is the archive_command for primary server?

the archive command is working fine as you see above.  It just runs a local
script that does an rsync of the pg_xlog wal file over to the standby.

I agree with you, something about the backup must have caused this
confusion.  I'm just rsyncing the cluster dir from the primary to the
standby, removing pg_xlogs files, and setting recovery.conf.  Is there
a better way?

Re: standby waiting for what?

From
Yauheni Labko
Date:
> my recovery command is:
> restore_command='/usr/local/pgsql/bin/pg_standby
> /data/pgsql/wals/alerts_oamp %f %p %r >>
> /home/postgresql/log/alerts_oamp/recovery.log'

You gave the recovery command, not the archive command.  Please also post the
part of the log file where the primary server archives WAL files.

> I don't follow you here.  How did you know that the string,
> 00000002000000000000001C.00512178.backup will provide the start and
> end of WAL segment?  Please explain how you determined this?

*.backup files are ASCII files. They are described in section 24.3.2 of the
PostgreSQL documentation.
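
Since these are plain text files, they can be read with a few lines of code. Here is a rough sketch in Python, assuming the usual "KEY: value" layout (START WAL LOCATION, STOP WAL LOCATION, LABEL and so on); the exact fields may differ between versions, and the sample contents below are invented for illustration, not taken from this thread.

```python
def parse_backup_history(text):
    """Parse 'KEY: value' lines from a backup history file into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so timestamps keep their colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical contents, for illustration only.
sample = """START WAL LOCATION: 0/1C512178 (file 00000002000000000000001C)
STOP WAL LOCATION: 0/1C5121F0 (file 00000002000000000000001C)
CHECKPOINT LOCATION: 0/1C512178
START TIME: 2009-03-04 12:20:29 EST
LABEL: example
STOP TIME: 2009-03-04 12:22:00 EST
"""

info = parse_backup_history(sample)
print(info["START WAL LOCATION"])
print(info["LABEL"])
```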

> I have a suspicion that if I mv
> /data/pgsql/wals/alerts_oamp/00000002000000000000001C to
> /data/pgsql/wals/alerts_oamp/00000002000000000000001C.00512178.backup that
> the recovery would pick up.  If there is no understanding to be gained
> here, I might do that and then throw this out with the bath water and start
> over.

Maybe that will fix the problem for now, but you will never know why the
database server wants to grab this file.

> confusion.  I'm just rsyncing the cluster dir from the primary to the
> standby, removing pg_xlogs files, and setting recovery.conf.  Is there
> a better way?

I think your problem is at this step. Did you switch the primary database server
into backup mode? Where are the archived WAL files? Did you switch the primary
database server back into normal operation mode after rsyncing?
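
The order those questions point at can be written down as a dry-run outline. This Python sketch only builds the command lists and runs nothing; the label, host name, and paths are placeholders, and the rsync options are one possible choice rather than a recommendation. pg_start_backup and pg_stop_backup are the SQL functions that bracket a file-level copy of the cluster.

```python
def base_backup_steps(label, data_dir, standby_target):
    """Return the commands for a file-level base backup, in order (dry run)."""
    return [
        # 1. Put the primary into backup mode; this writes backup_label.
        ["psql", "-c", "SELECT pg_start_backup('%s');" % label],
        # 2. Copy the cluster directory while the primary keeps running.
        ["rsync", "-a", "--exclude=pg_xlog/*", data_dir + "/", standby_target],
        # 3. Leave backup mode; this removes backup_label and writes the
        #    backup history (*.backup) file, which is then archived.
        ["psql", "-c", "SELECT pg_stop_backup();"],
    ]


# Placeholder label, path and host, for illustration only.
for step in base_backup_steps("standby_base", "/data/pgsql/alerts_oamp",
                              "standby:/data/pgsql/alerts_oamp/"):
    print(" ".join(step))
```

If step 3 is skipped, backup_label stays behind on the primary and no backup history file reaches the archive.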

Yauheni Labko (Eugene Lobko)
Junior System Administrator
Chapdelaine & Co.
(212)208-9150

Re: standby waiting for what?

From
Ray Stell
Date:
On Thu, Mar 05, 2009 at 10:04:44AM -0500, Yauheni Labko wrote:
> You gave the recovery command, not the archive command.

I'm curious why you are focused on the archive on the primary?
Is there some logic in the archive process that could cause the recovery to
be looking for /data/pgsql/wals/alerts_oamp/00000002000000000000001C.00512178.backup?
That doesn't make any sense to me.  Are they not two completely separate
worlds, or does the archive command somehow determine what
args go to pg_standby?

The archive command is set to:
archive_command = '/home/postgresql/bin/archive.sh %p %f alerts_oamp'

that script in essence echos some debug values and rsyncs the WAL file
over.

echo "src - $SRC"
echo "destname - $NAME"
rsync --links -e 'ssh -c blowfish -ax' -avz "${SRC}" "${STBY}:${STBYDIR}/${NAME}"

Again, those files look fine on the standby, the names are identical to
those in the primary pg_xlogs.  So, please explain your reasoning.


> And give a piece of
> log file where primary server performs archiving WAL files.


src - /data/pgsql/alerts_oamp/pg_xlog/00000002000000000000001F
destname - 00000002000000000000001F
Authorized use only!  All sessions may be monitored or recorded.
building file list ... done
00000002000000000000001F

sent 4211135 bytes  received 42 bytes  935817.11 bytes/sec
total size is 16777216  speedup is 3.98
src - /data/pgsql/alerts_oamp/pg_xlog/000000020000000000000020
destname - 000000020000000000000020
Authorized use only!  All sessions may be monitored or recorded.
building file list ... done
000000020000000000000020

sent 4750728 bytes  received 42 bytes  863776.36 bytes/sec
total size is 16777216  speedup is 3.53



Re: standby waiting for what?

From
Yauheni Labko
Date:
> I'm curious why you are focused on the archive on the primary?

The Postgres docs do not say that you need a backup history file during
initial recovery, but it is required.  That's why I am focused on it.
You lost 00000002000000000000001C.00512178.backup. That means your archiving
is not working correctly.

I simulated your situation.

Standby server (the WAL files location)
# ls -l /pg_restore/pg_xlog/backup_03012009_0200/
total 49216
-rw------- 1 postgres postgres 16777216 2009-03-01 02:03 0000000100000005000000B7
-rw------- 1 postgres postgres      256 2009-03-01 02:03 __0000000100000005000000B7.00718400.backup
-rw------- 1 postgres postgres 16777216 2009-03-01 12:02 0000000100000005000000B8
-rw------- 1 postgres postgres 16777216 2009-03-01 22:01 0000000100000005000000B9

My restore command (debug is on):
# cat recovery.conf
restore_command = '/opt/postgresplus/8.3/bin/pg_standby -d -r 3 -w 0 -s 10 -t /opt/postgresplus/8.3/data/pg_trigger /pg_restore/pg_xlog/backup_03012009_0200 %f %p'

The piece of serverlog (pay attention to the backup history filename and
compare it with the ls -l output before):

Trigger file            : /opt/postgresplus/8.3/data/pg_trigger
Waiting for WAL file    : 0000000100000005000000B7.00718400.backup
WAL file path           : /pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 10 seconds
Max wait interval       : 0 forever
Command for restore     : cp "/pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : No cleanup required
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...



Now I renamed __0000000100000005000000B7.00718400.backup to
0000000100000005000000B7.00718400.backup and restarted the standby server:

Trigger file            : /opt/postgresplus/8.3/data/pg_trigger
Waiting for WAL file    : 0000000100000005000000B7.00718400.backup
WAL file path           : /pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 10 seconds
Max wait interval       : 0 forever
Command for restore     : cp "/pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7.00718400.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : No cleanup required
running restore         : OKLOG:  restored log file "0000000100000005000000B7.00718400.backup" from archive

Trigger file            : /opt/postgresplus/8.3/data/pg_trigger
Waiting for WAL file    : 0000000100000005000000B7
WAL file path           : /pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 10 seconds
Max wait interval       : 0 forever
Command for restore     : cp "/pg_restore/pg_xlog/backup_03012009_0200/0000000100000005000000B7" "pg_xlog/RECOVERYXLOG"
Keep archive history    : No cleanup required
running restore         : OKLOG:  restored log file "0000000100000005000000B7" from archive
LOG:  automatic recovery in progress
LOG:  redo starts at 5/B7718440

.... (recovering from other WAL files)

WAL file not present yet. Checking for trigger file...


You can see that the database server first looks for the history file, which
is not necessary, and next for the backup history file.
That's why your standby server is waiting for it.
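
To check the same thing on another standby, the requested name can be compared with what is actually in the archive directory. This is a small Python sketch, assuming the requested name has already been read from the pg_standby debug output ("Waiting for WAL file") or from the ps listing; the function name is made up for illustration.

```python
import os


def waiting_on(archive_dir, requested):
    """Report whether the file pg_standby was asked for exists in the archive."""
    path = os.path.join(archive_dir, requested)
    if os.path.exists(path):
        return "present: %s" % path
    # List near-misses, such as a renamed copy or the bare WAL segment
    # whose name the requested file starts with.
    prefix = requested.split(".")[0]
    similar = sorted(n for n in os.listdir(archive_dir) if prefix in n)
    return "missing: %s (similar: %s)" % (path, ", ".join(similar) or "none")
```

Run against the directory in the first listing above, it would report the .backup file as missing and show the renamed copy and the bare segment as near-misses.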

Yauheni Labko (Eugene Lobko)
Junior System Administrator
Chapdelaine & Co.
(212)208-9150

Re: standby waiting for what?

From
Ray Stell
Date:


I stumbled into the source of the problem.  I hope somebody who knows the code can explain.
I decided to bounce the primary just to see if it would make a difference in the standby.
The primary would not restart:

,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,2,2009-03-06 10:34:01 EST,0, LOG:  could not open file "pg_xlog/00000002000000000000001C" (log file 0, segment 28): No such file or directory
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,3,2009-03-06 10:34:01 EST,0, LOG:  invalid checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,4,2009-03-06 10:34:01 EST,0, PANIC:  could not locate required checkpoint record
,3095,,2009-03-06 10:34:01.910 EST,49b14269.c17,5,2009-03-06 10:34:01 EST,0, HINT:  If you are not restoring from a backup, try removing the file "/data/pgsql/alerts_oamp/backup_label".
,3093,,2009-03-06 10:34:01.910 EST,49b14269.c15,1,2009-03-06 10:34:01 EST,0, LOG:  startup process (PID 3095) was terminated by signal 6: Aborted

So, I removed that file and restarted.  Rebuilt the standby and all is well.  So, why did that file
muck up the standby and change the value pg was passing to pg_standby?
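
On the file name in the primary's log: the 24 hex digits and the "(log file 0, segment 28)" note describe the same segment. A minimal Python sketch of the decoding, assuming the usual layout of three 8-digit hex fields (timeline, log id, segment); the function name is made up for illustration.

```python
def decode_wal_name(name):
    """Split a 24-hex-digit WAL segment name into timeline, log id and segment."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits: %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


print(decode_wal_name("00000002000000000000001C"))  # (2, 0, 28)
```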

Thanks, looking forward to 8.4!