Fwd: Problems waking up from a warm standby - Mailing list pgsql-admin

From Ori Garin
Subject Fwd: Problems waking up from a warm standby
Msg-id 9c64d57f0810300958l7f3465e8nb4724ad291fa4d47@mail.gmail.com
List pgsql-admin
Hi.

I have a problem with a standby server running on Windows 2003 R2, Enterprise x64 edition. I use Postgres 8.3 (installed to C:\Program Files (x86)\).
Everything was working fine (base backup, archiving, recovery), until I wanted to test failover.

To do that, I create a trigger file and the recovery command returns a nonzero exit code. I then get something like:

2008-10-29 16:39:23 CET LOG:  restored log file "0000000100000011000000CF" from archive
2008-10-29 16:42:25 CET LOG:  restored log file "0000000100000011000000D0" from archive
2008-10-29 16:45:56 CET LOG:  restored log file "0000000100000011000000D1" from archive
2008-10-29 16:49:02 CET LOG:  restored log file "0000000100000011000000D2" from archive
2008-10-29 16:58:45 CET LOG:  could not open file "pg_xlog/0000000100000011000000D3" (log file 17, segment 211): No such file or directory
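For reference, the "(log file 17, segment 211)" part of that message is just the file name read as hexadecimal. A minimal sketch, assuming the usual 24-hex-digit WAL naming of three 8-digit fields (timeline, log file, segment); the helper name parse_wal_name is made up for illustration:

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three 8-digit fields.

    Assumed layout: timeline, log file number, segment number.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


# The segment Postgres could not open in the log above.
print(parse_wal_name("0000000100000011000000D3"))  # (1, 17, 211)
```

So D3 is simply the segment that follows D2, the last one restored from the archive.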

Now postgres seems to be stuck, or dead. Using Process Explorer I see that one of the postgres.exe processes is running drwtsn32.exe (Dr Watson Postmortem debugger) indefinitely. If I kill drwtsn32, postgres dies too:

2008-10-29 16:58:45 CET LOG:  could not open file "pg_xlog/0000000100000011000000D3" (log file 17, segment 211): No such file or directory
2008-10-29 17:20:27 CET LOG:  startup process (PID 2952) was terminated by exception 0xC000000D
2008-10-29 17:20:27 CET HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
2008-10-29 17:20:27 CET LOG:  aborting startup due to startup process failure
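As the HINT suggests, the exception value can be looked up in ntstatus.h, where 0xC000000D is STATUS_INVALID_PARAMETER. A rough sketch of how the value breaks down, assuming the standard NTSTATUS bit layout (2 severity bits, 12 facility bits, 16 code bits); decode_ntstatus and its one-entry lookup table are illustrative only, not a complete mapping:

```python
# Severity lives in the top two bits of an NTSTATUS value.
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Only the value seen in the log above; a real table would be far larger.
KNOWN_NAMES = {0xC000000D: "STATUS_INVALID_PARAMETER"}


def decode_ntstatus(value):
    """Break a 32-bit NTSTATUS value into severity, facility, and code."""
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN_NAMES.get(value, "unknown"),
    }


print(decode_ntstatus(0xC000000D))
```

This only names the exception; it does not say which call received the invalid parameter.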

At first I thought the "could not open file..." message was the problem, because I figured Postgres must believe the recovery command succeeded; otherwise it wouldn't try to use the new WAL file. I'm no longer sure that's true.
As the recovery command, I first used my own (compiled) Perl script, then switched to pg_standby, and then went back to the script. Nothing works; even killing the restore_command process causes the same problem.
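To make the expected behaviour concrete, here is a minimal sketch of the kind of waiting restore script described above. It is written in Python rather than Perl and is not the actual script. The function name, argument order, poll interval, and exit codes are all assumptions for illustration: it waits for the requested segment to appear in the archive, copies it into place and returns 0, or returns nonzero as soon as the trigger file exists.

```python
import os
import shutil
import time


def restore_segment(archive_dir, wal_name, dest_path, trigger_path,
                    poll_seconds=5.0, max_polls=None):
    """Wait for wal_name to appear in archive_dir and copy it to dest_path.

    Returns 0 after a successful copy, 1 if the trigger file is found
    (failover requested), and 2 if max_polls is reached first.
    All names and return codes here are hypothetical.
    """
    polls = 0
    while True:
        # A trigger file means: stop waiting and let recovery finish.
        if os.path.exists(trigger_path):
            return 1
        src = os.path.join(archive_dir, wal_name)
        if os.path.isfile(src):
            # Note: this does not guard against a half-written archive file.
            shutil.copyfile(src, dest_path)
            return 0
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return 2
        time.sleep(poll_seconds)

# Hypothetical wiring as a restore_command target, with %f and %p passed in:
#   sys.exit(restore_segment(archive_dir, sys.argv[1], sys.argv[2], trigger_path))
```

With a script like this, the nonzero return on the trigger file is what should end recovery and bring the standby up, which is the step that fails here.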

I just noticed an Application Error in the Event Log:

Event Type:    Error
Event Source:    Application Error
Event Category:    (100)
Event ID:    1000
Date:        29-10-2008
Time:        16:58:47
User:        N/A
Computer:    S1217
Description:
Faulting application postgres.exe, version 8.3.1.876, faulting module msvcr80.dll, version 8.0.50727.762, fault address 0x0001e879.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 70 70 6c 69 63 61 74   Applicat
0008: 69 6f 6e 20 46 61 69 6c   ion Fail
0010: 75 72 65 20 20 70 6f 73   ure  pos
0018: 74 67 72 65 73 2e 65 78   tgres.ex
0020: 65 20 38 2e 33 2e 31 2e   e 8.3.1.
0028: 38 37 36 20 69 6e 20 6d   876 in m
0030: 73 76 63 72 38 30 2e 64   svcr80.d
0038: 6c 6c 20 38 2e 30 2e 35   ll 8.0.5
0040: 30 37 32 37 2e 37 36 32   0727.762
0048: 20 61 74 20 6f 66 66 73    at offs
0050: 65 74 20 30 30 30 31 65   et 0001e
0058: 38 37 39                  879    


Does anyone have any ideas about this?
Thanks in advance!

Ori
