Errors during recovery of a postgres. Need some help understanding them... - Mailing list pgsql-general

From Dhaval Shah
Subject Errors during recovery of a postgres. Need some help understanding them...
Msg-id 565237760704091823v1f5527d3w74f93c1c7fd3040e@mail.gmail.com
Responses Re: Errors during recovery of a postgres. Need some help understanding them...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Here is the situation:

I have a standby postgres which is fed a WAL file every 2 minutes.
Whenever it is fed a WAL file, it logs the following:


---
LOG:  restored log file "000000010000000000000070" from archive
pg_restore::copyWALFile: Moving
/opt/data/mirror/000000010000000000000071 to pg_xlog/RECOVERYXLOG
LOG:  restored log file "000000010000000000000071" from archive
pg_restore::copyWALFile: Moving
/opt/data/mirror/000000010000000000000072 to pg_xlog/RECOVERYXLOG
LOG:  restored log file "000000010000000000000072" from archive
...
...
pg_restore::copyWALFile: Moving
/opt/data/mirror/000000010000000000000082 to pg_xlog/RECOVERYXLOG
LOG:  restored log file "000000010000000000000082" from archive
---

I assume that the above shows a happy postgres in recovery mode. The
"copyWALFile" line is my own message in the server log.

After a while, the primary gives up. That is, it goes down and I am no
longer able to pull any WAL file from it. So I tell the standby that
I do not have any more WAL files to give.

---
LOG:  could not open file "pg_xlog/000000010000000000000083" (log file
0, segment 131): No such file or directory
LOG:  redo done at 0/8200D280
Main: Triggering recovery
PANIC:  could not open file "pg_xlog/000000010000000000000082" (log
file 0, segment 130): No such file or directory
---

The issue above is that I do not have the "000000010000000000000083"
file, so I return "file not found". Then, when postgres asks me for
"000000010000000000000082" again, I do not have that either, because
in the intervening minutes I have moved that file out of
/opt/data/mirror to the /opt/data/tape directory for long-term tape
storage. So how do I make my standby postgres happy?
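
To make the failure mode concrete, here is a minimal sketch of a restore
helper that copies a segment out of the archive instead of moving it, so
the same file can be handed over a second time if the server asks for it
again. The function name restore_wal and its parameters are hypothetical
and are not the actual copyWALFile code. The only assumptions are that
the helper is given the WAL file name and the destination path, and that
a nonzero status is how "file not found" is reported.

```python
import shutil
from pathlib import Path

# Archive location taken from the paths shown in the log above.
ARCHIVE_DIR = Path("/opt/data/mirror")


def restore_wal(file_name: str, dest_path: str,
                archive_dir: Path = ARCHIVE_DIR) -> int:
    """Copy one WAL segment from the archive to dest_path.

    Returns 0 on success and 1 if the segment is not in the archive,
    mirroring the exit status a restore script would report.
    """
    src = archive_dir / file_name
    if not src.is_file():
        # Missing segment: report "file not found" with a nonzero status.
        return 1
    # Copy rather than move, so the archive still holds the segment
    # if it is requested a second time.
    shutil.copy2(src, dest_path)
    return 0
```

With a helper like this, a repeated request for "000000010000000000000082"
would still succeed as long as the file has not yet been moved out to
/opt/data/tape.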

Having run into that situation, the standby also spits out the following:

---
LOG:  could not open file "pg_xlog/000000010000000000000082" (log file
0, segment 130): No such file or directory
LOG:  invalid primary checkpoint record
LOG:  could not open file "pg_xlog/000000010000000000000080" (log file
0, segment 128): No such file or directory
LOG:  invalid secondary checkpoint record
---

What is happening is that postgres is looking back in time for
the "000000010000000000000082" and "000000010000000000000080" files.
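
As an aid to reading the messages above, here is a small sketch that
decodes the file names and the "redo done at" location. It assumes the
24-hex-digit naming seen in the log (8 digits each for timeline, log
file and segment) and the default 16 MB segment size. The function names
are mine, for illustration only.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_wal_name(name: str) -> tuple[int, int, int]:
    """Split a 24-hex-digit WAL file name into (timeline, log file, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def segment_of_location(location: str) -> tuple[int, int]:
    """Map a location such as '0/8200D280' to (log file, segment)."""
    log_hex, offset_hex = location.split("/")
    return int(log_hex, 16), int(offset_hex, 16) // SEGMENT_SIZE


print(parse_wal_name("000000010000000000000083"))  # (1, 0, 131)
print(segment_of_location("0/8200D280"))           # (0, 130)
```

Under those assumptions, "redo done at 0/8200D280" falls in log file 0,
segment 130, which is the "000000010000000000000082" file named in the
PANIC message.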

The question I have is: how far back in time does it look? I need to
know so that I can be careful about when I move a WAL file out to
tape. Further, if I know how far back I have to keep my WAL files,
then I can devise an effective strategy for removing older files. That
is, given that I generate a WAL file every 2 minutes, do I keep
10 files or 120 files?
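
Whatever the right retention window turns out to be, the move-to-tape
step could be driven by an "oldest segment still needed" name rather
than by a fixed file count. The sketch below assumes that name is
supplied by the caller; it does not answer how far back postgres looks.
The function name is hypothetical. It relies only on the fact that
fixed-width uppercase hex names on the same timeline sort in WAL order.

```python
def files_safe_to_move(names, oldest_needed):
    """Return WAL names strictly older than oldest_needed (same timeline)."""
    timeline = oldest_needed[:8]
    return sorted(
        n for n in names
        if len(n) == 24 and n[:8] == timeline and n < oldest_needed
    )


archive = [
    "00000001000000000000007F",
    "000000010000000000000080",
    "000000010000000000000081",
    "000000010000000000000082",
]
# 0080 is used here only because it is the oldest file the standby
# asked for in the log above; that is one observation, not a rule.
print(files_safe_to_move(archive, "000000010000000000000080"))
# ['00000001000000000000007F']
```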

Any insight on this will help.

Regards
Dhaval
