Re: Failing start-up archive recovery at Standby mode in PG9.2.4 - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date
Msg-id 1366870433934-5753221.post@n5.nabble.com
In response to Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Failing start-up archive recovery at Standby mode in PG9.2.4
List pgsql-hackers
I made a similar observation when I was able to reproduce this.
I tried to find out why a restartpoint causes the recycled segment to be
named after timeline 3, but have not been able to determine that.

Looking at the source, I found that the function XLogFileReadAnyTLI, which
returns a segment file for reading an XLOG page, iterates over the list
expectedTLIs, and that this list starts with 3 in the case where this error
happens. XLogFileReadAnyTLI checks for the segment in both the archive and
pg_xlog. So, even if a segment with timeline 3 is not in the archive, it may
be in pg_xlog because of recycling, as we have observed. Such a recycled
segment may then be returned by XLogFileReadAnyTLI as though it were the
next segment to recover from, resulting in the "unexpected pageaddr ..."
error.
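
To make the suspected sequence concrete, here is a toy model of that lookup.
It is only a sketch of the behaviour described above, not the PostgreSQL
code: read_any_tli, the two sets standing in for the archive and pg_xlog,
and the segment numbers are all made up for illustration. The only detail
taken from the real system is the 24-hex-digit segment file name (timeline,
log, segment).

```python
def wal_file_name(tli, log, seg):
    """Build a WAL segment file name: timeline, log, segment as 8 hex digits each."""
    return f"{tli:08X}{log:08X}{seg:08X}"


def read_any_tli(expected_tlis, log, seg, archive, pg_xlog):
    """Hypothetical stand-in for the lookup described above.

    Walks expected_tlis in order and, for each timeline, looks for the
    segment first in the archive and then in pg_xlog. Returns the first hit
    as (location, file_name), or None if nothing matches.
    """
    for tli in expected_tlis:
        name = wal_file_name(tli, log, seg)
        if name in archive:
            return ("archive", name)
        if name in pg_xlog:
            return ("pg_xlog", name)
    return None


# Assumed scenario: the real next segment exists on timeline 2 in the
# archive, while pg_xlog holds a recycled file that carries timeline 3 in
# its name but still contains old page data.
archive = {wal_file_name(2, 0, 5)}
pg_xlog = {wal_file_name(3, 0, 5)}

# With 3 at the head of the list, the recycled file is picked first.
print(read_any_tli([3, 2, 1], 0, 5, archive, pg_xlog))

# With 2 at the head, the archived timeline-2 segment is picked.
print(read_any_tli([2, 1], 0, 5, archive, pg_xlog))
```

In this model, the order of expectedTLIs alone decides whether the stale
recycled file or the correct archived segment is returned, which is why the
two questions below seem to be the ones that matter.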

What I could not understand is how, when this error happens, the
expectedTLIs list has 3 at its head (for example, when XLogFileReadAnyTLI
used it as we observed), whereas at other times (when the error does not
happen) it has 2 at its head until all the segments of timeline 2 have been
recovered. Also, how does the recycled segment get timeline 3 when 2 is
expected in this case?

Is this the right way to look at the problem and its possible fix?

I have tried going through the source regarding this but have not been able
to determine where this could accidentally happen, partly because I do not
understand the recovery process (and its code) very well. I will post if I
find anything useful.

regards,
Amit Langote







