Re: Recovery inconsistencies, standby much larger than primary - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Recovery inconsistencies, standby much larger than primary
Date
Msg-id CAM-w4HOnmqPXXRiA+-ojbVmt0-rtAEo0zM47Kea7dV9a0T-W+g@mail.gmail.com
Whole thread Raw
In response to Re: Recovery inconsistencies, standby much larger than primary  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jan 31, 2014 at 10:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I'd been wondering if the WAL record somehow got corrupted while
> in memory (presumably after being CRC-checked).  It's a bit hard to see
> how though.

One thing I mentioned early on but bears repeating is that this
instance is 9.1.11.

Also something that occurred to me at 3am -- the "reference to invalid
pages" recovery errors that replayed correctly after the panic might
also explain why the slave seems to operate correctly. It's possible
after the panic it replayed those same records correctly.

> Are all the bloated-on-the-slave relations indexes?  I think the most
> fruitful thing to do at this point is to try to isolate the bloating
> events for the other affected rels as you've done for this one.
> Maybe we'll see a pattern.

I'll poke at those others tomorrow/today. I can also try to bring up a
new standby from the same base backup but it'll take time. It's a
large database. Also the fear I have above is that if I set a recovery
target I might make it miss the bug.


-- 
greg



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Small catcache optimization
Next
From: Fujii Masao
Date:
Subject: Re: Compression of full-page-writes