Re: Streaming replication slave crash - Mailing list pgsql-general

From Quentin Hartman
Subject Re: Streaming replication slave crash
Msg-id CAJ48qNaF4pSvU9RUo=nf5hNzbEZXui=+3KKC=WSmaeXp02tVLQ@mail.gmail.com
In response to Re: Streaming replication slave crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Fri, Mar 29, 2013 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Quentin Hartman <qhartman@direwolfdigital.com> writes:
> > On Fri, Mar 29, 2013 at 10:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> What process did you use for setting up the slave?
>
> > I used an rsync from the master while both were stopped.
>
> If the master was shut down cleanly (not -m immediate) then the bug fix
> I was thinking about wouldn't explain this.  The fact that the panic
> didn't recur after restarting seems to void that theory as well.  I'm
> not sure what to make of that angle.

Yes, it was shut down cleanly. A good thought, but I don't think it's relevant in this case.

> Can you determine which table is being complained of in the failure
> message, ie, what has relfilenode 63370 in database 63229?  If so it
> would be interesting to know what was being done to that table on the
> master.

Good point! Looking deeper into that, it's actually one of our smaller tables, and it doesn't seem to have any corruption, on either server. I was able to select all the records from it and the content seems sane. The only thing that would have been happening on that table is an INSERT or UPDATE.
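
For anyone following along, here is a rough sketch of how that kind of lookup can be done. It is not necessarily the exact steps used here. It assumes a Python DB-API connection whose driver accepts `%s` placeholders (psycopg2 does), and the helper names are made up for illustration. The queries only read the standard catalogs `pg_database`, `pg_class` and `pg_namespace`.

```python
# Sketch only: helper names are hypothetical, and `conn` is assumed to be
# a DB-API 2.0 connection using %s placeholders (for example psycopg2).

# Resolve a database OID (63229 in the failure message) to its name.
DATABASE_BY_OID = "SELECT datname FROM pg_database WHERE oid = %s"

# Resolve a relfilenode (63370 in the failure message) to a relation.
# pg_class is per-database, so this must run while connected to the
# database found with the query above.
RELATION_BY_FILENODE = (
    "SELECT n.nspname, c.relname, c.relkind "
    "FROM pg_class c "
    "JOIN pg_namespace n ON n.oid = c.relnamespace "
    "WHERE c.relfilenode = %s"
)


def lookup(conn, sql, value):
    """Run a single-parameter catalog query and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, (value,))
        return cur.fetchall()
    finally:
        cur.close()


def database_name(conn, db_oid):
    """Return the database name for an OID, or None if there is no match."""
    rows = lookup(conn, DATABASE_BY_OID, db_oid)
    return rows[0][0] if rows else None


def relation_for_filenode(conn, relfilenode):
    """Return (schema, relation, relkind) rows matching a relfilenode."""
    return lookup(conn, RELATION_BY_FILENODE, relfilenode)
```

The idea is to call `database_name(conn, 63229)` from any database in the cluster, reconnect to the database it names, and then call `relation_for_filenode(conn, 63370)`. An empty result from the second call can mean the relation has since been rewritten and given a new relfilenode, or that it is a mapped catalog, for which `pg_class.relfilenode` is zero.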

I think I'm going to run with the spurious EC2 hiccup explanation. I'm comfortable with that given the extra due diligence I've done with your (and Lonni's) guidance.

Thanks!

QH
