Re: Re: corruption issue after server crash - ERROR: unexpected chunk number 0 - Mailing list pgsql-general
From | Shaun Thomas |
---|---|
Subject | Re: Re: corruption issue after server crash - ERROR: unexpected chunk number 0 |
Date | |
Msg-id | 0683F5F5A5C7FE419A752A034B4A0B97975AAAED@sswchi5pmbx2.peak6.net |
In response to | Re: Re: corruption issue after server crash - ERROR: unexpected chunk number 0 (Mike Broers <mbroers@gmail.com>) |
List | pgsql-general |
> So if this problem replicated to our standby servers does that indicate
> that the potential problematic fsync occurred during a pg_xlog write?

Pretty much. You have a couple of issues here, and no easy way to approach them. Primarily, you got data corruption during async operation. This means either the OS or the hardware somewhere along the line lied about the write, or the write was corrupted and the filesystem log replayed incorrectly upon reboot. Once that happens, you can't trust *any* data in your database. Pre-checksum PostgreSQL has no way to verify the integrity of existing data, and system crashes can corrupt quite a bit of data that was only tangentially involved.

What likely happens in these scenarios is that the database startup succeeds, and then it reads some rows in from a corrupted table. By corrupted, I mean even a single data page with a mangled pointer. That mangled pointer gave the database incorrect information about the state of that data page's contents, and the database continued on that information. That means subsequent transaction logs from that point are *also* corrupt, and hence any streaming or warm standby replicas are subsequently damaged as well. But they'll be damaged differently, because they likely didn't have the initial corruption, just the byte changes dictated by the WAL stream.

Unless you know where the initial corruption came from, the system that caused it should be quarantined for verification. RAM, disk, CPU, everything should pass integrity checks before putting it back into production.

> Would breaking replication at the time of the crash have prevented
> this from cascading or was it already too late at that point?

Most likely. If, at the time of the crash, you had switched to one of your replicas and made it the new master, it would have given you the opportunity to check out the crashed system before it spread the love. Even if you don't have a true STONITH model, starting up a potentially data-compromised node in an active cluster is a gamble.

I did something similar once. One of our DRBD nodes crashed, came back up, and re-attached to the DRBD pair after a quick data discard and replay. I continued with some scheduled system maintenance and performed a node failover with no incident. It wasn't until 20 minutes later that the corrupt disk pages started making their presence felt, and by then it was too late. Luckily we were still verifying, but with our secondaries ruined, we had to restore from backup. A 30-minute outage became a 4-hour one.

Afterwards, we put in a new policy that any crash means a DRBD verify at minimum, and until the node passes, it is to be considered invalid and unusable. If you haven't already, I suggest something similar for your setup. Verify a crashed node before using it again, no matter how much pressure you're under. It can always get worse.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
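For what it's worth, on 9.3 and later you can tell whether a cluster was initialized with data page checksums (initdb --data-checksums) by looking at pg_controldata. A minimal sketch of that check, assuming pg_controldata is on the PATH; the data directory path below is only an example:

```python
#!/usr/bin/env python
# Sketch: report whether a cluster has data page checksums enabled.
# Assumes PostgreSQL 9.3+ and that pg_controldata is on the PATH;
# the default data directory below is only an example.
import subprocess
import sys

def checksums_enabled(data_dir):
    """True if pg_controldata reports a nonzero data page checksum version."""
    out = subprocess.check_output(['pg_controldata', data_dir])
    for line in out.decode('utf-8', 'replace').splitlines():
        if line.startswith('Data page checksum version'):
            return line.split(':', 1)[1].strip() != '0'
    return False  # the field is absent entirely on pre-9.3 clusters

if __name__ == '__main__':
    data_dir = sys.argv[1] if len(sys.argv) > 1 else '/var/lib/pgsql/data'
    print('data checksums enabled: %s' % checksums_enabled(data_dir))
```

Checksums only catch corruption when a damaged page is actually read, so they're a tripwire rather than a guarantee, but that's a big step up from silent damage.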
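To make the "verify a crashed node before using it again" policy concrete, here's a rough sketch of that kind of check, assuming DRBD 8.x (which exposes /proc/drbd); the resource name r0 is just a placeholder:

```python
#!/usr/bin/env python
# Sketch: run an online DRBD verify after a crash and refuse to reuse the
# node if any blocks come back out of sync. Assumes DRBD 8.x with
# /proc/drbd available; the resource name "r0" is a placeholder.
import re
import subprocess
import time

RESOURCE = 'r0'

def proc_drbd():
    with open('/proc/drbd') as f:
        return f.read()

# Kick off the online verify and wait for it to finish
# (the connection state shows VerifyS/VerifyT while it runs).
subprocess.check_call(['drbdadm', 'verify', RESOURCE])
while 'Verify' in proc_drbd():
    time.sleep(30)

# Blocks that failed verification show up in the oos: counter (KiB).
match = re.search(r'\boos:(\d+)', proc_drbd())
out_of_sync = int(match.group(1)) if match else 0
if out_of_sync:
    raise SystemExit('%s failed verify: %d KiB out of sync; keep this node '
                     'out of production' % (RESOURCE, out_of_sync))
print('%s passed verify' % RESOURCE)
```

Note that a verify only marks bad blocks; it doesn't repair them. You still have to disconnect and reconnect the resource to resync anything it flags.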