Marco Colombo <marco@esi.it> writes:
> Tom Lane wrote:
>> However this would seem to imply disk drive misfeasance above and beyond
>> your motherboard problem.
> Well, no. How about this theory:
> 1) everything is ok:
> the backend executes write()/fsync() for transactions 1-5
> 2) hardware fails somehow at MB level (imagine CPU/RAM overheating):
> RAM gets corrupted - kernel starts oopsing (but goes on)
> meanwhile, the backend executes write()/fsync() for transactions 6-10,
> but randomly corrupted data gets written to disk.
> 3) unrecoverable kernel error occurs, the show stops.
> On recovery, transactions 6-9 don't even look like valid log entries, while
> 10, for some reason, does (maybe only data is corrupted).
> I'm not familiar with the details of WAL files and post-crash recovery,
> but is that possible? Or does the process stop at the first failure?
Recovery will stop at the first corrupted record, so it would not happen
like that. But you are right, the MB failure alone might have been
enough to corrupt the outgoing WAL log data and thus produce the
scenario I described. Once Postgres *thinks* transactions 1-10 are
safely down to disk in the WAL log, it will feel free to update the data
files in any random order that seems convenient. So the write of record
10 could have occurred before the rest, and if that happened not to get
corrupted by the MB problem, we could see the result lec describes.
Of course this is all guesswork since we have no direct evidence to look
at, but it seems fairly plausible.
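
To make the "recovery stops at the first corrupted record" point concrete,
here is a minimal Python sketch. It is not PostgreSQL's WAL format or its
recovery code: the record layout (length plus CRC-32 header) and the names
encode_record and replay are assumptions made up for illustration. It only
shows why an intact record 10 sitting behind a damaged record 6 is never
replayed from the log.

```python
import struct
import zlib

# Hypothetical record layout (an assumption, not PostgreSQL's):
# 4-byte payload length, 4-byte CRC-32 of the payload, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Serialize one log record with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return the payloads replayed, stopping at the first bad record."""
    applied = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        # A short read or a checksum mismatch ends recovery right here;
        # nothing after this point is examined, valid or not.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        applied.append(payload)
        offset = start + length
    return applied


# Transactions 1-10 are written to the log ...
records = [f"txn {n}".encode() for n in range(1, 11)]
log = bytearray(b"".join(encode_record(r) for r in records))

# ... but one payload byte of record 6 is flipped on its way to disk.
offset_of_6 = sum(len(encode_record(r)) for r in records[:5])
log[offset_of_6 + HEADER.size] ^= 0xFF

replayed = replay(bytes(log))
```

In this sketch replayed holds only transactions 1-5, even though the bytes
of record 10 are untouched. So a surviving effect of transaction 10 would
have to come from a data-file write that was issued once the log was
believed to be safely on disk, not from log replay.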
> Anyway, if your CPU/RAM is failing, no DB technology can save you.
Agreed. Software certainly cannot make any guarantees if it can't even
execute correctly ...
regards, tom lane