From: Tom Lane
Sent: Saturday, August 18, 2012 7:16 AM
> The startup process's stack trace is
> #0 0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
> at xlog.c:3713
> 3713 COMP_CRC32(crc, XLogRecGetData(record), len);
> (gdb) bt
> #0 0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
> at xlog.c:3713
> #1 0x270690 in ReadRecord (RecPtr=0x7b03bad0, emode=15,
> fetching_ckpt=0 '\000') at xlog.c:4006
> The current WAL address is 80658424 == 0x04cebff8, that is just 8 bytes
> short of a page boundary, and what RecordIsValid thinks it is dealing
> with is
> so it merrily tries to compute a checksum on a gigabyte worth of data,
> and soon falls off the end of memory.
> In reality, inspection of the WAL file suggests that this is the end of
> valid data and what should have happened is that replay just stopped.
> The xl_len and so forth shown above are just garbage from off the end of
> what was actually read from the file (everything beyond offset 0xcebff8
> in file 4 is in fact zeroes).
> I'm not sure whether this is just a matter of having failed to
> sanity-check that xl_tot_len is at least SizeOfXLogRecord, or whether
> there is a deeper problem with the new design of continuation records
> that makes it impossible to validate records safely.
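As a side note, the address arithmetic quoted above can be checked with a small sketch. It assumes the default 8 kB WAL page size and 16 MB WAL segment size; the sketch_ names and constants are mine, for illustration only, and are not taken from xlog.c.

```c
#include <stdint.h>

/* Assumed defaults; a real build takes these from its configured
 * XLOG_BLCKSZ and XLOG_SEG_SIZE. */
#define SKETCH_XLOG_BLCKSZ   8192u
#define SKETCH_XLOG_SEG_SIZE (16u * 1024u * 1024u)

/* Bytes remaining from recptr up to the next page boundary.
 * Returns a full page size if recptr sits exactly on a boundary. */
static uint32_t
sketch_bytes_to_page_end(uint32_t recptr)
{
    return SKETCH_XLOG_BLCKSZ - (recptr % SKETCH_XLOG_BLCKSZ);
}

/* Segment (file) number that contains recptr. */
static uint32_t
sketch_segment_number(uint32_t recptr)
{
    return recptr / SKETCH_XLOG_SEG_SIZE;
}

/* Byte offset of recptr within its segment. */
static uint32_t
sketch_segment_offset(uint32_t recptr)
{
    return recptr % SKETCH_XLOG_SEG_SIZE;
}
```

Under those assumptions, 80658424 (0x04cebff8) is 8 bytes short of a page boundary and lies at offset 0xcebff8 in file 4, which matches the numbers in the report.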
Earlier, ReadRecord had a check on the total length before it called RecordIsValid():

    if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len ||
        record->xl_tot_len > SizeOfXLogRecord + record->xl_len +
        XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ))

I think the absence of that total-length check has caused this problem. However, the check will now have to be different.
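To illustrate the kind of bounds check I mean, here is a rough standalone sketch. It is not the actual xlog.c code: the struct, the sketch_ names and the constant values are assumptions made up for the example, and a real fix would use the definitions from the PostgreSQL headers and would have to account for the new continuation-record handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for SizeOfXLogRecord, XLR_MAX_BKP_BLOCKS,
 * sizeof(BkpBlock) and BLCKSZ; the values are assumed, not authoritative. */
#define SKETCH_SIZEOF_XLOG_RECORD 32u
#define SKETCH_MAX_BKP_BLOCKS     4u
#define SKETCH_SIZEOF_BKP_BLOCK   16u
#define SKETCH_BLCKSZ             8192u

/* Only the two length fields matter for this check. */
typedef struct
{
    uint32_t xl_tot_len;   /* total record length, header included */
    uint32_t xl_len;       /* length of the rmgr data */
} SketchRecordHeader;

/* Returns true if the two lengths are mutually consistent.
 * The sums are done in 64 bits so a garbage xl_len cannot wrap around. */
static bool
sketch_lengths_plausible(const SketchRecordHeader *rec)
{
    uint64_t min_len = (uint64_t) SKETCH_SIZEOF_XLOG_RECORD + rec->xl_len;
    uint64_t max_len = min_len +
        (uint64_t) SKETCH_MAX_BKP_BLOCKS *
        (SKETCH_SIZEOF_BKP_BLOCK + SKETCH_BLCKSZ);

    return rec->xl_tot_len >= min_len && rec->xl_tot_len <= max_len;
}
```

With a check of this shape run before the CRC computation, a header whose xl_len claims about a gigabyte while xl_tot_len is zero would be rejected instead of being handed to RecordIsValid().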
With Regards,
Amit Kapila.