On 20.08.2012 17:04, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> On 18.08.2012 08:52, Amit kapila wrote:
>>> I think the missing check on the total length is what caused this problem. However, the check will now be different.
>
>> That check still exists, in ValidXLogRecordHeader(). However, we now
>> allocate the buffer for the whole record before that check, based on
>> xl_tot_len, if the record header is split across pages. The theory in
>> allocating the buffer is that a bogus xl_tot_len field will cause the
>> malloc() to fail, returning NULL, and we treat that the same as a broken
>> header.
>
> Uh, no, you misread it. xl_tot_len is *zero* in this example. The
> problem is that RecordIsValid believes xl_len (and backup block size)
> even when it exceeds xl_tot_len.
Ah yes, I see that now. I think all we need then is a check for
xl_tot_len >= SizeOfXLogRecord.
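Roughly like this, as a quick sketch. The struct and SizeOfXLogRecord
below are simplified stand-ins for the real xlog.h definitions (the real
SizeOfXLogRecord is not a plain sizeof), and record_header_is_sane is a
made-up illustration, not the actual ValidXLogRecordHeader signature:

#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for the real xlog.h definitions (illustration only). */
typedef struct XLogRecord
{
    uint32_t    xl_tot_len;     /* total length of entire record */
    uint32_t    xl_len;         /* length of rmgr data, excluding header */
    /* ... remaining header fields elided ... */
} XLogRecord;

#define SizeOfXLogRecord  sizeof(XLogRecord)

/*
 * Reject the record before anything else trusts xl_tot_len, or xl_len,
 * which must fit inside it.
 */
static bool
record_header_is_sane(const XLogRecord *record)
{
    /* Catches xl_tot_len == 0, as in Tom's example. */
    if (record->xl_tot_len < SizeOfXLogRecord)
        return false;

    /* xl_len plus the header can never exceed the total length. */
    if (record->xl_len > record->xl_tot_len - SizeOfXLogRecord)
        return false;

    return true;
}

With that in place, RecordIsValid never gets to believe an xl_len that
exceeds xl_tot_len, because we bail out at the header stage.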
>> I think we need to delay the allocation of the record buffer. We need to
>> read and validate the whole record header first, like we did before,
>> before we trust xl_tot_len enough to call malloc() with it. I'll take a
>> shot at doing that.
>
> I don't believe this theory at all. Overcommit applies to writing on
> pages that were formerly shared with the parent process --- it should
> not have anything to do with malloc'ing new space. But anyway, this
> is not what happened in my example.
I was thinking that we might read gigabytes' worth of bogus WAL into the
memory buffer if xl_tot_len is bogus and large, e.g. 0xffffffff. But now
that I look closer, the xlog record is validated after reading the first
continuation page, so we should catch a bogus xl_tot_len value at that
point. And there is a cross-check with xl_rem_len on every continuation
page, too.
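To illustrate the cross-check I mean, here's a sketch. XLogPageHeaderStub
and contpage_is_consistent are made-up stand-ins for the real page header
and validation code (in the actual page header the field is spelled
xlp_rem_len), so take the details with a grain of salt:

#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the continuation-page header (illustration only). */
typedef struct XLogPageHeaderStub
{
    uint16_t    xlp_info;       /* flag bits */
    uint32_t    xlp_rem_len;    /* bytes of the record that remain */
} XLogPageHeaderStub;

#define XLP_FIRST_IS_CONTRECORD  0x0001

/*
 * While reassembling a record split across pages, each continuation page
 * must both be flagged as a continuation and agree on how many bytes are
 * still outstanding.  A bogus xl_tot_len therefore gets caught on the
 * very first continuation page, before we read gigabytes of garbage into
 * the record buffer.
 */
static bool
contpage_is_consistent(const XLogPageHeaderStub *page,
                       uint32_t bytes_still_needed)
{
    if (!(page->xlp_info & XLP_FIRST_IS_CONTRECORD))
        return false;           /* page doesn't continue any record */

    if (page->xlp_rem_len != bytes_still_needed)
        return false;           /* disagrees with what xl_tot_len implied */

    return true;
}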
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com