Re: New WAL code dumps core trivially on replay of bad data - Mailing list pgsql-hackers

From Tom Lane
Subject Re: New WAL code dumps core trivially on replay of bad data
Date
Msg-id 711.1345476340@sss.pgh.pa.us
In response to Re: New WAL code dumps core trivially on replay of bad data  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: New WAL code dumps core trivially on replay of bad data
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 20.08.2012 17:04, Tom Lane wrote:
>> Uh, no, you misread it.  xl_tot_len is *zero* in this example.  The
>> problem is that RecordIsValid believes xl_len (and backup block size)
>> even when it exceeds xl_tot_len.

> Ah yes, I see that now. I think all we need then is a check for 
> xl_tot_len >= SizeOfXLogRecord.

That should get us back to a reliability level similar to the old code.

However, I think that we also need to improve RecordIsValid so that at
each step, it checks it hasn't overrun xl_tot_len *before* touching the
corresponding part of the record buffer.
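
To make the "check before touching" idea concrete, here is a rough sketch in C. It is not the actual xlog.c code: DemoRecordHeader, DemoBkpHeader, the n_backup field, DEMO_BLCKSZ and demo_record_is_valid are all made-up, simplified stand-ins, and the CRC check is left out. The only point is that every offset is compared against xl_tot_len before the corresponding bytes are read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts; NOT the real XLogRecord / BkpBlock. */
typedef struct DemoRecordHeader
{
    uint32_t    xl_tot_len;     /* total record length, header included */
    uint32_t    xl_len;         /* length of rmgr data after the header */
    uint8_t     n_backup;       /* number of backup blocks (assumed field) */
} DemoRecordHeader;

typedef struct DemoBkpHeader
{
    uint16_t    hole_length;    /* bytes omitted from the page image */
} DemoBkpHeader;

#define DEMO_BLCKSZ 8192u       /* assumed block size for the sketch */

/*
 * Walk the record, never reading a byte at or beyond xl_tot_len.
 * buf_len is how many bytes the caller actually has in the buffer.
 */
static bool
demo_record_is_valid(const uint8_t *buf, size_t buf_len)
{
    DemoRecordHeader hdr;
    size_t      tot;
    size_t      off;
    unsigned    i;

    if (buf_len < sizeof(hdr))
        return false;
    memcpy(&hdr, buf, sizeof(hdr));
    tot = hdr.xl_tot_len;

    /* The header itself must fit (this rejects xl_tot_len == 0). */
    if (tot < sizeof(hdr) || tot > buf_len)
        return false;
    off = sizeof(hdr);

    /* Do not believe xl_len if it would run past xl_tot_len. */
    if (hdr.xl_len > tot - off)
        return false;
    off += hdr.xl_len;

    for (i = 0; i < hdr.n_backup; i++)
    {
        DemoBkpHeader bkp;
        size_t      blen;

        /* Check room for the block header before copying it out. */
        if (sizeof(bkp) > tot - off)
            return false;
        memcpy(&bkp, buf + off, sizeof(bkp));
        off += sizeof(bkp);

        if (bkp.hole_length > DEMO_BLCKSZ)
            return false;
        blen = DEMO_BLCKSZ - bkp.hole_length;

        /* Same again for the block image itself. */
        if (blen > tot - off)
            return false;
        off += blen;
    }

    /* The pieces must account for the whole record exactly. */
    return off == tot;
}
```

With a zero-filled buffer (xl_tot_len = 0, as in the example under discussion) the function returns false at the first length test, before xl_len or any backup block header is looked at. The comparisons are written as "needed > tot - off" rather than "off + needed > tot" so that a huge bogus length cannot wrap around.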

> I was thinking that we might read gigabytes worth of bogus WAL into the 
> memory buffer, if xl_tot_len is bogus and large, e.g. 0xffffffff. But now 
> that I look closer, the xlog record is validated after reading the first 
> continuation page, so we should catch a bogus xl_tot_len value at that 
> point. And there is a cross-check with xl_rem_len on every continuation 
> page, too.

Yeah.  Even if xl_tot_len is bogus, we should realize that within a
couple of pages at most.  The core of the problem here is that
RecordIsValid is not being careful to confine its touches to the
guaranteed-to-exist bytes of the record buffer, i.e. 0 .. xl_tot_len-1.
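
As for why a bogus xl_tot_len gets noticed within a page or two: the cross-check on each continuation page amounts to roughly the following. This is a sketch only; demo_contrecord_matches and its parameter names are invented for illustration, and the real code reads the remaining length from the continuation data on the page.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * tot_len  : xl_tot_len claimed by the record header
 * gathered : bytes of the record collected from earlier pages
 * rem_len  : remaining length stored on the continuation page
 */
static bool
demo_contrecord_matches(uint32_t tot_len, uint32_t gathered, uint32_t rem_len)
{
    /* If nothing should remain, there is no continuation to read. */
    if (gathered >= tot_len)
        return false;

    /* Subtract rather than add, so a huge tot_len cannot overflow. */
    return rem_len == tot_len - gathered;
}
```

A record whose header claims 0xffffffff bytes fails this test on the first continuation page unless that page happens to carry exactly the matching remaining length, so reading stops long before gigabytes are buffered.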
        regards, tom lane


