Re: New WAL code dumps core trivially on replay of bad data - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: New WAL code dumps core trivially on replay of bad data
Date
Msg-id 5032629D.2040202@enterprisedb.com
In response to Re: New WAL code dumps core trivially on replay of bad data  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 20.08.2012 18:25, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I was thinking that we might read gigabytes worth of bogus WAL into the
>> memory buffer, if xl_tot_len is bogus and large, e.g. 0xffffffff. But now
>> that I look closer, the xlog record is validated after reading the first
>> continuation page, so we should catch a bogus xl_tot_len value at that
>> point. And there is a cross-check with xl_rem_len on every continuation
>> page, too.
>
> Yeah.  Even if xl_tot_len is bogus, we should realize that within a
> couple of pages at most.  The core of the problem here is that
> RecordIsValid is not being careful to confine its touches to the
> guaranteed-to-exist bytes of the record buffer, i.e. 0 .. xl_tot_len-1.

Hmm, RecordIsValid() assumes that the whole record has been read into 
memory already, where "whole record" is defined by xl_tot_len. The 
problem is that xl_len disagrees with xl_tot_len. Validating the XLOG 
header would've caught that, but in this case the caller had not called 
ValidXLogRecordHeader().
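
To illustrate the kind of header cross-check meant here, a minimal sketch 
follows. The struct and function names are hypothetical and the layout is 
deliberately simplified; this is not the real XLogRecord definition or the 
actual ValidXLogRecordHeader() logic, only the general idea that the main 
data length must fit inside the total length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified record header; NOT the real XLogRecord layout. */
typedef struct DemoRecordHeader
{
    uint32_t tot_len;   /* total record length, header included */
    uint32_t len;       /* length of the main data only */
} DemoRecordHeader;

/*
 * Return true if the two length fields can both be true at once:
 * the record must be at least one header long, and the main data
 * must fit in what is left after the header.
 */
static bool
demo_header_is_consistent(const DemoRecordHeader *hdr)
{
    const uint32_t hdr_size = (uint32_t) sizeof(DemoRecordHeader);

    if (hdr->tot_len < hdr_size)
        return false;

    /* Subtract instead of adding, so a huge len cannot wrap around. */
    if (hdr->len > hdr->tot_len - hdr_size)
        return false;

    return true;
}
```

A caller would run a check of this shape before handing the record to 
anything that trusts the length fields.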

However, a suitably corrupt record might have a valid header, but
*appear* to have larger backup blocks than the header claims. In that
case you would indeed overrun the memory buffer while calculating the
CRC. So yeah, we should check that.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


