Re: New WAL code dumps core trivially on replay of bad data - Mailing list pgsql-hackers

From Tom Lane
Subject Re: New WAL code dumps core trivially on replay of bad data
Date
Msg-id 28954.1345471492@sss.pgh.pa.us
In response to Re: New WAL code dumps core trivially on replay of bad data  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: New WAL code dumps core trivially on replay of bad data
Re: New WAL code dumps core trivially on replay of bad data
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 18.08.2012 08:52, Amit kapila wrote:
>> I think that missing check of total length has caused this problem. However now this check will be different.

> That check still exists, in ValidXLogRecordHeader(). However, we now 
> allocate the buffer for the whole record before that check, based on 
> xl_tot_len, if the record header is split across pages. The theory in 
> allocating the buffer is that a bogus xl_tot_len field will cause the 
> malloc() to fail, returning NULL, and we treat that the same as a broken 
> header.

Uh, no, you misread it.  xl_tot_len is *zero* in this example.  The
problem is that RecordIsValid believes xl_len (and the backup block
sizes) even when they exceed xl_tot_len.
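
A rough sketch of the kind of cross-check this implies: reject a record
whose claimed main-data length plus backup block bytes does not fit inside
its claimed total length. Everything named below (DemoRecordHeader,
DEMO_HEADER_SIZE, demo_lengths_are_consistent) is hypothetical and
simplified for illustration; it is not the real XLogRecord layout or the
actual RecordIsValid code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a WAL record header. */
typedef struct DemoRecordHeader
{
    uint32_t tot_len;   /* claimed total record length, header included */
    uint32_t len;       /* claimed length of the main (rmgr) data */
} DemoRecordHeader;

#define DEMO_HEADER_SIZE ((uint32_t) sizeof(DemoRecordHeader))

/*
 * Return true only if the header, the main data, and the backup block
 * bytes all fit within tot_len.  Subtracting from a running remainder
 * (rather than summing the parts) avoids integer overflow when the
 * length fields contain garbage.
 */
static bool
demo_lengths_are_consistent(const DemoRecordHeader *hdr,
                            uint32_t backup_block_bytes)
{
    uint32_t remaining;

    /* A total length of zero (or anything below the header size) is bogus. */
    if (hdr->tot_len < DEMO_HEADER_SIZE)
        return false;
    remaining = hdr->tot_len - DEMO_HEADER_SIZE;

    /* The main data must fit in what is left after the header. */
    if (hdr->len > remaining)
        return false;
    remaining -= hdr->len;

    /* The backup block bytes must fit in what is left after the main data. */
    if (backup_block_bytes > remaining)
        return false;

    return true;
}
```

With a check of this shape, a header carrying tot_len = 0 and a nonzero
len is rejected before any of the sub-lengths are used to walk the buffer.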

> I think we need to delay the allocation of the record buffer. We need to 
> read and validate the whole record header first, like we did before, 
> before we trust xl_tot_len enough to call malloc() with it. I'll take a 
> shot at doing that.

I don't believe this theory at all.  Overcommit applies to writing on
pages that were formerly shared with the parent process --- it should
not have anything to do with malloc'ing new space.  But anyway, this
is not what happened in my example.
        regards, tom lane


