Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 18.08.2012 08:52, Amit kapila wrote:
>> I think the missing check of the total length has caused this problem. However, now this check will be different.
> That check still exists, in ValidXLogRecordHeader(). However, we now
> allocate the buffer for the whole record before that check, based on
> xl_tot_len, if the record header is split across pages. The theory in
> allocating the buffer is that a bogus xl_tot_len field will cause the
> malloc() to fail, returning NULL, and we treat that the same as a broken
> header.
Uh, no, you misread it. xl_tot_len is *zero* in this example. The
problem is that RecordIsValid believes xl_len (and backup block size)
even when it exceeds xl_tot_len.
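To make that concrete, the missing sanity check could be sketched roughly like this (a simplified stand-in, not the actual xlog.c code; FakeXLogRecord and record_lengths_consistent are hypothetical names, and the real XLogRecord header has more fields):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the real XLogRecord header. */
typedef struct
{
    uint32_t xl_tot_len;   /* total record length, incl. backup blocks */
    uint32_t xl_len;       /* length of rmgr data, excl. header */
} FakeXLogRecord;

#define SIZE_OF_FAKE_HEADER ((uint32_t) sizeof(FakeXLogRecord))

/*
 * Never believe xl_len (or the backup-block sizes) beyond what
 * xl_tot_len allows.  In particular, a zeroed page yields
 * xl_tot_len == 0, so any nonzero xl_len must be rejected.
 */
static bool
record_lengths_consistent(const FakeXLogRecord *rec, uint32_t bkp_total)
{
    if (rec->xl_tot_len < SIZE_OF_FAKE_HEADER)
        return false;   /* can't even hold the header */
    if (rec->xl_len > rec->xl_tot_len - SIZE_OF_FAKE_HEADER)
        return false;   /* rmgr data would overrun the record */
    if (bkp_total > rec->xl_tot_len - SIZE_OF_FAKE_HEADER - rec->xl_len)
        return false;   /* backup blocks would overrun the record */
    return true;
}
```

With a check along these lines, the zero-xl_tot_len record in the example fails immediately instead of RecordIsValid walking off the end of the data.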
> I think we need to delay the allocation of the record buffer. We need to
> read and validate the whole record header first, like we did before,
> before we trust xl_tot_len enough to call malloc() with it. I'll take a
> shot at doing that.
I don't believe this theory at all. Overcommit applies to writing on
pages that were formerly shared with the parent process --- it should
not have anything to do with malloc'ing new space. But anyway, this
is not what happened in my example.
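For reference, the validate-header-first ordering proposed above could look roughly like this (a sketch only, not the actual xlog.c code; FakeXLogRecord, MAX_SANE_RECORD_LEN, and the function names are hypothetical):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the real XLogRecord header. */
typedef struct
{
    uint32_t xl_tot_len;   /* total record length, incl. backup blocks */
    uint32_t xl_len;       /* length of rmgr data, excl. header */
} FakeXLogRecord;

/* Hypothetical upper bound, standing in for whatever limit the
 * header validation enforces before malloc() ever sees the length. */
#define MAX_SANE_RECORD_LEN (64u * 1024 * 1024)

static int
header_is_valid(const FakeXLogRecord *hdr)
{
    if (hdr->xl_tot_len < sizeof(FakeXLogRecord) ||
        hdr->xl_tot_len > MAX_SANE_RECORD_LEN)
        return 0;
    if (hdr->xl_len > hdr->xl_tot_len - sizeof(FakeXLogRecord))
        return 0;
    return 1;
}

/*
 * Allocate the record buffer only after the (fully reassembled)
 * header passes validation, rather than calling malloc(xl_tot_len)
 * first and hoping a bogus length makes the allocation fail.
 */
static char *
alloc_record_buffer(const FakeXLogRecord *hdr)
{
    if (!header_is_valid(hdr))
        return NULL;    /* broken header: report it, allocate nothing */
    return malloc(hdr->xl_tot_len);
}
```

The point is that the rejection happens deterministically in the length checks, not as a side effect of malloc() failing.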
regards, tom lane