Re: Moving more work outside WALInsertLock - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Moving more work outside WALInsertLock
Msg-id 4EEB3477.4080502@enterprisedb.com
In response to Re: Moving more work outside WALInsertLock  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Moving more work outside WALInsertLock  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 16.12.2011 05:27, Tom Lane wrote:
> * We write a WAL record that starts 8 bytes before a sector boundary,
> so that the prev_link is in one sector and the rest of the record in
> the next one(s).

The prev-link is not the first field in the header; the CRC is.
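A small sketch can show which sector each header field lands in. It assumes a hypothetical, simplified header (a 4-byte CRC, then an 8-byte prev-link, then a 4-byte length) and a 512-byte sector. The field names and widths are illustrative, not the actual XLogRecord definition.

```python
import struct

SECTOR_SIZE = 512  # assumed sector size

# Hypothetical, simplified header: CRC first, then an 8-byte prev-link,
# then a length field. Real WAL record headers carry more fields.
HEADER_FIELDS = [("crc", "I"), ("prev_link", "Q"), ("tot_len", "I")]

def field_sectors(record_start):
    """Return {field: (first_sector, last_sector)} for a record at record_start."""
    result = {}
    offset = record_start
    for name, fmt in HEADER_FIELDS:
        size = struct.calcsize("<" + fmt)  # "<": packed, no alignment padding
        result[name] = (offset // SECTOR_SIZE, (offset + size - 1) // SECTOR_SIZE)
        offset += size
    return result
```

For a record starting 12 bytes before the first boundary, `field_sectors(500)` places both `crc` and `prev_link` in sector 0 and `tot_len` in sector 1, so with the CRC first, a torn write at that boundary leaves the CRC in the same sector as the prev-link.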

> * Time passes, and we recycle that WAL file.
>
> * We write another WAL record that starts 8 bytes before the same sector
> boundary, so that the prev_link is in one sector and the rest of the
> record in the next one(s).
>
> * System crashes, after having written out the earlier sector but not
> the later one(s).
>
> On restart, the replay code will see a prev_link that matches what it
> expects.  If the CRC for the remainder of the record is not dependent
> on the prev_link, then the remainder of the old record will look good
> too, and we'll attempt to replay it, n*16MB too late.
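The quoted failure mode can be simulated under the layout it assumes: prev-link first, with the CRC after it in the later sector. This is a hypothetical sketch; `build`, `check`, the 12-byte header, and the payloads are made up for illustration, and `zlib.crc32` stands in for the server's CRC routine.

```python
import struct
import zlib

# Hypothetical layout matching the quoted scenario's assumption:
# 8-byte prev-link first, then a 4-byte CRC, then the payload.
PREV_FIRST = struct.Struct("<QI")

def build(prev_link, payload, crc_covers_prev):
    covered = (struct.pack("<Q", prev_link) if crc_covers_prev else b"") + payload
    return PREV_FIRST.pack(prev_link, zlib.crc32(covered)) + payload

def check(record, crc_covers_prev):
    prev_link, stored_crc = PREV_FIRST.unpack_from(record, 0)
    payload = record[PREV_FIRST.size:]
    covered = (struct.pack("<Q", prev_link) if crc_covers_prev else b"") + payload
    return zlib.crc32(covered) == stored_crc

def torn_record_passes(crc_covers_prev):
    old_prev = 0x00001000
    new_prev = old_prev + 0x01000000  # same file offset, one 16MB cycle later
    old = build(old_prev, b"old payload", crc_covers_prev)
    new = build(new_prev, b"new payload", crc_covers_prev)
    # Crash: only the new record's first 8 bytes (the prev-link) reached disk;
    # everything past the sector boundary is still the recycled old record.
    torn = new[:8] + old[8:]
    assert PREV_FIRST.unpack_from(torn, 0)[0] == new_prev  # replay sees the expected prev-link
    return check(torn, crc_covers_prev)
```

Here `torn_record_passes(False)` returns True (the stale remainder is accepted), while `torn_record_passes(True)` returns False, because the old CRC was computed over a different prev-link than the one now on disk.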

The CRC would be in the previous sector along with the prev-link, so the CRC of the old record would have to match the CRC of the new record. I guess that's not totally impossible, though: there could be some WAL-logged operations whose record payload is often exactly the same, like a heap clean record when the same page is repeatedly pruned.
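Under the ordering described here (CRC first, prev-link second, both in the earlier sector), the same kind of simulation behaves differently. Again this is a hypothetical sketch: the 12-byte header, the helper names, and the payloads are illustrative, and `zlib.crc32` stands in for the real CRC.

```python
import struct
import zlib

# Hypothetical header: 4-byte CRC first, then the 8-byte prev-link.
CRC_FIRST = struct.Struct("<IQ")

def build(prev_link, payload, crc_covers_prev=True):
    covered = (struct.pack("<Q", prev_link) if crc_covers_prev else b"") + payload
    return CRC_FIRST.pack(zlib.crc32(covered), prev_link) + payload

def check(record, crc_covers_prev=True):
    stored_crc, prev_link = CRC_FIRST.unpack_from(record, 0)
    payload = record[CRC_FIRST.size:]
    covered = (struct.pack("<Q", prev_link) if crc_covers_prev else b"") + payload
    return zlib.crc32(covered) == stored_crc

def torn_replay_accepted(old_payload, new_payload, crc_covers_prev=True):
    old_prev = 0x00001000
    new_prev = old_prev + 0x01000000  # same file offset, one 16MB cycle later
    old = build(old_prev, old_payload, crc_covers_prev)
    new = build(new_prev, new_payload, crc_covers_prev)
    # The earlier sector held the whole header (new CRC + new prev-link);
    # the payload sector still contains the recycled file's old bytes.
    torn = new[:CRC_FIRST.size] + old[CRC_FIRST.size:]
    return check(torn, crc_covers_prev)
```

In this model a torn record is rejected when the old and new payloads differ (for example `b"prune page 1"` versus `b"prune page 2"`) but accepted when they are byte-for-byte identical, and in that identical case including the prev-link in the CRC makes no difference, since the prev-link on disk is the new one.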

> Including the prev_link in the CRC adds a significant amount of
> protection against such problems.  We should not remove this protection
> in the name of shaving a few cycles.

Yeah. I did some quick testing with a patch that left the prev-link out of 
the CRC calculation and also moved the record CRC calculation outside the 
lock. I don't remember the numbers, but while it did make some 
difference, it didn't seem worthwhile.

Anyway, I'm looking at ways to make the memcpy() of the payload happen 
without the lock, in parallel, and once you do that the record header 
CRC calculation can be done in parallel, too. That makes it irrelevant 
from a performance point of view whether the prev-link is included in 
the CRC or not.
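The reserve-then-copy idea can be sketched as a toy buffer in which only the space reservation (bumping the insert position and remembering the previous record's start) happens under the lock, while the CRC computation and the copy run outside it. Everything here is hypothetical: `WalBuffer`, its header layout, and the use of `threading.Lock` as a stand-in for WALInsertLock. It illustrates the locking structure only and is not PostgreSQL's implementation.

```python
import struct
import threading
import zlib

# Hypothetical record header: 4-byte CRC, then 8-byte prev-link.
HDR = struct.Struct("<IQ")

class WalBuffer:
    """Toy WAL buffer; only space reservation happens under the lock."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.insert_pos = 0                  # next free byte
        self.last_start = 0                  # start of the most recent record
        self.insert_lock = threading.Lock()  # stands in for WALInsertLock

    def insert(self, payload):
        rec_len = HDR.size + len(payload)
        # Critical section: reserve space and learn the prev-link, nothing else.
        with self.insert_lock:
            start = self.insert_pos
            if start + rec_len > len(self.buf):
                raise RuntimeError("toy buffer is full")
            prev = self.last_start
            self.insert_pos = start + rec_len
            self.last_start = start
        # Outside the lock: compute the CRC (covering the prev-link) and copy
        # into this record's private, already-reserved slice of the buffer.
        crc = zlib.crc32(struct.pack("<Q", prev) + payload)
        HDR.pack_into(self.buf, start, crc, prev)
        self.buf[start + HDR.size:start + rec_len] = payload
        return start
```

Because the prev-link is known as soon as space is reserved, covering it with the CRC adds no work inside the critical section. A real implementation would also have to track which reserved ranges have been completely copied before writing them out; the sketch omits that.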

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com