Re: Proposal of PITR performance improvement for 8.4. - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Proposal of PITR performance improvement for 8.4.
Msg-id 1225269154.3971.278.camel@ebony.2ndQuadrant
In response to Re: Proposal of PITR performance improvement for 8.4.  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Proposal of PITR performance improvement for 8.4.
List pgsql-hackers
On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:

> 1. You should avoid useless posix_fadvise() calls. In the naive 
> implementation, where you simply call posix_fadvise() for every page 
> referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls 
> per WAL record, and that's a lot of overhead. We face the same design 
> question as with Greg's patch to use posix_fadvise() to prefetch index 
> and bitmap scans: what should the interface to the buffer manager look 
> like? The simplest approach would be a new function call like 
> AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the 
> page if it's not in the buffer cache, but is a no-op otherwise. But that 
> means more overhead, since for every page access, we need to find the 
> page twice in the buffer cache; once for the AdviseBuffer() call, and 
> 2nd time for the actual ReadBuffer(). 

That's a much smaller overhead than waiting for an I/O. The CPU overhead
isn't really a problem if we're I/O bound.
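
To make the double-lookup trade-off concrete, here is a minimal sketch of the "advise only if not cached" idea. It is not the proposed AdviseBuffer(Relation, BlockNumber): it takes a plain file descriptor instead of a Relation, and the toy_* cache is a hypothetical stand-in for the real buffer lookup table. The one real call is posix_fadvise() with POSIX_FADV_WILLNEED.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose posix_fadvise() under strict modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL's default page size */
#define TOY_CACHE_SLOTS 64      /* hypothetical size of the stand-in cache */

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical stand-in for the shared buffer lookup table. */
static bool        toy_cached[TOY_CACHE_SLOTS];
static BlockNumber toy_block[TOY_CACHE_SLOTS];

static bool
toy_buffer_is_cached(BlockNumber blkno)
{
    size_t slot = blkno % TOY_CACHE_SLOTS;

    return toy_cached[slot] && toy_block[slot] == blkno;
}

static void
toy_buffer_insert(BlockNumber blkno)
{
    size_t slot = blkno % TOY_CACHE_SLOTS;

    toy_cached[slot] = true;
    toy_block[slot] = blkno;
}

/*
 * Hint the kernel to read one block ahead of time, unless it is cached.
 * Returns 0 if the block was cached (no syscall issued), 1 if advice was
 * issued, and -1 if posix_fadvise() reported an error.
 */
static int
advise_buffer(int fd, BlockNumber blkno)
{
    assert(blkno != InvalidBlockNumber);

    if (toy_buffer_is_cached(blkno))
        return 0;               /* first lookup; the later read is the second */

    if (posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                      POSIX_FADV_WILLNEED) != 0)
        return -1;
    return 1;
}
```

The cached path costs one extra lookup and no syscall, which is the overhead being weighed against an I/O wait above.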

> It would be more efficient to pin 
> the buffer in the AdviseBuffer() call already, but that requires much 
> more changes to the callers.

That would be hard to clean up safely, plus we'd have difficulty with
timing: is there enough buffer space to allow all the prefetched blocks
to live in cache at once? If not, this approach would cause problems.

> 2. The format of each WAL record is different, so you need a "readahead 
> handler" for every resource manager, for every record type. It would be 
> a lot simpler if there was a standardized way to store that information 
> in the WAL records.

I would prefer a new rmgr API call that returns a list of blocks. That's
better than trying to make everything fit one pattern. If the call
doesn't exist then that rmgr won't get prefetch.
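
A rough sketch of what such an optional call could look like. Every name here (ToyWalRecord, ToyRmgr, rm_blocks, blocks_to_prefetch) is hypothetical and far simpler than the real rmgr table and WAL record layout; the point is only that a NULL callback means "no prefetch for this rmgr".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical, heavily simplified WAL record. */
typedef struct ToyWalRecord
{
    uint8_t     rmid;           /* index of the owning resource manager */
    BlockNumber blocks[2];      /* blocks this record touches */
    int         nblocks;        /* number of valid entries in blocks[] */
} ToyWalRecord;

/* Hypothetical callback: copy up to max block numbers into out. */
typedef int (*rm_blocks_fn) (const ToyWalRecord *rec,
                             BlockNumber *out, int max);

typedef struct ToyRmgr
{
    const char  *rm_name;
    rm_blocks_fn rm_blocks;     /* NULL means this rmgr gets no prefetch */
} ToyRmgr;

/* Example callback for a heap-like rmgr that knows its record layout. */
static int
toy_heap_blocks(const ToyWalRecord *rec, BlockNumber *out, int max)
{
    int n = rec->nblocks < max ? rec->nblocks : max;

    for (int i = 0; i < n; i++)
        out[i] = rec->blocks[i];
    return n;
}

static const ToyRmgr toy_rmgr_table[] = {
    {"Heap", toy_heap_blocks},
    {"XLOG", NULL},             /* no callback, so no prefetch */
};

#define TOY_NUM_RMGRS (sizeof(toy_rmgr_table) / sizeof(toy_rmgr_table[0]))

/* Ask the owning rmgr which blocks to prefetch; 0 if it has no callback. */
static int
blocks_to_prefetch(const ToyWalRecord *rec, BlockNumber *out, int max)
{
    const ToyRmgr *rmgr;

    assert(rec->rmid < TOY_NUM_RMGRS);
    rmgr = &toy_rmgr_table[rec->rmid];

    if (rmgr->rm_blocks == NULL)
        return 0;
    return rmgr->rm_blocks(rec, out, max);
}
```

Each rmgr keeps its own record format and only has to answer "which blocks?", so nothing is forced into one common pattern.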

> 3. IIRC I tried to handle just a few most important WAL records at 
> first, but it turned out that you really need to handle all WAL records 
> (that are used at all) before you see any benefit. Otherwise, every time 
> you hit a WAL record that you haven't done posix_fadvise() on, the 
> recovery "stalls", and you don't need many of those to diminish the gains.
> 
> Not sure how these apply to your approach, it's very different. You seem 
> to handle 1. by collecting all the page references for the WAL file, and 
> sorting and removing the duplicates. I wonder how much CPU time is spent 
> on that?

Removing duplicates seems like it will save CPU rather than cost it:
each duplicate dropped is one posix_fadvise() call avoided.
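
For reference, the collect, sort and de-duplicate step can be sketched in a few lines. This is an illustration only, not the code from the patch under discussion; PageRef and its relnode field are hypothetical simplifications of a real page reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;

/* Hypothetical page reference collected from one WAL file. */
typedef struct PageRef
{
    uint32_t    relnode;        /* hypothetical relation file identifier */
    BlockNumber blkno;
} PageRef;

/* Order by relation first, then by block number within the relation. */
static int
pageref_cmp(const void *a, const void *b)
{
    const PageRef *x = a;
    const PageRef *y = b;

    if (x->relnode != y->relnode)
        return x->relnode < y->relnode ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/*
 * Sort refs in place and squeeze out duplicates.
 * Returns the number of distinct references left at the front of refs.
 */
static size_t
sort_and_dedupe(PageRef *refs, size_t n)
{
    size_t kept;

    assert(refs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(refs, n, sizeof(PageRef), pageref_cmp);

    kept = 1;
    for (size_t i = 1; i < n; i++)
    {
        /* After sorting, duplicates are adjacent; keep the first of each. */
        if (pageref_cmp(&refs[i], &refs[kept - 1]) != 0)
            refs[kept++] = refs[i];
    }
    return kept;
}
```

The sort is O(n log n) comparisons in user space, while every reference removed saves a syscall, and the sorted order also means the advice is issued in ascending block order per relation.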

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


