Re: V4 of PITR performance improvement for 8.4 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: V4 of PITR performance improvement for 8.4
Date
Msg-id 49B51791.5080303@enterprisedb.com
Whole thread Raw
In response to Re: V4 of PITR performance improvement for 8.4  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: V4 of PITR performance improvement for 8.4  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Fujii Masao wrote:
> On Tue, Mar 3, 2009 at 1:47 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Hi Suzuki-san,
>>
>> On Thu, Feb 26, 2009 at 5:03 AM, Koichi Suzuki <koichi.szk@gmail.com> wrote:
>>> My reply to Gregory's comment didn't have any objections.   I believe,
>>> as I posted to Wiki page, latest posted patch is okay and waiting for
>>> review.
>> One of your latest patches doesn't match with HEAD, so I updated it.
> 
> Oops! I failed in attaching the patch. This is second try.

Thanks. This patch seems to be missing the new readahead.c file. I 
grabbed that from the previous patch version.

It's annoying that we have to write *_readahead functions for each and 
every record type. In most record types, we already pass the information 
about the pages involved to XLogInsert, for full page writes. If we 
could change the WAL format so that that information is stored in WAL 
even when a full page image is taken, we could do the read-ahead for 
every WAL record type in a single function. Getting the API right needs 
some thinking, but that would be a lot nicer approach in the long run.

I agree with Itagaki-san's earlier comment that we should find a way to 
avoid the ReadAheadHasRoom calls 
(http://archives.postgresql.org/message-id/20090114101659.89CD.52131E4D@oss.ntt.co.jp). 
Let's just leave enough slack in the queue, so that it never fills up. 
And if the unthinkable happens and it does fill up, we can just throw 
away the readahead requests that don't fit; they're just hints anyway.

The API for ReadAheadAddEntry should include ForkNumber. Although all 
the readahead calls included in the patch all access the main fork, 
that's really just an omission that probably should be fixed even though 
the FSM and visibility map should become cached pretty quickly for any 
active tables.

effective_io_concurrency setting is ignored. If it's set to 1, we should 
disable prefetching altogether for the sake of both robustness (let you 
recover even if there's a bug in the readahead code) and performance 
(avoid useless posix_fadvise calls, sorting etc. if there's only one 
spindle).

The bursty queuing behavior feels pretty strange to me, though I guess 
it works pretty well in practice. I would've assumed a FIFO queue of WAL 
records, with some kind of a cache of recently issued posix_fadvise() calls.

As the patch stands, it's not limited to archive recovery. The code in 
readahead.c seems to assume that the readahead queue will always be 
flushed between xlog segment switch, but that is not enforced in xlog.c.

malloc() can return NULL on out of memory. Need to check that, or use 
palloc() instead.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Visibility function comment addition
Next
From: Robert Haas
Date:
Subject: Re: status of remaining patches