Re: [GENERAL] Slow PITR restore - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: [GENERAL] Slow PITR restore
Date
Msg-id 1197635527.7974.14.camel@hannu-laptop
Whole thread Raw
In response to Re: [GENERAL] Slow PITR restore  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: [GENERAL] Slow PITR restore  (Markus Schiltknecht <markus@bluegap.ch>)
List pgsql-hackers
Ühel kenal päeval, N, 2007-12-13 kell 20:25, kirjutas Heikki
Linnakangas:
...
> Hmm. That assumes that nothing else than the WAL replay will read
> pages into shared buffers. I guess that's true at the moment, but it
> doesn't seem impossible that something like Florian's read-only queries
> on a stand by server would change that.
>
> > I think that is better than both methods mentioned, and definitely
> > simpler than my brute-force method. It also lends itself to using both
> > previously mentioned methods as additional techniques if we really
> > needed to. I suspect reordering the I/Os in this way is going to make a
> > huge difference to cache hit rates.
>
> But it won't actually do anything to scale the I/O. You're still going
> to be issuing only one read request at a time. The order of those
> requests will be better from cache hit point of view, which is good, but
> the problem remains that if the modified data blocks are scattered
> around the database, you'll be doing random I/O, one request at a time.

Why one-at-a-time ?

You could have a long list of pages need to read in, and ask for them
all at the same time.

Here's what I mean

1 ) allocate buffers for N database pages, and a queue for N wal records
2 ) read N wal records to wal record queue, assign database page numbers
from these to buffer pages and issue posix_fadvise() for all as you go.
2a ) if there were repeated pages and thus there are free buffers,
allocate queu items and read some more wal records and assign buffer and
fadvise until N fubbers used
3) process wal record queue to buffers read in by 2
4) write the buffers back to disk

repeat from 2), freeing LRU buffers

Here reads in 2) will be optimised by system via posix_fadvise, and also
the caches can be split between multiple workers by page number hash or
some other random/uniform means to use more than one CPU

-------------
Hannu



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: [GENERAL] Slow PITR restore
Next
From: Hannu Krosing
Date:
Subject: Re: VLDB Features