Re: Page replacement algorithm in buffer cache - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Page replacement algorithm in buffer cache
Date
Msg-id 515A1093.8090403@nasby.net
In response to Re: Page replacement algorithm in buffer cache  (Ants Aasma <ants@cybertec.at>)
Responses Re: Page replacement algorithm in buffer cache
List pgsql-hackers
On 3/23/13 7:41 AM, Ants Aasma wrote:
> On Sat, Mar 23, 2013 at 6:04 AM, Jim Nasby <jim@nasby.net> wrote:
>> Partitioned clock sweep strikes me as a bad idea... you could certainly get
>> unlucky and end up with a lot of hot stuff in one partition.
>
> Surely that is not worse than having everything in a single partition.
> Given a decent partitioning function it's very highly unlikely to have
> more than a few of the hottest buffers end up in a single partition.

One could argue that it is worse because you've added another layer of unpredictability to performance. If something
happens to suddenly put two heavily hit sets in the same partition, your previously good performance suddenly tanks.
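
To make the concern concrete, here is a rough sketch (not PostgreSQL code; the partition count, the buffer ids and the
modulo partitioning function are all made up for illustration) that counts how many buffers from a hot set land in each
partition:

```python
from collections import Counter


def partition_of(buf_id: int, n_partitions: int) -> int:
    # Hypothetical partitioning function: plain modulo on the buffer id.
    return buf_id % n_partitions


def hot_per_partition(hot_buf_ids, n_partitions):
    """Return a list holding the number of hot buffers in each partition."""
    counts = Counter(partition_of(b, n_partitions) for b in hot_buf_ids)
    return [counts.get(p, 0) for p in range(n_partitions)]


# A hot set that spreads evenly: one hot buffer per partition.
spread = hot_per_partition([0, 1, 2, 3], 4)

# An unlucky hot set: every id maps to partition 0.
unlucky = hot_per_partition([0, 4, 8, 12], 4)
```

A better hash makes the unlucky case rarer, but it doesn't make it impossible, and when it happens one sweep has to
cope with all of the hot buffers at once.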
 

Maybe that issue isn't real enough to be worth worrying about, but I still think it'd be easier and cleaner to try
keeping stuff on the free list first...
 

>> Another idea that's been brought up in the past is to have something in the
>> background keep a minimum number of buffers on the free list. That's how OS
>> VM systems I'm familiar with work, so there's precedent for it.
>>
>> I recall there were at least some theoretical concerns about this, but I
>> don't remember if anyone actually tested the idea.
>
> Yes, having bgwriter do the actual cleaning up seems like a good idea.
> The whole bgwriter infrastructure will need some serious tuning. There
> are many things that could be shifted to background if we knew it
> could keep up, like hint bit setting on dirty buffers being flushed
> out. But again, we have the issue of having good tests to see where
> the changes hurt.

I think at some point we need to stop depending on just bgwriter for all this stuff. I believe it would be much cleaner
if we had separate procs for everything we needed (although some synergies might exist; if we wanted to set hint bits
during write then bgwriter *is* the logical place to put that).
 

In this case, I don't think keeping stuff on the free list is close enough to checkpoints that we'd want bgwriter to
handle both. At most we might want them to pass some metrics back and forth.
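
For illustration only, the free-list idea could look roughly like the sketch below. It is a toy model, not the actual
buffer manager: the class, the min_free low-water mark, and the choice to start a newly handed-out buffer at usage
count 1 are all assumptions made for the example. A background pass runs the clock sweep until the free list holds
min_free buffers, so a backend only has to pop from the list.

```python
from collections import deque


class BufferPool:
    def __init__(self, usage_counts, min_free):
        self.usage = list(usage_counts)    # per-buffer usage count
        self.min_free = min_free           # hypothetical low-water mark
        self.free_list = deque()
        self.on_free_list = [False] * len(self.usage)
        self.hand = 0                      # clock hand position

    def refill_free_list(self):
        """One background pass: sweep until min_free buffers are free."""
        # Cap the sweep so the pass always terminates.
        max_steps = len(self.usage) * (max(self.usage, default=0) + 1)
        steps = 0
        while len(self.free_list) < self.min_free and steps < max_steps:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            steps += 1
            if self.on_free_list[i]:
                continue                   # already free, nothing to do
            if self.usage[i] > 0:
                self.usage[i] -= 1         # recently used: age it
            else:
                self.free_list.append(i)   # cold: make it available
                self.on_free_list[i] = True
        return len(self.free_list)

    def get_buffer(self):
        """Backend path: pop a free buffer, or None if the list is empty."""
        if not self.free_list:
            return None
        i = self.free_list.popleft()
        self.on_free_list[i] = False
        self.usage[i] = 1                  # assumption: new buffers start at 1
        return i
```

The point of the split is that the sweep cost is paid in the background; the backend path is a single pop unless the
background process falls behind, in which case it would have to fall back to sweeping itself.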
 
-- 
Jim C. Nasby, Data Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


