Re: Page replacement algorithm in buffer cache - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Page replacement algorithm in buffer cache
Msg-id 20130402103239.GC2415@alap2.anarazel.de
In response to Re: Page replacement algorithm in buffer cache  (Jim Nasby <jim@nasby.net>)
Responses Re: Page replacement algorithm in buffer cache
List pgsql-hackers
On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
> On 3/23/13 7:41 AM, Ants Aasma wrote:
> >Yes, having bgwriter do the actual cleaning up seems like a good idea.
> >The whole bgwriter infrastructure will need some serious tuning. There
> >are many things that could be shifted to background if we knew it
> >could keep up, like hint bit setting on dirty buffers being flushed
> >out. But again, we have the issue of having good tests to see where
> >the changes hurt.
> 
> I think at some point we need to stop depending on just bgwriter for all this stuff. I believe it would be much
> cleaner if we had separate procs for everything we needed (although some synergies might exist; if we wanted to set hint
> bits during write then bgwriter *is* the logical place to put that).
> 
> In this case, I don't think keeping stuff on the free list is close enough to checkpoints that we'd want bgwriter to
> handle both. At most we might want them to pass some metrics back and forth.

bgwriter isn't doing checkpoints anymore; since 9.2 there has been a separate checkpointer process.

In my personal experience and measurements, bgwriter is pretty close to
useless right now. I think that, much like what Amit has done, it should
run part of a real clock sweep, advancing the sweep position and
decrementing usage counts, instead of just looking ahead of the current
position without changing either. It should then put enough buffers on
the freelist to cover demand until its next activity phase. I hacked
around on that one night in a hotel and got impressive speedups (and
quite some breakage) for workloads bigger than shared_buffers (s_b).
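A minimal single-threaded sketch of that bgwriter behaviour, assuming a toy buffer pool: the names `SimBuffer`, `SimPool`, `bgwriter_fill_freelist` and the `target` parameter are hypothetical and do not correspond to PostgreSQL's actual structures or functions. Unlike a pure look-ahead scan, every step moves the shared clock hand and decrements usage counts, and dirty victims are written out before they are queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy buffer pool; NOT PostgreSQL's BufferDesc or
 * BgBufferSync(), just enough state to show the idea. */
#define NBUFFERS  8
#define MAX_USAGE 5   /* assumed cap on usage counts, like BM_MAX_USAGE_COUNT */

typedef struct {
    int  usage_count;  /* bumped on access, decremented by the sweep */
    int  pin_count;    /* > 0 while some backend is using the buffer */
    bool dirty;        /* page must be written before the buffer is reused */
    bool on_freelist;  /* already queued for reuse */
} SimBuffer;

typedef struct {
    SimBuffer bufs[NBUFFERS];
    int clock_hand;          /* the shared sweep position */
    int freelist[NBUFFERS];  /* ids of clean, reusable buffers */
    int free_count;
    int writes;              /* dirty pages flushed by the background sweep */
} SimPool;

/* One bgwriter activity phase: run a real clock sweep (advance the shared
 * hand and decrement usage counts) until `target` buffers sit on the
 * freelist, writing out dirty victims before queueing them. The step bound
 * stops it from spinning forever if most buffers are pinned. */
static int bgwriter_fill_freelist(SimPool *p, int target) {
    assert(target >= 0 && target <= NBUFFERS);
    int added = 0;
    int max_steps = NBUFFERS * (MAX_USAGE + 1);
    for (int step = 0; step < max_steps && p->free_count < target; step++) {
        int id = p->clock_hand;
        SimBuffer *b = &p->bufs[id];
        p->clock_hand = (id + 1) % NBUFFERS;   /* sweep position moves */
        if (b->pin_count > 0 || b->on_freelist)
            continue;
        if (b->usage_count > 0) {
            b->usage_count--;                  /* usage count changes */
            continue;
        }
        if (b->dirty) {
            b->dirty = false;                  /* stand-in for the write */
            p->writes++;
        }
        p->freelist[p->free_count++] = id;
        b->on_freelist = true;
        added++;
    }
    return added;
}
```

How to size `target` (the expected demand until the next activity phase) is left open here; the sketch simply takes it as a parameter.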

That would relieve quite a few pain points:
- fewer different processes/CPUs looking at buffer headers ahead in the cycle
- fewer CPUs changing usage counts
- dirty pages are far more likely to have been flushed out already when a new page is needed
- operations like relation extension, which right now frequently have to search far and wide for new pages while holding the extension lock exclusively, should finish quite a bit faster
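To make the points above concrete, here is a hedged sketch of the allocating backend's side, again on a hypothetical toy pool (`backend_get_buffer`, `AllocCost` and the other names are illustrative, not PostgreSQL code). It counts the buffer headers, usage-count changes and writes the requesting backend pays for on its own critical path when the freelist is empty, versus when a background sweep has already filled it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy structures, not PostgreSQL's BufferDesc. */
#define NBUFFERS  8
#define MAX_USAGE 5   /* assumed usage-count cap; bounds the sweep below */

typedef struct {
    int  usage_count;
    int  pin_count;
    bool dirty;
} SimBuffer;

typedef struct {
    SimBuffer bufs[NBUFFERS];
    int clock_hand;
    int freelist[NBUFFERS];  /* filled by a background sweep */
    int free_count;
} SimPool;

/* Work done on the requesting backend's own critical path, e.g. while it
 * holds a relation extension lock. */
typedef struct {
    int headers_inspected;   /* buffer headers this backend looked at */
    int usage_decrements;    /* usage counts this backend changed */
    int writes;              /* dirty pages this backend had to flush */
} AllocCost;

static int claim(SimPool *p, int id, AllocCost *cost) {
    SimBuffer *b = &p->bufs[id];
    if (b->dirty) {              /* the write happens while the caller waits */
        b->dirty = false;
        cost->writes++;
    }
    b->pin_count = 1;
    b->usage_count = 1;
    return id;
}

/* Returns a buffer id for a new page, or -1 if everything is pinned. */
static int backend_get_buffer(SimPool *p, AllocCost *cost) {
    /* Fast path: a buffer the background sweep already cleaned. Recheck it,
     * since another backend may have started using it after it was queued. */
    while (p->free_count > 0) {
        int id = p->freelist[--p->free_count];
        cost->headers_inspected++;
        if (p->bufs[id].pin_count == 0 && p->bufs[id].usage_count == 0)
            return claim(p, id, cost);
    }
    /* Slow path: the backend runs the clock sweep itself. */
    for (int step = 0; step < NBUFFERS * (MAX_USAGE + 1); step++) {
        int id = p->clock_hand;
        SimBuffer *b = &p->bufs[id];
        p->clock_hand = (id + 1) % NBUFFERS;
        cost->headers_inspected++;
        if (b->pin_count > 0)
            continue;
        if (b->usage_count > 0) {
            b->usage_count--;
            cost->usage_decrements++;
            continue;
        }
        return claim(p, id, cost);
    }
    return -1;
}
```

With every buffer dirty and at usage count 2, an empty freelist makes the backend inspect 17 headers, decrement 16 usage counts and do one write before it gets buffer 0; with one clean buffer already queued, it inspects a single header and writes nothing.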

If the freelist lock is separated from the lock protecting the clock
sweep, this should give quite a scalability boost without the potential
unfairness you can get from partitioning the lock or similar schemes.
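A sketch of that lock split, assuming C11 atomics and two hypothetical spinlocks: backends popping from the freelist take only `freelist_lock`, while the sweeper holds `sweep_lock` for the clock hand and usage counts and grabs `freelist_lock` only briefly for each push. The `Strategy` struct and all function names are illustrative, not PostgreSQL's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NBUFFERS 16

/* Minimal test-and-set spinlock on C11 atomics (a stand-in for a real
 * spinlock or LWLock implementation). */
typedef struct { atomic_int locked; } SpinLock;

static void spin_init(SpinLock *l) { atomic_init(&l->locked, 0); }
static void spin_acquire(SpinLock *l) {
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;  /* busy-wait */
}
static void spin_release(SpinLock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Hypothetical strategy state with two independent locks. Pins are omitted
 * to keep the sketch short. */
typedef struct {
    SpinLock freelist_lock;          /* guards freelist, free_count, on_freelist */
    int      freelist[NBUFFERS];
    int      free_count;
    bool     on_freelist[NBUFFERS];
    SpinLock sweep_lock;             /* guards clock_hand and usage */
    int      clock_hand;
    int      usage[NBUFFERS];
} Strategy;

static void strategy_init(Strategy *s) {
    spin_init(&s->freelist_lock);
    spin_init(&s->sweep_lock);
    s->free_count = 0;
    s->clock_hand = 0;
    for (int i = 0; i < NBUFFERS; i++) {
        s->on_freelist[i] = false;
        s->usage[i] = 0;
    }
}

/* Backend fast path: takes only freelist_lock, so it never queues up
 * behind a bgwriter that is in the middle of a long sweep. */
static int backend_pop_free(Strategy *s) {
    int id = -1;
    spin_acquire(&s->freelist_lock);
    if (s->free_count > 0) {
        id = s->freelist[--s->free_count];
        s->on_freelist[id] = false;
    }
    spin_release(&s->freelist_lock);
    return id;
}

/* bgwriter: holds sweep_lock while moving the hand and decrementing usage
 * counts, and takes freelist_lock only for each short push. The lock order
 * is always sweep_lock -> freelist_lock, and the fast path above never holds
 * both, so the two locks cannot deadlock. The step bound keeps a hot pool
 * from pinning the sweeper in this loop. */
static int bgwriter_refill(Strategy *s, int target) {
    int added = 0;
    spin_acquire(&s->sweep_lock);
    for (int step = 0; step < 8 * NBUFFERS; step++) {
        int id = s->clock_hand;
        s->clock_hand = (id + 1) % NBUFFERS;
        if (s->usage[id] > 0) {
            s->usage[id]--;
            continue;
        }
        spin_acquire(&s->freelist_lock);
        if (!s->on_freelist[id] && s->free_count < target) {
            s->freelist[s->free_count++] = id;
            s->on_freelist[id] = true;
            added++;
        }
        bool done = s->free_count >= target;
        spin_release(&s->freelist_lock);
        if (done)
            break;
    }
    spin_release(&s->sweep_lock);
    return added;
}
```

Keeping the usage counts under `sweep_lock` is a simplification for the sketch; a real buffer manager keeps them in per-buffer headers with their own locking, so backends bumping a usage count would not contend on the sweep lock either.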

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services