Re: our buffer replacement strategy is kind of lame - Mailing list pgsql-hackers

From Robert Haas
Subject Re: our buffer replacement strategy is kind of lame
Date
Msg-id CA+TgmobYD_dDd1ipBJ0t9a99=3PfiYQJuXD2jeXrO7N_yjyq0g@mail.gmail.com
In response to Re: our buffer replacement strategy is kind of lame  (Jim Nasby <jim@nasby.net>)
List pgsql-hackers
On Tue, Jan 3, 2012 at 6:22 PM, Jim Nasby <jim@nasby.net> wrote:
> On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
>>> So you don't think a freelist is worth having, but you want a list of
>>> allocation targets.
>>> What is the practical difference?
>>
>> I think that our current freelist is practically useless, because it
>> is almost always empty, and the cases where it's not empty (startup,
>> and after a table or database drop) are so narrow that we don't really
>> get any benefit out of having it.  However, I'm not opposed to the
>> idea of a freelist in general: I think that if we actually put in some
>> effort to keep the freelist in a non-empty state it would help a lot,
>> because backends would then have much less work to do at buffer
>> allocation time.
>
> This is exactly what the FreeBSD VM system does (which is at least one of
> the places where the idea of a clock sweep for PG came from ages ago).
> There is a process that does nothing but attempt to keep X amount of
> memory on the free list, where it can immediately be grabbed by anything
> that needs memory. Pages on the freelist are guaranteed to be clean (as in
> not dirty), but not zero'd. In fact, IIRC if a page on the freelist gets
> referenced again it can be pulled back out of the free list and put back
> into an active state.
>
> The one downside I see to this is that we'd need some heuristic to
> determine how many buffers we want to keep on the freelist.

Fortuitously, I believe the background writer already has most of the
necessary logic: it attempts to predict how many buffers are about to
be needed - I think based on a decaying average.
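
To make the idea concrete, here is a minimal sketch of a decaying-average
estimator that could size a freelist refill. It is an illustration under
stated assumptions, not the background writer's actual code: the
AllocEstimator type, the function names, and the smoothing_samples and
multiplier parameters are all hypothetical.

```c
#include <assert.h>

/* Hypothetical estimator state; not a PostgreSQL structure. */
typedef struct AllocEstimator
{
    double smoothed_alloc;    /* decaying average of allocations per cycle */
    int    smoothing_samples; /* larger values make the average decay slower */
} AllocEstimator;

/* Start with an empty history. */
static void
estimator_init(AllocEstimator *est, int smoothing_samples)
{
    assert(smoothing_samples > 0);
    est->smoothed_alloc = 0.0;
    est->smoothing_samples = smoothing_samples;
}

/*
 * Fold the number of buffers allocated during the latest cycle into the
 * average, moving it 1/smoothing_samples of the way toward the new value.
 */
static double
estimator_update(AllocEstimator *est, int recent_alloc)
{
    assert(recent_alloc >= 0);
    est->smoothed_alloc +=
        ((double) recent_alloc - est->smoothed_alloc) / est->smoothing_samples;
    return est->smoothed_alloc;
}

/*
 * Given the current freelist length, return how many buffers a background
 * process would need to add to reach the predicted demand.  The multiplier
 * is an assumed safety margin over the average.
 */
static int
freelist_refill_count(const AllocEstimator *est, double multiplier,
                      int freelist_len)
{
    int target = (int) (est->smoothed_alloc * multiplier);
    int deficit = target - freelist_len;

    return deficit > 0 ? deficit : 0;
}
```

A background process would call estimator_update() once per cycle with the
number of buffers backends allocated since the previous cycle, then move
freelist_refill_count() clean buffers onto the freelist.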

Actually, I think that logic could use some improvement, because I
believe I've heard Greg Smith comment that it's often necessary to
tune bgwriter_delay downward.  It'd be nice to make the delay adaptive
somehow, to avoid the need for manual tuning (and unnecessary wake-ups
when the system goes idle).
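
One possible shape for an adaptive delay, purely as a sketch: back off
exponentially while no buffers are being allocated, and shorten the delay
again once allocations resume. The function name next_delay_ms and the
min/max bounds are hypothetical, and the doubling/halving policy is an
assumption rather than anything tested.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose the next sleep time in milliseconds.
 * recent_alloc is the number of buffers allocated since the last wake-up.
 */
static int
next_delay_ms(int current_ms, int recent_alloc, int min_ms, int max_ms)
{
    int next;

    assert(min_ms > 0 && min_ms <= max_ms);
    assert(current_ms >= min_ms && current_ms <= max_ms);

    if (recent_alloc == 0)
    {
        /* Idle: double the delay, capped at max_ms, to avoid wake-ups. */
        next = (current_ms > max_ms / 2) ? max_ms : current_ms * 2;
    }
    else
    {
        /* Active: halve the delay, but never go below min_ms. */
        next = current_ms / 2;
        if (next < min_ms)
            next = min_ms;
    }
    return next;
}
```

With bounds of 10 ms and 10000 ms, an idle system at 200 ms would move to
400 ms on the next cycle, while a busy one would drop to 100 ms.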

But possibly the existing logic is good enough for a first cut.
However, in the interest of full disclosure, I'll admit that I've done
no testing in this area at all and am talking mostly out of my
posterior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
