Re: our buffer replacement strategy is kind of lame - Mailing list pgsql-hackers

From Robert Haas
Subject Re: our buffer replacement strategy is kind of lame
Msg-id CA+Tgmob793NeyRu0dHwBRWJFkobVwMpCSs1E7W9h1KsPe2vM1A@mail.gmail.com
In response to Re: our buffer replacement strategy is kind of lame  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, Aug 14, 2011 at 10:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sun, Aug 14, 2011 at 1:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The big problem with this idea is that it pretty much requires the
>> work you mentioned in one of your other emails - separating the
>> background writer and checkpoint machinery into two separate processes
>> - to happen first.  So I'm probably going to have to code that up to
>> see whether this works.  If you're planning to post that patch soon
>> I'll wait for it.  Otherwise, I'm going to have to cobble together
>> something that is at least good enough for testing.
>
> No, the big problem with the idea is that regrettably it is just an
> idea on your part and has no basis in external theory or measurement.
> I would not object to you investigating such a path and I think you
> are someone that could invent something new and original there, but it
> seems much less likely to be fruitful, or at least would require
> significant experimental results to demonstrate an improvement in a
> wide range of use cases to the rest of the hackers.

All right, well, I'll mull over whether it's worth pursuing.  Unless I
or someone else comes up with an idea I like better, I think it
probably is.

> As to you not being able to work on your idea until I've split
> bgwriter/checkpoint, that's completely unnecessary, and you know it. A
> single ifdef is sufficient there, if at all.

Hmm.  Well, it might be unnecessary, but if I knew it were
unnecessary, I wouldn't have said that I thought it was necessary.

> The path I was working on (as shown in the earlier patch) was to apply
> some corrections to the existing algorithm to reduce its worst case
> behaviour. That's something I've seen mention of people doing for
> RedHat kernels.

Yeah.  Your idea is appealing because it bounds the amount of time
spent searching for a buffer to evict.  There is some chance that you
might kick out a really hot buffer if there is a long series of such
buffers in a row.  Without testing, I don't know whether that's a
serious problem or not.
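
To make that trade-off concrete, here is a toy Python sketch of a clock
sweep with a bounded search.  It is an illustration only, not your patch
and not PostgreSQL code: Buffer, bounded_clock_sweep and max_steps are
hypothetical names, and I am assuming that once the bound is hit the
sweep simply takes the next unpinned buffer regardless of its usage
count.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    usage_count: int = 0   # bumped on access, decremented by the sweep
    pinned: bool = False   # pinned buffers can never be evicted


def bounded_clock_sweep(buffers, hand, max_steps):
    """Return (victim_index, new_hand).

    Hypothetical sketch: behave like an ordinary clock sweep for at most
    max_steps buffers, then stop being picky and evict the next unpinned
    buffer even if its usage count is still high.
    """
    n = len(buffers)

    # Phase 1: ordinary sweep, limited to max_steps inspections.
    for _ in range(max_steps):
        idx = hand
        hand = (hand + 1) % n
        buf = buffers[idx]
        if buf.pinned:
            continue
        if buf.usage_count == 0:
            return idx, hand
        buf.usage_count -= 1

    # Phase 2: bound reached; take the next unpinned buffer as-is.
    for _ in range(n):
        idx = hand
        hand = (hand + 1) % n
        if not buffers[idx].pinned:
            return idx, hand

    raise RuntimeError("no unpinned buffers available")
```

With eight buffers that all have a usage count of 5 and max_steps = 3,
the sketch decrements the first three and then evicts the fourth, which
is still hot.  That is the case I would want to measure.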

> Overall, I think a minor modification is the appropriate path. If
> Linux or other OS already use ClockPro then we already benefit from
> it. It seems silly to track blocks that recently left shared buffers
> when they are very likely still actually in memory in the filesystem
> cache.

You may be right.  Basically, my concern is that buffer eviction is
too slow.  On a 32-core system, it's easy to construct a workload
where the whole system bottlenecks on the rate at which buffers can be
evicted and replaced - not because the system is fundamentally
incapable of copying data around that quickly, but because everything
piles up behind BufFreelistLock, and to a lesser extent the buffer
mapping locks.  Your idea may help with that, but I doubt that it's a
complete solution.
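
As a rough picture of that pile-up, here is a toy Python model in which
one lock guards the whole sweep, so every backend that needs a victim
buffer queues on it.  SingleLockPool, get_victim and run_backends are
hypothetical names, and the single threading.Lock is only a stand-in
for BufFreelistLock; this is not how the real code is structured in
detail.

```python
import threading


class SingleLockPool:
    """Toy buffer pool whose clock hand is protected by one global lock."""

    def __init__(self, nbuffers):
        self.usage = [0] * nbuffers
        self.hand = 0
        self.lock = threading.Lock()   # stand-in for a single global lock
        self.acquisitions = 0          # how many times the lock was taken

    def get_victim(self):
        # The entire search runs under the lock, so callers are serialized
        # no matter how many cores are available.
        with self.lock:
            self.acquisitions += 1
            while True:
                idx = self.hand
                self.hand = (self.hand + 1) % len(self.usage)
                if self.usage[idx] == 0:
                    return idx
                self.usage[idx] -= 1


def run_backends(pool, nthreads, requests_each):
    """Simulate nthreads backends each asking for requests_each victims."""
    def worker():
        for _ in range(requests_each):
            pool.get_victim()

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Every one of the nthreads * requests_each evictions passes through the
same lock one at a time.  Shortening the search under the lock reduces
the hold time, but the serialization point itself remains.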

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

