Re: patch: improve SLRU replacement algorithm - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch: improve SLRU replacement algorithm
Date
Msg-id CA+TgmoZEzcdH4Pc22uvyCRA+FUYDtNrJqouJCu9ABRuF_weJ6Q@mail.gmail.com
In response to Re: patch: improve SLRU replacement algorithm  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
On Thu, Apr 5, 2012 at 12:44 PM, Greg Stark <stark@mit.edu> wrote:
> On Thu, Apr 5, 2012 at 3:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'm not sure I find those numbers all that helpful, but there they
>> are.  There are a couple of outliers beyond 12 s on the patched run,
>> but I wouldn't read anything into that; the absolute worst values
>> bounce around a lot from test to test.  However, note that every
>> bucket between 2s and 8s improves, sometimes dramatically.
>
> The numbers seem pretty compelling to me.

Thanks.

> They seem to indicate that
> you've killed one of the big sources of stalls, but that there are more
> lurking, including at least one which causes a small number of small
> stalls.

The data in my OP identifies, with considerable specificity, the other
things that can cause stalls >= 100 ms.

> The only fear I have is that I'm still wondering what happens to your
> code when *all* the buffers become blocked on I/O. Can you catch
> whether this ever occurred in your test and explain what should happen
> in that case?

If all the buffers are I/O-busy, it just falls back to picking the
least-recently-used buffer, which is a reasonable heuristic, since
that I/O is likely to be done first.  However, when I ran this with
all the debugging instrumentation enabled, it reported no waits in
slru.c consistent with that situation ever having occurred.  So if
something like that did happen during the test run, it produced a wait
of less than 100 ms, but I think it's more likely that it never
happened at all.
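
To make the fallback concrete, here's a minimal sketch of the selection
heuristic in the spirit of the patch - not the actual slru.c code; the
struct and names (SlruSketchBuffer, io_in_progress, lru_count) are
purely illustrative:

/*
 * Sketch only: pick a victim buffer, preferring pages that are not
 * currently being read or written.  If every buffer is I/O-busy, fall
 * back to the plain LRU choice, since the oldest I/O is the one most
 * likely to finish first.
 */
#include <stdbool.h>

typedef struct SlruSketchBuffer
{
    bool        io_in_progress; /* read or write currently underway? */
    int         lru_count;      /* lower value = less recently used */
} SlruSketchBuffer;

static int
select_victim_buffer(SlruSketchBuffer *buf, int nbuffers)
{
    int         best_non_busy = -1;
    int         best_any = -1;

    for (int i = 0; i < nbuffers; i++)
    {
        /* Track the overall LRU buffer as a fallback. */
        if (best_any < 0 || buf[i].lru_count < buf[best_any].lru_count)
            best_any = i;

        /* Prefer the LRU buffer among those not doing I/O. */
        if (!buf[i].io_in_progress &&
            (best_non_busy < 0 ||
             buf[i].lru_count < buf[best_non_busy].lru_count))
            best_non_busy = i;
    }

    /* Everything is I/O-busy: the plain LRU buffer is the best bet. */
    return (best_non_busy >= 0) ? best_non_busy : best_any;
}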

I think part of the confusion here may relate to a previous discussion
about increasing the number of CLOG buffers.  During that discussion,
I postulated that increasing the number of CLOG buffers improved
performance because we could encounter a situation where every buffer
is I/O-busy, causing new backends that wanted to perform an I/O to
have to wait until some backend that had been doing an I/O finished
it.  It's now clear that I was totally wrong, because you don't need
to have every buffer busy before the next backend that needs a CLOG
page blocks on an I/O.  As soon as ONE backend blocks on a CLOG buffer
I/O, every other backend that needs to evict a page will pile up on
the same I/O.  I just assumed that we couldn't possibly be doing
anything that silly, but we are.
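
To illustrate the pile-up, here's a companion sketch (reusing the
illustrative SlruSketchBuffer struct from above, and again not the real
code path) of what happens when victim selection ignores I/O state:

/*
 * Sketch only: choose the least-recently-used buffer unconditionally.
 */
static int
select_victim_ignoring_io(SlruSketchBuffer *buf, int nbuffers)
{
    int         victim = 0;

    for (int i = 1; i < nbuffers; i++)
    {
        if (buf[i].lru_count < buf[victim].lru_count)
            victim = i;
    }

    /*
     * The caller then, conceptually, does:
     *
     *     while (buf[victim].io_in_progress)
     *         wait_for_buffer_io(victim);    (hypothetical helper)
     *
     * Because every evicting backend computes the same victim, one slow
     * write in flight means they all queue up behind this one slot
     * instead of spreading out over the other, idle buffers.
     */
    return victim;
}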

So here's my new theory: the real reason why increasing the number of
CLOG pages improved performance is because it caused dirty pages to
reach the tail of the LRU list less frequently.  It's particularly bad
if a page gets written and fsync'd, but then someone still needs to
WRITE that page, so it gets pulled back in, dirtied, and written and fsync'd a
second time.  Such events are less likely with more buffers.  Of
course, increasing the number of buffers also decreases cache pressure
in general.  What's clear from these numbers as well is that there is
a tremendous amount of CLOG cache churn, and therefore we can infer
that most of those I/Os complete almost immediately - if they did not,
it would be impossible to replace 5000 CLOG buffers in a second no
matter how many backends you have.  It's the occasional I/Os that
don't complete almost immediately that are at issue here.
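
As a back-of-envelope check on that inference: 5000 replacements in one
second averages out to 1 s / 5000 = 0.2 ms per replacement, and since
evictions that hit a busy buffer serialize behind its I/O as described
above, sustaining that rate means the typical CLOG I/O has to finish in
a small fraction of a millisecond - several orders of magnitude below
the 100 ms stalls at issue.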

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

