Re: patch: improve SLRU replacement algorithm - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch: improve SLRU replacement algorithm
Msg-id CA+TgmoZi_SU_824gprzHGpdxPmKUGAKPtz476iDmVFfiVtEzEA@mail.gmail.com
In response to Re: patch: improve SLRU replacement algorithm  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: patch: improve SLRU replacement algorithm
List pgsql-hackers
On Wed, Apr 4, 2012 at 4:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Measurement?
>
> Sounds believable, I just want to make sure we have measured things.

Yes, I measured things.  I didn't post the results because they're
almost identical to the previous set of results which I already
posted.  That is, I wrote the patch; I ran it through the
instrumentation framework; the same long waits with the same set of
file/line combinations were still present.  Then I wrote the patch
that is attached to the OP, and also tested that, and those long waits
went away completely.

> I believe that, but if all buffers are I/O busy we should avoid
> waiting on a write I/O if possible.

I thought about that, but I don't see that there's any point in
further complicating the algorithm.  The current patch eliminates ALL
the long waits present in this code path, which means that the
situation where every CLOG buffer is I/O-busy at the same time either
never happens, or never causes any significant stalls.  I think it's a
bad idea to make this any more complicated than is necessary to do the
right thing in real-world cases.
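To make the disputed case concrete, here is a minimal C sketch of a replacement policy in the spirit described above. It takes an empty slot if there is one. Otherwise it takes the least recently used slot that has no I/O in progress. It falls back to a busy slot, which means waiting, only when every slot is I/O-busy. The slot states, the `lru_count` field, and `select_victim()` are hypothetical names chosen for illustration. This is not the code from the attached patch or from PostgreSQL's slru.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot states, loosely modeled on SLRU buffer states. */
typedef enum {
    SLOT_EMPTY,
    SLOT_VALID,
    SLOT_READ_BUSY,   /* read I/O in progress */
    SLOT_WRITE_BUSY   /* write I/O in progress */
} SlotState;

/* Hypothetical buffer slot: a smaller lru_count means used longer ago. */
typedef struct {
    SlotState state;
    int lru_count;
} Slot;

/*
 * Choose a victim slot (illustrative sketch, not PostgreSQL's code):
 *   1. any empty slot, immediately;
 *   2. otherwise the least recently used slot with no I/O in progress;
 *   3. only if every slot is I/O-busy, the least recently used busy slot,
 *      on which the caller would then have to wait.
 */
static int select_victim(const Slot *slots, int nslots)
{
    int best_idle = -1;   /* LRU slot with no I/O in progress */
    int best_busy = -1;   /* LRU slot with I/O in progress */

    assert(nslots > 0);
    for (int i = 0; i < nslots; i++) {
        const Slot *s = &slots[i];
        if (s->state == SLOT_EMPTY)
            return i;

        bool busy = (s->state == SLOT_READ_BUSY || s->state == SLOT_WRITE_BUSY);
        int *best = busy ? &best_busy : &best_idle;
        if (*best < 0 || s->lru_count < slots[*best].lru_count)
            *best = i;
    }
    /* Fall back to a busy slot (and a wait) only when nothing else exists. */
    return best_idle >= 0 ? best_idle : best_busy;
}
```

Under this policy, the only remaining stall is the all-busy case. That is the situation described above as either never happening or never causing significant stalls. Simon's further refinement would prefer a read-busy slot over a write-busy one within that fallback. It would be a small change to the busy branch, and it is deliberately omitted here.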

> That seems much smarter. I'm thinking this should be back patched
> because it appears to be fairly major, so I'm asking for some more
> certainty that everything you say here is valid. No doubt much of it
> is valid, but that's not enough.

Yeah, I was thinking about that.  What we're doing right now seems
pretty stupid, so maybe it's worth considering a back-patch.  OTOH,
I'm usually loath to tinker with performance in stable releases.
I'll defer to the opinions of others on this point.

>> Applying this patch does in fact eliminate the stalls.
>
> I'd like to see that measured from a user perspective. It would be
> good to see the response time distribution for run with and without
> the patch.

My feeling is that you're not going to see very much difference in a
latency-by-second graph, because XLogInsert is responsible for lots
and lots of huge stalls also.  That's going to mask the impact of
fixing this problem.  However, it's not much work to run the test, so
I'll do that.

>> 2. I think we might want to revisit Simon's idea of background-writing
>> SLRU pages.
>
> Agreed. No longer anywhere near as important. I'll take a little
> credit for identifying the right bottleneck, since you weren't a
> believer before.

I don't think I ever said it was a bad idea; I just couldn't measure
any benefit.  I think now we know why, or at least have a clue; and
maybe some ideas about how to measure it better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

