Re: patch: improve SLRU replacement algorithm - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: patch: improve SLRU replacement algorithm
Date
Msg-id CAMkU=1zCbtHyKZbs52R-QTzRN2j0c2hZvEvtsDAxCN02-pkUbw@mail.gmail.com
In response to Re: patch: improve SLRU replacement algorithm  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: patch: improve SLRU replacement algorithm  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Apr 5, 2012 at 7:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Apr 5, 2012 at 9:29 AM, Greg Stark <stark@mit.edu> wrote:
>> On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Sorry, I don't understand specifically what you're looking for.  I
>>> provided latency percentiles in the last email; what else do you want?
>>
>> I think he wants how many waits were there that were between 0 and 1s
>> how many between 1s and 2s, etc. Mathematically it's equivalent but I
>> also have trouble visualizing just how much improvement is represented
>> by 90th percentile dropping from 1688 to 1620 (ms?)
>
> Yes, milliseconds.  Sorry for leaving out that detail.  I've run these
> scripts so many times that my eyes are crossing.  Here are the
> latencies, bucketized by seconds, first for master and then for the
> patch, on the same test run as before:
>
> 0 26179411
> 1 3642
> 2 660
> 3 374
> 4 166
> 5 356
> 6 41
> 7 8
> 8 56
> 9 0
> 10 0
> 11 21
> 12 11
>
> 0 26199130
> 1 4840
> 2 267
> 3 290
> 4 40
> 5 77
> 6 36
> 7 3
> 8 2
> 9 33
> 10 37
> 11 2
> 12 1
> 13 4
> 14 5
> 15 3
> 16 0
> 17 1
> 18 1
> 19 1
>
> I'm not sure I find those numbers all that helpful, but there they
> are.  There are a couple of outliers beyond 12 s on the patched run,
> but I wouldn't read anything into that; the absolute worst values
> bounce around a lot from test to test.  However, note that every
> bucket between 2s and 8s improves, sometimes dramatically.


However, if it "improved" a bucket by pushing transactions out of it
into a higher bucket, that is not really an improvement.  At 8 seconds
*or higher*, for example, it goes from 88 transactions in master to 90
in the patch.
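
For anyone who wants to check tail counts like that against the bucket
numbers quoted above, here is a minimal sketch in Python.  The names
master, patched and tail_count are mine, purely for illustration; the
data is copied from the quoted lists, indexed by whole seconds.

```python
# Bucket counts quoted earlier in this message; list index = latency in whole seconds.
master = [26179411, 3642, 660, 374, 166, 356, 41, 8, 56, 0, 0, 21, 11]
patched = [26199130, 4840, 267, 290, 40, 77, 36, 3, 2, 33, 37, 2, 1,
           4, 5, 3, 0, 1, 1, 1]


def tail_count(buckets, threshold):
    """Number of transactions whose latency was `threshold` seconds or more."""
    return sum(buckets[threshold:])


if __name__ == "__main__":
    # Compare the two runs at a few thresholds instead of bucket by bucket.
    for t in (1, 2, 4, 8):
        print(t, tail_count(master, t), tail_count(patched, t))
```

At a threshold of 8 this gives 88 for master and 90 for the patch.
Comparing "at or above t" counts avoids crediting the patch for merely
moving a transaction from one bucket into a higher one.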

Maybe something like a Kaplan-Meier survival curve analysis would be
the way to go (where a long transaction "survival" is bad).  But that
is probably overkill.
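
A rough sketch of what I mean, in Python.  I am assuming every
transaction in the run completed, so nothing is censored; in that case
the Kaplan-Meier estimate reduces to the plain empirical survival
function, i.e. the fraction of transactions still "alive" (not yet
finished) at each whole second.  The function name and the layout are
mine, and the bucket counts are copied from the quoted lists.

```python
# Bucket counts quoted earlier in this message; list index = latency in whole seconds.
master = [26179411, 3642, 660, 374, 166, 356, 41, 8, 56, 0, 0, 21, 11]
patched = [26199130, 4840, 267, 290, 40, 77, 36, 3, 2, 33, 37, 2, 1,
           4, 5, 3, 0, 1, 1, 1]


def survival(buckets):
    """Return S[t] = fraction of transactions with latency >= t seconds.

    Assumes no censoring (every transaction finished), so this is the
    empirical survival function rather than a full Kaplan-Meier fit.
    """
    total = sum(buckets)
    remaining = total
    curve = []
    for count in buckets:
        curve.append(remaining / total)
        remaining -= count  # these transactions finished within this second
    return curve


if __name__ == "__main__":
    s_master = survival(master)
    s_patched = survival(patched)
    # Pad the shorter curve with zeros so both can be printed side by side.
    width = max(len(s_master), len(s_patched))
    s_master += [0.0] * (width - len(s_master))
    s_patched += [0.0] * (width - len(s_patched))
    for t in range(width):
        print(f"{t:2d}  {s_master[t]:.3e}  {s_patched[t]:.3e}")
```

A lower curve is better here, and the two curves can be compared at
any threshold without worrying about where the bucket edges fall.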

What were full_page_writes and wal_buffers set to for these runs?


> It's worth
> keeping in mind here that the system is under extreme I/O strain on
> this test, and the kernel responds by forcing user processes to sleep
> when they try to do I/O.

Should the tests be dialed back a bit so that the I/O strain is less
extreme?  Analysis is probably best done just past the scalability
knee, not far beyond it, where the server has already collapsed into
a quivering mass.

Cheers,

Jeff

