Re: Page replacement algorithm in buffer cache - Mailing list pgsql-hackers
From | Jim Nasby |
---|---|
Subject | Re: Page replacement algorithm in buffer cache |
Date | |
Msg-id | 515A1393.9090908@nasby.net Whole thread Raw |
In response to | Re: Page replacement algorithm in buffer cache (Merlin Moncure <mmoncure@gmail.com>) |
Responses |
Re: Page replacement algorithm in buffer cache
|
List | pgsql-hackers |
On 4/1/13 4:55 PM, Merlin Moncure wrote: > On Mon, Apr 1, 2013 at 4:09 PM, Andres Freund<andres@2ndquadrant.com> wrote: >> >On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote: >>> >>On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes<jeff.janes@gmail.com> wrote: >>>> >> >On Friday, March 22, 2013, Ants Aasma wrote: >>>>> >> >> >>>>> >> >>On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure<mmoncure@gmail.com> >>>>> >> >>wrote: >>>>>> >> >> >well if you do a non-locking test first you could at least avoid some >>>>>> >> >> >cases (and, if you get the answer wrong, so what?) by jumping to the >>>>>> >> >> >next buffer immediately. if the non locking test comes good, only >>>>>> >> >> >then do you do a hardware TAS. >>>>>> >> >> > >>>>>> >> >> >you could in fact go further and dispense with all locking in front of >>>>>> >> >> >usage_count, on the premise that it's only advisory and not a real >>>>>> >> >> >refcount. so you only then lock if/when it's time to select a >>>>>> >> >> >candidate buffer, and only then when you did a non locking test first. >>>>>> >> >> > this would of course require some amusing adjustments to various >>>>>> >> >> >logical checks (usage_count <= 0, heh). >>>>> >> >> >>>>> >> >>Moreover, if the buffer happens to miss a decrement due to a data >>>>> >> >>race, there's a good chance that the buffer is heavily used and >>>>> >> >>wouldn't need to be evicted soon anyway. (if you arrange it to be a >>>>> >> >>read-test-inc/dec-store operation then you will never go out of >>>>> >> >>bounds) However, clocksweep and usage_count maintenance is not what is >>>>> >> >>causing contention because that workload is distributed. The issue is >>>>> >> >>pinning and unpinning. >>>> >> > >>>> >> > >>>> >> >That is one of multiple issues. Contention on the BufFreelistLock is >>>> >> >another one. I agree that usage_count maintenance is unlikely to become a >>>> >> >bottleneck unless one or both of those is fixed first (and maybe not even >>>> >> >then) >>> >> >>> >>usage_count manipulation is not a bottleneck but that is irrelevant. >>> >>It can be affected by other page contention which can lead to priority >>> >>inversion. I don't be believe there is any reasonable argument that >>> >>sitting and spinning while holding the BufFreelistLock is a good idea. >> > >> >In my experience the mere fact of (unlockedly, but still) accessing all the >> >buffer headers can cause noticeable slowdowns in write only/mostly workloads with >> >big amounts of shmem. >> >Due to the write only nature large amounts of the buffers have a similar >> >usagecounts (since they are infrequently touched after the initial insertion) >> >and there are no free ones around so the search for a buffer frequently runs >> >through*all* buffer headers multiple times till it decremented all usagecounts >> >to 0. Then comes a period where free buffers are found easily (since all >> >usagecounts from the current sweep point onwards are zero). After that it >> >starts all over. >> >I now have seen that scenario multiple times:( > Interesting -- I was thinking about that too, but it's a separate > problem with a different trigger. Maybe a bailout should be in there > so that after X usage_count adjustments the sweeper summarily does an > eviction, or maybe the "max" declines from 5 once per hundred buffers > inspected or some such. What's the potential downside on that though? IE: what happens if this scheme suddenly evicts the root page on a heavilyused index? You'll suddenly have a ton of stuff blocked waiting for that page to come back in. This is a use case that I think would benefit greatly from a background process that keeps pages in the free list. That said, I now suspect that your "frightened turtle" approach would be of higher value than "bgfreelist"... but I suspectwe'll ultimately want both of them for different reasons. -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
pgsql-hackers by date: