Re: Shared locking in slru.c - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Shared locking in slru.c |
Date | |
Msg-id | 2237.1133814758@sss.pgh.pa.us Whole thread Raw |
In response to | Shared locking in slru.c (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I wrote: > The way the attached patch attacks this is for the shared-lock access > case to simply set the page's LRU counter to zero, without bumping up > the LRU counters of the other pages as the normal adjustment would do. > ... > I'm not totally happy with this heuristic, though, and was > wondering if anyone had a better idea. Anyone seen a lock-free > data structure for LRU or approximately-LRU state tracking? I've come up with what seems a slightly better way to do this. The idea is to replace the current structure for tracking which page is the least- recently-used with this: typedef struct SlruSharedData { ... /*---------- * We mark a page "most recently used" by setting * page_lru_count[slotno] = ++cur_lru_count; * The oldest page is therefore the one with the highest value of * cur_lru_count - page_lru_count[slotno] *---------- */ int cur_lru_count; ... int page_lru_count[NUM_SLRU_BUFFERS]; ... which makes the SlruRecentlyUsed macro look like #define SlruRecentlyUsed(shared, slotno) \ do { \ int new_lru_count = (shared)->cur_lru_count; \ if(new_lru_count != (shared)->page_lru_count[slotno]) { \ (shared)->cur_lru_count = ++new_lru_count; \ (shared)->page_lru_count[slotno] = new_lru_count; \ } \ } while (0) and SlruSelectLRUPage() has to look for the maximum value of "cur_count - shared->page_lru_count[slotno]" rather than just "shared->page_lru_count[slotno]" as before. This seems like a win in any case since it takes cycles out of the commonly used path at a small increase in the cost of SlruSelectLRUPage --- but in that routine you are about to do I/O anyway, so a few extra subtractions are negligible. However, the real reason for doing this is that I think it's OK for the proposed SimpleLruReadPage_ReadOnly routine to apply this form of SlruRecentlyUsed even though it holds only a shared lock. Assuming that int reads and writes are atomic, the only possible failures are 1. If a process running the macro is delayed, it might store a stale (hence too small) value back into cur_lru_count or a page_lru_count array element after someone else has incremented them to a larger value. 2. Two processes might read the same cur_lru_count value at the same time, so that one of their increments is lost. This has the same end effect as #1, though. Given reasonable care in SlruSelectLRUPage (see attached excerpt), we can defend against these scenarios and usually still make a reasonable choice of which page to evict. In any case, the worst possible scenario is that we make a nonoptimal choice of page to evict due to confused lru_count state. This price seems well worth the chance to reduce contention for shared memory. Thoughts, objections? regards, tom lane /* * If we find any EMPTY slot, just select that one. Else locate the * least-recently-used slot toreplace. * * Normally the page_lru_count values will all be different and so * there will be a well-definedLRU page. But since we allow * concurrent execution of SlruRecentlyUsed() within * SimpleLruReadPage_ReadOnly(),it is possible that multiple pages * acquire the same lru_count values. In that casewe break ties by * choosing the furthest-back page. * * In no case will we select the slot containinglatest_page_number * for replacement, even if it appears least recently used. * * Notice thatthis next line forcibly advances cur_lru_count to a * value that is certainly beyond any value that will be inthe * page_lru_count array after the loop finishes. This ensures that * the next execution of SlruRecentlyUsedwill give us good data, * even if it's against a page that has the current counter value. */ cur_count = (shared->cur_lru_count)++; best_delta = -1; bestslot = 0; /* no-op, just keepscompiler quiet */ best_page_number = 0; /* ditto */ for (slotno = 0; slotno < NUM_SLRU_BUFFERS; slotno++) { int this_delta; int this_page_number; if (shared->page_status[slotno] == SLRU_PAGE_EMPTY) return slotno; this_delta = cur_count- shared->page_lru_count[slotno]; if (this_delta < 0) { /* * Cleanup in case shared updates have caused cur_count * increments to get "lost". We back off the page counts, * rather than trying to increase cur_count, to avoid any * question of infinite loopsor failure in the presence of * wrapped-around counts. */ shared->page_lru_count[slotno]= cur_count; this_delta = 0; } this_page_number = shared->page_number[slotno]; if ((this_delta > best_delta || (this_delta == best_delta && ctl->PagePrecedes(this_page_number, best_page_number))) && this_page_number != shared->latest_page_number) { bestslot = slotno; best_delta = this_delta; best_page_number = this_page_number; } }
pgsql-hackers by date: