Re: Shared locking in slru.c - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Shared locking in slru.c
Date
Msg-id 2237.1133814758@sss.pgh.pa.us
Whole thread Raw
In response to Shared locking in slru.c  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> The way the attached patch attacks this is for the shared-lock access
> case to simply set the page's LRU counter to zero, without bumping up
> the LRU counters of the other pages as the normal adjustment would do.
> ...
> I'm not totally happy with this heuristic, though, and was
> wondering if anyone had a better idea.  Anyone seen a lock-free
> data structure for LRU or approximately-LRU state tracking?

I've come up with what seems a slightly better way to do this.  The idea
is to replace the current structure for tracking which page is the least-
recently-used with this:

typedef struct SlruSharedData
{   ...
   /*----------    * We mark a page "most recently used" by setting    *        page_lru_count[slotno] =
++cur_lru_count;   * The oldest page is therefore the one with the highest value of    *        cur_lru_count -
page_lru_count[slotno]   *----------    */   int            cur_lru_count;
 
   ...
   int            page_lru_count[NUM_SLRU_BUFFERS];
   ...

which makes the SlruRecentlyUsed macro look like

#define SlruRecentlyUsed(shared, slotno) \   do { \       int        new_lru_count = (shared)->cur_lru_count; \
if(new_lru_count != (shared)->page_lru_count[slotno]) { \           (shared)->cur_lru_count = ++new_lru_count; \
  (shared)->page_lru_count[slotno] = new_lru_count; \       } \   } while (0)
 

and SlruSelectLRUPage() has to look for the maximum value of
"cur_count - shared->page_lru_count[slotno]" rather than just
"shared->page_lru_count[slotno]" as before.  This seems like a win
in any case since it takes cycles out of the commonly used path at
a small increase in the cost of SlruSelectLRUPage --- but in that
routine you are about to do I/O anyway, so a few extra subtractions
are negligible.

However, the real reason for doing this is that I think it's OK for
the proposed SimpleLruReadPage_ReadOnly routine to apply this form
of SlruRecentlyUsed even though it holds only a shared lock.  Assuming
that int reads and writes are atomic, the only possible failures are

1. If a process running the macro is delayed, it might store a stale
(hence too small) value back into cur_lru_count or a page_lru_count
array element after someone else has incremented them to a larger value.

2. Two processes might read the same cur_lru_count value at the same
time, so that one of their increments is lost.  This has the same end
effect as #1, though.

Given reasonable care in SlruSelectLRUPage (see attached excerpt), we
can defend against these scenarios and usually still make a reasonable
choice of which page to evict.  In any case, the worst possible scenario
is that we make a nonoptimal choice of page to evict due to confused
lru_count state.  This price seems well worth the chance to reduce
contention for shared memory.

Thoughts, objections?
        regards, tom lane

       /*        * If we find any EMPTY slot, just select that one. Else locate the        * least-recently-used slot
toreplace.        *        * Normally the page_lru_count values will all be different and so        * there will be a
well-definedLRU page.  But since we allow        * concurrent execution of SlruRecentlyUsed() within        *
SimpleLruReadPage_ReadOnly(),it is possible that multiple pages        * acquire the same lru_count values.  In that
casewe break ties by        * choosing the furthest-back page.        *        * In no case will we select the slot
containinglatest_page_number        * for replacement, even if it appears least recently used.        *        * Notice
thatthis next line forcibly advances cur_lru_count to a        * value that is certainly beyond any value that will be
inthe        * page_lru_count array after the loop finishes.  This ensures that        * the next execution of
SlruRecentlyUsedwill give us good data,        * even if it's against a page that has the current counter value.
*/      cur_count = (shared->cur_lru_count)++;       best_delta = -1;       bestslot = 0;            /* no-op, just
keepscompiler quiet */       best_page_number = 0;    /* ditto */       for (slotno = 0; slotno < NUM_SLRU_BUFFERS;
slotno++)      {           int            this_delta;           int            this_page_number;
 
           if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)               return slotno;           this_delta =
cur_count- shared->page_lru_count[slotno];           if (this_delta < 0)           {               /*                *
Cleanup in case shared updates have caused cur_count                * increments to get "lost".  We back off the page
counts,               * rather than trying to increase cur_count, to avoid any                * question of infinite
loopsor failure in the presence of                * wrapped-around counts.                */
shared->page_lru_count[slotno]= cur_count;               this_delta = 0;           }           this_page_number =
shared->page_number[slotno];          if ((this_delta > best_delta ||                (this_delta == best_delta &&
         ctl->PagePrecedes(this_page_number, best_page_number))) &&               this_page_number !=
shared->latest_page_number)          {               bestslot = slotno;               best_delta = this_delta;
    best_page_number = this_page_number;           }       }
 


pgsql-hackers by date:

Previous
From: Gustavo Tonini
Date:
Subject: Replication on the backend
Next
From: Tom Lane
Date:
Subject: Re: [PATCHES] snprintf() argument reordering not working