[PATCH] Let's get rid of the freelist and the buffer_strategy_lock - Mailing list pgsql-hackers
From | Greg Burd |
---|---|
Subject | [PATCH] Let's get rid of the freelist and the buffer_strategy_lock |
Date | |
Msg-id | E2D6FCDC-BE98-4F95-B45E-699C3E17BA10@burd.me Whole thread Raw |
Responses |
Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock
Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock |
List | pgsql-hackers |
Hello, In conversations [1] recently about considering how best to adapt the code to become NUMA-aware Andres commented, "FWIW,I've started to wonder if we shouldn't just get rid of the freelist entirely" and because I'm a glutton for punishment(and I think this idea has some merit) I took him up on this task. In freelist.c the function StrategyGetBuffer() currently tries first to use the BufferAccessStrategy, if present, via GetBufferFromRing(). Failing that, the second step is to check the "freelist" as defined by StrategyControl->firstFreeBuffer/lastFreeBufferto determine if it contains any available buffers, and finally it will "Usethe clock-sweep algorithm to find a free buffer." The freelist was intended to be "a list of buffers that are primecandidates for replacement" but I question the value of that versus the overhead of managing it. Without the list someoperations are likely (I plan to measure this) faster due, as Anders points out in [1], "just using clock sweep actuallymakes things like DROP TABLE perform better because it doesn't need to maintain the freelist anymore." It may bethe case that with very large NBuffers where most are in use that this approach is slower (I plan to measure this too),but in those cases I'd imagine the freelist is likely empty and so the code will use the clock-sweep algorithm anyway,so I'm not sure there is a performance penalty at all. This change does remove the have_free_buffer() function used by the contrib/pg_prewarm module. On the surface this doesn'tseem to cause any issues, but honestly I've not thought too deeply on this one. v2-0001 Eliminate the freelist from the buffer manager and depend on clock-sweep. Once removed [2] and tests passing [3] I took a long hard look at the buffer_strategy_lock that used to serialize concurrentaccess to members of BufferStrategyControl and I couldn't find a good reason to keep it around. Let's review whatit is guarding: completePasses: a count of the number of times the clock-sweep hand wraps around. StrategySyncStart() provides this to thebgwriter which in turn uses it to compute a strategic location at which to start scanning for pages to evict. There'san interesting comment that indicates both a "requirement" and an equivocal "but that's highly unlikely and wouldn'tbe particularly harmful" statement conflicting with itself. I tried to find a reason that nextVictimBuffers couldoverflow or that the use of the completePasses value could somehow cause harm if off by one or more in the bgwriterand either I missed it (please tell me) or there isn't one. However, it does make sense to change completePassesinto an atomic value so that it is consistent across backends and in the bgwriter. bgwprocno: when not -1 is the PID of the allocation notification latch (ProcGlobal->allProcs[bgwprocno].procLatch). Thisis a "power savings" feature where the goal is to signal the bgwriter "When a backend starts using buffers again, itwill wake us up by setting our latch." Here the code reads, "Since we don't want to rely on a spinlock for this we forcea read from shared memory once, and then set the latch based on that value." and uses INT_ACCESS_ONCE() to read thevalue and set the latch. The function StrategyNotifyBgWriter() is where bgwprocno is set, I see no reason to use atomicor other synchronization here. And that's all there is to it now that the freelist is gone. As a result, IMO it seems unnecessary to require a spin lockfor access to BufferStrategyControl. v2-0002 Remove the buffer_strategy_lock. This attached patch is also a branch on GitHub [2] if that is of interest or helpful, it passes check world [3] (I use theGitHub PR to trigger CirrusCI tests, not as the way to convey the change set). I also made a few minor changes such that we're consistently referring to "clock-sweep" (not "clock sweep" or "clocksweep"),I'm not wedded to those but consistency isn't a bad thing, right? As an aside, we're really implementing "generalized CLOCK" [4][5] which uses counters rather a single bit as pointed out[6] in the CLOCK-Pro [7] paper, but I digress. I'd like to hear ideas for worst cases to test and/or benchmark. I plan on attempting what I saw Michael do with flamegraphsbefore/after/delta in a follow up to this. If people feel this has merit I'll add it to CF/2. thank you for your time considering this idea, -greg [1] https://postgr.es/m/ndvygkpdx44pmi4xbkf52gfrl77cohpefr42tipvd5dgiaeuyd%40fe2og2kxyjnc [2] https://github.com/gburd/postgres/tree/gregburd/rm-freelist/patch-v2 [3] https://github.com/gburd/postgres/pull/4/checks [4] V. F. Nicola, A. Dan, and D. M. Dias, "Analysis of the Generalized Clock Buffer Replacement Scheme for Database TransactionProcessing", Proceeding of 1992 ACM SIGMETRICS Conference, June 1992, pp. 35-46. [5] A. J. Smith, "Sequentiality and Prefetching in Database Systems", ACM Trans. on Database Systems, Vol. 3, No. 3, 1978,pp. 223-247. [6] "In a generalized CLOCK version called GCLOCK [25,17], a counter is associated with each page rather than a single bit.Its counter will be incremented if a page is hit. The cycling clock hand sweeps over the pages decrementing their countersuntil a page whose counter is zero is found for replacement." [7] CLOCK-Pro: An Effective Improvement of the CLOCK Replacement https://www.usenix.org/legacy/event/usenix05/tech/general/full_papers/jiang/jiang.pdf
Attachment
pgsql-hackers by date: