Re: [Testperf-general] Re: First set of OSDL Shared Memscalability results, some wierdness ... - Mailing list pgsql-performance
From:		Tom Lane
Subject:	Re: [Testperf-general] Re: First set of OSDL Shared Memscalability results, some wierdness ...
Date:
Msg-id:		10202.1097858894@sss.pgh.pa.us
In response to:	Re: [Testperf-general] Re: First set of OSDL Shared Memscalability results, some wierdness ... ("Simon Riggs" <simon@2ndquadrant.com>)
Responses:	Re: [Testperf-general] Re: First set of OSDL Shared Memscalability
		Re: [Testperf-general] Re: First set of OSDL Shared Memscalability results, some wierdness ...
List:		pgsql-performance
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Speculating wildly because I don't know that portion of the code this might
> be:
> CONJECTURE 1: the act of searching for a block in cache is an O(n)
> operation, not an O(1) or O(log n) operation

I'm not sure how this meme got into circulation, but I've seen a couple
of people recently either conjecturing or asserting that.  Let me remind
people of the actual facts:

1. We use a hashtable to keep track of which blocks are currently in
shared buffers.  Either a cache hit or a cache miss should be O(1),
because the hashtable size is scaled proportionally to shared_buffers,
and so the number of hash entries examined should remain constant.

2. There are some allegedly-not-performance-critical operations that do
scan through all the buffers, and therefore are O(N) in shared_buffers.

I just eyeballed all the latter, and came up with this list of O(N)
operations and their call points:

AtEOXact_Buffers
	transaction commit or abort
UnlockBuffers
	transaction abort, backend exit
StrategyDirtyBufferList
	background writer's idle loop
FlushRelationBuffers
	VACUUM
	DROP TABLE, DROP INDEX
	TRUNCATE, CLUSTER, REINDEX
	ALTER TABLE SET TABLESPACE
DropRelFileNodeBuffers
	TRUNCATE (only for ON COMMIT TRUNC temp tables)
	REINDEX (inplace case only)
	smgr_internal_unlink (ie, the tail end of DROP TABLE/INDEX)
DropBuffers
	DROP DATABASE

The fact that the first two are called during transaction commit/abort
is mildly alarming.  The constant factors are going to be very tiny
though, because what these routines actually do is scan backend-local
status arrays looking for locked buffers, which they're not going to
find very many of.
For instance AtEOXact_Buffers looks like

	int			i;

	for (i = 0; i < NBuffers; i++)
	{
		if (PrivateRefCount[i] != 0)
		{
			/*
			 * some code that should never be executed at all in the
			 * commit case, and not that much in the abort case either
			 */
		}
	}

I suppose with hundreds of thousands of shared buffers this might get
to the point of being noticeable, but I've never seen it show up at all
in profiling with more-normal buffer counts.  Not sure if it's worth
devising a more complex data structure to aid in finding locked
buffers.  (To some extent this code is intended to be
belt-and-suspenders stuff for catching omissions elsewhere, and so a
more complex data structure that could have its own bugs is not
especially attractive.)

The one that's bothering me at the moment is StrategyDirtyBufferList,
which is a new overhead in 8.0.  It wouldn't directly affect foreground
query performance, but indirectly it would hurt by causing the bgwriter
to suck more CPU cycles than one would like (and it holds the
BufMgrLock while it's doing it, too :-().  One easy way you could see
whether this is an issue in the OSDL test is to see what happens if you
double all three bgwriter parameters (delay, percent, maxpages).  This
should result in about the same net I/O demand from the bgwriter, but
StrategyDirtyBufferList will be executed half as often.

I doubt that the other ones are issues.  We could improve them by
devising a way to quickly find all buffers for a given relation, but I
am just about sure that complicating the buffer management to do so
would be a net loss for normal workloads.

> For the record, what I think we need is dynamically resizable
> shared_buffers, not a-priori knowledge of what you should set
> shared_buffers to.

This isn't likely to happen because the SysV shared memory API isn't
conducive to it.  Absent some amazingly convincing demonstration that
we have to have it, the effort of making it happen in a portable way
isn't going to get spent.
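The doubling experiment suggested above would be done in postgresql.conf.  A hypothetical excerpt, assuming (if memory serves) the stock 8.0 defaults of 200ms, 1 percent, and 100 pages; verify your own installation's defaults before relying on these numbers:

```ini
# Doubled bgwriter settings for the OSDL experiment: roughly the same
# net I/O per second, but StrategyDirtyBufferList runs half as often.
bgwriter_delay    = 400     # ms between rounds (assumed default: 200)
bgwriter_percent  = 2       # % of dirty buffers written per round (assumed default: 1)
bgwriter_maxpages = 200     # max pages written per round (assumed default: 100)
```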
> I've been thinking about implementing a scheme that helps you decide
> how big the shared_buffers SHOULD BE, by making the LRU list bigger
> than the cache itself, so you'd be able to see whether there is
> beneficial effect in increasing shared_buffers.

ARC already keeps such a list --- couldn't you learn what you want to
know from the existing data structure?  It'd be fairly cool if we could
put out warnings "you ought to increase shared_buffers" analogous to
the existing facility for noting excessive checkpointing.

			regards, tom lane