Re: WIP: dynahash replacement for buffer table - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: WIP: dynahash replacement for buffer table |
Date | |
Msg-id | CA+Tgmoak-hK4k-sjCBXQvBR3BZ-Tib3zx_0xB1nb6DgL0fiH=w@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: dynahash replacement for buffer table (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: WIP: dynahash replacement for buffer table
(Andres Freund <andres@2ndquadrant.com>)
|
List | pgsql-hackers |
On Wed, Nov 5, 2014 at 6:19 PM, Andres Freund <andres@2ndquadrant.com> wrote: > * In benchmarks it becomes apparent that the dynamic element width makes > some macros like CHashTableGetNode() and > CHashTableGetGarbageByBucket() quite expensive. At least gcc on x86 > can't figure how to build an efficient expression for the > target. There's two ways to address this: > a) Actually morph chash into something that will be included into the > user sites. Since I think there'll only be a limited number of those > that's probably acceptable. Have you benchmarked this? How much does it help? > b) Simplify the access rules. I quite seriously doubt that the > interleaving of garbage and freelists is actually necessary/helpful. That wasn't added for nothing. I did a bunch of performance testing on this vs. dynahash (I think the test code is included in the patch) and the region of memory containing the freelists practically caught fire. Spreading them out like this greatly improved the performance under heavy concurrency, but even with those changes the freelists were extremely hot. Now, since this is the buffer mapping table rather than a tight loop around hash table manipulation, the difference may not be enough to measure. But on a microbenchmark it *definitely* was. > * There's several cases where it's noticeable that chash creates more > cacheline misses - that's imo a good part of why the single threaded > performance regresses. There's good reasons for the current design > without a inline bucket, namely that it makes the concurrency handling > easier. But I think that can be countered by still storing a pointer - > just to an element inside the bucket. Afaics that'd allow this to be > introduced quite easily? It's not obvious to me how that would work. If you just throw those elements on the garbage lists with everything else, it will soon be the case that one bucket header ends up using the cell stored in some other bucket, which isn't going to be awesome. So you need some kind of more complex scheme to preserve locality. > I'm afraid that I can't see us going forward unless we address the > noticeable single threaded penalties. Do you see that differently? I'm not sure. We're talking about something on the order of half a percent on my tests, and lots of changes cause things to bounce around by that much. I'm not sure it makes sense to get too worked up about that if the alternative is to add a big pile of new complexity. > * There's some whitespace damage. (space followed by tab, followed by > actual contents) That's the least of our problems at this point. > * I still think it's a good idea to move the full memory barriers into > the cleanup/writing process by doing write memory barriers when > acquiring a hazard pointer and RMW atomic operations (e.g. atomic add) > in the process testing for the hazard pointer. Can you cite any existing precedent in other system software? Does Linux do anything like that, for example? > * Shouldn't we relax the CPU in a couple places like CHashAllocate(), > CHashBucketScan()'s loops? You mean like, have it sleep the way SpinLockAcquire() does? Yeah, possibly, but that adds non-trivial code complexity which may not be worth it if - as is hoped for - there's no real contention there. > * I don't understand right now (but then I'm in a Hotel bar) why it's > safe for CHashAddToGarbage() to clobber the hash value? > CHashBucketScan() relies the chain being ordered by hashcode. No? > Another backend might just have been past the IsMarked() bit and look > at the hashcode? I think the worst case scenario is that we falsely conclude that there's a hash match when there really isn't. The ensuing memcmp() will clarify matters. > * We really should find a way to sensibly print the stats. Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: