Re: WIP: dynahash replacement for buffer table - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: WIP: dynahash replacement for buffer table |
Date | |
Msg-id | CA+Tgmoa80iLreNDPhFVu856dcivsWF9x2sUcgNQ6Uy=PS56rWQ@mail.gmail.com |
In response to | Re: WIP: dynahash replacement for buffer table (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: WIP: dynahash replacement for buffer table |
List | pgsql-hackers |
On Thu, Oct 16, 2014 at 6:53 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> When using shared_buffers = 96GB there's a performance benefit, but not
> huge:
> master (f630b0dd5ea2de52972d456f5978a012436115e): 153621.8
> master + LW_SHARED + lockless StrategyGetBuffer(): 477118.4
> master + LW_SHARED + lockless StrategyGetBuffer() + chash: 496788.6
> master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 499562.7
>
> But with shared_buffers = 16GB:
> master (f630b0dd5ea2de52972d456f5978a012436115e): 177302.9
> master + LW_SHARED + lockless StrategyGetBuffer(): 206172.4
> master + LW_SHARED + lockless StrategyGetBuffer() + chash: 413344.1
> master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 426405.8

Very interesting. This doesn't show that chash is the right solution, but it definitely shows that doing nothing is the wrong solution. It shows that, even with the recent bump to 128 lock manager partitions, and LW_SHARED on top of that, workloads that actually update the buffer mapping table still produce a lot of contention there. This hasn't been obvious to me from profiling, but the numbers above make it pretty clear.

It also seems to suggest that trying to get rid of the memory barriers isn't a very useful optimization project. We might get a couple of percent out of it, but it's pretty small potatoes, so unless it can be done more easily than I suspect, it's probably not worth bothering with.

An approach I think might have more promise is to have bufmgr.c call the CHash stuff directly instead of going through buf_table.c. Right now, for example, BufferAlloc() creates and initializes a BufferTag and passes a pointer to that buffer tag to BufTableLookup, which copies it into a BufferLookupEnt. But it would be just as easy for BufferAlloc() to put the BufferLookupEnt on its own stack, and then you wouldn't need to copy the data an extra time. Now a 20-byte copy isn't a lot, but it's completely unnecessary and looks easy to get rid of.
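
To make the extra copy concrete, here is a minimal, self-contained sketch of the two call styles. Everything in it is a hypothetical stand-in: DemoTag, DemoLookupEnt, demo_search(), lookup_via_wrapper() and lookup_direct() are invented for illustration and are not the real BufferTag, BufferLookupEnt, CHash or buf_table.c code. The only thing taken from the discussion above is the shape of the problem: a 20-byte key that is built in one place and then copied into the entry handed to the hash table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a buffer tag: five 4-byte fields, 20 bytes. */
typedef struct DemoTag
{
    uint32_t spc_node;
    uint32_t db_node;
    uint32_t rel_node;
    int32_t  fork_num;
    uint32_t block_num;
} DemoTag;

/* Hypothetical stand-in for a lookup entry: the key plus the result slot. */
typedef struct DemoLookupEnt
{
    DemoTag tag;     /* key, filled in by the caller */
    int     buf_id;  /* result, filled in by demo_search() on a hit */
} DemoLookupEnt;

/* Toy one-entry "table"; a real hash table would go here. */
static DemoLookupEnt demo_table[1] = {{{1663, 16384, 24576, 0, 42}, 7}};

/* Assumed search interface: takes an entry with the key set, fills buf_id. */
static int
demo_search(DemoLookupEnt *ent)
{
    size_t n = sizeof(demo_table) / sizeof(demo_table[0]);

    for (size_t i = 0; i < n; i++)
    {
        /* memcmp is safe here only because DemoTag has no padding bytes. */
        if (memcmp(&demo_table[i].tag, &ent->tag, sizeof(DemoTag)) == 0)
        {
            ent->buf_id = demo_table[i].buf_id;
            return 1;
        }
    }
    return 0;
}

/* Wrapper style: the caller builds a tag, the wrapper copies it again. */
static int
lookup_via_wrapper(const DemoTag *tag)
{
    DemoLookupEnt ent;

    ent.tag = *tag;             /* the extra 20-byte copy */
    return demo_search(&ent) ? ent.buf_id : -1;
}

/* Direct style: the caller builds the key in place inside the entry. */
static int
lookup_direct(uint32_t spc, uint32_t db, uint32_t rel,
              int32_t fork, uint32_t block)
{
    DemoLookupEnt ent;

    ent.tag.spc_node = spc;
    ent.tag.db_node = db;
    ent.tag.rel_node = rel;
    ent.tag.fork_num = fork;
    ent.tag.block_num = block;
    return demo_search(&ent) ? ent.buf_id : -1;
}
```

Both functions return the same answer for the same key; the direct style simply never materializes a separate tag that then has to be copied into the entry. Whether the saving is measurable in practice would need benchmarking.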

> I had to play with setting max_connections+1 sometimes to get halfway
> comparable results for master - unaligned data otherwise causes weird
> results otherwise. Without doing that the performance gap between master
> 96/16G was even bigger. We really need to fix that...
>
> This is pretty awesome.

Thanks. I wasn't quite sure how to test this or where the workloads that it would benefit would be found, so I appreciate you putting time into it. And I'm really glad to hear that it delivers good results.

I think it would be useful to plumb the chash statistics into the stats collector or at least a debugging dump of some kind for testing. They include a number of useful contention measures, and I'd be interested to see what those look like on this workload. (If we're really desperate for every last ounce of performance, we could also disable those statistics in production builds. That's probably worth testing at least once to see if it matters much, but I kind of hope it doesn't.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company