Re: Patch: fix lock contention for HASHHDR.mutex - Mailing list pgsql-hackers
From | Aleksander Alekseev
---|---
Subject | Re: Patch: fix lock contention for HASHHDR.mutex
Date |
Msg-id | 20160208133937.648f1fcf@fujitsu
In response to | Re: Patch: fix lock contention for HASHHDR.mutex (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: Patch: fix lock contention for HASHHDR.mutex
List | pgsql-hackers
Hello, Robert.

> So: do we have clear evidence that we need 128 partitions here, or
> might, say, 16 work just fine?

Yes, this number of partitions was chosen based on this benchmark (see
the "spinlock array" column):

http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

In fact we could choose 16, 32, 64 or 128 partitions - any of these
numbers works better than the current implementation with a single
spinlock. But according to this benchmark, 128 partitions give twice as
much TPS as 16 partitions, almost as if we didn't have any locks at all
(see the "no locks" column).

Still, I'm a bit concerned that 128 partitions require
(128-16) * (sizeof(slock_t) + sizeof(void*) + sizeof(long)) bytes of
additional memory per hash table, which is:

(gdb) p (128-16)*(sizeof(slock_t) + sizeof(long) + sizeof(void*))
$1 = 1904

... bytes of shared memory on amd64.

I'm not sure whether this is considered OK. If it is, I suggest we keep
128. If it's not, perhaps we could agree on, say, 64 partitions as a
compromise. The benchmark described above doesn't represent any
realistic workload anyway. I personally believe we don't have so many
small hash tables that 1.8 KB of additional shared memory per hash
table would be an issue.

> I don't really want to increase the number of lock manager partition
> locks to 128 just for the convenience of this patch, and even less do
> I want to impose the requirement that every future shared hash table
> must use an equal number of partitions.

May I ask why? Does it create a real problem?

> I don't see any reason why the number of free list partitions needs
> to be a constant, or why it needs to be equal to the number of hash
> bucket partitions. That just seems like a completely unnecessary
> constraint. You can take the hash value, or whatever else you use to
> pick the partition, modulo any number you want.

True, the number of spinlocks / free lists could be a variable. I
believe it's the good old simplicity vs. flexibility question.

Let's say we allow the user (i.e. the calling side) to choose the
number of spinlocks and free lists. Then we have to use the formula
freeList[hash % items_number], which means a division instead of a
binary shift. Do we really need such an expensive operation just to
calculate the index of a free list? Probably not. So let's limit the
possible items_number values to powers of two, so that the index can be
computed with a mask instead of a division (a minimal sketch of that
trick is at the end of this message). Now there are only four values
the user would realistically choose (16, 32, 64, 128), which is not
that flexible anymore.

What about memory? If we allocate mutex, nentries and freeList
dynamically depending on the items_number value, HASHHDR has to store
pointers to the allocated memory, which means an additional indirection
for almost every access to the mutex, nentries and freeList fields
(both layouts are sketched at the end of this message). Probably a bad
idea. Otherwise we still waste shared memory, probably even more, since
we don't want the maximum items_number value to be too low. Etc.

I believe such a change would only add code complexity (the code in
dynahash.c is already far from trivial), so next time it will be much
harder to change anything, we will probably miss some bugs, and it will
not solve any real problem. Why not just choose a constant that seems
to be good enough instead?
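For illustration, here is a minimal standalone sketch of the masking
trick mentioned above. The names NUM_FREELISTS and freelist_idx are
made up for the example, not the actual dynahash.c identifiers:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Pick a free list for a hash value when the number of free lists
     * is restricted to a power of two.  A modulo works for any count
     * but compiles to a division; with a power of two the same result
     * is a single AND with (count - 1). */
    #define NUM_FREELISTS 128          /* must be a power of two */

    static inline uint32_t
    freelist_idx(uint32_t hashvalue)
    {
        /* hash & (N - 1) == hash % N when N is a power of two */
        return hashvalue & (NUM_FREELISTS - 1);
    }

    int
    main(void)
    {
        uint32_t h = 0xDEADBEEF;

        assert(freelist_idx(h) == h % NUM_FREELISTS);
        printf("hash %u -> free list %u\n",
               (unsigned) h, (unsigned) freelist_idx(h));
        return 0;
    }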
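And a rough sketch of the memory trade-off between a compile-time
constant and run-time sizing. This is not the real HASHHDR layout, just
two illustrative structs (HeaderFixed / HeaderDynamic) plus the same
arithmetic as the gdb computation above:

    #include <stddef.h>
    #include <stdio.h>

    typedef unsigned char slock_t;  /* stand-in; the real slock_t is
                                     * platform-specific, 1 byte on x86 */

    #define NUM_SPINLOCKS 128       /* compile-time constant, as in the patch */

    /* Variant A: fixed-size arrays embedded in the header.  No extra
     * indirection, but every table pays for all 128 slots. */
    typedef struct
    {
        slock_t  mutex[NUM_SPINLOCKS];
        long     nentries[NUM_SPINLOCKS];
        void    *freeList[NUM_SPINLOCKS];
    } HeaderFixed;

    /* Variant B: arrays sized at run time.  Saves memory for small
     * tables, but every access to mutex/nentries/freeList goes through
     * a pointer. */
    typedef struct
    {
        int      num_partitions;
        slock_t *mutex;
        long    *nentries;
        void   **freeList;
    } HeaderDynamic;

    int
    main(void)
    {
        /* Same arithmetic as the gdb output in the message: the extra
         * cost of 128 partitions compared to 16, per hash table. */
        size_t per_slot = sizeof(slock_t) + sizeof(long) + sizeof(void *);

        printf("extra bytes for 128 vs 16 partitions: %zu\n",
               (NUM_SPINLOCKS - 16) * per_slot);
        printf("fixed header:   %zu bytes\n", sizeof(HeaderFixed));
        printf("dynamic header: %zu bytes (plus the arrays themselves)\n",
               sizeof(HeaderDynamic));
        return 0;
    }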