Re: Patch: fix lock contention for HASHHDR.mutex - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Patch: fix lock contention for HASHHDR.mutex
Date
Msg-id 20160208133937.648f1fcf@fujitsu
Whole thread Raw
In response to Re: Patch: fix lock contention for HASHHDR.mutex  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Patch: fix lock contention for HASHHDR.mutex
List pgsql-hackers
Hello, Robert.

> So: do we have clear evidence that we need 128 partitions here, or
> might, say, 16 work just fine?

Yes, this number of partitions was chosen based on this benchmark (see
"spinlock array" column):

http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

In fact we can choose 16, 32, 64 or 128 partitions - any number works
better than current implementation with a single spinlock. But
according to this benchmark 128 partitions give twice as much TPS then
16 partitions, almost as if we didn't have any locks at all (see "no
locks" column).

Still I'm a bit concerned that 128 partitions require (128-16)*
( sizeof(slock_t) + sizeof(void*) + sizeof(long) ) bytes of additional
memory per hash table which is:

(gdb) p (128-16)*(sizeof(slock_t) + sizeof(long) + sizeof(void*))
$1 = 1904

... bytes of shared memory on amd64.

I'm not sure if this is considered OK. If it is I suggest to leave
number 128. If it's not, perhaps we could agree on say 64 partitions as
a compromise. Described benchmark doesn't represent any possible
workload anyway. I personally believe that we don't have so many small
hash tables that 1.8 Kb of additional shared memory per hash table would
be an issue.

> I don't really want to increase the number of lock manager partition
> locks to 128 just for the convenience of this patch, and even less do
> I want to impose the requirement that every future shared hash table
> must use an equal number of partitions.

May I ask why? Does it create a real problem?

> I don't see any reason why the number of free list partitions needs
> to be a constant, or why it needs to be equal to the number of hash
> bucket partitions.  That just seems like a completely unnecessary
> constraint. You can take the hash value, or whatever else you use to
> pick the partition, modulo any number you want. 

True, number of spinlocks / freeLists could be a variable. I believe
it's an old-good simplicity vs flexibility question.

Lets say we allow user (i.e. calling side) to choose spinlocks and free
lists number. So now we should use freeList[hash % items_number] formula
which means division instead of binary shift. Do we really need such an
expensive operation just to calculate index of a free list? Probably
not. Let limit possible items_number values so it could be only power
of two. Now there is only four possible values user would more likely to
choose (16, 32, 64, 128), which is not as flexible anymore.

What about memory? If we allocate mutex, nentries and freeList
dynamically depending on items_number value, HASHHDR should store
pointers to allocated memory which means additional indirection for
almost every access to mutex, nentries and freeList fields. Probably a
bad idea. Otherwise we still waste shared memory, probably even more
memory since we don't want maximum items_number value to be too low.

Etc.

I believe such a change will cause only additional code complexity
(code is already far from trivial in dynahash.c) so next time it will
be much harder to change anything, probably some bugs we will miss and
will not solve any real problem. Why just not to choose a constant which
seems to be good enough instead?



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: backpatch for REL9_4_STABLE of commit 40482e606733675eb9e5b2f7221186cf81352da1
Next
From: Etsuro Fujita
Date:
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)