Re: Patch: fix lock contention for HASHHDR.mutex - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Patch: fix lock contention for HASHHDR.mutex
Date
Msg-id 20160301174313.218e04c7@fujitsu
Whole thread Raw
In response to Re: Patch: fix lock contention for HASHHDR.mutex  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Patch: fix lock contention for HASHHDR.mutex
Re: Patch: fix lock contention for HASHHDR.mutex
List pgsql-hackers
Hello, Amit

> I am not sure, if this is exactly what has been suggested by Robert,
> so it is not straightforward to see if his suggestion can allow us to
> use NUM_FREELISTS as 8 rather than 32.  I think instead of trying to
> use FREELISTBUFF, why not do it as Robert has suggested and try with
> different values of NUM_FREELISTS?

Well, what I did is in fact a bit more general solution then Robert
suggested. At first I just joined freeList and mutex arrays into one
array and tried different NUM_FREELISTS values (see FREELISTS column).
That was Robert's idea. Unfortunately it didn't give any considerable
speedup comparing with previous approach.

Then I tried to manually control sizeof every array item (see
different SIZE values). By making it larger we can put every array item
into a separate cache line. This approach helped a bit in terms of TPS
(before: +33.9%, after: +34.2%) but only if NUM_FREELISTS remains the
same (32).

So answering your question - it turned out that we _can't_ reduce
NUM_FREELISTS this way. 

Also I don't believe that +0.3% TPS according to synthetic benchmark
that doesn't represent any possible workload worth it considering
additional code complexity.

> In which case, do you think entries can go negative?  I think the
> nentries is incremented and decremented in the way as without patch,
> so I am not getting what can make it go negative.

In this context we are discussing a quick and dirty fix "what would
happen if FREELIST_IDX would be implemented as getpid() % NUM_FREE_LIST
instead of hash % NUM_FREELIST". The code is:

int freelist_idx = FREELIST_IDX(hctl, hashvalue); 

// ...

switch (action)
{

// ...
 case HASH_REMOVE:   if (currBucket != NULL)   {     if (IS_PARTITIONED(hctl))       SpinLockAcquire(&(FREELIST(hctl,
freelist_idx).mutex));
     // Assert(FREELIST(hctl, freelist_idx).nentries > 0);      FREELIST(hctl, freelist_idx).nentries--;
     /* remove record from hash bucket's chain. */     *prevBucketPtr = currBucket->link;

// ...

No matter what freelist was used when element was added to a hash table.
We always try to return free memory to the same freelist number getpid()
% FREELIST_ITEMS and decrease number of elements in a hash table using
corresponding nentries field. For this reason nentries could be
negative. Still sum of all nentries remains valid.




I realize that this thread is quite long already and that its been a
while since previous commitfest ended. So I would like to provide a
short summery of this thread from my perspective:

1. Last version of a path is here:

http://www.postgresql.org/message-id/CA+Tgmobtf9nH566_jjs=jrTyMq5HdQdaRF5j7o+AbdOWQHE_Ow@mail.gmail.com

Its reviewed, tested, well-commented and apparently nobody has strong
objections against it.

2. Robert suggested a few possible improvements. Experiments showed
that in best case resulting patches are as good as path from item 1,
but code become considerably more complex.

3. Number of further changes (changing calling convention for
ShmemInitHash(), changing LOG2_NUM_LOCK_PARTITIONS constant) will be
discussed separately when and if path from this thread will be applied.

> 
> 
> 
> >
> > > I am however wondering if it to set the freelist affinity based on
> > > something other than the hash value, like say the PID, so that the
> > > same process doesn't keep switching to a different freelist for
> > > every buffer eviction.
> >
> > Also I tried to use PID to determine freeList number:
> >
> > ```
> > #include <sys/types.h>
> > #include <unistd.h>
> >
> > ...
> >
> > #define FREELIST_IDX(hctl, hashcode) \
> >   (IS_PARTITIONED(hctl) ? ((uint32)getpid()) % NUM_FREELISTS : 0)
> >
> >
> ...
> >
> > // now nentries could be negative in this case
> > // Assert(FREELIST(hctl, freelist_idx).nentries > 0);
> >
> >
> In which case, do you think entries can go negative?  I think the
> nentries is incremented and decremented in the way as without patch,
> so I am not getting what can make it go negative.
> 
> 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Sort returns more rows than seq scan?
Next
From: Greg Stark
Date:
Subject: Re: WIP: Upper planner pathification