Re: Improving spin-lock implementation on ARM. - Mailing list pgsql-hackers

From Krunal Bauskar
Subject Re: Improving spin-lock implementation on ARM.
Date
Msg-id CAB10pyZp_NV=PncaQS757NH6fXWAjs1d1r62Pv9g2Nv5vF9pUQ@mail.gmail.com
In response to Re: Improving spin-lock implementation on ARM.  (Krunal Bauskar <krunalbauskar@gmail.com>)
List pgsql-hackers

On Tue, 8 Dec 2020 at 14:33, Krunal Bauskar <krunalbauskar@gmail.com> wrote:


On Thu, 3 Dec 2020 at 21:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Krunal Bauskar <krunalbauskar@gmail.com> writes:
> Any updates or further inputs on this.

As far as LSE goes: my take is that tampering with the
compiler/platform's default optimization options requires *very*
strong evidence, which we have not got and likely won't get.  Users
who are building for specific hardware can choose to supply custom
CFLAGS, of course.  But we shouldn't presume to do that for them,
because we don't know what they are building for, or with what.

I'm very willing to consider the CAS spinlock patch, but it still
feels like there's not enough evidence to show that it's a universal
win.  The way to move forward on that is to collect more measurements
on additional ARM-based platforms.  And I continue to think that
pgbench is only a very crude tool for testing spinlock performance;
we should look at other tests.

Thanks Tom.

Given pgbench's limited options, I decided to try sysbench to expose the real
contention using a zipfian distribution (the zipfian pattern concentrates updates
on a small part of the database, thereby exposing the main contention point).

----------------------------------------------------------------------------
Baseline for 256 threads update-index use-case:
-   44.24%        174935  postgres         postgres             [.] s_lock
    transactions:                        5587105 (92988.40 per sec.)

Patched for 256 threads update-index use-case:
     0.02%            80  postgres  postgres  [.] s_lock
    transactions:                        10288781 (171305.24 per sec.)

perf diff
     0.02%    +44.22%  postgres             [.] s_lock
----------------------------------------------------------------------------

As we can see from the above results, s_lock is the major contention point, and that
contention can be relaxed using the said CAS patch. A performance improvement in the
range of 80% is observed.

Following this approach, we then ran it across all scalability levels for both update
and non-update use-cases. Check the attached graph; consistent improvement is observed.

I presume this should help re-establish that, for heavily contended cases, the existing TAS approach will always give up.
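To illustrate the distinction being argued here (this is only a minimal C11 sketch of the two techniques, not the actual patch code; PostgreSQL's real implementations live in s_lock.h and the proposed patch): a plain TAS lock issues an atomic write on every spin iteration, so under heavy contention all waiters keep invalidating each other's cache line, whereas a CAS-based lock can spin read-only and attempt the atomic operation only when the lock looks free (on ARMv8.1 with LSE, the compare-and-swap maps to a single CAS instruction).

```c
#include <stdatomic.h>

/* TAS-style lock: every spin iteration is an atomic read-modify-write,
 * which bounces the cache line between waiters under contention. */
static void tas_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;                       /* each retry writes the line */
}

/* CAS-style lock: spin with plain loads (no invalidations) and issue
 * the atomic compare-and-swap only once the lock appears free. */
static void cas_lock(atomic_int *lock)
{
    for (;;)
    {
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;                   /* read-only spin */

        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

static void cas_unlock(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The read-only inner loop is what keeps s_lock off the profile in the patched run above: waiters stop generating exclusive cache-line requests until the lock is actually released.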

-------------------------------------------------------------------------------------------

Unfortunately, I don't have access to ARM architectures other than Kunpeng and Graviton2, where
we have already demonstrated the value of the patch.
[ref: As per your evaluation, Apple M1 shows no regression for select with the patch. If possible, could you try the update scenarios too?]

Do you know anyone in the community with access to other ARM architectures whom we could ask to evaluate it?
Since the patch has proven itself on 2 independent ARM architectures, I am fairly confident it will scale on other ARM architectures too.
 

Any direction on how we can proceed on this?

* We have tested it with both cloud vendors that provide ARM instances.
* We have tested it with Apple M1 (at least partially).
* Ampere used to provide instances on packet.com, but that is now an evaluation-only program.

No other cloud provider actively offers ARM instances.

Given that our evaluation so far has been positive, can we consider the patch on the basis of the
available data, which is quite encouraging, with an 80% improvement seen for heavy-contention use-cases?

 

From a system structural standpoint, I seriously dislike that lwlock.c
patch: putting machine-specific variant implementations into that file
seems like a disaster for maintainability.  So it would need to show a
very significant gain across a range of hardware before I'd want to
consider adopting it ... and it has not shown that.

                        regards, tom lane


--
Regards,
Krunal Bauskar


