Home > mailing lists

Re: Improving spin-lock implementation on ARM. - Mailing list pgsql-hackers

From	Krunal Bauskar
Subject	Re: Improving spin-lock implementation on ARM.
Date	December 8, 2020 09:03:59
Msg-id	CAB10pyYwOWZxoyYmz35zUk_PkdGPh2J8CiNhBZ3MzPMgGi7_RQ@mail.gmail.com Whole thread
In response to	Re: Improving spin-lock implementation on ARM. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Improving spin-lock implementation on ARM.
List	pgsql-hackers

Tree view

On Thu, 3 Dec 2020 at 21:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Krunal Bauskar <krunalbauskar@gmail.com> writes:
> Any updates or further inputs on this.

As far as LSE goes: my take is that tampering with the
compiler/platform's default optimization options requires *very*
strong evidence, which we have not got and likely won't get. Users
who are building for specific hardware can choose to supply custom
CFLAGS, of course. But we shouldn't presume to do that for them,
because we don't know what they are building for, or with what.

I'm very willing to consider the CAS spinlock patch, but it still
feels like there's not enough evidence to show that it's a universal
win. The way to move forward on that is to collect more measurements
on additional ARM-based platforms. And I continue to think that
pgbench is only a very crude tool for testing spinlock performance;
we should look at other tests.

Thanks Tom.

Given pg-bench limited option I decided to try things with sysbench to expose

the real contention using zipfian type (zipfian pattern causes part of the database

to get updated there-by exposing main contention point).

----------------------------------------------------------------------------
Baseline for 256 threads update-index use-case:
- 44.24% 174935 postgres postgres [.] s_lock
transactions:
transactions: 5587105 (92988.40 per sec.)

Patched for 256 threads update-index use-case:
0.02% 80 postgres postgres [.] s_lock
transactions:
transactions: 10288781 (171305.24 per sec.)

perf diff

0.02% +44.22% postgres [.] s_lock
----------------------------------------------------------------------------

As we see from the above result s_lock is exposing major contention that could be relaxed using the

said cas patch. Performance improvement in range of 80% is observed.

Taking this guideline we decided to run it for all scalability for update and non-update use-case.

Check the attached graph. Consistent improvement is observed.

I presume this should help re-establish that for major contention cases existing tas approach will always give up.

-------------------------------------------------------------------------------------------

Unfortunately, I don't have access to different ARM arch except for Kunpeng or Graviton2 where

we have already proved the value of the patch.

[ref: Apple M1 as per your evaluation patch doesn't show regression for select. Maybe if possible can you try update scenarios too].

Do you know anyone from the community who has access to other ARM arches we can request them to evaluate?

But since it is has proven on 2 independent ARM arch I am pretty confident it will scale with other ARM arches too.

From a system structural standpoint, I seriously dislike that lwlock.c
patch: putting machine-specific variant implementations into that file
seems like a disaster for maintainability. So it would need to show a
very significant gain across a range of hardware before I'd want to
consider adopting it ... and it has not shown that.

regards, tom lane

Regards,
Krunal Bauskar

Attachment

Screenshot from 2020-12-08 14-16-24.png

pgsql-hackers by date:

From: Andrey Borodin
Date: 08 December 2020, 08:42:12
Subject: Re: PoC Refactor AM analyse API

From: Amit Langote
Date: 08 December 2020, 09:15:53
Subject: Re: Huge memory consumption on partitioned table with FKs

Re: Improving spin-lock implementation on ARM. - Mailing list pgsql-hackers

Attachment

Previous

Next