Re: Improving spin-lock implementation on ARM. - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: Improving spin-lock implementation on ARM.
Date	November 30, 2020 23:46:44
Msg-id	CAPpHfdtvNORgWQ=VQrTDKjY0dZuSoa_pC34mO_im4+WMim-OJA@mail.gmail.com
In response to	Re: Improving spin-lock implementation on ARM. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Improving spin-lock implementation on ARM. (Tom Lane <tgl@sss.pgh.pa.us>) Re: Improving spin-lock implementation on ARM. (Krunal Bauskar <krunalbauskar@gmail.com>)
List	pgsql-hackers

On Mon, Nov 30, 2020 at 9:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > I tend to think that LSE is enabled by default in Apple's clang based
> > on your previous message[1].  In order to dispel the doubts could you
> > please provide assembly of SpinLockAcquire for following clang
> > options.
> > "-O2"
> > "-O2 -march=armv8-a+lse"
> > "-O2 -march=armv8-a"
>
> Huh.  Those options make exactly zero difference to the code generated
> for SpinLockAcquire/SpinLockRelease; it's the same as I showed upthread,
> for either the HEAD definition of TAS() or the CAS patch's version.
>
> So now I'm at a loss as to the reason for the performance difference
> I got.  -march=armv8-a+lse does make a difference to code generation
> someplace, because the overall size of the postgres executable changes
> by 16kB or so.  One might argue that the performance difference is due
> to better code elsewhere than the spinlocks ... but the test I'm running
> is basically just
>
>         while (count-- > 0)
>         {
>                 XLogGetLastRemovedSegno();
>
>                 CHECK_FOR_INTERRUPTS();
>         }
>
> so it's hard to see where a non-spinlock-related code change would come
> in.  That loop itself definitely generates the same code either way.
>
> I did find this interesting output from "clang -v":
>
> -target-cpu vortex -target-feature +v8.3a -target-feature +fp-armv8 -target-feature +neon -target-feature +crc
> -target-feature +crypto -target-feature +fullfp16 -target-feature +ras -target-feature +lse -target-feature +rdm
> -target-feature +rcpc -target-feature +zcm -target-feature +zcz -target-feature +sha2 -target-feature +aes
>
> whereas adding -march=armv8-a+lse changes that to just
>
> -target-cpu vortex -target-feature +neon -target-feature +lse -target-feature +zcm -target-feature +zcz
>
> On the whole, that would make one think that -march=armv8-a+lse
> should produce worse code than the default settings.

Great, thanks.

So, I think the following hypotheses haven't been disproved yet.
1) On ARM with LSE support, PostgreSQL built with LSE is faster than
PostgreSQL built without LSE, even if the latter is patched with
any of the patches considered in this thread.
2) None of the patches considered in this thread gives a clear
advantage for PostgreSQL built with LSE.
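
For anyone who wants to check which of these cases a given build falls into, here is a minimal sketch of a test-and-set spinlock plus a compile-time LSE check. It is not PostgreSQL's s_lock.h: the demo_* names are hypothetical and exist only for illustration. It assumes a GCC- or clang-compatible compiler (for the __sync builtins) and relies on the ACLE macro __ARM_FEATURE_ATOMICS, which is defined when the compiler targets LSE.

```c
#include <assert.h>

/* Hypothetical illustration only; not PostgreSQL's slock_t or TAS(). */
typedef volatile int demo_slock_t;

/*
 * Test-and-set with acquire semantics.
 * Returns 0 if the lock was free and is now ours, nonzero if it was held.
 */
int
demo_tas(demo_slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Spin until the lock is taken; a real implementation would back off. */
void
demo_spin_acquire(demo_slock_t *lock)
{
    while (demo_tas(lock))
        ;
}

/* Release with release semantics; releasing a free lock is a caller bug. */
void
demo_spin_release(demo_slock_t *lock)
{
    assert(*lock != 0);
    __sync_lock_release(lock);
}

/* Returns 1 if this translation unit was compiled with LSE enabled, else 0. */
int
demo_built_with_lse(void)
{
#ifdef __ARM_FEATURE_ATOMICS
    return 1;
#else
    return 0;
#endif
}
```

Compiling this with "-O2 -S" under each of the option sets quoted above and comparing the assembly of demo_tas shows which form the compiler chose. With LSE enabled the exchange is typically a single swp-family instruction, and without it a load-exclusive/store-exclusive retry loop. If the compiler's default target already includes +lse, as the "clang -v" output above suggests for Apple's clang, adding -march=armv8-a+lse would not be expected to change this function.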

To further confirm this, let's wait for the Kunpeng 920 tests by Krunal
Bauskar and Amit Khandekar.  It would also be nice if someone ran
benchmarks similar to [1] on an Apple M1.

Links
1. https://www.postgresql.org/message-id/CAPpHfdsGqVd6EJ4mr_RZVE5xSiCNBy4MuSvdTrKmTpM0eyWGpg%40mail.gmail.com

------
Regards,
Alexander Korotkov
