Re: [HACKERS] Fix performance of generic atomics - Mailing list pgsql-hackers

From Sokolov Yura
Subject Re: [HACKERS] Fix performance of generic atomics
Date
Msg-id 9fccff0670a2ec3c031d459564892f42@postgrespro.ru
Responses Re: [HACKERS] Fix performance of generic atomics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
A slightly cleaner version of the patch.

Sokolov Yura wrote 2017-05-25 15:22:
> Good day, everyone.
> 
> I've been playing with pgbench on a huge machine.
> (72 cores, 56 of them for PostgreSQL, enough memory to fit the database
> both into shared_buffers and the file cache)
> (pgbench scale 500, unlogged tables, fsync=off,
> synchronous_commit=off, wal_writer_flush_after=0).
> 
> With 200 clients, performance is around 76000 tps, and the main
> bottleneck in this dumb test is LWLockWaitListLock.
> 
> I added a gcc-specific implementation of pg_atomic_fetch_or_u32_impl
> (i.e. using __sync_fetch_and_or), and performance became 83000 tps.
> 
> It was a bit strange at first glance, because __sync_fetch_and_or
> compiles to almost the same CAS loop.
> 
> Looking closely, I noticed that the intrinsic doesn't do the
> read in the loop body, but at loop initialization. This is correct
> behavior, because the `lock cmpxchg` instruction leaves the current
> value in the EAX register when the comparison fails.
> 
> This is the expected behavior, and pg_atomic_compare_exchange_*_impl does
> the same in all implementations. So there is no need to re-read
> the value in the loop body:
> 
> Example diff for pg_atomic_exchange_u32_impl:
> 
>  static inline uint32
>  pg_atomic_exchange_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 xchg_)
>  {
>      uint32 old;
> +    old = pg_atomic_read_u32_impl(ptr);
>      while (true)
>      {
> -        old = pg_atomic_read_u32_impl(ptr);
>          if (pg_atomic_compare_exchange_u32_impl(ptr, &old, xchg_))
>              break;
>      }
>      return old;
>  }
> 
> After applying this change to all generic atomic functions
> (including pg_atomic_fetch_or_u32_impl), performance became
> equal to that of the __sync_fetch_and_or intrinsic.
> 
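
To illustrate the same pattern outside PostgreSQL, here is a minimal
sketch of a fetch-or built on a CAS loop with the read hoisted out of
the loop. It uses C11 <stdatomic.h> rather than the pg_atomic_* API, and
the function name sketch_fetch_or_u32 is hypothetical, not part of the
patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a generic fetch-or built on compare-and-swap.
 * Assumption: C11 atomics are used here instead of PostgreSQL's
 * pg_atomic_* functions, purely for illustration.
 */
static inline uint32_t
sketch_fetch_or_u32(_Atomic uint32_t *ptr, uint32_t or_)
{
    /* Read once, before the loop. */
    uint32_t old = atomic_load(ptr);

    /*
     * On failure, atomic_compare_exchange_weak writes the value it found
     * into 'old', so the next iteration needs no separate read.
     */
    while (!atomic_compare_exchange_weak(ptr, &old, old | or_))
        ;

    /* 'old' holds the value that was in *ptr just before the update. */
    return old;
}
```

The new value (old | or_) is recomputed on every iteration from the
refreshed 'old', so a failed attempt never applies a stale value.
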
> The attached patch contains this change for all generic atomic
> functions, and also adds __sync_fetch_and_(or|and) for gcc, because
> I believe GCC optimizes code around intrinsics better than
> around inline assembler.
> (Final performance is around 86000 tps, but the difference between
> 83000 tps and 86000 tps is not so obvious on a NUMA system.)
> 
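
For reference, a gcc-specific fetch-or along the lines described above
could look roughly like the sketch below. The wrapper name and the plain
uint32_t argument type are assumptions made for illustration; the actual
patch works on pg_atomic_uint32 and may differ in detail.

```c
#include <stdint.h>

/*
 * Hypothetical wrapper, not taken from the patch.
 * __sync_fetch_and_or is a GCC builtin (clang supports it too) that
 * atomically ORs 'or_' into *ptr and returns the previous value.
 */
static inline uint32_t
sketch_fetch_or_u32_gcc(volatile uint32_t *ptr, uint32_t or_)
{
    return __sync_fetch_and_or(ptr, or_);
}
```
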
> With regards,

-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
