Re: Atomics hardware support table & supported architectures - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Atomics hardware support table & supported architectures
Date
Msg-id 20140618161300.GD3968@awork2.anarazel.de
Whole thread Raw
In response to Re: Atomics hardware support table & supported architectures  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Atomics hardware support table & supported architectures  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-06-18 11:15:15 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > What happens is that gcc will do a syscall triggering the kernel to turn
> > of scheduling; perform the math and store the result; turn scheduling on
> > again. That way there cannot be a other operation influencing the
> > calculation/store. Imagine if you have hardware that, internally, only
> > does stores in 4 byte units. Even if it's a single CPU machine, which
> > most of those are, the kernel could schedule a separate process after
> > the first 4bytes have been written. Oops. The kernel has ways to prevent
> > that, userspace doesn't...
> 
> Interesting.  "Turn off scheduling" sounds like a pretty dangerous syscall.

The kernel does that internally, just during cmpxchg. It'll reenable
interrupts before returning. There's hundreds of places inside at least
linux doing that internally...

Should you be interested in the details:
https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt

> >> > Does somebody want other columns in there?
> >>
> >> I think the main question at the developer meeting was how far we want
> >> to go with supporting primitives like atomic add, atomic and, atomic
> >> or, etc.  So I think we should add columns for those.
> >
> > Well, once CAS is available, atomic add etc is all trivially
> > implementable - without further hardware support. It might be more
> > efficient to use the native instruction (e.g. xadd can be much better
> > than a cmpxchg loop because there's no retries), but that's just
> > optimization that won't matter unless you have a fair bit of
> > concurrency.
> >
> > There's currently fallbacks like:
> > #ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
> > #define PG_HAS_ATOMIC_FETCH_ADD_U32
> > STATIC_IF_INLINE uint32
> > pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
> > {
> >         uint32 old;
> >         while (true)
> >         {
> >                 old = pg_atomic_read_u32_impl(ptr);
> >                 if (pg_atomic_compare_exchange_u32_impl(ptr, &old, old + add_))
> >                         break;
> >         }
> >         return old;
> > }
> 
> I understand, but the performance characteristics are quite different.

There might be cases where that's true, but in the majority of cases
where the variable isn't highly contended it's just about the same. In
many cases the microcode will implement atomic ops using ll/sc
operations internally anyway.

> My understanding from the developer meeting was that we'd be OK with
> having, say, three levels of support for atomic ops: all ops
> supported, only TAS, none.  Or maybe four: all ops, CAS + TAS, only
> TAS, none.  But I think there was resistance (in which I participate)
> to the idea of, say, having platform 1 with "add" but not "and" and
> "or", platform 2 with "and" and "or" but not "add", platform 3 with
> both, platform 4 with neither, etc.  Then it becomes too hard for
> developers to predict whether something that is a win on their
> platform will be a loss on some other platform.

I don't think developers really have to care. In nearly all the cases
the above cmpxchg loop will loop precisely once. Usually the cacheline
will stay in the exclusive state on that process, and that's it. In most
cases what it'll be replacing will be something protected by a spinlock
anyways - and the above isn't any worse than that.

Note that we're far from the only one falling back to cmpxchg for all
these ops - that's pretty much a standard fallback.

I'm fine with always implementing everything but tas, cas, add ontop of
cas for now. I want or/and/sub/... to be conveniently available to C
code, but they don't need to be based on hardware ops. Having open-coded
versions of the above in several places sounds like a horrible idea to
me. Both, because we might want to optimize it at some point and because
it's horrible readability wise.

> >> > 3) sparcv8: Last released model 1997.
> >>
> >> I seem to recall hearing about this in a customer situation relatively
> >> recently, so there may be a few of these still kicking around out
> >> there.
> >
> > Really? As I'd written in a reply solaris 10 (released 2005) dropped
> > support for it. Dropping support for a platform that's been desupported
> > 10 years ago by it's manufacturer doesn't sound bad imo...
> 
> We definitely have at least one customer using Solaris 9.  I don't
> know their architecture for certain, but they did recently install a
> new version of PostgreSQL.

But Solaris 9 doesn't imply sparcv8. Ultrasparc (first sparcv9) support
was added in solaris 2.5... That's a *long* time ago.

If my googling turned up the right information you needed to use special
-xarch flags for sun studio to even compile binaries that are pure v8
(v8plus is fine btw) since sun studio 9... Same for gcc apparently (-mno-v8plus),
although I don't know how far back.

The reason I'd like to kill it is that there's basically zero chance of
ever testing it and we'd need to write configure check that not only
detect the suncc intrinsics but also runs them and tests whether they
work since they already exist for a long while. Ugly.

> >> > 6) armv-v5

> I think in doubtful cases we might as well keep the support in.

Fair point.

> If
> you've got the fallback to non-atomics, keeping the other code around
> doesn't hurt much, and might make it easier for someone who is
> interested in one of those platforms.  It's fine and good to kill
> things that are totally dead, but I think it's better for a user of
> some obscure platform to find that it doesn't *quite* work than that
> we've deliberately broken it.  But maybe I am being too conservative.

Well, I think my view is that it's placing burden on us to differentiate
more fine grained between architectures for the very unlikely case that
somebody will ever need it. I'm not that adverse to applying a patch
adding the extra support if somebody shows up actually interested; but
blindly writing code for something that's unlikely to ever be used
sounds like a waste of time.

FWIW, if it's on linux, we don't need to care anyway. Gcc abstracts it
away.

> >> > Note that this is *not* a requirement for the atomics abstraction - it
> >> > now has a fallback to spinlocks if atomics aren't available.
> >>
> >> That seems great.  Hopefully with a configure option to disable
> >> atomics so that it's easy to test the fallback.
> >
> > It's a #define right now. Do you think we really need a configure
> > option?
> 
> Well, we've got one for --disable-spinlocks, so it seems like it would
> be a good idea for symmetry.

Ok. I'd rather get rid of --disable-spinlocks alltogether, but I don't
really want to fight that fight now :)

> More than that, I actually really hate
> things that don't have a configure option, like WAL_DEBUG, because you
> have to change a checked-in file, which shows up as a diff, and if
> you're not careful you check it in, and if you are careful it still
> gets blown away every time you git reset --hard, which I do a lot.  I
> think the fact that both Heikki and I on separate occasions have made
> commits enabling WAL_DEBUG shows pretty clearly the weaknesses of that
> method of doing business.

Why don't you pass it to configure or add it in Makefile.custom? That's
what I do.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: BUG #10680 - ldapbindpasswd leaks to postgresql log
Next
From: Stephen Frost
Date:
Subject: Re: How about a proper TEMPORARY TABLESPACE?