Thread: Our "fallback" atomics implementation doesn't actually work

Our "fallback" atomics implementation doesn't actually work

From: Tom Lane
Date: 2016-10-05 14:01:05 -0400
I was trying to measure whether unnamed POSIX semaphores are any faster
or slower than the SysV kind.  Plain pgbench is not very helpful for
determining this, because we've optimized the system to the point that
you don't hit semop waits all that often.  So I tried this:

configure USE_UNNAMED_POSIX_SEMAPHORES=1 --disable-cassert --disable-spinlocks --disable-atomics

figuring that that would stress the semop paths nicely.  But what I find
is that the database locks up after a few seconds of running "pgbench -S
-c 20 -j 20 bench" on an 8-core RHEL6 machine.

I think what is happening is that there are circular assumptions that end
up trying to implement a spinlock in terms of a spinlock, or otherwise
somehow recursively use the process's semaphore.  It's a bit hard to tell
though because the atomics code is such an underdocumented rat's nest of
#ifdefs.
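
To make the suspected circularity concrete: with --disable-atomics the
generic atomics are emulated with a spinlock, and with --disable-spinlocks
the spinlock is in turn emulated with the process semaphore. A minimal
sketch of that layering, with invented names (the real code lives around
src/include/port/atomics/fallback.h and the s_lock fallback):

#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch: with --disable-spinlocks, a "spinlock" is really the
 * process's semaphore underneath. */
typedef sem_t emulated_spinlock;

static void spin_acquire(emulated_spinlock *s) { sem_wait(s); }
static void spin_release(emulated_spinlock *s) { sem_post(s); }

/* Sketch: with --disable-atomics, compare-exchange is emulated by
 * wrapping a plain read-modify-write in one of those "spinlocks".
 * If anything on the semaphore path itself needs an atomic or a
 * spinlock, it recurses onto the same semaphore. */
typedef struct
{
    emulated_spinlock sl;
    volatile uint32_t value;
} emulated_atomic_u32;

static int
emulated_compare_exchange(emulated_atomic_u32 *ptr,
                          uint32_t *expected, uint32_t newval)
{
    int ok;

    spin_acquire(&ptr->sl);     /* a semaphore op, not a real spin */
    ok = (ptr->value == *expected);
    if (ok)
        ptr->value = newval;
    else
        *expected = ptr->value;
    spin_release(&ptr->sl);
    return ok;
}

int
main(void)
{
    emulated_atomic_u32 a;
    uint32_t expected = 0;

    sem_init(&a.sl, 0, 1);
    a.value = 0;
    printf("cas 0->1: %d, value now %u\n",
           emulated_compare_exchange(&a, &expected, 1),
           (unsigned) a.value);
    return 0;                   /* build with: cc sketch.c -pthread */
}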

Curiously, I did not see such a hang with regular SysV semaphores.
That may be just a timing thing, or it may have something to do with
POSIX semaphores being actually futexes on this platform (so that there
is state inside the process not outside it).

No hang observed without --disable-atomics, either.
        regards, tom lane



Re: Our "fallback" atomics implementation doesn't actually work

From: Andres Freund
Hi,

On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
> I think what is happening is that there are circular assumptions that end
> up trying to implement a spinlock in terms of a spinlock, or otherwise
> somehow recursively use the process's semaphore.  It's a bit hard to tell
> though because the atomics code is such an underdocumented rat's nest of
> #ifdefs.

I don't think that should be the case, but I'll look into it.  How long
did it take for you to reproduce the issue?

Greetings,

Andres Freund



Re: Our "fallback" atomics implementation doesn't actually work

From: Andres Freund
On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
> I was trying to measure whether unnamed POSIX semaphores are any faster
> or slower than the SysV kind.  Plain pgbench is not very helpful for
> determining this, because we've optimized the system to the point that
> you don't hit semop waits all that often.  So I tried this:
> 
> configure USE_UNNAMED_POSIX_SEMAPHORES=1 --disable-cassert --disable-spinlocks --disable-atomics

Pretty independent of the complaint at hand, but if I just do that I get
undefined reference to symbol 'sem_post@@GLIBC_2.2.5'

I needed to add -pthread -lrt to LDFLAGS to make it work.
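
For anyone reproducing this, a minimal program that exercises the same
symbols; on the older glibc in question the sem_* functions live outside
libc proper, so it links only with -pthread (or -lrt):

/* sem_check.c -- build: cc sem_check.c -o sem_check -pthread
 * Without -pthread/-lrt, older glibc fails at link time with the
 * same "undefined reference to sem_post" as above. */
#include <semaphore.h>
#include <stdio.h>

int
main(void)
{
    sem_t sem;

    if (sem_init(&sem, 0, 0) != 0)  /* unnamed, process-private */
    {
        perror("sem_init");
        return 1;
    }
    sem_post(&sem);
    sem_wait(&sem);
    sem_destroy(&sem);
    puts("sem_* resolved and working");
    return 0;
}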

Andres



Re: Our "fallback" atomics implementation doesn't actually work

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
>> I think what is happening is that there are circular assumptions that end
>> up trying to implement a spinlock in terms of a spinlock, or otherwise
>> somehow recursively use the process's semaphore.  It's a bit hard to tell
>> though because the atomics code is such an underdocumented rat's nest of
>> #ifdefs.

> I don't think that should be the case, but I'll look into it.  How long
> did it take for you to reproduce the issue?

It hangs up within 10 or 20 seconds for me.  I didn't try hard to get a
census of where, but at least some of the callers are trying to acquire
buffer partition locks.
        regards, tom lane



Re: Our "fallback" atomics implementation doesn't actually work

From: Tom Lane
Andres Freund <andres@anarazel.de> writes:
> On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
>> configure USE_UNNAMED_POSIX_SEMAPHORES=1 --disable-cassert --disable-spinlocks --disable-atomics

> Pretty independent of the complaint at hand, but if I just do that I get
> undefined reference to symbol 'sem_post@@GLIBC_2.2.5'

> I needed to add -pthread -lrt to LDFLAGS to make it work.

Yeah, on my machine man sem_init specifies "Link with -lrt or -pthread".
But I see -lrt getting added into the backend link anyway, presumably
as a result of one of these configure calls:

AC_SEARCH_LIBS(shm_open, rt)
AC_SEARCH_LIBS(shm_unlink, rt)
AC_SEARCH_LIBS(sched_yield, rt)

If we were to try to support USE_UNNAMED_POSIX_SEMAPHORES as default
(which is where I'm thinking about going), we'd have to tweak configure
to check for sem_init similarly.
        regards, tom lane



Re: Our "fallback" atomics implementation doesn't actually work

From: Andres Freund
Hi,

I was able to reproduce it in a read-write workload, rather than the
read-only workload you'd proposed.

On 2016-10-05 14:01:05 -0400, Tom Lane wrote:
> Curiously, I did not see such a hang with regular SysV semaphores.
> That may be just a timing thing, or it may have something to do with
> POSIX semaphores being actually futexes on this platform (so that there
> is state inside the process not outside it).

Without yet having analyzed this deeply, could it actually be that the
reason is that sem_post/wait aren't proper memory barriers?  On a glance
the symptoms look like values have been modified without proper locks...
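
Spelled out, the worry has this shape: under --disable-spinlocks every
locked section is just sem_wait()/sem_post() around plain loads and
stores, so the scheme is only sound if those calls also order memory
(POSIX does list them among the functions required to synchronize
memory, for what it's worth). A sketch with invented names, thread-based
for brevity where the real code spans processes:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t lock;              /* the fallback "spinlock" */
static long  protected_counter; /* stands in for shared state */

static void *
worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 100000; i++)
    {
        sem_wait(&lock);        /* acquire: must see prior writes */
        protected_counter++;    /* plain, non-atomic update */
        sem_post(&lock);        /* release: must publish our write */
    }
    return NULL;
}

int
main(void)
{
    pthread_t t1, t2;

    sem_init(&lock, 0, 1);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* if sem_wait/sem_post weren't barriers, updates could be lost */
    printf("counter = %ld (expect 200000)\n", protected_counter);
    return 0;
}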

Greetings,

Andres Freund



Re: Our "fallback" atomics implementation doesn't actually work

From: Tom Lane
Date: 2016-10-05 15:02:09 -0400
Andres Freund <andres@anarazel.de> writes:
> Without yet having analyzed this deeply, could it actually be that the
> reason is that sem_post/wait aren't proper memory barriers?  At a glance
> the symptoms look like values have been modified without proper locks...

Hmm, possible ...
        regards, tom lane



Re: Our "fallback" atomics implementation doesn't actually work

From: Andres Freund
On 2016-10-05 15:02:09 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Without yet having analyzed this deeply, could it actually be that the
> > reason is that sem_post/wait aren't proper memory barriers?  At a glance
> > the symptoms look like values have been modified without proper locks...
>
> Hmm, possible ...

Hm. After a long battle of head vs. wall I think I see what the problem
is.  For the fallback atomics implementation I somehow had assumed that
pg_atomic_write_u32() doesn't need to lock, as it's just an unlocked
write.  But that's not true, because it has to cause
pg_atomic_compare_exchange_u32 to fail.  The lack of this can cause a
"leftover" lockbit, when UnlockBufHdr() occurs concurrently an operation
using compare_exchange.
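
A sketch of that race, with invented names (the real fallback keeps an
emulation lock inside each pg_atomic_uint32):

#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>

#define BM_LOCKED 0x80000000u   /* stand-in for the header lock bit */

typedef struct
{
    sem_t    sl;                /* the emulation "spinlock" */
    volatile uint32_t value;
} fallback_atomic_u32;

/* Locked compare-exchange, as in the fallback: correct on its own. */
static int
fallback_cas(fallback_atomic_u32 *ptr, uint32_t *expected, uint32_t newval)
{
    int ok;

    sem_wait(&ptr->sl);
    ok = (ptr->value == *expected);
    if (ok)
        ptr->value = newval;
    else
        *expected = ptr->value;
    sem_post(&ptr->sl);
    return ok;
}

/*
 * The buggy write: a plain store that does not take sl, so it cannot
 * force a concurrent fallback_cas() to fail.  Lost-update interleaving:
 *
 *   backend A (locked RMW)            backend B (UnlockBufHdr)
 *   sem_wait(); reads value,
 *   BM_LOCKED is set
 *                                     unlocked_write(value & ~BM_LOCKED)
 *   stores a new value derived from
 *   the stale, still-locked copy;
 *   sem_post()
 *
 * B's clearing of BM_LOCKED is overwritten, leaving a lock bit that
 * nobody will ever clear.
 */
static void
unlocked_write(fallback_atomic_u32 *ptr, uint32_t val)
{
    ptr->value = val;
}

int
main(void)
{
    fallback_atomic_u32 a;
    uint32_t expected = 0;

    sem_init(&a.sl, 0, 1);
    a.value = BM_LOCKED;
    unlocked_write(&a, 0);      /* racy when done concurrently */
    printf("cas 0->1: %d, value %x\n",
           fallback_cas(&a, &expected, 1), (unsigned) a.value);
    return 0;
}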

For me the problem often takes a lot longer to reproduce (once only
after 40min), could you run with the attached patch, and see whether
that fixes things for you?

Andres

Attachment

Re: Our "fallback" atomics implementation doesn't actually work

From: Tom Lane
Date: 2016-10-06 00:06:33 -0400
Andres Freund <andres@anarazel.de> writes:
> Hm. After a long battle of head vs. wall I think I see what the problem
> is.  For the fallback atomics implementation I somehow had assumed that
> pg_atomic_write_u32() doesn't need to lock, as it's just an unlocked
> write.  But that's not true, because it has to cause
> pg_atomic_compare_exchange_u32 to fail.

Hah ... obvious once you see it.

> For me the problem often takes a lot longer to reproduce (once only
> after 40min), could you run with the attached patch, and see whether
> that fixes things for you?

For me, with the described test case, HEAD fails within a minute,
two times out of three or so.  I've not reproduced it after half an
hour of beating on this patch.  Looks good.
        regards, tom lane



Re: Our "fallback" atomics implementation doesn't actually work

From: Andres Freund
On 2016-10-06 00:06:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Hm. After a long battle of head vs. wall I think I see what the problem
> > is.  For the fallback atomics implementation I somehow had assumed that
> > pg_atomic_write_u32() doesn't need to lock, as it's just an unlocked
> > write.  But that's not true, because it has to cause
> > pg_atomic_compare_exchange_u32 to fail.
> 
> Hah ... obvious once you see it.
> 
> > For me the problem often takes a lot longer to reproduce (once only
> > after 40min), could you run with the attached patch, and see whether
> > that fixes things for you?
> 
> For me, with the described test case, HEAD fails within a minute,
> two times out of three or so.  I've not reproduced it after half an
> hour of beating on this patch.  Looks good.

It's not quite there yet, unfortunately. At the moment
pg_atomic_write_u32() is used for local buffers - and we explicitly
don't want that to be locking for temp buffers
(c.f. 6b93fcd149329d4ee7319561b30fc15a573c6307).

Don't really have a great idea about addressing this, besides either
just living with the lock for temp buffers on fallback platforms (which
don't have much practical relevance), or introducing
pg_atomic_unlocked_write_u32() or something. Neither seems great.

Regards,

Andres



Re: Our "fallback" atomics implementation doesn't actually work

From: Tom Lane
Date: 2016-10-07 17:12:45 -0400
Andres Freund <andres@anarazel.de> writes:
> It's not quite there yet, unfortunately. At the moment
> pg_atomic_write_u32() is used for local buffers - and we explicitly
> don't want that to be locking for temp buffers
> (c.f. 6b93fcd149329d4ee7319561b30fc15a573c6307).

Hm.

> Don't really have a great idea about addressing this, besides either
> just living with the lock for temp buffers on fallback platforms (which
> don't have much practical relevance), or introducing
> pg_atomic_unlocked_write_u32() or something. Neither seems great.

Maybe we could hack it with some macro magic that would cause
pg_atomic_write_u32() to be expanded into a simple assignment in
localbuf.c only?
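
Presumably something like this hypothetical override, sketched with a
stand-in for the fallback pg_atomic_uint32:

#include <stdint.h>

/* Stand-in: the real fallback pg_atomic_uint32 also embeds the
 * emulation lock, but only .value matters for a plain store. */
typedef struct { volatile uint32_t value; } stand_in_atomic_u32;

/* Hypothetical, in localbuf.c only: expand "atomic" writes into a
 * plain assignment -- safe there because local buffer headers are
 * backend-private, so no concurrent compare-exchange can exist. */
#define pg_atomic_write_u32(ptr, val) ((void) ((ptr)->value = (val)))
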
        regards, tom lane



Re: Our "fallback" atomics implementation doesn't actually work

From: Andres Freund
On 2016-10-07 17:12:45 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It's not quite there yet, unfortunately. At the moment
> > pg_atomic_write_u32() is used for local buffers - and we explicitly
> > don't want that to be locking for temp buffers
> > (c.f. 6b93fcd149329d4ee7319561b30fc15a573c6307).
>
> Hm.
>
> > Don't really have a great idea about addressing this, besides either
> > just living with the lock for temp buffers on fallback platforms (which
> > don't have much practical relevance), or introducing
> > pg_atomic_unlocked_write_u32() or something. Neither seems great.
>
> Maybe we could hack it with some macro magic that would cause
> pg_atomic_write_u32() to be expanded into a simple assignment in
> localbuf.c only?

I think it's just as well to add a variant that's globally documented to
have no locking; there might be further uses of it. It's already used in
two files (bufmgr.c/localbuf.c), and I don't think it's impossible that
further usages will crop up.
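
In outline, the unlocked variant would be just a plain store whose
precondition is documented globally -- a sketch of the idea (not the
patch itself), again with a stand-in type:

#include <stdint.h>

typedef struct { volatile uint32_t value; } stand_in_atomic_u32;

/*
 * Non-locking store.  Only legal when no concurrent locked operation
 * (compare-exchange, fetch-and-op) can be in flight -- e.g. on
 * backend-local state such as temp buffer headers.  On platforms with
 * real atomics this is just a plain write anyway; only the fallback
 * needs the distinction, since there pg_atomic_write_u32() must take
 * the emulation lock.
 */
static inline void
pg_atomic_unlocked_write_u32(stand_in_atomic_u32 *ptr, uint32_t val)
{
    ptr->value = val;
}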

Patch that I intend to push soon-ish attached.

Andres

Attachment