Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb] - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
Date
Msg-id 20120203014821.GA11939@momjian.us
Whole thread Raw
In response to Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
List pgsql-bugs
Sorry for the late reply, but Heikki, can you get this Itanium
information into s_lock.h as a comment, particularly the information
about the +Ovolatile=__unordered flag?

---------------------------------------------------------------------------

On Mon, Dec 19, 2011 at 11:25:06PM +0200, Heikki Linnakangas wrote:
> On 19.12.2011 22:12, Tom Lane wrote:
> >Noah Misch<noah@leadboat.com>  writes:
> >>On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote:
> >>>That is not sufficient on platforms with a weak memory model, like Itanium.
> >
> >>Other processors may observe the lock as held after its release, but there's no
> >>correctness problem.
> >
> >How weak is the memory model, exactly?
> >
> >A correctness problem would ensue if out-of-order stores are possible,
> >ie other processors could observe the lock being freed (and then acquire
> >it) before seeing changes to shared variables that had been made while
> >holding the lock.
> >
> >I'm a little dubious that this applies to Itanium, because I don't see
> >any memory fence instruction in the TAS macro.  If we needed one in
> >S_UNLOCK I'd rather expect there to be one in TAS as well.
>
> There's a pretty good manual called "Implementing Spinlocks on
> Itanium and PA-RISC" from HP at: http://h21007.www2.hp.com/portal/download/files/unprot/itanium/spinlocks.pdf.
> It says, in section "3.2.1 Try to get a spinlock":
>
> > Note also, that the ‘xchg’ instruction always
> > has the ‘acquire’ semantics, so it enforces the correct memory ordering.
>
> But the current implementation seems to be safe, anyway.  In section
> "3.2.3 Freeing a spinlock", that manual says:
>
> > Note, that a simple assignment statement into a volatile variable
> will not work, as we are
> > assuming that the +Ovolatile=__unordered compile option is being used.
>
> So +Ovolatile=__unordered would allow the kind of optimization that
> I thought was possible, and we would have a problem if we used it.
> But the default is more conservative, and a simple assignment does
> work.
>
> I compiled the attached test program on an HP Itanium box, using the
> same flags you get from PostgreSQL's configure on that box. The
> relevant assembler output is:
>
>         xchg4           r14 = [r15], r14           // M [slocktest.c: 66/3]
> //file/line/col slocktest.c/67/3
>         ld1.acq         r16 = [r11]                // M [slocktest.c: 67/3]
>         nop.i           0                          // I
> //file/line/col slocktest.c/68/3
>         st1.rel         [r11] = r10             ;; // M [slocktest.c: 68/3]
> //file/line/col slocktest.c/69/3
>         st4.rel         [r15] = r0                 // M [slocktest.c: 69/3]
> //file/line/col slocktest.c/70/1
>
>
> The trick I missed is that the compiler attaches .rel to all the
> stores and .acq to all the loads through a volatile pointer. gcc
> seems to do the same. So we're safe.
>
>
> It would be interesting to test whether using +Ovolatile=__unordered
> would make a difference to performance. Tacking those memory fence
> modifiers to all the volatile loads and stores might be expensive.
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com

> #include <stdio.h>
>
> #if defined(__GNUC__) || defined(__INTEL_COMPILER)
> #if defined(__ia64__) || defined(__ia64)    /* Intel Itanium */
> #define HAS_TEST_AND_SET
>
> typedef unsigned int slock_t;
>
> #define TAS(lock) tas(lock)
>
> /* On IA64, it's a win to use a non-locking test before the xchg proper */
> #define TAS_SPIN(lock)    (*(lock) ? 1 : TAS(lock))
>
> #ifndef __INTEL_COMPILER
>
> static __inline__ int
> tas(volatile slock_t *lock)
> {
>     long int    ret;
>
>     __asm__ __volatile__(
>         "    xchg4     %0=%1,%2    \n"
> :        "=r"(ret), "+m"(*lock)
> :        "r"(1)
> :        "memory");
>     return (int) ret;
> }
>
> #else /* __INTEL_COMPILER */
>
> static __inline__ int
> tas(volatile slock_t *lock)
> {
>     int        ret;
>
>     ret = _InterlockedExchange(lock,1);    /* this is a xchg asm macro */
>
>     return ret;
> }
>
> #endif /* __INTEL_COMPILER */
> #endif     /* __ia64__ || __ia64 */
> #endif    /* defined(__GNUC__) || defined(__INTEL_COMPILER) */
>
> #if defined(__hpux) && defined(__ia64) && !defined(__GNUC__)
>
> #define HAS_TEST_AND_SET
>
> typedef unsigned int slock_t;
>
> #include <ia64/sys/inline.h>
> #define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
> /* On IA64, it's a win to use a non-locking test before the xchg proper */
> #define TAS_SPIN(lock)    (*(lock) ? 1 : TAS(lock))
>
> #endif    /* HPUX on IA64, non gcc */
>
> slock_t lock;
> char shared2;
>
> int main(int argc, char **argv)
> {
>   volatile char *p = &shared2;
>   char local;
>
>   TAS(&lock);
>   local = *p;
>   *p = 123;
>   *((volatile slock_t *) &lock) = 0;
> }

>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs


--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

pgsql-bugs by date:

Previous
From: Steven Schlansker
Date:
Subject: Re: [JDBC] BUG #6293: JDBC driver performance
Next
From: Tom Lane
Date:
Subject: Re: BUG #6425: Bus error in slot_deform_tuple