Re: make check hang on AIX 5L p690 4way/I have two solutions - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: make check hang on AIX 5L p690 4way/I have two solutions
Msg-id 200209020441.g824fhp29551@candle.pha.pa.us
In response to make check hang on AIX 5L p690 4way/I have two solutions  ("Tomoyuki Niijima" <NIIJIMA@jp.ibm.com>)
List pgsql-patches
I have applied the following patch to PostgreSQL CVS.  If there are AIX
portability issues, they will show up during beta testing.  Thanks for
the fix.  I have heard of other AIX folks with similar problems.

---------------------------------------------------------------------------

Tomoyuki Niijima wrote:
> Your name               : Tomoyuki Niijima
> Your email address      : niijima@jp.ibm.com
>
>
> System Configuration
> ---------------------
>   Architecture (example: Intel Pentium)         : IBM 7040-681 (pSeries 690) 4way (LPAR)
>
>   Operating System (example: Linux 2.0.26 ELF)  : AIX 5L 5.1
>
>   PostgreSQL version (example: PostgreSQL-7.2.1):   PostgreSQL-7.2.1
>
>   Compiler used (example:  gcc 2.95.2)          : gcc 2.9
>
>
> Please enter a FULL description of your problem:
> ------------------------------------------------
> When I built PostgreSQL with the following steps, backends hung during the
> regression test. The problem has been reproduced on two machines, but both
> have the same hardware and software configuration. I also tried to
> reproduce the problem on other machines and on older versions of AIX, but
> I couldn't.
>
>
> Please describe a way to repeat the problem.   Please try to provide a
> concise reproducible example, if at all possible:
> ----------------------------------------------------------------------
> ./configure --enable-multibyte=EUC_JP --with-CC=gcc
> make
> make check
>
> By attaching dbx (the AIX debugger) to one of the 'postgres:' processes, I
> found that the backend was sleeping in semop().
>
>
>
> If you know how this problem might be fixed, list the solution below:
> ---------------------------------------------------------------------
> After looking through the pgsql-hackers mailing list, I focused on the
> spinlock code. The easiest, though perhaps not the best, solution is to
> give up HAS_TEST_AND_SET. This actually works.
>
> *** src/include/port/aix.h.org  Tue Feb 13 23:32:52 2001
> --- src/include/port/aix.h      Fri Aug 30 01:02:28 2002
> ***************
> *** 1,8 ****
>   #define CLASS_CONFLICT
>   #define DISABLE_XOPEN_NLS
> ! #define HAS_TEST_AND_SET
>   #define NO_MKTIME_BEFORE_1970
> ! typedef unsigned int slock_t;
>
>   #include <sys/machine.h>              /* ENDIAN definitions for network
>                                          * communication */
> --- 1,8 ----
>   #define CLASS_CONFLICT
>   #define DISABLE_XOPEN_NLS
> ! /* #define HAS_TEST_AND_SET */
>   #define NO_MKTIME_BEFORE_1970
> ! /* typedef unsigned int slock_t; */
>
>   #include <sys/machine.h>              /* ENDIAN definitions for network
>                                          * communication */
>
>
> Another, better solution is to implement the spinlock with _check_lock()
> and _clear_lock(). The important thing here is to define S_UNLOCK() with
> _clear_lock(). This will solve the so-called "compiler bug" issue that
> someone reported on the mailing list.
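A minimal sketch of the semantics this relies on, written with portable C11 `<stdatomic.h>` so it builds without AIX. It assumes, per the AIX documentation as we read it, that `_check_lock(word, old, new)` stores `new` only if the word still equals `old` and returns false in that case (true otherwise), which matches the `TAS()` convention of returning 0 on success, and that `_clear_lock(word, val)` stores `val` behind a memory barrier. `check_lock_emul()` and `clear_lock_emul()` are hypothetical names, not AIX or PostgreSQL functions, and acquire/release ordering is an assumed approximation of the barriers the AIX calls provide.

```c
#include <stdatomic.h>

/* slock_t is an int on POWER/PowerPC; atomic_int stands in for it here. */
typedef atomic_int slock_t;

/*
 * Hypothetical emulation of AIX _check_lock(word, old_val, new_val):
 * if *word == old_val, store new_val and return 0 (false);
 * otherwise leave *word unchanged and return 1 (true).
 * Acquire ordering keeps critical-section accesses after the lock is taken.
 */
static int check_lock_emul(slock_t *word, int old_val, int new_val)
{
    int expected = old_val;     /* local copy; CAS overwrites it on failure */

    return !atomic_compare_exchange_strong_explicit(word, &expected, new_val,
                                                    memory_order_acquire,
                                                    memory_order_relaxed);
}

/*
 * Hypothetical emulation of AIX _clear_lock(word, val): a release store,
 * so writes made while holding the lock are visible before it is freed.
 */
static void clear_lock_emul(slock_t *word, int val)
{
    atomic_store_explicit(word, val, memory_order_release);
}

/* Same shape as the macros in the patch: TAS() returns 0 on success. */
#define TAS(lock)       check_lock_emul((lock), 0, 1)
#define S_UNLOCK(lock)  clear_lock_emul((lock), 0)
```

This is a plausible reason the report stresses `S_UNLOCK()`: if unlocking is a plain store such as `*lock = 0`, a weakly ordered SMP PowerPC machine may let other processors see the lock as free before they see the writes made inside the critical section.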
>
> AIX has some other APIs that can do test-and-set, such as cs(),
> compare_and_swap() and fetch_and_or(), but none of them solved my problem.
> I wrote a tiny test program to check these AIX APIs for bugs, but I
> couldn't find any problem except with compare_and_swap(). It seems that
> compare_and_swap() cannot be used for this purpose, as it did not work as a
> spinlock on any of the SMP machines I tested. I don't know why neither cs()
> nor fetch_and_or()/fetch_and_and() works with PostgreSQL on the p690;
> they worked with my test program on all the machines I tested.
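The report does not say why compare_and_swap() failed. One general pitfall with compare-and-swap interfaces of this shape is that a failed call writes the current value back through the expected-value pointer; C11 `atomic_compare_exchange_strong()` does this, and the AIX `compare_and_swap()` is, as far as we know, documented to update its old-value argument the same way. The sketch below uses C11 atomics as stand-ins for the AIX calls (`tas_cas()`, `s_unlock()`, `tas_fetch_or()`, `unlock_fetch_and()` and `tas_cas_stale()` are all hypothetical names) to show a correct compare-and-swap TAS, a fetch-and-or TAS with a fetch-and-and release, and a deliberately wrong variant that reuses a stale expected value.

```c
#include <stdatomic.h>

typedef atomic_int slock_t;

/* Correct CAS-based TAS: "expected" is reset to 0 on every call. */
static int tas_cas(slock_t *lock)
{
    int expected = 0;

    return !atomic_compare_exchange_strong_explicit(lock, &expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed);
}

/* Release for tas_cas(): a store with release ordering. */
static void s_unlock(slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Fetch-and-or TAS: the old low bit says whether the lock was already held. */
static int tas_fetch_or(slock_t *lock)
{
    return atomic_fetch_or_explicit(lock, 1, memory_order_acquire) & 1;
}

/* Fetch-and-and release: clear the low bit with release ordering. */
static void unlock_fetch_and(slock_t *lock)
{
    atomic_fetch_and_explicit(lock, ~1, memory_order_release);
}

/*
 * Deliberately WRONG: the caller's *expected is reused across calls. After
 * one failure on a held lock it becomes 1, and the next call "succeeds"
 * by swapping 1 for 1 even though someone else still holds the lock.
 */
static int tas_cas_stale(slock_t *lock, int *expected)
{
    return !atomic_compare_exchange_strong_explicit(lock, expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed);
}
```

With the stale variant, a first failed attempt on a held lock sets the expected value to 1, so the next attempt reports success while the lock is still held: the kind of mutual-exclusion failure that a single-threaded test may not notice but SMP contention would expose.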
>
> *** ./src/include/storage/s_lock.h.org  Fri Aug 30 01:13:15 2002
> --- ./src/include/storage/s_lock.h      Wed Jan 30 00:44:42 2002
> ***************
> *** 440,447 ****
>    * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
>    * (see storage/ipc.h).
>    */
> ! #define TAS(lock)     _check_lock(lock, 0, 1)
> ! #define S_UNLOCK(lock)        _clear_lock(lock, 0)
>   #endif         /* _AIX */
>
>
> --- 440,446 ----
>    * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
>    * (see storage/ipc.h).
>    */
> ! #define TAS(lock)     cs((int *) (lock), 0, 1)
>   #endif         /* _AIX */
>
>
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/include/storage/s_lock.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.99
diff -c -c -r1.99 s_lock.h
*** src/include/storage/s_lock.h    20 Jun 2002 20:29:52 -0000    1.99
--- src/include/storage/s_lock.h    2 Sep 2002 04:39:30 -0000
***************
*** 439,445 ****
   *
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   */
! #define TAS(lock)    cs((int *) (lock), 0, 1)
  #endif     /* _AIX */


--- 439,446 ----
   *
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   */
! #define TAS(lock)            _check_lock(lock, 0, 1)
! #define S_UNLOCK(lock)        _clear_lock(lock, 0)
  #endif     /* _AIX */
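For context, a sketch of how `TAS()` and `S_UNLOCK()` from the applied patch are typically used: spin on `TAS()` until it returns 0, run the critical section, then release with `S_UNLOCK()`. On AIX the sketch uses the same calls as the applied patch, with an explicit `(atomic_p)` cast, assuming the declarations in `<sys/atomic_op.h>`; elsewhere it substitutes hypothetical C11 stand-ins so it can be compiled and tested. `spin_lock()` is a hypothetical, simplified take on `S_LOCK()`; PostgreSQL's own `s_lock()` also sleeps between retries and reports a stuck spinlock after a timeout.

```c
#include <sched.h>                  /* sched_yield() */

#ifdef _AIX
#include <sys/atomic_op.h>          /* _check_lock(), _clear_lock(), atomic_p */
typedef unsigned int slock_t;       /* as in src/include/port/aix.h */
#define TAS(lock)       _check_lock((atomic_p) (lock), 0, 1)
#define S_UNLOCK(lock)  _clear_lock((atomic_p) (lock), 0)
#else
/* Hypothetical C11 stand-ins so the sketch builds and runs off AIX. */
#include <stdatomic.h>
typedef atomic_uint slock_t;
/* Nonzero if the lock was already held, 0 if we just took it. */
#define TAS(lock)       (atomic_exchange_explicit((lock), 1u, memory_order_acquire) != 0u)
#define S_UNLOCK(lock)  atomic_store_explicit((lock), 0u, memory_order_release)
#endif

/*
 * Hypothetical, simplified S_LOCK(): retry TAS() until it returns 0,
 * yielding the CPU between attempts. Returns the number of failed
 * attempts, which is zero when the lock was free.
 */
static unsigned spin_lock(slock_t *lock)
{
    unsigned failed = 0;

    while (TAS(lock))
    {
        failed++;
        sched_yield();              /* let the holder run and release */
    }
    return failed;
}
```

A caller brackets its critical section with `spin_lock(&lock)` and `S_UNLOCK(&lock)`; the release side is where the choice of `_clear_lock()` over a plain store matters.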


