Thread: make check hang on AIX 5L p690 4way/I have two solutions

make check hang on AIX 5L p690 4way/I have two solutions

From
"Tomoyuki Niijima"
Date:
Your name               : Tomoyuki Niijima
Your email address      : niijima@jp.ibm.com


System Configuration
---------------------
  Architecture (example: Intel Pentium)         : IBM 7040-681 (pSeries 690) 4way (LPAR)

  Operating System (example: Linux 2.0.26 ELF)  : AIX 5L 5.1

  PostgreSQL version (example: PostgreSQL-7.2.1):   PostgreSQL-7.2.1

  Compiler used (example:  gcc 2.95.2)          : gcc 2.9


Please enter a FULL description of your problem:
------------------------------------------------
I built PostgreSQL with the following steps and saw backends hang during
the regression test. The problem has been reproduced on two machines, but
both have the same hardware and software. I also tried to reproduce the
problem on other machines running older versions of AIX, but I couldn't.


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
./configure --enable-multibyte=EUC_JP --with-CC=gcc
make
make check

By attaching dbx (the AIX debugger) to one of the 'postgres:' processes, I
found that the backend was sleeping in semop().



If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
After looking through the pgsql-hackers mailing list, I focused on the spin
lock implementation. The easiest solution, though perhaps not the best, is
to give up HAS_TEST_AND_SET. This actually works.

*** src/include/port/aix.h.org  Tue Feb 13 23:32:52 2001
--- src/include/port/aix.h      Fri Aug 30 01:02:28 2002
***************
*** 1,8 ****
  #define CLASS_CONFLICT
  #define DISABLE_XOPEN_NLS
! #define HAS_TEST_AND_SET
  #define NO_MKTIME_BEFORE_1970
! typedef unsigned int slock_t;

  #include <sys/machine.h>              /* ENDIAN definitions for network
                                         * communication */
--- 1,8 ----
  #define CLASS_CONFLICT
  #define DISABLE_XOPEN_NLS
! /* #define HAS_TEST_AND_SET */
  #define NO_MKTIME_BEFORE_1970
! /* typedef unsigned int slock_t; */

  #include <sys/machine.h>              /* ENDIAN definitions for network
                                         * communication */


Another, better solution is to use _check_lock() and _clear_lock() as the
spin lock primitives.  The important point is to define S_UNLOCK() with
_clear_lock().  This will solve the so-called "compiler bug" issue that
someone described on the mailing list.
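
For illustration, here is a minimal sketch of a TAS()/S_UNLOCK() pair in which the unlock is a barrier-carrying operation rather than a plain store. The usual explanation for this kind of hang, assumed here rather than confirmed in this thread, is that a plain "*lock = 0" gives no release ordering on weakly ordered SMP hardware. The _AIX branch uses _check_lock()/_clear_lock() from <sys/atomic_op.h>. The other branch is only an assumed stand-in using the GCC __sync builtins, so the sketch compiles elsewhere. demo_try_lock() and demo_unlock() are hypothetical helper names, not PostgreSQL functions.

```c
/* Sketch only: a TAS()/S_UNLOCK() pair with acquire/release ordering. */
#ifdef _AIX
#include <sys/atomic_op.h>      /* _check_lock(), _clear_lock() */
typedef int slock_t;
/* _check_lock() returns 0 (FALSE) when it swapped 0 -> 1, i.e. lock taken */
#define TAS(lock)       _check_lock((atomic_p) (lock), 0, 1)
#define S_UNLOCK(lock)  _clear_lock((atomic_p) (lock), 0)
#else
/* Assumed stand-in for non-AIX builds: GCC/Clang __sync builtins */
typedef int slock_t;
/* returns the previous value: 0 means we took the lock (acquire barrier) */
#define TAS(lock)       __sync_lock_test_and_set((lock), 1)
/* stores 0 with a release barrier */
#define S_UNLOCK(lock)  __sync_lock_release(lock)
#endif

/* Hypothetical helper: returns 1 if the lock was acquired, 0 if it was busy */
static int
demo_try_lock(volatile slock_t *lock)
{
    return TAS(lock) == 0;
}

/* Hypothetical helper: releases the lock through S_UNLOCK() */
static void
demo_unlock(volatile slock_t *lock)
{
    S_UNLOCK(lock);
}
```

In both branches TAS() returns zero when the lock was free and non-zero when it was already held, which is the convention the TAS() macro in s_lock.h follows.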

AIX has other APIs for test-and-set, such as cs(), compare_and_swap() and
fetch_and_or(), but none of these solved my problem.  I wrote a tiny test
program to check whether these AIX APIs have any bugs, but I found no
problem except with compare_and_swap().  It seems that compare_and_swap()
cannot be used for this purpose, as it did not work as a spin lock on any
SMP machine I tested.  I don't know why cs() and
fetch_and_or()/fetch_and_and() do not work with PostgreSQL on the p690;
they worked with my test program on all machines I tested.
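
A test of this kind could look roughly like the sketch below. It is not the program used for this report. It is an assumed stand-in in which several threads increment a shared counter under a TAS() spin lock, and the final count is correct only if mutual exclusion held. The GCC __sync builtins stand in for the AIX primitives so that it builds elsewhere. run_spin_test(), worker() and the 64-thread limit are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

typedef int slock_t;
/* Assumed stand-in primitives (GCC/Clang builtins), not the AIX calls */
#define TAS(lock)       __sync_lock_test_and_set((lock), 1)
#define S_UNLOCK(lock)  __sync_lock_release(lock)

static volatile slock_t test_lock = 0;
static long shared_counter = 0;

/* Each worker increments the shared counter 'iterations' times under the lock */
static void *
worker(void *arg)
{
    long iterations = *(long *) arg;
    long i;

    for (i = 0; i < iterations; i++)
    {
        while (TAS(&test_lock))
            sched_yield();      /* lock is busy: let the holder run */
        shared_counter++;       /* critical section */
        S_UNLOCK(&test_lock);
    }
    return NULL;
}

/*
 * Runs 'nthreads' workers and returns the final counter, or -1 on a bad
 * argument or thread-creation failure.  The result equals
 * nthreads * iterations only if the lock provided mutual exclusion.
 */
static long
run_spin_test(int nthreads, long iterations)
{
    pthread_t tids[64];
    int started, i;

    if (nthreads < 1 || nthreads > 64)
        return -1;
    test_lock = 0;
    shared_counter = 0;
    for (started = 0; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0)
            break;
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return (started == nthreads) ? shared_counter : -1;
}
```

A lost update shows up as a final count below nthreads * iterations. A correct count in a test this small does not prove that the primitive is safe under a heavier workload, which matches the difference seen above between the test program and PostgreSQL.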

*** ./src/include/storage/s_lock.h.org  Fri Aug 30 01:13:15 2002
--- ./src/include/storage/s_lock.h      Wed Jan 30 00:44:42 2002
***************
*** 440,446 ****
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   * (see storage/ipc.h).
   */
! #define TAS(lock)     cs((int *) (lock), 0, 1)
  #endif         /* _AIX */


--- 440,447 ----
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   * (see storage/ipc.h).
   */
! #define TAS(lock)     _check_lock(lock, 0, 1)
! #define S_UNLOCK(lock)        _clear_lock(lock, 0)
  #endif         /* _AIX */





Re: make check hang on AIX 5L p690 4way/I have two solutions

From
Bruce Momjian
Date:
I have applied the following patch to PostgreSQL CVS.  If there are AIX
portability issues, they will show up during beta testing.  Thanks for
the fix.  I have heard of other AIX folks with similar problems.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/include/storage/s_lock.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.99
diff -c -c -r1.99 s_lock.h
*** src/include/storage/s_lock.h    20 Jun 2002 20:29:52 -0000    1.99
--- src/include/storage/s_lock.h    2 Sep 2002 04:39:30 -0000
***************
*** 439,445 ****
   *
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   */
! #define TAS(lock)    cs((int *) (lock), 0, 1)
  #endif     /* _AIX */


--- 439,446 ----
   *
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   */
! #define TAS(lock)            _check_lock(lock, 0, 1)
! #define S_UNLOCK(lock)        _clear_lock(lock, 0)
  #endif     /* _AIX */