Re: [HACKERS] Deadlock in XLogInsert at AIX - Mailing list pgsql-hackers
From | Bernd Helmle |
---|---|
Subject | Re: [HACKERS] Deadlock in XLogInsert at AIX |
Date | |
Msg-id | 1485786380.3084.2.camel@oopsware.de |
In response to | Re: [HACKERS] Deadlock in XLogInsert at AIX (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
List | pgsql-hackers |

Hi Konstantin,

We had observed exactly the same issues on a customer system with the same environment and PostgreSQL 9.5.5. Additionally, we've tested on Linux with XL C 12 and 13, with exactly the same deadlock behavior. So we assumed that this is somehow a compiler issue.

On Tuesday, 24.01.2017, at 19:26 +0300, Konstantin Knizhnik wrote:

> More information about the problem: the Postgres log contains several records:
>
> ```
> 2017-01-24 19:15:20.272 MSK [19270462] LOG: request to flush past end of generated WAL; request 6/AAEBE000, currpos 6/AAEBC2B0
> ```
>
> and they correspond to the time when the deadlock happened.

Yeah, the same logs here:

```
LOG: request to flush past end of generated WAL; request 1/1F4C6000, currpos 1/1F4C40E0
STATEMENT: UPDATE pgbench_accounts SET abalance = abalance + -2653 WHERE aid = 3662494;
```

> There is the following comment in xlog.c concerning this message:
>
> ```c
> /*
>  * No-one should request to flush a piece of WAL that hasn't even been
>  * reserved yet. However, it can happen if there is a block with a bogus
>  * LSN on disk, for example. XLogFlush checks for that situation and
>  * complains, but only after the flush. Here we just assume that to mean
>  * that all WAL that has been reserved needs to be finished. In this
>  * corner-case, the return value can be smaller than 'upto' argument.
>  */
> ```
>
> So it looks like it should not happen.
> The first thing to suspect is the spinlock implementation, which is different for GCC and XLC.
> But... if I rebuild Postgres without spinlocks, the problem is still reproduced.

Before we got the results from XLC on Linux (where Postgres shows the same behavior), I had a look into the spinlock implementation. If I got it right, XLC doesn't use the ppc64-specific ones, but the fallback implementation (system monitoring on AIX also showed a massive number of calls to signal(0)...).
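As an aside on the log lines quoted above: the LSNs use PostgreSQL's text form for WAL positions, i.e. the high and low 32 bits of a 64-bit position printed in hex and separated by a slash. The minimal C sketch that follows converts them and reproduces, in simplified form, the check the quoted xlog.c comment describes (a flush request past the reserved position is logged and clamped down to it). The function names `parse_lsn`, `flush_overshoot` and `clamp_to_reserved` are hypothetical illustrations, not PostgreSQL APIs, and this is not the actual xlog.c code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse an LSN in the "%X/%X" text form used by the
 * log messages (high 32 bits, slash, low 32 bits) into a 64-bit position.
 * Returns 1 on success, 0 if the text does not match that form. */
int parse_lsn(const char *text, uint64_t *lsn)
{
    uint32_t hi, lo;

    if (sscanf(text, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
        return 0;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}

/* Hypothetical helper: bytes by which a flush request lies beyond the
 * position reserved so far; 0 means the request is within reserved WAL. */
uint64_t flush_overshoot(uint64_t request, uint64_t reserved)
{
    return request > reserved ? request - reserved : 0;
}

/* Simplified sketch of the corner case in the xlog.c comment: a request
 * past the reserved position is reported and clamped to that position,
 * so the result can be smaller than the requested 'upto'. */
uint64_t clamp_to_reserved(uint64_t upto, uint64_t reserved)
{
    if (upto > reserved)
    {
        fprintf(stderr, "request to flush past end of generated WAL; "
                "request %" PRIX32 "/%" PRIX32 ", currpos %" PRIX32 "/%" PRIX32 "\n",
                (uint32_t) (upto >> 32), (uint32_t) upto,
                (uint32_t) (reserved >> 32), (uint32_t) reserved);
        return reserved;
    }
    return upto;
}
```

For the two pairs reported in this thread, the requests lie 7504 bytes (6/AAEBE000 vs. 6/AAEBC2B0) and 7968 bytes (1/1F4C6000 vs. 1/1F4C40E0) beyond the current position, which is exactly the situation the comment says should not normally arise.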
So I tried the following patch:

```diff
diff --git a/src/include/port/atomics/arch-ppc.h b/src/include/port/atomics/arch-ppc.h
new file mode 100644
index f901a0c..028cced
*** a/src/include/port/atomics/arch-ppc.h
--- b/src/include/port/atomics/arch-ppc.h
***************
*** 23,26 ****
--- 23,33 ----
  #define pg_memory_barrier_impl() __asm__ __volatile__ ("sync" : : : "memory")
  #define pg_read_barrier_impl() __asm__ __volatile__ ("lwsync" : : : "memory")
  #define pg_write_barrier_impl() __asm__ __volatile__ ("lwsync" : : : "memory")
+ 
+ #elif defined(__IBMC__) || defined(__IBMCPP__)
+ 
+ #define pg_memory_barrier_impl() __asm__ __volatile__ (" sync \n" ::: "memory")
+ #define pg_read_barrier_impl() __asm__ __volatile__ (" lwsync \n" ::: "memory")
+ #define pg_write_barrier_impl() __asm__ __volatile__ (" lwsync \n" ::: "memory")
+ 
  #endif
```

This didn't change the picture, though.
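The patch adds XLC-specific definitions of the three barrier macros that use the same `sync`/`lwsync` instructions as the existing definitions directly above them. For comparison, here is a minimal sketch of the ordering such read/write barriers are meant to guarantee, written with portable C11 fences instead of inline assembly. It assumes a C11 compiler with `<stdatomic.h>` and POSIX threads (link with `-pthread` where required); the names `sketch_write_barrier`, `sketch_read_barrier`, `sketch_full_barrier` and `run_publish_demo` are hypothetical. Release/acquire fences are at least as strong as PostgreSQL's write/read barriers; on POWER, compilers typically emit `lwsync` for them and `sync` for a sequentially consistent fence, but that mapping is worth confirming in the generated code for the compiler in question.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical portable stand-ins for the barrier macros in the patch.
 * Each C11 fence is at least as strong as the barrier it replaces. */
#define sketch_write_barrier() atomic_thread_fence(memory_order_release)
#define sketch_read_barrier()  atomic_thread_fence(memory_order_acquire)
#define sketch_full_barrier()  atomic_thread_fence(memory_order_seq_cst)

/* Shared state for the sketch: a plain payload and an atomic flag. */
static int payload;
static atomic_int ready;

static void *producer(void *arg)
{
    (void) arg;
    payload = 42;                 /* plain store of the data */
    sketch_write_barrier();       /* order the data store before the flag store */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                         /* spin until the flag is published */
    sketch_read_barrier();        /* order the flag load before the data load */
    *out = payload;
    return NULL;
}

/* Runs one producer/consumer round; returns the value the consumer read,
 * or -1 if a thread could not be created. */
int run_publish_demo(void)
{
    pthread_t p, c;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0)
    {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Each call to `run_publish_demo()` should return 42: the release fence orders the payload store before the flag store, and the acquire fence orders the flag load before the payload load. Removing either fence turns the payload access into a data race in C11 terms, which is the class of bug a missing or ineffective barrier would cause.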