Thread: 9.3.9 and pg_multixact corruption

9.3.9 and pg_multixact corruption

From
Bernd Helmle
Date:
A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
instance.

The database crashed with the following log messages:

2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
transaction 1068235595
2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
"pg_multixact/members/FFFF5FC4": No such file or directory.
2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
where oid = $1 and tanum = $2 

When they called us later, it turned out that the crash happened during a
base backup, leaving a backup_label behind which prevented the database
coming up again with a invalid checkpoint location. However, removing the
backup_label still didn't let the database through recovery, it failed
again with the former error, this time during recovery:

2015-09-08 11:40:04 CEST [27047] LOG:  database system was interrupted
while in recovery at 2015-09-08 11:19:44 CEST
2015-09-08 11:40:04 CEST [27047] HINT:  This probably means that some data
is corrupted and you will have to use the last backup for recovery.
2015-09-08 11:40:04 CEST [27047] LOG:  database system was not properly
shut down; automatic recovery in progress
2015-09-08 11:40:05 CEST [27047] LOG:  redo starts at 1A52/2313FEF8
2015-09-08 11:40:47 CEST [27082] FATAL:  the database system is starting up
2015-09-08 11:40:59 CEST [27047] FATAL:  could not access status of
transaction 1068235595
2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
"pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument.
2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 
2015-09-08 11:40:59 CEST [27045] LOG:  startup process (PID 27047) exited
with exit code 1
2015-09-08 11:40:59 CEST [27045] LOG:  aborting startup due to startup
process failure

Some side notes:

An additional recovery from a base backup and archive recovery yield to the
same error, as soon as the affected tuple was touched with a DELETE. The
affected table was fully dumpable via pg_dump, though.

We also have a core dump, but no direct access to the machine. If there's
more information  required (and i believe it is), let me know where to dig
deeper. I also would like to request a backtrace from the existing core
dump, but in the absence of a sparc64 machine here we need to ask the
customer to get one.

-- 
Thanks
Bernd



Re: 9.3.9 and pg_multixact corruption

From
Alvaro Herrera
Date:
Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
> 

> 2015-09-08 11:40:59 CEST [27047] FATAL:  could not access status of
> transaction 1068235595
> 2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
> "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 

I think the math to compute segment number and byte offset of the member
might be bogus here.  The file names in pg_multixact/members is supposed
to go to 14078 (hex) in a 8kB-BLCKSZ build, and of course it goes a bit
higher in builds with smaller page sizes, but nowhere as high as
FFFF5FC4.  And the offset is way too close to 2^32 (exactly 16384 less
than 2^32, to be precise.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.3.9 and pg_multixact corruption

From
Alvaro Herrera
Date:
Bernd Helmle wrote:

> 2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
> "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 

I just noticed that this offset number 2147483648 is exactly 2^31.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.3.9 and pg_multixact corruption

From
Alvaro Herrera
Date:
Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
> 
> The database crashed with the following log messages:
> 
> 2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
> transaction 1068235595
> 2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
> "pg_multixact/members/FFFF5FC4": No such file or directory.
> 2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
> where oid = $1 and tanum = $2 

I wonder if these bogus page and offset numbers are just
SlruReportIOError being confused because pg_multixact/members is so
weird (I don't think it should be the case, since this stuff is using
page numbers only, not anything related to how each page is layed out).

Anyway, can you please request pg_controldata to be run on the failed
cluster and paste it here?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.3.9 and pg_multixact corruption

From
Thomas Munro
Date:
On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
>
> The database crashed with the following log messages:
>
> 2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
> transaction 1068235595
> 2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
> "pg_multixact/members/FFFF5FC4": No such file or directory.
> 2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
> where oid = $1 and tanum = $2

I wonder if these bogus page and offset numbers are just
SlruReportIOError being confused because pg_multixact/members is so
weird (I don't think it should be the case, since this stuff is using
page numbers only, not anything related to how each page is layed out).

But SlruReportIOError uses the same macro to build the filename as SlruReadPhysicalPage and other functions, namely SlruFileName which uses sprintf with %04X (unsigned integer uppercase hex) and gives it segno (which is always an int), so I don't think the problem is in error reporting only.

Assuming default block size, to get FFFF5FC4 from SlruFileName you need segno == -41020.

We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get segno == -41020 you need pageno between -1312640 and -1312609 (whose bit patterns  reinterpreted as unsigned are 4293654656 and 4293654687).

In various places we have int pageno = offset / (uint32) 1636, expanded from this macro (which calls the offset an xid):

#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)

I don't really see how any uint32 value could produce such a pageno via that macro.  Even if called in an environment where (xid) is accidentally an int, the int / unsigned expression would convert it to unsigned first (unless (xid) is a bigger type like int64_t: by the rules of int promotion you'd get signed division in that case, hmm...).  But it's always called with a MultiXactOffset AKA uint32 variable.

So via that route, there is no MultiXactOffset value that can't be mapped to a segment in the range "0000", "14078".  Famously, it wraps after that.

Maybe the negative pageno came from somewhere else.  Where?  Inside SLRU code we can see pageno = shared->page_number[slotno]... maybe the SLRU slots got corrupted somehow?

--

Re: 9.3.9 and pg_multixact corruption

From
Thomas Munro
Date:
On Fri, Sep 11, 2015 at 11:51 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
>
> The database crashed with the following log messages:
>
> 2015-09-08 00:49:16 CEST [2912] PANIC:  could not access status of
> transaction 1068235595
> 2015-09-08 00:49:16 CEST [2912] DETAIL:  Could not open file
> "pg_multixact/members/FFFF5FC4": No such file or directory.
> 2015-09-08 00:49:16 CEST [2912] STATEMENT:  delete from StockTransfer
> where oid = $1 and tanum = $2

I wonder if these bogus page and offset numbers are just
SlruReportIOError being confused because pg_multixact/members is so
weird (I don't think it should be the case, since this stuff is using
page numbers only, not anything related to how each page is layed out).

But SlruReportIOError uses the same macro to build the filename as SlruReadPhysicalPage and other functions, namely SlruFileName which uses sprintf with %04X (unsigned integer uppercase hex) and gives it segno (which is always an int), so I don't think the problem is in error reporting only.

Assuming default block size, to get FFFF5FC4 from SlruFileName you need segno == -41020.

Oops, I meant to attach the proviso "Assuming default block size" to the assumption further down that MULTIXACT_MEMBERS_PER_PAGE == 1636.
 
We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get segno == -41020 you need pageno between -1312640 and -1312609 (whose bit patterns  reinterpreted as unsigned are 4293654656 and 4293654687).

In various places we have int pageno = offset / (uint32) 1636, expanded from this macro (which calls the offset an xid):

#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)

I don't really see how any uint32 value could produce such a pageno via that macro.  Even if called in an environment where (xid) is accidentally an int, the int / unsigned expression would convert it to unsigned first (unless (xid) is a bigger type like int64_t: by the rules of int promotion you'd get signed division in that case, hmm...).  But it's always called with a MultiXactOffset AKA uint32 variable.

So via that route, there is no MultiXactOffset value that can't be mapped to a segment in the range "0000", "14078".  Famously, it wraps after that.

Maybe the negative pageno came from somewhere else.  Where?  Inside SLRU code we can see pageno = shared->page_number[slotno]... maybe the SLRU slots got corrupted somehow?

--

Re: 9.3.9 and pg_multixact corruption

From
Bernd Helmle
Date:
...and here it is ;)

--On 10. September 2015 19:45:46 -0300 Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

> Anyway, can you please request pg_controldata to be run on the failed
> cluster and paste it here?

pg_control version number:            937
Catalog version number:               201306121
Database system identifier:           5995776571405068134
Database cluster state:               in archive recovery
pg_control last modified:             Di 08 Sep 2015 14:58:36 CEST
Latest checkpoint location:           1A52/3CFAF758
Prior checkpoint location:            1A52/3CFAF758
Latest checkpoint's REDO location:    1A52/2313FEF8
Latest checkpoint's REDO WAL file:    0000000100001A5200000023
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/2896610102
Latest checkpoint's NextOID:          261892
Latest checkpoint's NextMultiXactId:  1068223816
Latest checkpoint's NextMultiOffset:  2147460090
Latest checkpoint's oldestXID:        2693040605
Latest checkpoint's oldestXID's DB:   16400
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1012219584
Latest checkpoint's oldestMulti's DB: 16400
Time of latest checkpoint:            Di 08 Sep 2015 00:47:01 CEST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     1A52/2313FEF8
Min recovery ending loc's timeline:   1
Backup start location:                1A52/2313FEF8
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            archive
Current max_connections setting:      500
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

-- 
Bernd               



Re: 9.3.9 and pg_multixact corruption

From
Christoph Berg
Date:
Re: Bernd Helmle 2015-09-10 <7E3C7F8D210AC9A423E96F3A@eje.local>
> 2015-09-08 11:40:59 CEST [27047] DETAIL:  Could not seek in file
> "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument.
> 2015-09-08 11:40:59 CEST [27047] CONTEXT:  xlog redo create mxid 1068235595
> offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 
> 2015-09-08 11:40:59 CEST [27045] LOG:  startup process (PID 27047) exited
> with exit code 1
> 2015-09-08 11:40:59 CEST [27045] LOG:  aborting startup due to startup
> process failure
> 
> Some side notes:
> 
> An additional recovery from a base backup and archive recovery yield to the
> same error, as soon as the affected tuple was touched with a DELETE. The
> affected table was fully dumpable via pg_dump, though.

A few more words here: the archive recovery was a pitr to 00:45, so
well before the problem, and the cluster was initially working well,
but crashed shortly after with the same mxid 1068235595 message. The
crash was triggered from a delete on a different table (which was
related schema-wise, but iirc neither of these tables has any FKs).

We then rewound the system to a zfs snapshot taken when the archive
recovery had finished (db shut down cleanly), and put it up again,
when it again crashed with mxid 1068235595, this time on a third
table.

The original crash and the first post-recovery crash happened a few
minutes after pg_start_backup(), though the next crash was without
that.


(While the archive recovery was running, I had pg_resetxlog the
original cluster. It was possible to isolate the ctid of an affected
tuple, but it wasn't possible to DELETE it, yielding an error message
similar to the above, but the database would continue. I then zeroed
the bad block using dd (zero_damaged_pages didn't help), only to find
that at least one more tuple in that table was affected (with a
different mxid).)

Christoph



Re: 9.3.9 and pg_multixact corruption

From
Andres Freund
Date:
On 2015-09-11 14:25:39 +0200, Christoph Berg wrote:
> A few more words here: the archive recovery was a pitr to 00:45, so
> well before the problem, and the cluster was initially working well,
> but crashed shortly after with the same mxid 1068235595 message. The
> crash was triggered from a delete on a different table (which was
> related schema-wise, but iirc neither of these tables has any FKs).
>
> We then rewound the system to a zfs snapshot taken when the archive
> recovery had finished (db shut down cleanly), and put it up again,
> when it again crashed with mxid 1068235595, this time on a third
> table.
>
> The original crash and the first post-recovery crash happened a few
> minutes after pg_start_backup(), though the next crash was without
> that.

Do you still have access to that data? That'd make investigation of both
the issue and fixes/workaround far easier.

Greetings,

Andres Freund



Re: 9.3.9 and pg_multixact corruption

From
Andreas Seltenreich
Date:
Thomas Munro writes:

> In various places we have int pageno = offset / (uint32) 1636, expanded
> from this macro (which calls the offset an xid):

It appears to depend on the context it is expanded in, as some of the
code must have gotten the segment number right:

,----[ ls -sh pg_multixact/members/ ]
| 256K 97E0
| [...]
| 256K A03B
|  24K A03C = -FFFF5FC4
|    0 FFFF5FC4
`----

> I don't really see how any uint32 value could produce such a pageno via
> that macro.  Even if called in an environment where (xid) is accidentally
> an int, the int / unsigned expression would convert it to unsigned first
> (unless (xid) is a bigger type like int64_t: by the rules of int promotion
> you'd get signed division in that case, hmm...).  But it's always called
> with a MultiXactOffset AKA uint32 variable.

I managed disassemble RecordNewMultiXact from the core dump using a
cross-binutils, and it reveals that the compiler[1] appears to have
indeed generated a signed division here.  I'm attaching a piece of C
code that does the same computation as the assembly (I think), as well
as the disassembly itself.

regards,
Andreas

Footnotes: 
[1]  Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint32_t offset2page(uint32_t offset)
{ uint64_t l0, i3, i5, i4, l6_1, l6_2, o2, o4, o1, o3; l0 = offset; i3 = 0x5fc3e800ULL; i5 = i3 ^ -375; i4 =
(int32_t)l0;
 l6_1 = i4 * i5; o2 = l6_1 >> 32; o4 = l0 + o2; o1 = ((int32_t)o4) >> 10; o3 = ((int32_t)l0) >> 31; l6_2 = o1 - o3;
returnl6_2;
 
}

int main(int argc, char *argv[])
{ uint32_t page = offset2page(atol(argv[1])); printf("page: %d\n", page); printf("segment: %04X\n", (int32_t)page/32);
return0;
 
}

RecordNewMultiXact:  100112be8:    07 00 04 01     sethi  %hi(0x100400), %g3  100112bec:    9d e3 bf 30     save  %sp,
-208,%sp  100112bf0:    82 10 e2 bd     or  %g3, 0x2bd, %g1  100112bf4:    90 10 20 0e     mov  0xe, %o0  100112bf8:
a536 20 00     srl  %i0, 0, %l2  100112bfc:    9f 28 70 0c     sllx  %g1, 0xc, %o7  100112c00:    a2 0c a7 ff     and
%l2,0x7ff, %l1  100112c04:    b9 3e a0 00     sra  %i2, 0, %i4  100112c08:    b4 03 ed d8     add  %o7, 0xdd8, %i2
100112c0c:   b1 36 60 00     srl  %i1, 0, %i0  100112c10:    f0 23 a8 af     st  %i0, [ %sp + 0x8af ]  100112c14:    b2
1000 1b     mov  %i3, %i1  100112c18:    40 08 1d 66     call  0x10031a1b0 ; LWLockAcquire  100112c1c:    92 10 20 00
 clr  %o1  100112c20:    93 34 a0 0b     srl  %l2, 0xb, %o1  100112c24:    90 10 00 1a     mov  %i2, %o0  100112c28:
9734 a0 00     srl  %l2, 0, %o3  100112c2c:    7f ff f6 bd     call  0x100110720 ; SimpleLruReadPage  100112c30:    94
1020 01     mov  1, %o2  100112c34:    fa 5e a0 00     ldx  [ %i2 ], %i5  100112c38:    ab 3a 20 00     sra  %o0, 0,
%l5 100112c3c:    96 10 20 01     mov  1, %o3  100112c40:    a9 2d 70 03     sllx  %l5, 3, %l4  100112c44:    e0 5f 60
08    ldx  [ %i5 + 8 ], %l0  100112c48:    a7 3c 60 00     sra  %l1, 0, %l3  100112c4c:    ad 2c f0 02     sllx  %l3,
2,%l6  100112c50:    da 5c 00 14     ldx  [ %l0 + %l4 ], %o5  100112c54:    f0 23 40 16     st  %i0, [ %o5 + %l6 ]
100112c58:   d8 5e a0 00     ldx  [ %i2 ], %o4  100112c5c:    d4 5b 20 18     ldx  [ %o4 + 0x18 ], %o2  100112c60:
d62a 80 15     stb  %o3, [ %o2 + %l5 ]  100112c64:    40 08 1f 2d     call  0x10031a918 ; LWLockRelease  100112c68:
9010 20 0e     mov  0xe, %o0  100112c6c:    90 10 20 0f     mov  0xf, %o0  100112c70:    40 08 1d 50     call
0x10031a1b0; LWLockAcquire  100112c74:    92 10 20 00     clr  %o1  100112c78:    80 a7 20 00     cmp  %i4, 0
100112c7c:   04 40 00 7c     ble,pn   %icc, 0x100112e6c  100112c80:    90 07 3f ff     add  %i4, -1, %o0  100112c84:
e003 a8 af     ld  [ %sp + 0x8af ], %l0  100112c88:    37 17 f0 fa     sethi  %hi(0x5fc3e800), %i3  100112c8c:    86 10
3fff     mov  -1, %g3  100112c90:    ba 1e fe 89     xor  %i3, -375, %i5  100112c94:    d0 23 a8 b7     st  %o0, [ %sp
+0x8b7 ]  100112c98:    a2 10 26 64     mov  0x664, %l1  100112c9c:    c0 23 a8 b3     clr  [ %sp + 0x8b3 ]  100112ca0:
  37 10 1e 0b     sethi  %hi(0x40782c00), %i3  100112ca4:    b9 3c 20 00     sra  %l0, 0, %i4  100112ca8:    ac 4f 00
1d    mulx  %i4, %i5, %l6  100112cac:    95 35 b0 20     srlx  %l6, 0x20, %o2  100112cb0:    ba 10 26 63     mov
0x663,%i5  100112cb4:    98 04 00 0a     add  %l0, %o2, %o4  100112cb8:    b8 10 20 01     mov  1, %i4  100112cbc:
933b 20 0a     sra  %o4, 0xa, %o1  100112cc0:    97 3c 20 1f     sra  %l0, 0x1f, %o3  100112cc4:    ac 22 40 0b     sub
%o1, %o3, %l6  100112cc8:    89 2d a0 04     sll  %l6, 4, %g4  100112ccc:    82 01 00 16     add  %g4, %l6, %g1
100112cd0:   b1 28 60 02     sll  %g1, 2, %i0  100112cd4:    84 26 00 01     sub  %i0, %g1, %g2  100112cd8:    f0 03 a8
af    ld  [ %sp + 0x8af ], %i0  100112cdc:    9f 28 a0 03     sll  %g2, 3, %o7  100112ce0:    90 05 80 0f     add  %l6,
%o7,%o0  100112ce4:    a7 2a 20 02     sll  %o0, 2, %l3  100112ce8:    a6 24 00 13     sub  %l0, %l3, %l3  100112cec:
98 06 e1 11     add  %i3, 0x111, %o4  100112cf0:    0b 14 07 82     sethi  %hi(0x501e0800), %g5  100112cf4:    89 36 20
02    srl  %i0, 2, %g4  100112cf8:    ae 01 63 45     add  %g5, 0x345, %l7  100112cfc:    97 33 20 00     srl  %o4, 0,
%o3 100112d00:    82 49 00 0b     mulx  %g4, %o3, %g1  100112d04:    9a 49 00 17     mulx  %g4, %l7, %o5  100112d08:
9130 70 20     srlx  %g1, 0x20, %o0  100112d0c:    9e 21 00 08     sub  %g4, %o0, %o7  100112d10:    8b 33 e0 01
srl %o7, 1, %g5  100112d14:    ae 02 00 05     add  %o0, %g5, %l7  100112d18:    95 33 70 20     srlx  %o5, 0x20, %o2
100112d1c:   9b 35 e0 04     srl  %l7, 4, %o5  100112d20:    85 32 a0 07     srl  %o2, 7, %g2  100112d24:    94 0b 7f
f0    and  %o5, -16, %o2  100112d28:    c4 23 a8 bf     st  %g2, [ %sp + 0x8bf ]  100112d2c:    9f 35 e0 08     srl
%l7,8, %o7  100112d30:    80 a0 80 03     cmp  %g2, %g3  100112d34:    96 02 80 0f     add  %o2, %o7, %o3  100112d38:
99 2a e0 02     sll  %o3, 2, %o4  100112d3c:    92 23 00 0b     sub  %o4, %o3, %o1  100112d40:    96 0e 20 03     and
%i0,3, %o3  100112d44:    83 2a 60 03     sll  %o1, 3, %g1  100112d48:    90 03 c0 01     add  %o7, %g1, %o0
100112d4c:   8a 21 00 08     sub  %g4, %o0, %g5  100112d50:    89 29 60 02     sll  %g5, 2, %g4  100112d54:    ae 01 00
05    add  %g4, %g5, %l7  100112d58:    af 2d e0 02     sll  %l7, 2, %l7  100112d5c:    b1 32 e0 00     srl  %o3, 0,
%i0 100112d60:    94 05 e0 04     add  %l7, 4, %o2  100112d64:    9b 2e 20 02     sll  %i0, 2, %o5  100112d68:    98 02
800d     add  %o2, %o5, %o4  100112d6c:    89 2a e0 03     sll  %o3, 3, %g4  100112d70:    c8 23 a8 bb     st  %g4, [
%sp+ 0x8bb ]  100112d74:    b1 3b 20 00     sra  %o4, 0, %i0  100112d78:    02 40 00 0a     be,pn   %icc, 0x100112da0
100112d7c:   af 3d e0 00     sra  %l7, 0, %l7  100112d80:    90 06 bf a8     add  %i2, -88, %o0  100112d84:    93 3d a0
00    sra  %l6, 0, %o1  100112d88:    97 34 a0 00     srl  %l2, 0, %o3  100112d8c:    7f ff f6 65     call  0x100110720
;SimpleLruReadPage  100112d90:    94 10 20 01     mov  1, %o2  100112d94:    ab 3a 20 00     sra  %o0, 0, %l5
100112d98:   c6 03 a8 bf     ld  [ %sp + 0x8bf ], %g3  100112d9c:    a9 2d 70 03     sllx  %l5, 3, %l4  100112da0:
d25e bf a8     ldx  [ %i2 + -88 ], %o1  100112da4:    89 3e 20 00     sra  %i0, 0, %g4  100112da8:    98 10 20 ff
mov 0xff, %o4  100112dac:    f0 06 60 00     ld  [ %i1 ], %i0  100112db0:    a6 04 e0 01     inc  %l3  100112db4:    ca
5a60 08     ldx  [ %o1 + 8 ], %g5  100112db8:    c1 6e 61 00     prefetch  [ %i1 + 0x100 ], #n_reads  100112dbc:    c4
5940 14     ldx  [ %g5 + %l4 ], %g2  100112dc0:    f0 20 80 04     st  %i0, [ %g2 + %g4 ]  100112dc4:    de 5e bf a8
ldx  [ %i2 + -88 ], %o7  100112dc8:    ca 06 60 04     ld  [ %i1 + 4 ], %g5  100112dcc:    b2 06 60 08     add  %i1, 8,
%i1 100112dd0:    d0 5b e0 08     ldx  [ %o7 + 8 ], %o0  100112dd4:    de 03 a8 bb     ld  [ %sp + 0x8bb ], %o7
100112dd8:   da 5a 00 14     ldx  [ %o0 + %l4 ], %o5  100112ddc:    d2 03 40 17     ld  [ %o5 + %l7 ], %o1  100112de0:
 97 2b 00 0f     sll  %o4, %o7, %o3  100112de4:    94 3a e0 00     xnor  %o3, 0, %o2  100112de8:    b0 0a 40 0a     and
%o1, %o2, %i0  100112dec:    89 29 40 0f     sll  %g5, %o7, %g4  100112df0:    82 16 00 04     or  %i0, %g4, %g1
100112df4:   c2 23 40 17     st  %g1, [ %o5 + %l7 ]  100112df8:    de 5e bf a8     ldx  [ %i2 + -88 ], %o7  100112dfc:
 d0 5b e0 18     ldx  [ %o7 + 0x18 ], %o0  100112e00:    f8 2a 00 15     stb  %i4, [ %o0 + %l5 ]  100112e04:    d8 03
a8b3     ld  [ %sp + 0x8b3 ], %o4  100112e08:    ee 03 a8 af     ld  [ %sp + 0x8af ], %l7  100112e0c:    9a 05 e0 01
add  %l7, 1, %o5  100112e10:    da 23 a8 af     st  %o5, [ %sp + 0x8af ]  100112e14:    94 03 20 01     add  %o4, 1,
%o2 100112e18:    d4 23 a8 b3     st  %o2, [ %sp + 0x8b3 ]  100112e1c:    ae 02 80 10     add  %o2, %l0, %l7
100112e20:   9b 35 e0 1f     srl  %l7, 0x1f, %o5  100112e24:    98 1b 60 01     xor  %o5, 1, %o4  100112e28:    96 20
000c     neg  %o4, %o3  100112e2c:    92 0a c0 1d     and  %o3, %i5, %o1  100112e30:    8a 02 60 01     add  %o1, 1,
%g5 100112e34:    88 1c c0 05     xor  %l3, %g5, %g4  100112e38:    b1 31 20 00     srl  %g4, 0, %i0  100112e3c:    82
2000 18     neg  %i0, %g1  100112e40:    d6 03 a8 b7     ld  [ %sp + 0x8b7 ], %o3  100112e44:    85 30 70 3f     srlx
%g1,0x3f, %g2  100112e48:    9e 18 a0 01     xor  %g2, 1, %o7  100112e4c:    ae 20 00 0f     neg  %o7, %l7  100112e50:
 80 a2 80 0b     cmp  %o2, %o3  100112e54:    90 0d e0 01     and  %l7, 1, %o0  100112e58:    9a 0d c0 11     and  %l7,
%l1,%o5  100112e5c:    ac 05 80 08     add  %l6, %o0, %l6  100112e60:    a6 24 c0 0d     sub  %l3, %o5, %l3  100112e64:
  24 4f ff a2     ble,a   %icc, 0x100112cec  100112e68:    f0 03 a8 af     ld  [ %sp + 0x8af ], %i0  100112e6c:    b0
1020 0f     mov  0xf, %i0  100112e70:    40 08 1e aa     call  0x10031a918 ; LWLockRelease  100112e74:    81 e8 00 00
 restore 
 



Re: 9.3.9 and pg_multixact corruption

From
Christoph Berg
Date:
Re: Andreas Seltenreich 2015-09-13 <87si6i1875.fsf@credativ.de>
> I managed disassemble RecordNewMultiXact from the core dump using a
> cross-binutils, and it reveals that the compiler[1] appears to have
> indeed generated a signed division here.  I'm attaching a piece of C
> code that does the same computation as the assembly (I think), as well
> as the disassembly itself.
> 
> Footnotes: 
> [1]  Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit

Hi,

a short update here: the customer updated the compiler to a newer
version, is now compiling using -O2 instead of -O3, and the code
generated now looks sane, so this turned out to be a compiler issue.
(Though it's unclear if the upgrade fixed it, or the different -O
level.)

Thanks to all who provided feedback, it was very valuable in actually
tracking down the root of the issue.

Christoph



Re: 9.3.9 and pg_multixact corruption

From
Jim Nasby
Date:
On 9/20/15 9:23 AM, Christoph Berg wrote:
> a short update here: the customer updated the compiler to a newer
> version, is now compiling using -O2 instead of -O3, and the code
> generated now looks sane, so this turned out to be a compiler issue.
> (Though it's unclear if the upgrade fixed it, or the different -O
> level.)

Do we officially not support anything > -O2? If so it'd be nice if 
configure threw at least a warning (if not an error that you had to 
explicitly over-ride).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: 9.3.9 and pg_multixact corruption

From
Alvaro Herrera
Date:
Jim Nasby wrote:
> On 9/20/15 9:23 AM, Christoph Berg wrote:
> >a short update here: the customer updated the compiler to a newer
> >version, is now compiling using -O2 instead of -O3, and the code
> >generated now looks sane, so this turned out to be a compiler issue.
> >(Though it's unclear if the upgrade fixed it, or the different -O
> >level.)
> 
> Do we officially not support anything > -O2? If so it'd be nice if configure
> threw at least a warning (if not an error that you had to explicitly
> over-ride).

Keep in mind this is Sun OS C -- not one of the most popular compilers
in the world.  I don't know what you suggest: have a test program that
configure runs and detects whether the compiler does the wrong thing?
It doesn't seem a sane idea to maintain test cases for all known
compiler bugs ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 9.3.9 and pg_multixact corruption

From
Andreas Seltenreich
Date:
[ adding Bjorn Munch to Cc ]

Jim Nasby writes:
> On 9/20/15 9:23 AM, Christoph Berg wrote:
>> a short update here: the customer updated the compiler to a newer
>> version, is now compiling using -O2 instead of -O3, and the code
>> generated now looks sane, so this turned out to be a compiler issue.
>> (Though it's unclear if the upgrade fixed it, or the different -O
>> level.)
>
> Do we officially not support anything > -O2? If so it'd be nice if
> configure threw at least a warning (if not an error that you had to
> explicitly over-ride).

At least the solaris binaries distributed via postgresql.org[1] have
been compiled with -xO3 according to pg_config.  And their code for
multixact.c looks inconspicuous.  To recap the data points:

| compiler                                          | flags | multixact.o |
|---------------------------------------------------+-------+-------------|
| Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3  | bad         |
| Sun C 5.13 SunOS_Sparc 2014/10/20                 | -xO2  | good        |
| Sun C 5.8 Patch 121015-04 2007/01/10              | -xO3  | good        |

regards,
Andreas

Footnotes: 
[1]  http://www.postgresql.org/download/solaris/



Re: 9.3.9 and pg_multixact corruption

From
Andreas Seltenreich
Date:
Alvaro Herrera writes:

> Jim Nasby wrote:
>> Do we officially not support anything > -O2? If so it'd be nice if configure
>> threw at least a warning (if not an error that you had to explicitly
>> over-ride).
>
> Keep in mind this is Sun OS C -- not one of the most popular compilers
> in the world.  I don't know what you suggest: have a test program that
> configure runs and detects whether the compiler does the wrong thing?
> It doesn't seem a sane idea to maintain test cases for all known
> compiler bugs ...

I think the intention was to make configure complain if there's a -O > 2
in CFLAGS.

OTOH, a unit test for multixact.c that exercises the code including
wraparounds sounds like a desirable thing regardless of the fact that it
could have caught this miscompilation earlier than 6 months into
production.

regards,
Andreas



Re: 9.3.9 and pg_multixact corruption

From
Bjorn Munch
Date:
On 25/09 09.37, Andreas Seltenreich wrote:
> [ adding Bjorn Munch to Cc ]

Oh. I am on the -hackers list but usually just scan for any subject
mentioning Solaris and this one did not. :-)

> Jim Nasby writes:
> > On 9/20/15 9:23 AM, Christoph Berg wrote:
> >> a short update here: the customer updated the compiler to a newer
> >> version, is now compiling using -O2 instead of -O3, and the code
> >> generated now looks sane, so this turned out to be a compiler issue.
> >> (Though it's unclear if the upgrade fixed it, or the different -O
> >> level.)
> >
> > Do we officially not support anything > -O2? If so it'd be nice if
> > configure threw at least a warning (if not an error that you had to
> > explicitly over-ride).
> 
> At least the solaris binaries distributed via postgresql.org[1] have
> been compiled with -xO3 according to pg_config.  And their code for
> multixact.c looks inconspicuous.  To recap the data points:
> 
> | compiler                                          | flags | multixact.o |
> |---------------------------------------------------+-------+-------------|
> | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3  | bad         |
> | Sun C 5.13 SunOS_Sparc 2014/10/20                 | -xO2  | good        |
> | Sun C 5.8 Patch 121015-04 2007/01/10              | -xO3  | good        |

All the binaries I have compiled for distribution via postgresql.org
have been built with Sun Studio 11, aka Sun C 5.8 as listed on the
bottom here. Except the 9.5 Alpha where I finally had to upgrade to
version 12.1 (aka Sun C 5.10).

I am also pretty sure they have always been compiled with -O3. At
least that's what the build script is set to and I don't think that
has even been changed.

- Bjorn Munch



Re: 9.3.9 and pg_multixact corruption

From
Robert Haas
Date:
On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich
<andreas.seltenreich@credativ.de> wrote:
> I think the intention was to make configure complain if there's a -O > 2
> in CFLAGS.

-1 on that idea.  I really don't think that we should categorically
decide we don't support higher optimization levels.  If the compiler
has a bug, then the compiler manufacturer should fix it, and it's not
our fault.  If the compiler doesn't have a bug and our stuff is
blowing up, then we have a bug and should fix it.  I suppose there
could be some grey area but hopefully not too much.

> OTOH, a unit test for multixact.c that exercises the code including
> wraparounds sounds like a desirable thing regardless of the fact that it
> could have caught this miscompilation earlier than 6 months into
> production.

Definitely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.3.9 and pg_multixact corruption

From
Josh Berkus
Date:
On 09/28/2015 08:10 AM, Robert Haas wrote:
> -1 on that idea.  I really don't think that we should categorically
> decide we don't support higher optimization levels.  If the compiler
> has a bug, then the compiler manufacturer should fix it, and it's not
> our fault.  If the compiler doesn't have a bug and our stuff is
> blowing up, then we have a bug and should fix it.  I suppose there
> could be some grey area but hopefully not too much.

Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for
most programs because it could change behavior.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: 9.3.9 and pg_multixact corruption

From
Robert Haas
Date:
On Mon, Sep 28, 2015 at 2:34 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/28/2015 08:10 AM, Robert Haas wrote:
>> -1 on that idea.  I really don't think that we should categorically
>> decide we don't support higher optimization levels.  If the compiler
>> has a bug, then the compiler manufacturer should fix it, and it's not
>> our fault.  If the compiler doesn't have a bug and our stuff is
>> blowing up, then we have a bug and should fix it.  I suppose there
>> could be some grey area but hopefully not too much.
>
> Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for
> most programs because it could change behavior.

I'm attempting to decipher this acronym.  Problem is located between
chair and keyboard?  But I don't see how that applies here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 9.3.9 and pg_multixact corruption

From
Noah Misch
Date:
On Mon, Sep 28, 2015 at 11:10:52AM -0400, Robert Haas wrote:
> On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich <andreas.seltenreich@credativ.de> wrote:
> > OTOH, a unit test for multixact.c that exercises the code including
> > wraparounds sounds like a desirable thing regardless of the fact that it
> > could have caught this miscompilation earlier than 6 months into
> > production.
> 
> Definitely.

+1.  Catching compiler bugs is an important role of "make check".  That suite
has a notable gap if it passes despite a compiler bug ruining the binary.