Thread: 9.3.9 and pg_multixact corruption
A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 instance. The database crashed with the following log messages: 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of transaction 1068235595 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file "pg_multixact/members/FFFF5FC4": No such file or directory. 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer where oid = $1 and tanum = $2 When they called us later, it turned out that the crash happened during a base backup, leaving a backup_label behind which prevented the database coming up again with a invalid checkpoint location. However, removing the backup_label still didn't let the database through recovery, it failed again with the former error, this time during recovery: 2015-09-08 11:40:04 CEST [27047] LOG: database system was interrupted while in recovery at 2015-09-08 11:19:44 CEST 2015-09-08 11:40:04 CEST [27047] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. 2015-09-08 11:40:04 CEST [27047] LOG: database system was not properly shut down; automatic recovery in progress 2015-09-08 11:40:05 CEST [27047] LOG: redo starts at 1A52/2313FEF8 2015-09-08 11:40:47 CEST [27082] FATAL: the database system is starting up 2015-09-08 11:40:59 CEST [27047] FATAL: could not access status of transaction 1068235595 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument. 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) 2015-09-08 11:40:59 CEST [27045] LOG: startup process (PID 27047) exited with exit code 1 2015-09-08 11:40:59 CEST [27045] LOG: aborting startup due to startup process failure Some side notes: An additional recovery from a base backup and archive recovery yield to the same error, as soon as the affected tuple was touched with a DELETE. The affected table was fully dumpable via pg_dump, though. We also have a core dump, but no direct access to the machine. If there's more information required (and i believe it is), let me know where to dig deeper. I also would like to request a backtrace from the existing core dump, but in the absence of a sparc64 machine here we need to ask the customer to get one. -- Thanks Bernd
Bernd Helmle wrote: > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > instance. > > 2015-09-08 11:40:59 CEST [27047] FATAL: could not access status of > transaction 1068235595 > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) I think the math to compute segment number and byte offset of the member might be bogus here. The file names in pg_multixact/members is supposed to go to 14078 (hex) in a 8kB-BLCKSZ build, and of course it goes a bit higher in builds with smaller page sizes, but nowhere as high as FFFF5FC4. And the offset is way too close to 2^32 (exactly 16384 less than 2^32, to be precise.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Bernd Helmle wrote: > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) I just noticed that this offset number 2147483648 is exactly 2^31. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Bernd Helmle wrote: > A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11 > instance. > > The database crashed with the following log messages: > > 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of > transaction 1068235595 > 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file > "pg_multixact/members/FFFF5FC4": No such file or directory. > 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer > where oid = $1 and tanum = $2 I wonder if these bogus page and offset numbers are just SlruReportIOError being confused because pg_multixact/members is so weird (I don't think it should be the case, since this stuff is using page numbers only, not anything related to how each page is layed out). Anyway, can you please request pg_controldata to be run on the failed cluster and paste it here? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
But SlruReportIOError uses the same macro to build the filename as SlruReadPhysicalPage and other functions, namely SlruFileName which uses sprintf with %04X (unsigned integer uppercase hex) and gives it segno (which is always an int), so I don't think the problem is in error reporting only.
Assuming default block size, to get FFFF5FC4 from SlruFileName you need segno == -41020.
We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get segno == -41020 you need pageno between -1312640 and -1312609 (whose bit patterns reinterpreted as unsigned are 4293654656 and 4293654687).
-- Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
>
> The database crashed with the following log messages:
>
> 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of
> transaction 1068235595
> 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file
> "pg_multixact/members/FFFF5FC4": No such file or directory.
> 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer
> where oid = $1 and tanum = $2
I wonder if these bogus page and offset numbers are just
SlruReportIOError being confused because pg_multixact/members is so
weird (I don't think it should be the case, since this stuff is using
page numbers only, not anything related to how each page is layed out).
But SlruReportIOError uses the same macro to build the filename as SlruReadPhysicalPage and other functions, namely SlruFileName which uses sprintf with %04X (unsigned integer uppercase hex) and gives it segno (which is always an int), so I don't think the problem is in error reporting only.
Assuming default block size, to get FFFF5FC4 from SlruFileName you need segno == -41020.
We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get segno == -41020 you need pageno between -1312640 and -1312609 (whose bit patterns reinterpreted as unsigned are 4293654656 and 4293654687).
In various places we have int pageno = offset / (uint32) 1636, expanded from this macro (which calls the offset an xid):
#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
I don't really see how any uint32 value could produce such a pageno via that macro. Even if called in an environment where (xid) is accidentally an int, the int / unsigned expression would convert it to unsigned first (unless (xid) is a bigger type like int64_t: by the rules of int promotion you'd get signed division in that case, hmm...). But it's always called with a MultiXactOffset AKA uint32 variable.
So via that route, there is no MultiXactOffset value that can't be mapped to a segment in the range "0000", "14078". Famously, it wraps after that.
Maybe the negative pageno came from somewhere else. Where? Inside SLRU code we can see pageno = shared->page_number[slotno]... maybe the SLRU slots got corrupted somehow?
So via that route, there is no MultiXactOffset value that can't be mapped to a segment in the range "0000", "14078". Famously, it wraps after that.
Maybe the negative pageno came from somewhere else. Where? Inside SLRU code we can see pageno = shared->page_number[slotno]... maybe the SLRU slots got corrupted somehow?
Thomas Munro
http://www.enterprisedb.com
http://www.enterprisedb.com
On Fri, Sep 11, 2015 at 11:51 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
Oops, I meant to attach the proviso "Assuming default block size" to the assumption further down that MULTIXACT_MEMBERS_PER_PAGE == 1636.
On Fri, Sep 11, 2015 at 10:45 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:Bernd Helmle wrote:
> A customer had a severe issue with a PostgreSQL 9.3.9/sparc64/Solaris 11
> instance.
>
> The database crashed with the following log messages:
>
> 2015-09-08 00:49:16 CEST [2912] PANIC: could not access status of
> transaction 1068235595
> 2015-09-08 00:49:16 CEST [2912] DETAIL: Could not open file
> "pg_multixact/members/FFFF5FC4": No such file or directory.
> 2015-09-08 00:49:16 CEST [2912] STATEMENT: delete from StockTransfer
> where oid = $1 and tanum = $2
I wonder if these bogus page and offset numbers are just
SlruReportIOError being confused because pg_multixact/members is so
weird (I don't think it should be the case, since this stuff is using
page numbers only, not anything related to how each page is layed out).
But SlruReportIOError uses the same macro to build the filename as SlruReadPhysicalPage and other functions, namely SlruFileName which uses sprintf with %04X (unsigned integer uppercase hex) and gives it segno (which is always an int), so I don't think the problem is in error reporting only.
Assuming default block size, to get FFFF5FC4 from SlruFileName you need segno == -41020.
Oops, I meant to attach the proviso "Assuming default block size" to the assumption further down that MULTIXACT_MEMBERS_PER_PAGE == 1636.
We have int segno = pageno / 32 (that's SLRU_PAGES_PER_SEGMENT), so to get segno == -41020 you need pageno between -1312640 and -1312609 (whose bit patterns reinterpreted as unsigned are 4293654656 and 4293654687).In various places we have int pageno = offset / (uint32) 1636, expanded from this macro (which calls the offset an xid):#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
I don't really see how any uint32 value could produce such a pageno via that macro. Even if called in an environment where (xid) is accidentally an int, the int / unsigned expression would convert it to unsigned first (unless (xid) is a bigger type like int64_t: by the rules of int promotion you'd get signed division in that case, hmm...). But it's always called with a MultiXactOffset AKA uint32 variable.
So via that route, there is no MultiXactOffset value that can't be mapped to a segment in the range "0000", "14078". Famously, it wraps after that.
Maybe the negative pageno came from somewhere else. Where? Inside SLRU code we can see pageno = shared->page_number[slotno]... maybe the SLRU slots got corrupted somehow?
Thomas Munro
http://www.enterprisedb.com
http://www.enterprisedb.com
...and here it is ;) --On 10. September 2015 19:45:46 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Anyway, can you please request pg_controldata to be run on the failed > cluster and paste it here? pg_control version number: 937 Catalog version number: 201306121 Database system identifier: 5995776571405068134 Database cluster state: in archive recovery pg_control last modified: Di 08 Sep 2015 14:58:36 CEST Latest checkpoint location: 1A52/3CFAF758 Prior checkpoint location: 1A52/3CFAF758 Latest checkpoint's REDO location: 1A52/2313FEF8 Latest checkpoint's REDO WAL file: 0000000100001A5200000023 Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0/2896610102 Latest checkpoint's NextOID: 261892 Latest checkpoint's NextMultiXactId: 1068223816 Latest checkpoint's NextMultiOffset: 2147460090 Latest checkpoint's oldestXID: 2693040605 Latest checkpoint's oldestXID's DB: 16400 Latest checkpoint's oldestActiveXID: 0 Latest checkpoint's oldestMultiXid: 1012219584 Latest checkpoint's oldestMulti's DB: 16400 Time of latest checkpoint: Di 08 Sep 2015 00:47:01 CEST Fake LSN counter for unlogged rels: 0/1 Minimum recovery ending location: 1A52/2313FEF8 Min recovery ending loc's timeline: 1 Backup start location: 1A52/2313FEF8 Backup end location: 0/0 End-of-backup record required: no Current wal_level setting: archive Current max_connections setting: 500 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value Data page checksum version: 0 -- Bernd
Re: Bernd Helmle 2015-09-10 <7E3C7F8D210AC9A423E96F3A@eje.local> > 2015-09-08 11:40:59 CEST [27047] DETAIL: Could not seek in file > "pg_multixact/members/FFFF5FC4" to offset 4294950912: Invalid argument. > 2015-09-08 11:40:59 CEST [27047] CONTEXT: xlog redo create mxid 1068235595 > offset 2147483648 nmembers 2: 2896635220 (upd) 2896635510 (keysh) > 2015-09-08 11:40:59 CEST [27045] LOG: startup process (PID 27047) exited > with exit code 1 > 2015-09-08 11:40:59 CEST [27045] LOG: aborting startup due to startup > process failure > > Some side notes: > > An additional recovery from a base backup and archive recovery yield to the > same error, as soon as the affected tuple was touched with a DELETE. The > affected table was fully dumpable via pg_dump, though. A few more words here: the archive recovery was a pitr to 00:45, so well before the problem, and the cluster was initially working well, but crashed shortly after with the same mxid 1068235595 message. The crash was triggered from a delete on a different table (which was related schema-wise, but iirc neither of these tables has any FKs). We then rewound the system to a zfs snapshot taken when the archive recovery had finished (db shut down cleanly), and put it up again, when it again crashed with mxid 1068235595, this time on a third table. The original crash and the first post-recovery crash happened a few minutes after pg_start_backup(), though the next crash was without that. (While the archive recovery was running, I had pg_resetxlog the original cluster. It was possible to isolate the ctid of an affected tuple, but it wasn't possible to DELETE it, yielding an error message similar to the above, but the database would continue. I then zeroed the bad block using dd (zero_damaged_pages didn't help), only to find that at least one more tuple in that table was affected (with a different mxid).) Christoph
On 2015-09-11 14:25:39 +0200, Christoph Berg wrote: > A few more words here: the archive recovery was a pitr to 00:45, so > well before the problem, and the cluster was initially working well, > but crashed shortly after with the same mxid 1068235595 message. The > crash was triggered from a delete on a different table (which was > related schema-wise, but iirc neither of these tables has any FKs). > > We then rewound the system to a zfs snapshot taken when the archive > recovery had finished (db shut down cleanly), and put it up again, > when it again crashed with mxid 1068235595, this time on a third > table. > > The original crash and the first post-recovery crash happened a few > minutes after pg_start_backup(), though the next crash was without > that. Do you still have access to that data? That'd make investigation of both the issue and fixes/workaround far easier. Greetings, Andres Freund
Thomas Munro writes: > In various places we have int pageno = offset / (uint32) 1636, expanded > from this macro (which calls the offset an xid): It appears to depend on the context it is expanded in, as some of the code must have gotten the segment number right: ,----[ ls -sh pg_multixact/members/ ] | 256K 97E0 | [...] | 256K A03B | 24K A03C = -FFFF5FC4 | 0 FFFF5FC4 `---- > I don't really see how any uint32 value could produce such a pageno via > that macro. Even if called in an environment where (xid) is accidentally > an int, the int / unsigned expression would convert it to unsigned first > (unless (xid) is a bigger type like int64_t: by the rules of int promotion > you'd get signed division in that case, hmm...). But it's always called > with a MultiXactOffset AKA uint32 variable. I managed disassemble RecordNewMultiXact from the core dump using a cross-binutils, and it reveals that the compiler[1] appears to have indeed generated a signed division here. I'm attaching a piece of C code that does the same computation as the assembly (I think), as well as the disassembly itself. regards, Andreas Footnotes: [1] Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit #include <stdint.h> #include <stdio.h> #include <stdlib.h> uint32_t offset2page(uint32_t offset) { uint64_t l0, i3, i5, i4, l6_1, l6_2, o2, o4, o1, o3; l0 = offset; i3 = 0x5fc3e800ULL; i5 = i3 ^ -375; i4 = (int32_t)l0; l6_1 = i4 * i5; o2 = l6_1 >> 32; o4 = l0 + o2; o1 = ((int32_t)o4) >> 10; o3 = ((int32_t)l0) >> 31; l6_2 = o1 - o3; returnl6_2; } int main(int argc, char *argv[]) { uint32_t page = offset2page(atol(argv[1])); printf("page: %d\n", page); printf("segment: %04X\n", (int32_t)page/32); return0; } RecordNewMultiXact: 100112be8: 07 00 04 01 sethi %hi(0x100400), %g3 100112bec: 9d e3 bf 30 save %sp, -208,%sp 100112bf0: 82 10 e2 bd or %g3, 0x2bd, %g1 100112bf4: 90 10 20 0e mov 0xe, %o0 100112bf8: a536 20 00 srl %i0, 0, %l2 100112bfc: 9f 28 70 0c sllx %g1, 0xc, %o7 100112c00: a2 0c a7 ff and %l2,0x7ff, %l1 100112c04: b9 3e a0 00 sra %i2, 0, %i4 100112c08: b4 03 ed d8 add %o7, 0xdd8, %i2 100112c0c: b1 36 60 00 srl %i1, 0, %i0 100112c10: f0 23 a8 af st %i0, [ %sp + 0x8af ] 100112c14: b2 1000 1b mov %i3, %i1 100112c18: 40 08 1d 66 call 0x10031a1b0 ; LWLockAcquire 100112c1c: 92 10 20 00 clr %o1 100112c20: 93 34 a0 0b srl %l2, 0xb, %o1 100112c24: 90 10 00 1a mov %i2, %o0 100112c28: 9734 a0 00 srl %l2, 0, %o3 100112c2c: 7f ff f6 bd call 0x100110720 ; SimpleLruReadPage 100112c30: 94 1020 01 mov 1, %o2 100112c34: fa 5e a0 00 ldx [ %i2 ], %i5 100112c38: ab 3a 20 00 sra %o0, 0, %l5 100112c3c: 96 10 20 01 mov 1, %o3 100112c40: a9 2d 70 03 sllx %l5, 3, %l4 100112c44: e0 5f 60 08 ldx [ %i5 + 8 ], %l0 100112c48: a7 3c 60 00 sra %l1, 0, %l3 100112c4c: ad 2c f0 02 sllx %l3, 2,%l6 100112c50: da 5c 00 14 ldx [ %l0 + %l4 ], %o5 100112c54: f0 23 40 16 st %i0, [ %o5 + %l6 ] 100112c58: d8 5e a0 00 ldx [ %i2 ], %o4 100112c5c: d4 5b 20 18 ldx [ %o4 + 0x18 ], %o2 100112c60: d62a 80 15 stb %o3, [ %o2 + %l5 ] 100112c64: 40 08 1f 2d call 0x10031a918 ; LWLockRelease 100112c68: 9010 20 0e mov 0xe, %o0 100112c6c: 90 10 20 0f mov 0xf, %o0 100112c70: 40 08 1d 50 call 0x10031a1b0; LWLockAcquire 100112c74: 92 10 20 00 clr %o1 100112c78: 80 a7 20 00 cmp %i4, 0 100112c7c: 04 40 00 7c ble,pn %icc, 0x100112e6c 100112c80: 90 07 3f ff add %i4, -1, %o0 100112c84: e003 a8 af ld [ %sp + 0x8af ], %l0 100112c88: 37 17 f0 fa sethi %hi(0x5fc3e800), %i3 100112c8c: 86 10 3fff mov -1, %g3 100112c90: ba 1e fe 89 xor %i3, -375, %i5 100112c94: d0 23 a8 b7 st %o0, [ %sp +0x8b7 ] 100112c98: a2 10 26 64 mov 0x664, %l1 100112c9c: c0 23 a8 b3 clr [ %sp + 0x8b3 ] 100112ca0: 37 10 1e 0b sethi %hi(0x40782c00), %i3 100112ca4: b9 3c 20 00 sra %l0, 0, %i4 100112ca8: ac 4f 00 1d mulx %i4, %i5, %l6 100112cac: 95 35 b0 20 srlx %l6, 0x20, %o2 100112cb0: ba 10 26 63 mov 0x663,%i5 100112cb4: 98 04 00 0a add %l0, %o2, %o4 100112cb8: b8 10 20 01 mov 1, %i4 100112cbc: 933b 20 0a sra %o4, 0xa, %o1 100112cc0: 97 3c 20 1f sra %l0, 0x1f, %o3 100112cc4: ac 22 40 0b sub %o1, %o3, %l6 100112cc8: 89 2d a0 04 sll %l6, 4, %g4 100112ccc: 82 01 00 16 add %g4, %l6, %g1 100112cd0: b1 28 60 02 sll %g1, 2, %i0 100112cd4: 84 26 00 01 sub %i0, %g1, %g2 100112cd8: f0 03 a8 af ld [ %sp + 0x8af ], %i0 100112cdc: 9f 28 a0 03 sll %g2, 3, %o7 100112ce0: 90 05 80 0f add %l6, %o7,%o0 100112ce4: a7 2a 20 02 sll %o0, 2, %l3 100112ce8: a6 24 00 13 sub %l0, %l3, %l3 100112cec: 98 06 e1 11 add %i3, 0x111, %o4 100112cf0: 0b 14 07 82 sethi %hi(0x501e0800), %g5 100112cf4: 89 36 20 02 srl %i0, 2, %g4 100112cf8: ae 01 63 45 add %g5, 0x345, %l7 100112cfc: 97 33 20 00 srl %o4, 0, %o3 100112d00: 82 49 00 0b mulx %g4, %o3, %g1 100112d04: 9a 49 00 17 mulx %g4, %l7, %o5 100112d08: 9130 70 20 srlx %g1, 0x20, %o0 100112d0c: 9e 21 00 08 sub %g4, %o0, %o7 100112d10: 8b 33 e0 01 srl %o7, 1, %g5 100112d14: ae 02 00 05 add %o0, %g5, %l7 100112d18: 95 33 70 20 srlx %o5, 0x20, %o2 100112d1c: 9b 35 e0 04 srl %l7, 4, %o5 100112d20: 85 32 a0 07 srl %o2, 7, %g2 100112d24: 94 0b 7f f0 and %o5, -16, %o2 100112d28: c4 23 a8 bf st %g2, [ %sp + 0x8bf ] 100112d2c: 9f 35 e0 08 srl %l7,8, %o7 100112d30: 80 a0 80 03 cmp %g2, %g3 100112d34: 96 02 80 0f add %o2, %o7, %o3 100112d38: 99 2a e0 02 sll %o3, 2, %o4 100112d3c: 92 23 00 0b sub %o4, %o3, %o1 100112d40: 96 0e 20 03 and %i0,3, %o3 100112d44: 83 2a 60 03 sll %o1, 3, %g1 100112d48: 90 03 c0 01 add %o7, %g1, %o0 100112d4c: 8a 21 00 08 sub %g4, %o0, %g5 100112d50: 89 29 60 02 sll %g5, 2, %g4 100112d54: ae 01 00 05 add %g4, %g5, %l7 100112d58: af 2d e0 02 sll %l7, 2, %l7 100112d5c: b1 32 e0 00 srl %o3, 0, %i0 100112d60: 94 05 e0 04 add %l7, 4, %o2 100112d64: 9b 2e 20 02 sll %i0, 2, %o5 100112d68: 98 02 800d add %o2, %o5, %o4 100112d6c: 89 2a e0 03 sll %o3, 3, %g4 100112d70: c8 23 a8 bb st %g4, [ %sp+ 0x8bb ] 100112d74: b1 3b 20 00 sra %o4, 0, %i0 100112d78: 02 40 00 0a be,pn %icc, 0x100112da0 100112d7c: af 3d e0 00 sra %l7, 0, %l7 100112d80: 90 06 bf a8 add %i2, -88, %o0 100112d84: 93 3d a0 00 sra %l6, 0, %o1 100112d88: 97 34 a0 00 srl %l2, 0, %o3 100112d8c: 7f ff f6 65 call 0x100110720 ;SimpleLruReadPage 100112d90: 94 10 20 01 mov 1, %o2 100112d94: ab 3a 20 00 sra %o0, 0, %l5 100112d98: c6 03 a8 bf ld [ %sp + 0x8bf ], %g3 100112d9c: a9 2d 70 03 sllx %l5, 3, %l4 100112da0: d25e bf a8 ldx [ %i2 + -88 ], %o1 100112da4: 89 3e 20 00 sra %i0, 0, %g4 100112da8: 98 10 20 ff mov 0xff, %o4 100112dac: f0 06 60 00 ld [ %i1 ], %i0 100112db0: a6 04 e0 01 inc %l3 100112db4: ca 5a60 08 ldx [ %o1 + 8 ], %g5 100112db8: c1 6e 61 00 prefetch [ %i1 + 0x100 ], #n_reads 100112dbc: c4 5940 14 ldx [ %g5 + %l4 ], %g2 100112dc0: f0 20 80 04 st %i0, [ %g2 + %g4 ] 100112dc4: de 5e bf a8 ldx [ %i2 + -88 ], %o7 100112dc8: ca 06 60 04 ld [ %i1 + 4 ], %g5 100112dcc: b2 06 60 08 add %i1, 8, %i1 100112dd0: d0 5b e0 08 ldx [ %o7 + 8 ], %o0 100112dd4: de 03 a8 bb ld [ %sp + 0x8bb ], %o7 100112dd8: da 5a 00 14 ldx [ %o0 + %l4 ], %o5 100112ddc: d2 03 40 17 ld [ %o5 + %l7 ], %o1 100112de0: 97 2b 00 0f sll %o4, %o7, %o3 100112de4: 94 3a e0 00 xnor %o3, 0, %o2 100112de8: b0 0a 40 0a and %o1, %o2, %i0 100112dec: 89 29 40 0f sll %g5, %o7, %g4 100112df0: 82 16 00 04 or %i0, %g4, %g1 100112df4: c2 23 40 17 st %g1, [ %o5 + %l7 ] 100112df8: de 5e bf a8 ldx [ %i2 + -88 ], %o7 100112dfc: d0 5b e0 18 ldx [ %o7 + 0x18 ], %o0 100112e00: f8 2a 00 15 stb %i4, [ %o0 + %l5 ] 100112e04: d8 03 a8b3 ld [ %sp + 0x8b3 ], %o4 100112e08: ee 03 a8 af ld [ %sp + 0x8af ], %l7 100112e0c: 9a 05 e0 01 add %l7, 1, %o5 100112e10: da 23 a8 af st %o5, [ %sp + 0x8af ] 100112e14: 94 03 20 01 add %o4, 1, %o2 100112e18: d4 23 a8 b3 st %o2, [ %sp + 0x8b3 ] 100112e1c: ae 02 80 10 add %o2, %l0, %l7 100112e20: 9b 35 e0 1f srl %l7, 0x1f, %o5 100112e24: 98 1b 60 01 xor %o5, 1, %o4 100112e28: 96 20 000c neg %o4, %o3 100112e2c: 92 0a c0 1d and %o3, %i5, %o1 100112e30: 8a 02 60 01 add %o1, 1, %g5 100112e34: 88 1c c0 05 xor %l3, %g5, %g4 100112e38: b1 31 20 00 srl %g4, 0, %i0 100112e3c: 82 2000 18 neg %i0, %g1 100112e40: d6 03 a8 b7 ld [ %sp + 0x8b7 ], %o3 100112e44: 85 30 70 3f srlx %g1,0x3f, %g2 100112e48: 9e 18 a0 01 xor %g2, 1, %o7 100112e4c: ae 20 00 0f neg %o7, %l7 100112e50: 80 a2 80 0b cmp %o2, %o3 100112e54: 90 0d e0 01 and %l7, 1, %o0 100112e58: 9a 0d c0 11 and %l7, %l1,%o5 100112e5c: ac 05 80 08 add %l6, %o0, %l6 100112e60: a6 24 c0 0d sub %l3, %o5, %l3 100112e64: 24 4f ff a2 ble,a %icc, 0x100112cec 100112e68: f0 03 a8 af ld [ %sp + 0x8af ], %i0 100112e6c: b0 1020 0f mov 0xf, %i0 100112e70: 40 08 1e aa call 0x10031a918 ; LWLockRelease 100112e74: 81 e8 00 00 restore
Re: Andreas Seltenreich 2015-09-13 <87si6i1875.fsf@credativ.de> > I managed disassemble RecordNewMultiXact from the core dump using a > cross-binutils, and it reveals that the compiler[1] appears to have > indeed generated a signed division here. I'm attaching a piece of C > code that does the same computation as the assembly (I think), as well > as the disassembly itself. > > Footnotes: > [1] Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit Hi, a short update here: the customer updated the compiler to a newer version, is now compiling using -O2 instead of -O3, and the code generated now looks sane, so this turned out to be a compiler issue. (Though it's unclear if the upgrade fixed it, or the different -O level.) Thanks to all who provided feedback, it was very valuable in actually tracking down the root of the issue. Christoph
On 9/20/15 9:23 AM, Christoph Berg wrote: > a short update here: the customer updated the compiler to a newer > version, is now compiling using -O2 instead of -O3, and the code > generated now looks sane, so this turned out to be a compiler issue. > (Though it's unclear if the upgrade fixed it, or the different -O > level.) Do we officially not support anything > -O2? If so it'd be nice if configure threw at least a warning (if not an error that you had to explicitly over-ride). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby wrote: > On 9/20/15 9:23 AM, Christoph Berg wrote: > >a short update here: the customer updated the compiler to a newer > >version, is now compiling using -O2 instead of -O3, and the code > >generated now looks sane, so this turned out to be a compiler issue. > >(Though it's unclear if the upgrade fixed it, or the different -O > >level.) > > Do we officially not support anything > -O2? If so it'd be nice if configure > threw at least a warning (if not an error that you had to explicitly > over-ride). Keep in mind this is Sun OS C -- not one of the most popular compilers in the world. I don't know what you suggest: have a test program that configure runs and detects whether the compiler does the wrong thing? It doesn't seem a sane idea to maintain test cases for all known compiler bugs ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[ adding Bjorn Munch to Cc ] Jim Nasby writes: > On 9/20/15 9:23 AM, Christoph Berg wrote: >> a short update here: the customer updated the compiler to a newer >> version, is now compiling using -O2 instead of -O3, and the code >> generated now looks sane, so this turned out to be a compiler issue. >> (Though it's unclear if the upgrade fixed it, or the different -O >> level.) > > Do we officially not support anything > -O2? If so it'd be nice if > configure threw at least a warning (if not an error that you had to > explicitly over-ride). At least the solaris binaries distributed via postgresql.org[1] have been compiled with -xO3 according to pg_config. And their code for multixact.c looks inconspicuous. To recap the data points: | compiler | flags | multixact.o | |---------------------------------------------------+-------+-------------| | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3 | bad | | Sun C 5.13 SunOS_Sparc 2014/10/20 | -xO2 | good | | Sun C 5.8 Patch 121015-04 2007/01/10 | -xO3 | good | regards, Andreas Footnotes: [1] http://www.postgresql.org/download/solaris/
Alvaro Herrera writes: > Jim Nasby wrote: >> Do we officially not support anything > -O2? If so it'd be nice if configure >> threw at least a warning (if not an error that you had to explicitly >> over-ride). > > Keep in mind this is Sun OS C -- not one of the most popular compilers > in the world. I don't know what you suggest: have a test program that > configure runs and detects whether the compiler does the wrong thing? > It doesn't seem a sane idea to maintain test cases for all known > compiler bugs ... I think the intention was to make configure complain if there's a -O > 2 in CFLAGS. OTOH, a unit test for multixact.c that exercises the code including wraparounds sounds like a desirable thing regardless of the fact that it could have caught this miscompilation earlier than 6 months into production. regards, Andreas
On 25/09 09.37, Andreas Seltenreich wrote: > [ adding Bjorn Munch to Cc ] Oh. I am on the -hackers list but usually just scan for any subject mentioning Solaris and this one did not. :-) > Jim Nasby writes: > > On 9/20/15 9:23 AM, Christoph Berg wrote: > >> a short update here: the customer updated the compiler to a newer > >> version, is now compiling using -O2 instead of -O3, and the code > >> generated now looks sane, so this turned out to be a compiler issue. > >> (Though it's unclear if the upgrade fixed it, or the different -O > >> level.) > > > > Do we officially not support anything > -O2? If so it'd be nice if > > configure threw at least a warning (if not an error that you had to > > explicitly over-ride). > > At least the solaris binaries distributed via postgresql.org[1] have > been compiled with -xO3 according to pg_config. And their code for > multixact.c looks inconspicuous. To recap the data points: > > | compiler | flags | multixact.o | > |---------------------------------------------------+-------+-------------| > | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3 | bad | > | Sun C 5.13 SunOS_Sparc 2014/10/20 | -xO2 | good | > | Sun C 5.8 Patch 121015-04 2007/01/10 | -xO3 | good | All the binaries I have compiled for distribution via postgresql.org have been built with Sun Studio 11, aka Sun C 5.8 as listed on the bottom here. Except the 9.5 Alpha where I finally had to upgrade to version 12.1 (aka Sun C 5.10). I am also pretty sure they have always been compiled with -O3. At least that's what the build script is set to and I don't think that has even been changed. - Bjorn Munch
On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich <andreas.seltenreich@credativ.de> wrote: > I think the intention was to make configure complain if there's a -O > 2 > in CFLAGS. -1 on that idea. I really don't think that we should categorically decide we don't support higher optimization levels. If the compiler has a bug, then the compiler manufacturer should fix it, and it's not our fault. If the compiler doesn't have a bug and our stuff is blowing up, then we have a bug and should fix it. I suppose there could be some grey area but hopefully not too much. > OTOH, a unit test for multixact.c that exercises the code including > wraparounds sounds like a desirable thing regardless of the fact that it > could have caught this miscompilation earlier than 6 months into > production. Definitely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/28/2015 08:10 AM, Robert Haas wrote: > -1 on that idea. I really don't think that we should categorically > decide we don't support higher optimization levels. If the compiler > has a bug, then the compiler manufacturer should fix it, and it's not > our fault. If the compiler doesn't have a bug and our stuff is > blowing up, then we have a bug and should fix it. I suppose there > could be some grey area but hopefully not too much. Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for most programs because it could change behavior. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Sep 28, 2015 at 2:34 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 09/28/2015 08:10 AM, Robert Haas wrote: >> -1 on that idea. I really don't think that we should categorically >> decide we don't support higher optimization levels. If the compiler >> has a bug, then the compiler manufacturer should fix it, and it's not >> our fault. If the compiler doesn't have a bug and our stuff is >> blowing up, then we have a bug and should fix it. I suppose there >> could be some grey area but hopefully not too much. > > Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for > most programs because it could change behavior. I'm attempting to decipher this acronym. Problem is located between chair and keyboard? But I don't see how that applies here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 28, 2015 at 11:10:52AM -0400, Robert Haas wrote: > On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich <andreas.seltenreich@credativ.de> wrote: > > OTOH, a unit test for multixact.c that exercises the code including > > wraparounds sounds like a desirable thing regardless of the fact that it > > could have caught this miscompilation earlier than 6 months into > > production. > > Definitely. +1. Catching compiler bugs is an important role of "make check". That suite has a notable gap if it passes despite a compiler bug ruining the binary.