Thread: Bus error in pg_logical_slot_get_changes (9.4.7, sparc)
Hi, I've just uploaded 9.4.7 for Debian's wheezy-backports suite. Unfortunately it's now crashing twice in the contrib/test_decoding regression tests on sparc. I could reproduce the error on the porter box: LOG: server process (PID 22534) was terminated by signal 10: Bus error DETAIL: Failed process was running: SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1') WHERE data ~ 'UPDATE'; LOG: server process (PID 22558) was terminated by signal 10: Bus error DETAIL: Failed process was running: SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); Reading symbols from /home/myon/postgresql-9.4-9.4.7/build/src/backend/postgres...done. [New LWP 22558] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1". Core was generated by `postgres: myon regression [local] SELECT '. Program terminated with signal 10, Bus error. #0 ReorderBufferRestoreChange (data=0x5de0f7 "", txn=0x622660, rb=0x6225d8) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2327 2327 Size tuplelen = ((HeapTuple) data)->t_len; (gdb) l 2322 data += tuplelen; 2323 } 2324 2325 if (change->data.tp.newtuple) 2326 { 2327 Size tuplelen = ((HeapTuple) data)->t_len; 2328 2329 change->data.tp.newtuple = 2330 ReorderBufferGetTupleBuf(rb, tuplelen - offsetof(HeapTupleHeaderData, t_bits)); 2331 (gdb) bt full #0 ReorderBufferRestoreChange (data=0x5de0f7 "", txn=0x622660, rb=0x6225d8) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2327 tuplelen = <optimized out> ondisk = <optimized out> change = 0x11ea888 #1 ReorderBufferRestoreChanges (rb=0x6225d8, txn=0x622660, fd=0x79860910, segno=0x79860918) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2269 ondisk = <optimized out> restored = <optimized out> last_segno = 1 cleanup_iter = {cur = <optimized out>, next = <optimized out>, end = 0x6226b0} __func__ = "ReorderBufferRestoreChanges" #2 0x00261790 in ReorderBufferIterTXNNext (state=0x798608f0, rb=0x6225d8) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:981 change = 0x7985f858 entry = 0x79860900 off = 0 #3 ReorderBufferCommit (rb=0x6225d8, xid=<optimized out>, commit_lsn=27808752, end_lsn=<optimized out>, commit_time=513854161898478) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:1355 change = 0x7985f858 save_exception_stack = 0xffbc6538 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {-1279391982, -1279392038, 1277457654}, __mask_was_saved = 0, __saved_mask = {__val = {0, 4290536232, 2500668, 1296, 6195616, 4290536240, 3926116, 4290536336, 1, 5595136, 5598208, 0, 0, 1, 26206208, 0, 0, 0, 0, 0, 0, 1296, 1295, 1, 15213480, 0, 1, 1296, 0, 0, 1296, 0}}}} txn = 0x622660 snapshot_now = 0x5ddfa0 command_id = 1 iterstate = 0x798608f0 __func__ = "ReorderBufferCommit" #4 0x0025c9bc in DecodeXactOp (buf=0xffbc6460, ctx=0x6145c0) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/decode.c:226 xlrec = <optimized out> subxacts = <optimized out> invals = <optimized out> builder = <optimized out> reorder = <optimized out> info = <optimized out> r = 0xffbc6470 #5 LogicalDecodingProcessRecord (ctx=0x6145c0, record=0x6179a0) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/decode.c:111 buf = {origptr = 27808752, endptr = 27809568, record = {xl_tot_len = 812, xl_xid = 1296, xl_len = 780, xl_info = 0 '\000', xl_rmid = 1 '\001', xl_prev = 27808688, xl_crc = 1979174325}, record_data = 0x6179c0 ""} __func__ = "LogicalDecodingProcessRecord" #6 0x0025e69c in pg_logical_slot_get_changes_guts (fcinfo=0xffbc6694, confirm=<optimized out>, binary=<optimized out>) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/logicalfuncs.c:440 record = <optimized out> errm = 0x0 save_exception_stack = 0xffbc6b28 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {-1279393398, -1279393542, 1277657190}, __mask_was_saved = 0, __saved_mask = {__val = {4290536824, 1264536, 1314, 5446656, 5447680, 0, 5447680, 0, 0, 0, 0, 1, 25, 0, 0, 0, 0, 1, 3782, 1, 4290537052, 1, 0, 4294967295, 1, 0, 4290536944, 2309772, 0, 0, 0, 4290537540}}}} name = 0x0 upto_lsn = 0 upto_nchanges = 0 rsinfo = <optimized out> per_query_ctx = <optimized out> oldcontext = 0x5e8320 end_of_wal = 28052656 startptr = 0 ctx = 0x6145c0 old_resowner = 0x5e7df8 arr = <optimized out> ndim = <optimized out> options = 0x60dfd0 p = 0x60e598 __func__ = "pg_logical_slot_get_changes_guts" #7 0x001ab830 in ExecMakeTableFunctionResult (funcexpr=0x60ccd8, econtext=0x60c6a0, argContext=<optimized out>, expectedDesc=0x60d548, randomAccess=0 '\000') at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execQual.c:2196 result = 4290537288 tupstore = 0x0 tupdesc = 0x0 funcrettype = 2249 returnsTuple = <optimized out> returnsSet = 1 '\001' fcinfo = {flinfo = 0x60cce8, context = 0x0, resultinfo = 0xffbc6640, fncollation = 100, isnull = 0 '\000', nargs = 4, arg = {5960656, 0, 0, 5959720, 2, 5734184, 31808, 15213480, 4290537200, 2212260, 5732864, 5595136, 0, 0, 1072693248, 0, 5959856, 5598208, 0, 5959856, 5961824, 2, 700, 1884286976, 5732864, 5732032, 5733440, 0, 4294836227, 0, 0, 15213480, 184, 131328, 33554432, 0, 5638448, 0, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15213480, 0, 0, 0, 0, 0, 0, 0, 0, 4294836227, 0, 0, 3314144, 184, 131328, 33554432, 0, 5638448, 0, 25, 0, 0, 0, 0, 15213480, 0 <repeats 26 times>}, argnull = "\000\001\001\000\000\000\000\000\000\000\000\000\000\062\221\340\000\000\000\270\000\002\001\000\002\000\000\000\000\000\000\000\000V\t0\000\000\000\000\000\000\000\031", '\000'<repeats 55 times>} fcusage = {fs = 0x0, save_f_total_time = {tv_sec = 5891064, tv_usec = 1}, save_total = {tv_sec = 4039680, tv_usec = 1}, f_start = {tv_sec = 0, tv_usec = 1072693248}} rsinfo = {type = T_ReturnSetInfo, econtext = 0x60c6a0, expectedDesc = 0x60d548, allowedModes = 11, returnMode = SFRM_Materialize, isDone = ExprSingleResult, setResult = 0x60e068, setDesc = 0x60e768} tmptup = {t_len = 0, t_self = {ip_blkid = {bi_hi = 61, bi_lo = 47104}, ip_posid = 89}, t_tableOid = 0, t_data = 0x79820c1c} callerContext = 0x5e8298 direct_function_call = <optimized out> first_time = 1 '\001' __func__ = "ExecMakeTableFunctionResult" #8 0x001bdf90 in FunctionNext (node=0x60c618) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/nodeFunctionscan.c:94 tstore = 0x0 estate = <optimized out> direction = ForwardScanDirection scanslot = 0x60c780 alldone = <optimized out> oldpos = <optimized out> funcno = <optimized out> att = <optimized out> #9 0x001ad7f0 in ExecScanFetch (recheckMtd=0x1bdca0 <FunctionRecheck>, accessMtd=0x1bdcc0 <FunctionNext>, node=0x60c618) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execScan.c:82 estate = <optimized out> #10 ExecScan (node=0x60c618, accessMtd=0x1bdcc0 <FunctionNext>, recheckMtd=0x1bdca0 <FunctionRecheck>) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execScan.c:167 slot = 0x60c618 econtext = 0x60c6a0 qual = 0x0 projInfo = 0x60daa8 isDone = 5732760 resultSlot = <optimized out> #11 0x001a6790 in ExecProcNode (node=0x60c618) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execProcnode.c:426 result = <optimized out> __func__ = "ExecProcNode" #12 0x001a35dc in ExecutePlan (dest=0x578840, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, planstate=0x60c618, estate=0x60c590) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execMain.c:1490 slot = <optimized out> current_tuple_count = 0 #13 standard_ExecutorRun (queryDesc=0x5face0, direction=<optimized out>, count=0) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/executor/execMain.c:319 estate = 0x60c590 operation = CMD_SELECT dest = 0x578840 sendTuples = <optimized out> oldcontext = 0x5e8ad8 #14 0x002ab0d4 in PortalRunSelect (portal=0x5ebb60, forward=1 '\001', count=0, dest=0x578840) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/pquery.c:942 queryDesc = 0x5face0 direction = ForwardScanDirection nprocessed = <optimized out> __func__ = "PortalRunSelect" #15 0x002ac45c in PortalRun (portal=0x5ebb60, count=2147483647, isTopLevel=1 '\001', dest=0x578840, altdest=0x578840, completionTag=0xffbc6d20 "") at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/pquery.c:786 save_exception_stack = 0xffbc6c64 save_context_stack = 0x0 local_sigjmp_buf = {{__jmpbuf = {-1279389822, -1279390062, 1277732494}, __mask_was_saved = 0, __saved_mask = {__val = {4290538424, 5639264, 16777218, 0, 2, 1314, 0, 1276, 5595424, 5598208, 5444608, 5595136, 5381120, 161, 5640592, 0, 5732840, 0, 0, 0, 0, 5732824, 4290538424, 2786944, 102, 6192632, 2, 1, 0, 1, 1024, 0}}}} result = <optimized out> nprocessed = <optimized out> saveTopTransactionResourceOwner = 0x560c60 saveTopTransactionContext = 0x560bd8 saveActivePortal = 0x0 saveResourceOwner = 0x560c60 savePortalContext = 0x0 saveMemoryContext = 0x560bd8 __func__ = "PortalRun" #16 0x002a8558 in exec_simple_query ( query_string=0x5ae0d0 "SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids','0', 'skip-empty-xacts', '1');") at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/postgres.c:1072 parsetree = 0x5aef78 portal = 0x5ebb60 snapshot_set = <optimized out> commandTag = <optimized out> completionTag = "\000OPY 203\000LE\000\000CE", '\000' <repeats 32 times>, "\024\000\000\000\024\377\377\377\377\000S\024\204\000S\023\204" plantree_list = 0x578828 receiver = 0x578840 format = 0 dest = DestRemote parsetree_list = 0x5af010 save_log_statement_stats = 0 '\000' was_logged = 0 '\000' msec_str = "\000\274m\210\000\005\365D\000\000\000\000\000W\335\f0\n\000\034\000\000\000 \000\000\000\000\000\000\000\002" parsetree_item = 0x5af000 isTopLevel = <optimized out> #17 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x55fb00 "regression", username=<optimized out>) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/tcop/postgres.c:4079 query_string = 0x5ae0d0 "SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,'include-xids', '0', 'skip-empty-xacts', '1');" firstchar = 5959544 input_message = { data = 0x5ae0d0 "SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids','0', 'skip-empty-xacts', '1');", len = 139, maxlen = 1024, cursor = 139} local_sigjmp_buf = {{__jmpbuf = {-1279390062, -1279391582, 1277696902}, __mask_was_saved = 1, __saved_mask = {__val = {0, 0, 0, 0, 1882843284, 1879202304, 1881990312, 0, 1, 0, 0, 0, 1881277836, 4, 0, 0, 0, 0, 0, 0, 1886277632, 1886289096, 1886289096, 5758488, 0, 0, 0, 4290538792, 1885242328, 4290538768, 2359384, 0}}}} send_ready_for_query = 0 '\000' __func__ = "PostgresMain" #18 0x0005f724 in BackendRun (port=0x57dc80) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:4285 ac = 1 secs = 513854165 usecs = 348281 i = 1 av = 0x55fc70 maxac = <optimized out> #19 BackendStartup (port=0x57dc80) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:3948 bn = 0x10b3 pid = 0 #20 ServerLoop () at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:1679 port = 0x57dc80 rmask = {fds_bits = {16, 0 <repeats 31 times>}} selres = <optimized out> now = <optimized out> readmask = {fds_bits = {16, 0 <repeats 31 times>}} nSockets = 5 last_lockfile_recheck_time = 1460538957 last_touch_time = 1460538957 __func__ = "ServerLoop" #21 0x0024c04c in PostmasterMain (argc=8, argv=<optimized out>) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/postmaster/postmaster.c:1287 opt = <optimized out> status = <optimized out> userDoption = <optimized out> listen_addr_saved = <optimized out> i = <optimized out> output_config_variable = <optimized out> __func__ = "PostmasterMain" #22 0x00060750 in main (argc=8, argv=0x55ef80) at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/main/main.c:233 No locals. Original build log (but doesn't have much useful content): https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.4&arch=sparc&ver=9.4.7-0%2Bdeb8u1~bpo70%2B1&stamp=1460482241 Mit freundlichen Grüßen, Christoph Berg -- Senior Berater, Tel.: +49 (0)21 61 / 46 43-187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Hi, On 2016-04-13 11:41:17 +0200, Christoph Berg wrote: > I've just uploaded 9.4.7 for Debian's wheezy-backports suite. > Unfortunately it's now crashing twice in the contrib/test_decoding > regression tests on sparc. I could reproduce the error on the porter > box: > > LOG: server process (PID 22534) was terminated by signal 10: Bus error > DETAIL: Failed process was running: SELECT data > FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1') > WHERE data ~ 'UPDATE'; > > LOG: server process (PID 22558) was terminated by signal 10: Bus error > DETAIL: Failed process was running: SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > > > Reading symbols from /home/myon/postgresql-9.4-9.4.7/build/src/backend/postgres...done. > [New LWP 22558] > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/sparc-linux-gnu/libthread_db.so.1". > Core was generated by `postgres: myon regression [local] SELECT '. > Program terminated with signal 10, Bus error. > #0 ReorderBufferRestoreChange (data=0x5de0f7 "", txn=0x622660, rb=0x6225d8) > at /home/myon/postgresql-9.4-9.4.7/build/../src/backend/replication/logical/reorderbuffer.c:2327 > 2327 Size tuplelen = ((HeapTuple) data)->t_len; > (gdb) l > 2322 data += tuplelen; > 2323 } > 2324 > 2325 if (change->data.tp.newtuple) > 2326 { > 2327 Size tuplelen = ((HeapTuple) data)->t_len; Hrmpf, I guess this is an alignment issue. We should probably replace that ->t_len lookup with a memcpy. IIRC it's not entirely trivial to access porter boxes anymore - could you perhaps just replace those two t_len lookups with a memcpy (i.e. memcpy(&tuplelen, &(HeapTuple) data)->t_len, sizeof(uint32))? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2016-04-13 11:41:17 +0200, Christoph Berg wrote: >> I've just uploaded 9.4.7 for Debian's wheezy-backports suite. >> Unfortunately it's now crashing twice in the contrib/test_decoding >> regression tests on sparc. I could reproduce the error on the porter >> box: > Hrmpf, I guess this is an alignment issue. Maybe, but then why didn't the buildfarm detect it? We certainly have several critters that would be sticky about this. regards, tom lane
On 2016-04-13 13:54:10 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-04-13 11:41:17 +0200, Christoph Berg wrote: > >> I've just uploaded 9.4.7 for Debian's wheezy-backports suite. > >> Unfortunately it's now crashing twice in the contrib/test_decoding > >> regression tests on sparc. I could reproduce the error on the porter > >> box: > > > Hrmpf, I guess this is an alignment issue. > > Maybe, but then why didn't the buildfarm detect it? We certainly > have several critters that would be sticky about this. I'm wondering the same. My theory is that it might only actually matter when built optimized; which few animals do. The instruction selection definitely can influence whether an alignment issue becomes aparent or not. Unless that idea pans out I guess I'll have to go through the motions to get access to that porter box. I had that at some point in the past (debugging something turning out to be a compiler bug). Andres
Re: Andres Freund 2016-04-13 <20160413161906.aykti6iwdc7zv7l6@alap3.anarazel.de> > > 2327 Size tuplelen = ((HeapTuple) data)->t_len; > > Hrmpf, I guess this is an alignment issue. We should probably replace > that ->t_len lookup with a memcpy. IIRC it's not entirely trivial to > access porter boxes anymore - could you perhaps just replace those two > t_len lookups with a memcpy (i.e. memcpy(&tuplelen, &(HeapTuple) > data)->t_len, sizeof(uint32))? That fixed the contrib/test_decoding checks, thanks. Will do another full build over night to double check. Mit freundlichen Grüßen, Christoph Berg -- Senior Berater, Tel.: +49 (0)21 61 / 46 43-187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Attachment
Re: Andres Freund 2016-04-13 <20160413175827.dmlbtdf7c3mgmnex@alap3.anarazel.de> > Unless that idea pans out I guess I'll have to go through the > motions to get access to that porter box. I had that at some point in > the past (debugging something turning out to be a compiler bug). Your account is still there, but these guest accounts are always only unlocked for a limited time, so it's not functional now but could be re-enabled easily. But ... Re: To Andres Freund 2016-04-13 <20160413211152.GA21711@msg.credativ.de> > That fixed the contrib/test_decoding checks, thanks. > Will do another full build over night to double check. ... the patch worked indeed, thanks! Also, the sparc architecture will be officially dead at the end of the month because wheezy is EOL and wheezy-lts + wheezy-backports will be amd64+i386 only. (We'll still have sparc64, but it's not a release architecture. "sparc" is not even on debian-ports.org...) Christoph
Re: To Andres Freund 2016-04-14 <20160414092919.GA11804@msg.df7cb.de> > > That fixed the contrib/test_decoding checks, thanks. > > Will do another full build over night to double check. > > ... the patch worked indeed, thanks! Duh, the patch does work on sparc, but it breaks amd64: *** /home/cbe/projects/postgresql/debian/9.4/build/../contrib/test_decoding/expected/ddl.out 2016-03-28 22:09:55.000000000+0200 --- /home/cbe/projects/postgresql/debian/9.4/build/contrib/test_decoding/./regression_output/results/ddl.out 2016-04-1413:12:45.310632488 +0200 *************** *** 189,201 **** FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1') GROUP BY substring(data, 1, 24) ORDER BY 1,2; ! count | min | max ! -------+-------------------------------------------------+------------------------------------------------------------------------ ! 1 | BEGIN | BEGIN ! 1 | COMMIT | COMMIT ! 20467 | table public.tr_etoomuch: DELETE: id[integer]:1 | table public.tr_etoomuch: UPDATE: id[integer]:9999 data[integer]:-9999 ! (3 rows) ! -- check updates of primary keys work correctly BEGIN; CREATE TABLE spoolme AS SELECT g.i FROM generate_series(1, 5000) g(i); --- 189,195 ---- FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1') GROUP BY substring(data, 1, 24) ORDER BY 1,2; ! ERROR: invalid memory alloc request size 94596654694480 -- check updates of primary keys work correctly BEGIN; CREATE TABLE spoolme AS SELECT g.i FROM generate_series(1, 5000) g(i); Christoph
Christoph Berg <myon@debian.org> writes: >> ... the patch worked indeed, thanks! > Duh, the patch does work on sparc, but it breaks amd64: Well, yeah, because if Size is wider than int32 then that memcpy isn't enough to initialize it. But looking around a bit, I think that ReorderBufferRestoreChange is almost completely broken on alignment-picky machines, if the expectation is that the input data could be aligned arbitrarily. Aside from this particular problem, it contains multiple places where we cast "data" to something other than char*, and that is sufficient to allow a spec-compliant compiler to decide that "data" is aligned on more than a char boundary and hence generate code that depends on such alignment. I also noticed that the INTERNAL_SNAPSHOT case doesn't bother to advance "data" past the data read, where all the other switch cases do. In the attached, I made it do likewise, but it looks like you could equally easily just drop the final update of "data" in each switch case, because nothing is looking at it after the switch. Proposed patch attached, but I'm still wondering why the alignment-picky buildfarm members aren't all failing on this. It seems to strongly suggest that the regression tests' coverage is lacking here. regards, tom lane diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 52c6986..b6f83f8 100644 *** a/src/backend/replication/logical/reorderbuffer.c --- b/src/backend/replication/logical/reorderbuffer.c *************** static void *** 2449,2463 **** ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn, char *data) { - ReorderBufferDiskChange *ondisk; ReorderBufferChange *change; - ondisk = (ReorderBufferDiskChange *) data; - change = ReorderBufferGetChange(rb); /* copy static part */ ! memcpy(change, &ondisk->change, sizeof(ReorderBufferChange)); data += sizeof(ReorderBufferDiskChange); --- 2449,2461 ---- ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn, char *data) { ReorderBufferChange *change; change = ReorderBufferGetChange(rb); /* copy static part */ ! memcpy(change, data + offsetof(ReorderBufferDiskChange, change), ! sizeof(ReorderBufferChange)); data += sizeof(ReorderBufferDiskChange); *************** ReorderBufferRestoreChange(ReorderBuffer *** 2471,2477 **** case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT: if (change->data.tp.oldtuple) { ! Size tuplelen = ((HeapTuple) data)->t_len; change->data.tp.oldtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); --- 2469,2479 ---- case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT: if (change->data.tp.oldtuple) { ! uint32 tuplelen; ! ! /* Must get tuplelen the hard way in case it's misaligned */ ! memcpy(&tuplelen, data + offsetof(HeapTupleData, t_len), ! sizeof(uint32)); change->data.tp.oldtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); *************** ReorderBufferRestoreChange(ReorderBuffer *** 2492,2498 **** if (change->data.tp.newtuple) { ! Size tuplelen = ((HeapTuple) data)->t_len; change->data.tp.newtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); --- 2494,2504 ---- if (change->data.tp.newtuple) { ! uint32 tuplelen; ! ! /* Must get tuplelen the hard way in case it's misaligned */ ! memcpy(&tuplelen, data + offsetof(HeapTupleData, t_len), ! sizeof(uint32)); change->data.tp.newtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); *************** ReorderBufferRestoreChange(ReorderBuffer *** 2538,2552 **** } case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT: { ! Snapshot oldsnap; Snapshot newsnap; Size size; ! oldsnap = (Snapshot) data; size = sizeof(SnapshotData) + ! sizeof(TransactionId) * oldsnap->xcnt + ! sizeof(TransactionId) * (oldsnap->subxcnt + 0); change->data.snapshot = MemoryContextAllocZero(rb->context, size); --- 2544,2558 ---- } case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT: { ! SnapshotData oldsnap; Snapshot newsnap; Size size; ! memcpy(&oldsnap, data, sizeof(SnapshotData)); size = sizeof(SnapshotData) + ! sizeof(TransactionId) * oldsnap.xcnt + ! sizeof(TransactionId) * oldsnap.subxcnt; change->data.snapshot = MemoryContextAllocZero(rb->context, size); *************** ReorderBufferRestoreChange(ReorderBuffer *** 2557,2562 **** --- 2563,2570 ---- (((char *) newsnap) + sizeof(SnapshotData)); newsnap->subxip = newsnap->xip + newsnap->xcnt; newsnap->copied = true; + + data += size; break; } /* the base struct contains all the data, easy peasy */
Re: Tom Lane 2016-04-14 <423.1460653903@sss.pgh.pa.us> > Proposed patch attached, but I'm still wondering why the alignment-picky > buildfarm members aren't all failing on this. It seems to strongly > suggest that the regression tests' coverage is lacking here. This patch works both on sparc and amd64. Thanks! (Tested on 9.4.7 only.) Christoph
Hi, On 2016-04-14 13:11:43 -0400, Tom Lane wrote: > Well, yeah, because if Size is wider than int32 then that memcpy isn't > enough to initialize it. Yes, I wasn't suggesting this is a proper patch, just wanted to know whether this actually is the issue. > But looking around a bit, I think that ReorderBufferRestoreChange is > almost completely broken on alignment-picky machines, if the expectation > is that the input data could be aligned arbitrarily. 'data' cannot be aligned arbitrarily. It's palloc aligned, and the buffer is used separately for everything is loaded from disk. I.e. the data pointer passed to ReorderBufferRestoreChange is aligned to a MAXALIGN boundary. The alignment issues come into play for data *after* the ReorderBufferDiskChange. > Aside from this particular problem, it contains multiple places where > we cast "data" to something other than char*, and that is sufficient > to allow a spec-compliant compiler to decide that "data" is aligned on > more than a char boundary and hence generate code that depends on such > alignment. Afaics all cases but the ->newtuple branch showing the errors ought to be ok because of the above. > I also noticed that the INTERNAL_SNAPSHOT case doesn't bother to advance > "data" past the data read, where all the other switch cases do. In the > attached, I made it do likewise, but it looks like you could equally > easily just drop the final update of "data" in each switch case, > because nothing is looking at it after the switch. Yea, I wasn't sure about keeping that. > Proposed patch attached, but I'm still wondering why the alignment-picky > buildfarm members aren't all failing on this. It seems to strongly > suggest that the regression tests' coverage is lacking here. Well, then Christoph's build wouldn't have caught the issue either? Coverage isn't perfect, but it's not bad: http://coverage.postgresql.org/src/backend/replication/logical/reorderbuffer.c.gcov.html Greetings, Andres Freund
Christoph Berg <myon@debian.org> writes: > Re: Tom Lane 2016-04-14 <423.1460653903@sss.pgh.pa.us> >> Proposed patch attached, but I'm still wondering why the alignment-picky >> buildfarm members aren't all failing on this. It seems to strongly >> suggest that the regression tests' coverage is lacking here. > This patch works both on sparc and amd64. Thanks! > (Tested on 9.4.7 only.) I've been looking around and testing some more. I noticed that the only caller of ReorderBufferRestoreChange always passes a maxaligned buffer, so most of the changes I suggested are unnecessary. AFAICT, it's only the second fetch of t_len that is at any risk. Even there, at least in HEAD, I cannot construct a test case in which the first tuple's t_len is not a multiple of 4. It's hard to tell what is causing that, because surely heap_form_tuple guarantees no such thing. I have a suspicion that something in the impenetrable thicket that passes for prefix/suffix optimization in log_heap_update is forcing the WAL-logged tuple length to be rounded off to sizeof(int) (*not* MAXALIGN). It might be that 9.4 acts differently here, but the lack of crashes in our buildfarm suggests otherwise. The best theory I can come up with is that somehow, your Sparc compiler is deciding that "data" is 8-aligned at this point and generating a code sequence that depends on that, even though it's only being asked to fetch a uint32. If the machine were a 64-bit machine, the compiler could legitimately make such a deduction because of the presence of a pointer field in HeapTupleData; but I dunno how it gets to that conclusion if the architecture is 32-bit. Anyway, there are few enough architectures where it could make sense to use an 8-aligned instruction to fetch a 4-byte value that it's not so surprising we've not seen it crash elsewhere. So I now think my previous patch is overkill, and we should instead use something like the attached. regards, tom lane diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 52c6986..4520708 100644 *** a/src/backend/replication/logical/reorderbuffer.c --- b/src/backend/replication/logical/reorderbuffer.c *************** ReorderBufferRestoreChanges(ReorderBuffe *** 2444,2449 **** --- 2444,2453 ---- /* * Convert change from its on-disk format to in-memory format and queue it onto * the TXN's ->changes list. + * + * Note: although "data" is declared char*, at entry it points to a + * maxalign'd buffer, making it safe in most of this function to assume + * that the pointed-to data is suitably aligned for direct access. */ static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn, *************** ReorderBufferRestoreChange(ReorderBuffer *** 2471,2477 **** case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT: if (change->data.tp.oldtuple) { ! Size tuplelen = ((HeapTuple) data)->t_len; change->data.tp.oldtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); --- 2475,2481 ---- case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT: if (change->data.tp.oldtuple) { ! uint32 tuplelen = ((HeapTuple) data)->t_len; change->data.tp.oldtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); *************** ReorderBufferRestoreChange(ReorderBuffer *** 2492,2498 **** if (change->data.tp.newtuple) { ! Size tuplelen = ((HeapTuple) data)->t_len; change->data.tp.newtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader); --- 2496,2506 ---- if (change->data.tp.newtuple) { ! /* here, data might not be suitably aligned! */ ! uint32 tuplelen; ! ! memcpy(&tuplelen, data + offsetof(HeapTupleData, t_len), ! sizeof(uint32)); change->data.tp.newtuple = ReorderBufferGetTupleBuf(rb, tuplelen - SizeofHeapTupleHeader);
Andres Freund <andres@anarazel.de> writes: > On 2016-04-14 13:11:43 -0400, Tom Lane wrote: >> But looking around a bit, I think that ReorderBufferRestoreChange is >> almost completely broken on alignment-picky machines, if the expectation >> is that the input data could be aligned arbitrarily. > 'data' cannot be aligned arbitrarily. It's palloc aligned, and the > buffer is used separately for everything is loaded from disk. I.e. the > data pointer passed to ReorderBufferRestoreChange is aligned to a > MAXALIGN boundary. Right, I figured that out awhile later; it would have been better if it was documented. See my second patch attempt. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On 2016-04-14 13:11:43 -0400, Tom Lane wrote: >> Proposed patch attached, but I'm still wondering why the alignment-picky >> buildfarm members aren't all failing on this. It seems to strongly >> suggest that the regression tests' coverage is lacking here. > Well, then Christoph's build wouldn't have caught the issue either? > Coverage isn't perfect, but it's not bad: > http://coverage.postgresql.org/src/backend/replication/logical/reorderbuffer.c.gcov.html BTW, so far as that goes, the question is whether we ever come through here with two tuples of which the first has a t_len that's not aligned at least on a 4-byte boundary. gcov can't tell you that. I put in an Assert and satisfied myself that the current tests do *not* ever create such a case. I tried modifying the test by adding updates that set a trailing text field to different-length strings, and still couldn't make it happen. So something is rounding off the tuple length, though only to sizeof(int) not to maxalign, which is very curious. regards, tom lane
On 2016-04-14 15:14:37 -0400, Tom Lane wrote: > I've been looking around and testing some more. I noticed that the only > caller of ReorderBufferRestoreChange always passes a maxaligned buffer, > so most of the changes I suggested are unnecessary. AFAICT, it's only > the second fetch of t_len that is at any risk. I think so too. And I agree that a comment about this would have been rather helpful... > Even there, at least in > HEAD, I cannot construct a test case in which the first tuple's t_len is > not a multiple of 4. That's really kinda weird. It's a tuple from WAL, and the length's from the record; the same way wal replay gets it. Wonder if this might not indicate a relevant bug somewhere... > I have a suspicion that > something in the impenetrable thicket that passes for prefix/suffix > optimization in log_heap_update is forcing the WAL-logged tuple length to > be rounded off to sizeof(int) (*not* MAXALIGN). I don't think so, because the prefix optimization is essentially disabled for logical... > So I now think my previous patch is overkill, and we should instead use > something like the attached. Looks good to me. Thanks! - Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-04-14 15:14:37 -0400, Tom Lane wrote: >> Even there, at least in >> HEAD, I cannot construct a test case in which the first tuple's t_len is >> not a multiple of 4. > That's really kinda weird. It's a tuple from WAL, and the length's from > the record; the same way wal replay gets it. Wonder if this might not > indicate a relevant bug somewhere... Got it: it's a testing deficiency. What I was not aware of is that this code path (with two tuples to be restored) seems to only be entered for updates that change the primary key. And guess what, there are only a couple of instances of that in the test_decoding tests, and all of them have old tuples that are a multiple of 4 long. I'd been modifying the tests in a way that didn't produce any new exercise of the target code path at all :-( I have a test addition now that produces a bus error on gaur, and probably on every alignment-picky platform: diff --git a/contrib/test_decoding/sql/ddl.sql b/contrib/test_decoding/sql/ddl.sql index b1f7bf6..e99b2d3 100644 *** a/contrib/test_decoding/sql/ddl.sql --- b/contrib/test_decoding/sql/ddl.sql *************** INSERT INTO tr_etoomuch(data) SELECT g.i *** 116,121 **** --- 116,123 ---- SELECT 'tx logical msg' FROM pg_logical_emit_message(true, 'test', 'tx logical msg'); DELETE FROM tr_etoomuch WHERE id < 5000; UPDATE tr_etoomuch SET data = - data WHERE id > 5000; + CREATE TABLE tr_oddlength (id text primary key, data text); + INSERT INTO tr_oddlength VALUES('ab', 'foo'); COMMIT; /* display results, but hide most of the output */ *************** ORDER BY 1,2; *** 128,133 **** --- 130,137 ---- BEGIN; CREATE TABLE spoolme AS SELECT g.i FROM generate_series(1, 5000) g(i); UPDATE tr_etoomuch SET id = -id WHERE id = 5000; + UPDATE tr_oddlength SET id = 'x', data = 'quux'; + UPDATE tr_oddlength SET id = 'yy', data = 'a'; DELETE FROM spoolme; DROP TABLE spoolme; COMMIT; Will commit this along with the fix. Meanwhile, there's still the question of why Christoph's sparc machine got a bus error on a case where other picky platforms don't. But I'm prepared to write that off as some code-generation vagary. regards, tom lane