RE: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From | tanghy.fnst@fujitsu.com |
---|---|
Subject | RE: [HACKERS] logical decoding of two-phase transactions |
Date | |
Msg-id | OS0PR01MB61138651E2130F3E9152C45BFB2C9@OS0PR01MB6113.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [HACKERS] logical decoding of two-phase transactions (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
Hi Ajin >The above patch had some changes missing which resulted in some tap >tests failing. Sending an updated patchset. Keeping the patchset >version the same. Thanks for your patch. I see a problem about Segmentation fault when using it. Please take a look at this. The steps to reproduce the problem are as follows. ------publisher------ create table test (a int primary key, b varchar); create publication pub for table test; ------subscriber------ create table test (a int primary key, b varchar); create subscription sub connection 'dbname=postgres' publication pub with(two_phase=on); Then, I prepare, commit, rollback transactions and TRUNCATE table in a sql as follows: ------------- BEGIN; INSERT INTO test SELECT i, md5(i::text) FROM generate_series(1, 10000) s(i); PREPARE TRANSACTION 't1'; COMMIT PREPARED 't1'; BEGIN; INSERT INTO test SELECT i, md5(i::text) FROM generate_series(10001, 20000) s(i); PREPARE TRANSACTION 't2'; ROLLBACK PREPARED 't2'; TRUNCATE test; ------------- To make sure the problem produce easily, I looped above operations in my sql file about 10 times, then I can 100% reproduceit and got segmentation fault in publisher log as follows: ------------- 2021-05-18 16:30:56.952 CST [548189] postmaster LOG: server process (PID 548222) was terminated by signal 11: Segmentationfault 2021-05-18 16:30:56.952 CST [548189] postmaster DETAIL: Failed process was running: START_REPLICATION SLOT "sub" LOGICAL0/0 (proto_version '3', two_phase 'on', publication_names '"pub"') ------------- Here is the core dump information : ------------- #0 0x000000000090afe4 in pq_sendstring (buf=buf@entry=0x251ca80, str=0x0) at pqformat.c:199 #1 0x0000000000ab0a2b in logicalrep_write_begin_prepare (out=0x251ca80, txn=txn@entry=0x25346e8) at proto.c:124 #2 0x00007f9528842dd6 in pgoutput_begin_prepare (ctx=ctx@entry=0x2514700, txn=txn@entry=0x25346e8) at pgoutput.c:495 #3 0x00007f9528843f70 in pgoutput_truncate (ctx=0x2514700, txn=0x25346e8, nrelations=1, relations=0x262f678, change=0x25370b8)at pgoutput.c:905 #4 0x0000000000aa57cb in truncate_cb_wrapper (cache=<optimized out>, txn=<optimized out>, nrelations=<optimized out>, relations=<optimizedout>, change=<optimized out>) at logical.c:1103 #5 0x0000000000abf333 in ReorderBufferApplyTruncate (streaming=false, change=0x25370b8, relations=0x262f678, nrelations=1,txn=0x25346e8, rb=0x2516710) at reorderbuffer.c:1918 #6 ReorderBufferProcessTXN (rb=rb@entry=0x2516710, txn=0x25346e8, commit_lsn=commit_lsn@entry=27517176, snapshot_now=<optimizedout>, command_id=command_id@entry=0, streaming=streaming@entry=false) at reorderbuffer.c:2278 #7 0x0000000000ac0b14 in ReorderBufferReplay (txn=<optimized out>, rb=rb@entry=0x2516710, xid=xid@entry=738, commit_lsn=commit_lsn@entry=27517176, end_lsn=end_lsn@entry=27517544, commit_time=commit_time@entry=674644388404356, origin_id=0, origin_lsn=0) at reorderbuffer.c:2591 #8 0x0000000000ac1713 in ReorderBufferCommit (rb=0x2516710, xid=xid@entry=738, commit_lsn=27517176, end_lsn=27517544, commit_time=commit_time@entry=674644388404356, origin_id=origin_id@entry=0, origin_lsn=0) at reorderbuffer.c:2615 #9 0x0000000000a9f702 in DecodeCommit (ctx=ctx@entry=0x2514700, buf=buf@entry=0x7ffdd027c2b0, parsed=parsed@entry=0x7ffdd027c140,xid=xid@entry=738, two_phase=<optimized out>) at decode.c:742 #10 0x0000000000a9fc6c in DecodeXactOp (ctx=ctx@entry=0x2514700, buf=buf@entry=0x7ffdd027c2b0) at decode.c:278 #11 0x0000000000aa1b75 in LogicalDecodingProcessRecord (ctx=0x2514700, record=0x2514ac0) at decode.c:142 #12 0x0000000000af6db1 in XLogSendLogical () at walsender.c:2876 #13 0x0000000000afb6aa in WalSndLoop (send_data=send_data@entry=0xaf6d49 <XLogSendLogical>) at walsender.c:2306 #14 0x0000000000afbdac in StartLogicalReplication (cmd=cmd@entry=0x24da288) at walsender.c:1206 #15 0x0000000000afd646 in exec_replication_command ( cmd_string=cmd_string@entry=0x2452570 "START_REPLICATION SLOT \"sub\" LOGICAL 0/0 (proto_version '3', two_phase 'on',publication_names '\"pub\"')") at walsender.c:1646 #16 0x0000000000ba3514 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffdd027c560, dbname=<optimized out>, username=<optimizedout>) at postgres.c:4482 #17 0x0000000000a7284a in BackendRun (port=port@entry=0x2477b60) at postmaster.c:4491 #18 0x0000000000a78bba in BackendStartup (port=port@entry=0x2477b60) at postmaster.c:4213 #19 0x0000000000a78ff9 in ServerLoop () at postmaster.c:1745 #20 0x0000000000a7bbdf in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x244bae0) at postmaster.c:1417 #21 0x000000000090dc80 in main (argc=3, argv=0x244bae0) at main.c:209 ------------- I noticed that it called pgoutput_truncate function and pgoutput_begin_prepare function. It seems odd because TRUNCATE isnot in a prepared transaction in my case. I tried to debug this to learn more and found that in pgoutput_truncate function, the value of in_prepared_txn was true.Later, it got a segmentation fault when it tried to get gid in logicalrep_write_begin_prepare function - it has no gidso we got the segmentation fault. FYI: I also tested the case in synchronous mode, and it can execute successfully. So, I think the value of in_prepared_txn issometimes incorrect in asynchronous mode. Maybe there's a better way to get this. Regards Tang
pgsql-hackers by date: