Thread: Another logical decoding assertion failure
http://www.postgresql.org/message-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl reminds me that I also saw an assertion failure recently. Although I wanted to isolate and report my issue once my vacation was over, that report made me curious whether I had seen the same thing. In the end it seems to be a different symptom.

The following steps trigger the failure in my environment (9.5devel), although it's not always that straightforward - sometimes I need to create and populate one more table. (I use the 'test_decoding' contrib module.)

postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
 pg_create_logical_replication_slot
------------------------------------
 (my_slot,0/16F3B30)
(1 row)

postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
SELECT 0
postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
INSERT 0 2
postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
The connection to the server was lost. Attempting reset: Failed.
!>

The stack trace (should I upload / send the whole core file anywhere?):

(gdb) bt
#0  0x00007f12d0640849 in raise () from /lib64/libc.so.6
#1  0x00007f12d0641cd8 in abort () from /lib64/libc.so.6
#2  0x00000000008c2d51 in ExceptionalCondition (conditionName=0xa926a0 "!(((bool)((relation)->rd_refcnt == 0)))", errorType=0xa92210 "FailedAssertion", fileName=0xa92104 "relcache.c", lineNumber=1892) at assert.c:54
#3  0x00000000008b2776 in RelationDestroyRelation (relation=0x7f12c766d240, remember_tupdesc=0 '\000') at relcache.c:1892
#4  0x00000000008b2c28 in RelationClearRelation (relation=0x7f12c766d240, rebuild=1 '\001') at relcache.c:2106
#5  0x00000000008b2fa3 in RelationFlushRelation (relation=0x7f12c766d240) at relcache.c:2204
#6  0x00000000008b30b5 in RelationCacheInvalidateEntry (relationId=16384) at relcache.c:2256
#7  0x00000000008abc65 in LocalExecuteInvalidationMessage (msg=0x28c1260) at inval.c:567
#8  0x000000000073e1f5 in ReorderBufferExecuteInvalidations (rb=0x28b0318, txn=0x28b0430) at reorderbuffer.c:1798
#9  0x000000000073ddbf in ReorderBufferForget (rb=0x28b0318, xid=1279, lsn=24154944) at reorderbuffer.c:1645
#10 0x00000000007383f8 in DecodeCommit (ctx=0x2894658, buf=0x7fff3e658c30, xid=1279, dboid=12736, commit_time=461418357793855, nsubxacts=0, sub_xids=0x28af650, ninval_msgs=69, msgs=0x28af650) at decode.c:539
#11 0x0000000000737c91 in DecodeXactOp (ctx=0x2894658, buf=0x7fff3e658c30) at decode.c:207
#12 0x00000000007379ce in LogicalDecodingProcessRecord (ctx=0x2894658, record=0x28af610) at decode.c:103
#13 0x000000000073b0d6 in pg_logical_slot_get_changes_guts (fcinfo=0x7fff3e658f50, confirm=1 '\001', binary=0 '\000') at logicalfuncs.c:432
#14 0x000000000073b221 in pg_logical_slot_get_changes (fcinfo=0x7fff3e658f50) at logicalfuncs.c:478
#15 0x000000000063e1a3 in ExecMakeTableFunctionResult (funcexpr=0x288be78, econtext=0x288b758, argContext=0x28b5580, expectedDesc=0x288ccd8, randomAccess=0 '\000') at execQual.c:2157
#16 0x000000000065e302 in FunctionNext (node=0x288ba18) at nodeFunctionscan.c:95
#17 0x000000000064537e in ExecScanFetch (node=0x288ba18, accessMtd=0x65e24b <FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>) at execScan.c:82
#18 0x00000000006453ed in ExecScan (node=0x288ba18, accessMtd=0x65e24b <FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>) at execScan.c:132
#19 0x000000000065e665 in ExecFunctionScan (node=0x288ba18) at nodeFunctionscan.c:269
#20 0x000000000063a649 in ExecProcNode (node=0x288ba18) at execProcnode.c:426
#21 0x000000000065ca53 in ExecModifyTable (node=0x288b8c0) at nodeModifyTable.c:926
#22 0x000000000063a577 in ExecProcNode (node=0x288b8c0) at execProcnode.c:377
#23 0x0000000000638465 in ExecutePlan (estate=0x288b518, planstate=0x288b8c0, operation=CMD_INSERT, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2815ac0) at execMain.c:1475
#24 0x000000000063667a in standard_ExecutorRun (queryDesc=0x288f948, direction=ForwardScanDirection, count=0) at execMain.c:308
#25 0x0000000000636512 in ExecutorRun (queryDesc=0x288f948, direction=ForwardScanDirection, count=0) at execMain.c:256
#26 0x00000000007998ec in ProcessQuery (plan=0x28159c8, sourceText=0x2854d18 "INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);", params=0x0, dest=0x2815ac0, completionTag=0x7fff3e659920 "") at pquery.c:185
#27 0x000000000079b16e in PortalRunMulti (portal=0x2890638, isTopLevel=1 '\001', dest=0x2815ac0, altdest=0x2815ac0, completionTag=0x7fff3e659920 "") at pquery.c:1279
#28 0x000000000079a7b1 in PortalRun (portal=0x2890638, count=9223372036854775807, isTopLevel=1 '\001', dest=0x2815ac0, altdest=0x2815ac0, completionTag=0x7fff3e659920 "") at pquery.c:816
#29 0x000000000079487b in exec_simple_query (query_string=0x2854d18 "INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);") at postgres.c:1045
#30 0x00000000007989ec in PostgresMain (argc=1, argv=0x27eaab0, dbname=0x27ea910 "postgres", username=0x27ea8f0 "anton") at postgres.c:4010
#31 0x0000000000721da1 in BackendRun (port=0x280c630) at postmaster.c:4113
#32 0x00000000007214de in BackendStartup (port=0x280c630) at postmaster.c:3787
#33 0x000000000071de16 in ServerLoop () at postmaster.c:1566

// Tony
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
> http://www.postgresql.org/message-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
> reminds me that I also saw an assertion failure recently. Although I
> wanted to isolate and report my issue once my vacation was over, that
> report made me curious whether I had seen the same thing. In the end it
> seems to be a different symptom.

That's something separate, yes.

> The following steps trigger the failure in my environment (9.5devel),
> although it's not always that straightforward - sometimes I need to
> create and populate one more table. (I use the 'test_decoding' contrib
> module.)
>
> postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
>  pg_create_logical_replication_slot
> ------------------------------------
>  (my_slot,0/16F3B30)
> (1 row)
>
> postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
> SELECT 0
> postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
> INSERT 0 2
> postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
> The connection to the server was lost. Attempting reset: Failed.
> !>

Is this just to reproduce the issue, or are you really storing the results of logical decoding in a table again?

Why? That'll just result in circularity, no? It's not something I really thought about allowing when writing this. Possibly we can make it work, but I don't really see the point. We obviously shouldn't just crash, but that's not my point.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 08/15/2014 03:16 PM, Andres Freund wrote:
> On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
>> postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
>>  pg_create_logical_replication_slot
>> ------------------------------------
>>  (my_slot,0/16F3B30)
>> (1 row)
>>
>> postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
>> SELECT 0
>> postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
>> INSERT 0 2
>> postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>
> Is this just to reproduce the issue, or are you really storing the
> results of logical decoding in a table again?
>
> Why? That'll just result in circularity, no? It's not something I
> really thought about allowing when writing this. Possibly we can make it
> work, but I don't really see the point. We obviously shouldn't just
> crash, but that's not my point.

It was basically an experiment. I tried to capture the changes into a table. I don't know whether it's legal (i.e. whether the current call of pg_logical_slot_get_changes() should include changes that the current transaction did). An unlogged or temporary table may be necessary for cases like this.

The reason I report it is that, if such a use case is not legal, I think it should be detected somehow and handled with ereport / elog.

// Tony
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
> http://www.postgresql.org/message-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
> reminds me that I also saw an assertion failure recently. Although I
> wanted to isolate and report my issue once my vacation was over, that
> report made me curious whether I had seen the same thing. In the end it
> seems to be a different symptom.
>
> The following steps trigger the failure in my environment (9.5devel),
> although it's not always that straightforward - sometimes I need to
> create and populate one more table. (I use the 'test_decoding' contrib
> module.)
>
> postgres=# SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
>  pg_create_logical_replication_slot
> ------------------------------------
>  (my_slot,0/16F3B30)
> (1 row)
>
> postgres=# CREATE TABLE a AS SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
> SELECT 0
> postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
> INSERT 0 2
> postgres=# INSERT INTO a SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
> The connection to the server was lost. Attempting reset: Failed.
> !>

This was actually triggered by a bug in logical decoding. I've pushed a fix. Thanks for the report!

I still think it's a bad idea to do something like the above, but even after a fair amount of thinking I can't come up with a case that's an actual problem. So I think we can just allow it.

Obviously it's not a good idea to use an output plugin without filtering, and a logged table, for this - each subsequent get_changes() will otherwise contain the rows inserted into the table the previous round.
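[Editor's note: the feedback loop described in the last paragraph can be sketched outside the server. This is a toy simulation in plain Python, not PostgreSQL code - the table names and row format merely mimic the test_decoding output shown in this thread. Each simulated get_changes() call decodes the transaction that inserted the previous call's output into the logged table, so every round re-emits all prior rows wrapped once more.]

```python
def quote_literal(s):
    # SQL-style literal quoting, as test_decoding prints text datums:
    # embedded single quotes are doubled, then the value is wrapped.
    return "'" + s.replace("'", "''") + "'"

def get_changes(rows_inserted_last_round):
    # One simulated pg_logical_slot_get_changes() call: it decodes the
    # transaction that INSERTed the previous round's output into the
    # (logged, unfiltered) table 'changeresult'.
    out = ["BEGIN"]
    for row in rows_inserted_last_round:
        out.append("table public.changeresult: INSERT: data[text]:"
                   + quote_literal(row))
    out.append("COMMIT")
    return out

# The original change: one INSERT into 'somechange'.
changes = ["BEGIN", "table public.somechange: INSERT: id[integer]:1", "COMMIT"]
for round_no in range(1, 3):
    changes = get_changes(changes)
    print(round_no, len(changes))   # row count grows by 2 every round: 3, 5, 7, ...
```

In this model the stream never converges: each round adds its own BEGIN/COMMIT framing and wraps every previously captured row in one more level of quoting, which is exactly why unfiltered capture into a logged table keeps growing.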
The attached regression test contains pretty output demonstrating that:

 BEGIN
 table public.changeresult: INSERT: data[text]:'BEGIN'
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.somechange: INSERT: id[integer]:1'''
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''COMMIT'''
 table public.changeresult: INSERT: data[text]:'COMMIT'
 table public.changeresult: INSERT: data[text]:'BEGIN'
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''BEGIN'''''''
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''table public.somechange: INSERT: id[integer]:1'''''''
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''COMMIT'''''''
 table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''COMMIT'''
 table public.changeresult: INSERT: data[text]:'COMMIT'
 COMMIT
(14 rows)

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
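[Editor's note: the quote runs in that output are just SQL literal quoting applied recursively. Each capture round doubles every embedded single quote, so after n rounds a quote from the innermost payload becomes 2**n consecutive quotes, and the closing run after the innermost token is 2**n - 1 quotes long. A small sketch in plain Python, not PostgreSQL code; the row text mimics the test_decoding format used above:]

```python
def quote_literal(s):
    # test_decoding prints text datums as SQL literals: embedded
    # single quotes are doubled, then the value is wrapped in quotes.
    return "'" + s.replace("'", "''") + "'"

row = "BEGIN"
for depth in range(1, 4):
    # Each capture round wraps the previous row in one more quoting level.
    row = "table public.changeresult: INSERT: data[text]:" + quote_literal(row)
    # The closing run after BEGIN is 2**depth - 1 quotes: 1, 3, then 7.
    assert row.endswith("BEGIN" + "'" * (2 ** depth - 1))

# At depth 3 the opening run is four quotes and the closing run seven,
# matching the ''''BEGIN''''''' rows in the regression output above.
print(row)
```

This is why the deepest rows in the 14-row output carry runs of four opening and seven closing quotes around BEGIN, COMMIT, and the somechange row.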