Thread: Another logical decoding assertion failure

Another logical decoding assertion failure

From: Antonin Houska
Date: 2014-08-15 14:53:45 +0200

http://www.postgresql.org/message-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
reminds me that I also saw an assertion failure recently. Although I
wanted to isolate and report my issue once my vacation was over, this
report made me curious whether I had seen the same thing. In the end it
seems to be a different symptom.

Following are the steps that trigger the failure in my environment
(9.5devel), although it's not always that straightforward - sometimes I
need to create and populate one more table. (I use the 'test_decoding'
contrib module.)


postgres=# SELECT pg_create_logical_replication_slot('my_slot',
'test_decoding');
 pg_create_logical_replication_slot
------------------------------------
 (my_slot,0/16F3B30)
(1 row)

postgres=# CREATE TABLE a AS SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
SELECT 0
postgres=# INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);
INSERT 0 2
postgres=# INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);
The connection to the server was lost. Attempting reset: Failed.
!>

The stack trace (should I upload / send the whole core file anywhere?):

(gdb) bt
#0  0x00007f12d0640849 in raise () from /lib64/libc.so.6
#1  0x00007f12d0641cd8 in abort () from /lib64/libc.so.6
#2  0x00000000008c2d51 in ExceptionalCondition (conditionName=0xa926a0
"!(((bool)((relation)->rd_refcnt == 0)))",   errorType=0xa92210 "FailedAssertion", fileName=0xa92104
"relcache.c", lineNumber=1892) at assert.c:54
#3  0x00000000008b2776 in RelationDestroyRelation
(relation=0x7f12c766d240, remember_tupdesc=0 '\000') at relcache.c:1892
#4  0x00000000008b2c28 in RelationClearRelation
(relation=0x7f12c766d240, rebuild=1 '\001') at relcache.c:2106
#5  0x00000000008b2fa3 in RelationFlushRelation
(relation=0x7f12c766d240) at relcache.c:2204
#6  0x00000000008b30b5 in RelationCacheInvalidateEntry
(relationId=16384) at relcache.c:2256
#7  0x00000000008abc65 in LocalExecuteInvalidationMessage
(msg=0x28c1260) at inval.c:567
#8  0x000000000073e1f5 in ReorderBufferExecuteInvalidations
(rb=0x28b0318, txn=0x28b0430) at reorderbuffer.c:1798
#9  0x000000000073ddbf in ReorderBufferForget (rb=0x28b0318, xid=1279,
lsn=24154944) at reorderbuffer.c:1645
#10 0x00000000007383f8 in DecodeCommit (ctx=0x2894658,
buf=0x7fff3e658c30, xid=1279, dboid=12736, commit_time=461418357793855,
nsubxacts=0, sub_xids=0x28af650, ninval_msgs=69, msgs=0x28af650) at
decode.c:539
#11 0x0000000000737c91 in DecodeXactOp (ctx=0x2894658,
buf=0x7fff3e658c30) at decode.c:207
#12 0x00000000007379ce in LogicalDecodingProcessRecord (ctx=0x2894658,
record=0x28af610) at decode.c:103
#13 0x000000000073b0d6 in pg_logical_slot_get_changes_guts
(fcinfo=0x7fff3e658f50, confirm=1 '\001', binary=0 '\000') at
logicalfuncs.c:432
#14 0x000000000073b221 in pg_logical_slot_get_changes
(fcinfo=0x7fff3e658f50) at logicalfuncs.c:478
#15 0x000000000063e1a3 in ExecMakeTableFunctionResult
(funcexpr=0x288be78, econtext=0x288b758, argContext=0x28b5580,   expectedDesc=0x288ccd8, randomAccess=0 '\000') at
execQual.c:2157
#16 0x000000000065e302 in FunctionNext (node=0x288ba18) at
nodeFunctionscan.c:95
#17 0x000000000064537e in ExecScanFetch (node=0x288ba18,
accessMtd=0x65e24b <FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>)   at execScan.c:82
#18 0x00000000006453ed in ExecScan (node=0x288ba18, accessMtd=0x65e24b
<FunctionNext>, recheckMtd=0x65e630 <FunctionRecheck>)   at execScan.c:132
#19 0x000000000065e665 in ExecFunctionScan (node=0x288ba18) at
nodeFunctionscan.c:269
#20 0x000000000063a649 in ExecProcNode (node=0x288ba18) at
execProcnode.c:426
#21 0x000000000065ca53 in ExecModifyTable (node=0x288b8c0) at
nodeModifyTable.c:926
#22 0x000000000063a577 in ExecProcNode (node=0x288b8c0) at
execProcnode.c:377
#23 0x0000000000638465 in ExecutePlan (estate=0x288b518,
planstate=0x288b8c0, operation=CMD_INSERT, sendTuples=0 '\000',
numberTuples=0,   direction=ForwardScanDirection, dest=0x2815ac0) at execMain.c:1475
#24 0x000000000063667a in standard_ExecutorRun (queryDesc=0x288f948,
direction=ForwardScanDirection, count=0) at execMain.c:308
#25 0x0000000000636512 in ExecutorRun (queryDesc=0x288f948,
direction=ForwardScanDirection, count=0) at execMain.c:256
#26 0x00000000007998ec in ProcessQuery (plan=0x28159c8,   sourceText=0x2854d18 "INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);", params=0x0,
dest=0x2815ac0,   completionTag=0x7fff3e659920 "") at pquery.c:185
#27 0x000000000079b16e in PortalRunMulti (portal=0x2890638, isTopLevel=1
'\001', dest=0x2815ac0, altdest=0x2815ac0,   completionTag=0x7fff3e659920 "") at pquery.c:1279
#28 0x000000000079a7b1 in PortalRun (portal=0x2890638,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2815ac0,
altdest=0x2815ac0,   completionTag=0x7fff3e659920 "") at pquery.c:816
#29 0x000000000079487b in exec_simple_query (   query_string=0x2854d18 "INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);") at postgres.c:1045
#30 0x00000000007989ec in PostgresMain (argc=1, argv=0x27eaab0,
dbname=0x27ea910 "postgres", username=0x27ea8f0 "anton") at postgres.c:4010
#31 0x0000000000721da1 in BackendRun (port=0x280c630) at postmaster.c:4113
#32 0x00000000007214de in BackendStartup (port=0x280c630) at
postmaster.c:3787
#33 0x000000000071de16 in ServerLoop () at postmaster.c:1566


// Tony



Re: Another logical decoding assertion failure

From: Andres Freund
Date: 2014-08-15 15:16
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
> http://www.postgresql.org/message-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
> reminds me that I also saw an assertion failure recently. Although I
> wanted to isolate and report my issue once my vacation was over, this
> report made me curious whether I had seen the same thing. In the end it
> seems to be a different symptom.

That's something separate, yes.

> Following are the steps that trigger the failure in my environment
> (9.5devel), although it's not always that straightforward - sometimes I
> need to create and populate one more table. (I use the 'test_decoding'
> contrib module.)
> 
> 
> postgres=# SELECT pg_create_logical_replication_slot('my_slot',
> 'test_decoding');
>  pg_create_logical_replication_slot
> ------------------------------------
>  (my_slot,0/16F3B30)
> (1 row)
> 
> postgres=# CREATE TABLE a AS SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
> SELECT 0
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> INSERT 0 2
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> The connection to the server was lost. Attempting reset: Failed.
> !>

Is this just to reproduce the issue, or are you really storing the
results of logical decoding in a table again?

Why? That'll just result in circularity, no?  It's not something I
really thought about allowing when writing this. Possibly we can make it
work, but I don't really see the point. We obviously shouldn't just
crash, but that's not my point.
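
The circularity is easy to see with a back-of-the-envelope count. The
following toy model (plain Python, not PostgreSQL code; the loop and row
counts are my own simplification, not the actual decoding machinery)
assumes each get_changes() call decodes the previous capture transaction
as BEGIN, one row per tuple inserted last round, and COMMIT, and that
all of those rows land back in the logged table:

```python
# Hypothetical model of capturing decoded changes into a logged table:
# every call returns two more rows than the previous one did.
rows_inserted = 1       # the original change in an ordinary user table
rows_per_call = []
for _ in range(4):
    decoded = rows_inserted + 2   # BEGIN + one row per tuple + COMMIT
    rows_per_call.append(decoded)
    rows_inserted = decoded       # re-inserted into the logged table

print(rows_per_call)  # [3, 5, 7, 9]
```

So the feedback loop never converges: the table's own contents keep
feeding the next round of decoding.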

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Another logical decoding assertion failure

From: Antonin Houska
Date:
On 08/15/2014 03:16 PM, Andres Freund wrote:
> On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
>> postgres=# SELECT pg_create_logical_replication_slot('my_slot',
>> 'test_decoding');
>>  pg_create_logical_replication_slot
>> ------------------------------------
>>  (my_slot,0/16F3B30)
>> (1 row)
>>
>> postgres=# CREATE TABLE a AS SELECT * FROM
>> pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
>> SELECT 0
>> postgres=# INSERT INTO a SELECT * FROM
>> pg_logical_slot_get_changes('my_slot', NULL, NULL);
>> INSERT 0 2
>> postgres=# INSERT INTO a SELECT * FROM
>> pg_logical_slot_get_changes('my_slot', NULL, NULL);
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
> 
> Is this just to reproduce the issue, or are you really storing the
> results of logical decoding in a table again?
> 
> Why? That'll just result in circularity, no?  It's not something I
> really thought about allowing when writing this. Possibly we can make it
> work, but I don't really see the point. We obviously shouldn't just
> crash, but that's not my point.

It was basically an experiment: I tried to capture changes into a table.
I don't know whether it's legal (i.e. whether the current call of
pg_logical_slot_get_changes() should include changes that the current
transaction made). An unlogged / temporary table may be necessary for a
case like this.

The reason I report it is that, if such a use case is not legal, it
should (I think) be detected somehow and handled using ereport / elog.

// Tony




Re: Another logical decoding assertion failure

From: Andres Freund
Date:
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
> http://www.postgresql.org/message-id/BLU436-SMTP12682D628F61AB9736099C3DCE80@phx.gbl
> reminds me that I also saw an assertion failure recently. Although I
> wanted to isolate and report my issue once my vacation was over, this
> report made me curious whether I had seen the same thing. In the end it
> seems to be a different symptom.
> 
> Following are the steps that trigger the failure in my environment
> (9.5devel), although it's not always that straightforward - sometimes I
> need to create and populate one more table. (I use the 'test_decoding'
> contrib module.)
> 
> 
> postgres=# SELECT pg_create_logical_replication_slot('my_slot',
> 'test_decoding');
>  pg_create_logical_replication_slot
> ------------------------------------
>  (my_slot,0/16F3B30)
> (1 row)
> 
> postgres=# CREATE TABLE a AS SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
> SELECT 0
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> INSERT 0 2
> postgres=# INSERT INTO a SELECT * FROM
> pg_logical_slot_get_changes('my_slot', NULL, NULL);
> The connection to the server was lost. Attempting reset: Failed.
> !>

This was actually triggered by a bug in logical decoding. I've pushed a
fix. Thanks for the report!

I still think it's a bad idea to do something like the above, but even
after a fair amount of thinking I can't come up with a case that's an
actual problem. So I think we can just allow it.
Obviously it's not a good idea to use an output plugin without
filtering, and a logged table, for this - each subsequent get_changes()
will otherwise contain the rows inserted into the table in the previous
round. The attached regression test contains pretty output demonstrating
that:

BEGIN
table public.changeresult: INSERT: data[text]:'BEGIN'
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.somechange: INSERT: id[integer]:1'''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''COMMIT'''
table public.changeresult: INSERT: data[text]:'COMMIT'
table public.changeresult: INSERT: data[text]:'BEGIN'
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''BEGIN'''''''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''table public.somechange: INSERT: id[integer]:1'''''''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''table public.changeresult: INSERT: data[text]:''''COMMIT'''''''
table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''COMMIT'''
table public.changeresult: INSERT: data[text]:'COMMIT'
COMMIT
(14 rows)
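
The nesting in that output comes from test_decoding quoting each text
value and doubling embedded quotes, once per round of re-capture. The
sketch below (plain Python, not PostgreSQL code; `quoted`,
`decode_insert`, and `capture_round` are my own hypothetical helpers
that merely mimic the output format shown above) reproduces the
quote-doubling for two rounds:

```python
def quoted(value: str) -> str:
    # test_decoding-style text literal: wrap in single quotes,
    # doubling any embedded quote characters
    return "'" + value.replace("'", "''") + "'"

def decode_insert(table: str, value: str) -> str:
    # One decoded INSERT line, in the format test_decoding prints
    return f"table public.{table}: INSERT: data[text]:{quoted(value)}"

def capture_round(previous_rows):
    # INSERT INTO changeresult SELECT * FROM pg_logical_slot_get_changes(...)
    # stores the decoded rows; decoding *that* transaction on the next call
    # yields BEGIN, one re-quoted line per stored row, COMMIT.
    return (["BEGIN"]
            + [decode_insert("changeresult", row) for row in previous_rows]
            + ["COMMIT"])

# Transaction 0: a single insert into an ordinary user table.
txn0 = ["BEGIN", "table public.somechange: INSERT: id[integer]:1", "COMMIT"]

round1 = capture_round(txn0)    # 5 rows, values quoted once
round2 = capture_round(round1)  # 7 rows, inner values quoted twice

print(round2[2])
# table public.changeresult: INSERT: data[text]:'table public.changeresult: INSERT: data[text]:''BEGIN'''
```

Each round adds a quoting level, which is why the literals in the
regression output nest deeper and deeper.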

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services