Re: logical decoding bug: segfault in ReorderBufferToastReplace() - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Msg-id 20191205133836.lulisfo2fphdrjp3@development
In response to logical decoding bug: segfault in ReorderBufferToastReplace()  (Jeremy Schneider <schnjere@amazon.com>)
Responses Re: logical decoding bug: segfault in ReorderBufferToastReplace()
List pgsql-bugs
On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote:
>On 9/8/19 14:01, Tom Lane wrote:
>> Fix RelationIdGetRelation calls that weren't bothering with error checks.
>>
>> ...
>>
>> Details
>> -------
>> https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56
>
>We had two different databases this week (with the same schema) both
>independently hit the condition of this recent commit from Tom. It's on
>11.5 so we're actually segfaulting and restarting rather than just
>causing the walsender process to ERROR, but regardless there's still
>some underlying bug here.
>
>We have core files and we're still working to see if we can figure out
>what's going on, but I thought I'd report now in case anyone has extra
>ideas or suggestions.  The segfault is on line 3034 of reorderbuffer.c.
>
>https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034
>
>3033     toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
>3034     toast_desc = RelationGetDescr(toast_rel);
>
>We'll keep looking; let me know any feedback! Would love to track down
>whatever bug is in the logical decoding code, if that's what it is.
>
>==========
>
>backtrace showing the call stack...
>
>Core was generated by `postgres: walsender <NAME-REDACTED>
><DNS-REDACTED>(31712)'.
>Program terminated with signal 11, Segmentation fault.
>#0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>    at reorderbuffer.c:3034
>3034    reorderbuffer.c: No such file or directory.
>...
>(gdb) #0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>    at reorderbuffer.c:3034
>#1  ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809,
>commit_lsn=9430473346032, end_lsn=<optimized out>,
>    commit_time=commit_time@entry=628712466364268,
>origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
>reorderbuffer.c:1584
>#2  0x0000000000716248 in DecodeCommit (xid=1358809,
>parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
>#3  DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at
>decode.c:245
>#4  0x000000000071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
>record=0x3069208) at decode.c:117
>#5  0x0000000000727150 in XLogSendLogical () at walsender.c:2886
>#6  0x0000000000729192 in WalSndLoop (send_data=send_data@entry=0x7270f0
><XLogSendLogical>) at walsender.c:2249
>#7  0x0000000000729f91 in StartLogicalReplication (cmd=0x30485a0) at
>walsender.c:1111
>#8  exec_replication_command (
>    cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT
>\"<NAME-REDACTED>\" LOGICAL 893/38002B98 (proto_version '1',
>publication_names '\"<NAME-REDACTED>\"')") at walsender.c:1628
>#9  0x000000000076e939 in PostgresMain (argc=<optimized out>,
>argv=argv@entry=0x2fea168, dbname=0x2fea020 "<NAME-REDACTED>",
>    username=<optimized out>) at postgres.c:4182
>#10 0x00000000004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
>#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
>#12 ServerLoop () at postmaster.c:1759
>#13 0x00000000007062f9 in PostmasterMain (argc=argc@entry=7,
>argv=argv@entry=0x2f92540) at postmaster.c:1432
>#14 0x00000000004be73b in main (argc=7, argv=0x2f92540) at main.c:228
>
>==========
>
>Some additional context...
>
># select * from pg_publication_rel;
> prpubid | prrelid
>---------+---------
>   71417 |   16453
>   71417 |   54949
>(2 rows)
>
>(gdb) print toast_rel
>$4 = (struct RelationData *) 0x0
>
>(gdb) print *relation->rd_rel
>$11 = {relname = {data = "<NAME-REDACTED>", '\000' <repeats 44 times>},
>relnamespace = 16402, reltype = 16430, reloftype = 0,
>relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
>relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,

Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST
relation. Yet we're calling ReorderBufferToastReplace on the decoded
record ... interesting.

Can you share the structure of the relation causing the issue?
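
To spell out the mismatch: with reltoastrelid = 0 the lookup at line 3033
comes back NULL (as gdb shows for toast_rel), and line 3034 then
dereferences it. Below is a minimal standalone sketch of that failure mode
and of the kind of check that turns the crash into a reportable error. It
is not PostgreSQL code: MockRelation, mock_catalog, mock_relation_lookup,
mock_toast_replace and the OIDs 1001-1003 are hypothetical stand-ins made
up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, stripped-down stand-in for a relcache entry. */
typedef struct MockRelation
{
    Oid relid;          /* OID of this relation */
    Oid reltoastrelid;  /* OID of its TOAST relation, 0 if it has none */
} MockRelation;

/* Made-up catalog contents; the OIDs are illustrative only. */
static const MockRelation mock_catalog[] = {
    {1001, 1002},        /* table that has a TOAST relation */
    {1002, InvalidOid},  /* the TOAST relation itself */
    {1003, InvalidOid},  /* table without a TOAST relation */
};

typedef enum ToastReplaceStatus
{
    TOAST_REPLACE_OK,
    TOAST_REPLACE_NO_TOAST_REL
} ToastReplaceStatus;

/* Returns NULL when the OID is invalid or unknown, like a failed lookup. */
const MockRelation *
mock_relation_lookup(Oid relid)
{
    size_t i;

    if (relid == InvalidOid)
        return NULL;

    for (i = 0; i < sizeof(mock_catalog) / sizeof(mock_catalog[0]); i++)
    {
        if (mock_catalog[i].relid == relid)
            return &mock_catalog[i];
    }
    return NULL;
}

/* Checks the lookup result before using it instead of dereferencing blindly. */
ToastReplaceStatus
mock_toast_replace(const MockRelation *relation)
{
    const MockRelation *toast_rel =
        mock_relation_lookup(relation->reltoastrelid);

    if (toast_rel == NULL)
    {
        fprintf(stderr,
                "could not open toast relation with OID %u (base relation %u)\n",
                relation->reltoastrelid, relation->relid);
        return TOAST_REPLACE_NO_TOAST_REL;
    }

    /* From here on toast_rel is safe to dereference. */
    return TOAST_REPLACE_OK;
}
```

In this sketch, calling mock_toast_replace() on the entry with relid 1003
returns TOAST_REPLACE_NO_TOAST_REL instead of crashing. A check like this
only hides the symptom, though. The open question is still why a change
with toasted data is attributed to a relation that has no TOAST relation.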


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


