Re: logical decoding bug: segfault in ReorderBufferToastReplace() - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: logical decoding bug: segfault in ReorderBufferToastReplace() |
Date | |
Msg-id | 20191205133836.lulisfo2fphdrjp3@development |
In response to | logical decoding bug: segfault in ReorderBufferToastReplace() (Jeremy Schneider <schnjere@amazon.com>) |
Responses |
Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Re: logical decoding bug: segfault in ReorderBufferToastReplace() |
List | pgsql-bugs |
On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote:
>On 9/8/19 14:01, Tom Lane wrote:
>> Fix RelationIdGetRelation calls that weren't bothering with error checks.
>>
>> ...
>>
>> Details
>> -------
>> https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56
>
>We had two different databases this week (with the same schema) both
>independently hit the condition of this recent commit from Tom. It's on
>11.5 so we're actually segfaulting and restarting rather than just
>causing the walsender process to ERROR, but regardless there's still
>some underlying bug here.
>
>We have core files and we're still working to see if we can figure out
>what's going on, but I thought I'd report now in case anyone has extra
>ideas or suggestions. The segfault is on line 3034 of reorderbuffer.c.
>
>https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034
>
>3033     toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
>3034     toast_desc = RelationGetDescr(toast_rel);
>
>We'll keep looking; let me know any feedback! Would love to track down
>whatever bug is in the logical decoding code, if that's what it is.
>
>==========
>
>backtrace showing the call stack...
>
>Core was generated by `postgres: walsender <NAME-REDACTED>
><DNS-REDACTED>(31712)'.
>Program terminated with signal 11, Segmentation fault.
>#0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>    at reorderbuffer.c:3034
>3034    reorderbuffer.c: No such file or directory.
>...
>(gdb) #0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>    at reorderbuffer.c:3034
>#1  ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809,
>commit_lsn=9430473346032, end_lsn=<optimized out>,
>    commit_time=commit_time@entry=628712466364268,
>origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
>reorderbuffer.c:1584
>#2  0x0000000000716248 in DecodeCommit (xid=1358809,
>parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
>#3  DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at
>decode.c:245
>#4  0x000000000071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
>record=0x3069208) at decode.c:117
>#5  0x0000000000727150 in XLogSendLogical () at walsender.c:2886
>#6  0x0000000000729192 in WalSndLoop (send_data=send_data@entry=0x7270f0
><XLogSendLogical>) at walsender.c:2249
>#7  0x0000000000729f91 in StartLogicalReplication (cmd=0x30485a0) at
>walsender.c:1111
>#8  exec_replication_command (
>    cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT
>\"<NAME-REDACTED>\" LOGICAL 893/38002B98 (proto_version '1',
>publication_names '\"<NAME-REDACTED>\"')") at walsender.c:1628
>#9  0x000000000076e939 in PostgresMain (argc=<optimized out>,
>argv=argv@entry=0x2fea168, dbname=0x2fea020 "<NAME-REDACTED>",
>    username=<optimized out>) at postgres.c:4182
>#10 0x00000000004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
>#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
>#12 ServerLoop () at postmaster.c:1759
>#13 0x00000000007062f9 in PostmasterMain (argc=argc@entry=7,
>argv=argv@entry=0x2f92540) at postmaster.c:1432
>#14 0x00000000004be73b in main (argc=7, argv=0x2f92540) at main.c:228
>
>==========
>
>Some additional context...
>
># select * from pg_publication_rel;
> prpubid | prrelid
>---------+---------
>   71417 |   16453
>   71417 |   54949
>(2 rows)
>
>(gdb) print toast_rel
>$4 = (struct RelationData *) 0x0
>
>(gdb) print *relation->rd_rel
>$11 = {relname = {data = "<NAME-REDACTED>", '\000' <repeats 44 times>},
>relnamespace = 16402, reltype = 16430, reloftype = 0,
>relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
>relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,

Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST
relation. Yet we're calling ReorderBufferToastReplace on the decoded
record ... interesting.

Can you share structure of the relation causing the issue?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
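To make the failure mode discussed above concrete: with reltoastrelid = 0 the lookup on line 3033 yields NULL (the gdb output shows toast_rel = 0x0), and line 3034 then dereferences it. The stand-alone C sketch below models that sequence together with the kind of error check Tom's commit describes. It is only an illustration under stated assumptions: MockRelation, mock_relcache, mock_relation_id_get_relation and mock_toast_natts are hypothetical names invented for this sketch, the OIDs are made up, the error text is illustrative, and none of it is the actual PostgreSQL source.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for PostgreSQL types; NOT the real definitions. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct MockRelation
{
    Oid relid;          /* OID of this relation (made-up values) */
    Oid reltoastrelid;  /* OID of its TOAST relation, 0 if it has none */
    int natts;          /* stand-in for the tuple descriptor */
} MockRelation;

/* Hypothetical three-entry "relcache" used only by this sketch. */
static MockRelation mock_relcache[] = {
    {1001, InvalidOid, 2},  /* table without a TOAST relation */
    {1002, 1003, 2},        /* table with a TOAST relation */
    {1003, InvalidOid, 3},  /* the TOAST relation of 1002 */
};

/* Models a lookup that returns NULL when no relation matches the OID. */
static MockRelation *
mock_relation_id_get_relation(Oid relid)
{
    size_t i;

    if (relid == InvalidOid)
        return NULL;
    for (i = 0; i < sizeof(mock_relcache) / sizeof(mock_relcache[0]); i++)
    {
        if (mock_relcache[i].relid == relid)
            return &mock_relcache[i];
    }
    return NULL;
}

/*
 * Checked variant of the two lines at 3033/3034: report a failure instead
 * of dereferencing a NULL relation pointer.  Returns 0 on success and -1
 * if the TOAST relation cannot be opened; *natts_out is left untouched
 * on failure.
 */
static int
mock_toast_natts(const MockRelation *relation, int *natts_out)
{
    MockRelation *toast_rel =
        mock_relation_id_get_relation(relation->reltoastrelid);

    if (toast_rel == NULL)
    {
        fprintf(stderr,
                "could not open toast relation with OID %u (base relation %u)\n",
                relation->reltoastrelid, relation->relid);
        return -1;
    }
    *natts_out = toast_rel->natts;
    return 0;
}
```

Calling mock_toast_natts() on the first cache entry (no TOAST relation) takes the error path rather than crashing, which corresponds to the walsender raising an ERROR instead of segfaulting. The check does not explain why ReorderBufferToastReplace is reached for such a relation in the first place, which is the open question in this thread.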