Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker |
Date | |
Msg-id | 20191121220854.ebdzjtloel56sfwu@development Whole thread Raw |
In response to | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker |
List | pgsql-bugs |
On Thu, Nov 21, 2019 at 10:41:15PM +0100, Tomas Vondra wrote: >On Thu, Nov 21, 2019 at 07:58:44PM +0100, Ondřej Jirman wrote: >>On Thu, Nov 21, 2019 at 06:35:55PM +0100, Ondřej Jirman wrote: >>>One missing piece is what exactly is the contents of the outstanding output from >>>pgoutput plugin, that the replica crashes on and doesn't apply. Are there any >>>tools for inspecting the binary output from the pgoutput plugin? Maybe that can >>>provide a clue. >> >>So I've looked at it manually, and the segfaulting transaction doesn't make >>much sense to me. On primary a row with 37880 byte cover_image was inserted, >>but pgoutput plugin sends these records: >> >>B >>R - public.videos title cover_image metadata,... >>U - new row where cover_image tuple doesn't have any data, just a flag ('u') >>C >> >>Which means 'unchanged toast column' according to logical/proto.c >> >>Yet cover_image is defined as not null. >> >>I guess it's some special handling for VARATT_IS_EXTERNAL_ONDISK. > >Actually, no. The second message is 'U' so update, not insert. And we >don't send TOAST values (oversized values, stored aside) that were not >updated. Which is this case, because the cover_image was not updated (as >implied by the 'u' flag). > >For the record, the decoded data looks like this: > >1) relation structure (8 columns) > >R, 19048, public, videos, d, 8, > (true, id, 23 => int4, 0xffffffff) > (false, title, 25 => text, 0xffffffff) > (false, cover_image, 17 => bytea, 0xffffffff) > (false, metadata, 3802 => jsonb, 0xffffffff) > (false, category, 23 => int4, 0xffffffff) > (false, published, 1082 => date, 0xffffffff) > (false, added, 1114 => timestamp, 0xffffffff) > (false, played, 16 => uuid, 0xffffffff) > >2) update, with the TOAST value undefined/unchanged > >U, 19048, N, 8, > (4, 8393), > (86, Pazderka: O média se povede boj. Kdyby nás politici milovali, bylo by něco špatně), > u, /* unchanged bytea */ > (313, {.. json ..}), > (3, 242), > (10, 2019-11-18), > (26, 2019-11-18 12:50:01.816376), > (1, t) > >So it's perfectly well formatted, I think. > >> >>To me this looks like this transaction is assuming data for bytea column >>cover_image are already replicated from earlier? >> >>Maybe it is not though? It's certainly breaking some assumption on the >>replica, because this is the precise point where segfault happens. >> > >I think this more and more seems like some sort of bug in the apply >code, not resetting a NULL flag and interpreting the bogus value as >a valid pointer. Wouldn't the be the first such bug. > FWIW my hunch is the bug is somewhere in this chunk of code from apply_heap_update: if (found) { /* Process and store remote tuple in the slot */ oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); ExecCopySlot(remoteslot, localslot); slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed); MemoryContextSwitchTo(oldctx); EvalPlanQualSetSlot(&epqstate, remoteslot); /* Do the actual update. */ ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot); } Can you please switch to this fram in the backtrace and: 1) Print newtup.values and newtup.changed matches, so that we can check it contains the tuple I shared before? There should be 8 elements in each, with 'values' being values formatted as text, and 'changed' being flags which values were updated. 2) Print remoteslot->tts_tupleDescriptor. 3) Print rel->attrmap (should have 8 elements) 4) Print remoteslot->tts_values and remoteslot->tts_isnull (8 elements). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-bugs by date: