Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Date
Msg-id 20191121220854.ebdzjtloel56sfwu@development
Whole thread Raw
In response to Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
List pgsql-bugs
On Thu, Nov 21, 2019 at 10:41:15PM +0100, Tomas Vondra wrote:
>On Thu, Nov 21, 2019 at 07:58:44PM +0100, Ondřej Jirman wrote:
>>On Thu, Nov 21, 2019 at 06:35:55PM +0100, Ondřej Jirman wrote:
>>>One missing piece is what exactly is the contents of the outstanding output from
>>>pgoutput plugin, that the replica crashes on and doesn't apply. Are there any
>>>tools for inspecting the binary output from the pgoutput plugin? Maybe that can
>>>provide a clue.
>>
>>So I've looked at it manually, and the segfaulting transaction doesn't make
>>much sense to me. On primary a row with 37880 byte cover_image was inserted,
>>but pgoutput plugin sends these records:
>>
>>B
>>R - public.videos  title cover_image metadata,...
>>U - new row where cover_image tuple doesn't have any data, just a flag ('u')
>>C
>>
>>Which means 'unchanged toast column' according to logical/proto.c
>>
>>Yet cover_image is defined as not null.
>>
>>I guess it's some special handling for VARATT_IS_EXTERNAL_ONDISK.
>
>Actually, no. The second message is 'U' so update, not insert. And we
>don't send TOAST values (oversized values, stored aside) that were not
>updated. Which is this case, because the cover_image was not updated (as
>implied by the 'u' flag).
>
>For the record, the decoded data looks like this:
>
>1) relation structure (8 columns)
>
>R, 19048, public, videos, d, 8,
>  (true, id, 23 => int4, 0xffffffff)
>  (false, title, 25 => text, 0xffffffff)
>  (false, cover_image, 17 => bytea, 0xffffffff)
>  (false, metadata, 3802 => jsonb, 0xffffffff)
>  (false, category, 23 => int4, 0xffffffff)
>  (false, published, 1082 => date, 0xffffffff)
>  (false, added, 1114 => timestamp, 0xffffffff)
>  (false, played, 16 => uuid, 0xffffffff)
>
>2) update, with the TOAST value undefined/unchanged
>
>U, 19048, N, 8,
>  (4, 8393),
>  (86, Pazderka: O média se povede boj. Kdyby nás politici milovali, bylo by něco špatně),
>  u, /* unchanged bytea */
>  (313, {.. json ..}),
>  (3, 242),
>  (10, 2019-11-18),
>  (26, 2019-11-18 12:50:01.816376),
>  (1, t)
>
>So it's perfectly well formatted, I think.
>
>>
>>To me this looks like this transaction is assuming data for bytea column
>>cover_image are already replicated from earlier?
>>
>>Maybe it is not though? It's certainly breaking some assumption on the
>>replica, because this is the precise point where segfault happens.
>>
>
>I think this more and more seems like some sort of bug in the apply
>code, not resetting a NULL flag and interpreting the bogus value as
>a valid pointer. Wouldn't the be the first such bug.
>

FWIW my hunch is the bug is somewhere in this chunk of code from
apply_heap_update:

     if (found)
     {
         /* Process and store remote tuple in the slot */
         oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
         ExecCopySlot(remoteslot, localslot);
         slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);
         MemoryContextSwitchTo(oldctx);

         EvalPlanQualSetSlot(&epqstate, remoteslot);

         /* Do the actual update. */
         ExecSimpleRelationUpdate(estate, &epqstate, localslot, remoteslot);
     }

Can you please switch to this fram in the backtrace and:

1) Print newtup.values and newtup.changed matches, so that we can check
it contains the tuple I shared before? There should be 8 elements in
each, with 'values' being values formatted as text, and 'changed' being
flags which values were updated.

2) Print remoteslot->tts_tupleDescriptor.

3) Print rel->attrmap (should have 8 elements)

4) Print remoteslot->tts_values and remoteslot->tts_isnull (8 elements).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Next
From: Tom Lane
Date:
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker