Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Date
Msg-id 20191121234701.2wtyc77zd5ux3myf@development
Whole thread Raw
In response to Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
List pgsql-bugs
On Thu, Nov 21, 2019 at 05:37:25PM -0500, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> FWIW my hunch is the bug is somewhere in this chunk of code from
>> apply_heap_update:
>
>>          oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>>          ExecCopySlot(remoteslot, localslot);
>>          slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);
>>          MemoryContextSwitchTo(oldctx);
>
>My first reaction to looking at this was that ExecCopySlot() could
>be dropped entirely, because the first thing that slot_modify_cstrings
>does is
>
>    slot_getallattrs(slot);
>    ExecClearTuple(slot);
>
>That slot_getallattrs call seems 100% useless as well, because
>surely ExecClearTuple is just going to drop all the data.
>

I don't think that's quite true. After the ExecCopySlot call, the
pass-by-ref Datums in remoteslot will point to a tuple attached to
localslot. But it does not pass the tuple 'ownership' to the remoteslot,
i.e. the flag TTS_FLAG_SHOULDFREE won't be set, i.e. the tuple won't be
freed.

>On closer inspection, though, it looks like the author of this
>code is relying on ExecClearTuple to *not* destroy the data,
>because the subsequent loop might overwrite only some of the
>slot's columns, before it does
>
>    ExecStoreVirtualTuple(slot);
>
>which supposes that all the columns are valid.
>
>But of course this is 100% broken, because whatever ExecClearTuple
>may have done or not done to the slot's datum/isnull arrays, it
>certainly pfree'd the slot's physical tuple.  So any pass-by-ref
>datums are now dangling pointers.
>
>I imagine the only reason this code has gotten past the valgrind
>animals is that we're not testing cases where non-replaced columns
>in the subscriber table are of pass-by-ref types.
>

I haven't checked, but I'd imagine we actually do such tests. I've
however tried to reproduce this, unsuccessfully.

>I draw a direct line between the existence of this bug and the lack
>of commentary in slot_modify_cstrings about what it's doing.  If the
>author had troubled to write a comment explaining these machinations,
>he might've noticed the bug, or at least reviewers might have.
>

Not sure. More comments would not hurt, but I don't see any bug yet.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-bugs by date:

Previous
From: mayur
Date:
Subject: Re: BUG #16130: planner does not pick unique btree index and goes for seq scan but unsafe hash index works.
Next
From: Tom Lane
Date:
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker