Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> FWIW my hunch is the bug is somewhere in this chunk of code from
> apply_heap_update:
> oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> ExecCopySlot(remoteslot, localslot);
> slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);
> MemoryContextSwitchTo(oldctx);
My first reaction to looking at this was that ExecCopySlot() could
be dropped entirely, because the first thing that slot_modify_cstrings
does is
slot_getallattrs(slot);
ExecClearTuple(slot);
That slot_getallattrs call seems 100% useless as well, because
surely ExecClearTuple is just going to drop all the data.
On closer inspection, though, it looks like the author of this
code is relying on ExecClearTuple to *not* destroy the data,
because the subsequent loop might overwrite only some of the
slot's columns, before it does
ExecStoreVirtualTuple(slot);
which supposes that all the columns are valid.
But of course this is 100% broken, because whatever ExecClearTuple
may have done or not done to the slot's datum/isnull arrays, it
certainly pfree'd the slot's physical tuple. So any pass-by-ref
datums are now dangling pointers.
I imagine the only reason this code has gotten past the valgrind
animals is that we're not testing cases where non-replaced columns
in the subscriber table are of pass-by-ref types.
I draw a direct line between the existence of this bug and the lack
of commentary in slot_modify_cstrings about what it's doing. If the
author had troubled to write a comment explaining these machinations,
he might've noticed the bug, or at least reviewers might have.
regards, tom lane