Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date
Msg-id 26015.1574441403@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker  (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>)
List pgsql-bugs
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Fri, Nov 22, 2019 at 01:54:01AM +0100, Ondřej Jirman wrote:
>> On Thu, Nov 21, 2019 at 07:45:34PM -0500, Tom Lane wrote:
>>> =?utf-8?Q?Ond=C5=99ej?= Jirman <ienieghapheoghaiwida@xff.cz> writes:
>>>> newtup.changed
>>>> {true, true, false, true, true, true, true, true, false <repeats 1656 times>}

>>> So column 3 is not getting replaced.

> Can you show us the attribute list as defined in the system, including
> e.g. dropped columns? That is, something like
>    SELECT attnum, attname, atttypid FROM pg_attribute
>     WHERE attrelid = 'public.videos'::regclass;
> both from the published and subscriber.

After digging around a bit more in the logrep code, it seems like we'd
only have a situation with newtup.changed[i] = false if the publisher
had hit this bit in logicalrep_write_tuple:

        else if (att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(values[i]))
        {
            pq_sendbyte(out, 'u');    /* unchanged toast column */
            continue;
        }

So this seems to indicate that Ondřej's case involves an update on a row
that contained a toasted-out-of-line column that did not get updated.
That's different from the trigger conditions that I postulated, but the
end result is the same: slot_modify_cstrings would have a case where
it's supposed to retain the old data for a pass-by-ref column, and it'd
fail to do that correctly.

I also discovered that before v12, the calling code looked like this:

        ExecStoreTuple(localslot->tts_tuple, remoteslot, InvalidBuffer, false);
        slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);

so that at that time, "remoteslot" indeed did not have its own local
copy of the tuple, and thus the code failed to fail.  Perhaps this was
intentional on the part of whoever wrote this, but it's still an
unacceptable abuse of the API if you ask me.

I added some tests involving dropped columns to what I had last night
and pushed it.   However, looking at this now, I see that we really
still don't have enough coverage --- the code path I showed above
is not hit by the regression tests, according to the code coverage
bot.  I don't think it's really acceptable for such a corner case
bit of the logrep protocol to not get tested :-(

            regards, tom lane



pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
Next
From: Ondřej Jirman
Date:
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker