From: Andres Freund
Subject: Re: BUG #16036: Segmentation fault while doing an update
Msg-id: 20191004032020.se46diwqas43e3tw@alap3.anarazel.de
In response to: Re: BUG #16036: Segmentation fault while doing an update (Антон Власов <druidvav@gmail.com>)
List: pgsql-bugs

Hi,

On 2019-10-04 05:11:16 +0300, Антон Власов wrote:
> Reproducible schema:
> 
> create table test_table (
>     id serial, field int default 0 not null
> );
> 
> create or replace function test_table_bu() returns trigger
>     language plpgsql
> as
> $$
> begin
>   return new;
> end;
> $$;
> 
> create trigger test_table_bu
>     before update
>     on test_table
>     for each row
> execute procedure test_table_bu();
> 
> insert into test_table (field) values (0);
>
> test.sh:
> 
> psql -d gdeposylka -c "update test_table set field = field + 1 where id = 1;" &
> psql -d gdeposylka -c "update test_table set field = field + 1 where id = 1;" &

Thanks, that's very helpful!

I've started to narrow down the problem:

bool
ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
                     ResultRelInfo *relinfo,
                     ItemPointer tupleid,
                     HeapTuple fdw_trigtuple,
                     TupleTableSlot *newslot)
{
...

        /*
         * In READ COMMITTED isolation level it's possible that target tuple
         * was changed due to concurrent update.  In that case we have a raw
         * subplan output tuple in newSlot, and need to run it through the
         * junk filter to produce an insertable tuple.
         *
         * Caution: more than likely, the passed-in slot is the same as the
         * junkfilter's output slot, so we are clobbering the original value
         * of slottuple by doing the filtering.  This is OK since neither we
         * nor our caller have any more interest in the prior contents of that
         * slot.
         */
        if (newSlot != NULL)
        {
            TupleTableSlot *slot = ExecFilterJunk(relinfo->ri_junkFilter, newSlot);

            ExecCopySlot(newslot, slot);
        }

The problem is that 'newslot' and 'slot' are the same slot. Since copying
first clears the target slot, the copy leaves the slot in a bogus state:
BufferHeapTupleTableSlot->base.tuple = NULL, but TTS_FLAG_EMPTY not set and
TTS_FLAG_SHOULDFREE set.
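
To make that failure mode easier to see, here's a tiny standalone model of a
clear-then-copy routine being handed the same object as source and
destination (toy code, not PostgreSQL's; all names are made up):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* toy stand-in for a tuple table slot */
typedef struct Slot
{
    bool    empty;          /* analogous to TTS_FLAG_EMPTY */
    bool    should_free;    /* analogous to TTS_FLAG_SHOULDFREE */
    char   *tuple;          /* analogous to base.tuple */
} Slot;

static void
slot_clear(Slot *slot)
{
    if (slot->should_free)
        free(slot->tuple);
    slot->tuple = NULL;
    slot->should_free = false;
    slot->empty = true;
}

static void
slot_copy(Slot *dst, const Slot *src)
{
    slot_clear(dst);            /* if dst == src, this clears the source too! */
    dst->empty = false;
    dst->should_free = true;
    /* with dst == src, src->tuple is already NULL at this point */
    dst->tuple = src->tuple ? strdup(src->tuple) : NULL;
}

int
main(void)
{
    Slot    s = {.empty = false, .should_free = true, .tuple = strdup("42")};

    slot_copy(&s, &s);          /* self-copy, like ExecCopySlot(newslot, slot) */

    /* bogus state: not marked empty, marked should-free, but tuple is NULL */
    printf("empty=%d should_free=%d tuple=%p\n",
           s.empty, s.should_free, (void *) s.tuple);
    return 0;
}

The clear step throws away the very tuple we are about to copy, which is
exactly the inconsistent state described above.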

The only reason that doesn't run into the assert in
ExecCopySlotHeapTuple() that the slot must not be empty is that, by that
point, we have already cleared TTS_FLAG_EMPTY on the destination slot
(which is also the source slot), in tts_buffer_heap_copyslot():

        dstslot->tts_flags &= ~TTS_FLAG_EMPTY;
        oldContext = MemoryContextSwitchTo(dstslot->tts_mcxt);
        bdstslot->base.tuple = ExecCopySlotHeapTuple(srcslot);
        MemoryContextSwitchTo(oldContext);

So I think what we need to do is:
1) Only call ExecCopySlot() if slot != newslot - this fixes the crash.
2) Add an assert to ExecCopySlot() ensuring source and target slots are
   distinct (a rough sketch of 1) and 2) follows below).
3) Figure out why our tests didn't catch this; afaict this should be a
   tested scenario.
4) Fix the confusing variable naming around newSlot/newslot in
   ExecBRUpdateTriggers().
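
For 1) and 2), here's a rough, untested sketch of the direction I have in
mind (assuming ExecCopySlot() remains the static inline wrapper around
tts_ops->copyslot in tuptable.h; exact shape to be verified):

/* 1) in ExecBRUpdateTriggers(): don't copy a slot onto itself */
if (newSlot != NULL)
{
    TupleTableSlot *slot = ExecFilterJunk(relinfo->ri_junkFilter, newSlot);

    /*
     * The junk filter's output slot is typically newslot itself; only
     * copy when the two actually differ.
     */
    if (slot != newslot)
        ExecCopySlot(newslot, slot);
}

/* 2) in ExecCopySlot(): insist that source and destination are distinct */
static inline TupleTableSlot *
ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
{
    Assert(!TTS_EMPTY(srcslot));
    Assert(srcslot != dstslot);

    dstslot->tts_ops->copyslot(dstslot, srcslot);

    return dstslot;
}

Checking slot != newslot in the caller, rather than making ExecCopySlot()
silently tolerate self-copies, keeps the invariant simple: copying a slot
onto itself is never a sensible request.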


Greetings,

Andres Freund


