Re: PostgreSQL12 crash bug report - Mailing list pgsql-bugs

From Andres Freund
Subject Re: PostgreSQL12 crash bug report
Date
Msg-id 20190801004000.3xis75ulpun4ha3p@alap3.anarazel.de
Whole thread Raw
In response to Re: PostgreSQL12 crash bug report  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PostgreSQL12 crash bug report  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi,

On 2019-07-31 19:07:29 -0400, Tom Lane wrote:
> I wrote:
> > yi huang <yi.codeplayer@gmail.com> writes:
> >> [ crashes with ]
> >> $ pgbench -h 127.0.0.1 -p 5432 -U exchange exchange -T 2 -c 4 -f /tmp/test.sql
> 
> > As per the stack trace, we're trying to build a new tuple for the output
> > of a ValuesScan node, but the targetlist for that seems completely insane:
> 
> After digging through this, it seems clear that it's the fault of
> ad0bda5d2, specifically this code in EvalPlanQualSlot:
> 
>             *slot = ExecAllocTableSlot(&epqstate->estate->es_tupleTable,
>                                        epqstate->origslot->tts_tupleDescriptor,
>                                        &TTSOpsVirtual);
> 
> It's setting up an es_epqTupleSlot[] entry on the assumption that it
> should have the same tupdesc as the output tuple that's to be rechecked.
> This might accidentally fail to malfunction in single-table cases,
> but it's quite broken for any join case --- in particular, for the
> given test case, the scan tuple for the VALUES node surely doesn't have
> the same tupdesc as the join result.

:(

We really need to improve the test coverage for EPQ :(. I tried to add a
few cases, but it's still one of the least tested complicated areas in
PG.

To make sure I understand - the problem isn't the slot that we've
created in nodeModifyTable.c via EvalPlanQualSlot(), right? It's the one
we create in EvalPlanQualFetchRowMarks(), because we don't have a proper
tuple type handy to create the slot with?  And then ExecScanFetch() (on
behalf of ValueScan in this case) expects something with the correct
format in that slot, which causes the backtrace you show. And this
problem occurs "only" for ROW_MARK_COPY, because for everything else we
have an associated relation.

Previously we simply didn't need to know the type during EPQ setup,
because we only stored a HeapTuple anyway. And we'd know the appropriate
tupledesc at the places that access the tuple.

Right?


> I think we need to change the API of EvalPlanQualSlot, because it
> cannot determine the correct tupdesc when it's not handed a Relation.
> I'm not entirely sure that its immediate callers can either :-( so
> this might propagate out a ways.  Or perhaps we should move the
> slot-making someplace else.

I think we'll probably have to split EvalPlanQualSlot() into two
functions. One for nodeModifyTable.c and nodeLockRows.c, and the
!ROW_MARK_COPY cases in EvalPlanQualFetchRowMarks().


One bigger change - but possibly one worth it - would be to basically
move the work done in EvalPlanQualFetchRowMarks() to be done on-demand,
at least for ROW_MARK_COPY. We don't need to do

            /* fetch the whole-row Var for the relation */
            datum = ExecGetJunkAttribute(epqstate->origslot,
                                         aerm->wholeAttNo,
                                         &isNull);
            /* non-locked rels could be on the inside of outer joins */
            if (isNull)
                continue;

that early in the query. We could call EvalPlanQualFetchRowMark() (note
the singular) from ExecScanFetch(), that ought to currently be the only
place that accesses ROW_MARK_COPY type rowmarks?

We could do the same for the other types of rowmarks, but I think that'd
be semantically somewhat murkier.

That would have the advantage that we'd not unnecessarily put row
marks we might never need into slots. There's probably not that many
cases where that matters, though.


Or we could keep enough information for ROW_MARK_COPY RowMarks around to
create the necessary tupledesc (there's a number of variants as to how -
don't zap the information in add_rte_to_flat_rtable, have setrefs.c
somehow keep maintain rti indexed pointers to Plan nodes, ...).

Not sure yet what's best here.

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: UPDATE with json_populate_recordset empty array crashes server
Next
From: Michael Paquier
Date:
Subject: Re: BUG #15932: Module passwordcheck doesn't reference previous hooks