Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause(Commit id: 69f4b9c85f) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause(Commit id: 69f4b9c85f)
Date
Msg-id 20170412225855.oklyfkbwj2dmbrpu@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Re: Query fails when SRFs are part of FROM clause(Commit id: 69f4b9c85f)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2017-04-05 09:39:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-04-05 02:47:55 -0400, Noah Misch wrote:
> >> [Action required within three days.  This is a generic notification.]
>
> > I've a very preliminary patch.  I'd like to only start polishing it up
> > once the code freeze is over, so I can work on getting some patches in -
> > note that I myself have no pending patches.  Once frozen I'll polish it
> > up and send that within a few days.
>
> FWIW, I'm willing to help out on this.

Help would be appreciated.  I've pondered, and partially implemented,
several approaches so far, and my conclusion is that we should just do
nothing.  The amount of corner cases is just too big, and the utility of
the feature too small.

To recap, the issue is that in previous versions it was, by accident
(there's no test, code comments "supporting" the feature talk about a
different corner case, and the behaviour isn't correct in some cases)
allowed to do something like:
SELECT * FROM CAST(generate_series(1,3) * 5 AS int);
while
SELECT * FROM generate_series(1,3) * 5;
is not allowed.

The reason that that works from the gram.y perspective is that CAST etc
types of func_expr's.  The reason that it worked from a code perspective
is that execQual.c:ExecMakeTableFunctionResult() has the following:/* * Normally the passed expression tree will be a
FuncExprState,since the * grammar only allows a function call at the top level of a table * function reference.
However,if the function doesn't return set then * the planner might have replaced the function call via
constant-folding* or inlining.  So if we see any other kind of expression node, execute * it via the general
ExecEvalExpr()code; the only difference is that we * don't get a chance to pass a special ReturnSetInfo to any
functions* buried in the expression. */if (funcexpr && IsA(funcexpr, FuncExprState) &&    IsA(funcexpr->expr,
FuncExpr))
and back then ExecEvalExpr() was allowed to return sets.  It also
depends on some default initializations (e.g. rsinfo.returnMode =
SFRM_ValuePerCall).

This yields plenty weird behaviour in < v10. E.g. the following is
disallowed: SELECT * FROM int4mul(generate_series(1,3), 1); ERROR:  0A000: set-valued function called in context that
cannotaccept a set
 
as is SELECT * FROM CAST(int4mul(generate_series(1,3), 1) AS bigint); ERROR:  0A000: set-valued function called in
contextthat cannot accept a set
 
because the cast is implemented as int8(expr) which avoids the fallback
path as it's a FuncExpr, but SELECT * FROM CAST(int4mul(generate_series(1,3), 1) AS text);
works because the cast is implemented as a io coercion, which is not a
funcexpr. Therefore it uses the fallback ExecEvalExpr().

The mismatch between ExecEvalExpr() and nodeFunctionscan.c SRF behaviour
back then also yields odd behaviour, e.g.: SELECT * FROM generate_series(1,0);
returns zero rows, but SELECT * FROM CAST(generate_series(1,0) * 5 AS INT);
returns one NULL row.


In v10, as it stands, these all error out, because the SRFs are now only
to be evaluated via either nodeFunctionscan.c (FROM) or via
nodeProjectSet.c (targetlist), not ExecEvalExpr() anymore.


I've basically pondered three different methods of implementing
something akin to the old behaviour:

1) Move the non-SRF part into nodeFunctionscan.c's targetlist, and let  it be evaluated there.  E.g. if the expression
is CAST(generate_series(1,5) AS text), evaluate the generate_series(1,5)  using nodeFunctionscan's
FunctionScanPerFuncStatemachinery, but  implement the CAST as CAST(Var(whatever, 1) AS Text).
 
  That doesn't work well for composite/record returning rows, because  RangeTblFunction's returning composites are
expandedinto multiple  columns. E.g.  SELECT * FROM CAST(CAST(twocol_noset_outline(generate_series(1,3), 'text') AS
text)AS twocol);  returns all the columns of twocol, not a single wholerow datum.
 
  There's also some issues around what to do with cases involving  volatile functions when the output is not
referenced,or referenced  multiple times e.g.    SELECT f, f FROM CAST(generate_series(1,3) * nextval(...)) AS f;
wouldevaluate nextval() twice with such an approach...
 

2) During planning, split of the SRF bit from the !SRF bit and have  nodeFunctionscan.c evaluate both separately, like
1). That allows to  avoid the volatility issue, because the targetlist is still projected  separately.
 
  I've prototyped this to a reasonable point, by having  create_functionscan_plan() process each RangeTblFunction and
split the expression into SRF and non-SRF parts, and then have  FunctionNext() evaluate the non-SRF part.
 
  At the current state of my prototype this happens to allow simple SRF  in arguments cases like SELECT * FROM
int4mul(generate_series(1,3),1)  which are disallowed in < v10.
 

  This is reasonably-ish in complexity for scalar values, but gets  quite complicated for composite/record datums.
Suddenlythere's two  different types of values that we need to deal with, the type of  datum returned by the SRF, and
thetype of datum returned by the  final expression - they might be the same, they might not.  Especially for
record-returningfunctions, where ROWS FROM( AS...)  determines the column types, and ExecMakeTableFunctionResult() does
a tupledesc_match() to verify the SRF returned something reasonable,  it gets wild: What would we even be matching the
typesagainst?
 


3) Implement nested FROM SRFs as pipelined executor nodes, similar to  the way targetlist SRFs are now handled.  E.g.
somethinglike  SELECT * FROM CAST(generate_series(1,10) * 5 AS int);
 
  would be implemented as one nodeFunctionscan.c for the  generate_series(), and then something like nodeProjectSet.c
(ora  nodeFunctionscan.c branch that'd make a sub-node available) would  evaluate the CAST(Var() * 5 AS int).
 
  This approach has the advantage that it'd allow us, potentially in  the future, to get rid of the restriction that
normalROWS FROM  doesn't allow SRFs in arguments.
 
  I think this again runs into trouble with row-expansion in the return  value from SRFs, which isn't actually desired
tillthe outermost  "level" of the expression.  I guess we could add a mode to  nodeFunctionscan.c that forces
composite/recordtypes to be returned  as wholerow vars instead of being split up.
 


My conclusion here is that it's way too complicated to properly
implement a feature that only seems to exist by accident and has plenty
of weird behaviour.  Currently we continue to accept all the !SRF
expressions that were previously accepted, but I'd even consider
insisting that the expression needs to be a proper FuncExpr at
parse-analysis time (before inlining/const evaluation did its thing).

Unless somebody has a radically better idea how to implement this?


- Andres



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [pgsql-www] [HACKERS] Small issue in online devel documentationbuild
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] pg_upgrade vs extension upgrades