Re: Changed SRF in targetlist handling - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Changed SRF in targetlist handling
Date
Msg-id 20160802233055.kbhqv6plqh432cwh@alap3.anarazel.de
Whole thread Raw
In response to Re: Changed SRF in targetlist handling  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Changed SRF in targetlist handling  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2016-08-02 19:02:38 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I've an implementation that
>
> > 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM
> >    expressions. If there's tSRFs in the argument of a tSRF those becomes
> >    a separate, lateral, ROWS FROM expression.
>
> > 2) If grouping/window functions are present, the entire query is wrapped
> >    in a subquery RTE, except for the set-returning function. All
> >    referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the
> >    original targetlist are made to reference that subquery, which gets a
> >    TargetEntry for them.
>
> FWIW, I'd be inclined to do the subquery RTE all the time,

Yea, that's what I ended up doing.


> adding some
> optimization fence to ensure it doesn't get folded back.  That fixes
> your problem here:

> > So far I have one problem without an easy solution: Historically queries
> > like
> > =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id);
> > ┌────┬─────────────────┐
> > │ id │ generate_series │
> > ├────┼─────────────────┤
> > │  1 │               1 │
> > │  1 │               2 │
> > │  2 │               1 │
> > │  2 │               2 │
> > └────┴─────────────────┘
> > have preserved the SRF output ordering. But by turning the SRF into a
> > ROWS FROM, there's no guarantee that the cross join between "few" and
> > generate_series(1,3) above is implemented in that order.

But I don't see how that fixes the above problem?  The join, on the
top-level because of aggregates, still can be implemented as
subquery join srf or as srf join subquery, with the different output order
that implies.  I've duct-taped together a solution for that, by forcing
the lateral machinery to always see a dependency from the SRF to the
subquery; but that probably needs a nicer fix than a RangeTblEntry->deps
field which is processed in extract_lateral_references() ;)


> > Besides that I'm structurally wondering whether turning the original
> > query into a subquery is the right thing to do. It requires some kind of
> > ugly munching of Query->*, and has the above problem.
>
> It does not seem like it should be that hard, certainly no worse than
> subquery pullup.  Want to show code?

It's not super hard, there's some stuff like pushing/not-pushing
various sortgrouprefs to the subquery. But I think we can live with it.

Let me clean up the code some, hope to have something today or tomorrow.


> > An alternative approach would be to do this during parse-analysis, but I
> > think that might end up being confusing, because stored rules would
> > suddenly have a noticeably different structure, and it'd tie us a lot
> > more to the details of that transformation than I'd like.
>
> -1 on that; we do not want this transformation visible in stored rules.

Agreed.


> > Besides the removal of the least-common-multiple behaviour of tSRF queries,
> > there's some other consequences that using function scans have:
> > Previously if a tSRF was never evaluated, it didn't cause the number of
> > rows from being increased. E.g.
> > SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id);
> > only produced two rows.  But using joins means that a simple
> > implementation of using ROWS FROM returns four rows.
>
> Hmm.  I don't mind changing behavior in that sort of corner case.
> If we're prepared to discard the LCM behavior, this seems at least
> an order of magnitude less likely to be worth worrying about.

I think it's fine, and potentially less confusing.


> Would it be worth detecting SRFs below CASE/COALESCE/etc and throwing
> an error?  It would be easier to sell throwing an error than silently
> changing behavior, I think.

Hm. We could, but I think the new behaviour would actually make sense in
the long run. Interpreting the coalesce to run on the output of the SRF
doesn't seem bad to me.


I found another edgecase, which we need to make a decision about.
'record' returning SRFs can't be transformed easily into a ROWS
FROM. Consider e.g. the following from the regression tests:

create function array_to_set(anyarray) returns setof record as $$ select i AS "index", $1[i] AS "value" from
generate_subscripts($1,1)  i
 
$$ language sql strict immutable;

select array_to_set(array['one', 'two']);
┌──────────────┐
│ array_to_set │
├──────────────┤
│ (1,one)      │
│ (2,two)      │
└──────────────┘
(2 rows)

which currently works. That currently can't be modeled as ROWS FROM()
directly, because that desperately wants to return the columns as
columns, which we can't do for 'record' returning things, because they
don't have defined columns.  For composite returning SRFs I've currently
implemented that by generating a ROWS() expression, but that doesn't
work for record.

So it seems like we need some, not necessarily user exposed, way of
making nodeFunctionscan.c return the return value as one datum.  One
way, as suggested by Andrew G. on IRC, was to interpret empty column
definition in ROWS FROM interpreted that way.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [sqlsmith] Failed assertion in joinrels.c
Next
From: Robert Haas
Date:
Subject: Re: [sqlsmith] Failed assertion in joinrels.c