Re: Changed SRF in targetlist handling - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Changed SRF in targetlist handling
Date
Msg-id 11228.1470178958@sss.pgh.pa.us
Whole thread Raw
In response to Re: Changed SRF in targetlist handling  (Andres Freund <andres@anarazel.de>)
Responses Re: Changed SRF in targetlist handling  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> I've an implementation that

> 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM
>    expressions. If there's tSRFs in the argument of a tSRF those becomes
>    a separate, lateral, ROWS FROM expression.

> 2) If grouping/window functions are present, the entire query is wrapped
>    in a subquery RTE, except for the set-returning function. All
>    referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the
>    original targetlist are made to reference that subquery, which gets a
>    TargetEntry for them.

FWIW, I'd be inclined to do the subquery RTE all the time, adding some
optimization fence to ensure it doesn't get folded back.  That fixes
your problem here:

> So far I have one problem without an easy solution: Historically queries
> like
> =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id);
> ┌────┬─────────────────┐
> │ id │ generate_series │
> ├────┼─────────────────┤
> │  1 │               1 │
> │  1 │               2 │
> │  2 │               1 │
> │  2 │               2 │
> └────┴─────────────────┘
> have preserved the SRF output ordering. But by turning the SRF into a
> ROWS FROM, there's no guarantee that the cross join between "few" and
> generate_series(1,3) above is implemented in that order.


> Besides that I'm structurally wondering whether turning the original
> query into a subquery is the right thing to do. It requires some kind of
> ugly munching of Query->*, and has the above problem.

It does not seem like it should be that hard, certainly no worse than
subquery pullup.  Want to show code?

> An alternative approach would be to do this during parse-analysis, but I
> think that might end up being confusing, because stored rules would
> suddenly have a noticeably different structure, and it'd tie us a lot
> more to the details of that transformation than I'd like.

-1 on that; we do not want this transformation visible in stored rules.

> Besides the removal of the least-common-multiple behaviour of tSRF queries,
> there's some other consequences that using function scans have:
> Previously if a tSRF was never evaluated, it didn't cause the number of
> rows from being increased. E.g.
> SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id);
> only produced two rows.  But using joins means that a simple
> implementation of using ROWS FROM returns four rows.

Hmm.  I don't mind changing behavior in that sort of corner case.
If we're prepared to discard the LCM behavior, this seems at least
an order of magnitude less likely to be worth worrying about.

Having said that, I do seem to recall a bug report about misbehavior when
a SRF was present in just one arm of a CASE statement.  That would have
the same type of behavior as you describe here, and evidently there's at
least one person out there depending on it.

Would it be worth detecting SRFs below CASE/COALESCE/etc and throwing
an error?  It would be easier to sell throwing an error than silently
changing behavior, I think.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Double invocation of InitPostmasterChild in bgworker with -DEXEC_BACKEND
Next
From: Tom Lane
Date:
Subject: Re: [sqlsmith] Failed assertion in joinrels.c