Re: Alias of VALUES RTE in explain plan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Alias of VALUES RTE in explain plan
Date
Msg-id CA+TgmobgTZCjB+_Ld3b44p42ROhwPB8aJFXGwYZhdjXcJrEWnA@mail.gmail.com
Whole thread Raw
In response to Re: Alias of VALUES RTE in explain plan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Alias of VALUES RTE in explain plan
List pgsql-hackers
On Sat, Nov 2, 2024 at 1:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> regression=# create view vv as SELECT * FROM (VALUES (4),(2),(3),(1) ORDER BY t1.x LIMIT 2) AS t1(x);
> CREATE VIEW
> regression=# \d+ vv
>                              View "public.vv"
>  Column |  Type   | Collation | Nullable | Default | Storage | Description
> --------+---------+-----------+----------+---------+---------+-------------
>  x      | integer |           |          |         | plain   |
> View definition:
>  SELECT x
>    FROM ( VALUES (4), (2), (3), (1)
>           ORDER BY t1_1.x
>          LIMIT 2) t1(x);
>
> ruleutils has decided that it needs to make the two "t1" table
> aliases distinct.  But of course that will fail on reload:
>
> regression=# SELECT x
> regression-#    FROM ( VALUES (4), (2), (3), (1)
> regression(#           ORDER BY t1_1.x
> regression(#          LIMIT 2) t1(x);
> ERROR:  missing FROM-clause entry for table "t1_1"
> LINE 3:           ORDER BY t1_1.x
>                            ^
> It seems sufficient to avoid alias pushdown when there's an ORDER BY
> inside the VALUES subquery.  We disallow a locking clause, and
> while there can be LIMIT/OFFSET, those aren't allowed to reference the
> VALUES output anyway.  I added some test cases to show that this is
> enough to make view-dumping behave sanely.

I'm concerned about taking things in this direction. There's two scans
here, really: a Values Scan for the VALUES construct, and then a
Subquery Scan sitting on top of it that will normally be optimized
away. It seems to me that you're basically guessing whether the
subquery scan will be optimized away to a sufficient degree that its
alias will not leak out anywhere. But that seems a bit fragile and
error-prone. Whether to elide the subquery scan is a planner decision;
what aliases to assign to the planner output is a ruleutils.c
decision; but here you're talking about rejiggering things at parse
time. The correctness of that rejiggering depends crucially on what
will happen at plan time and then at EXPLAIN/ruleutils time, but the
rules for what will happen at those later times are pretty darn
complicated, so I feel like this is creating an awful lot of action at
a distance.

If were able (and I suspect we're not, but hypothetically) to in
effect pull up the subquery at parse time, so that to the planner and
executor it doesn't even exist, then I think that would be perfectly
fine, because then we would have strong reasons for believing that no
later decision can turn our parse-time decision into a problem. But to
leave that subquery there and guess that it's going to disappear
before we get to EXPLAIN doesn't seem nearly as safe. It seems pretty
easy to either miss things (like the ORDER BY case) or even to have
future planner changes break stuff.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: magical eref alias names
Next
From: Sami Imseih
Date:
Subject: Re: Vacuum statistics