Re: magical eref alias names - Mailing list pgsql-hackers

From Robert Haas
Subject Re: magical eref alias names
Date
Msg-id CA+TgmobkyBFyXzi_d6QtcM55zixYMwGLkgQRTpdphrE2a-5OUw@mail.gmail.com
Whole thread Raw
In response to Re: magical eref alias names  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: magical eref alias names
List pgsql-hackers
On Thu, Jan 2, 2025 at 3:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Global uniqueness across the database (not single queries) would be
> needed to prevent cases where different views use the same generated
> names.  The only way I can see to do that without nasty performance
> costs is to use something like an OID counter.  Which would mean
> that rather than nice names like "union_1", "union_2", etc, you'd
> soon be looking at "union_5846926".  I don't think anyone would
> find that to be an improvement on what we're doing now.

Oh, I agree, but I don't see why anyone would care whether rel names
are unique across different queries. When I mentioned global
uniqueness, I meant unique within a query, like what
set_rtable_names() does after the fact.

> > If we had global uniqueness, or even per-subquery-level uniqueness,
> > this would sort itself out somewhat nicely, I think. We would just end
> > up with select_1 and select_2 or union_1 and union_2 or something like
> > that, and I think that would be strictly better than the status quo
> > where we do sometimes generate *SELECT* %d, but also sometimes just
> > *SELECT* and other times unnamed_subquery, and also only ever *VALUES*
> > and not *VALUES* %d.
>
> I'll concede that it'd be nicer.  But I'm not convinced it'd be enough
> nicer to justify the costs of changing.  We've been doing it this way
> for a long time, and AFAIR you're the first to complain about it.

I'm not sure it's enough nicer to justify the cost of changing,
either. I do think that "you're the first to complain about it" is not
a convincing argument, here or in general. Every time somebody reports
a new problem, they are the first to complain about it, but that does
make the problem any less real.

The reason that I got interested in this problem was because of the
thread about allowing extensions to control planner behavior. I wrote
some words about it over there, but it's been a while so those
thoughts might not be entirely up to date. What I've found is that
it's a lot easier to insert a hook or three to allow an extension to
control the planner behavior than it is to get those hooks to do
anything useful, and that's precisely because it is hard to find a
useful way to identify a particular part of the query plan. If the
query planner were a person sitting next to you at your computer, you
could point at the screen with your finger and say "hey, do you see
this part of the EXPLAIN plan right here? could you try making this a
sequential scan rather than an index scan?" and the planner could say
"sure, let me re-plan it that way and I'll show you how it turns out".
Since the planner is incorporeal and cannot see your finger, you want
to identify "this part of the EXPLAIN plan right here" in some other
way, like with a name, but the names known at parsing time and
planning time are not unique and need not match what shows up in the
EXPLAIN output.

Concretely, if the user creates a partitioned table called *VALUES*
with several partitions and then creates another such table, also
called *VALUES*, in a different schema, and then joins the two of them
together; and then does UNION ALL with a subquery that actually uses
VALUES (), and then somewhere includes a subquery that also queries
one of the tables actually called *VALUES*, you cannot meaningfully
use the label *VALUES* to identify one particular scan; and you can't
use anything that appears in the EXPLAIN output for that purpose
either because the next planning cycle won't have those labels
available *during planning* when it needs to honor whatever plan-shape
requests were made.

Now, I'm firmly convinced that this is a real problem and worth
solving, but let me be clear that I don't think the solution is
anywhere on this thread, nor do I think that it is simple. My original
proposal of getting rid of system-generated fake names isn't
necessary, because you very helpfully pointed out that I can look at
whether RTE->alias->aliasname exists to figure that out. Also,
enforcing global uniqueness across the query at parse time wouldn't
actually help, because when we apply rewriter rules we can introduce
new relations into the query, and then when we expand inheritance
hierarchies we can introduce even more new relations into the query;
and what the user will care about if they want to modify "that portion
of the plan right there" is what ultimately ends up in the plan, not
what was in the query at parse time. So as far as I am concerned, this
thread has served the useful purpose of making me smarter and I can't
really see a way for it to do anything more than that; but if it's
given you a clever idea for something that we could change in the
code, I'm certainly happy to hear about that.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Larry Rosenman
Date:
Subject: Re: Fwd: Re: A new look at old NFS readdir() problems?
Next
From: Robert Haas
Date:
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join