magical eref alias names - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | magical eref alias names |
Date | |
Msg-id | CA+TgmoYSYmDA2GvanzPMci084n+mVucv0bJ0HPbs6uhmMN6HMg@mail.gmail.com Whole thread Raw |
Responses |
Re: magical eref alias names
|
List | pgsql-hackers |
Hi, When a query references a normal relation, the RangeTblEntry we construct sets rte->eref->aliasname to either the user-specified alias name, if there is one, or otherwise to the table name. But if there's neither a user-specified alias name nor a table name, then we make up a name. I have two related questions about this. First, why do we do it at all? Second, why do we assign the names as we do? Let me start with the second question. After some surveying of the source code, here's a (possibly-incomplete) list of eref names that we fabricate: old, new, *SELECT*, ANY_subquery, *MERGE*, *RESULT*, *SELECT*, excluded, unnamed_subquery, unnamed_join, *GROUP*, *TLOCRN*, *TROCRN*, *SELECT* %d where %d is an integer, *VALUES*, xmltable, json_table. Of these, old, new, and excluded seem to make total sense -- in certain contexts, the user can actually use those names in SQL expressions to refer to stuff. But the rest, to my knowledge, can't be referenced within the query, so I suppose they're just for display purposes. But that seems like an odd choice, because these artificial names are (1) not unique, which means that if they are referenced anywhere such references are ambiguous; (2) not consistent in terms of capitalization and punctuation, which is questionable for something whose primary purpose is to inform the user; and (3) sometimes incomprehensible -- I can make nothing of *TLOCRN* or *TROCRN*, even after looking at the relevant source code, and I wouldn't know what the distinction is between *SELECT* and *SELECT* %d without looking at the source code. And that brings me to the first question, which is why we're even making up these names at all (with the exceptions of new, old, and excluded, which serve a clear purpose). If we needed them to identify things, that would make sense, but since they're neither unique nor comprehensible, they're not really any good for that. And it seems downright confusing at best, and a source of bugs at worst, to mix together user-supplied names and system-generated names in such a way that the one can't be easily distinguished from the other. We even have regression tests verifying that if the user explicitly enters unnamed_join or unnamed_subquery as an alias name, it doesn't break anything due to confusion with identical alias names that might be generated internally, which seems like good evidence for the proposition that I'm not the first person to worry about this being bug-prone. Granted, there are some cases where these names make their way into EXPLAIN output. For example, this query from the regression tests: select * from int4_tbl o where (f1, f1) in (select f1, generate_series(1,50) / 10 g from int4_tbl i group by f1); produces EXPLAIN output that includes this: -> Subquery Scan on "ANY_subquery" Output: "ANY_subquery".f1, "ANY_subquery".g Filter: ("ANY_subquery".f1 = "ANY_subquery".g) However, that seems to be a minority position. Many of the system-generated eref names do not appear in EXPLAIN output, at least not in our regression tests. Furthermore, EXPLAIN itself does post-processing of the relations that appear in the output to set the final names that are displaced (see set_rtable_names()), so if we didn't insert names at parse time, we'd still have an opportunity to make up something for display purposes. You could argue that the results would be worse, but the current results aren't especially amazing so I'm not sure I believe that. Perhaps with some elbow grease they would even be better. So overall I'm just confused here. Is this just a bunch of old cruft that has never been cleaned up or standardized, or does it serve some valuable purpose that is not obvious to me? -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: