Re: RFE: Column aliases in WHERE clauses - Mailing list pgsql-general
From | Tom Lane |
---|---|
Subject | Re: RFE: Column aliases in WHERE clauses |
Date | |
Msg-id | 25320.1347924765@sss.pgh.pa.us Whole thread Raw |
In response to | Re: RFE: Column aliases in WHERE clauses (Mike Christensen <mike@kitchenpc.com>) |
Responses |
Re: RFE: Column aliases in WHERE clauses
Re: RFE: Column aliases in WHERE clauses |
List | pgsql-general |
Mike Christensen <mike@kitchenpc.com> writes: > This definitely makes sense in the context of aggregation, but I'm > wondering if the same argument applies in the use case originally > posted: > SELECT left(value, 1) as first_letter > FROM some_table > WHERE first_letter > 'a'; > Obviously, you can write this as: > SELECT left(value, 1) as first_letter > FROM some_table > WHERE left(value, 1) > 'a'; > This would run fine, though you'd be doing a sequential scan on the > entire table, getting the left most character in each value, then > filtering those results. This of course assumes you haven't built an > index on left(value, 1). > Thus, in theory the compiler *could* resolve the actual definition of > first_letter and substitute in that expression on the fly. I'm > wondering if that concept is actually disallowed by the SQL spec. Yes, it is. If you read the spec you'll find that the scope of visibility of names defined in the SELECT list doesn't include WHERE. It's easier to understand why this is if you realize that SQL has a very clear model of a "pipeline" of query execution. Conceptually, what happens is: 1. Form the cartesian product of the tables listed in FROM (ie, all combinations of rows). 2. Apply the WHERE condition to each row from 1, and drop rows that don't pass it. 3. If there's a GROUP BY, merge the surviving rows into groups. 4. If there's aggregate functions, compute those over the rows in each group. 5. If there's a HAVING, filter the grouped rows according to that. 6. Evaluate the SELECT expressions for each remaining row. 7. If there's an ORDER BY, evaluate those expressions and sort the remaining rows accordingly. (Obviously, implementations try to improve on this - you don't want to actually form the cartesian product - but that's the conceptual model.) The traditional shortcut of doing "ORDER BY select-column-reference" is okay according to this world view, because the SELECT expressions are already available when ORDER BY needs them. However, it's not sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY, because those steps precede the evaluation of the SELECT expressions. This isn't just academic nit-picking either, because the SELECT expressions might not be valid for rows that don't pass WHERE etc. Consider SELECT 1/x AS inverse FROM data WHERE x <> 0; The implementation *must* apply WHERE before computing the SELECT expressions, or it'll get zero-divide failures that should not happen. Now, having said all that, if you try it you'll find that Postgres does allow select column references in GROUP BY, using the model you propose above of copying whatever expression is in SELECT into GROUP BY. This is, to put it politely, a mistake that we are now stuck with for backwards-compatibility reasons. It's not spec compliant and it doesn't fit the language's conceptual model, but it's been that way for long enough that we're not likely to take it out. We are not, however, gonna introduce the same mistake elsewhere. > Obviously, it would add complexity (and compile overhead) but would be > somewhat handy to avoid repeating really complicated expressions. > Perhaps Common Table Expressions are a better way of doing this thing > anyhow. CTEs or sub-selects are a better answer for that. Each sub-select has its own instance of the conceptual pipeline. regards, tom lane
pgsql-general by date: