Re: RFE: Column aliases in WHERE clauses - Mailing list pgsql-general

From Mike Christensen
Subject Re: RFE: Column aliases in WHERE clauses
Date
Msg-id CABs1bs261SsQSJEy33x7+6zsT+1KQJHMkodqDLCCvG_XBsMgfQ@mail.gmail.com
Whole thread Raw
In response to Re: RFE: Column aliases in WHERE clauses  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: RFE: Column aliases in WHERE clauses
List pgsql-general
On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Mike Christensen <mike@kitchenpc.com> writes:
>> This definitely makes sense in the context of aggregation, but I'm
>> wondering if the same argument applies in the use case originally
>> posted:
>
>> SELECT left(value, 1) as first_letter
>> FROM some_table
>> WHERE first_letter > 'a';
>
>> Obviously, you can write this as:
>
>> SELECT left(value, 1) as first_letter
>> FROM some_table
>> WHERE left(value, 1) > 'a';
>
>> This would run fine, though you'd be doing a sequential scan on the
>> entire table, getting the left most character in each value, then
>> filtering those results.  This of course assumes you haven't built an
>> index on left(value, 1).
>
>> Thus, in theory the compiler *could* resolve the actual definition of
>> first_letter and substitute in that expression on the fly.  I'm
>> wondering if that concept is actually disallowed by the SQL spec.
>
> Yes, it is.  If you read the spec you'll find that the scope of
> visibility of names defined in the SELECT list doesn't include WHERE.
>
> It's easier to understand why this is if you realize that SQL has a very
> clear model of a "pipeline" of query execution.  Conceptually, what
> happens is:
>
> 1. Form the cartesian product of the tables listed in FROM (ie, all
> combinations of rows).
>
> 2. Apply the WHERE condition to each row from 1, and drop rows that
> don't pass it.
>
> 3. If there's a GROUP BY, merge the surviving rows into groups.
>
> 4. If there's aggregate functions, compute those over the rows in
> each group.
>
> 5. If there's a HAVING, filter the grouped rows according to that.
>
> 6. Evaluate the SELECT expressions for each remaining row.
>
> 7. If there's an ORDER BY, evaluate those expressions and sort the
> remaining rows accordingly.
>
> (Obviously, implementations try to improve on this - you don't want
> to actually form the cartesian product - but that's the conceptual
> model.)
>
> The traditional shortcut of doing "ORDER BY select-column-reference"
> is okay according to this world view, because the SELECT expressions
> are already available when ORDER BY needs them.  However, it's not
> sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
> because those steps precede the evaluation of the SELECT expressions.
>
> This isn't just academic nit-picking either, because the SELECT
> expressions might not be valid for rows that don't pass WHERE etc.
> Consider
>         SELECT 1/x AS inverse FROM data WHERE x <> 0;
> The implementation *must* apply WHERE before computing the SELECT
> expressions, or it'll get zero-divide failures that should not happen.
>
> Now, having said all that, if you try it you'll find that Postgres
> does allow select column references in GROUP BY, using the model
> you propose above of copying whatever expression is in SELECT into
> GROUP BY.  This is, to put it politely, a mistake that we are now
> stuck with for backwards-compatibility reasons.  It's not spec compliant
> and it doesn't fit the language's conceptual model, but it's been that
> way for long enough that we're not likely to take it out.  We are not,
> however, gonna introduce the same mistake elsewhere.
>
>> Obviously, it would add complexity (and compile overhead) but would be
>> somewhat handy to avoid repeating really complicated expressions.
>> Perhaps Common Table Expressions are a better way of doing this thing
>> anyhow.
>
> CTEs or sub-selects are a better answer for that.  Each sub-select has
> its own instance of the conceptual pipeline.

Excellent information, Tom!  I've been somewhat curious on this
behavior for some time now, and it's great to get a detailed answer..

Mike


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: RFE: Column aliases in WHERE clauses
Next
From: Jeff Janes
Date:
Subject: Re: Index creation takes more time?