Re: haversine formula with postgreSQL - Mailing list pgsql-general

From Tom Lane
Subject Re: haversine formula with postgreSQL
Date
Msg-id 5699.1253228413@sss.pgh.pa.us
Whole thread Raw
In response to Re: haversine formula with postgreSQL  (Mike Christensen <mike@kitchenpc.com>)
List pgsql-general
Mike Christensen <mike@kitchenpc.com> writes:
> This behavior kinda gets me sometimes too, especially in WHERE clauses..
> I'm a bit curious as to why this is so bad.  I could see why it would
> be expensive to do, since your clause wouldn't be indexed - but why is
> the syntax itself not allowed?

It's not logically sensible: per the implicit execution model defined by
the spec, the output list is not computed until after WHERE/HAVING/etc
are evaluated, so it makes no sense to refer to output expressions in
those clauses.  As an example of why the execution ordering is
important, you'd be pretty unhappy if this threw a division-by-zero
error:

    select 1/avg(x) from tab group by y having avg(x) <> 0;

While we could interpret such a reference as meaning to copy the output
expression into the other clause, it doesn't seem like a particularly
good idea to encourage confusion about what the syntax means.

Also, allowing references to output column names here actually creates
an ambiguity: for instance "distance" could also be a column name
available from some table in the FROM clause.  So it's not exactly
cost-free to allow this; it will likely result in queries being silently
interpreted in some way other than what the author expected.

BTW, there are two cases where Postgres *does* allow such references:

1. In ORDER BY.  This is mainly because the SQL spec used to require it.
It's actually logically consistent because ORDER BY is notionally
executed after forming the output expressions, but it's still confusing.
The spec authors thought better of this idea and removed it in SQL99,
but we're still stuck supporting it for backwards compatibility reasons.

2.  In GROUP BY.  This is, frankly, a mistake, and one I wish we could
have a do-over on.  Again we're stuck with it for compatibility reasons,
but we're not likely to extend the mistake to other clauses.

In both these cases, to reduce the scope for ambiguity problems we only
allow references to output columns as simple ORDER or GROUP list items
(for instance "ORDER BY distance" but not "ORDER BY abs(distance)").
This is all right because it's still frequently useful, but you'd seldom
write a WHERE or HAVING clause that consisted *only* of an output-column
name.  So even if we did extend the behavior it wouldn't help much,
unless we were to fling the doors wide open for ambiguity problems.

            regards, tom lane

pgsql-general by date:

Previous
From: Jonathan
Date:
Subject: Re: haversine formula with postgreSQL
Next
From: John R Pierce
Date:
Subject: Re: NAS