Thread: Is this a bug?

Is this a bug?

From
Prateek Sanyal
Date:
I discovered a difference between sqlite3 and psql and I am not sure if it
is intentional but it is definitely inconvenient.

If you use an alias for an aggregate function and then do a GROUP BY with
HAVING, you can't use that alias in the HAVING and you have to type in the
entire function again.

This is not the case with sqlite3 where you can just use the alias. Why
doesn't psql allow this? It seems more efficient.

Take a look at the following example:

-- Schema for sample table named "invoices":

-- id => int

-- customer_id => int

-- total_sale => decimal
Here is the query:
********************

SELECT SUM(total_sale) AS sum_sales FROM invoices GROUP BY customer_id
HAVING sum_sales > 20 ORDER BY sum_sales DESC;

********************

This code above works in sqlite3 but not psql. For psql I have to do the
following:

********************

SELECT SUM(total_sale) AS sum_sales FROM invoices GROUP BY customer_id
HAVING SUM(total_sale) > 20 ORDER BY sum_sales DESC;

********************

Regards,
Prateek.

Re: Is this a bug?

From
Tom Lane
Date:
Prateek Sanyal <sanyal.prateek@gmail.com> writes:
> I discovered a difference between sqlite3 and psql and I am not sure if it
> is intentional but it is definitely inconvenient.

> If you use an alias for an aggregate function and then do a GROUP BY with
> HAVING, you can't use that alias in the HAVING and you have to type in the
> entire function again.

> This is not the case with sqlite3 where you can just use the alias. Why
> doesn't psql allow this?

Because it is contrary to both the letter and the spirit of the SQL
standard.  The SELECT's result list is logically computed after every
operation except ORDER BY, so it's really not sensible for clauses
like GROUP BY or HAVING to refer to outputs of the SELECT list.
What's more, it's ambiguous, because the same name might mean
different things depending on whether you consider it to be an
input column name or a result column name.

PG deviates from the spec to the extent of allowing GROUP BY items
to be output column names *as long as they are just that, and not
expressions*.  (To my mind, that's probably a long-ago mistake,
but it's handy enough that there's never been any serious move to
remove it.)  Even if we applied that policy to HAVING, which
we don't, your example wouldn't work because the reference is
within an expression.

> SELECT SUM(total_sale) AS sum_sales FROM invoices GROUP BY customer_id
> HAVING SUM(total_sale) > 20 ORDER BY sum_sales DESC;

This code is correct per spec and should work in any SQL DBMS.  At least
as far as PG is concerned, there's no performance penalty, since the
common SUM() expression is evaluated only once anyway.

            regards, tom lane

Re: Is this a bug?

From
Prateek Sanyal
Date:
I was going to present my argument until I saw who signed off this email.
Thank you Mr. Lane. Now I will send a bug report to Microsoft and hope that
Bill Gates responds.

Regards,
Prateek.

On Fri, Jan 22, 2016 at 9:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Prateek Sanyal <sanyal.prateek@gmail.com> writes:
> > I discovered a difference between sqlite3 and psql and I am not sure if
> it
> > is intentional but it is definitely inconvenient.
>
> > If you use an alias for an aggregate function and then do a GROUP BY with
> > HAVING, you can't use that alias in the HAVING and you have to type in
> the
> > entire function again.
>
> > This is not the case with sqlite3 where you can just use the alias. Why
> > doesn't psql allow this?
>
> Because it is contrary to both the letter and the spirit of the SQL
> standard.  The SELECT's result list is logically computed after every
> operation except ORDER BY, so it's really not sensible for clauses
> like GROUP BY or HAVING to refer to outputs of the SELECT list.
> What's more, it's ambiguous, because the same name might mean
> different things depending on whether you consider it to be an
> input column name or a result column name.
>
> PG deviates from the spec to the extent of allowing GROUP BY items
> to be output column names *as long as they are just that, and not
> expressions*.  (To my mind, that's probably a long-ago mistake,
> but it's handy enough that there's never been any serious move to
> remove it.)  Even if we applied that policy to HAVING, which
> we don't, your example wouldn't work because the reference is
> within an expression.
>
> > SELECT SUM(total_sale) AS sum_sales FROM invoices GROUP BY customer_id
> > HAVING SUM(total_sale) > 20 ORDER BY sum_sales DESC;
>
> This code is correct per spec and should work in any SQL DBMS.  At least
> as far as PG is concerned, there's no performance penalty, since the
> common SUM() expression is evaluated only once anyway.
>
>                         regards, tom lane
>