Re: Feedback about Drupal SQL debugging - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Feedback about Drupal SQL debugging
Date
Msg-id 407d949e0908211603i53553592m5923c7fa18ade891@mail.gmail.com
Whole thread Raw
In response to Re: Feedback about Drupal SQL debugging  (Jean-Michel Pouré <jm@poure.com>)
Responses Re: Feedback about Drupal SQL debugging  (Stephen Frost <sfrost@snowman.net>)
Re: Feedback about Drupal SQL debugging  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
2009/8/21 Jean-Michel Pouré <jm@poure.com>:
> PostgreSQL requires all non-aggregated fields to be present in the GROUP
> BY clause (I fixed 10 such issues in Drupal code).
> http://drupal.org/node/555530
>
> Why can't PostgreSQL add the required field automatically? Could this be
> added to PostgreSQL to-do-list?

This is a more complex (and more interesting) topic than what your
blog discusses.

Firstly understand what MySQL is *actually* doing:

select a,b,c from tab group by a

Only sorts and groups by "a" as instructed. The b columns and c
columns are not included in the grouping. So if you have data like:

a,b,c
1,1,1
1,2,2
2,1,1
2,2,2

You'll get two groups because there are only two values of "a". One
group will have a=1 and one group will have a=2. Which value you get
for b and c will be completely arbitrary and unpredictable.

If Postgres added b,c to the GROUP BY it would produce four groups,
because there four different values of <a,b,c>. You *can* get
something similar to MySQL's behaviour using DISTINCT ON:

select distinct on (a) a,b,c from a ORDER BY a,b,c

But Postgres insists you have an ORDER BY which has to agree with the
DISTINCT ON columns and provide some extra column(s) to determine
which values of b,c are chosen.

If Postgres changed on this front it would be to support the SQL
Standard concept of "functional dependency". In cases where some
columns are guaranteed to be unique you can leave them out of the
GROUP BY but still use them in the select list. This isn't MySQL's
behaviour of just allowing you to leave them out and hope that it
doesn't matter which row's values are used. The database has to
actually determine that it really doesn't matter. Typically that would
be because you've grouped by a set of columns which form the key of a
unique constraint, in which case every other column from that table
would also necessarily be the same since they would all come from the
same row of that table.

--
greg
http://mit.edu/~gsstark/resume.pdf


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Feedback about Drupal SQL debugging
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Feedback about Drupal SQL debugging