2009/8/21 Jean-Michel Pouré <jm@poure.com>:
> PostgreSQL requires all non-aggregated fields to be present in the GROUP
> BY clause (I fixed 10 such issues in Drupal code).
> http://drupal.org/node/555530
>
> Why can't PostgreSQL add the required field automatically? Could this be
> added to PostgreSQL to-do-list?
This is a more complex (and more interesting) topic than what your
blog discusses.
Firstly understand what MySQL is *actually* doing:
select a,b,c from tab group by a
Only sorts and groups by "a" as instructed. The b columns and c
columns are not included in the grouping. So if you have data like:
a,b,c
1,1,1
1,2,2
2,1,1
2,2,2
You'll get two groups because there are only two values of "a". One
group will have a=1 and one group will have a=2. Which value you get
for b and c will be completely arbitrary and unpredictable.
If Postgres added b,c to the GROUP BY it would produce four groups,
because there four different values of <a,b,c>. You *can* get
something similar to MySQL's behaviour using DISTINCT ON:
select distinct on (a) a,b,c from a ORDER BY a,b,c
But Postgres insists you have an ORDER BY which has to agree with the
DISTINCT ON columns and provide some extra column(s) to determine
which values of b,c are chosen.
If Postgres changed on this front it would be to support the SQL
Standard concept of "functional dependency". In cases where some
columns are guaranteed to be unique you can leave them out of the
GROUP BY but still use them in the select list. This isn't MySQL's
behaviour of just allowing you to leave them out and hope that it
doesn't matter which row's values are used. The database has to
actually determine that it really doesn't matter. Typically that would
be because you've grouped by a set of columns which form the key of a
unique constraint, in which case every other column from that table
would also necessarily be the same since they would all come from the
same row of that table.
--
greg
http://mit.edu/~gsstark/resume.pdf