Thread: Attribute must be GROUPed.... ?
I always had been doubious about the "must" in the error message "Attribute must be GROUPed or used in an aggregate function". I certainly agree that there are many situations in which having an ungrouped attribute in the target list is nonsense (undefined), but I sometimes find situations in which it would be perfetcly meaningful. For example, suppose that you join two tables and the field you group by appears to be the primary key of one of the tables. In this case if I put any of the fields in that table in the target list, it will always be well defined and would make certain queries much simpler. In SQL: # create table cities (id int primary key,name text); # create table people (fullname text,city_id int references cities(id)); # select name,count(*) from people,cities where people.city_id=cities.id group by city_id; name | count ------------+------------- New York | 10 Los Angeles | 15 Seveso | 1 .... Here follows a patch that makes this behaviour configurable via a configuration variable. I apologize in advance if there's something stupid, it's my first patch on postgresql... The default is the current behaviour but I would propose to change the default to be more permissive. Bye! -- Daniele Orlandi Planet Srl Index: src/backend/parser/parse_agg.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/parser/parse_agg.c,v retrieving revision 1.52 diff -u -r1.52 parse_agg.c --- src/backend/parser/parse_agg.c 2003/04/03 18:04:09 1.52 +++ src/backend/parser/parse_agg.c 2003/04/30 20:39:29 @@ -34,6 +34,8 @@ static bool check_ungrouped_columns_walker(Node *node, check_ungrouped_columns_context *context); +bool force_targets_grouped = true; + /* * check_ungrouped_columns - * Scan the given expression tree for ungrouped variables (variables @@ -246,13 +248,21 @@ (Node *) groupClauses); /* - * Check the targetlist and HAVING clause for ungrouped variables. + * Check the targetlist clause for ungrouped variables. */ - clause = (Node *) qry->targetList; - if (hasJoinRTEs) - clause = flatten_join_alias_vars(qry, clause); - check_ungrouped_columns(clause, pstate, - groupClauses, have_non_var_grouping); + + if (force_targets_grouped) + { + clause = (Node *) qry->targetList; + if (hasJoinRTEs) + clause = flatten_join_alias_vars(qry, clause); + check_ungrouped_columns(clause, pstate, + groupClauses, have_non_var_grouping); + } + + /* + * Check the HAVING clause for ungrouped variables. + */ clause = (Node *) qry->havingQual; if (hasJoinRTEs) Index: src/backend/utils/misc/guc.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/misc/guc.c,v retrieving revision 1.119 diff -u -r1.119 guc.c --- src/backend/utils/misc/guc.c 2003/04/25 19:45:09 1.119 +++ src/backend/utils/misc/guc.c 2003/04/30 20:39:31 @@ -43,6 +43,7 @@ #include "optimizer/paths.h" #include "optimizer/prep.h" #include "parser/parse_expr.h" +#include "parser/parse_agg.h" #include "storage/fd.h" #include "storage/freespace.h" #include "storage/lock.h" @@ -533,6 +534,10 @@ { {"transaction_read_only", PGC_USERSET, GUC_NO_RESET_ALL}, &XactReadOnly, false, NULL, NULL + }, + { + {"force_targets_grouped", PGC_USERSET, GUC_NO_RESET_ALL}, &force_targets_grouped, + true, NULL, NULL }, { Index: src/backend/utils/misc/postgresql.conf.sample =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/misc/postgresql.conf.sample,v retrieving revision 1.77 diff -u -r1.77 postgresql.conf.sample --- src/backend/utils/misc/postgresql.conf.sample 2003/04/19 00:37:28 1.77 +++ src/backend/utils/misc/postgresql.conf.sample 2003/04/30 20:39:31 @@ -219,3 +219,4 @@ #statement_timeout = 0 # 0 is disabled, in milliseconds #db_user_namespace = false #preload_libraries = '' +#force_targets_grouped = true Index: src/bin/psql/tab-complete.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/bin/psql/tab-complete.c,v retrieving revision 1.76 diff -u -r1.76 tab-complete.c --- src/bin/psql/tab-complete.c 2003/04/03 20:18:16 1.76 +++ src/bin/psql/tab-complete.c 2003/04/30 20:39:31 @@ -522,6 +522,7 @@ "enable_tidscan", "explain_pretty_print", "extra_float_digits", + "force_targets_grouped", "from_collapse_limit", "fsync", "geqo", Index: src/include/parser/parse_agg.h =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/include/parser/parse_agg.h,v retrieving revision 1.25 diff -u -r1.25 parse_agg.h --- src/include/parser/parse_agg.h 2003/01/17 03:25:04 1.25 +++ src/include/parser/parse_agg.h 2003/04/30 20:39:32 @@ -17,4 +17,6 @@ extern void parseCheckAggregates(ParseState *pstate, Query *qry); +extern bool force_targets_grouped; + #endif /* PARSE_AGG_H */
On Wed, 30 Apr 2003, Daniele Orlandi wrote: > I always had been doubious about the "must" in the error message > "Attribute must be GROUPed or used in an aggregate function". AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each <column reference> in each <value expression> that references a column of T shall reference a grouping column or be specified within a <set function specification>." > For example, suppose that you join two tables and the field you group by > appears to be the primary key of one of the tables. In this case if I > put any of the fields in that table in the target list, it will always > be well defined and would make certain queries much simpler. Well, it'd mean you didn't have to put the extra columns in the group by list to make them grouping columns.
Stephan Szabo wrote: > > AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each > <column reference> in each <value expression> that references a column > of T shall reference a grouping column or be specified within a <set > function specification>." I see... How should the "shall" term be considered ? I don't have much knowledge of the SQL specs language. How other DBMS behave in this case ? I know that mysql doesn't enforce this requirement but... mysql is not a perfect reference wrt standards compliance. > Well, it'd mean you didn't have to put the extra columns in the group by > list to make them grouping columns. This is what I currently do as a workaround, but it's not much clean expecially when you have many ungrouped fields in the target list. Bye! -- Daniele Orlandi Planet Srl
On Thu, 1 May 2003, Daniele Orlandi wrote: > Stephan Szabo wrote: > > > > AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each > > <column reference> in each <value expression> that references a column > > of T shall reference a grouping column or be specified within a <set > > function specification>." > > I see... How should the "shall" term be considered ? I don't have much > knowledge of the SQL specs language. "In the Syntax Rules, the term shall defines conditions that are required to be true of syntactically conforming SQL language." I think most people would write "must", although I think "shall" might be more correct. > How other DBMS behave in this case ? I know that mysql doesn't enforce > this requirement but... mysql is not a perfect reference wrt standards > compliance. I'm not sure actually, someone else will need to speak to this (well, I can test Oracle later, but that's it). > > Well, it'd mean you didn't have to put the extra columns in the group by > > list to make them grouping columns. > > This is what I currently do as a workaround, but it's not much clean > expecially when you have many ungrouped fields in the target list. Yeah, that can get to be a problem... In any case, you'll probably get other comments. Oh yeah, and you'll probably be asked for documentation comments if it's even considered since you're adding a visible GUC entry. :)
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > Yeah, that can get to be a problem... In any case, you'll probably get > other comments. Oh yeah, and you'll probably be asked for documentation > comments if it's even considered since you're adding a visible GUC entry. > :) Well, it won't be --- diking out required error checks without providing a substitute isn't my idea of a useful patch ;) The SQL99 spec defines an improved version of this behavior which I think is what Daniele really would like to have. Basically it says that if column A can be proved functionally dependent on column B then you only need to GROUP BY column B, and then you can use column A without having to explicitly mention it in the GROUP BY list. "Functionally dependent" means there is no possibility of A values being different in rows with the same B value. The spec has a whole lot of verbiage about possible ways to deduce functional dependency, but one easy one is where column B is a primary key and column A is in its table. If someone wants to implement the SQL99 behavior (or even just a useful subset of it), that would be cool with me. It looks like a lot of work though. regards, tom lane
----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone23.bigpanda.com> > > I see... How should the "shall" term be considered ? I don't have much > > knowledge of the SQL specs language. > > "In the Syntax Rules, the term shall defines conditions that are > required to be true of syntactically conforming SQL language." > > I think most people would write "must", although I think "shall" might be > more correct. > This is common usage in requirements docs - I suspect it has a government, perhaps a military origin. Even after vetting and helping to author a great many requirements docs it still makes me cringe, but then I don't split infinitives either ;-) andrew
On Wed, Apr 30, 2003 at 08:31:27PM -0400, Andrew Dunstan wrote: > This is common usage in requirements docs - I suspect it has a government, > perhaps a military origin. Even after vetting and helping to author a great > many requirements docs it still makes me cringe, but then I don't split > infinitives either ;-) <totally ot> It shouldn't make you cringe. According to Fowler (<http://www.bartleby.com/116/213.html>), this version of "shall" is an example of the "pure system" in the distinction between shall and will. So it's neither military nor government in origin, but mediaeval. Anyway, it's not much different from "Thou shalt not steal." </totally ot> A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
I use min(fieldname) as fieldname which is a little more than I want to type but doesn't disturb my groupings. On Wednesday 30 April 2003 05:02 pm, Daniele Orlandi wrote: > Stephan Szabo wrote: > > AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each > > <column reference> in each <value expression> that references a column > > of T shall reference a grouping column or be specified within a <set > > function specification>." > > I see... How should the "shall" term be considered ? I don't have much > knowledge of the SQL specs language. > > How other DBMS behave in this case ? I know that mysql doesn't enforce > this requirement but... mysql is not a perfect reference wrt standards > compliance. > > > Well, it'd mean you didn't have to put the extra columns in the group by > > list to make them grouping columns. > > This is what I currently do as a workaround, but it's not much clean > expecially when you have many ungrouped fields in the target list. > > Bye!
David Walker wrote: >I use min(fieldname) as fieldname which is a little more than I want to type >but doesn't disturb my groupings. > Min( ) and max( ) somehow seem to also reduce to a limit 1 on duplicate values of which I'm not sure is of much use in the discussion. But if the groups are large then your doing O(n) passes over each column/field. I've seen some databases do the equivalent of a unique on the columns (*A) and where there were different values break the grouping which IMHO is completely wrong. Others have only displayed the first match and discarded the rest for the group on a non-aggregated column/field. (*B). TABLEAa | b |c ---+---+---1 | 1 | 31 | 2 | 33 | 3 | 32 | 2 | 3 select a, b from TABLEA group by c would produce Behavior Aa | b ---+---1 | 32 | 33 | 3 OR Behavior Ba | b ---|---1 | 3 If I had to pick a behavior I would have picked behavior B because one could infer that the nongrouped value was unimportant or if you needed an aggregate operation it would have been specified. Another option would be to be completely indiscriminate and go with the first tuple it found and skip to next token/match/group (*C). Even C* would be ok IMO, it might even be faster. Behavior C1 |a | b | ---|--- |1 | 3 | | Behavior C2 |a | b | ---|--- +----- All are possible outcomes...3 | 3 | | Behavior C3 |a | b | ---|--- |2 | 3 | > >On Wednesday 30 April 2003 05:02 pm, Daniele Orlandi wrote: > > >>Stephan Szabo wrote: >> >> >>>AFAIK it's a requirement of the SQL spec. (SQL92(draft) 7.9 SR 7, "each >>><column reference> in each <value expression> that references a column >>>of T shall reference a grouping column or be specified within a <set >>>function specification>." >>> >>> >>I see... How should the "shall" term be considered ? I don't have much >>knowledge of the SQL specs language. >> >>How other DBMS behave in this case ? I know that mysql doesn't enforce >>this requirement but... mysql is not a perfect reference wrt standards >>compliance. >> >> >> >>>Well, it'd mean you didn't have to put the extra columns in the group by >>>list to make them grouping columns. >>> >>> >>This is what I currently do as a workaround, but it's not much clean >>expecially when you have many ungrouped fields in the target list. >> >>Bye! >> >> > > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster > >