Thread: cross tables, SELECT expressions, and GROUP BY problem
Hi, I'm relatively stretched when it comes to SQL but I'm getting there. I have a question though: Thanks to various web sites I've succeeded in creating simple cross tables in PostgreSQL. I'm now trying more complicated examples (migrating queries from MS Access to PostgreSQL). I'm stuck on getting grouping to work where the selection element isn't a real table field, where it is generated by a CASE statement. Here's a full actual SELECT statement, that works but isn't quite what I need, followed by the first few rows of output: SELECT DISTINCTCASE WHEN lower(c."Order") = 'coleoptera' THEN 5 WHEN lower(c."Order") = 'trichoptera' THEN 8 WHENlower(c."Order") = 'ephemeroptera' THEN 6 WHEN lower(c."Order") = 'plecoptera' THEN 7 WHEN lower(c."Class") = 'oligochaeta'THEN 1 WHEN lower(c."Family") LIKE 'chiron%' THEN 2 WHEN lower(c."Order") = 'diptera' AND lower(c."Family")NOT LIKE 'chiron%' THEN 3 ELSE 4END AS "Ranking",CASE WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera' WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera' WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera' WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera' WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta' WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae' WHEN lower(c."Order") = 'diptera' AND lower(c."Family")NOT LIKE 'chiron%' THEN 'Diptera (Other)' ELSE 'Other'END AS "Taxa",SUM(CASE WHEN b."LocationCode" ='2222011' THEN c."Count" END) AS "2222011",SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",SUM(CASEWHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",SUM(CASE WHEN b."LocationCode" = '2222014'THEN c."Count" END) AS "2222014" FROM "tblBugIDSheetInfo" bINNER JOIN "tblBugCount" c USING ("BugSheetID") GROUP BY c."Order", c."Class", c."Family" ORDER BY "Ranking" Generates the following output: Ranking Taxa 2222011 2222012 2222013 2222014 --------------------------------------------------------------- 1 "Oligochaeta" 487 1711 1759 1078 1 "Oligochaeta" 7 1 "Oligochaeta" 2 "Chironomidae" 1385 2335 1500 1513 2 "Chironomidae" 3 "Diptera (Other)" 5 3 "Diptera (Other)" 1 1 3 3 "Diptera (Other)" 199 19 40 37 3 "Diptera (Other)" ... I want to GROUP on the "Ranking" field as well so that all rows with the same "Ranking" value are SUMmed. That is, I'm trying to achieve the following: Ranking Taxa 2222011 2222012 2222013 2222014 --------------------------------------------------------------- 1 "Oligochaeta" 494 1711 1759 1078 2 "Chironomidae" 1385 2335 1500 1513 3 "Diptera (Other)" 199 20 41 45 ... I tried including "Ranking" at the head of the GROUP BY list, but it has no effect. Can anyone show me what I'm doing wrong and/or show me what I need to change? I'd rather avoid wrapping the whole query in another SELECT that would duplicate this one, assuming that that would work. Thanks. Ross Johnson PS. I just want to acknowledge the very useful web sites that I found via google which helped out considerably getting me this far with cross- tables. The above query is based on this one: http://dev.mysql.com/tech-resources/articles/wizard/print_version.html
On Sun, 2006-04-02 at 16:43 +1000, Ross Johnson wrote: > Hi, > > I'm relatively stretched when it comes to SQL but I'm getting there. I > have a question though: > > Thanks to various web sites I've succeeded in creating simple cross > tables in PostgreSQL. I'm now trying more complicated examples > (migrating queries from MS Access to PostgreSQL). I'm stuck on getting > grouping to work where the selection element isn't a real table field, > where it is generated by a CASE statement. Here's a full actual SELECT > statement, that works but isn't quite what I need, followed by the first > few rows of output: > > SELECT DISTINCT > CASE > WHEN lower(c."Order") = 'coleoptera' THEN 5 > WHEN lower(c."Order") = 'trichoptera' THEN 8 > WHEN lower(c."Order") = 'ephemeroptera' THEN 6 > WHEN lower(c."Order") = 'plecoptera' THEN 7 > WHEN lower(c."Class") = 'oligochaeta' THEN 1 > WHEN lower(c."Family") LIKE 'chiron%' THEN 2 > WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 3 > ELSE 4 > END AS "Ranking", > CASE > WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera' > WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera' > WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera' > WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera' > WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta' > WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae' > WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)' > ELSE 'Other' > END AS "Taxa", > SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011", > SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012", > SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013", > SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014" > FROM "tblBugIDSheetInfo" b > INNER JOIN "tblBugCount" c USING ("BugSheetID") > GROUP BY c."Order", c."Class", c."Family" > ORDER BY "Ranking" > > Generates the following output: > > Ranking Taxa 2222011 2222012 2222013 2222014 > --------------------------------------------------------------- > 1 "Oligochaeta" 487 1711 1759 1078 > 1 "Oligochaeta" 7 > 1 "Oligochaeta" > 2 "Chironomidae" 1385 2335 1500 1513 > 2 "Chironomidae" > 3 "Diptera (Other)" 5 > 3 "Diptera (Other)" 1 1 3 > 3 "Diptera (Other)" 199 19 40 37 > 3 "Diptera (Other)" > ... I should add that I've just tried the following query, which is just a rearrangement of the above query using a sub-SELECT, and the result is the same as above apart from a slightly different ordering of the rows. That is, GROUP BY "Ranking" still doesn't appear to do anything. I was under the impression that a sub-SELECT creates a temporary, or at least a pseudo-temporary table (a "virtual" table as it's called in the documentation), so I would have expected this query to work even if the above one doesn't. SELECT DISTINCT"Ranking", "Taxa",SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",SUM(CASE WHENb."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count"END) AS "2222013",SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014" FROM "tblBugIDSheetInfo" bINNER JOIN ( SELECT "BugSheetID", CASE WHEN lower("Order") = 'coleoptera' THEN 5 WHEN lower("Order") = 'trichoptera' THEN 8 WHEN lower("Order") = 'ephemeroptera' THEN 6 WHEN lower("Order")= 'plecoptera' THEN 7 WHEN lower("Class") = 'oligochaeta' THEN 1 WHEN lower("Family") LIKE 'chiron%'THEN 2 WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3 ELSE 4 ENDAS "Ranking", CASE WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera' WHEN lower("Order") = 'trichoptera'THEN 'Trichoptera' WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera' WHEN lower("Order")= 'plecoptera' THEN 'Plecoptera' WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta' WHENlower("Family") LIKE 'chiron%' THEN 'Chironomidae' WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE'chiron%' THEN 'Diptera (Other)' ELSE 'Other' END AS "Taxa", "Order", "Class", "Family", "Count" FROM"tblBugCount") c USING ("BugSheetID") GROUP BY c."Ranking", c."Taxa", c."Order", c."Class", c."Family" ORDER BY "Ranking" In the doco, I found this statement: "In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of simple column names is also allowed." This seems to relate to what I'm trying to do, and seems to say that it should be possible, unless I'm misreading it. Is my only option to create an actual temporary table? I was hoping to avoid that. If I must, is there a function to generate unique table names that I can use? Regards. Ross Johnson > I want to GROUP on the "Ranking" field as well so that all rows with the > same "Ranking" value are SUMmed. That is, I'm trying to achieve the > following: > > Ranking Taxa 2222011 2222012 2222013 2222014 > --------------------------------------------------------------- > 1 "Oligochaeta" 494 1711 1759 1078 > 2 "Chironomidae" 1385 2335 1500 1513 > 3 "Diptera (Other)" 199 20 41 45 > ... > > I tried including "Ranking" at the head of the GROUP BY list, but it has > no effect. > > Can anyone show me what I'm doing wrong and/or show me what I need to > change? I'd rather avoid wrapping the whole query in another SELECT that > would duplicate this one, assuming that that would work. > > Thanks. > Ross Johnson > > PS. I just want to acknowledge the very useful web sites that I found > via google which helped out considerably getting me this far with cross- > tables. The above query is based on this one: > > http://dev.mysql.com/tech-resources/articles/wizard/print_version.html > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
Following up my own question again, I've realised my error and solved my problem - in the interests of completing this thread, the working query (two versions) can be found below... On Sun, 2006-04-02 at 18:00 +1000, Ross Johnson wrote: > On Sun, 2006-04-02 at 16:43 +1000, Ross Johnson wrote: > > Hi, > > > > I'm relatively stretched when it comes to SQL but I'm getting there. I > > have a question though: > > > > Thanks to various web sites I've succeeded in creating simple cross > > tables in PostgreSQL. I'm now trying more complicated examples > > (migrating queries from MS Access to PostgreSQL). I'm stuck on getting > > grouping to work where the selection element isn't a real table field, > > where it is generated by a CASE statement. Here's a full actual SELECT > > statement, that works but isn't quite what I need, followed by the first > > few rows of output: > > > > SELECT DISTINCT > > CASE > > WHEN lower(c."Order") = 'coleoptera' THEN 5 > > WHEN lower(c."Order") = 'trichoptera' THEN 8 > > WHEN lower(c."Order") = 'ephemeroptera' THEN 6 > > WHEN lower(c."Order") = 'plecoptera' THEN 7 > > WHEN lower(c."Class") = 'oligochaeta' THEN 1 > > WHEN lower(c."Family") LIKE 'chiron%' THEN 2 > > WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 3 > > ELSE 4 > > END AS "Ranking", > > CASE > > WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera' > > WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera' > > WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera' > > WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera' > > WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta' > > WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae' > > WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)' > > ELSE 'Other' > > END AS "Taxa", > > SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011", > > SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012", > > SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013", > > SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014" > > FROM "tblBugIDSheetInfo" b > > INNER JOIN "tblBugCount" c USING ("BugSheetID") > > GROUP BY c."Order", c."Class", c."Family" > > ORDER BY "Ranking" > > > > Generates the following output: > > > > Ranking Taxa 2222011 2222012 2222013 2222014 > > --------------------------------------------------------------- > > 1 "Oligochaeta" 487 1711 1759 1078 > > 1 "Oligochaeta" 7 > > 1 "Oligochaeta" > > 2 "Chironomidae" 1385 2335 1500 1513 > > 2 "Chironomidae" > > 3 "Diptera (Other)" 5 > > 3 "Diptera (Other)" 1 1 3 > > 3 "Diptera (Other)" 199 19 40 37 > > 3 "Diptera (Other)" > > ... > > I should add that I've just tried the following query, which is just a > rearrangement of the above query using a sub-SELECT, and the result is > the same as above apart from a slightly different ordering of the rows. > That is, GROUP BY "Ranking" still doesn't appear to do anything. I was > under the impression that a sub-SELECT creates a temporary, or at least > a pseudo-temporary table (a "virtual" table as it's called in the > documentation), so I would have expected this query to work even if the > above one doesn't. > > > SELECT DISTINCT > "Ranking", "Taxa", > SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011", > SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012", > SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013", > SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014" > FROM "tblBugIDSheetInfo" b > INNER JOIN ( > SELECT "BugSheetID", > CASE > WHEN lower("Order") = 'coleoptera' THEN 5 > WHEN lower("Order") = 'trichoptera' THEN 8 > WHEN lower("Order") = 'ephemeroptera' THEN 6 > WHEN lower("Order") = 'plecoptera' THEN 7 > WHEN lower("Class") = 'oligochaeta' THEN 1 > WHEN lower("Family") LIKE 'chiron%' THEN 2 > WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3 > ELSE 4 > END AS "Ranking", > CASE > WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera' > WHEN lower("Order") = 'trichoptera' THEN 'Trichoptera' > WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera' > WHEN lower("Order") = 'plecoptera' THEN 'Plecoptera' > WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta' > WHEN lower("Family") LIKE 'chiron%' THEN 'Chironomidae' > WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)' > ELSE 'Other' > END AS "Taxa", > "Order", "Class", "Family", "Count" > FROM "tblBugCount") c USING ("BugSheetID") > GROUP BY c."Ranking", c."Taxa", c."Order", c."Class", c."Family" > ORDER BY "Ranking" > The problem was I was trying to do too much in one GROUP BY clause. After getting the right result using a temporary table I backtracked and found that the following version gives me the result I'm looking for. SELECT DISTINCT"Ranking", "Taxa",SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",SUM(CASE WHENb."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count"END) AS "2222013",SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014" FROM "tblBugIDSheetInfo" bINNER JOIN ( SELECT "BugSheetID", CASE WHEN lower("Order") = 'coleoptera' THEN 5 WHEN lower("Order") = 'trichoptera' THEN 8 WHEN lower("Order") = 'ephemeroptera' THEN 6 WHEN lower("Order")= 'plecoptera' THEN 7 WHEN lower("Class") = 'oligochaeta' THEN 1 WHEN lower("Family") LIKE 'chiron%'THEN 2 WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3 ELSE 4 ENDAS "Ranking", CASE WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera' WHEN lower("Order") = 'trichoptera'THEN 'Trichoptera' WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera' WHEN lower("Order")= 'plecoptera' THEN 'Plecoptera' WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta' WHENlower("Family") LIKE 'chiron%' THEN 'Chironomidae' WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE'chiron%' THEN 'Diptera (Other)' ELSE 'Other' END AS "Taxa", "Order", "Class", "Family", SUM("Count") AS"Count" FROM "tblBugCount" GROUP BY "BugSheetID", "Order", "Class", "Family") c USING ("BugSheetID") GROUP BY "Ranking", "Taxa" ORDER BY "Ranking" And modifying the original query gives the faster performance: SELECT "Ranking","Taxa",SUM("2222011") AS "2222011",SUM("2222012") AS "2222012",SUM("2222013") AS "2222013",SUM("2222014")AS "2222014" FROM (SELECT DISTINCTCASE WHEN lower(c."Order") = 'coleoptera' THEN 5 WHEN lower(c."Order") = 'trichoptera' THEN 8 WHEN lower(c."Order") = 'ephemeroptera' THEN 6 WHEN lower(c."Order") = 'plecoptera' THEN 7 WHEN lower(c."Class")= 'oligochaeta' THEN 1 WHEN lower(c."Family") LIKE 'chiron%' THEN 2 WHEN lower(c."Order") = 'diptera'AND lower(c."Family") NOT LIKE 'chiron%' THEN 3 ELSE 4END AS "Ranking",CASE WHEN lower(c."Order") = 'coleoptera'THEN 'Coleoptera' WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera' WHEN lower(c."Order") = 'ephemeroptera'THEN 'Ephemeroptera' WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera' WHEN lower(c."Class") ='oligochaeta' THEN 'Oligochaeta' WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae' WHEN lower(c."Order") ='diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)' ELSE 'Other'END AS "Taxa",SUM(CASE WHEN b."LocationCode"= '2222011' THEN c."Count" END) AS "2222011",SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END)AS "2222012",SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",SUM(CASE WHEN b."LocationCode"= '2222014' THEN c."Count" END) AS "2222014"FROM "tblBugIDSheetInfo" b INNER JOIN "tblBugCount" c USING("BugSheetID")WHERE b."LocationCode" = '2222011' OR b."LocationCode" = '2222012' OR b."LocationCode" = '2222013' OR b."LocationCode" = '2222014'GROUP BY c."Order", c."Class", c."Family") d GROUP BY "Ranking", "Taxa" ORDER BY "Ranking" Both these queries give me the result I was after: > > I want to GROUP on the "Ranking" field as well so that all rows with the > > same "Ranking" value are SUMmed. That is, I'm trying to achieve the > > following: > > > > Ranking Taxa 2222011 2222012 2222013 2222014 > > --------------------------------------------------------------- > > 1 "Oligochaeta" 494 1711 1759 1078 > > 2 "Chironomidae" 1385 2335 1500 1513 > > 3 "Diptera (Other)" 199 20 41 45 > > ... Applying the grouping at the appropriate points helps :) Regards. Ross Johnson