Thread: cross tables, SELECT expressions, and GROUP BY problem

cross tables, SELECT expressions, and GROUP BY problem

From
Ross Johnson
Date:
Hi,

I'm relatively stretched when it comes to SQL but I'm getting there. I
have a question though:

Thanks to various web sites I've succeeded in creating simple cross
tables in PostgreSQL. I'm now trying more complicated examples
(migrating queries from MS Access to PostgreSQL). I'm stuck on getting
grouping to work where the selection element isn't a real table field,
where it is generated by a CASE statement. Here's a full actual SELECT
statement, that works but isn't quite what I need, followed by the first
few rows of output:

SELECT DISTINCTCASE    WHEN lower(c."Order") = 'coleoptera' THEN 5    WHEN lower(c."Order") = 'trichoptera' THEN 8
WHENlower(c."Order") = 'ephemeroptera' THEN 6    WHEN lower(c."Order") = 'plecoptera' THEN 7    WHEN lower(c."Class") =
'oligochaeta'THEN 1    WHEN lower(c."Family") LIKE 'chiron%' THEN 2    WHEN lower(c."Order") = 'diptera' AND
lower(c."Family")NOT LIKE 'chiron%' THEN 3    ELSE 4END AS "Ranking",CASE    WHEN lower(c."Order") = 'coleoptera' THEN
'Coleoptera'   WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera'    WHEN lower(c."Order") = 'ephemeroptera' THEN
'Ephemeroptera'   WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera'    WHEN lower(c."Class") = 'oligochaeta' THEN
'Oligochaeta'   WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae'    WHEN lower(c."Order") = 'diptera' AND
lower(c."Family")NOT LIKE 'chiron%' THEN 'Diptera (Other)'    ELSE 'Other'END AS "Taxa",SUM(CASE WHEN b."LocationCode"
='2222011' THEN c."Count" END) AS "2222011",SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS
"2222012",SUM(CASEWHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",SUM(CASE WHEN b."LocationCode" =
'2222014'THEN c."Count" END) AS "2222014"
 
FROM "tblBugIDSheetInfo" bINNER JOIN "tblBugCount" c USING ("BugSheetID")
GROUP BY c."Order", c."Class", c."Family"
ORDER BY "Ranking"

Generates the following output:

Ranking    Taxa            2222011    2222012    2222013    2222014
---------------------------------------------------------------
1    "Oligochaeta"        487    1711    1759    1078
1    "Oligochaeta"        7            
1    "Oligochaeta"                    
2    "Chironomidae"        1385    2335    1500    1513
2    "Chironomidae"                    
3    "Diptera (Other)"                5
3    "Diptera (Other)"        1    1    3
3    "Diptera (Other)"    199    19    40    37
3    "Diptera (Other)"                
...

I want to GROUP on the "Ranking" field as well so that all rows with the
same "Ranking" value are SUMmed. That is, I'm trying to achieve the
following:

Ranking    Taxa            2222011    2222012    2222013    2222014
---------------------------------------------------------------
1    "Oligochaeta"        494    1711    1759    1078
2    "Chironomidae"        1385    2335    1500    1513
3    "Diptera (Other)"    199    20    41    45
...

I tried including "Ranking" at the head of the GROUP BY list, but it has
no effect.

Can anyone show me what I'm doing wrong and/or show me what I need to
change? I'd rather avoid wrapping the whole query in another SELECT that
would duplicate this one, assuming that that would work.

Thanks.
Ross Johnson

PS. I just want to acknowledge the very useful web sites that I found
via google which helped out considerably getting me this far with cross-
tables. The above query is based on this one:

http://dev.mysql.com/tech-resources/articles/wizard/print_version.html




Re: cross tables, SELECT expressions, and GROUP BY problem

From
Ross Johnson
Date:
On Sun, 2006-04-02 at 16:43 +1000, Ross Johnson wrote:
> Hi,
> 
> I'm relatively stretched when it comes to SQL but I'm getting there. I
> have a question though:
> 
> Thanks to various web sites I've succeeded in creating simple cross
> tables in PostgreSQL. I'm now trying more complicated examples
> (migrating queries from MS Access to PostgreSQL). I'm stuck on getting
> grouping to work where the selection element isn't a real table field,
> where it is generated by a CASE statement. Here's a full actual SELECT
> statement, that works but isn't quite what I need, followed by the first
> few rows of output:
> 
> SELECT DISTINCT
>     CASE
>         WHEN lower(c."Order") = 'coleoptera' THEN 5
>         WHEN lower(c."Order") = 'trichoptera' THEN 8
>         WHEN lower(c."Order") = 'ephemeroptera' THEN 6
>         WHEN lower(c."Order") = 'plecoptera' THEN 7
>         WHEN lower(c."Class") = 'oligochaeta' THEN 1
>         WHEN lower(c."Family") LIKE 'chiron%' THEN 2
>         WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 3
>         ELSE 4
>     END AS "Ranking",
>     CASE
>         WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera'
>         WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera'
>         WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera'
>         WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera'
>         WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta'
>         WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae'
>         WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
>         ELSE 'Other'
>     END AS "Taxa",
>     SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
>     SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
>     SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
>     SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
> FROM "tblBugIDSheetInfo" b
>     INNER JOIN "tblBugCount" c USING ("BugSheetID")
> GROUP BY c."Order", c."Class", c."Family"
> ORDER BY "Ranking"
> 
> Generates the following output:
> 
> Ranking    Taxa            2222011    2222012    2222013    2222014
> ---------------------------------------------------------------
> 1    "Oligochaeta"        487    1711    1759    1078
> 1    "Oligochaeta"        7            
> 1    "Oligochaeta"                    
> 2    "Chironomidae"        1385    2335    1500    1513
> 2    "Chironomidae"                    
> 3    "Diptera (Other)"                5
> 3    "Diptera (Other)"        1    1    3
> 3    "Diptera (Other)"    199    19    40    37
> 3    "Diptera (Other)"                
> ...

I should add that I've just tried the following query, which is just a
rearrangement of the above query using a sub-SELECT, and the result is
the same as above apart from a slightly different ordering of the rows.
That is, GROUP BY "Ranking" still doesn't appear to do anything. I was
under the impression that a sub-SELECT creates a temporary, or at least
a pseudo-temporary table (a "virtual" table as it's called in the
documentation), so I would have expected this query to work even if the
above one doesn't.


SELECT DISTINCT"Ranking", "Taxa",SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",SUM(CASE
WHENb."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",SUM(CASE WHEN b."LocationCode" = '2222013' THEN
c."Count"END) AS "2222013",SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
 
FROM "tblBugIDSheetInfo" bINNER JOIN (    SELECT "BugSheetID",    CASE        WHEN lower("Order") = 'coleoptera' THEN 5
      WHEN lower("Order") = 'trichoptera' THEN 8        WHEN lower("Order") = 'ephemeroptera' THEN 6        WHEN
lower("Order")= 'plecoptera' THEN 7        WHEN lower("Class") = 'oligochaeta' THEN 1        WHEN lower("Family") LIKE
'chiron%'THEN 2        WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3        ELSE 4
ENDAS "Ranking",    CASE        WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera'        WHEN lower("Order") =
'trichoptera'THEN 'Trichoptera'        WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera'        WHEN
lower("Order")= 'plecoptera' THEN 'Plecoptera'        WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta'
WHENlower("Family") LIKE 'chiron%' THEN 'Chironomidae'        WHEN lower("Order") = 'diptera' AND lower("Family") NOT
LIKE'chiron%' THEN 'Diptera (Other)'        ELSE 'Other'    END AS "Taxa",    "Order", "Class", "Family", "Count"
FROM"tblBugCount") c USING ("BugSheetID")
 
GROUP BY c."Ranking", c."Taxa", c."Order", c."Class", c."Family"
ORDER BY "Ranking"


In the doco, I found this statement:
"In strict SQL, GROUP BY can only group by columns of the source table
but PostgreSQL extends this to also allow GROUP BY to group by columns
in the select list. Grouping by value expressions instead of simple
column names is also allowed."

This seems to relate to what I'm trying to do, and seems to say that it
should be possible, unless I'm misreading it.

Is my only option to create an actual temporary table? I was hoping to
avoid that. If I must, is there a function to generate unique table
names that I can use?

Regards.
Ross Johnson

> I want to GROUP on the "Ranking" field as well so that all rows with the
> same "Ranking" value are SUMmed. That is, I'm trying to achieve the
> following:
> 
> Ranking    Taxa            2222011    2222012    2222013    2222014
> ---------------------------------------------------------------
> 1    "Oligochaeta"        494    1711    1759    1078
> 2    "Chironomidae"        1385    2335    1500    1513
> 3    "Diptera (Other)"    199    20    41    45
> ...
> 
> I tried including "Ranking" at the head of the GROUP BY list, but it has
> no effect.
> 
> Can anyone show me what I'm doing wrong and/or show me what I need to
> change? I'd rather avoid wrapping the whole query in another SELECT that
> would duplicate this one, assuming that that would work.
> 
> Thanks.
> Ross Johnson
> 
> PS. I just want to acknowledge the very useful web sites that I found
> via google which helped out considerably getting me this far with cross-
> tables. The above query is based on this one:
> 
> http://dev.mysql.com/tech-resources/articles/wizard/print_version.html
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 



Re: cross tables, SELECT expressions, and GROUP BY problem

From
Ross Johnson
Date:
Following up my own question again, I've realised my error and solved my
problem - in the interests of completing this thread, the working query
(two versions) can be found below...

On Sun, 2006-04-02 at 18:00 +1000, Ross Johnson wrote:
> On Sun, 2006-04-02 at 16:43 +1000, Ross Johnson wrote:
> > Hi,
> > 
> > I'm relatively stretched when it comes to SQL but I'm getting there. I
> > have a question though:
> > 
> > Thanks to various web sites I've succeeded in creating simple cross
> > tables in PostgreSQL. I'm now trying more complicated examples
> > (migrating queries from MS Access to PostgreSQL). I'm stuck on getting
> > grouping to work where the selection element isn't a real table field,
> > where it is generated by a CASE statement. Here's a full actual SELECT
> > statement, that works but isn't quite what I need, followed by the first
> > few rows of output:
> > 
> > SELECT DISTINCT
> >     CASE
> >         WHEN lower(c."Order") = 'coleoptera' THEN 5
> >         WHEN lower(c."Order") = 'trichoptera' THEN 8
> >         WHEN lower(c."Order") = 'ephemeroptera' THEN 6
> >         WHEN lower(c."Order") = 'plecoptera' THEN 7
> >         WHEN lower(c."Class") = 'oligochaeta' THEN 1
> >         WHEN lower(c."Family") LIKE 'chiron%' THEN 2
> >         WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 3
> >         ELSE 4
> >     END AS "Ranking",
> >     CASE
> >         WHEN lower(c."Order") = 'coleoptera' THEN 'Coleoptera'
> >         WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera'
> >         WHEN lower(c."Order") = 'ephemeroptera' THEN 'Ephemeroptera'
> >         WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera'
> >         WHEN lower(c."Class") = 'oligochaeta' THEN 'Oligochaeta'
> >         WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae'
> >         WHEN lower(c."Order") = 'diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
> >         ELSE 'Other'
> >     END AS "Taxa",
> >     SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
> >     SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
> >     SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
> >     SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
> > FROM "tblBugIDSheetInfo" b
> >     INNER JOIN "tblBugCount" c USING ("BugSheetID")
> > GROUP BY c."Order", c."Class", c."Family"
> > ORDER BY "Ranking"
> > 
> > Generates the following output:
> > 
> > Ranking    Taxa            2222011    2222012    2222013    2222014
> > ---------------------------------------------------------------
> > 1    "Oligochaeta"        487    1711    1759    1078
> > 1    "Oligochaeta"        7            
> > 1    "Oligochaeta"                    
> > 2    "Chironomidae"        1385    2335    1500    1513
> > 2    "Chironomidae"                    
> > 3    "Diptera (Other)"                5
> > 3    "Diptera (Other)"        1    1    3
> > 3    "Diptera (Other)"    199    19    40    37
> > 3    "Diptera (Other)"                
> > ...
> 
> I should add that I've just tried the following query, which is just a
> rearrangement of the above query using a sub-SELECT, and the result is
> the same as above apart from a slightly different ordering of the rows.
> That is, GROUP BY "Ranking" still doesn't appear to do anything. I was
> under the impression that a sub-SELECT creates a temporary, or at least
> a pseudo-temporary table (a "virtual" table as it's called in the
> documentation), so I would have expected this query to work even if the
> above one doesn't.
> 
> 
> SELECT DISTINCT
>     "Ranking", "Taxa",
>     SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",
>     SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",
>     SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",
>     SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
> FROM "tblBugIDSheetInfo" b
>     INNER JOIN (
>         SELECT "BugSheetID",
>         CASE
>             WHEN lower("Order") = 'coleoptera' THEN 5
>             WHEN lower("Order") = 'trichoptera' THEN 8
>             WHEN lower("Order") = 'ephemeroptera' THEN 6
>             WHEN lower("Order") = 'plecoptera' THEN 7
>             WHEN lower("Class") = 'oligochaeta' THEN 1
>             WHEN lower("Family") LIKE 'chiron%' THEN 2
>             WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3
>             ELSE 4
>         END AS "Ranking",
>         CASE
>             WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera'
>             WHEN lower("Order") = 'trichoptera' THEN 'Trichoptera'
>             WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera'
>             WHEN lower("Order") = 'plecoptera' THEN 'Plecoptera'
>             WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta'
>             WHEN lower("Family") LIKE 'chiron%' THEN 'Chironomidae'
>             WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'
>             ELSE 'Other'
>         END AS "Taxa",
>         "Order", "Class", "Family", "Count"
>         FROM "tblBugCount") c USING ("BugSheetID")
> GROUP BY c."Ranking", c."Taxa", c."Order", c."Class", c."Family"
> ORDER BY "Ranking"
> 

The problem was I was trying to do too much in one GROUP BY clause.
After getting the right result using a temporary table I backtracked and
found that the following version gives me the result I'm looking for.

SELECT DISTINCT"Ranking", "Taxa",SUM(CASE WHEN b."LocationCode" = '2222011' THEN c."Count" END) AS "2222011",SUM(CASE
WHENb."LocationCode" = '2222012' THEN c."Count" END) AS "2222012",SUM(CASE WHEN b."LocationCode" = '2222013' THEN
c."Count"END) AS "2222013",SUM(CASE WHEN b."LocationCode" = '2222014' THEN c."Count" END) AS "2222014"
 
FROM "tblBugIDSheetInfo" bINNER JOIN (    SELECT "BugSheetID",    CASE        WHEN lower("Order") = 'coleoptera' THEN 5
      WHEN lower("Order") = 'trichoptera' THEN 8        WHEN lower("Order") = 'ephemeroptera' THEN 6        WHEN
lower("Order")= 'plecoptera' THEN 7        WHEN lower("Class") = 'oligochaeta' THEN 1        WHEN lower("Family") LIKE
'chiron%'THEN 2        WHEN lower("Order") = 'diptera' AND lower("Family") NOT LIKE 'chiron%' THEN 3        ELSE 4
ENDAS "Ranking",    CASE        WHEN lower("Order") = 'coleoptera' THEN 'Coleoptera'        WHEN lower("Order") =
'trichoptera'THEN 'Trichoptera'        WHEN lower("Order") = 'ephemeroptera' THEN 'Ephemeroptera'        WHEN
lower("Order")= 'plecoptera' THEN 'Plecoptera'        WHEN lower("Class") = 'oligochaeta' THEN 'Oligochaeta'
WHENlower("Family") LIKE 'chiron%' THEN 'Chironomidae'        WHEN lower("Order") = 'diptera' AND lower("Family") NOT
LIKE'chiron%' THEN 'Diptera (Other)'        ELSE 'Other'    END AS "Taxa",    "Order", "Class", "Family", SUM("Count")
AS"Count"    FROM "tblBugCount"    GROUP BY "BugSheetID", "Order", "Class", "Family") c USING ("BugSheetID")
 
GROUP BY "Ranking", "Taxa"
ORDER BY "Ranking"

And modifying the original query gives the faster performance:

SELECT    "Ranking","Taxa",SUM("2222011") AS "2222011",SUM("2222012") AS "2222012",SUM("2222013") AS
"2222013",SUM("2222014")AS "2222014"
 
FROM (SELECT DISTINCTCASE    WHEN lower(c."Order") = 'coleoptera' THEN 5    WHEN lower(c."Order") = 'trichoptera' THEN
8   WHEN lower(c."Order") = 'ephemeroptera' THEN 6    WHEN lower(c."Order") = 'plecoptera' THEN 7    WHEN
lower(c."Class")= 'oligochaeta' THEN 1    WHEN lower(c."Family") LIKE 'chiron%' THEN 2    WHEN lower(c."Order") =
'diptera'AND lower(c."Family") NOT LIKE 'chiron%' THEN 3    ELSE 4END AS "Ranking",CASE    WHEN lower(c."Order") =
'coleoptera'THEN 'Coleoptera'    WHEN lower(c."Order") = 'trichoptera' THEN 'Trichoptera'    WHEN lower(c."Order") =
'ephemeroptera'THEN 'Ephemeroptera'    WHEN lower(c."Order") = 'plecoptera' THEN 'Plecoptera'    WHEN lower(c."Class")
='oligochaeta' THEN 'Oligochaeta'    WHEN lower(c."Family") LIKE 'chiron%' THEN 'Chironomidae'    WHEN lower(c."Order")
='diptera' AND lower(c."Family") NOT LIKE 'chiron%' THEN 'Diptera (Other)'    ELSE 'Other'END AS "Taxa",SUM(CASE WHEN
b."LocationCode"= '2222011' THEN c."Count" END) AS "2222011",SUM(CASE WHEN b."LocationCode" = '2222012' THEN c."Count"
END)AS "2222012",SUM(CASE WHEN b."LocationCode" = '2222013' THEN c."Count" END) AS "2222013",SUM(CASE WHEN
b."LocationCode"= '2222014' THEN c."Count" END) AS "2222014"FROM "tblBugIDSheetInfo" b    INNER JOIN "tblBugCount" c
USING("BugSheetID")WHERE b."LocationCode" = '2222011'   OR b."LocationCode" = '2222012'   OR b."LocationCode" =
'2222013'  OR b."LocationCode" = '2222014'GROUP BY c."Order", c."Class", c."Family") d
 
GROUP BY "Ranking", "Taxa"
ORDER BY "Ranking"


Both these queries give me the result I was after:

> > I want to GROUP on the "Ranking" field as well so that all rows with the
> > same "Ranking" value are SUMmed. That is, I'm trying to achieve the
> > following:
> > 
> > Ranking    Taxa            2222011    2222012    2222013    2222014
> > ---------------------------------------------------------------
> > 1    "Oligochaeta"        494    1711    1759    1078
> > 2    "Chironomidae"        1385    2335    1500    1513
> > 3    "Diptera (Other)"    199    20    41    45
> > ...

Applying the grouping at the appropriate points helps :)

Regards.
Ross Johnson