Thread: Counting bool flags in a complex query
Hi. I think I've created a monster... Working on an email system I have the following: Table = usermail +----------------------------------+--------------------------+-------+ | Field | Type | Length| +----------------------------------+--------------------------+-------+ | contentlength | int4 | 4 | | folder | int4 | 4 | | flagnew | bool | 1 | etc... And: Table = folders +----------------------------------+--------------------------+-------+ | Field | Type | Length| +----------------------------------+--------------------------+-------+ | loginid | varchar() not null | 16 | | folderid | int4 not null default ( | 4 | | foldername | varchar() | 25 | etc... So each email message has an entry in usermail, and each mail folder has an entry in folders. I need to extract the following info: foldername, number of messages in that folder, number of messages in that folder with flagread set, total size of all the messages in each folder Since postgres does not appear to support outer joins, I've come up with a really icky query that almost does what I want: SELECT folderid,foldername,count(*),sum(contentlength) FROM usermail,folders WHERE usermail.loginid='michael' AND folders.loginid=usermail.loginid AND usermail.folder=folders.folderid GROUP BY folderid,foldername UNION SELECT folderid,foldername,null,null FROM folders WHERE loginid='michael' AND folderid NOT IN (SELECTfolder FROM usermail WHERE loginid='michael'); WHEW! folderid|foldername |count| sum --------+----------------+-----+------- -4|Deleted Messages| 110| 245627 -3|Saved Drafts | | -2|SentMail | 7| 10878 -1|New Mail Folder | 73|8831226 1|OOL | 7| 8470 etc... My final problem is to count all the messages with flagnew set to true. The only way I can think to do this is to convert the bool value to a 1 or 0 (which I think should be a standard conversion anyway) and run a sum() on them. Unless anyone can come up with a better way to do this, What is the best way to implement a conversion from bool to int? -Michael
> Hi. > > I think I've created a monster... > ... > > My final problem is to count all the messages with flagnew set to true. > The only way I can think to do this is to convert the bool value to a 1 or > 0 (which I think should be a standard conversion anyway) and run a sum() > on them. > > Unless anyone can come up with a better way to do this, What is the best > way to implement a conversion from bool to int? > > -Michael Of course, you could always use count() and a 'WHERE flagnew' clause... Duane
On Wed, 14 Jul 1999, Duane Currie wrote: > > My final problem is to count all the messages with flagnew set to true. > > The only way I can think to do this is to convert the bool value to a 1 or > > 0 (which I think should be a standard conversion anyway) and run a sum() > > on them. > > > > Unless anyone can come up with a better way to do this, What is the best > > way to implement a conversion from bool to int? > > Of course, you could always use count() and a 'WHERE flagnew' clause... Problem with that of course is that by limiting the query with a "where", I'd lose all the records in the original count, and therefore the total number of messages (a count that ignores the status of flagnew) would be wrong. What I was sort of hoping for was a way to implement a native conversion from bool to int, and have it included in the standard postgres system. I think the conversion if a reasonable logical one where true==1 and false==0. The problem is, I don't have a sweet clue how to do this. I think it should be a trivial matter to insert something into a system table... -Michael
> Unless anyone can come up with a better way to do this, What is the best > way to implement a conversion from bool to int? Try select sum(case when bfield = TRUE then 1 else 0 end) from table; It works for me... - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
On Wed, 14 Jul 1999, Thomas Lockhart wrote: > > Unless anyone can come up with a better way to do this, What is the best > > way to implement a conversion from bool to int? > > select sum(case when bfield = TRUE then 1 else 0 end) from table; I'm not sure this is correct, but I think I see a bug of some sort... SELECT folderid,foldername,count(*),sum(contentlength),sum(case when flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE usermail.loginid='michael' and folders.loginid=usermail.loginid AND usermail.folder = folders.folderid GROUP BY folderid,foldername UNION SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND folder=folderid) ; ERROR: _finalize_primnode: can't handle node 723 It seems to be the union that is confuzing it... SELECT folderid,foldername,count(*),sum(contentlength),sum(case when flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE usermail.loginid='michael' and folders.loginid=usermail.loginid AND usermail.folder = folders.folderid GROUP BY folderid,foldername; folderid|foldername |count| sum|sum --------+----------------+-----+-------+--- -4|Deleted Messages| 110| 245627| 50 -2|Sent Mail | 7| 10878| 2 -1|New Mail Folder | 73|8831226| 1 1|OOL | 7| 8470| 0 etc -Michael
Michael Richards <miker@scifair.acadiau.ca> writes: > ERROR: _finalize_primnode: can't handle node 723 Grumble. Still another routine that doesn't know as much as it should about traversing parsetrees. Looks like a job for <flourish of trumpets> expression_tree_walker. > It seems to be the union that is confuzing it... CASE expression inside a UNION/INTERSECT/EXCEPT, to be specific. Will fix this in time for 6.5.1. regards, tom lane
Michael Richards <miker@scifair.acadiau.ca> writes: > I'm not sure this is correct, but I think I see a bug of some sort... > SELECT folderid,foldername,count(*),sum(contentlength),sum(case when > flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE > usermail.loginid='michael' and folders.loginid=usermail.loginid AND > usermail.folder = folders.folderid GROUP BY folderid,foldername UNION > SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND > NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND > folder=folderid) ; > ERROR: _finalize_primnode: can't handle node 723 I committed a fix last night; it will be in 6.5.1. regards, tom lane
On Thu, 15 Jul 1999, Tom Lane wrote: > Michael Richards <miker@scifair.acadiau.ca> writes: > > I'm not sure this is correct, but I think I see a bug of some sort... > > I committed a fix last night; it will be in 6.5.1. I've found what I believe is another set of bugs: This is my monster query again... My folder numbers are: negative numbers are system folders such as New mail, trash, drafts and sentmail. I wanted to order the tuples so that the folderids were sorted from -1 to -4, then 1 to x. This way the system folders would always appear first in the list. This may not be valid SQL, as none of my books mention it. Is it possible to order by an expression? Here are some examples which some some odd behaviour. My suspected bug findings are at the end: SELECT folderid,foldername,count(*) as "messgaes",sum(bool2int(flagnew)) as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE usermail.loginid='michael' and folders.loginid=usermail.loginid AND usermail.folder = folders.folderid GROUP BY folderid,foldername UNION SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND folder=folderid) order by (folderid>0); folderid|foldername |messgaes|newmessages| size --------+----------------+--------+-----------+------- -4|Deleted Messages| 110| 50| 245627 -2|Sent Mail | 7| 2| 10878 -1|New Mail Folder | 73| 1|8831226 1|OOL | 7| 0| 8470 2|suggestions | 26| 0| 35433 3|Acadia | 5| 0| 17703 4|advertising | 4| 2| 5394 5|dealt with | 3| 0| 2883 36|dauphne | 9| 0| 66850 -3|Saved Drafts | 0| 0| 0 (10 rows) It looks like the order by is only being applied to the original select, not the unioned select. Some authority should check on it, but by thought it that a union does not necessarily maintain the order, so the entire select should be applied to the order. I'm not so good at interpreting the query plan, but here it is: Unique (cost=8.10 rows=0 width=0) -> Sort (cost=8.10 rows=0 width=0) -> Append (cost=8.10 rows=0 width=0) -> Aggregate (cost=6.05 rows=1 width=49) -> Group (cost=6.05 rows=1 width=49) -> Sort (cost=6.05 rows=1 width=49) -> Nested Loop (cost=6.05 rows=1 width=49) -> Index Scan using usermail_pkey on usermail (cost=2.05 rows=2 width=21) -> Index Scan using folders_pkey on folders (cost=2.00 rows=8448 width=28) -> Index Scan using folders_pkey on folders (cost=2.05 rows=2 width=16) SubPlan ->Index Scan using usermail_pkey on usermail (cost=2.05 rows=1 width=4) I would have expected the folderid -3 to appear as the 3rd one in this case. I'm probably going to change the numbering scheme of the system folders so they will sort correctly without a kluge such as: create function ordfolderid(int) returns int as 'select $1*-1 where $1<0 union select $1+1*10 where $1>=0' language 'sql'; Then running the order clause as: order by (folderid<0),ordfolderid(folderid) My thought behind this kludge is that the table should first be ordered by the t/f value of the fact folderid<0, then within each of the true and false sortings, subsort those by the value of folderid. Complicated enough for you? Well, in my playing I notice what appears to be more of a bug... SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew)) as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE usermail.loginid='michael' and folders.loginid=usermail.loginid AND usermail.folder = folders.folderid GROUP BY folderid,foldername UNION SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND folder=folderid) order by (folderid<0); folderid|foldername |messgaes|newmessages| size --------+----------------+--------+-----------+------- 1|OOL | 7| 0| 8470 2|suggestions | 26| 0| 35433 3|Acadia | 5| 0| 17703 4|advertising | 4| 2| 5394 5|dealt with | 3| 0| 2883 36|dauphne | 9| 0| 66850 -4|Deleted Messages| 110| 50| 245627 -2|Sent Mail | 7| 2| 10878 -1|New Mail Folder | 73| 1|8831226 -3|Saved Drafts | 0| 0| 0 (10 rows) SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew)) as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE usermail.loginid='michael' and folders.loginid=usermail.loginid AND usermail.folder = folders.folderid GROUP BY folderid,foldername UNION SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND folder=folderid) order by (messages<10); ERROR: attribute 'messages' not found Using a column name within an expression in the order by does not seem to work... Or a much simpler example to illustrate the bug: fastmail=> select 1 as "test" order by (test<9); ERROR: attribute 'test' not found fastmail=> select 1 as "test" order by test; test ---- 1 (1 row) I was almost able to make it work properly aside from the sorting issue with my kludged up routine... This is so nasty that I most definitely don't want to put it into production: SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew)) as "newmessages",sum(contentlength) as "size",(folderid>=0) FROM usermail,folders WHERE usermail.loginid='michael' and folders.loginid=usermail.loginid AND usermail.folder = folders.folderid GROUP BY folderid,foldername UNION SELECT folderid,foldername,0,0,0,(folderid>=0) FROM folders WHERE loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND folder=folderid) order by 6,ordfolderid(folderid); folderid|foldername |messages|newmessages| size|?column? --------+----------------+--------+-----------+-------+-------- -1|New Mail Folder | 73| 1|8831226|f -2|Sent Mail | 7| 2| 10878|f -4|Deleted Messages| 110| 50| 245627|f -3|Saved Drafts | 0| 0| 0|f 1|OOL | 7| 0| 8470|t 2|suggestions | 26| 0| 35433|t 3|Acadia | 5| 0| 17703|t 4|advertising | 4| 2| 5394|t 5|dealt with | 3| 0| 2883|t 36|dauphne | 9| 0| 66850|t (10 rows) Do I need outer joins to make this work instead of the screwed up union method I'm trying here, or is it just a series of bugs? -Michael
Michael Richards <miker@scifair.acadiau.ca> writes: > I've found what I believe is another set of bugs: I can shed some light on these. > This may not be valid SQL, as none of my books mention it. Is it possible > to order by an expression? Postgres accepts expressions as ORDER BY clauses, although strict SQL92 only allows sorting by a column name or number. > It looks like the order by is only being applied to the original select, > not the unioned select. Some authority should check on it, but by thought > it that a union does not necessarily maintain the order, so the entire > select should be applied to the order. That looks like a bug to me too --- I think the ORDER BY is supposed to apply across the whole UNION result. Will look into it. > I'm probably going to change the numbering scheme of the system folders so > they will sort correctly without a kluge such as: Good plan. Although you could sort by a user-defined function result, it's likely to be horribly slow (because user-defined functions are slow:-(). > Using a column name within an expression in the order by does not seem to > work... > Or a much simpler example to illustrate the bug: > fastmail=> select 1 as "test" order by (test<9); > ERROR: attribute 'test' not found This is not so much a bug as a definitional issue. For SQL92 compatibility, we accept ORDER BY a column label so long as it's a bare column label, but column labels are NOT part of the namespace for full expression evaluation. You can't do this either: select 1 as "test" , test<9 ; ERROR: attribute 'test' not found There are all sorts of squirrely questions about this feature IMHO. For example, create table z1 (f1 int4, f2 int4); CREATE select f1 as f2, f2 from z1 order by f2; f2|f2 --+-- (0 rows) Which column do you think it's ordering by? Which column *should* it order by? I think this ought to draw an "ambiguous column label" error ... there is code in there that claims to be looking for such a thing, in fact, so I am not quite sure why it doesn't trigger on this example. regards, tom lane
On Fri, 16 Jul 1999, Tom Lane wrote: > Good plan. Although you could sort by a user-defined function result, > it's likely to be horribly slow (because user-defined functions are > slow:-(). Yes, but I did include my horrible design ideas so you could see why in "god's name" I was trying to do what I was trying to do when I found what looked to be a "bug" > This is not so much a bug as a definitional issue. For SQL92 > compatibility, we accept ORDER BY a column label so long as it's > a bare column label, but column labels are NOT part of the namespace > for full expression evaluation. You can't do this either: > > select 1 as "test" , test<9 ; > ERROR: attribute 'test' not found > > There are all sorts of squirrely questions about this feature IMHO. > For example, > > create table z1 (f1 int4, f2 int4); > CREATE > select f1 as f2, f2 from z1 order by f2; > f2|f2 > --+-- > (0 rows) > > Which column do you think it's ordering by? Which column *should* it > order by? I think this ought to draw an "ambiguous column label" error > ... there is code in there that claims to be looking for such a thing, > in fact, so I am not quite sure why it doesn't trigger on this example. Good point. Is there anything in the SQL standard that defined how this "is supposed" to work? I suppose with no expression support it isn't really necessary. How about requiring quotes when we're to look at it was "named" columns? If select f1 as f2, f2 from z1 order by "f2"; Of course I have no idea how this would conflicy with SQL-92. It's more of an idea... -Michael
Michael Richards <miker@scifair.acadiau.ca> writes: >> For example, >> >> create table z1 (f1 int4, f2 int4); >> CREATE >> select f1 as f2, f2 from z1 order by f2; >> f2|f2 >> --+-- >> (0 rows) >> >> Which column do you think it's ordering by? Which column *should* it >> order by? I think this ought to draw an "ambiguous column label" error > Good point. Is there anything in the SQL standard that defined how this > "is supposed" to work? After looking at the SQL spec I think the above definitely ought to draw an error. We have the following verbiage concerning the column names for the result of a SELECT: a) If the i-th <derived column> in the <select list> specifies an <as clause> that contains a <columnname> C, then the <column name> of the i-th column of the result is C. b) If the i-th <derived column> in the <select list> does not specify an <as clause> and the <valueexpression> of that <derived column> is a single <column reference>, then the <column name>of the i-th column of the result is C. c) Otherwise, the <column name> of the i-th column of the <query specification> is implementation-dependentand different from the <column name> of any column, other than itself, of a table referenced by any <table reference> contained in the SQL-statement. which Postgres does indeed follow, and we see from (a) and (b) that "f2" is the required column name for both columns of the SELECT result. Now ORDER BY says a) If a <sort specification> contains a <column name>, then T shall contain exactly one column withthat <column name> and ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the <sort specification> identifies thatcolumn. which sure looks to me like it mandates an error for the example statement. However, since SQL doesn't consider the possibility of expressions as ORDER BY entries, we are more or less on our own for those. An expression appearing in the target list of a SELECT is not allowed to refer to columns by their "AS" names (and this does seem to be mandated by SQL92). So I think it makes sense to carry over the same restriction to ORDER BY. regards, tom lane
Quite awhile ago, Michael Richards <miker@scifair.acadiau.ca> wrote: > It looks like the order by is only being applied to the original select, > not the unioned select. Some authority should check on it, but by thought > it that a union does not necessarily maintain the order, so the entire > select should be applied to the order. Just FYI, I have committed code for 7.1 that allows ORDER BY to work correctly for a UNION'd query. A limitation is that you can only do ordering on columns that are outputs of the UNION: regression=# select q1 from int8_tbl union select q2 from int8_tbl order by 1; q1 --------------------4567890123456789 123 456 4567890123456789 (4 rows) regression=# select q1 from int8_tbl union select q2 from int8_tbl order by int8_tbl.q1+1; ERROR: ORDER BY on a UNION/INTERSECT/EXCEPT result must be on one of the result columns In the general case of an arbitrary ORDER BY expression, it's not clear how to transpose it into each UNION source select anyway. It could be made to work for expressions using only the output columns, but since ORDER BY expressions are not standard SQL I'm not in a big hurry to make that happen... regards, tom lane