Thread: subquery/alias question
Hi all, I've read 7.2.1.3 (as short as it is) in the PgSQL docs, but don't see what I am doing wrong... Maybe you can help? I've got a query; SELECT d.dom_id, d.dom_name, (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM domains d ORDER BY d.dom_name ASC; Where 'usr_count' returns the number of entries in 'users' that point to a given entry in 'domains'. Pretty straight forward so far. The trouble is: SELECT d.dom_id, d.dom_name, (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM domains d WHERE usr_count > 0 ORDER BY d.dom_name ASC; Causes the error: ERROR: column "usr_count" does not exist It works if I use: SELECT d.dom_id, d.dom_name, (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM domains d WHERE (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; This seems terribly inefficient (and ugly), and I can't see why the results from 'usr_count' can't be counted... I can use 'usr_count' to sort the results... Thanks all! Madi
On Sep 25, 2007, at 16:59 , Madison Kelly wrote: > SELECT > d.dom_id, > d.dom_name, > (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > AS > usr_count > FROM > domains d > WHERE > (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 > ORDER BY d.dom_name ASC; Why not just use a join? Something like this would work, I should think: select dom_id, dom_name, usr_count from domains natural join (select usr_dom_id as dom_id, count(usr_dom_id) as usr_count from users) u where usr_count > 0 order by dom_name; Michael Glaesemann grzm seespotcode net
Michael Glaesemann wrote: > > On Sep 25, 2007, at 16:59 , Madison Kelly wrote: > >> SELECT >> d.dom_id, >> d.dom_name, >> (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) >> AS >> usr_count >> FROM >> domains d >> WHERE >> (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 >> ORDER BY d.dom_name ASC; > > Why not just use a join? Something like this would work, I should think: > > select dom_id, > dom_name, > usr_count > from domains > natural join (select usr_dom_id as dom_id, > count(usr_dom_id) as usr_count > from users) u > where usr_count > 0 > order by dom_name; Maybe the usr_count should be tested in a HAVING clause instead of WHERE? And put the count(*) in the result list instead of a subselect. That feels more natural to me anyway. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote: > Michael Glaesemann wrote: >> >> select dom_id, >> dom_name, >> usr_count >> from domains >> natural join (select usr_dom_id as dom_id, >> count(usr_dom_id) as usr_count >> from users) u >> where usr_count > 0 >> order by dom_name; > > Maybe the usr_count should be tested in a HAVING clause instead of > WHERE? And put the count(*) in the result list instead of a > subselect. > That feels more natural to me anyway. I believe you'd have to write it like select dom_id, dom_name, count(usr_dom_id) as usr_count from domains join users on (usr_dom_id = dom_id) having count(usr_dom_id) > 0 order by dom_name; I don't know how the performance would compare. I think the backend is smart enough to know it doesn't need to perform two seq scans to calculate count(usr_dom_id), but I wasn't sure. Madison, how do the two queries compare with explain analyze? Michael Glaesemann grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes: > I believe you'd have to write it like > select dom_id, dom_name, count(usr_dom_id) as usr_count > from domains > join users on (usr_dom_id = dom_id) > having count(usr_dom_id) > 0 > order by dom_name; > I don't know how the performance would compare. I think the backend > is smart enough to know it doesn't need to perform two seq scans to > calculate count(usr_dom_id), but I wasn't sure. It has been smart enough for a few years now --- don't recall when exactly, but nodeAgg.c quoth * Perform lookups of aggregate function info, and initialize the * unchanging fields of the per-agg data. We also detect duplicate * aggregates (for example, "SELECT sum(x) ... HAVING sum(x) > 0"). When * duplicates are detected, we only make an AggStatePerAgg struct for the * first one. The clones are simply pointed at the same result entry by * giving them duplicate aggno values. ... which in English means we just do the calculation once ... regards, tom lane
On Sep 25, 2007, at 21:44 , Tom Lane wrote: > ... which in English means we just do the calculation once ... As always, thanks, Tom, for the explanation (and Alvaro, who probably already knew this :)) Michael Glaesemann grzm seespotcode net
Michael Glaesemann wrote: > > On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote: > >> Michael Glaesemann wrote: >>> >>> select dom_id, >>> dom_name, >>> usr_count >>> from domains >>> natural join (select usr_dom_id as dom_id, >>> count(usr_dom_id) as usr_count >>> from users) u >>> where usr_count > 0 >>> order by dom_name; >> >> Maybe the usr_count should be tested in a HAVING clause instead of >> WHERE? And put the count(*) in the result list instead of a subselect. >> That feels more natural to me anyway. > > I believe you'd have to write it like > > select dom_id, dom_name, count(usr_dom_id) as usr_count > from domains > join users on (usr_dom_id = dom_id) > having count(usr_dom_id) > 0 > order by dom_name; > > I don't know how the performance would compare. I think the backend is > smart enough to know it doesn't need to perform two seq scans to > calculate count(usr_dom_id), but I wasn't sure. > > Madison, how do the two queries compare with explain analyze? Thanks for your reply! Unfortunately, in both cases I get the error: nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY dom_name; ERROR: syntax error at or near "COUNT" at character 25 LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ... I've been struggling with some deadlines, so for now I'm using just: SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; Which gives me just the domains with at least one user under them, but not the count. This is not ideal, and I will have to come back to it next week. In the meantime, any idea what the GROUP BY error is? If not, I'll read through the docs on 'GROUP'ing once I get this deadline out of the way. Thank you all for your help! I am sure I will have more question(s) next week as soon as I can get back to this. Madi
Madison Kelly wrote: > Thanks for your reply! > > Unfortunately, in both cases I get the error: > > nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains > JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY > dom_name; > ERROR: syntax error at or near "COUNT" at character 25 > LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ... Try to avoid missing the comma before the COUNT (and do not cheat when cut'n pasting ...) Also it seems you will need a GROUP BY clause: GROUP BY dom_id, dom_name (placed just before the HAVING clause). -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "Some men are heterosexual, and some are bisexual, and some men don't think about sex at all... they become lawyers" (Woody Allen)
"Madison Kelly" <linux@alteeve.com> writes: > SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u > WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; > > Which gives me just the domains with at least one user under them, but not > the count. This is not ideal, and I will have to come back to it next week. In > the meantime, any idea what the GROUP BY error is? If not, I'll read through > the docs on 'GROUP'ing once I get this deadline out of the way. I think you just want simply: SELECT dom_id, dom_name, count(*) FROM users JOIN domains ON (usr_dom_id=dom_id) GROUP BY dom_id, dom_nmae ORDER BY dom_name You don't actually need the HAVING (though it wouldn't do any harm either) since only domains which match a user will come out of the join anyways. You can also write it using a subquery instead of a join SELECT * FROM ( SELECT dom_id, dom_name, (SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers FROM domains ) as subq WHERE nusers > 0 ORDER BY dom_name But that will perform worse in many cases. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote: > Madison Kelly wrote: > >> Thanks for your reply! >> >> Unfortunately, in both cases I get the error: >> >> nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains >> JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY >> dom_name; >> ERROR: syntax error at or near "COUNT" at character 25 >> LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ... > > Try to avoid missing the comma before the COUNT (and do not cheat when > cut'n pasting ...) > > Also it seems you will need a GROUP BY clause: > GROUP BY dom_id, dom_name > (placed just before the HAVING clause). Bingo! Now to answer the performance questions (using my actual queries, unedited so they are a little longer): -=-=-=-=-=- nmc=> EXPLAIN ANALYZE SELECT dom_id, dom_name, dom_note, COUNT(usr_dom_id) AS usr_count FROM domains JOIN users ON (usr_dom_id=dom_id) GROUP BY dom_id, dom_name, dom_note HAVING COUNT (usr_dom_id) > 0 ORDER BY dom_name; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Sort (cost=10.70..10.78 rows=31 width=72) (actual time=2.107..2.133 rows=17 loops=1) Sort Key: domains.dom_name -> HashAggregate (cost=9.39..9.93 rows=31 width=72) (actual time=1.899..1.956 rows=17 loops=1) Filter: (count(usr_dom_id) > 0) -> Hash Join (cost=7.20..9.00 rows=31 width=72) (actual time=0.942..1.411 rows=96 loops=1) Hash Cond: ("outer".dom_id = "inner".usr_dom_id) -> Seq Scan on domains (cost=0.00..1.31 rows=31 width=68) (actual time=0.227..0.321 rows=31 loops=1) -> Hash (cost=6.96..6.96 rows=96 width=4) (actual time=0.673..0.673 rows=96 loops=1) -> Seq Scan on users (cost=0.00..6.96 rows=96 width=4) (actual time=0.010..0.371 rows=96 loops=1) Total runtime: 2.454 ms (10 rows) -=-=-=-=-=- Versus: -=-=-=-=-=- nmc=> EXPLAIN ANALYZE SELECT d.dom_id, d.dom_name, d.dom_note, (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM domains d WHERE (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Sort (cost=297.37..297.39 rows=10 width=68) (actual time=10.171..10.196 rows=17 loops=1) Sort Key: dom_name -> Seq Scan on domains d (cost=0.00..297.20 rows=10 width=68) (actual time=0.508..10.013 rows=17 loops=1) Filter: ((subplan) > 0) SubPlan -> Aggregate (cost=7.21..7.21 rows=1 width=0) (actual time=0.203..0.204 rows=1 loops=31) -> Seq Scan on users u (cost=0.00..7.20 rows=1 width=0) (actual time=0.127..0.189 rows=3 loops=31) Filter: (usr_dom_id = $0) -> Aggregate (cost=7.21..7.21 rows=1 width=0) (actual time=0.184..0.186 rows=1 loops=17) -> Seq Scan on users u (cost=0.00..7.20 rows=1 width=0) (actual time=0.058..0.164 rows=6 loops=17) Filter: (usr_dom_id = $0) Total runtime: 10.593 ms (12 rows) -=-=-=-=-=- So using the JOIN you all helped me with, the query returns in 2.454 ms compared to my early query of 10.593 ms! I have not yet looked into any indexing either. I am waiting until the program is done and then will go back and review queries to look for bottlenecks. Thanks to all of you!! Madi
On Sep 26, 2007, at 7:41 , Madison Kelly wrote: > Unfortunately, in both cases I get the error: Um, the two cases could not be giving the same error as they don't both contain the syntax that the error is complaining about: the first case uses count in a subquery so it couldn't throw this exact error. > nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM > domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > > 0 ORDER BY dom_name; > ERROR: syntax error at or near "COUNT" at character 25 > LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count > FROM ... The error message doesn't match the query you've provided. Note that in the line marked LINE 1, there's no comma after dom_name, which I assume is what the server is complaining about. However, the query you show *does* have this comma. Something isn't right. Is this an exact copy and paste from psql? > I've been struggling with some deadlines, so for now I'm using just: > > SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) > FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; > > Which gives me just the domains with at least one user under > them, but not the count. This is not ideal, and I will have to come > back to it next week. In the meantime, any idea what the GROUP BY > error is? Ah. You haven't actually shown us a GROUP BY error. A GROUP BY clause is needed when you've got columns that aren't included in the aggregate (COUNT in this case), e.g., select dom_id, dom_name, usr_count from domains natural join (select usr_dom_id as dom_id, count(usr_dom_id) as usr_count from users group by dom_id) u where usr_count > 0 order by dom_name; select dom_id, dom_name, count(usr_dom_id) as usr_count from domains join users on (usr_dom_id = dom_id) group by dom_id, dom_name having count(usr_dom_id) > 0 order by dom_name; Michael Glaesemann grzm seespotcode net
Gregory Stark wrote: > "Madison Kelly" <linux@alteeve.com> writes: > >> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u >> WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; >> >> Which gives me just the domains with at least one user under them, but not >> the count. This is not ideal, and I will have to come back to it next week. In >> the meantime, any idea what the GROUP BY error is? If not, I'll read through >> the docs on 'GROUP'ing once I get this deadline out of the way. > > I think you just want simply: > > SELECT dom_id, dom_name, count(*) > FROM users > JOIN domains ON (usr_dom_id=dom_id) > GROUP BY dom_id, dom_nmae > ORDER BY dom_name > > You don't actually need the HAVING (though it wouldn't do any harm either) > since only domains which match a user will come out of the join anyways. > > You can also write it using a subquery instead of a join > > SELECT * > FROM ( > SELECT dom_id, dom_name, > (SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers > FROM domains > ) as subq > WHERE nusers > 0 > ORDER BY dom_name > > But that will perform worse in many cases. > You are right, the 'HAVING' clause does seem to be redundant. I removed it and ran several 'EXPLAIN ANALYZE's on it with and without the 'HAVING' clause and found no perceivable difference. I removed the 'HAVING' clause anyway, since I like to keep queries as minimal as possible. Thank you! Madi