Thread: Matching several rows
Hi,<br /><br />This is possibly absolutely trivial but I am lost...<br /><br />A table URIGHTS which stores an ID and theRIGHTs this ID has. One ID may have many rights and accordingly records in table, sample:<br /><br />ID RIGHT <br/>---------------------<br />20 1<br />20 2<br />20 5<br />20 10<br />30 2<br />30 10<br /><br />Now I need to find out which IDs have, say rights 2 AND 5 AND 10.<br /><br />What would be the simplestquery? <br /><br />Thanks!
On Jan 18, 2006, at 13:43 , Ivan Steganov wrote: > A table URIGHTS which stores an ID and the RIGHTs this ID has. One > ID may have many rights and accordingly records in table, sample: > Now I need to find out which IDs have, say rights 2 AND 5 AND 10. select "ID" from "URIGHTS" as right_2 join "URIGHTS" as right_5 using ("ID") join "URIGHTS" as right_10 using ("ID") where right_2."RIGHT" = 2and right_5."RIGHT" = 5and right_10."RIGHT" = 10 or select "ID" from (select "ID"from "URIGHTS"where "RIGHT" = 2) as right_2 join (select "ID"from "URIGHTS"where "RIGHT" = 5) as right_5 using ("ID") join (select "ID"from "URIGHTS"where "RIGHT" = 10) as right_10 using ("ID") Simple is in the eye of the beholder. You might want to compare the EXPLAIN ANALYZE output to see if there are any significant differences between these queries. Michael Glaesemann grzm myrealbox com
Hi, On Jan 18 05:43, Ivan Steganov wrote: > ID RIGHT > --------------------- > 20 1 > 20 2 > 20 5 > 20 10 > 30 2 > 30 10 > > Now I need to find out which IDs have, say rights 2 AND 5 AND 10. SELECT t.id FROM (SELECT id, sum(1) AS s FROM id_n_rights WHERE rights = ANY(ARRAY[2,5,10]) GROUP BY id) AS t WHERE t.s = 3; -- 3 = length(ARRAY[2,5,10]) HTH. Regards.
On Jan 18, 2006, at 20:55 , Volkan YAZICI wrote: > On Jan 18 05:43, Ivan Steganov wrote: >> ID RIGHT >> --------------------- >> 20 1 >> 20 2 >> 20 5 >> 20 10 >> 30 2 >> 30 10 >> >> Now I need to find out which IDs have, say rights 2 AND 5 AND 10. > > SELECT t.id > FROM (SELECT id, sum(1) AS s > FROM id_n_rights > WHERE rights = ANY(ARRAY[2,5,10]) > GROUP BY id) AS t > WHERE t.s = 3; -- 3 = length(ARRAY[2,5,10]) Or how about just select id from urights where uright in (2,5,10) group by id having count(id) = 3; -- number of urights in IN clause Find below the query plans for the variations we've seen (as well as DDL statements): Michael Glaesemann grzm myrealbox com create table urights ( id integer not null , uright integer not null ); copy urights (id, uright) from stdin; 20 1 20 2 20 5 20 10 30 2 30 10 \. select id from urights as right_2 join urights as right_5 using (id) join urights as right_10 using (id) where right_2.uright = 2 and right_5.uright = 5 and right_10.uright = 10; QUERY PLAN ------------------------------------------------------------------------ --------------------------------------------- Nested Loop (cost=0.00..3.27 rows=1 width=4) (actual time=0.053..0.069 rows=1 loops=1) Join Filter: ("outer".id = "inner".id) -> Nested Loop (cost=0.00..2.17 rows=1 width=8)(actual time=0.041..0.052 rows=1 loops=1) Join Filter: ("inner".id = "outer".id) -> Seq Scan on urights right_5 (cost=0.00..1.07 rows=1 width=4) (actual time=0.023..0.027 rows=1 loops=1) Filter: (uright = 5) -> Seq Scan on urights right_2 (cost=0.00..1.07 rows=2 width=4) (actual time=0.005..0.011 rows=2 loops=1) Filter: (uright = 2) -> Seq Scan on urights right_10 (cost=0.00..1.07 rows=2 width=4) (actual time=0.007..0.011 rows=2 loops=1) Filter: (uright = 10) Total runtime: 0.337 ms (11 rows) select id from ( select id from urights where uright = 2 ) as right_2 join ( select id from urights where uright = 5 ) as right_5 using (id) join ( select id from urights where uright = 10 ) as right_10 using (id); QUERY PLAN ------------------------------------------------------------------------ ------------------------------------- Nested Loop (cost=0.00..3.27 rows=1 width=4) (actual time=0.062..0.079 rows=1 loops=1) Join Filter: ("outer".id = "inner".id) -> Nested Loop (cost=0.00..2.17 rows=1 width=8)(actual time=0.048..0.060 rows=1 loops=1) Join Filter: ("inner".id = "outer".id) -> Seq Scan on urights (cost=0.00..1.07rows=1 width=4) (actual time=0.031..0.036 rows=1 loops=1) Filter: (uright = 5) -> Seq Scan on urights (cost=0.00..1.07rows=2 width=4) (actual time=0.005..0.011 rows=2 loops=1) Filter: (uright = 2) -> Seq Scan on urights (cost=0.00..1.07rows=2 width=4) (actual time=0.006..0.011 rows=2 loops=1) Filter: (uright = 10) Total runtime: 0.299 ms (11 rows) SELECT t.id FROM (SELECT id, sum(1) AS s FROM urights WHERE uright = ANY(ARRAY[2,5,10]) GROUP BY id) AS t WHERE t.s = 3; -- 3 = length(ARRAY[2,5,10]) QUERY PLAN ------------------------------------------------------------------------ ------------------------------------- Subquery Scan t (cost=1.23..1.26 rows=1 width=4) (actual time=0.102..0.106 rows=1 loops=1) -> HashAggregate (cost=1.23..1.25 rows=1 width=4) (actual time=0.095..0.098 rows=1 loops=1) Filter: (sum(1) = 3) -> Seq Scan on urights (cost=0.00..1.21 rows=3 width=4) (actual time=0.038..0.052 rows=5 loops=1) Filter: (uright = ANY ('{2,5,10}'::integer[])) Total runtime: 2.521 ms (6 rows) select id from urights where uright in (2,5,10) group by id having count(id) = 3; QUERY PLAN ------------------------------------------------------------------------ ------------------------------- HashAggregate (cost=1.12..1.15 rows=2 width=4) (actual time=2.582..2.584 rows=1 loops=1) Filter: (count(id) = 3) -> Seq Scan on urights (cost=0.00..1.10 rows=4 width=4) (actual time=0.042..2.502 rows=5 loops=1) Filter: ((uright = 2) OR (uright = 5) OR (uright = 10)) Total runtime: 2.944 ms (5 rows)
On Jan 18 09:33, Michael Glaesemann wrote: > On Jan 18, 2006, at 20:55 , Volkan YAZICI wrote: > >SELECT t.id > >FROM (SELECT id, sum(1) AS s > > FROM id_n_rights > > WHERE rights = ANY(ARRAY[2,5,10]) > > GROUP BY id) AS t > >WHERE t.s = 3; AFAICS, the bottleneck in above query is ANY(ARRAY[]) clause usage. Instead of that, if you replace «rights = ANY(ARRAY[2,5,10])» with «rights IN (2,5,10)» it's overhead decreases to 0.200-0.300ms domain. Regards.
On Jan 18, 2006, at 21:48 , Volkan YAZICI wrote: > AFAICS, the bottleneck in above query is ANY(ARRAY[]) clause usage. > Instead of that, if you replace «rights = ANY(ARRAY[2,5,10])» with > «rights IN (2,5,10)» it's overhead decreases to 0.200-0.300ms domain. explain analyze SELECT id FROM ( SELECT id, sum(1) AS s FROM urights WHERE uright in (2,5,10) GROUP BY id) AS t WHERE s = 3; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------- Subquery Scan t (cost=1.14..1.19 rows=2 width=4) (actual time=0.106..0.108 rows=1 loops=1) -> HashAggregate (cost=1.14..1.17 rows=2 width=4) (actual time=0.103..0.105 rows=1 loops=1) Filter: (sum(1) = 3) -> Seq Scan on urights (cost=0.00..1.10 rows=4 width=4) (actual time=0.029..0.038 rows=5 loops=1) Filter: ((uright = 2) OR (uright = 5) OR (uright = 10)) Total runtime: 0.386 ms (6 rows) Michael Glaesemann grzm myrealbox com
Thank you to everyone for the great help!<br /><br />I will evaluate all methods in our query (It is actually well complexerthen this sample) and choose the best one.<br /><br />Is there any "scientific" name to this kind of "several rowsmatch for one result" data selection? <br /><br />Ivan
Michael Glaesemann wrote: > > On Jan 18, 2006, at 21:48 , Volkan YAZICI wrote: > >> AFAICS, the bottleneck in above query is ANY(ARRAY[]) clause usage. >> Instead of that, if you replace «rights = ANY(ARRAY[2,5,10])» with >> «rights IN (2,5,10)» it's overhead decreases to 0.200-0.300ms domain. > > > explain analyze > SELECT id > FROM ( > SELECT id, sum(1) AS s > FROM urights > WHERE uright in (2,5,10) > GROUP BY id) AS t > WHERE s = 3; > QUERY PLAN > ------------------------------------------------------------------------ > ------------------------------------- > Subquery Scan t (cost=1.14..1.19 rows=2 width=4) (actual > time=0.106..0.108 rows=1 loops=1) > -> HashAggregate (cost=1.14..1.17 rows=2 width=4) (actual > time=0.103..0.105 rows=1 loops=1) > Filter: (sum(1) = 3) > -> Seq Scan on urights (cost=0.00..1.10 rows=4 width=4) > (actual time=0.029..0.038 rows=5 loops=1) > Filter: ((uright = 2) OR (uright = 5) OR (uright = 10)) > Total runtime: 0.386 ms > (6 rows) > > > Michael Glaesemann > grzm myrealbox com > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > why not use an having clause in the GROUP BY? HTH
It is easier to think of this as SET INTERSECTION which leads to: SELECT id FROM urights WHERE right = 2 INTERSECT SELECT id FROM urights WHERE right = 5 INTERSECT SELECT id FROM urights WHERE right = 10 Ivan Steganov wrote: > Thank you to everyone for the great help! > > I will evaluate all methods in our query (It is actually well complexer then > this sample) and choose the best one. > > Is there any "scientific" name to this kind of "several rows match for one > result" data selection? > > Ivan >