Thread: Best way to "and" from a one-to-many joined table?
Dear Experts,<br /><br /> I'm looking for a good technique to do "and" searches on one-to-many joined tables. For example,to find people with both 'dark hair' and 'president':<br /><br /><tt># select * from test_people join test_attributesusing (people_id);<br /> +-----------+-------------+---------------+<br /> | people_id | person_name | attribute |<br /> +-----------+-------------+---------------+<br /> | 10 | Satan | The Devil |<br /> | 9 | Santa | Imaginary |<br /> | 8 | Obamba | Dark Hair |<br /> | 8 | Obamba | Dark Hair |<br /> | 8 | Obamba | USA President |<br /> | 10 | Satan | Dark Hair |<br /> +-----------+-------------+---------------+<br /><br /> # select person_name from test_people where people_idin<br /> (select people_id from test_attributes where attribute='USA President' <br /> INTERSECT<br /> selectpeople_id from test_attributes where attribute='Dark Hair');<br /><br /> # select person_name from test_people<br/> where people_id in<br /> (select people_id from test_attributes where attribute='USA President')<br /> andpeople_id in<br /> (select people_id from test_attributes where attribute='Dark Hair');<br /><br /> # select people_id,count(*)as count from test_people<br /> join test_attributes using (people_id)<br /> where attribute='Dark Hair'or attribute='USA President'<br /> group by people_id having count(*) >= 2;</tt><br /><br /><br /> A postgres specificsolution is OK, but SQL92 is better. I had the "in" solution recommended to me, but it's performing dramaticallypoorly on huge tables.<br /><br /> Thanks for any references to a solution! -Bryce<br />
Howdy, Bryce
Could you please try this out and tell me if it gave what you want.
Best,
Oliveiros
SELECT person_name
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));
----- Original Message -----From: Bryce NesbittTo: sql pgsqlSent: Friday, December 05, 2008 6:55 PMSubject: [SQL] Best way to "and" from a one-to-many joined table?Dear Experts,
I'm looking for a good technique to do "and" searches on one-to-many joined tables. For example, to find people with both 'dark hair' and 'president':
# select * from test_people join test_attributes using (people_id);
+-----------+-------------+---------------+
| people_id | person_name | attribute |
+-----------+-------------+---------------+
| 10 | Satan | The Devil |
| 9 | Santa | Imaginary |
| 8 | Obamba | Dark Hair |
| 8 | Obamba | Dark Hair |
| 8 | Obamba | USA President |
| 10 | Satan | Dark Hair |
+-----------+-------------+---------------+
# select person_name from test_people where people_id in
(select people_id from test_attributes where attribute='USA President'
INTERSECT
select people_id from test_attributes where attribute='Dark Hair');
# select person_name from test_people
where people_id in
(select people_id from test_attributes where attribute='USA President')
and people_id in
(select people_id from test_attributes where attribute='Dark Hair');
# select people_id,count(*) as count from test_people
join test_attributes using (people_id)
where attribute='Dark Hair' or attribute='USA President'
group by people_id having count(*) >= 2;
A postgres specific solution is OK, but SQL92 is better. I had the "in" solution recommended to me, but it's performing dramatically poorly on huge tables.
Thanks for any references to a solution! -Bryce
Hi, This is how I do it, and it runs fast: select p.* from test_people p inner join test_attributes a on p.people_id = a.people_id where a."attribute" = @firstAttr or a."attribute" = @secondAttr If you have many attributes to search for you can replace the where part with where a."attribute" in (@firstAttr,@secondAttr,...) For best results, you can index the field "attribute" on test_attributes table. Be aware of case sensitivity of PG text search. Best regards, Milan Oparnica Oliveiros Cristina wrote: > Howdy, Bryce > > Could you please try this out and tell me if it gave what you want. > > Best, > Oliveiros > > SELECT person_name > FROM test_people p > JOIN test_attributes a > ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr)) > JOIN test_attributes b > ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr)); > > ----- Original Message ----- > *From:* Bryce Nesbitt <mailto:bryce2@obviously.com> > *To:* sql pgsql <mailto:pgsql-sql@postgresql.org> > *Sent:* Friday, December 05, 2008 6:55 PM > *Subject:* [SQL] Best way to "and" from a one-to-many joined table? > > Dear Experts, > > I'm looking for a good technique to do "and" searches on one-to-many > joined tables. For example, to find people with both 'dark hair' > and 'president': > > # select * from test_people join test_attributes using (people_id); > +-----------+-------------+---------------+ > | people_id | person_name | attribute | > +-----------+-------------+---------------+ > | 10 | Satan | The Devil | > | 9 | Santa | Imaginary | > | 8 | Obamba | Dark Hair | > | 8 | Obamba | Dark Hair | > | 8 | Obamba | USA President | > | 10 | Satan | Dark Hair | > +-----------+-------------+---------------+ > > # select person_name from test_people where people_id in > (select people_id from test_attributes where attribute='USA President' > INTERSECT > select people_id from test_attributes where attribute='Dark Hair'); > > # select person_name from test_people > where people_id in > (select people_id from test_attributes where attribute='USA President') > and people_id in > (select people_id from test_attributes where attribute='Dark Hair'); > > # select people_id,count(*) as count from test_people > join test_attributes using (people_id) > where attribute='Dark Hair' or attribute='USA President' > group by people_id having count(*) >= 2; > > > A postgres specific solution is OK, but SQL92 is better. I had the > "in" solution recommended to me, but it's performing dramatically > poorly on huge tables. > > Thanks for any references to a solution! -Bryce
It works (with a DISTINCT clause added because of the duplicated row for Obama). It has a nice clean looking explain plan. It has the slowest execution time on this sample table (though that might not mean anything).<br /><br /> SELECT<br/> DISTINCT<br /> person_name<br /> FROM test_people p<br /> JOIN test_attributes a<br /> ON ((a.people_id = p.people_id)AND (a."attribute" = 'Dark Hair'))<br /> JOIN test_attributes b<br /> ON ((b."people_id" = p."people_id") AND(b."attribute" = 'USA President'));<br /><br /> Here's the full test table<br /><br /> $ pg_dump --table=test_people --table=test_attributes-p 5433 -i<br /> CREATE TABLE test_attributes (<br /> people_id integer,<br /> attribute text<br/> );<br /> COPY test_attributes (people_id, attribute) FROM stdin;<br /> 10 The Devil<br /> 9 Imaginary<br/> 8 Dark Hair<br /> 8 Dark Hair<br /> 8 USA President<br /> 10 Dark Hair<br /> \.<br /><br /> CREATETABLE test_people (<br /> people_id integer DEFAULT nextval('test_sequence'::regclass) NOT NULL,<br /> person_nametext<br /> );<br /> COPY test_people (people_id, person_name) FROM stdin;<br /> 8 Obamba<br /> 9 Santa<br/> 10 Satan<br /> \.<br /><br /><br /> Oliveiros Cristina wrote: <blockquote cite="mid:00d401c9570e$f1fbcee0$ec5a3d0a@marktestcr.marktest.pt"type="cite"><style></style><div><font face="Arial" size="2">Howdy,Bryce</font></div><div><font face="Arial" size="2">Could you please try this out and tell me if it gave whatyou want.</font></div><div><font face="Arial" size="2">Best,</font></div><div><font face="Arial" size="2">Oliveiros</font></div><div> </div><div><fontface="Arial" size="2">SELECT person_name<br /> FROM test_people p<br/> JOIN test_attributes a<br /> ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))<br /> JOIN test_attributesb<br /> ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));</font></div></blockquote><br/><br />
Hello, Bryce.
It wasn't supposed to output duplicates.
I have assumed that on the test_attributes u didn't have duplicate records, i.e.,
you didn't have the same pair (people_id, attribute) more than once... But it seems you do...
And Hence the duplicate row for Obama .
Why is that?
One person can have exactly the same attribute twice?? :-)
On the execution speed, I do declare that query optimization is an area
I know very little about (just to avoid saying that i know nothing :p ) , maybe someone
with more knowledge than me can help you better, but from my
own experience, not just with postgres, but also with other sgbd ,
I can tell that subqueries of the kind WHERE x in (SELECT ... )
have the tendency to be slow, that's why I tried to provide you
a solution with the JOINs
Best,
Oliveiros
It wasn't supposed to output duplicates.
I have assumed that on the test_attributes u didn't have duplicate records, i.e.,
you didn't have the same pair (people_id, attribute) more than once... But it seems you do...
And Hence the duplicate row for Obama .
Why is that?
One person can have exactly the same attribute twice?? :-)
On the execution speed, I do declare that query optimization is an area
I know very little about (just to avoid saying that i know nothing :p ) , maybe someone
with more knowledge than me can help you better, but from my
own experience, not just with postgres, but also with other sgbd ,
I can tell that subqueries of the kind WHERE x in (SELECT ... )
have the tendency to be slow, that's why I tried to provide you
a solution with the JOINs
Best,
Oliveiros
2008/12/5 Bryce Nesbitt <bryce2@obviously.com>
It works (with a DISTINCT clause added because of the duplicated row for Obama). It has a nice clean looking explain plan. It has the slowest execution time on this sample table (though that might not mean anything).
SELECT
DISTINCTON ((a.people_id = p.people_id) AND (a."attribute" = 'Dark Hair'))
person_name
FROM test_people p
JOIN test_attributes a
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = 'USA President'));
Here's the full test table
$ pg_dump --table=test_people --table=test_attributes -p 5433 -i
CREATE TABLE test_attributes (
people_id integer,
attribute text
);
COPY test_attributes (people_id, attribute) FROM stdin;
10 The Devil
9 Imaginary
8 Dark Hair
8 Dark Hair
8 USA President
10 Dark Hair
\.
CREATE TABLE test_people (
people_id integer DEFAULT nextval('test_sequence'::regclass) NOT NULL,
person_name text
);
COPY test_people (people_id, person_name) FROM stdin;
8 Obamba
9 Santa
10 Satan
\.
Oliveiros Cristina wrote:Howdy, BryceCould you please try this out and tell me if it gave what you want.Best,OliveirosSELECT person_name
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));
At 11:20 AM 12/6/2008, pgsql-sql-owner@postgresql.org wrote: >Message-ID: <00d401c9570e$f1fbcee0$ec5a3d0a@marktestcr.marktest.pt> >From: "Oliveiros Cristina" <oliveiros.cristina@marktest.pt> >To: "Bryce Nesbitt" <bryce2@obviously.com>, > "sql pgsql" <pgsql-sql@postgresql.org> >References: <4939791B.5090604@obviously.com> >Subject: Re: Best way to "and" from a one-to-many joined table? >Date: Fri, 5 Dec 2008 19:23:25 -0000 > >Howdy, Bryce > >Could you please try this out and tell me if it gave what you want. > >Best, >Oliveiros > >SELECT person_name >FROM test_people p >JOIN test_attributes a >ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr)) >JOIN test_attributes b >ON ((b."people_id" = p."people_id") AND (b."attribute" = >@secondAttr)); Hi, I saw a few people post answers to this question and it raised another related question for me. What are the differences between the above query and this one. Are they semantically/functionally identical but might differ in performance? Or would they be optimized down to an identical query? Or am I misreading them and they are actually different? SELECT person_name FROM test_people p JOIN test_attributes a ON ((a.people_id = p.people_id) JOIN test_attributes b ON ((b."people_id" = p."people_id") WHERE (a."attribute" = @firstAttr)) AND (b."attribute" = @secondAttr)); Also, any suggestions about how to figure out this on my own without bugging the list in the future would be great. Thanks for any insight! Steve p.s. I posting in the same thread, but if you think I should have started a new thread let me know for the future.
I guess it depends on the optimiser and how clever it is. With the former the db will probably generate 2 sets of ids for the 2 joined tables (a, b) which only contain the values you require, these lists are probably much smaller than the total number of rows in the table therefore any merges and sorts on them have to operate on less rows and will be quicker. With the latter query it has to fetch all the rows regardless of the attribute and then do the restriction at the end, which results in more rows, bigger merges and sorts and takes longer... Obviously postgres may be clever enough to realise what you want and rearrange the query internally to a more efficient form. Generally to find out what it is doing stick "EXPLAIN (ANALYZE)" in front. This will show you the steps the db is taking to perform the query and in what order. If you include ANAYLZE then the db actually does the query (throwing away the results) and gives you accurate values, etc otherwise it shows you estimated values based on the various stats collected for the table. >> SELECT person_name >> FROM test_people p >> JOIN test_attributes a >> ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr)) >> JOIN test_attributes b >> ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr)); > > Hi, > > I saw a few people post answers to this question and it raised another > related question for me. > > What are the differences between the above query and this one. Are > they semantically/functionally identical but might differ in > performance? Or would they be optimized down to an identical query? Or > am I misreading them and they are actually different? > > SELECT person_name > FROM test_people p > JOIN test_attributes a > ON ((a.people_id = p.people_id) > JOIN test_attributes b > ON ((b."people_id" = p."people_id") > WHERE > (a."attribute" = @firstAttr)) > AND (b."attribute" = @secondAttr)); > > Also, any suggestions about how to figure out this on my own without > bugging the list in the future would be great. Thanks for any insight! > > Steve > > p.s. I posting in the same thread, but if you think I should have > started a new thread let me know for the future. >
Milan Oparnica wrote: > This is how I do it, and it runs fast: > select p.* > from test_people p inner join test_attributes a on p.people_id = > a.people_id > where a."attribute" = @firstAttr or a."attribute" = @secondAttr But that does an "or" search, not "and", returning Satan in addition to Obama: select * from test_people p inner join test_attributes a on p.people_id = a.people_id lyell5-> where a."attribute" = 'Dark Hair' or a."attribute" = 'USA President'; +-----------+-------------+-----------+---------------+ | people_id | person_name | people_id | attribute | +-----------+-------------+-----------+---------------+ | 8 | Obamba | 8 | USA President | | 8 | Obamba | 8 | Dark Hair | | 8 | Obamba | 8 | Dark Hair | | 10 | Satan | 10 | Dark Hair | +-----------+-------------+-----------+---------------+ How can I get an AND search (people with Dark Hair AND who are President)?
How can I get an AND search (people with Dark Hair AND who are President)?
The two joins didn't work?
Or were they too slow ?
Best,
Oliveiros
The two joins didn't work?
Or were they too slow ?
Best,
Oliveiros
2008/12/10 Bryce Nesbitt <bryce2@obviously.com>
But that does an "or" search, not "and", returning Satan in addition to
Milan Oparnica wrote:
> This is how I do it, and it runs fast:
> select p.*
> from test_people p inner join test_attributes a on p.people_id =
> a.people_id
> where a."attribute" = @firstAttr or a."attribute" = @secondAttr
Obama:
select * from test_people p inner join test_attributes a on p.people_id
= a.people_id
lyell5-> where a."attribute" = 'Dark Hair' or a."attribute" = 'USA
President';
+-----------+-------------+-----------+---------------+
| people_id | person_name | people_id | attribute |
+-----------+-------------+-----------+---------------+
| 8 | Obamba | 8 | USA President |
| 8 | Obamba | 8 | Dark Hair |
| 8 | Obamba | 8 | Dark Hair |
| 10 | Satan | 10 | Dark Hair |
+-----------+-------------+-----------+---------------+
How can I get an AND search (people with Dark Hair AND who are President)?
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql