Thread: Best way to "and" from a one-to-many joined table?

Best way to "and" from a one-to-many joined table?

From
Bryce Nesbitt
Date:
Dear Experts,<br /><br /> I'm looking for a good technique to do "and" searches on one-to-many joined tables.  For
example,to find people with both 'dark hair' and 'president':<br /><br /><tt># select * from test_people join
test_attributesusing (people_id);<br /> +-----------+-------------+---------------+<br /> | people_id | person_name |  
attribute  |<br /> +-----------+-------------+---------------+<br /> |        10 | Satan       | The Devil     |<br />
|        9 | Santa       | Imaginary     |<br /> |         8 | Obamba      | Dark Hair     |<br /> |         8 |
Obamba     | Dark Hair     |<br /> |         8 | Obamba      | USA President |<br /> |        10 | Satan       | Dark
Hair    |<br /> +-----------+-------------+---------------+<br /><br /> # select person_name from test_people where
people_idin<br /> (select people_id from test_attributes where attribute='USA President'  <br />  INTERSECT<br />
 selectpeople_id from test_attributes where attribute='Dark Hair');<br /><br /> # select person_name from
test_people<br/> where people_id in<br /> (select people_id from test_attributes where attribute='USA President')<br />
andpeople_id in<br /> (select people_id from test_attributes where attribute='Dark Hair');<br /><br /> # select
people_id,count(*)as count from test_people<br /> join test_attributes using (people_id)<br /> where attribute='Dark
Hair'or attribute='USA President'<br /> group by people_id having count(*) >= 2;</tt><br /><br /><br /> A postgres
specificsolution is OK, but SQL92 is better.  I had the "in" solution recommended to me, but it's performing
dramaticallypoorly on huge tables.<br /><br /> Thanks for any references to a solution!  -Bryce<br /> 

Re: Best way to "and" from a one-to-many joined table?

From
"Oliveiros Cristina"
Date:
Howdy, Bryce
 
Could you please try this out and tell me if it gave what you want.
 
Best,
Oliveiros
 
SELECT person_name
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));
----- Original Message -----
Sent: Friday, December 05, 2008 6:55 PM
Subject: [SQL] Best way to "and" from a one-to-many joined table?

Dear Experts,

I'm looking for a good technique to do "and" searches on one-to-many joined tables.  For example, to find people with both 'dark hair' and 'president':

# select * from test_people join test_attributes using (people_id);
+-----------+-------------+---------------+
| people_id | person_name |   attribute   |
+-----------+-------------+---------------+
|        10 | Satan       | The Devil     |
|         9 | Santa       | Imaginary     |
|         8 | Obamba      | Dark Hair     |
|         8 | Obamba      | Dark Hair     |
|         8 | Obamba      | USA President |
|        10 | Satan       | Dark Hair     |
+-----------+-------------+---------------+

# select person_name from test_people where people_id in
(select people_id from test_attributes where attribute='USA President' 
 INTERSECT
 select people_id from test_attributes where attribute='Dark Hair');

# select person_name from test_people
where people_id in
(select people_id from test_attributes where attribute='USA President')
and people_id in
(select people_id from test_attributes where attribute='Dark Hair');

# select people_id,count(*) as count from test_people
join test_attributes using (people_id)
where attribute='Dark Hair' or attribute='USA President'
group by people_id having count(*) >= 2;



A postgres specific solution is OK, but SQL92 is better.  I had the "in" solution recommended to me, but it's performing dramatically poorly on huge tables.

Thanks for any references to a solution!  -Bryce

Re: Best way to "and" from a one-to-many joined table?

From
Milan Oparnica
Date:
Hi,

This is how I do it, and it runs fast:

select p.*
from test_people p inner join test_attributes a on p.people_id = 
a.people_id
where a."attribute" = @firstAttr or a."attribute" = @secondAttr

If you have many attributes to search for you can replace the where part 
with

where a."attribute" in (@firstAttr,@secondAttr,...)

For best results, you can index the field "attribute" on test_attributes 
table. Be aware of case sensitivity of PG text search.

Best regards,

Milan Oparnica


Oliveiros Cristina wrote:
> Howdy, Bryce
>  
> Could you please try this out and tell me if it gave what you want.
>  
> Best,
> Oliveiros
>  
> SELECT person_name
> FROM test_people p
> JOIN test_attributes a
> ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
> JOIN test_attributes b
> ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));
> 
>     ----- Original Message -----
>     *From:* Bryce Nesbitt <mailto:bryce2@obviously.com>
>     *To:* sql pgsql <mailto:pgsql-sql@postgresql.org>
>     *Sent:* Friday, December 05, 2008 6:55 PM
>     *Subject:* [SQL] Best way to "and" from a one-to-many joined table?
> 
>     Dear Experts,
> 
>     I'm looking for a good technique to do "and" searches on one-to-many
>     joined tables.  For example, to find people with both 'dark hair'
>     and 'president':
> 
>     # select * from test_people join test_attributes using (people_id);
>     +-----------+-------------+---------------+
>     | people_id | person_name |   attribute   |
>     +-----------+-------------+---------------+
>     |        10 | Satan       | The Devil     |
>     |         9 | Santa       | Imaginary     |
>     |         8 | Obamba      | Dark Hair     |
>     |         8 | Obamba      | Dark Hair     |
>     |         8 | Obamba      | USA President |
>     |        10 | Satan       | Dark Hair     |
>     +-----------+-------------+---------------+
> 
>     # select person_name from test_people where people_id in
>     (select people_id from test_attributes where attribute='USA President' 
>      INTERSECT
>      select people_id from test_attributes where attribute='Dark Hair');
> 
>     # select person_name from test_people
>     where people_id in
>     (select people_id from test_attributes where attribute='USA President')
>     and people_id in
>     (select people_id from test_attributes where attribute='Dark Hair');
> 
>     # select people_id,count(*) as count from test_people
>     join test_attributes using (people_id)
>     where attribute='Dark Hair' or attribute='USA President'
>     group by people_id having count(*) >= 2;
> 
> 
>     A postgres specific solution is OK, but SQL92 is better.  I had the
>     "in" solution recommended to me, but it's performing dramatically
>     poorly on huge tables.
> 
>     Thanks for any references to a solution!  -Bryce


Re: Best way to "and" from a one-to-many joined table?

From
Bryce Nesbitt
Date:
It works (with a DISTINCT clause added because of the duplicated row for Obama).  It has a nice clean looking explain
plan. It has the slowest execution time on this sample table (though that might not mean anything).<br /><br />
SELECT<br/> DISTINCT<br /> person_name<br /> FROM test_people p<br /> JOIN test_attributes a<br /> ON ((a.people_id =
p.people_id)AND (a."attribute" = 'Dark Hair'))<br /> JOIN test_attributes b<br /> ON ((b."people_id" = p."people_id")
AND(b."attribute" = 'USA President'));<br /><br /> Here's the full test table<br /><br /> $ pg_dump --table=test_people
--table=test_attributes-p 5433 -i<br /> CREATE TABLE test_attributes (<br />     people_id integer,<br />     attribute
text<br/> );<br /> COPY test_attributes (people_id, attribute) FROM stdin;<br /> 10    The Devil<br /> 9   
Imaginary<br/> 8    Dark Hair<br /> 8    Dark Hair<br /> 8    USA President<br /> 10    Dark Hair<br /> \.<br /><br />
CREATETABLE test_people (<br />     people_id integer DEFAULT nextval('test_sequence'::regclass) NOT NULL,<br />    
person_nametext<br /> );<br /> COPY test_people (people_id, person_name) FROM stdin;<br /> 8    Obamba<br /> 9   
Santa<br/> 10    Satan<br /> \.<br /><br /><br /> Oliveiros Cristina wrote: <blockquote
cite="mid:00d401c9570e$f1fbcee0$ec5a3d0a@marktestcr.marktest.pt"type="cite"><style></style><div><font face="Arial"
size="2">Howdy,Bryce</font></div><div><font face="Arial" size="2">Could you please try this out and tell me if it gave
whatyou want.</font></div><div><font face="Arial" size="2">Best,</font></div><div><font face="Arial"
size="2">Oliveiros</font></div><div> </div><div><fontface="Arial" size="2">SELECT person_name<br /> FROM test_people
p<br/> JOIN test_attributes a<br /> ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))<br /> JOIN
test_attributesb<br /> ON ((b."people_id" = p."people_id") AND (b."attribute" =
@secondAttr));</font></div></blockquote><br/><br /> 

Re: Best way to "and" from a one-to-many joined table?

From
"Oliveiros Cristina"
Date:
Hello, Bryce.
It wasn't supposed to output duplicates.

I have assumed that on the test_attributes u didn't have duplicate records, i.e.,
you didn't have the same pair (people_id, attribute) more than once... But it seems you do...
And Hence the duplicate row for Obama .
Why is that?
One person can have exactly the same attribute twice?? :-)

On the execution speed, I do declare that query optimization is an area
I know very little about (just to avoid  saying that i know nothing :p ) , maybe someone
with more knowledge than me can help you better, but from my
own experience, not just with postgres, but also with other sgbd ,
I can tell that subqueries of the kind WHERE x in (SELECT ... )
have the tendency to be slow, that's why I tried to provide you
a solution with the JOINs


Best,
Oliveiros


2008/12/5 Bryce Nesbitt <bryce2@obviously.com>
It works (with a DISTINCT clause added because of the duplicated row for Obama).  It has a nice clean looking explain plan.  It has the slowest execution time on this sample table (though that might not mean anything).

SELECT
DISTINCT

person_name
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id) AND (a."attribute" = 'Dark Hair'))
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = 'USA President'));

Here's the full test table

$ pg_dump --table=test_people --table=test_attributes -p 5433 -i
CREATE TABLE test_attributes (
    people_id integer,
    attribute text
);
COPY test_attributes (people_id, attribute) FROM stdin;
10    The Devil
9    Imaginary
8    Dark Hair
8    Dark Hair
8    USA President
10    Dark Hair
\.

CREATE TABLE test_people (
    people_id integer DEFAULT nextval('test_sequence'::regclass) NOT NULL,
    person_name text
);
COPY test_people (people_id, person_name) FROM stdin;
8    Obamba
9    Santa
10    Satan
\.



Oliveiros Cristina wrote:
Howdy, Bryce
Could you please try this out and tell me if it gave what you want.
Best,
Oliveiros
 
SELECT person_name
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
JOIN test_attributes b
ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));



Re: Best way to "and" from a one-to-many joined table?

From
Steve Midgley
Date:
At 11:20 AM 12/6/2008, pgsql-sql-owner@postgresql.org wrote:
>Message-ID: <00d401c9570e$f1fbcee0$ec5a3d0a@marktestcr.marktest.pt>
>From: "Oliveiros Cristina" <oliveiros.cristina@marktest.pt>
>To: "Bryce Nesbitt" <bryce2@obviously.com>,
>         "sql pgsql" <pgsql-sql@postgresql.org>
>References: <4939791B.5090604@obviously.com>
>Subject: Re: Best way to "and" from a one-to-many joined table?
>Date: Fri, 5 Dec 2008 19:23:25 -0000
>
>Howdy, Bryce
>
>Could you please try this out and tell me if it gave what you want.
>
>Best,
>Oliveiros
>
>SELECT person_name
>FROM test_people p
>JOIN test_attributes a
>ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
>JOIN test_attributes b
>ON ((b."people_id" = p."people_id") AND (b."attribute" = 
>@secondAttr));

Hi,

I saw a few people post answers to this question and it raised another 
related question for me.

What are the differences between the above query and this one. Are they 
semantically/functionally identical but might differ in performance? Or 
would they be optimized down to an identical query? Or am I misreading 
them and they are actually different?

SELECT person_name
FROM test_people p
JOIN test_attributes a
ON ((a.people_id = p.people_id)
JOIN test_attributes b
ON ((b."people_id" = p."people_id")
WHERE  (a."attribute" = @firstAttr))  AND (b."attribute" = @secondAttr));

Also, any suggestions about how to figure out this on my own without 
bugging the list in the future would be great. Thanks for any insight!

Steve

p.s. I posting in the same thread, but if you think I should have 
started a new thread let me know for the future. 



Re: Best way to "and" from a one-to-many joined table?

From
John Lister
Date:
I guess it depends on the optimiser and how clever it is. With the 
former the db will probably generate 2 sets of ids for the 2 joined 
tables (a, b) which only contain the values you require, these lists are 
probably much smaller than the total number of rows in the table 
therefore any merges and sorts on them have to operate on less rows and 
will be quicker. With the latter query it has to fetch all the rows 
regardless of the attribute and then do the restriction at the end, 
which results in more rows, bigger merges and sorts and takes longer...
Obviously postgres may be clever enough to realise what you want and 
rearrange the query internally to a more efficient form.

Generally to find out what it is doing stick "EXPLAIN (ANALYZE)" in 
front. This will show you the steps the db is taking to perform the 
query and in what order.
If you include ANAYLZE then the db actually does the query (throwing 
away the results) and gives you accurate values, etc otherwise it shows 
you estimated values based on the various stats collected for the table.

>> SELECT person_name
>> FROM test_people p
>> JOIN test_attributes a
>> ON ((a.people_id = p.people_id) AND (a."attribute" = @firstAttr))
>> JOIN test_attributes b
>> ON ((b."people_id" = p."people_id") AND (b."attribute" = @secondAttr));
>
> Hi,
>
> I saw a few people post answers to this question and it raised another 
> related question for me.
>
> What are the differences between the above query and this one. Are 
> they semantically/functionally identical but might differ in 
> performance? Or would they be optimized down to an identical query? Or 
> am I misreading them and they are actually different?
>
> SELECT person_name
> FROM test_people p
> JOIN test_attributes a
> ON ((a.people_id = p.people_id)
> JOIN test_attributes b
> ON ((b."people_id" = p."people_id")
> WHERE
>   (a."attribute" = @firstAttr))
>   AND (b."attribute" = @secondAttr));
>
> Also, any suggestions about how to figure out this on my own without 
> bugging the list in the future would be great. Thanks for any insight!
>
> Steve
>
> p.s. I posting in the same thread, but if you think I should have 
> started a new thread let me know for the future.
>


Re: Best way to "and" from a one-to-many joined table?

From
Bryce Nesbitt
Date:

Milan Oparnica wrote:
> This is how I do it, and it runs fast:
> select p.*
> from test_people p inner join test_attributes a on p.people_id =
> a.people_id
> where a."attribute" = @firstAttr or a."attribute" = @secondAttr
But that does an "or" search, not "and", returning Satan in addition to
Obama:

select * from test_people p inner join test_attributes a on p.people_id
= a.people_id
lyell5-> where a."attribute" = 'Dark Hair' or a."attribute" = 'USA
President';
+-----------+-------------+-----------+---------------+
| people_id | person_name | people_id |   attribute   |
+-----------+-------------+-----------+---------------+
|         8 | Obamba      |         8 | USA President |
|         8 | Obamba      |         8 | Dark Hair     |
|         8 | Obamba      |         8 | Dark Hair     |
|        10 | Satan       |        10 | Dark Hair     |
+-----------+-------------+-----------+---------------+

How can I get an AND search (people with Dark Hair AND who are President)?



Re: Best way to "and" from a one-to-many joined table?

From
"Oliveiros Cristina"
Date:
How can I get an AND search (people with Dark Hair AND who are President)?

The two joins didn't work?
Or were they too slow ?

Best,
Oliveiros

2008/12/10 Bryce Nesbitt <bryce2@obviously.com>


Milan Oparnica wrote:
> This is how I do it, and it runs fast:
> select p.*
> from test_people p inner join test_attributes a on p.people_id =
> a.people_id
> where a."attribute" = @firstAttr or a."attribute" = @secondAttr
But that does an "or" search, not "and", returning Satan in addition to
Obama:

select * from test_people p inner join test_attributes a on p.people_id
= a.people_id
lyell5-> where a."attribute" = 'Dark Hair' or a."attribute" = 'USA
President';
+-----------+-------------+-----------+---------------+
| people_id | person_name | people_id |   attribute   |
+-----------+-------------+-----------+---------------+
|         8 | Obamba      |         8 | USA President |
|         8 | Obamba      |         8 | Dark Hair     |
|         8 | Obamba      |         8 | Dark Hair     |
|        10 | Satan       |        10 | Dark Hair     |
+-----------+-------------+-----------+---------------+

How can I get an AND search (people with Dark Hair AND who are President)?


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql