Thread: Joins on many-to-many relations.

Joins on many-to-many relations.

From

Wiebe Cazemier

Date:

14 March 2007, 12:40:20

Hi,

Doing a join on one-to-many relations (like "orders" joining "custumors") is
easy, but what if there are many-to-many relations involved?

Consider this scenario of three (simplified) tables:

people
- id
- name

accounts
- id
- owner_id REFERENCES people

account_co_owners
- co_owner_id REFERENCES people
- account_id REFERENCES accounts

I need a query that allows the user to search for accounts by giving names of
either co-owners or owners. Currently, the query responsible is this:

SELECT DISTINCT ON (account.id) account.*
FROM accounts AS account
INNER JOIN people AS owner       ON owner.id = account.owner_id        OR owner.id IN (SELECT co_owner_id
        FROM account_co_owners                        WHERE account_id = account.id                        AND
co_owner_id= owner.id)
 
WHERE owner.name LIKE '%user supplied search string%';

But this query is too slow for my taste. It takes about 3 seconds, for only 800
accounts). Without the subselect in the JOIN statement (and therefor without
the ability to search based on the co-owner names), it is significantly
faster.

My question is, can joining many-to-many relations be done in a better way than
what I'm doing here? 

Thanks in advance.

Re: Joins on many-to-many relations.

From

Frank Bax

Date:

14 March 2007, 15:01:33

At 11:39 AM 3/14/07, Wiebe Cazemier wrote:
>Consider this scenario of three (simplified) tables:
>
>people
>- id
>- name
>
>accounts
>- id
>- owner_id REFERENCES people
>
>account_co_owners
>- co_owner_id REFERENCES people
>- account_id REFERENCES accounts
>
>I need a query that allows the user to search for accounts by giving names of
>either co-owners or owners. Currently, the query responsible is this:
>
>SELECT DISTINCT ON (account.id) account.*
>FROM accounts AS account
>INNER JOIN people AS owner
>         ON owner.id = account.owner_id
>         OR owner.id IN (SELECT co_owner_id
>                         FROM account_co_owners
>                         WHERE account_id = account.id
>                         AND co_owner_id = owner.id)
>WHERE owner.name LIKE '%user supplied search string%';
>
>But this query is too slow for my taste.


A performance question should always include the output of EXPLAIN ANALYZE.

I think the problem is database design.  If you added a boolean column into 
accounts table which would indicate owner/co-owner; then all data from 
account_co_owner could be merged into accounts and the query would be much 
simpler to code.

I don't expect this code to be any quicker; but I think it more clearly 
identifies the problem with your design:

SELECT accounts.* from accounts
inner join  ( SELECT account.* FROM    ( select id,owner_id from accounts      union      select account_id,co_owner_id
fromaccount_co_owners    ) as account    INNER JOIN    ( SELECT id FROM people WHERE name LIKE '%user%' ) AS owner
onaccount.owner_id = owner.id  ) as acct on acct.id=accounts.id;

Re: Joins on many-to-many relations.

From

Wiebe Cazemier

Date:

14 March 2007, 18:59:32

On Wednesday 14 March 2007 18:58, Frank Bax wrote:
> A performance question should always include the output of EXPLAIN ANALYZE.
> 
> I think the problem is database design.  If you added a boolean column into
> accounts table which would indicate owner/co-owner; then all data from
> account_co_owner could be merged into accounts and the query would be much
> simpler to code.
> 
> I don't expect this code to be any quicker; but I think it more clearly
> identifies the problem with your design:
> 
> SELECT accounts.* from accounts
> inner join
>    ( SELECT account.* FROM
>      ( select id,owner_id from accounts
>        union
>        select account_id,co_owner_id from account_co_owners
>      ) as account
>      INNER JOIN
>      ( SELECT id FROM people WHERE name LIKE '%user%' ) AS owner
>      on account.owner_id = owner.id
>    ) as acct on acct.id=accounts.id;

I can't say I really understand that query, but a union is not going to 
work, because account_co_owners is nothing more than a join-table, 
whereas accounts contains all the information belonging to an account. 
An account has one primary owner, indicated by the owner_id, and one or 
more co-owners, described by the account_co_owners table. Owners and 
co-owners are all of type people. I don't see anything wrong with this 
design.

In the real word, an account is actually a transaction_account. This 
is the real query ('%KOE%' is the user supplied search string):

SELECT DISTINCT ON (account.id) account.*
FROM trade.transaction_accounts AS account
INNER JOIN people.people AS owner       ON owner.id = account.owner_id        OR owner.id IN (SELECT co_owner_id
               FROM trade.transaction_account_co_owners                        WHERE account_id = account.id
           AND co_owner_id = owner.id)

WHERE upper(account.description) LIKE '%KOE%'
OR upper(owner.name) LIKE '%KOE%'
OR upper(owner.familiar_name) LIKE '%KOE%'
OR upper(owner.full_name) LIKE '%KOE%'

I discovered that removing the subselect (the entire second condition of 
the join actually) is not the only thing that speeds it up. If I remove 
the LIKE check on account.description, it's also a lot faster (152 ms 
as opposed to 2915 ms), although not as fast as without the subselect. 
I don't understand why that makes such a big difference. There is an 
index on upper() on the field.

This is the EXPLAIN ANALYZE output:

Unique  (cost=0.00..1061826.94 rows=800 width=551) (actual time=430.172..6492.619 rows=4 loops=1)  ->  Nested Loop
(cost=0.00..1061644.80rows=72856 width=551) (actual time=430.165..6492.585 rows=5 loops=1)        Join Filter:
(((upper(("outer".description)::text)~~ '%KOE%'::text) OR (upper(("inner".name)::text) ~~ '%KOE%'::text) OR
(upper(("inner".familiar_name)::text)~~ '%KOE%'::text) OR (upper(("inner".full_name)::text) ~~ '%KOE%'::text)) AND
(("inner".id= "outer".owner_id) OR (subplan)))        ->  Index Scan using transaction_accounts_pkey on
transaction_accountsaccount  (cost=0.00..36.80 rows=800 width=551) (actual time=0.014..3.717 rows=800 loops=1)
-> Seq Scan on people "owner"  (cost=0.00..54.08 rows=1208 width=1552) (actual time=0.002..2.541 rows=1208 loops=800)
    SubPlan          ->  Seq Scan on transaction_account_co_owners  (cost=0.00..2.04 rows=1 width=4) (actual
time=0.029..0.029rows=0 loops=4796)                Filter: ((account_id = $0) AND (co_owner_id = $1))Total runtime:
6492.709ms

But, I can't really be asking you to fully analyze my query, unless you see
something obvious that can be improved. My question was mainly general; 
if there is a better way than using subselects to join two tables which 
are only connected to eachother through a join-table (containing only 
references to the two tables in question). Subselects are usually very 
slow, aren't they?

Re: Joins on many-to-many relations.

From

"Rodrigo De León"

Date:

15 March 2007, 11:06:19

On 3/14/07, Wiebe Cazemier <halfgaar@gmx.net> wrote:
> I discovered that removing the subselect (the entire second condition of
> the join actually) is not the only thing that speeds it up. If I remove
> the LIKE check on account.description, it's also a lot faster (152 ms
> as opposed to 2915 ms), although not as fast as without the subselect.
> I don't understand why that makes such a big difference. There is an
> index on upper() on the field.

From http://www.postgresql.org/docs/8.2/static/indexes-types.html :

"The optimizer can also use a B-tree index for queries involving the
pattern matching operators LIKE and ~ if the pattern is a constant and
is anchored to the beginning of the string — for example, col LIKE
'foo%' or col ~ '^foo', but not col LIKE '%bar'."

Re: Joins on many-to-many relations.

From

Wiebe Cazemier

Date:

15 March 2007, 11:19:38

On Thursday 15 March 2007 15:05, Rodrigo De León wrote:

> From http://www.postgresql.org/docs/8.2/static/indexes-types.html :
> 
> "The optimizer can also use a B-tree index for queries involving the
> pattern matching operators LIKE and ~ if the pattern is a constant and
> is anchored to the beginning of the string ? for example, col LIKE
> 'foo%' or col ~ '^foo', but not col LIKE '%bar'."

It made the query faster (1500 instead of 2500 ms), but not fast enough. But,
this is good to know, although I often need the '%foobar%' search string...

Re: Joins on many-to-many relations.

From

Wiebe Cazemier

Date:

16 March 2007, 08:56:29

On Wednesday 14 March 2007 22:59, Wiebe Cazemier wrote:

> My question was mainly general;
> if there is a better way than using subselects to join two tables which
> are only connected to eachother through a join-table (containing only
> references to the two tables in question). Subselects are usually very
> slow, aren't they?

I fixed it. I now have a query with two outer joins, instead of the subselect
in the join condition:

SELECT DISTINCT ON (account.id) account.*
FROM trade.transaction_accounts AS account
INNER JOIN people.people AS owner       ON owner.id = account.owner_id
LEFT OUTER JOIN trade.transaction_account_co_owners acct_co_owner            ON account.id = acct_co_owner.account_id
LEFT OUTER JOIN people.people AS co_owner            ON acct_co_owner.co_owner_id = co_owner.id
WHERE upper(account.description) LIKE '%KOE%'
OR upper(owner.name) LIKE '%KOE%'
OR upper(owner.familiar_name) LIKE '%KOE%'
OR upper(owner.full_name) LIKE '%KOE%'
OR upper(co_owner.name) LIKE '%KOE%'
OR upper(co_owner.familiar_name) LIKE '%KOE%'
OR upper(co_owner.full_name) LIKE '%KOE%'

And now it executes in 1 ms.

This is what I was trying to do from the beginning, but because of some mental
block, was unable to think of the join condition...