Thread: Joins on many-to-many relations.
Hi, Doing a join on one-to-many relations (like "orders" joining "custumors") is easy, but what if there are many-to-many relations involved? Consider this scenario of three (simplified) tables: people - id - name accounts - id - owner_id REFERENCES people account_co_owners - co_owner_id REFERENCES people - account_id REFERENCES accounts I need a query that allows the user to search for accounts by giving names of either co-owners or owners. Currently, the query responsible is this: SELECT DISTINCT ON (account.id) account.* FROM accounts AS account INNER JOIN people AS owner ON owner.id = account.owner_id OR owner.id IN (SELECT co_owner_id FROM account_co_owners WHERE account_id = account.id AND co_owner_id= owner.id) WHERE owner.name LIKE '%user supplied search string%'; But this query is too slow for my taste. It takes about 3 seconds, for only 800 accounts). Without the subselect in the JOIN statement (and therefor without the ability to search based on the co-owner names), it is significantly faster. My question is, can joining many-to-many relations be done in a better way than what I'm doing here? Thanks in advance.
At 11:39 AM 3/14/07, Wiebe Cazemier wrote: >Consider this scenario of three (simplified) tables: > >people >- id >- name > >accounts >- id >- owner_id REFERENCES people > >account_co_owners >- co_owner_id REFERENCES people >- account_id REFERENCES accounts > >I need a query that allows the user to search for accounts by giving names of >either co-owners or owners. Currently, the query responsible is this: > >SELECT DISTINCT ON (account.id) account.* >FROM accounts AS account >INNER JOIN people AS owner > ON owner.id = account.owner_id > OR owner.id IN (SELECT co_owner_id > FROM account_co_owners > WHERE account_id = account.id > AND co_owner_id = owner.id) >WHERE owner.name LIKE '%user supplied search string%'; > >But this query is too slow for my taste. A performance question should always include the output of EXPLAIN ANALYZE. I think the problem is database design. If you added a boolean column into accounts table which would indicate owner/co-owner; then all data from account_co_owner could be merged into accounts and the query would be much simpler to code. I don't expect this code to be any quicker; but I think it more clearly identifies the problem with your design: SELECT accounts.* from accounts inner join ( SELECT account.* FROM ( select id,owner_id from accounts union select account_id,co_owner_id fromaccount_co_owners ) as account INNER JOIN ( SELECT id FROM people WHERE name LIKE '%user%' ) AS owner onaccount.owner_id = owner.id ) as acct on acct.id=accounts.id;
On Wednesday 14 March 2007 18:58, Frank Bax wrote: > A performance question should always include the output of EXPLAIN ANALYZE. > > I think the problem is database design. If you added a boolean column into > accounts table which would indicate owner/co-owner; then all data from > account_co_owner could be merged into accounts and the query would be much > simpler to code. > > I don't expect this code to be any quicker; but I think it more clearly > identifies the problem with your design: > > SELECT accounts.* from accounts > inner join > ( SELECT account.* FROM > ( select id,owner_id from accounts > union > select account_id,co_owner_id from account_co_owners > ) as account > INNER JOIN > ( SELECT id FROM people WHERE name LIKE '%user%' ) AS owner > on account.owner_id = owner.id > ) as acct on acct.id=accounts.id; I can't say I really understand that query, but a union is not going to work, because account_co_owners is nothing more than a join-table, whereas accounts contains all the information belonging to an account. An account has one primary owner, indicated by the owner_id, and one or more co-owners, described by the account_co_owners table. Owners and co-owners are all of type people. I don't see anything wrong with this design. In the real word, an account is actually a transaction_account. This is the real query ('%KOE%' is the user supplied search string): SELECT DISTINCT ON (account.id) account.* FROM trade.transaction_accounts AS account INNER JOIN people.people AS owner ON owner.id = account.owner_id OR owner.id IN (SELECT co_owner_id FROM trade.transaction_account_co_owners WHERE account_id = account.id AND co_owner_id = owner.id) WHERE upper(account.description) LIKE '%KOE%' OR upper(owner.name) LIKE '%KOE%' OR upper(owner.familiar_name) LIKE '%KOE%' OR upper(owner.full_name) LIKE '%KOE%' I discovered that removing the subselect (the entire second condition of the join actually) is not the only thing that speeds it up. If I remove the LIKE check on account.description, it's also a lot faster (152 ms as opposed to 2915 ms), although not as fast as without the subselect. I don't understand why that makes such a big difference. There is an index on upper() on the field. This is the EXPLAIN ANALYZE output: Unique (cost=0.00..1061826.94 rows=800 width=551) (actual time=430.172..6492.619 rows=4 loops=1) -> Nested Loop (cost=0.00..1061644.80rows=72856 width=551) (actual time=430.165..6492.585 rows=5 loops=1) Join Filter: (((upper(("outer".description)::text)~~ '%KOE%'::text) OR (upper(("inner".name)::text) ~~ '%KOE%'::text) OR (upper(("inner".familiar_name)::text)~~ '%KOE%'::text) OR (upper(("inner".full_name)::text) ~~ '%KOE%'::text)) AND (("inner".id= "outer".owner_id) OR (subplan))) -> Index Scan using transaction_accounts_pkey on transaction_accountsaccount (cost=0.00..36.80 rows=800 width=551) (actual time=0.014..3.717 rows=800 loops=1) -> Seq Scan on people "owner" (cost=0.00..54.08 rows=1208 width=1552) (actual time=0.002..2.541 rows=1208 loops=800) SubPlan -> Seq Scan on transaction_account_co_owners (cost=0.00..2.04 rows=1 width=4) (actual time=0.029..0.029rows=0 loops=4796) Filter: ((account_id = $0) AND (co_owner_id = $1))Total runtime: 6492.709ms But, I can't really be asking you to fully analyze my query, unless you see something obvious that can be improved. My question was mainly general; if there is a better way than using subselects to join two tables which are only connected to eachother through a join-table (containing only references to the two tables in question). Subselects are usually very slow, aren't they?
On 3/14/07, Wiebe Cazemier <halfgaar@gmx.net> wrote: > I discovered that removing the subselect (the entire second condition of > the join actually) is not the only thing that speeds it up. If I remove > the LIKE check on account.description, it's also a lot faster (152 ms > as opposed to 2915 ms), although not as fast as without the subselect. > I don't understand why that makes such a big difference. There is an > index on upper() on the field. From http://www.postgresql.org/docs/8.2/static/indexes-types.html : "The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'."
On Thursday 15 March 2007 15:05, Rodrigo De León wrote: > From http://www.postgresql.org/docs/8.2/static/indexes-types.html : > > "The optimizer can also use a B-tree index for queries involving the > pattern matching operators LIKE and ~ if the pattern is a constant and > is anchored to the beginning of the string ? for example, col LIKE > 'foo%' or col ~ '^foo', but not col LIKE '%bar'." It made the query faster (1500 instead of 2500 ms), but not fast enough. But, this is good to know, although I often need the '%foobar%' search string...
On Wednesday 14 March 2007 22:59, Wiebe Cazemier wrote: > My question was mainly general; > if there is a better way than using subselects to join two tables which > are only connected to eachother through a join-table (containing only > references to the two tables in question). Subselects are usually very > slow, aren't they? I fixed it. I now have a query with two outer joins, instead of the subselect in the join condition: SELECT DISTINCT ON (account.id) account.* FROM trade.transaction_accounts AS account INNER JOIN people.people AS owner ON owner.id = account.owner_id LEFT OUTER JOIN trade.transaction_account_co_owners acct_co_owner ON account.id = acct_co_owner.account_id LEFT OUTER JOIN people.people AS co_owner ON acct_co_owner.co_owner_id = co_owner.id WHERE upper(account.description) LIKE '%KOE%' OR upper(owner.name) LIKE '%KOE%' OR upper(owner.familiar_name) LIKE '%KOE%' OR upper(owner.full_name) LIKE '%KOE%' OR upper(co_owner.name) LIKE '%KOE%' OR upper(co_owner.familiar_name) LIKE '%KOE%' OR upper(co_owner.full_name) LIKE '%KOE%' And now it executes in 1 ms. This is what I was trying to do from the beginning, but because of some mental block, was unable to think of the join condition...