Thread: No index usage with "left join"
We have a "companies" and a "contacts" table with about 3000 records each. We run the following SQL-Command which runs about 2 MINUTES !: SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = companies.intfield01 contacts.sid (type text, b-tree index on it) companies.intfield01 (type bigint, b-tree index on it) comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN prg_addresses ON prg_contacts.sid=prg_addresses.intfield01; NOTICE: QUERY PLAN: Aggregate (cost=495261.02..495261.02 rows=1 width=15) (actual time=40939.38..40939.38 rows=1 loops=1) -> Nested Loop (cost=0.00..495253.81 rows=2885 width=15) (actual time=0.05..40930.14 rows=2866 loops=1) -> Seq Scan on prg_contacts (cost=0.00..80.66 rows=2866 width=7) (actual time=0.01..18.10 rows=2866 loops=1) -> Seq Scan on prg_addresses (cost=0.00..131.51 rows=2751 width=8) (actual time=0.03..6.25 rows=2751 loops=2866) Total runtime: 40939.52 msec EXPLAIN Note: - We need the left join because we need all contacts even if they are not assigned to a company - We are not able to change the datatypes of the joined fields because we use a standard software (btw who cares: SuSE Open Exchange Server) - When we use a normal join (without LEFT or a where clause) the SQL runs immediately using the indexes How can I force the usage of the indexes when using "left join". Or any other SQL construct that does the same !? Can anybody please give us a hint !? Thanks in forward. Greetings Achim
Cannot you do a cast in your query? Does that help with using the indexes? -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of mailing@impactmedia.de Sent: maandag 2 augustus 2004 14:09 To: pgsql-performance@postgresql.org Subject: [PERFORM] No index usage with "left join" We have a "companies" and a "contacts" table with about 3000 records each. We run the following SQL-Command which runs about 2 MINUTES !: SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = companies.intfield01 contacts.sid (type text, b-tree index on it) companies.intfield01 (type bigint, b-tree index on it) comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN prg_addresses ON prg_contacts.sid=prg_addresses.intfield01; NOTICE: QUERY PLAN: Aggregate (cost=495261.02..495261.02 rows=1 width=15) (actual time=40939.38..40939.38 rows=1 loops=1) -> Nested Loop (cost=0.00..495253.81 rows=2885 width=15) (actual time=0.05..40930.14 rows=2866 loops=1) -> Seq Scan on prg_contacts (cost=0.00..80.66 rows=2866 width=7) (actual time=0.01..18.10 rows=2866 loops=1) -> Seq Scan on prg_addresses (cost=0.00..131.51 rows=2751 width=8) (actual time=0.03..6.25 rows=2751 loops=2866) Total runtime: 40939.52 msec EXPLAIN Note: - We need the left join because we need all contacts even if they are not assigned to a company - We are not able to change the datatypes of the joined fields because we use a standard software (btw who cares: SuSE Open Exchange Server) - When we use a normal join (without LEFT or a where clause) the SQL runs immediately using the indexes How can I force the usage of the indexes when using "left join". Or any other SQL construct that does the same !? Can anybody please give us a hint !? Thanks in forward. Greetings Achim ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = > companies.intfield01 > > contacts.sid (type text, b-tree index on it) > companies.intfield01 (type bigint, b-tree index on it) <snip> > How can I force the usage of the indexes when using "left join". Or > any other SQL construct that does the same !? Can anybody please give > us a hint !? You really don't need to use indexes since you're fetching all information from both tables. Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would likely choose a far better plan -- hash join rather than nested loop) as it won't join a bigint to a text field without a cast. Try this: set enable_nestloop = false; SELECT count(*) FROM contacts LEFT JOIN companies ON cast(contacts.sid as bigint) = companies.intfield01; set enable_nestloop = true;
Rod Taylor <pg@rbt.ca> writes: >> How can I force the usage of the indexes when using "left join". > Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would > likely choose a far better plan -- hash join rather than nested loop) Indeed, the lack of any join-condition line in the EXPLAIN output implies it's 7.2 or older. IIRC 7.4 is the first release that is capable of using merge or hash join with a condition more complicated than plain "Var = Var". In this case, since the two fields are of different datatypes, the planner sees something like "Var = Var::text" (ie, there's an inserted cast function). 7.2 will just say "duh, too complicated for me" and generate a nestloop. With the columns being of different datatypes, you don't even have a chance for an inner indexscan in the nestloop. In short: change the column datatypes to be the same, or update to 7.4.something. There are no other solutions. (Well, if you were really desperate you could create a set of mergejoinable "text op bigint" comparison operators, and then 7.2 would be able to cope; but I should think that updating to 7.4 would be much less work.) regards, tom lane
On Mon, 2004-08-02 at 06:08, mailing@impactmedia.de wrote: > We have a "companies" and a "contacts" table with about 3000 records > each. > > We run the following SQL-Command which runs about 2 MINUTES !: > > SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = > companies.intfield01 > > contacts.sid (type text, b-tree index on it) > companies.intfield01 (type bigint, b-tree index on it) > > comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN > prg_addresses ON prg_contacts.sid=prg_addresses.intfield01; > NOTICE: QUERY PLAN: > > Aggregate (cost=495261.02..495261.02 rows=1 width=15) (actual > time=40939.38..40939.38 rows=1 loops=1) > -> Nested Loop (cost=0.00..495253.81 rows=2885 width=15) (actual > time=0.05..40930.14 rows=2866 loops=1) > -> Seq Scan on prg_contacts (cost=0.00..80.66 rows=2866 > width=7) (actual time=0.01..18.10 rows=2866 loops=1) > -> Seq Scan on prg_addresses (cost=0.00..131.51 rows=2751 > width=8) (actual time=0.03..6.25 rows=2751 loops=2866) > Total runtime: 40939.52 msec > > EXPLAIN > > Note: > - We need the left join because we need all contacts even if they are > not assigned to a company > - We are not able to change the datatypes of the joined fields > because we use a standard software (btw who cares: SuSE Open Exchange > Server) > - When we use a normal join (without LEFT or a where clause) the SQL > runs immediately using the indexes > > How can I force the usage of the indexes when using "left join". Or > any other SQL construct that does the same !? Can anybody please give > us a hint !? Why in the world would the database use the index in this case? You're retrieving every single row, so it may as well hit the data store directly. By the way, unlike many other databases that can just hit the index, PostgreSQL always has to go back to the data store anyway to get the real value, so if it's gonna hit more than some small percentage of rows, it's usually a win to just seq scan it. Try restricting your query with a where clause to one or two rows and see what you get.