Thread: No index usage with "left join"

No index usage with "left join"

From
mailing@impactmedia.de
Date:
We have a "companies" and a "contacts" table with about 3000 records
each.

We run the following SQL-Command which runs about 2 MINUTES !:

SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid =
companies.intfield01

contacts.sid (type text, b-tree index on it)
companies.intfield01 (type bigint, b-tree index on it)

comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN
prg_addresses ON prg_contacts.sid=prg_addresses.intfield01;
NOTICE:  QUERY PLAN:

Aggregate  (cost=495261.02..495261.02 rows=1 width=15) (actual
time=40939.38..40939.38 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..495253.81 rows=2885 width=15) (actual
time=0.05..40930.14 rows=2866 loops=1)
    ->  Seq Scan on prg_contacts  (cost=0.00..80.66 rows=2866
width=7) (actual time=0.01..18.10 rows=2866 loops=1)
    ->  Seq Scan on prg_addresses  (cost=0.00..131.51 rows=2751
width=8) (actual time=0.03..6.25 rows=2751 loops=2866)
Total runtime: 40939.52 msec

EXPLAIN

Note:
- We need the left join because we need all contacts even if they are
not assigned to a company
- We are not able to change the datatypes of the joined fields
because we use a standard software (btw who cares: SuSE Open Exchange
Server)
- When we use a normal join (without LEFT or a where clause) the SQL
runs immediately using the indexes

How can I force the usage of the indexes when using "left join". Or
any other SQL construct that does the same !? Can anybody please give
us a hint !?

Thanks in forward.

Greetings
Achim

Re: No index usage with "left join"

From
"Leeuw van der, Tim"
Date:
Cannot you do a cast in your query? Does that help with using the indexes?

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org]On Behalf Of
mailing@impactmedia.de
Sent: maandag 2 augustus 2004 14:09
To: pgsql-performance@postgresql.org
Subject: [PERFORM] No index usage with "left join"


We have a "companies" and a "contacts" table with about 3000 records
each.

We run the following SQL-Command which runs about 2 MINUTES !:

SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid =
companies.intfield01

contacts.sid (type text, b-tree index on it)
companies.intfield01 (type bigint, b-tree index on it)

comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN
prg_addresses ON prg_contacts.sid=prg_addresses.intfield01;
NOTICE:  QUERY PLAN:

Aggregate  (cost=495261.02..495261.02 rows=1 width=15) (actual
time=40939.38..40939.38 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..495253.81 rows=2885 width=15) (actual
time=0.05..40930.14 rows=2866 loops=1)
    ->  Seq Scan on prg_contacts  (cost=0.00..80.66 rows=2866
width=7) (actual time=0.01..18.10 rows=2866 loops=1)
    ->  Seq Scan on prg_addresses  (cost=0.00..131.51 rows=2751
width=8) (actual time=0.03..6.25 rows=2751 loops=2866)
Total runtime: 40939.52 msec

EXPLAIN

Note:
- We need the left join because we need all contacts even if they are
not assigned to a company
- We are not able to change the datatypes of the joined fields
because we use a standard software (btw who cares: SuSE Open Exchange
Server)
- When we use a normal join (without LEFT or a where clause) the SQL
runs immediately using the indexes

How can I force the usage of the indexes when using "left join". Or
any other SQL construct that does the same !? Can anybody please give
us a hint !?

Thanks in forward.

Greetings
Achim

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Re: No index usage with "left join"

From
Rod Taylor
Date:
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid =
> companies.intfield01
>
> contacts.sid (type text, b-tree index on it)
> companies.intfield01 (type bigint, b-tree index on it)
<snip>
> How can I force the usage of the indexes when using "left join". Or
> any other SQL construct that does the same !? Can anybody please give
> us a hint !?

You really don't need to use indexes since you're fetching all
information from both tables.

Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would
likely choose a far better plan -- hash join rather than nested loop) as
it won't join a bigint to a text field without a cast.

Try this:
    set enable_nestloop = false;
        SELECT count(*) FROM contacts LEFT JOIN companies ON
        cast(contacts.sid as bigint) = companies.intfield01;
    set enable_nestloop = true;



Re: No index usage with "left join"

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
>> How can I force the usage of the indexes when using "left join".

> Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would
> likely choose a far better plan -- hash join rather than nested loop)

Indeed, the lack of any join-condition line in the EXPLAIN output
implies it's 7.2 or older.  IIRC 7.4 is the first release that is
capable of using merge or hash join with a condition more complicated
than plain "Var = Var".  In this case, since the two fields are of
different datatypes, the planner sees something like "Var = Var::text"
(ie, there's an inserted cast function).  7.2 will just say "duh, too
complicated for me" and generate a nestloop.  With the columns being
of different datatypes, you don't even have a chance for an inner
indexscan in the nestloop.

In short: change the column datatypes to be the same, or update to
7.4.something.  There are no other solutions.

(Well, if you were really desperate you could create a set of
mergejoinable "text op bigint" comparison operators, and then 7.2
would be able to cope; but I should think that updating to 7.4 would
be much less work.)

            regards, tom lane

Re: No index usage with "left join"

From
"Scott Marlowe"
Date:
On Mon, 2004-08-02 at 06:08, mailing@impactmedia.de wrote:
> We have a "companies" and a "contacts" table with about 3000 records
> each.
>
> We run the following SQL-Command which runs about 2 MINUTES !:
>
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid =
> companies.intfield01
>
> contacts.sid (type text, b-tree index on it)
> companies.intfield01 (type bigint, b-tree index on it)
>
> comfire=> explain analyze SELECT count(*) FROM prg_contacts LEFT JOIN
> prg_addresses ON prg_contacts.sid=prg_addresses.intfield01;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=495261.02..495261.02 rows=1 width=15) (actual
> time=40939.38..40939.38 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..495253.81 rows=2885 width=15) (actual
> time=0.05..40930.14 rows=2866 loops=1)
>     ->  Seq Scan on prg_contacts  (cost=0.00..80.66 rows=2866
> width=7) (actual time=0.01..18.10 rows=2866 loops=1)
>     ->  Seq Scan on prg_addresses  (cost=0.00..131.51 rows=2751
> width=8) (actual time=0.03..6.25 rows=2751 loops=2866)
> Total runtime: 40939.52 msec
>
> EXPLAIN
>
> Note:
> - We need the left join because we need all contacts even if they are
> not assigned to a company
> - We are not able to change the datatypes of the joined fields
> because we use a standard software (btw who cares: SuSE Open Exchange
> Server)
> - When we use a normal join (without LEFT or a where clause) the SQL
> runs immediately using the indexes
>
> How can I force the usage of the indexes when using "left join". Or
> any other SQL construct that does the same !? Can anybody please give
> us a hint !?

Why in the world would the database use the index in this case?  You're
retrieving every single row, so it may as well hit the data store
directly.  By the way, unlike many other databases that can just hit the
index, PostgreSQL always has to go back to the data store anyway to get
the real value, so if it's gonna hit more than some small percentage of
rows, it's usually a win to just seq scan it.  Try restricting your
query with a where clause to one or two rows and see what you get.