Arnaud Lesauvage wrote:
> I have two table with a 2-column index on both of them.
> In the first table, the first colum of the index is the primary key, the
> second one is an integer field.
> In the second table, the two columns are the primary key.
> When I join these two tables, the 2-column index of the first table is
> not used.
> Why does the query planner think that this plan is better ?
>
> ALTER TABLE geo.subcities_names
> ADD CONSTRAINT subcities_names_pkey PRIMARY KEY(subcity_gid,
> language_id);
>
> CREATE INDEX subcities_gid_language_id
> ON geo.subcities
> USING btree
> (gid, official_language_id);
>
> EXPLAIN ANALYZE
> SELECT * FROM geo.subcities sc, geo.subcities_names scn
> WHERE sc.gid = scn.subcity_gid AND sc.official_language_id =
> scn.language_id;
My theory:
There's no additional restrictions besides the join condition, so the
system has to scan both tables completely. It chooses to use a full
index scan instead of a seq scan to be able to do a merge join. Because
it's going to have to scan the indexes completely anyway, it chooses the
smallest index which is subcities_pkey.
You'd think that the system could do the merge using just the indexes,
and only fetch the heap tuples for matches. If that were the case, using
the 2-column index would indeed be a good idea. However, PostgreSQL
can't use the values stored in the index to check the join condition, so
all the heap tuples are fetched anyway. There was just recently
discussion about this on this list:
http://archives.postgresql.org/pgsql-performance/2006-09/msg00080.php.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com