Index "misbehavior" in PostgreSQL 8.2.2? - Mailing list pgsql-interfaces
From | Tonnerre LOMBARD |
---|---|
Subject | Index "misbehavior" in PostgreSQL 8.2.2? |
Date | |
Msg-id | 20070406172139.GA4527@thebsh.sygroup-int.ch Whole thread Raw |
List | pgsql-interfaces |
Salut, I have four tables, two of which are relevant for this: searchengine=# \d websites Table "public.websites" Column | Type | Modifiers ------------+--------------------------+-------------------------------------------------------id | bigint | not null default nextval('websites_id_seq'::regclass)url | text | not nulltitle | text | abstract | text | lastindex | timestamp with time zone| authority | bigint | not null default 0failed | boolean | not null defaultfalsespamminess | smallint | not null default 0 Indexes: "websites_pkey" PRIMARY KEY, btree (id) "websites_url_key" UNIQUE, btree (url) "websites_authority_key" btree(authority) "websites_failed_key" btree (failed) "websites_lastindex_key" btree (lastindex) "websites_spamminess_key"btree (spamminess) searchengine=# SELECT pg_size_pretty (pg_relation_size ('websites'));pg_size_pretty ----------------293 MB (1 row) searchengine=# SELECT COUNT(*) FROM websites;count --------828778 (1 row) searchengine=# \d sitewords Table "public.sitewords" Column | Type | Modifiers ------------+--------+-----------id_site | bigint | not nullid_keyword | bigint | not nullcount | bigint | not nullratio | bigint | Indexes: "sitewords_key" UNIQUE, btree (id_site, id_keyword) "sitewords_id_keyword_key" btree (id_keyword) "sitewords_id_site_key"btree (id_site) searchengine=# SELECT COUNT(*) FROM sitewords; count ----------46696260 (1 row) searchengine=# SELECT pg_size_pretty (pg_relation_size ('sitewords'));pg_size_pretty ----------------3041 MB (1 row) searchengine=# My effective_cache_size is set to 768MB. I am trying to run a query: searchengine=# EXPLAIN ANALYZE SELECT COUNT(me.id_keyword) * SUM(ratio) AS relevance, id_site FROM sitewords me JOIN websitesws ON me.id_site = ws.id WHERE me.id_keyword IN (4241,28303) GROUP BY id_site ORDER BY relevance DESC LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit (cost=150319.90..150319.91 rows=3 width=24) (actual time=9211.903..9212.013 rows=10 loops=1) -> Sort (cost=150319.90..150319.91rows=3 width=24) (actual time=9211.895..9211.931 rows=10 loops=1) Sort Key: ((count(me.id_keyword))::numeric* sum(me.ratio)) -> HashAggregate (cost=150319.81..150319.87 rows=3 width=24) (actualtime=9180.744..9197.080 rows=3320 loops=1) -> Hash Join (cost=60470.11..150116.47 rows=27112 width=24)(actual time=7816.860..9151.601 rows=5747 loops=1) Hash Cond: (me.id_site = ws.id) -> Bitmap Heap Scan on sitewords me (cost=609.31..85356.43 rows=27112 width=24) (actual time=3.245..137.026 rows=5748loops=1) Recheck Cond: (id_keyword = ANY ('{4241,28303}'::bigint[])) -> Bitmap Index Scan on sitewords_id_keyword_key (cost=0.00..602.53 rows=27112 width=0) (actual time=2.169..2.169rows=5748 loops=1) Index Cond: (id_keyword = ANY ('{4241,28303}'::bigint[])) -> Hash (cost=45674.80..45674.80 rows=816080 width=8) (actual time=7786.940..7786.940rows=829078 loops=1) -> Seq Scan on websites ws (cost=0.00..45674.80 rows=816080width=8) (actual time=0.026..4126.069 rows=829078 loops=1)Total runtime: 9212.679 ms (13 rows) searchengine=# The only way to get PostgreSQL to use the index appears to be to set the random_page_cost to 1.0 (1.1 also doesn't work). In that case, I get: searchengine=# EXPLAIN ANALYZE SELECT COUNT(me.id_keyword) * SUM(ratio) AS relevance, id_site FROM sitewords me JOIN websitesws ON me.id_site = ws.id WHERE me.id_keyword IN (4241,28303) GROUP BY id_site ORDER BY relevance DESC LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit (cost=78647.65..78647.66 rows=3 width=24) (actual time=418.368..418.478 rows=10 loops=1) -> Sort (cost=78647.65..78647.66rows=3 width=24) (actual time=418.359..418.395 rows=10 loops=1) Sort Key: ((count(me.id_keyword))::numeric* sum(me.ratio)) -> HashAggregate (cost=78647.57..78647.63 rows=3 width=24) (actualtime=386.734..403.241 rows=3320 loops=1) -> Nested Loop (cost=310.29..78444.23 rows=27112 width=24)(actual time=2.878..354.418 rows=5747 loops=1) -> Bitmap Heap Scan on sitewords me (cost=310.29..26849.19rows=27112 width=24) (actual time=2.789..135.915 rows=5748 loops=1) RecheckCond: (id_keyword = ANY ('{4241,28303}'::bigint[])) -> Bitmap Index Scan on sitewords_id_keyword_key (cost=0.00..303.51 rows=27112 width=0) (actual time=1.762..1.762 rows=5748 loops=1) Index Cond: (id_keyword = ANY ('{4241,28303}'::bigint[])) -> Index Scan using websites_pkeyon websites ws (cost=0.00..1.89 rows=1 width=8) (actual time=0.017..0.022 rows=1 loops=5748) Index Cond: (me.id_site = ws.id)Total runtime: 418.791 ms (12 rows) searchengine=# which is fine. However, if I leave random_page_cost at 4 and do the following query: searchengine=# EXPLAIN ANALYZE select x.ratio * x.count, x.id_site from (SELECT me.id_site, count(me.id_site), sum(me.ratio)as ratio FROM sitewords me WHERE me.id_keyword IN (4241, 28303) group by id_site) x join websites ws on (ws.id= x.id_site); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=85559.77..85585.13 rows=3 width=48) (actual time=169.707..337.717 rows=3320 loops=1) -> HashAggregate (cost=85559.77..85559.82rows=3 width=16) (actual time=169.567..184.975 rows=3321 loops=1) -> Bitmap Heap Scan onsitewords me (cost=609.31..85356.43 rows=27112 width=16) (actual time=2.897..137.590 rows=5748 loops=1) RecheckCond: (id_keyword = ANY ('{4241,28303}'::bigint[])) -> Bitmap Index Scan on sitewords_id_keyword_key (cost=0.00..602.53 rows=27112 width=0) (actual time=1.820..1.820 rows=5748 loops=1) Index Cond: (id_keyword = ANY ('{4241,28303}'::bigint[])) -> Index Scan using websites_pkey on websites ws (cost=0.00..8.41rows=1 width=8) (actual time=0.025..0.029 rows=1 loops=3321) Index Cond: (ws.id = x.id_site)Totalruntime: 351.053 ms (9 rows) searchengine=# the result is even more acceptable than with the above JOIN statement. The tables are analyzed, and I have vacuumed regularly. Any idea why PostgreSQL attempts a full sequential scan here? Tonnerre
pgsql-interfaces by date: