How to get the optimizer to use an index with multiple fields - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | How to get the optimizer to use an index with multiple fields |
Date | |
Msg-id | 20030428132143.J66185@flake.decibel.org Whole thread Raw |
Responses |
Re: How to get the optimizer to use an index with multiple fields
Re: How to get the optimizer to use an index with multiple |
List | pgsql-performance |
select id into temp NewRetires from stats_participant where retire_to>=1 AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID); results in a table with 5 values... explain analyze delete from email_rank where project_id=25 and id in (select id from NewRetires); Index Scan using email_rank__day_rank on email_rank (cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1) Index Cond: (project_id = 25) Filter: (subplan) SubPlan -> Seq Scan on newretires (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5 loops=91834) Total runtime: 9689.86 msec But, there's already an index that would fit the bill here perfectly: Table "public.email_rank" Column | Type | Modifiers -----------------------+---------+-------------------- project_id | integer | not null id | integer | not null first_date | date | not null last_date | date | not null day_rank | integer | not null default 0 day_rank_previous | integer | not null default 0 overall_rank | integer | not null default 0 overall_rank_previous | integer | not null default 0 work_today | bigint | not null default 0 work_total | bigint | not null default 0 Indexes: email_rank_pkey primary key btree (project_id, id), email_rank__day_rank btree (project_id, day_rank), email_rank__overall_rank btree (project_id, overall_rank) Why isn't it using email_rank_pkey instead of using day_rank then a filter? The original query on sybase (see below) is essentially instant, because it's using the index of (project_id, id), so it doesn't have to read the whole table. stats=> select project_id,count(*) from email_rank group by project_id; project_id | count ------------+-------- 5 | 327856 8 | 28304 24 | 34622 25 | 91834 205 | 331464 Also, changing the WHERE IN to a WHERE EXISTS in the delete is substantially faster in this case (3.5 seconds as opposed to 9); it would be nice if the optimizer could rewrite the query on-the-fly. I started looking into this in the first place because the original query was taking 6-10 seconds, which seemed too long... Original query: DELETE FROM Email_Rank WHERE project_id = :ProjectID AND id IN (SELECT id FROM STATS_Participant sp WHERE retire_to >= 1 AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID) ) ; I tried changing this to an EXISTS and it takes over a minute. So in this case, the range of runtimes is ~4 seconds (building the temp table takes ~0.25 seconds) to over a minute. -- Jim C. Nasby (aka Decibel!) jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
pgsql-performance by date: