How to get the optimizer to use an index with multiple fields - Mailing list pgsql-performance

From Jim C. Nasby
Subject How to get the optimizer to use an index with multiple fields
Date
Msg-id 20030428132143.J66185@flake.decibel.org
Whole thread Raw
Responses Re: How to get the optimizer to use an index with multiple fields
Re: How to get the optimizer to use an index with multiple
List pgsql-performance
select id into temp NewRetires from stats_participant where retire_to>=1
AND retire_date = (SELECT last_date FROM Project_statsrun WHERE
project_id = :ProjectID);

results in a table with 5 values...

explain analyze delete from email_rank where project_id=25 and id in
(select id from NewRetires);

 Index Scan using email_rank__day_rank on email_rank
(cost=0.00..9003741627715.16 rows=45019 width=6) (actual time=408.12..9688.37 rows=3 loops=1)
   Index Cond: (project_id = 25)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on newretires  (cost=100000000.00..100000020.00 rows=1000 width=4) (actual time=0.01..0.05 rows=5
loops=91834)
 Total runtime: 9689.86 msec

But, there's already an index that would fit the bill here perfectly:

              Table "public.email_rank"
        Column         |  Type   |     Modifiers
-----------------------+---------+--------------------
 project_id            | integer | not null
 id                    | integer | not null
 first_date            | date    | not null
 last_date             | date    | not null
 day_rank              | integer | not null default 0
 day_rank_previous     | integer | not null default 0
 overall_rank          | integer | not null default 0
 overall_rank_previous | integer | not null default 0
 work_today            | bigint  | not null default 0
 work_total            | bigint  | not null default 0
Indexes: email_rank_pkey primary key btree (project_id, id),
         email_rank__day_rank btree (project_id, day_rank),
         email_rank__overall_rank btree (project_id, overall_rank)

Why isn't it using email_rank_pkey instead of using day_rank then a
filter? The original query on sybase (see below) is essentially instant,
because it's using the index of (project_id, id), so it doesn't have to
read the whole table.

stats=> select project_id,count(*) from email_rank group by project_id;
 project_id | count
------------+--------
          5 | 327856
          8 |  28304
         24 |  34622
         25 |  91834
        205 | 331464

Also, changing the WHERE IN to a WHERE EXISTS in the delete is
substantially faster in this case (3.5 seconds as opposed to 9); it
would be nice if the optimizer could rewrite the query on-the-fly. I
started looking into this in the first place because the original query
was taking 6-10 seconds, which seemed too long...

Original query:
DELETE FROM Email_Rank
    WHERE project_id = :ProjectID
        AND id IN  (SELECT id
                        FROM STATS_Participant sp
                        WHERE retire_to >= 1
                            AND retire_date = (SELECT last_date FROM Project_statsrun WHERE project_id = :ProjectID)
                    )
;

I tried changing this to an EXISTS and it takes over a minute. So in
this case, the range of runtimes is ~4 seconds (building the temp table
takes ~0.25 seconds) to over a minute.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-performance by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Diferent execution plan for similar query
Next
From: "Jim C. Nasby"
Date:
Subject: Re: How to get the optimizer to use an index with multiple fields