Thread: Index question

Index question

From

"David Witham"

Date:

13 February 2004, 01:30:33

Hi all,

I have a table with around 3M records in it and a few indexes on it. One of them is on the day column. I get 10-20K new
recordsa day. After running ANALYSE in psql I tried the following queries: 

buns=# explain select count(*) from cdr where day >= '20040127';                           QUERY PLAN
         
------------------------------------------------------------------Aggregate  (cost=85596.50..85596.50 rows=1 width=0)
-> Seq Scan on cdr  (cost=0.00..85053.86 rows=217055 width=0)        Filter: ("day" >= '2004-01-27'::date) 
(3 rows)

buns=# explain select count(*) from cdr where day = '20040127';                                   QUERY PLAN
                       
--------------------------------------------------------------------------------Aggregate  (cost=12950.10..12950.10
rows=1width=0)  ->  Index Scan using cdr_ix1 on cdr  (cost=0.00..12928.00 rows=8839 width=0)        Index Cond: ("day"
='2004-01-27'::date) 
(3 rows)

buns=# explain select count(*) from cdr where day between '20040127' and current_date;
   QUERY PLAN                                        
----------------------------------------------------------------------------------------Aggregate
(cost=20129.91..20129.91rows=1 width=0)  ->  Index Scan using cdr_ix1 on cdr  (cost=0.00..20095.66 rows=13699 width=0)
     Index Cond: (("day" >= '2004-01-27'::date) AND ("day" <= ('now'::text)::date)) 
(3 rows)

I understand that selecting count(*) will involve a scan at some stage, but I was surprised that the index wasn't used
inthe >= case, but was used in the between case. 

Why is this so? Do I need to ANALYSE some more or is this just the way the query planner works?

Thanks,
David

David Witham
Telephony Platforms Architect
Unidial
Ph: 03 8628 3383
Fax: 03 8628 3399

Re: Index question

From

Tom Lane

Date:

13 February 2004, 04:38:27

"David Witham" <davidw@unidial.com.au> writes:
> I understand that selecting count(*) will involve a scan at some
> stage, but I was surprised that the index wasn't used in the >= case,
> but was used in the between case.

Given the estimated row counts in your examples, the planner's choices
are not surprising.  You have not given us any information on whether
those estimates are accurate.
        regards, tom lane

Re: Index question

From

"David Witham"

Date:

13 February 2004, 04:58:40

There are 18321 records for 20040127 and so the estimate of 8839 for the = case is low but it still does the right
thing.

There are 227197 records between '20040127' and current_date so the estimate in the >= case is accurate but the
estimatefor the between case is an order of magnitude too low. However, it used the index I wanted and the >= case
didn't.

Regards,
David

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, 13 February 2004 16:38
To: David Witham
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] Index question

"David Witham" <davidw@unidial.com.au> writes:
> I understand that selecting count(*) will involve a scan at some
> stage, but I was surprised that the index wasn't used in the >= case,
> but was used in the between case.

Given the estimated row counts in your examples, the planner's choices
are not surprising.  You have not given us any information on whether
those estimates are accurate.
        regards, tom lane