Thread: Index question
Hi all, I have a table with around 3M records in it and a few indexes on it. One of them is on the day column. I get 10-20K new recordsa day. After running ANALYSE in psql I tried the following queries: buns=# explain select count(*) from cdr where day >= '20040127'; QUERY PLAN ------------------------------------------------------------------Aggregate (cost=85596.50..85596.50 rows=1 width=0) -> Seq Scan on cdr (cost=0.00..85053.86 rows=217055 width=0) Filter: ("day" >= '2004-01-27'::date) (3 rows) buns=# explain select count(*) from cdr where day = '20040127'; QUERY PLAN --------------------------------------------------------------------------------Aggregate (cost=12950.10..12950.10 rows=1width=0) -> Index Scan using cdr_ix1 on cdr (cost=0.00..12928.00 rows=8839 width=0) Index Cond: ("day" ='2004-01-27'::date) (3 rows) buns=# explain select count(*) from cdr where day between '20040127' and current_date; QUERY PLAN ----------------------------------------------------------------------------------------Aggregate (cost=20129.91..20129.91rows=1 width=0) -> Index Scan using cdr_ix1 on cdr (cost=0.00..20095.66 rows=13699 width=0) Index Cond: (("day" >= '2004-01-27'::date) AND ("day" <= ('now'::text)::date)) (3 rows) I understand that selecting count(*) will involve a scan at some stage, but I was surprised that the index wasn't used inthe >= case, but was used in the between case. Why is this so? Do I need to ANALYSE some more or is this just the way the query planner works? Thanks, David David Witham Telephony Platforms Architect Unidial Ph: 03 8628 3383 Fax: 03 8628 3399
"David Witham" <davidw@unidial.com.au> writes: > I understand that selecting count(*) will involve a scan at some > stage, but I was surprised that the index wasn't used in the >= case, > but was used in the between case. Given the estimated row counts in your examples, the planner's choices are not surprising. You have not given us any information on whether those estimates are accurate. regards, tom lane
There are 18321 records for 20040127 and so the estimate of 8839 for the = case is low but it still does the right thing. There are 227197 records between '20040127' and current_date so the estimate in the >= case is accurate but the estimatefor the between case is an order of magnitude too low. However, it used the index I wanted and the >= case didn't. Regards, David -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, 13 February 2004 16:38 To: David Witham Cc: pgsql-sql@postgresql.org Subject: Re: [SQL] Index question "David Witham" <davidw@unidial.com.au> writes: > I understand that selecting count(*) will involve a scan at some > stage, but I was surprised that the index wasn't used in the >= case, > but was used in the between case. Given the estimated row counts in your examples, the planner's choices are not surprising. You have not given us any information on whether those estimates are accurate. regards, tom lane