Thread: Statistics issue

Statistics issue

From
Vlad Arkhipov
Date:
I have a big table that is used in many queries. Most used index is
created on date field. Number of records in this table when date field
is saturday is about 5 times smaller than other days, on sunday this
number is always 0. Statistics target is 1000. Many queries have
problems when condition on this table looks like "d between '2007-05-12'
and '2007-05-12'" (saturday).

EXPLAIN ANALYZE
SELECT *
FROM i
WHERE d BETWEEN '2007-05-12' AND '2007-05-12'

Index Scan using i_d on i  (cost=0.00..2.39 rows=1 width=402) (actual
time=0.053..4.284 rows=1721 loops=1)
  Index Cond: ((d >= '2007-05-12'::date) AND (d <= '2007-05-12'::date))
Total runtime: 6.645 ms

EXPLAIN ANALYZE
SELECT *
FROM i
WHERE d = '2007-05-12'

Index Scan using i_d on i  (cost=0.00..38.97 rows=1572 width=402)
(actual time=0.044..4.250 rows=1721 loops=1)
  Index Cond: (d = '2007-05-12'::date)
Total runtime: 6.619 ms

Is there a way to solve this problem?


Re: Statistics issue

From
Tom Lane
Date:
Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> EXPLAIN ANALYZE
> SELECT *
> FROM i
> WHERE d BETWEEN '2007-05-12' AND '2007-05-12'

> Index Scan using i_d on i  (cost=0.00..2.39 rows=1 width=402) (actual
> time=0.053..4.284 rows=1721 loops=1)
>   Index Cond: ((d >= '2007-05-12'::date) AND (d <= '2007-05-12'::date))
> Total runtime: 6.645 ms

> EXPLAIN ANALYZE
> SELECT *
> FROM i
> WHERE d = '2007-05-12'

> Index Scan using i_d on i  (cost=0.00..38.97 rows=1572 width=402)
> (actual time=0.044..4.250 rows=1721 loops=1)
>   Index Cond: (d = '2007-05-12'::date)
> Total runtime: 6.619 ms

Hmm, I wonder whether we shouldn't do something like this
http://archives.postgresql.org/pgsql-committers/2008-03/msg00128.php
for all range conditions, not just those made up by
prefix_selectivity().

            regards, tom lane