On Thu, Dec 02, 2004 at 22:37:38 -0800,
Vincent Hikida <vhikida@inreach.com> wrote:
> 2004-12-01 2003-02-22
> 2005-03-04 2003-02-22 (a)
> 2005-03-05 2004-12-15 (b)
> 2005-03-05 2004-06-18 (c)
> 2007-04-12 2005-06-18 (d)
>
> Let's say that there are a million entries where the toDate is less than
> today 2004-12-02. That is less than (a) in the index. From the index then
> only a, b, c, and d should be scanned further. a and c would be picked
That is correct, but that step relies only on the toDate portion of the
index.
> based on the index values because 2004-12-02 is between the from and end
> date. However, b and d would be excluded immediately because the from
> date is greater than 2004-12-02 and would save the optimizer from even
> reading the table for these index entries because the fromDate is in the
> index.
That is not correct. Postgres currently doesn't have a way to skip ahead
on an index scan, so a, b, c, and d will all be considered, with b and d
removed by a filter rule.
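To make that concrete, here is a hypothetical schema and query (the table
and column names are my own, not from the original posts). With a
multicolumn index on (toDate, fromDate), only the toDate comparison can
serve as the index condition; the fromDate comparison is applied as a
filter to every index entry that survives the toDate test:

```sql
-- Hypothetical table with the multicolumn index under discussion.
CREATE TABLE spans (fromDate date, toDate date);
CREATE INDEX spans_to_from ON spans (toDate, fromDate);

-- Only "toDate >= '2004-12-02'" can be used as the index condition;
-- "fromDate <= '2004-12-02'" becomes a filter checked against each
-- qualifying entry (so b and d are still visited, then discarded).
SELECT *
FROM spans
WHERE fromDate <= '2004-12-02'
  AND toDate   >= '2004-12-02';
```

Running EXPLAIN on such a query should show the fromDate test listed as a
filter rather than as part of the index condition.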
> This may be a somewhat extreme example but my experience is in most systems
> old historical data makes up the bulk of the data and newer data is a much
> smaller amount. In addition most people are interested in data from the
> most recent month.
The idea of having a toDate index is good; it is just that a multicolumn
index doesn't help for this problem. In fact, making the index wider will
slow things down.
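In other words, for this query the narrower single-column index does the
same work with smaller index entries (again using my hypothetical table
name):

```sql
-- A plain toDate index supports the same index condition without
-- carrying fromDate in every entry, so more entries fit per page.
CREATE INDEX spans_to ON spans (toDate);
```

The fromDate test still runs as a filter either way; the difference is
only the size of the index being scanned.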
> Of course I may be mistaken about the data distribution.
The distribution of values is what makes toDate or fromDate the better
index (if either) to use. You may very well be correct that for most
people toDate will be the better index to use.