Thread: Indexes?

Indexes?

From
Bjørn T Johansen
Date:
I have a table where I need to use "..where curdate between fromDate and toDate".
Is it best to have two indexes, one for FromDate and one for toDate or just one index for
both the fields?


Regards,

BTJ

--
-----------------------------------------------------------------------------------------------
Bjørn T Johansen

btj@havleik.no
-----------------------------------------------------------------------------------------------
Someone wrote:
"I understand that if you play a Windows CD backwards you hear strange Satanic messages"
To which someone replied:
"It's even worse than that; play it forwards and it installs Windows"
-----------------------------------------------------------------------------------------------

Re: Indexes?

From
"Vincent Hikida"
Date:
I believe that it is better to have a concatenated key of (toDate,FromDate).
The reason the toDate should come first is that for more "recent" records,
finding curDates less than toDate is much more selective than finding
curDates greater than fromDate. Actually I'm not sure if fromDate is that
helpful either as part of the concatenated key (it probably depends) but
definitely not by itself.

If your usual query is someEarlyHistoricalDate between toDate and fromDate,
then the concatenated key should be (fromDate,toDate) instead.

If toDate is sometimes not known, I would use some fixed date far in the
future rather than a null.


Vincent
----- Original Message -----
From: "Bjørn T Johansen" <btj@havleik.no>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, December 01, 2004 10:11 PM
Subject: [GENERAL] Indexes?


>I have a table where I need to use "..where curdate between fromDate and
>toDate".
> Is it best to have two indexes, one for FromDate and one for toDate or
> just one index for both the fields?
>
>
> Regards,
>
> BTJ
>
> --
> -----------------------------------------------------------------------------------------------
> Bjørn T Johansen
>
> btj@havleik.no
> -----------------------------------------------------------------------------------------------
> Someone wrote:
> "I understand that if you play a Windows CD backwards you hear strange
> Satanic messages"
> To which someone replied:
> "It's even worse than that; play it forwards and it installs Windows"
> -----------------------------------------------------------------------------------------------
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org
>


Re: Indexes?

From
Bruno Wolff III
Date:
On Thu, Dec 02, 2004 at 07:11:29 +0100,
  Bjørn T Johansen <btj@havleik.no> wrote:
> I have a table where I need to use "..where curdate between fromDate and
> toDate".
> Is it best to have two indexes, one for FromDate and one for toDate or just
> one index for both the fields?

Assuming that curdate is something like the date when the query is being
run and that FromDate and toDate are columns in the table you are
searching, then you probably want indexes on each column. A combined
index scan wouldn't be useful. An index scan on either FromDate or toDate
might be useful depending on the distribution of values in those columns
and the value of curdate.

Re: Indexes?

From
Bruno Wolff III
Date:
On Wed, Dec 01, 2004 at 23:16:48 -0800,
  Vincent Hikida <vhikida@inreach.com> wrote:
> I believe that it is better to have a concatenated key of
> (toDate,FromDate). The reason the toDate should come first is that for more
> "recent" records, finding curDates less than toDate is much more selective
> than finding curDates greater than fromDate. Actually I'm not sure if
> fromDate is that helpful either as part of the concatenated key (it
> probably depends) but definitely not by itself.

I combined index won't be very useful for the kind of search he is doing.
And not having an index on FromDate could hurt in some cases depending
on the distribution of values.

Re: Indexes?

From
Michael Fuhr
Date:
On Thu, Dec 02, 2004 at 07:11:29AM +0100, Bjørn T Johansen wrote:

> I have a table where I need to use "..where curdate between fromDate and
> toDate".
> Is it best to have two indexes, one for FromDate and one for toDate or just
> one index for both the fields?

You could try it both ways and use EXPLAIN ANALYZE to see which
results in a faster plan, if that's what you mean by "best."

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Indexes?

From
Bjørn T Johansen
Date:
Well, then it's decided to try with two indexes...

Thx... :)


BTJ

Bruno Wolff III wrote:
> On Wed, Dec 01, 2004 at 23:16:48 -0800,
>   Vincent Hikida <vhikida@inreach.com> wrote:
>
>>I believe that it is better to have a concatenated key of
>>(toDate,FromDate). The reason the toDate should come first is that for more
>>"recent" records, finding curDates less than toDate is much more selective
>>than finding curDates greater than fromDate. Actually I'm not sure if
>>fromDate is that helpful either as part of the concatenated key (it
>>probably depends) but definitely not by itself.
>
>
> I combined index won't be very useful for the kind of search he is doing.
> And not having an index on FromDate could hurt in some cases depending
> on the distribution of values.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: Indexes?

From
"Vincent Hikida"
Date:
Perhaps I'm missing something but let's say that the index has the
following:

toDate           fromDate
1992-03-02  1991-01-23
1992-04-03  1990-06-13
1993-05-03  1991-01-22
...
...
...
2004-12-01  2003-02-22
2005-03-04  2003-02-22          (a)
2005-03-05  2004-12-15          (b)
2005-03-05  2004-06-18          (c)
2007-04-12  2005-06-18          (d)

Let's say that there are a million entries where the toDate is less than
today 2004-12-02. That is less than (a) in the index. From the index then
only a, b, c, and d should be scanned further. a and c would be picked based
on the index values because 2004-12-02 is between the from and end date.
However, b and d would be excluded immediately because the the from date is
greater than 2004-12-02 and would save the optimizer from even reading the
table for these index entries because the fromDate is in the index.

This may be a somewhat extreme example but my experience is in most systems
old historical data makes up the bulk of the data and newer data is a much
smaller amount. In addition most people are interested in data from the most
recent month.

Of course I may be mistaken about the data distribution.

Vincent
----- Original Message -----
From: "Bruno Wolff III" <bruno@wolff.to>
To: "Vincent Hikida" <vhikida@inreach.com>
Cc: <pgsql-general@postgresql.org>
Sent: Thursday, December 02, 2004 12:10 PM
Subject: Re: [GENERAL] Indexes?


> On Wed, Dec 01, 2004 at 23:16:48 -0800,
>  Vincent Hikida <vhikida@inreach.com> wrote:
>> I believe that it is better to have a concatenated key of
>> (toDate,FromDate). The reason the toDate should come first is that for
>> more
>> "recent" records, finding curDates less than toDate is much more
>> selective
>> than finding curDates greater than fromDate. Actually I'm not sure if
>> fromDate is that helpful either as part of the concatenated key (it
>> probably depends) but definitely not by itself.
>
> I combined index won't be very useful for the kind of search he is doing.
> And not having an index on FromDate could hurt in some cases depending
> on the distribution of values.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


Re: Indexes?

From
Bruno Wolff III
Date:
On Thu, Dec 02, 2004 at 22:37:38 -0800,
  Vincent Hikida <vhikida@inreach.com> wrote:
> 2004-12-01  2003-02-22
> 2005-03-04  2003-02-22          (a)
> 2005-03-05  2004-12-15          (b)
> 2005-03-05  2004-06-18          (c)
> 2007-04-12  2005-06-18          (d)
>
> Let's say that there are a million entries where the toDate is less than
> today 2004-12-02. That is less than (a) in the index. From the index then
> only a, b, c, and d should be scanned further. a and c would be picked

That is correct, but that part relies only on the part of the index dependent
on toDate.

> based on the index values because 2004-12-02 is between the from and end
> date. However, b and d would be excluded immediately because the the from
> date is greater than 2004-12-02 and would save the optimizer from even
> reading the table for these index entries because the fromDate is in the
> index.

That is not correct. Postgres currently doesn't have a way to skip ahead
on an index scan. So what will happen is that a, b, c, d and will all be
considered and b and d removed by a filter rule.

> This may be a somewhat extreme example but my experience is in most systems
> old historical data makes up the bulk of the data and newer data is a much
> smaller amount. In addition most people are interested in data from the
> most recent month.

The idea of having a toDate index is good, it is just that having a multicolumn
index doesn't help for this problem. In fact by making the index wider, it
will slow things down.

> Of course I may be mistaken about the data distribution.

The distribution of values is what makes toDate or FromDate a better index
(if any) to use. You may very well be correct that for most people toDate
will more likely be the better index to use.

Re: Indexes?

From
vhikida@inreach.com
Date:
>
>> based on the index values because 2004-12-02 is between the from and end
>> date. However, b and d would be excluded immediately because the the
>> from
>> date is greater than 2004-12-02 and would save the optimizer from even
>> reading the table for these index entries because the fromDate is in the
>> index.
>
> That is not correct. Postgres currently doesn't have a way to skip ahead
> on an index scan. So what will happen is that a, b, c, d and will all be
> considered and b and d removed by a filter rule.
>
OK. I got it now. Thanks :)