Thread: Indexes?
I have a table where I need to use "..where curdate between fromDate and toDate". Is it best to have two indexes, one for FromDate and one for toDate or just one index for both the fields? Regards, BTJ -- ----------------------------------------------------------------------------------------------- Bjørn T Johansen btj@havleik.no ----------------------------------------------------------------------------------------------- Someone wrote: "I understand that if you play a Windows CD backwards you hear strange Satanic messages" To which someone replied: "It's even worse than that; play it forwards and it installs Windows" -----------------------------------------------------------------------------------------------
I believe that it is better to have a concatenated key of (toDate,FromDate). The reason the toDate should come first is that for more "recent" records, finding curDates less than toDate is much more selective than finding curDates greater than fromDate. Actually I'm not sure if fromDate is that helpful either as part of the concatenated key (it probably depends) but definitely not by itself. If your usual query is someEarlyHistoricalDate between toDate and fromDate, then the concatenated key should be (fromDate,toDate) instead. If toDate is sometimes not known, I would use some fixed date far in the future rather than a null. Vincent ----- Original Message ----- From: "Bjørn T Johansen" <btj@havleik.no> To: <pgsql-general@postgresql.org> Sent: Wednesday, December 01, 2004 10:11 PM Subject: [GENERAL] Indexes? >I have a table where I need to use "..where curdate between fromDate and >toDate". > Is it best to have two indexes, one for FromDate and one for toDate or > just one index for both the fields? > > > Regards, > > BTJ > > -- > ----------------------------------------------------------------------------------------------- > Bjørn T Johansen > > btj@havleik.no > ----------------------------------------------------------------------------------------------- > Someone wrote: > "I understand that if you play a Windows CD backwards you hear strange > Satanic messages" > To which someone replied: > "It's even worse than that; play it forwards and it installs Windows" > ----------------------------------------------------------------------------------------------- > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
On Thu, Dec 02, 2004 at 07:11:29 +0100, Bjørn T Johansen <btj@havleik.no> wrote: > I have a table where I need to use "..where curdate between fromDate and > toDate". > Is it best to have two indexes, one for FromDate and one for toDate or just > one index for both the fields? Assuming that curdate is something like the date when the query is being run and that FromDate and toDate are columns in the table you are searching, then you probably want indexes on each column. A combined index scan wouldn't be useful. An index scan on either FromDate or toDate might be useful depending on the distribution of values in those columns and the value of curdate.
On Wed, Dec 01, 2004 at 23:16:48 -0800, Vincent Hikida <vhikida@inreach.com> wrote: > I believe that it is better to have a concatenated key of > (toDate,FromDate). The reason the toDate should come first is that for more > "recent" records, finding curDates less than toDate is much more selective > than finding curDates greater than fromDate. Actually I'm not sure if > fromDate is that helpful either as part of the concatenated key (it > probably depends) but definitely not by itself. I combined index won't be very useful for the kind of search he is doing. And not having an index on FromDate could hurt in some cases depending on the distribution of values.
On Thu, Dec 02, 2004 at 07:11:29AM +0100, Bjørn T Johansen wrote: > I have a table where I need to use "..where curdate between fromDate and > toDate". > Is it best to have two indexes, one for FromDate and one for toDate or just > one index for both the fields? You could try it both ways and use EXPLAIN ANALYZE to see which results in a faster plan, if that's what you mean by "best." -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Well, then it's decided to try with two indexes... Thx... :) BTJ Bruno Wolff III wrote: > On Wed, Dec 01, 2004 at 23:16:48 -0800, > Vincent Hikida <vhikida@inreach.com> wrote: > >>I believe that it is better to have a concatenated key of >>(toDate,FromDate). The reason the toDate should come first is that for more >>"recent" records, finding curDates less than toDate is much more selective >>than finding curDates greater than fromDate. Actually I'm not sure if >>fromDate is that helpful either as part of the concatenated key (it >>probably depends) but definitely not by itself. > > > I combined index won't be very useful for the kind of search he is doing. > And not having an index on FromDate could hurt in some cases depending > on the distribution of values. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Perhaps I'm missing something but let's say that the index has the following: toDate fromDate 1992-03-02 1991-01-23 1992-04-03 1990-06-13 1993-05-03 1991-01-22 ... ... ... 2004-12-01 2003-02-22 2005-03-04 2003-02-22 (a) 2005-03-05 2004-12-15 (b) 2005-03-05 2004-06-18 (c) 2007-04-12 2005-06-18 (d) Let's say that there are a million entries where the toDate is less than today 2004-12-02. That is less than (a) in the index. From the index then only a, b, c, and d should be scanned further. a and c would be picked based on the index values because 2004-12-02 is between the from and end date. However, b and d would be excluded immediately because the the from date is greater than 2004-12-02 and would save the optimizer from even reading the table for these index entries because the fromDate is in the index. This may be a somewhat extreme example but my experience is in most systems old historical data makes up the bulk of the data and newer data is a much smaller amount. In addition most people are interested in data from the most recent month. Of course I may be mistaken about the data distribution. Vincent ----- Original Message ----- From: "Bruno Wolff III" <bruno@wolff.to> To: "Vincent Hikida" <vhikida@inreach.com> Cc: <pgsql-general@postgresql.org> Sent: Thursday, December 02, 2004 12:10 PM Subject: Re: [GENERAL] Indexes? > On Wed, Dec 01, 2004 at 23:16:48 -0800, > Vincent Hikida <vhikida@inreach.com> wrote: >> I believe that it is better to have a concatenated key of >> (toDate,FromDate). The reason the toDate should come first is that for >> more >> "recent" records, finding curDates less than toDate is much more >> selective >> than finding curDates greater than fromDate. Actually I'm not sure if >> fromDate is that helpful either as part of the concatenated key (it >> probably depends) but definitely not by itself. > > I combined index won't be very useful for the kind of search he is doing. > And not having an index on FromDate could hurt in some cases depending > on the distribution of values. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
On Thu, Dec 02, 2004 at 22:37:38 -0800, Vincent Hikida <vhikida@inreach.com> wrote: > 2004-12-01 2003-02-22 > 2005-03-04 2003-02-22 (a) > 2005-03-05 2004-12-15 (b) > 2005-03-05 2004-06-18 (c) > 2007-04-12 2005-06-18 (d) > > Let's say that there are a million entries where the toDate is less than > today 2004-12-02. That is less than (a) in the index. From the index then > only a, b, c, and d should be scanned further. a and c would be picked That is correct, but that part relies only on the part of the index dependent on toDate. > based on the index values because 2004-12-02 is between the from and end > date. However, b and d would be excluded immediately because the the from > date is greater than 2004-12-02 and would save the optimizer from even > reading the table for these index entries because the fromDate is in the > index. That is not correct. Postgres currently doesn't have a way to skip ahead on an index scan. So what will happen is that a, b, c, d and will all be considered and b and d removed by a filter rule. > This may be a somewhat extreme example but my experience is in most systems > old historical data makes up the bulk of the data and newer data is a much > smaller amount. In addition most people are interested in data from the > most recent month. The idea of having a toDate index is good, it is just that having a multicolumn index doesn't help for this problem. In fact by making the index wider, it will slow things down. > Of course I may be mistaken about the data distribution. The distribution of values is what makes toDate or FromDate a better index (if any) to use. You may very well be correct that for most people toDate will more likely be the better index to use.
> >> based on the index values because 2004-12-02 is between the from and end >> date. However, b and d would be excluded immediately because the the >> from >> date is greater than 2004-12-02 and would save the optimizer from even >> reading the table for these index entries because the fromDate is in the >> index. > > That is not correct. Postgres currently doesn't have a way to skip ahead > on an index scan. So what will happen is that a, b, c, d and will all be > considered and b and d removed by a filter rule. > OK. I got it now. Thanks :)