Thread: Speeding up index scans by truncating timestamp?

Speeding up index scans by truncating timestamp?

From
Derrick Rice
Date:
Hey folks,

I've got a table of historical events that 10 million rows over 18+ months.  Currently there is an index for the event timestamp.  I'm wondering if someone can respond to my curiosity regarding the performance of indexes (BTree).

Would creating an index on the timestamp truncated to the *day* make the index more efficient for queries which are interested in events falling in a range of 7+ days?

How much would it hurt when the planner needs to tease out rows which fall within the day range but not the real time range?

Would having both indexes allow the planner to index by the day for entirely-included days and by the actual time for the boundary days?

The performance of the index isn't terrible.  It can take up to cost 15,000 (which I assume is 15 seconds) for a query looking at 5 weeks of data.  I'd like to improve it if it's simple to do, and changing the index to be on the truncated time seems like it could possibly be useful.

Derrick

Re: Speeding up index scans by truncating timestamp?

From
Derrick Rice
Date:
Would creating an index on the timestamp truncated to the *day* make the index more efficient for queries which are interested in events falling in a range of 7+ days?

I gave this a shot, changing the index to be on date_trunc('day', timestamp).  PostgreSQL (8.2) then decided not to use the index at all. Not sure if this is a limitation of PostgreSQL or if it decided it was going to be more expensive than the alternative.

Is the query optimizer capable of using the relationship between an index on date_trunc(foo) and a query with "where foo < bar and foo > baz" ?  At this point the question is to satisfy my own curiosity.

Derrick

Re: Speeding up index scans by truncating timestamp?

From
Michael Glaesemann
Date:
On Feb 15, 2011, at 10:00, Derrick Rice wrote:

>>
>> Would creating an index on the timestamp truncated to the *day* make the
>> index more efficient for queries which are interested in events falling in a
>> range of 7+ days?
>>
>
> I gave this a shot, changing the index to be on date_trunc('day',
> timestamp).  PostgreSQL (8.2) then decided not to use the index at all. Not
> sure if this is a limitation of PostgreSQL or if it decided it was going to
> be more expensive than the alternative.
>
> Is the query optimizer capable of using the relationship between an index on
> date_trunc(foo) and a query with "where foo < bar and foo > baz" ?  At this
> point the question is to satisfy my own curiosity.

Providing the table definition, queries, and EXPLAIN and EXPLAIN ANALYZE output would help people provide more specific
feedback.

Michael Glaesemann
grzm seespotcode net




Re: Speeding up index scans by truncating timestamp?

From
Vick Khera
Date:
On Tue, Feb 15, 2011 at 10:00 AM, Derrick Rice <derrick.rice@gmail.com> wrote:
> Is the query optimizer capable of using the relationship between an index on
> date_trunc(foo) and a query with "where foo < bar and foo > baz" ?  At this
> point the question is to satisfy my own curiosity.

No. The query has to use the same function as the index does.

I also don't think that the storage space will be any less.  A
timestamp is always stored in the same amount of space. All you're
doing is zeroing out the higher resolution bits of time.

Re: Speeding up index scans by truncating timestamp?

From
Derrick Rice
Date:
On Tue, Feb 15, 2011 at 10:16 AM, Michael Glaesemann <grzm@seespotcode.net> wrote:
Providing the table definition, queries, and EXPLAIN and EXPLAIN ANALYZE output would help people provide more specific feedback.

Seemed a general enough question that it wasn't necessary.  That, and I wanted to figure out as much of it on my own as I could, rather than just get the end-result of an expert's answer without all the knowledge of the leg work.

Thank you, though.

On Tue, Feb 15, 2011 at 10:20 AM, Vick Khera <vivek@khera.org> wrote:
On Tue, Feb 15, 2011 at 10:00 AM, Derrick Rice <derrick.rice@gmail.com> wrote:
> Is the query optimizer capable of using the relationship between an index on
> date_trunc(foo) and a query with "where foo < bar and foo > baz" ?  At this
> point the question is to satisfy my own curiosity.

No. The query has to use the same function as the index does.

Well that settles it, then.  Thanks, Vick.
 

I also don't think that the storage space will be any less.  A
timestamp is always stored in the same amount of space. All you're
doing is zeroing out the higher resolution bits of time.


It's been a while since I was intimate with the implementation of a btree.  I was guessing that it might make the tree more "dense" if there were more values that were equal.  A "dense" tree would be easier to scan when grabbing all of the children of a particular node (which is the case when doing a wide range comparison).

The little bit of review that I was able to do reminded me that equal or unequal values don't make a tree more or less "dense".  It could arguably make inserts easier (because there's more acceptable places to put an item) but each node will have n to m items regardless of their relationship to each other.

On that topic... are the details of PostgreSQL's b-tree implementation found anywhere outside of the code?  i.e. what n,m-tree values it uses?  Searched docs and wiki with no luck.

Having fun relearning this stuff

Derrick

Re: Speeding up index scans by truncating timestamp?

From
Alban Hertroys
Date:
On 15 Feb 2011, at 16:20, Vick Khera wrote:

> On Tue, Feb 15, 2011 at 10:00 AM, Derrick Rice <derrick.rice@gmail.com> wrote:
>> Is the query optimizer capable of using the relationship between an index on
>> date_trunc(foo) and a query with "where foo < bar and foo > baz" ?  At this
>> point the question is to satisfy my own curiosity.

> I also don't think that the storage space will be any less.  A
> timestamp is always stored in the same amount of space. All you're
> doing is zeroing out the higher resolution bits of time.


I suppose it would help to cast the column to date, both in the index creation and in the queries. I don't have time
rightnow to verify that the storage space of a date is actually less than a timestamp, but I expect it would be
(although,dates in PG have an awful lot of range!). 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d5adbbf11734598963003!