Thread: help with dual indexing

help with dual indexing

From
Orion Henry
Date:
I've got a table with about 10 million events in it.

Each has a user_id (about 1000 users) and a event_time timestamp
covering a 4 year period with about 50% of the events being in the last
year.  Some users have only dozens of events.  A few have hundreds of
thousands.

The queries usually are in the form of, where "user_id = something and
event_time between something and something".

Half of my queries index off of the user_id and half index off the
event_time.  I was thinking this would be a perfect opportunity to use a
dual index of (user_id,event_time) but I'm confused as to weather this
will help considering the size of this index given that there very few
tuples that have the exact same timestamp as another and I'm not sure
which order to put the user_id/event_time as I don't know what is meant
when people on this list ask which is more selective.

Also, would it make sense for me to raise my ANALYZE value and how would
I go about doing this?

Thanks for the help.


--
Orion Henry <orion@trustcommerce.com>

Attachment

Re: help with dual indexing

From
Tom Lane
Date:
Orion Henry <orion@trustcommerce.com> writes:
> The queries usually are in the form of, where "user_id = something and
> event_time between something and something".

> Half of my queries index off of the user_id and half index off the
> event_time.  I was thinking this would be a perfect opportunity to use a
> dual index of (user_id,event_time) but I'm confused as to weather this
> will help

Probably.  Put the user_id as the first column of the index --- if you
think about the sort ordering of a multicolumn index, you will see why.
With user_id first, a constraint as above describes a contiguous
subrange of the index; with event_time first it does not.

            regards, tom lane

Re: help with dual indexing

From
Orion Henry
Date:
Thanks Tom!  You're a life-saver.

On Fri, 2004-01-23 at 17:08, Tom Lane wrote:
> Orion Henry <orion@trustcommerce.com> writes:
> > The queries usually are in the form of, where "user_id = something and
> > event_time between something and something".
>
> > Half of my queries index off of the user_id and half index off the
> > event_time.  I was thinking this would be a perfect opportunity to use a
> > dual index of (user_id,event_time) but I'm confused as to weather this
> > will help
>
> Probably.  Put the user_id as the first column of the index --- if you
> think about the sort ordering of a multicolumn index, you will see why.
> With user_id first, a constraint as above describes a contiguous
> subrange of the index; with event_time first it does not.
>
>             regards, tom lane
--
Orion Henry <orion@trustcommerce.com>

Attachment