Home > mailing lists

Re: Indexing Strategy - Mailing list pgsql-novice

From	Matthew Engelbert
Subject	Re: Indexing Strategy
Date	December 11, 2004 10:44:01
Msg-id	20041211005702.75029.qmail@web60302.mail.yahoo.com Whole thread Raw
In response to	Indexing Strategy (Matthew Engelbert <mje1975@yahoo.com>)
Responses	Re: Indexing Strategy
List	pgsql-novice

Tree view

The select query will be like this:

select * from quote where sybmol = 'MSFT' and extract(doy from time) = 340
order by time

I need to select all the records for a particular symbol, for a particular day,
ordered by time.  I might need to select the records in the same way, but for a
shorter time interval, like some hour during the day.  I'm using Postgresql
version 7.4.

Other info I forgot to mention.... this data is historical, and so I won't be
doing lots of inserts, updates or deletes - I'll just load a big chunk of data
once in a while, and rebuild the index.  I realize that I can also add a lookup
table for symbol, as there are only about 150 unique values.  Time values are
about 99% unique.  Yes, I'm sure I need to tune the db better.  I'm new to
Postgresql.  Thanks for the help.

--- Josh Berkus <josh@agliodbs.com> wrote:

> Matthew,
>
> > I'm having trouble correctly indexing a large table.  The table will be
> > receiving about 30 million rows/day, and has the following definition:
> >
> > CREATE TABLE quote (
> >         symbol  varchar (5),
> >         price   numeric (7,4),
> >         time timestamp
> >         );
> >
> > I need to configure the indices so that I can minimize the time to request
> > all the records for a particular symbol, for a particular day, ordered by
> > time.
>
> Please post the query you're using.   How are you selecting the day?   What
> version of PostgreSQL are you using?
>
> > I've tried clustering a single index based on symbol, then time, but this
> > operation takes much too long.  I think this is because this operation
> > would have O(N^2) complexity(?).
>
> Nope,  just because it's a very large table and you're probably swapping.
> Try
> increasing sort_mem dramatically, like half your RAM.
>
> > I think my next best option would be to
> > cluster on an index based on symbol, then have another index on time.
>
> Not that either.   You need a multicolumn index.  When you answer the above
> quesitons, I'll explain it.
>
> > Does any one know of some resources on the web that discusses indexing
> > strategies ?  Any help would be appreciated?  Thanks.
>
> Not that covers your question.   I'm writing a book which covers this right
> now, but it won't be published until January 2006.
>
> --
> --Josh
>
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>

pgsql-novice by date:

From: Michael Fuhr
Date: 11 December 2004, 10:29:16
Subject: Re: Get comment

From: Michael Fuhr
Date: 11 December 2004, 10:44:08
Subject: Re: Version 8 & column order

Re: Indexing Strategy - Mailing list pgsql-novice

Previous

Next