Thread: Problem Designing Index
Hello,

I'm doing some select statements on my table that look like:

SELECT * FROM table WHERE prod_num = '1234567' AND transaction_timestamp >
'2007-07-18 21:29:57' OR prod_num > '1234567' ORDER BY prod_num ASC,
transaction_timestamp ASC LIMIT 1;

I've added two indices, one for prod_num and another for
transaction_timestamp. This table has 151,000 rows and the above statement
returns in less than a millisecond. If I change the above statement from
'>' to '<' it takes 8 seconds to complete. Prod_num '1234567' is towards
the end of the 151k rows. If I use a prod_num like '0000123' towards the
front, the problem is reversed for '>' and '<'.

I tried adding a third index that uses both prod_num and
transaction_timestamp. The average performance at each end of the data for
both '>' and '<' improved, but the problem wasn't resolved. Selects at the
end of the data with '>' conditions (like the original statement) then
become broken and take 500 ms to finish, which is unacceptable for the
application.

I ran ANALYZE on the table with no effect.

Is it possible to design an index that can account for all these
scenarios? Thanks for any help you can provide.

-Alan
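For context, the WHERE clause above is the expanded form of a row-value ("keyset") comparison, which a single composite index can serve directly. A minimal sketch of that rewrite follows; the index name is hypothetical, the table name is the placeholder from the query above (quoted because `table` is a reserved word), and the row-value form only uses an index efficiently on PostgreSQL 8.2 and later:

```sql
-- Hypothetical composite index matching the ORDER BY columns.
CREATE INDEX idx_prod_ts ON "table" (prod_num, transaction_timestamp);

-- Row-value comparison equivalent to the OR-expanded predicate above;
-- the planner can match this directly against the composite index.
SELECT *
FROM "table"
WHERE (prod_num, transaction_timestamp) > ('1234567', '2007-07-18 21:29:57')
ORDER BY prod_num ASC, transaction_timestamp ASC
LIMIT 1;
```

This is a sketch of a common pattern, not something proposed in the thread itself.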
In response to Alan J Batsford <AJBatsford@uss.com>:

> Hello,
>
> I'm doing some select statements on my table that look like:
> SELECT * FROM table WHERE prod_num = '1234567' AND transaction_timestamp
> > '2007-07-18 21:29:57' OR prod_num > '1234567' ORDER BY prod_num ASC,
> transaction_timestamp ASC LIMIT 1;
>
> [...]
>
> Is it possible to design an index that can account for all the scenarios?
> Thanks for any help you can provide.

While it's difficult to be sure, I'm guessing you have either a hardware
problem or a tuning problem -- but I don't think your indexes are the
problem.

Keep in mind that once PostgreSQL has determined which rows to return, it
has to actually read all those rows off disk and send them to the client
application. In my opinion, 8 seconds to read in over 100,000 rows isn't
unreasonable (especially if those rows are wide).

If 8 seconds is an unacceptable time, then you're liable to need hardware
to fix it: more RAM to cache those rows, faster disks, or both.

However, this is just speculation. You didn't provide EXPLAIN ANALYZE
output, table schema, hardware details, or configuration information, so
it's entirely possible that there is something else wrong. I'm just making
an educated guess.

--
Bill Moran
http://www.potentialtech.com
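The output Bill is asking for comes from prefixing the problem query with EXPLAIN ANALYZE, which shows the chosen plan, the estimated versus actual row counts, and per-node timings (table name is the placeholder from the original query):

```sql
-- Run the actual query and report the plan with real timings and row counts.
EXPLAIN ANALYZE
SELECT * FROM "table"
WHERE prod_num = '1234567'
  AND transaction_timestamp > '2007-07-18 21:29:57'
   OR prod_num > '1234567'
ORDER BY prod_num ASC, transaction_timestamp ASC
LIMIT 1;
```

Note that EXPLAIN ANALYZE executes the query, so it takes as long as the query itself.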
pgsql-general-owner@postgresql.org wrote on 08/13/2007 08:36:23 AM:

> While it's difficult to be sure, I'm guessing you have either a hardware
> problem or a tuning problem -- but I don't think your indexes are the
> problem.
>
> [...]
>
> However, this is just speculation. You didn't provide analyze output,
> table schema, hardware details, or configuration information ... so it's
> entirely possible that there is something else wrong. I'm just making an
> educated guess.

Thanks for the help. After your email I went to capture some ANALYZE
output for you, and when I did I figured I'd bump up the statistics target
on the two columns of interest from 100 to 1000. Now all statements return
close to instantly.

I originally thought it was the index, because I could make an index that
yielded great performance for each type of select I was doing, but never
for all of them at once.

To answer your question about hardware, the CPU is a Xeon with 3 GB of
RAM. I'm unsure of the exact speed of the HDD, but I'm certain it's
high-performance.

Is this ANALYZE tool something I need to run periodically to keep
performance up? If so, how often should I run it?
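The "bump up the statistics" step presumably used something like the following (column names are from the thread; the table name is the placeholder from the original query). The per-column statistics target only takes effect at the next ANALYZE:

```sql
-- Raise the per-column statistics target from the default of 100,
-- then re-collect statistics so the planner sees the finer histogram.
ALTER TABLE "table" ALTER COLUMN prod_num SET STATISTICS 1000;
ALTER TABLE "table" ALTER COLUMN transaction_timestamp SET STATISTICS 1000;
ANALYZE "table";
```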
"Alan J Batsford" <AJBatsford@uss.com> writes:

> Thanks for the help. After your email I went to capture some analyze
> output for you, and when I did I figured to bump up the statistics on
> the two columns of interest from 100 to 1000. Now all statements return
> close to instantly.

Note that 1000 can take quite a lot of space in the statistics table.
Make sure it's vacuumed regularly, and check that this isn't slowing down
planning of simple queries excessively.

Look at the EXPLAIN ANALYZE output and check that the estimates are
reasonably accurate. They may have just flipped from being wildly
inaccurate on the wrong side of the decision point to wildly inaccurate
but on the right side of the decision point.

> Is this analyze tool something I need to run periodically to keep
> performance up? If so how often should I run it.

Yes. autovacuum likes to do it whenever 10% of the table has been updated,
but your mileage will vary considerably depending on how much your updates
or other DML affect the distribution that the queries depend on.

--
Gregory Stark
EnterpriseDB
http://www.enterprisedb.com
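The 10% figure Gregory mentions corresponds to autovacuum's analyze trigger, controlled by two settings in postgresql.conf. A sketch of the relevant fragment (the values shown are illustrative, not a recommendation from the thread):

```
# postgresql.conf -- autovacuum re-analyzes a table when the number of
# changed rows exceeds:
#   autovacuum_analyze_threshold
#   + autovacuum_analyze_scale_factor * (rows in table)
autovacuum = on
autovacuum_analyze_scale_factor = 0.1    # fraction of the table (10%)
autovacuum_analyze_threshold = 50        # flat minimum number of rows
```

Lowering the scale factor makes statistics refresh more often on large, frequently modified tables.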