Thread: Problem Designing Index
Hello,

I'm doing some select statements on my table that look like:

SELECT * FROM table WHERE prod_num = '1234567' AND transaction_timestamp >
'2007-07-18 21:29:57' OR prod_num > '1234567' ORDER BY prod_num ASC,
transaction_timestamp ASC LIMIT 1;

I've added two indices, one for prod_num and another for
transaction_timestamp. This table has 151,000 rows and the above statement
returns in less than a millisecond. If I change the above statement from
'>' to '<' it takes 8 seconds to complete. Prod_num '1234567' is towards
the end of the 151k rows. If I use a prod_num like '0000123' towards the
front, the problem is reversed for '>' and '<'.

I tried adding a third index that uses both prod_num and
transaction_timestamp. The average performance at each end of the data for
both '>' and '<' improved, but the problem wasn't resolved. Selects at the
end of the data with '>' conditions (like the original statement) then
become broken and take 500 ms to finish, which is unacceptable for the
application.

I ran ANALYZE on the table with no effect.

Is it possible to design an index that can account for all these
scenarios? Thanks for any help you can provide.

-Alan
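For context, the WHERE clause above is the expanded form of a row-value ("keyset") comparison, which a single composite index can serve directly. A minimal sketch of that rewrite follows; the index name is hypothetical, the table name is the placeholder from the query above (quoted because `table` is a reserved word), and the row-value form only uses an index efficiently on PostgreSQL 8.2 and later:

```sql
-- Hypothetical composite index matching the ORDER BY columns.
CREATE INDEX idx_prod_ts ON "table" (prod_num, transaction_timestamp);

-- Row-value comparison equivalent to the OR-expanded predicate above;
-- the planner can match this directly against the composite index.
SELECT *
FROM "table"
WHERE (prod_num, transaction_timestamp) > ('1234567', '2007-07-18 21:29:57')
ORDER BY prod_num ASC, transaction_timestamp ASC
LIMIT 1;
```

This is a sketch of a common pattern, not something proposed in the thread itself.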
In response to Alan J Batsford <AJBatsford@uss.com>:

> Hello,
>
> I'm doing some select statements on my table that look like:
> SELECT * FROM table WHERE prod_num = '1234567' AND transaction_timestamp
> > '2007-07-18 21:29:57' OR prod_num > '1234567' ORDER BY prod_num ASC,
> transaction_timestamp ASC LIMIT 1;
>
> [...]
>
> Is it possible to design an index that can account for all the scenarios?
> Thanks for any help you can provide.

While it's difficult to be sure, I'm guessing you have either a hardware
problem or a tuning problem -- but I don't think your indexes are the
problem.

Keep in mind that once PostgreSQL has determined which rows to return, it
has to actually read all those rows off disk and send them to the client
application. In my opinion, 8 seconds to read in over 100,000 rows isn't
unreasonable (especially if those rows are wide).

If 8 seconds is an unacceptable time, then you're liable to need hardware
to fix it: more RAM to cache those rows, faster disks, or both.

However, this is just speculation. You didn't provide EXPLAIN ANALYZE
output, table schema, hardware details, or configuration information, so
it's entirely possible that there is something else wrong. I'm just making
an educated guess.

--
Bill Moran
http://www.potentialtech.com
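The output Bill is asking for comes from prefixing the problem query with EXPLAIN ANALYZE, which shows the chosen plan, the estimated versus actual row counts, and per-node timings (table name is the placeholder from the original query):

```sql
-- Run the actual query and report the plan with real timings and row counts.
EXPLAIN ANALYZE
SELECT * FROM "table"
WHERE prod_num = '1234567'
  AND transaction_timestamp > '2007-07-18 21:29:57'
   OR prod_num > '1234567'
ORDER BY prod_num ASC, transaction_timestamp ASC
LIMIT 1;
```

Note that EXPLAIN ANALYZE executes the query, so it takes as long as the query itself.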
pgsql-general-owner@postgresql.org wrote on 08/13/2007 08:36:23 AM:

> While it's difficult to be sure, I'm guessing you have either a hardware
> problem or a tuning problem -- but I don't think your indexes are the
> problem.
>
> [...]
>
> However, this is just speculation. You didn't provide analyze output,
> table schema, hardware details, or configuration information ... so it's
> entirely possible that there is something else wrong. I'm just making an
> educated guess.

Thanks for the help. After your email I went to capture some ANALYZE
output for you, and when I did I figured I'd bump up the statistics target
on the two columns of interest from 100 to 1000. Now all statements return
close to instantly.

I originally thought it was the index, because I could make an index that
yielded great performance for each type of select I was doing, but never
for all of them at once.

To answer your question about hardware, the CPU is a Xeon with 3 GB of
RAM. I'm unsure of the exact speed of the HDD, but I'm certain it's
high-performance.

Is this ANALYZE tool something I need to run periodically to keep
performance up? If so, how often should I run it?
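The "bump up the statistics" step presumably used something like the following (column names are from the thread; the table name is the placeholder from the original query). The per-column statistics target only takes effect at the next ANALYZE:

```sql
-- Raise the per-column statistics target from the default of 100,
-- then re-collect statistics so the planner sees the finer histogram.
ALTER TABLE "table" ALTER COLUMN prod_num SET STATISTICS 1000;
ALTER TABLE "table" ALTER COLUMN transaction_timestamp SET STATISTICS 1000;
ANALYZE "table";
```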
"Alan J Batsford" <AJBatsford@uss.com> writes:

> Thanks for the help. After your email I went to capture some analyze
> output for you, and when I did I figured to bump up the statistics on
> the two columns of interest from 100 to 1000. Now all statements return
> close to instantly.

Note that 1000 can take quite a lot of space in the statistics table.
Make sure it's vacuumed regularly, and check that this isn't slowing down
planning of simple queries excessively.

Look at the EXPLAIN ANALYZE output and check that the estimates are
reasonably accurate. They may have just flipped from being wildly
inaccurate on the wrong side of the decision point to wildly inaccurate
but on the right side of the decision point.

> Is this analyze tool something I need to run periodically to keep
> performance up? If so how often should I run it.

Yes. autovacuum likes to do it whenever 10% of the table has been updated,
but your mileage will vary considerably depending on how much your updates
or other DML affect the distribution that the queries depend on.

--
Gregory Stark
EnterpriseDB
http://www.enterprisedb.com
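The 10% figure Gregory mentions corresponds to autovacuum's analyze trigger, controlled by two settings in postgresql.conf. A sketch of the relevant fragment (the values shown are illustrative, not a recommendation from the thread):

```
# postgresql.conf -- autovacuum re-analyzes a table when the number of
# changed rows exceeds:
#   autovacuum_analyze_threshold
#   + autovacuum_analyze_scale_factor * (rows in table)
autovacuum = on
autovacuum_analyze_scale_factor = 0.1    # fraction of the table (10%)
autovacuum_analyze_threshold = 50        # flat minimum number of rows
```

Lowering the scale factor makes statistics refresh more often on large, frequently modified tables.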