Thread: Terrible performance after deleting/recreating indexes
Hi,

Using PostgreSQL 7.4.2 on Solaris. I'm trying to improve performance on some queries to my databases, so I wanted to try out various index structures.

Since I'm going to be running my performance tests repeatedly, I created some SQL scripts to delete and recreate various index configurations. One of the scripts contains the commands for recreating the 'original' index configuration (i.e. the one I've already got some numbers for). The only thing is, now when I delete and recreate the original indexes and then run the query, I'm finding the performance has gone completely down the tubes compared to what I originally had. A query that used to take 5 minutes to complete now takes hours.

For what it's worth, my query looks something like:

    select * from tbl_1, tbl_2
    where tbl_1.id = tbl_2.id
      and tbl_2.name like 'x%y%'
      and tbl_1.x > 1234567890123
    order by tbl_1.x;

tbl_1 is very big (> 2 million rows)
tbl_2 is relatively small (7000 or so rows)
tbl_1.x is a numeric(13)
tbl_1.id & tbl_2.id are integers
tbl_2.name is a varchar(64)

I've run 'VACUUM ANALYZE' on both tables involved in the query. I also used 'EXPLAIN' and observed that the query plan is completely changed from what it was originally. Any idea why this would be? I would have thought that a freshly created index would have better performance, not worse. I have not done any inserts or updates since recreating the indexes.

thanks in advance,

Bill C
Bill Chandler wrote:
> Using PostgreSQL 7.4.2 on Solaris. [...] Now when I delete and recreate
> the original indexes and then run the query, I'm finding the performance
> has gone completely down the tubes compared to what I originally had. A
> query that used to take 5 minutes to complete now takes hours.
>
> I've run 'VACUUM ANALYZE' on both tables involved in the query. I also
> used 'EXPLAIN' and observed that the query plan is completely changed
> from what it was originally.

Get an EXPLAIN ANALYZE. That gives actual vs. planned time spent; see what is causing the difference. A discrepancy between planned and actual rows is usually an indication of out-of-date stats.

Which are the indexes on these tables? You should list fields with indexes first in the where clause. Also list the most selective field first, so that it eliminates as many rows as possible in the first scan.

I hope you have read the tuning articles on varlena.com and applied some basic tuning.

And post the table schema, hardware config, postgresql config (the important settings, of course) and EXPLAIN ANALYZE output for the queries. That would be something to start with.

Shridhar
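Concretely, the suggested command run against the query from the original post would look something like this (a sketch; the table names and the numeric literal are the poster's placeholders):

```sql
-- Compare the planner's estimated row counts and costs against reality.
-- Large estimate-vs-actual gaps usually point at stale statistics.
EXPLAIN ANALYZE
SELECT *
FROM tbl_1, tbl_2
WHERE tbl_1.id = tbl_2.id
  AND tbl_2.name LIKE 'x%y%'
  AND tbl_1.x > 1234567890123
ORDER BY tbl_1.x;
```

Unlike plain EXPLAIN, this actually executes the query, so it takes as long as the query itself does.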
Thanks for the advice.

On further review, it appears I am only getting this performance degradation when I run the command via a JDBC app. If I do the exact same query from psql, the performance is fine. I've tried both the JDBC2 and JDBC3 jars. Same results.

It definitely seems to correspond to deleting and recreating the indexes, though. The same query through JDBC worked fine before recreating the indexes. Does that make any sense at all?

thanks

Bill

--- Shridhar Daithankar <shridhar@frodo.hserus.net> wrote:
> Get an EXPLAIN ANALYZE. That gives actual vs. planned time spent; see
> what is causing the difference. A discrepancy between planned and
> actual rows is usually an indication of out-of-date stats.
> [...]
> And post the table schema, hardware config, postgresql config (the
> important settings, of course) and EXPLAIN ANALYZE output for the
> queries. That would be something to start with.
That is interesting - both psql and JDBC merely submit statements for the backend to process, so generally you would expect no difference in execution plan or performance.

It might be worth setting "log_statement = true" in postgresql.conf and checking that you are executing *exactly* the same statement in both JDBC and psql.

regards

Mark

P.S.: let's see the output from EXPLAIN ANALYZE :-)

Bill Chandler wrote:
> Thanks for the advice.
>
> On further review, it appears I am only getting this performance
> degradation when I run the command via a JDBC app. If I do the exact
> same query from psql, the performance is fine. I've tried both the
> JDBC2 and JDBC3 jars. Same results.
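For reference, the logging settings in question are plain booleans in the 7.4 series; a minimal postgresql.conf fragment (picked up after a server reload) might look like:

```
# postgresql.conf (7.4.x) -- assumes access to the server's config file
log_statement = true    # log every SQL statement received by the backend
log_duration  = true    # log how long each statement took to execute
```

With both enabled, each statement in the server log is followed by its elapsed time, which makes it easy to see exactly what the JDBC driver is sending and how long each piece takes.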
Thanks for this tip. It turns out there is a difference. I am using cursors (i.e. calling setFetchSize(5000) on my Statement) in JDBC, so the SQL statement is preceded by:

    DECLARE JDBC_CURS_1 CURSOR FOR ...

which is then followed by the SQL statement. This is followed by the separate statement:

    FETCH FORWARD 5000 FROM JDBC_CURS_1;

Also, I don't know if this is significant, but there are a few lines before both of these:

    set datestyle to 'ISO';
    select version(), case when pg_encoding_to_char(1) = 'SQL_ASCII'
        then 'UNKNOWN' else getdatabaseencoding() end;
    set client_encoding = 'UNICODE';
    begin;

The only thing is, though, none of this is new. I was using cursors before as well.

Here is the output from EXPLAIN ANALYZE. Hope it comes out readable:

                                        QUERY PLAN
    -------------------------------------------------------------------------------------
    Sort  (cost=50466.04..50470.45 rows=1765 width=114) (actual time=87237.003..88235.011 rows=108311 loops=1)
      Sort Key: iso_nep_data_update_events.lds
      ->  Merge Join  (cost=49240.03..50370.85 rows=1765 width=114) (actual time=56658.356..65221.995 rows=108311 loops=1)
            Merge Cond: ("outer".obj_id = "inner".obj_id)
            ->  Sort  (cost=198.01..198.16 rows=61 width=65) (actual time=175.947..181.172 rows=3768 loops=1)
                  Sort Key: iso_nep_control.obj_id
                  ->  Seq Scan on iso_nep_control  (cost=0.00..196.20 rows=61 width=65) (actual time=0.056..108.151 rows=3768 loops=1)
                        Filter: ((real_name)::text ~~ 'NEPOOL%REAL%'::text)
            ->  Sort  (cost=49042.02..49598.46 rows=222573 width=69) (actual time=56482.073..58642.901 rows=216528 loops=1)
                  Sort Key: iso_nep_data_update_events.obj_id
                  ->  Index Scan using iso_nep_due_idx1 on iso_nep_data_update_events  (cost=0.00..7183.18 rows=222573 width=69) (actual time=0.179..11739.104 rows=216671 loops=1)
                        Index Cond: (lds > 1088554754000::numeric)
    Total runtime: 88643.330 ms
    (13 rows)

Here is the actual query:

    select iso_nep_DATA_UPDATE_EVENTS.lds,
           iso_nep_DATA_UPDATE_EVENTS.tsds,
           iso_nep_DATA_UPDATE_EVENTS.value,
           iso_nep_DATA_UPDATE_EVENTS.correction,
           iso_nep_DATA_UPDATE_EVENTS.delta_lds_tsds,
           iso_nep_CONTROL.real_name,
           iso_nep_CONTROL.freq,
           iso_nep_CONTROL.type
    from iso_nep_DATA_UPDATE_EVENTS, iso_nep_CONTROL
    where iso_nep_CONTROL.real_name like 'NEPOOL%REAL%' escape '/'
      and iso_nep_DATA_UPDATE_EVENTS.obj_id = iso_nep_CONTROL.obj_id
      and iso_nep_DATA_UPDATE_EVENTS.lds > 1088554754000
    order by lds;

Two tables: iso_nep_data_update_events and iso_nep_control. Basically getting all columns from both tables, joining the tables on obj_id = obj_id. I have unique indexes on iso_nep_control.obj_id (clustered) and iso_nep_control.real_name, and non-unique indexes on iso_nep_data_update_events.lds and iso_nep_data_update_events.obj_id.

thanks,

Bill

--- Mark Kirkwood <markir@coretech.co.nz> wrote:
> It might be worth setting "log_statement = true" in postgresql.conf and
> checking that you are executing *exactly* the same statement in both
> JDBC and psql.
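Putting the logged statements together, the sequence the JDBC driver issues when setFetchSize(5000) is in effect is roughly the following (a sketch reconstructed from the log excerpts above; the cursor name is chosen by the driver, and "..." stands for the application's query):

```sql
BEGIN;                                   -- cursors only live inside a transaction
DECLARE JDBC_CURS_1 CURSOR FOR
    SELECT ...;                          -- the application's query, verbatim
FETCH FORWARD 5000 FROM JDBC_CURS_1;     -- issued again each time the app exhausts a batch
-- ... repeated until a FETCH returns fewer than 5000 rows ...
CLOSE JDBC_CURS_1;
COMMIT;
```

A properly behaving cursor executes the query once at DECLARE/first-FETCH time; subsequent FETCHes should only stream already-computed rows, which is why repeated 5-minute FETCH times are surprising.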
Cursors performance (was: Re: Terrible performance after deleting/recreating indexes)
From: Bill Chandler
Thanks to all who have responded. I now think my problem is not related to deleting/recreating indexes. Somehow it is related to JDBC cursors. It appears that what is happening is that since I'm using a fetch size of 5000, the command:

    FETCH FORWARD 5000 FROM JDBC_CURS_1

is being repeatedly sent to the server as I process the result set from my query. Each time this command is sent, it takes about 5 minutes to return, which is about the amount of time the whole query took to complete before the performance degradation. In other words, it looks as if the full select is being rerun on each fetch.

Now the mystery is why this is happening all of a sudden. I have been running with the fetch size set to 5000 for the last couple of weeks and it did not appear to be doing this (i.e. re-running the entire select statement again). Is this what I should expect when using cursors? I would have thought that the server should "remember" where it left off in the query since the last fetch and continue from there.

Could I have inadvertently changed a parameter somewhere that would cause this behavior?

thanks,

Bill
Bill,

What happens if you do this in psql? You can also turn on duration logging in the backend and log the queries.

dave

On Fri, 2004-07-09 at 16:24, Bill Chandler wrote:
> Thanks to all who have responded. I now think my problem is not related
> to deleting/recreating indexes. Somehow it is related to JDBC cursors.
> [...] Each time this command is sent, it takes about 5 minutes to
> return, which is about the amount of time the whole query took to
> complete before the performance degradation. In other words, it looks
> as if the full select is being rerun on each fetch.

--
Dave Cramer
519 939 0336
ICQ # 14675561
On Fri, 9 Jul 2004, Bill Chandler wrote:
> Thanks to all who have responded. I now think my problem is not related
> to deleting/recreating indexes. Somehow it is related to JDBC cursors.
> It appears that what is happening is that since I'm using a fetch size
> of 5000, the command:
>
> FETCH FORWARD 5000 FROM JDBC_CURS_1

If the top-level node of your execution plan is a sort step, it should take essentially no time to retrieve additional rows after the first fetch. The sort step materializes the results, so future fetches simply need to spit this data back to the client.

I would agree with Dave's suggestion to use log_duration and compare the values for the first and subsequent fetches.

Kris Jurka
Bill Chandler wrote:
> Now the mystery is why this is happening all of a sudden. I have been
> running with the fetch size set to 5000 for the last couple of weeks
> and it did not appear to be doing this (i.e. re-running the entire
> select statement again). Is this what I should expect when using
> cursors? I would have thought that the server should "remember" where
> it left off in the query since the last fetch and continue from there.

I'd check heap size, GC activity (-verbose:gc), CPU use, and swapping activity on the *client* side. It may be that your dataset size or physical memory or something similar has changed sufficiently that garbage collection resulting from the data in each 5k-row batch is killing you.

Can you try a trivial app that runs the same query (with the same fetchsize, autocommit, etc.) via JDBC and does nothing but step forward through the resultset, and see how fast it runs? Perhaps the problem is in your processing logic.

-O
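A minimal version of the trivial test app described above might look like the sketch below (the connection URL, credentials, and query string are placeholders; it assumes the PostgreSQL JDBC driver is on the classpath):

```java
import java.sql.*;

public class FetchTimer {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        // Cursor-based fetching requires autocommit off, since the
        // driver's cursor only lives inside a transaction.
        conn.setAutoCommit(false);
        Statement st = conn.createStatement();
        st.setFetchSize(5000);              // same batch size as the real app
        long t0 = System.currentTimeMillis();
        ResultSet rs = st.executeQuery("SELECT ...");   // the query under test
        int rows = 0;
        while (rs.next()) {
            rows++;                         // no per-row work: isolates fetch cost
            if (rows % 5000 == 0) {
                System.out.println(rows + " rows in " +
                        (System.currentTimeMillis() - t0) + " ms");
            }
        }
        rs.close();
        st.close();
        conn.commit();
        conn.close();
    }
}
```

If the per-5000-row timings printed here are flat while the real application's are not, the problem is in the application's per-row processing rather than in the server or driver.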
Might be worth doing a little test:

i) modify your code to fetch 1 row at a time
ii) set log_duration = true in your postgresql.conf (as the other posters have suggested)

Then compare with running the query in psql.

regards

Mark

Bill Chandler wrote:
> Thanks to all who have responded. I now think my problem is not related
> to deleting/recreating indexes. Somehow it is related to JDBC cursors.
> [...] Could I have inadvertently changed a parameter somewhere that
> would cause this behavior?