performace problem after VACUUM ANALYZE - Mailing list pgsql-performance
From | Scott Cain |
---|---|
Subject | performace problem after VACUUM ANALYZE |
Date | |
Msg-id | 1045241040.1486.600.camel@localhost.localdomain Whole thread Raw |
Responses |
Re: performace problem after VACUUM ANALYZE
Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE Re: performace problem after VACUUM ANALYZE |
List | pgsql-performance |
Hello, I am going to do my best to describe this problem, but the description may be quite long. Also, this is my first post to this list, so I miss important details please let me know. All of the following analysis was done on my P4 laptop with 0.5 Gig ram and postgresql 7.3 installed from RPM for RedHat linux 7.3 I have a database with two large tables that I need to do a join on. Here is the query that is causing the problems: select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id from feature f, featureloc fl where fl.srcfeature_id = 1 and ((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and f.feature_id = fl.feature_id The feature table has 3,685,287 rows and featureloc has 3,803,762 rows. Here are all of the relevant indexes on these tables: Index "feature_pkey" Column | Type ------------+--------- feature_id | integer unique btree (primary key) Index "featureloc_idx1" Column | Type ------------+--------- feature_id | integer btree Index "featureloc_idx2" Column | Type ---------------+--------- srcfeature_id | integer btree Index "featureloc_src_strand_beg_end" Column | Type ---------------+---------- srcfeature_id | integer strand | smallint nbeg | integer nend | integer btree In a naive database (ie, no ANALYZE has ever been run), the above query runs in about 3 seconds after the first time it has been run (due to caching?). Here is the output of EXPLAIN on that query: Unique (cost=75513.46..75513.48 rows=1 width=167) -> Sort (cost=75513.46..75513.46 rows=1 width=167) -> Nested Loop (cost=0.00..75513.45 rows=1 width=167) -> Index Scan using featureloc_idx2 on featureloc fl (cost=0.00..75508.43 rows=1 width=14) -> Index Scan using feature_pkey on feature f (cost=0.00..5.01 rows=1 width=153) Notice that for featureloc it is using featureloc_idx2, which is the index on srcfeature_id. Ideally, it would be using featureloc_src_strand_beg_end, but this is not bad. In fact, if I drop featureloc_idx2 (not something I can do in a production database), it does use that index and cuts the query runtime in half. Now comes the really bizarre part: if I run VACUUM ANALYZE, the performance becomes truly awful. Instead of using an index for featureloc, it now does a seq scan, causing the runtime to go up to about 30 seconds. Here is the output of explain now: Unique (cost=344377.70..344759.85 rows=2548 width=47) -> Sort (cost=344377.70..344377.70 rows=25477 width=47) -> Nested Loop (cost=0.00..342053.97 rows=25477 width=47) -> Seq Scan on featureloc fl (cost=0.00..261709.31 rows=25477 width=14) -> Index Scan using feature_pkey on feature f (cost=0.00..3.14 rows=1 width=33) If I try to force it to use an index by setting enable_seqscan=0, it then uses featureloc_idx1, the index on feature_id. This has slightly worse performance than the seq scan, at just over 30 seconds. Here is the output of explain for this case: Unique (cost=356513.46..356895.61 rows=2548 width=47) -> Sort (cost=356513.46..356513.46 rows=25477 width=47) -> Nested Loop (cost=0.00..354189.73 rows=25477 width=47) -> Index Scan using featureloc_idx1 on featureloc fl (cost=0.00..273845.08 rows=25477 width=14) -> Index Scan using feature_pkey on feature f (cost=0.00..3.14 rows=1 width=33) Now, if I drop featureloc_idx1 (again, not something I can do in production) and still disallow seq scans, it uses featureloc_idx2, as it did with the naive database above, but the performance is much worse, running about 24 seconds for the query. Here is the output of explain: Unique (cost=1310195.21..1310577.36 rows=2548 width=47) -> Sort (cost=1310195.21..1310195.21 rows=25477 width=47) -> Nested Loop (cost=0.00..1307871.48 rows=25477 width=47) -> Index Scan using featureloc_idx2 on featureloc fl (cost=0.00..1227526.82 rows=25477 width=14) -> Index Scan using feature_pkey on feature f (cost=0.00..3.14 rows=1 width=33) Finally, if I drop featureloc_idx2, it uses the right index, and the runtime gets back to about 2 seconds, but the database is unusable for other queries because I dropped the other indexes and disallowed seq scans. Here is the output for explain in this case: Unique (cost=1414516.98..1414899.13 rows=2548 width=47) -> Sort (cost=1414516.98..1414516.98 rows=25477 width=47) -> Nested Loop (cost=0.00..1412193.25 rows=25477 width=47) -> Index Scan using featureloc_src_strand_beg_end on featureloc fl (cost=0.00..1331848.60 rows=25477 width=14) -> Index Scan using feature_pkey on feature f (cost=0.00..3.14 rows=1 width=33) Now the question: can anybody tell me why the query planner does such a bad job figuring out how to run this query after I run VACUUM ANALYZE, and can you give me any advice on making this arrangement work without forcing postgres' hand by limiting its options until it gets it right? Thank you very much for reading down to here, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
pgsql-performance by date: