performance problem after VACUUM ANALYZE - Mailing list pgsql-performance

From Scott Cain
Subject performance problem after VACUUM ANALYZE
Msg-id 1045241040.1486.600.camel@localhost.localdomain
List pgsql-performance
Hello,

I am going to do my best to describe this problem, but the description
may be quite long.  Also, this is my first post to this list, so if I
miss important details please let me know.  All of the following analysis
was done on my P4 laptop with 0.5 GB of RAM and PostgreSQL 7.3 installed
from the RPM for Red Hat Linux 7.3.

I have a database with two large tables that I need to do a join on.
Here is the query that is causing the problems:

select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
    from feature f, featureloc fl
    where
      fl.srcfeature_id = 1 and
      ((fl.strand=1  and fl.nbeg <= 393164 and fl.nend >= 390956) OR
       (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
      f.feature_id  = fl.feature_id

The feature table has 3,685,287 rows and featureloc has 3,803,762 rows.
Here are all of the relevant indexes on these tables:

 Index "feature_pkey"
   Column   |  Type
------------+---------
 feature_id | integer
unique btree (primary key)

Index "featureloc_idx1"
   Column   |  Type
------------+---------
 feature_id | integer
btree

 Index "featureloc_idx2"
    Column     |  Type
---------------+---------
 srcfeature_id | integer
btree

Index "featureloc_src_strand_beg_end"
    Column     |   Type
---------------+----------
 srcfeature_id | integer
 strand        | smallint
 nbeg          | integer
 nend          | integer
btree
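
(For anyone who wants to reproduce this layout, the listings above should
correspond roughly to DDL like the following -- a sketch only, since the
full table definitions are of course much larger:

alter table feature add primary key (feature_id);           -- feature_pkey
create index featureloc_idx1 on featureloc (feature_id);
create index featureloc_idx2 on featureloc (srcfeature_id);
create index featureloc_src_strand_beg_end
    on featureloc (srcfeature_id, strand, nbeg, nend);
)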

In a naive database (i.e., one on which ANALYZE has never been run), the
above query runs in about 3 seconds once it has been run a first time
(presumably because the relevant pages are cached).  Here is the output
of EXPLAIN on that query:

Unique  (cost=75513.46..75513.48 rows=1 width=167)
  ->  Sort  (cost=75513.46..75513.46 rows=1 width=167)
        ->  Nested Loop  (cost=0.00..75513.45 rows=1 width=167)
              ->  Index Scan using featureloc_idx2 on featureloc fl  (cost=0.00..75508.43 rows=1 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..5.01 rows=1 width=153)

Notice that for featureloc it is using featureloc_idx2, which is the
index on srcfeature_id.  Ideally, it would be using
featureloc_src_strand_beg_end, but this is not bad.  In fact, if I drop
featureloc_idx2 (not something I can do in a production database), it
does use that index and cuts the query runtime in half.
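
As an aside, this kind of index-dropping experiment can be run without
permanently losing the index, since DDL is transactional in postgres.
A sketch (keeping in mind that DROP INDEX locks the table until the
transaction ends):

begin;
drop index featureloc_idx2;
explain analyze
  select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
    from feature f, featureloc fl
    where
      fl.srcfeature_id = 1 and
      ((fl.strand=1  and fl.nbeg <= 393164 and fl.nend >= 390956) OR
       (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
      f.feature_id  = fl.feature_id;
rollback;   -- the index comes back as though it was never dropped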

Now comes the really bizarre part: if I run VACUUM ANALYZE, the
performance becomes truly awful. Instead of using an index for
featureloc, it now does a seq scan, causing the runtime to go up to
about 30 seconds.  Here is the output of EXPLAIN now:

Unique  (cost=344377.70..344759.85 rows=2548 width=47)
  ->  Sort  (cost=344377.70..344377.70 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..342053.97 rows=25477 width=47)
              ->  Seq Scan on featureloc fl  (cost=0.00..261709.31 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)
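
Since the row estimate for featureloc jumped from 1 to 25477, it looks
like the statistics gathered by ANALYZE are what changed the plan.  For
reference, what the planner now believes about those columns can be
inspected with something like this (a sketch using the standard pg_stats
view):

select attname, null_frac, n_distinct, most_common_vals, correlation
  from pg_stats
 where tablename = 'featureloc'
   and attname in ('srcfeature_id', 'strand', 'nbeg', 'nend');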

If I try to force it to use an index by setting enable_seqscan=0, it
then uses featureloc_idx1, the index on feature_id.  This has slightly
worse performance than the seq scan, at just over 30 seconds.  Here is
the output of EXPLAIN for this case:

Unique  (cost=356513.46..356895.61 rows=2548 width=47)
  ->  Sort  (cost=356513.46..356513.46 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..354189.73 rows=25477 width=47)
              ->  Index Scan using featureloc_idx1 on featureloc fl  (cost=0.00..273845.08 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)
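
For completeness, disabling seq scans like this is just a session-level
SET (it can also be changed in postgresql.conf):

set enable_seqscan = off;   -- 'false' and 0 also work
-- ... run EXPLAIN or the query ...
set enable_seqscan = on;    -- restore the default afterwards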

Now, if I drop featureloc_idx1 (again, not something I can do in
production) and still disallow seq scans, it uses featureloc_idx2, as it
did with the naive database above, but the performance is still much
worse than in the naive case, with the query running about 24 seconds.
Here is the output of EXPLAIN:

Unique  (cost=1310195.21..1310577.36 rows=2548 width=47)
  ->  Sort  (cost=1310195.21..1310195.21 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..1307871.48 rows=25477 width=47)
              ->  Index Scan using featureloc_idx2 on featureloc fl  (cost=0.00..1227526.82 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)

Finally, if I drop featureloc_idx2 as well, it uses the right index, and
the runtime gets back to about 2 seconds, but the database is unusable
for other queries because I have dropped the other indexes and disallowed
seq scans.  Here is the output of EXPLAIN in this case:

Unique  (cost=1414516.98..1414899.13 rows=2548 width=47)
  ->  Sort  (cost=1414516.98..1414516.98 rows=25477 width=47)
        ->  Nested Loop  (cost=0.00..1412193.25 rows=25477 width=47)
              ->  Index Scan using featureloc_src_strand_beg_end on featureloc fl  (cost=0.00..1331848.60 rows=25477 width=14)
              ->  Index Scan using feature_pkey on feature f  (cost=0.00..3.14 rows=1 width=33)

Now the question: can anybody tell me why the query planner does such a
bad job figuring out how to run this query after I run VACUUM ANALYZE,
and can you give me any advice on making this arrangement work without
forcing postgres' hand by limiting its options until it gets it right?
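
For what it's worth, the knobs that seem most relevant here -- just a
sketch of things to experiment with, not anything I have verified -- are
the per-column statistics targets and the planner cost settings:

-- gather finer-grained statistics on the filter columns, then re-analyze
-- (100 is an arbitrary example; the default target is 10)
alter table featureloc alter column srcfeature_id set statistics 100;
alter table featureloc alter column strand set statistics 100;
analyze featureloc;

-- make index scans look cheaper relative to seq scans
-- (shown per-session; postgresql.conf for a permanent change)
set random_page_cost = 2;           -- default is 4
set effective_cache_size = 32768;   -- in 8k pages; 32768 = ~256MB, a guess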

Thank you very much for reading down to here,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

