Re: Improve performance of query - Mailing list pgsql-performance

From: John A Meinel
Subject: Re: Improve performance of query
Msg-id: 41C1BED9.2050206@arbash-meinel.com
In response to: Improve performance of query (Richard Rowell <richard@bowmansystems.com>)
List: pgsql-performance
The first thing to check: did you do a recent VACUUM ANALYZE? That
refreshes all of the planner statistics. There are a number of places
in the plan that say "rows=1000", which is usually the planner's
default "I have no idea, let me guess 1000" estimate.
Also, there are a number of places where the estimates are pretty far
off. For instance:

Richard Rowell wrote:

>->  Subquery Scan "*SELECT* 1"  (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161 loops=1)
>
>
Estimating 10,540 rows when only 161 actually come back is quite a difference.

>                                 ->  Seq Scan on da_answer a  (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 rows=161 loops=1)
>                                       Filter: ((date_effective <= 9999999999::double precision) AND (inactive <> 1) AND (subplan))
>
>
This could also be a lack of cross-column statistics: if two columns
are correlated, the planner isn't as accurate as it could be. Also,
date_effective <= 9999999999 doesn't seem very restrictive; could you
use a BETWEEN instead (date_effective BETWEEN 0 AND 9999999999)? At
least for timestamps, a two-sided BETWEEN usually estimates better
than a single-sided comparison.
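
Only a sketch of the WHERE clause shape, since I don't have the full
query in front of me (count(*) stands in for your real select list,
and it assumes date_effective is never negative):

    -- One-sided comparison: the planner has to guess the selectivity
    SELECT count(*)
      FROM da_answer
     WHERE date_effective <= 9999999999
       AND inactive <> 1;

    -- Two-sided version of the same filter
    SELECT count(*)
      FROM da_answer
     WHERE date_effective BETWEEN 0 AND 9999999999
       AND inactive <> 1;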

This one was underestimated.

>->  Subquery Scan "*SELECT* 2"  (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556 loops=1)
>                                 ->  Hash Join  (cost=988627.58..989147.53 rows=2799 width=24) (actual time=290.722..395.739 rows=7556 loops=1)
>                                       Hash Cond: ("outer".main_id = "inner".uid)
>
>
This is one of the ones where it looks like the planner had no idea,
probably because of the function. You might consider adding a
functional index, though I think there are some caveats there.
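
Roughly like this (the names are made up for illustration; the main
caveat I remember is that the indexed function must be marked
IMMUTABLE, i.e. the same inputs always produce the same output):

    -- Hypothetical expression index; Postgres only accepts it if
    -- some_immutable_fn is declared IMMUTABLE
    CREATE INDEX idx_da_answer_fn
        ON da_answer (some_immutable_fn(some_column));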

>->  Function Scan on svp_getparentproviderids  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1 loops=1)
>
>
Another very poor estimate. It might be worth increasing the
statistics target for this column (ALTER TABLE ... ALTER COLUMN ...
SET STATISTICS). IIRC, compared with other databases, Postgres
defaults to a much lower statistics target; try changing it from the
default (10, I believe) to 100 or so. There was a discussion that
every column with an index should use a higher target.
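
The general form, with a placeholder column name (point it at
whichever column is being misestimated), plus a fresh ANALYZE so the
larger sample actually gets collected:

    -- Raise the per-column statistics target from the default of 10
    ALTER TABLE da_data_restrict ALTER COLUMN some_column SET STATISTICS 100;

    -- Re-gather stats so the planner sees the bigger sample
    ANALYZE da_data_restrict;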

>->  Index Scan using in_da_dr_type_provider on da_data_restrict  (cost=0.00..145.50 rows=46 width=8) (actual time=0.041..26.627 rows=7280 loops=1)
>
>
I'm not a great optimizer; these are just some first things to look
at. Your sort_mem seems pretty low to me, considering you have 1GB of
RAM. Perhaps you could bump it up to 40MB instead of 4MB (example
below). Also, if you run this query twice in a row, is it still slow?
(Sometimes it takes a bit of work to get the right indexes loaded
into RAM, but then it is faster.)
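
For a quick test you can bump it per session (sort_mem is measured in
kilobytes; if you are on 8.0 the same knob goes by work_mem):

    -- Roughly 40MB of sort memory, for this session only
    SET sort_mem = 40960;

    -- Or make it the default in postgresql.conf:
    --   sort_mem = 40960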

Just some guesses,
John
=:->


