query slows down with more accurate stats - Mailing list pgsql-performance
From: Robert Treat
Subject: query slows down with more accurate stats
Msg-id: 1081879359.25537.526.camel@camel
List: pgsql-performance
In the process of optimizing some queries, I have found that the following query seems to degrade in performance the more accurate I make the statistics on the table, whether by increasing the statistics target (ALTER TABLE ... SET STATISTICS) or by running VACUUM:

SELECT count(cl.caller_id), npanxx.city, npanxx.state
FROM cl
LEFT OUTER JOIN npanxx
    ON substr(cl.caller_id, 1, 3) = npanxx.npa
   AND substr(cl.caller_id, 4, 3) = npanxx.nxx
LEFT OUTER JOIN cp
    ON cl.caller_id = cp.caller_id
WHERE cl.ivr_system_id = 130
  AND cl.call_time > '2004-03-01 06:00:00.0 CST'
  AND cl.call_time < '2004-04-01 06:00:00.0 CST'
  AND cp.age >= 18
  AND cp.age <= 24
  AND cp.gender = 'm'
GROUP BY npanxx.city, npanxx.state

live=# analyze cl;
ANALYZE
live=# select reltuples from pg_class where relname = 'cl';
 reltuples
-----------
     53580
(1 row)

live=# select count(*) from cl;
  count
---------
 1140166
(1 row)

The plan I get under these conditions is actually pretty good:

HashAggregate  (cost=28367.22..28367.66 rows=174 width=32) (actual time=1722.060..1722.176 rows=29 loops=1)
  ->  Nested Loop  (cost=0.00..28365.92 rows=174 width=32) (actual time=518.592..1716.254 rows=558 loops=1)
        ->  Nested Loop Left Join  (cost=0.00..20837.33 rows=1248 width=32) (actual time=509.991..1286.755 rows=13739 loops=1)
              ->  Index Scan using cl_ivr_system_id on cl  (cost=0.00..13301.15 rows=1248 width=14) (actual time=509.644..767.421 rows=13739 loops=1)
                    Index Cond: (ivr_system_id = 130)
                    Filter: ((call_time > '2004-03-01 07:00:00-05'::timestamp with time zone) AND (call_time < '2004-04-01 07:00:00-05'::timestamp with time zone))
              ->  Index Scan using npanxx_pkey on npanxx  (cost=0.00..6.02 rows=1 width=32) (actual time=0.025..0.027 rows=1 loops=13739)
                    Index Cond: ((substr(("outer".caller_id)::text, 1, 3) = (npanxx.npa)::text) AND (substr(("outer".caller_id)::text, 4, 3) = (npanxx.nxx)::text))
        ->  Index Scan using cp_pkey on cp  (cost=0.00..6.02 rows=1 width=14) (actual time=0.027..0.027 rows=0 loops=13739)
              Index Cond: (("outer".caller_id)::text = (cp.caller_id)::text)
              Filter: ((age >= 18) AND (age <= 24) AND (gender = 'm'::bpchar))
Total runtime: 1722.489 ms
(12 rows)

But when I do:

live=# vacuum cl;
VACUUM
live=# select reltuples from pg_class where relname = 'cl';
  reltuples
-------------
 1.14017e+06
(1 row)

(or alternatively increase the stats target on the table), I now get the following plan:

HashAggregate  (cost=80478.74..80482.41 rows=1471 width=32) (actual time=8132.261..8132.422 rows=29 loops=1)
  ->  Merge Join  (cost=79951.95..80467.70 rows=1471 width=32) (actual time=7794.338..8130.041 rows=558 loops=1)
        Merge Cond: ("outer"."?column4?" = "inner"."?column2?")
        ->  Sort  (cost=55719.06..55745.42 rows=10546 width=32) (actual time=4031.827..4052.526 rows=13739 loops=1)
              Sort Key: (cl.caller_id)::text
              ->  Merge Right Join  (cost=45458.30..55014.35 rows=10546 width=32) (actual time=2944.441..3796.787 rows=13739 loops=1)
                    Merge Cond: ((("outer".npa)::text = "inner"."?column2?") AND (("outer".nxx)::text = "inner"."?column3?"))
                    ->  Index Scan using npanxx_pkey on npanxx  (cost=0.00..8032.99 rows=132866 width=32) (actual time=0.200..461.767 rows=130262 loops=1)
                    ->  Sort  (cost=45458.30..45484.67 rows=10546 width=14) (actual time=2942.994..2967.935 rows=13739 loops=1)
                          Sort Key: substr((cl.caller_id)::text, 1, 3), substr((cl.caller_id)::text, 4, 3)
                          ->  Seq Scan on cl  (cost=0.00..44753.60 rows=10546 width=14) (actual time=1162.423..2619.662 rows=13739 loops=1)
                                Filter: ((ivr_system_id = 130) AND (call_time > '2004-03-01 07:00:00-05'::timestamp with time zone) AND (call_time < '2004-04-01 07:00:00-05'::timestamp with time zone))
        ->  Sort  (cost=24232.89..24457.06 rows=89667 width=14) (actual time=3761.703..3900.340 rows=98010 loops=1)
              Sort Key: (cp.caller_id)::text
              ->  Seq Scan on cp  (cost=0.00..15979.91 rows=89667 width=14) (actual time=0.128..1772.215 rows=100302 loops=1)
                    Filter: ((age >= 18) AND (age <= 24) AND (gender = 'm'::bpchar))
Total runtime: 8138.607 ms
(17 rows)

So I guess I am wondering if there is something I should be doing to help get the better plan at the more accurate stats levels, and/or why the planner doesn't stick with the original plan (I noticed that disabling merge joins does seem to push it back to the original plan). Alternatively, if anyone has any general suggestions on speeding up the query, I'd be open to that too :-)

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
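[Editor's note: the following sketch is not part of the original post. It illustrates, in hedged form, the two knobs the post mentions: raising the per-column statistics target, and toggling merge joins to compare plans. Column names are taken from the query above; the target value of 100 and the use of session-local SET are illustrative assumptions.]

```sql
-- Hedged sketch: raise the statistics target on the columns the planner
-- is misestimating (caller_id feeds the substr() join keys; ivr_system_id
-- and call_time drive the selective filter). The value 100 is illustrative.
ALTER TABLE cl ALTER COLUMN caller_id SET STATISTICS 100;
ALTER TABLE cl ALTER COLUMN ivr_system_id SET STATISTICS 100;
ANALYZE cl;

-- Compare plans with merge joins disabled for this session only,
-- as the post notes this pushes the planner back to the faster plan.
SET enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT count(cl.caller_id), npanxx.city, npanxx.state
FROM cl
LEFT OUTER JOIN npanxx
    ON substr(cl.caller_id, 1, 3) = npanxx.npa
   AND substr(cl.caller_id, 4, 3) = npanxx.nxx
LEFT OUTER JOIN cp ON cl.caller_id = cp.caller_id
WHERE cl.ivr_system_id = 130
  AND cl.call_time > '2004-03-01 06:00:00.0 CST'
  AND cl.call_time < '2004-04-01 06:00:00.0 CST'
  AND cp.age >= 18 AND cp.age <= 24 AND cp.gender = 'm'
GROUP BY npanxx.city, npanxx.state;
RESET enable_mergejoin;
```

Note that `SET enable_mergejoin = off` is a blunt session-wide instrument, useful for diagnosis rather than as a production fix.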
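[Editor's note: also not part of the original post. Since the join to npanxx compares `substr()` expressions, one general technique the poster could try (a sketch, assuming a PostgreSQL version with expression-index support, i.e. 7.4 or later) is an expression index matching the join keys; the index name here is made up.]

```sql
-- Hedged sketch: an expression index whose columns match the join
-- condition against npanxx, so the planner can use an index scan or
-- a cheaper sort on the precomputed substrings instead of sorting
-- substr() results at plan time. Index name is an assumption.
CREATE INDEX cl_caller_npa_nxx_idx
    ON cl (substr(caller_id, 1, 3), substr(caller_id, 4, 3));
ANALYZE cl;
```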