Re: Simple postgresql.conf wizard - Mailing list pgsql-hackers
From | Mark Wong |
---|---|
Subject | Re: Simple postgresql.conf wizard |
Date | |
Msg-id | 70c01d1d0812012053o709cfd23nfb494705fd300eaa@mail.gmail.com Whole thread Raw |
In response to | Re: Simple postgresql.conf wizard (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Simple postgresql.conf wizard
Re: Simple postgresql.conf wizard Re: default_stats_target WAS: Simple postgresql.conf wizard |
List | pgsql-hackers |
On Thu, Nov 13, 2008 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> A lot of people have suggested raising our default_statistics target, >> and it has been rejected because there's some O(n^2) behavior in the >> planner, and it makes ANALYZE slower, but it's not that crazy. > > I think everyone agrees it ought to be raised. Where the rubber meets > the road is deciding just *what* to raise it to. We've got no > convincing evidence in favor of any particular value. > > If someone actually wanted to put some effort into this, I'd suggest > taking some reasonably complex benchmark (maybe TPCH or one of the DBT > series) and plotting planner runtime for each query as a function of > statistics_target, taking care to mark the breakpoints where it shifted > to a better (or worse?) plan due to having better stats. Hi all, I have some data from a mostly un-tuned system. I have a 10GB scale factor using DBT-3 on a single 25 disk hardware RAID-0 lun. Generally speaking, it seems like on un-tuned systems increasing the default_statistics_target for this workload doesn't have a clear benefit. Here is a histogram of the Power Test and Throughput Test with default GUC values. The Power and Throughput test results are plotted against the default_statistics_target value. For reminders, the Power Test executes each query serially, while the Throughput Test executes 8 streams of the 22 queries simultaneously. Default values: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/default.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 1 - 11. I started increasing the shared_buffers and effective_cache_size to see if there might be anything more interesting in the results, but still don't think so. shared_buffers=2048MB, effective_cache_size=1024MB http://207.173.203.223/~markwkm/pgsql/default_statistics_target/2048x1024.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 12 - 22. ----- shared_buffers=4096MB, effective_cache_size=2048MB http://207.173.203.223/~markwkm/pgsql/default_statistics_target/4096x2048.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 23 - 33. ----- shared_buffers=20480MB effective_cache_size=10240MB checkpoint_segments=3000 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/20480x10240.png Raw data in http://207.173.203.223/~markwkm/community6/dbt3/, directories 34 - 44. So then I attempted to see if there might have been difference between the executing time of each individual query with the above parameters.The queries that don't seem to be effected are Q1,Q4, Q12, Q13, and Q15. Q17 suggests that anything higher than default_statistics_target=10 is an improvement. The rest of the queries appears not to follow any particular trend with respect to default_statistics_target. Here's query 17 when default_statistics_target=10 EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#52' and p_container = 'SM PACK' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=15725486.01..15725486.02 rows=1 width=4) (actual time=31669.249..31669.249 rows=1 loops=1) -> Hash Join (cost=70662.08..15725435.96 rows=20019 width=4) (actual time=690.551..31666.031 rows=5413 loops=1) Hash Cond: (public.lineitem.l_partkey = part.p_partkey) Join Filter: (public.lineitem.l_quantity < (subplan)) -> Seq Scan on lineitem (cost=0.00..1668441.22 rows=60052722 width=12) (actual time=0.007..11016.720 rows=60046144 loops=1) -> Hash (cost=70637.08..70637.08 rows=2000 width=4) (actual time=548.192..548.192 rows=1986 loops=1) -> Seq Scan on part (cost=0.00..70637.08 rows=2000 width=4) (actual time=0.066..547.106 rows=1986 loops=1) Filter: ((p_brand = 'Brand#52'::bpchar) AND (p_container = 'SM PACK'::bpchar)) SubPlan -> Aggregate (cost=227.86..227.87 rows=1 width=4) (actual time=0.141..0.141 rows=1 loops=59857) -> Index Scan using i_l_partkey on lineitem (cost=0.00..227.72 rows=54 width=4) (actual time=0.071..0.132 rows=31 loops=59857) Index Cond: (l_partkey = $0)Total runtime: 31669.340 ms (13 rows) Here's query 17 when default_statistics_target=1000 EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#21' and p_container = 'LG BAG' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=8660521.85..8660521.87 rows=1 width=4) (actual time=7035.069..7035.069 rows=1 loops=1) -> Nested Loop (cost=0.00..8660472.24 rows=19845 width=4) (actual time=0.183..7033.231 rows=5148 loops=1) Join Filter: (public.lineitem.l_quantity < (subplan)) -> Seq Scanon part (cost=0.00..70639.00 rows=1983 width=4) (actual time=0.054..544.415 rows=1934 loops=1) Filter: ((p_brand = 'Brand#21'::bpchar) AND (p_container = 'LG BAG'::bpchar)) -> Index Scan using i_l_partkey on lineitem (cost=0.00..131.88 rows=31 width=12) (actual time=1.378..1.400 rows=30 loops=1934) Index Cond: (public.lineitem.l_partkey = part.p_partkey) SubPlan -> Aggregate (cost=135.45..135.46 rows=1 width=4) (actual time=0.064..0.064 rows=1 loops=58151) -> Index Scan using i_l_partkey on lineitem (cost=0.00..135.37 rows=31 width=4) (actual time=0.005..0.056 rows=31 loops=58151) Index Cond: (l_partkey = $0)Total runtime: 7035.149 ms (12 rows) Histograms of each individual query: Q1 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q1.png Q2 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png Q3 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q3.png Q4 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q4.png Q5 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q5.png Q6 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q6.png Q7 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q7.png Q8 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q8.png Q9 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png Q10 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q10.png Q11 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q11.png Q12 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q12.png Q13 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q13.png Q14 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q14.png Q15 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q15.png Q16 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q16.png Q17 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png Q18 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png Q19 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q19.png Q20 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png Q21 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q21.png Q22 http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q22.png I dumped a lot of raw data and gave little analysis, I hope someone can make more sense of it... :) Regards, Mark
pgsql-hackers by date: