Re: Simple postgresql.conf wizard - Mailing list pgsql-hackers

From Mark Wong
Subject Re: Simple postgresql.conf wizard
Date
Msg-id 70c01d1d0812012053o709cfd23nfb494705fd300eaa@mail.gmail.com
Whole thread Raw
In response to Re: Simple postgresql.conf wizard  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Simple postgresql.conf wizard
Re: Simple postgresql.conf wizard
Re: default_stats_target WAS: Simple postgresql.conf wizard
List pgsql-hackers
On Thu, Nov 13, 2008 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> A lot of people have suggested raising our default_statistics target,
>> and it has been rejected because there's some O(n^2) behavior in the
>> planner, and it makes ANALYZE slower, but it's not that crazy.
>
> I think everyone agrees it ought to be raised.  Where the rubber meets
> the road is deciding just *what* to raise it to.  We've got no
> convincing evidence in favor of any particular value.
>
> If someone actually wanted to put some effort into this, I'd suggest
> taking some reasonably complex benchmark (maybe TPCH or one of the DBT
> series) and plotting planner runtime for each query as a function of
> statistics_target, taking care to mark the breakpoints where it shifted
> to a better (or worse?) plan due to having better stats.

Hi all,

I have some data from a mostly un-tuned system.  I have a 10GB scale
factor using DBT-3 on a single 25 disk hardware RAID-0 lun.  Generally
speaking, it seems like on un-tuned systems increasing the
default_statistics_target for this workload doesn't have a clear
benefit.  Here is a histogram of the Power Test and Throughput Test
with default GUC values.  The Power and Throughput test results are
plotted against the default_statistics_target value.  For reminders,
the Power Test executes each query serially, while the Throughput Test
executes 8 streams of the 22 queries simultaneously.

Default values:
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/default.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 1 - 11.

I started increasing the shared_buffers and effective_cache_size to
see if there might be anything more interesting in the results, but
still don't think so.

shared_buffers=2048MB, effective_cache_size=1024MB
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/2048x1024.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 12 - 22.

-----

shared_buffers=4096MB, effective_cache_size=2048MB
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/4096x2048.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 23 - 33.

-----

shared_buffers=20480MB
effective_cache_size=10240MB
checkpoint_segments=3000

http://207.173.203.223/~markwkm/pgsql/default_statistics_target/20480x10240.png

Raw data in http://207.173.203.223/~markwkm/community6/dbt3/,
directories 34 - 44.

So then I attempted to see if there might have been difference between
the executing time of each individual query with the above parameters.The queries that don't seem to be effected are
Q1,Q4, Q12, Q13, and
 
Q15.  Q17 suggests that anything higher than
default_statistics_target=10 is an improvement.  The rest of the
queries appears not to follow any particular trend with respect to
default_statistics_target.

Here's query 17 when default_statistics_target=10
EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from
lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#52'
and p_container = 'SM PACK' and l_quantity < ( select 0.2 *
avg(l_quantity) from lineitem where l_partkey = p_partkey );

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=15725486.01..15725486.02 rows=1 width=4) (actual
 
time=31669.249..31669.249 rows=1 loops=1)  ->  Hash Join  (cost=70662.08..15725435.96 rows=20019 width=4)
(actual time=690.551..31666.031 rows=5413 loops=1)        Hash Cond: (public.lineitem.l_partkey = part.p_partkey)
Join Filter: (public.lineitem.l_quantity < (subplan))        ->  Seq Scan on lineitem  (cost=0.00..1668441.22
 
rows=60052722 width=12) (actual time=0.007..11016.720 rows=60046144
loops=1)        ->  Hash  (cost=70637.08..70637.08 rows=2000 width=4) (actual
time=548.192..548.192 rows=1986 loops=1)              ->  Seq Scan on part  (cost=0.00..70637.08 rows=2000
width=4) (actual time=0.066..547.106 rows=1986 loops=1)                    Filter: ((p_brand = 'Brand#52'::bpchar) AND
(p_container = 'SM PACK'::bpchar))        SubPlan          ->  Aggregate  (cost=227.86..227.87 rows=1 width=4) (actual
time=0.141..0.141 rows=1 loops=59857)                ->  Index Scan using i_l_partkey on lineitem
(cost=0.00..227.72 rows=54 width=4) (actual time=0.071..0.132 rows=31
loops=59857)                      Index Cond: (l_partkey = $0)Total runtime: 31669.340 ms
(13 rows)


Here's query 17 when default_statistics_target=1000
EXPLAIN ANALYZE select sum(l_extendedprice) / 7.0 as avg_yearly from
lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#21'
and p_container = 'LG BAG' and l_quantity < ( select 0.2 *
avg(l_quantity) from lineitem where l_partkey = p_partkey );

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=8660521.85..8660521.87 rows=1 width=4) (actual
 
time=7035.069..7035.069 rows=1 loops=1)  ->  Nested Loop  (cost=0.00..8660472.24 rows=19845 width=4) (actual
time=0.183..7033.231 rows=5148 loops=1)        Join Filter: (public.lineitem.l_quantity < (subplan))        ->  Seq
Scanon part  (cost=0.00..70639.00 rows=1983 width=4)
 
(actual time=0.054..544.415 rows=1934 loops=1)              Filter: ((p_brand = 'Brand#21'::bpchar) AND
(p_container = 'LG BAG'::bpchar))        ->  Index Scan using i_l_partkey on lineitem
(cost=0.00..131.88 rows=31 width=12) (actual time=1.378..1.400 rows=30
loops=1934)              Index Cond: (public.lineitem.l_partkey = part.p_partkey)        SubPlan          ->  Aggregate
(cost=135.45..135.46 rows=1 width=4) (actual
 
time=0.064..0.064 rows=1 loops=58151)                ->  Index Scan using i_l_partkey on lineitem
(cost=0.00..135.37 rows=31 width=4) (actual time=0.005..0.056 rows=31
loops=58151)                      Index Cond: (l_partkey = $0)Total runtime: 7035.149 ms
(12 rows)


Histograms of each individual query:

Q1
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q1.png

Q2
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png

Q3
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q3.png

Q4
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q4.png

Q5
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q5.png

Q6
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q6.png

Q7
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q7.png

Q8
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q8.png

Q9
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png

Q10
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q10.png

Q11
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q11.png

Q12
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q12.png

Q13
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q13.png

Q14
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q14.png

Q15
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q15.png

Q16
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q16.png

Q17
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png

Q18
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png

Q19
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q19.png

Q20
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png

Q21
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q21.png

Q22
http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q22.png


I dumped a lot of raw data and gave little analysis, I hope someone
can make more sense of it... :)

Regards,
Mark


pgsql-hackers by date:

Previous
From: "Brendan Jurd"
Date:
Subject: Re: New to_timestamp implementation is pretty strict
Next
From: Greg Smith
Date:
Subject: Re: Simple postgresql.conf wizard