Re: Worse perfomance on 8.2.0 than on 7.4.14 - Mailing list pgsql-performance

From Rolf Østvik
Subject Re: Worse perfomance on 8.2.0 than on 7.4.14
Date
Msg-id 20070105182833.94277.qmail@web26312.mail.ukl.yahoo.com
Whole thread Raw
In response to Re: Worse perfomance on 8.2.0 than on 7.4.14  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Worse perfomance on 8.2.0 than on 7.4.14  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-performance
--- Tom Lane <tgl@sss.pgh.pa.us> skrev:

> The number-of-matching-rows estimate has gone up by a factor of 10,
> which undoubtedly has a lot to do with the much higher cost estimate.
> Do you have any idea why that is ... is the table really the same size
> in both servers?  If so, could we see the pg_stats row for
> step_result_subset.uut_result on both servers?

Table step_result_subset and uut_result_subset in both databases is created from same schema
definition file and filled with data from the same data source file.

==== Server 7.4.14: ====

logistics_74# select count(*) from step_result_subset;
  count
----------
 17179506
(1 row)

logistics_74# select count(distinct uut_result) from step_result_subset;
 count
--------
 176450
(1 row)

logistics_74# analyse verbose step_result_subset;
INFO:  analyzing "public.step_result_subset"
INFO:  "step_result_subset": 92863 pages, 3000 rows sampled, 17179655 estimated total rows
ANALYZE

logistics_74# select * from pg_stats where tablename = step_result_subset and
attname='uut_result';
 schemaname |     tablename      |  attname   | null_frac | avg_width | n_distinct |
         most_common_vals                        |
most_common_freqs                                               |
histogram_bounds                             | correlation

------------+--------------------+------------+-----------+-----------+------------+----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+-------------
 public     | step_result_subset | uut_result |         0 |         4 |      57503 |
{70335,145211,17229,20091,21827,33338,34370,42426,47274,54146} |
{0.001,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
| {213,30974,51300,68529,85053,100838,114971,128126,144230,161657,176691} |    0.951364
(1 row)


==== Server 8.2.0: ====

logistics_82# select count(*) from step_result_subset;
  count
----------
 17179506
(1 row)

logistics_82# select count(distinct uut_result) from step_result_subset;
 count
--------
 176450
(1 row)

logistics_82# analyse verbose step_result_subset;
INFO:  analyzing "public.step_result_subset"
INFO:  "step_result_subset": scanned 3000 of 92863 pages, containing 555000 live rows and 0 dead
rows; 3000 rows in sample, 17179655 estimated total rows
ANALYZE

logistics_# select * from pg_stats where tablename = step_result_subset and attname='uut_result';
 schemaname |     tablename      |  attname   | null_frac | avg_width | n_distinct |
         most_common_vals                         |
most_common_freqs                                                |
histogram_bounds                             | correlation

------------+--------------------+------------+-----------+-----------+------------+-----------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+-------------
 public     | step_result_subset | uut_result |         0 |         4 |       6516 |
{35010,111592,35790,41162,56844,57444,60709,73017,76295,106470} |
{0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
| {147,31791,54286,70928,85996,102668,117885,130947,144766,162098,176685} |    0.954647
(1 row)

Then on server 8.2.0 i need to set statistics to ~120 on step_result_subset.uut_result to get
n_distinct to be in same range as n_distinct on 7.4.14.

Even with a statistics value of 1000, the n_distinct value does only reach ~138 000. Is it correct
that _ideally_ the n_distinct value should be the same as "select count(distinct uut_result) from
step_result_subset"?

====
Even with better statistics on step_result_subset.uut_result neither of 7.4.14 or 8.2.0 manages to
pick the best plan when i want to select bigger datasets (in my examples that would be to set an
earlier date in the where clause for "ur.start_date_time > '2006-12-11'"). I will continue to
adjust other parameters and see what i can manage myself.

Best regards
Rolf Østvik


__________________________________________________
Bruker du Yahoo!?
Lei av spam?  Yahoo! Mail har den beste spambeskyttelsen
http://no.mail.yahoo.com

pgsql-performance by date:

Previous
From: Erik Jones
Date:
Subject: Re: More 8.2 client issues (Was: [Slow dump?)
Next
From: Reid Thompson
Date:
Subject: Re: PostgreSQL to host e-mail?