Re: [GENERAL] how to get accurate values in pg_statistic - Mailing list pgsql-performance
From | Bruce Momjian |
---|---|
Subject | Re: [GENERAL] how to get accurate values in pg_statistic |
Date | |
Msg-id | 200309071622.h87GMt909846@candle.pha.pa.us |
In response to | Re: [GENERAL] how to get accurate values in pg_statistic (Mary Edie Meredith <maryedie@osdl.org>) |
List | pgsql-performance |
I have learned you can use:

	SET seed = 0;

to force identical statistics every time you run ANALYZE.

---------------------------------------------------------------------------

Mary Edie Meredith wrote:
> I certainly don't claim that it is appropriate to force customers into a
> full analysis, particularly if random sampling versus a full scan of the
> data reveals little to no performance difference in the plans. Being
> able to sample accurately is _very_ nice for large tables.
>
> For our testing purposes, however, consistent results are extremely
> important. We have observed that a small difference in the plan for one
> of the 22 queries can cause a difference in the DBT-3 results. If this
> happens, a small performance difference between runs on two Linux
> kernels may appear to be due to the kernels, when in fact it is due to
> the plan change.
>
> We know that the plans are _exactly_ the same if the data in the
> pg_statistic table is the same from run to run (all other things being
> equal). So what we need is identical optimizer statistics
> (pg_statistic) for the same table data on each run.
>
> I feel certain that the pg_statistic table will be identical from run
> to run if ANALYZE looks at every row, hence our hope to find a way to
> get that.
>
> We did runs overnight. We can confirm that VACUUM FULL ANALYZE does
> not produce the same pg_statistic contents run to run. With the
> default (10) default_statistics_target the plans are also different.
>
> We ran additional tests with default_statistics_target set to 1000
> (the maximum, I believe). The plans are the same over the different
> runs, but the pg_statistic table has different cost values. The
> performance results of the runs are consistent (we would expect this
> with the same plans). The resulting performance metrics are similar to
> the best plans we see using the default histogram size (good news).
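The seeding trick above can be sketched as a session. This is a hypothetical example, assuming a PostgreSQL release of this era (7.x), where ANALYZE's row sampling draws from the backend's random() generator, so fixing the seed makes the sample, and hence pg_statistic, repeatable; `lineitem` stands in for any DBT-3 table:

```
-- Seed the backend's random() generator so ANALYZE draws the same
-- sample of rows on every run (SELECT setseed(0) is equivalent).
SET seed = 0;

-- With the sample fixed, the computed statistics should be
-- identical from run to run.
ANALYZE lineitem;

-- Compare the stored statistics across runs via the pg_stats view.
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'lineitem';
```

Note this fixes the sample, not the sampling: the statistics are repeatable but still reflect a random subset of the table, not a full scan.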
>
> However, we worry that one day the cost will change enough, for
> whatever reason, to cause a plan change, especially for a larger
> database scale factor (database size/row size).
>
> I know we appear to be an isolated case, but customers also do testing
> and may have the same consistency issues we have. I can also imagine
> cases where customers want to guarantee that plans stay the same
> (between replicated sites, for example). If two developers are
> analyzing changes to the optimizer, don't you want the costs used for
> testing on their two systems to be identical for comparison purposes?
>
> Anyway, IMHO an option for an ANALYZE FULL ("sampling" all rows) would
> be valuable. Any other ideas for how to force this without a code
> change are very welcome.
>
> Thanks for your info!
>
>
> On Thu, 2003-09-04 at 16:16, Tom Lane wrote:
> > Mary Edie Meredith <maryedie@osdl.org> writes:
> > > Stephan Szabo kindly responded to our earlier queries, suggesting
> > > we look at default_statistics_target and ALTER TABLE ALTER COLUMN
> > > SET STATISTICS.
> >
> > > These determine the number of bins in the histogram for a given
> > > column. But for a large number of rows (for example 6 million) the
> > > maximum value (1000) does not guarantee that ANALYZE will do a
> > > full scan of the table. We do not see a way to guarantee the same
> > > statistics run to run without forcing ANALYZE to examine every row
> > > of every table.
> >
> > Do you actually still have a problem with the plans changing when the
> > stats target is above 100 or so? I think the notion of "force ANALYZE
> > to do a full scan" is inherently wrongheaded ... it certainly would not
> > produce numbers that have anything to do with ordinary practice.
> >
> > If you have data statistics that are so bizarre that the planner still
> > gets things wrong with a target of 1000, then I'd like to know more
> > about why.
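The two knobs Stephan Szabo pointed to can be set for the session or pinned on a single column; a minimal sketch (the table and column names are placeholders, not from the thread):

```
-- Raise the statistics target for all columns analyzed in this
-- session (the default is 10; 1000 is the maximum discussed above).
SET default_statistics_target = 1000;

-- Or pin a larger target on one column; this is stored in the
-- catalog and used by every later ANALYZE of the table.
ALTER TABLE lineitem ALTER COLUMN l_shipdate SET STATISTICS 1000;

ANALYZE lineitem;
```

A larger target makes ANALYZE sample more rows and build finer histograms, which is why the plans stabilized at 1000 even though the sampled pg_statistic contents still varied slightly between runs.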
> >
> > 			regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> >     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> --
> Mary Edie Meredith <maryedie@osdl.org>
> Open Source Development Lab

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073