Re: [GENERAL] how to get accurate values in pg_statistic - Mailing list pgsql-performance

From Bruce Momjian
Subject Re: [GENERAL] how to get accurate values in pg_statistic
Date
Msg-id 200309052049.h85KnCv16810@candle.pha.pa.us
Whole thread Raw
In response to Re: [GENERAL] how to get accurate values in pg_statistic  (Mary Edie Meredith <maryedie@osdl.org>)
Responses Re: [GENERAL] how to get accurate values in pg_statistic
List pgsql-performance
Mary Edie Meredith wrote:
> I certainly don't claim that it is appropriate to force customers into a
> full analysis, particularly if random sampling versus a full scan of the
> data reveals little to no performance differences in the plans.  Being
> able to sample accurately is _very nice for large tables.
>
> For our testing purposes, however, consistent results are extremely
> important. We have observed that small difference in one plan for one of
> 22 queries can cause a difference in the DBT-3 results.  If this
> happens, a small change in performance runs between two Linux kernels
> may appear to be due to the kernels, when in fact it is due to the plan
> change.
>
> We know that the plans are _exactly the same if the data in the
> pg_statistics table is the same from run to run (all other things being
> equal).  So what we need to have is identical optimizer costs
> (pg_statistics) for the same table data for each.
>
> I feel certain that the pg_statistics table will be identical from run
> to run if analyze looks at every row.   Thus our hope to find a way to
> get that.


Actually, if you are usig GEQO (many tables in a join) the optimizer
itself will randomly try plans --- even worse than random statistics.

We do have:

    #geqo_random_seed = -1          # -1 = use variable seed

that lets you force a specific random seed for testing purposes.  I
wonder if that could be extended to control VACUUM radomization too.
Right now, it just controls GEQO and in fact gets reset on every
optimizer run.

I wonder if you could just poke a srandom(10) in
src/backend/command/analyze.c.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-performance by date:

Previous
From: "Relaxin"
Date:
Subject: Re: SELECT's take a long time compared to other DBMS
Next
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] how to get accurate values in pg_statistic