[GENERAL] how to get accurate values in pg_statistic (continued) - Mailing list pgsql-performance

From Mary Edie Meredith
Subject [GENERAL] how to get accurate values in pg_statistic (continued)
Msg-id 1062697270.6854.3475.camel@ibm-e.pdx.osdl.net
List pgsql-performance
Our port of the OSDL DBT-3 test suite to PostgreSQL (see Background
information below) is nearing completion.  We would like to confirm
our understanding of an outstanding consistency issue.

We have not been able to do meaningful kernel testing because the runs
(all parameters/kernels being equal) are wildly varying - sometimes
20-25% differences in the metrics from run to run.

We found that plans were changing from test run to test run. In one case
a plan ran 20 minutes in the throughput test of one run, and 2 seconds
in another run!  By forcing the contents of pg_statistic to be the same
before the queries run, we now get consistent results. So we know for
sure the problem is due to the random nature of the stats sampling: the
optimizer saw different statistics each time, resulting in different plans.
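(For anyone wanting to reproduce this: one way to pin the statistics is
for a superuser to snapshot the catalog with COPY after the initial
ANALYZE and restore it before each run. A sketch only - the file path is
hypothetical, and this assumes no further ANALYZE runs afterward:)

```sql
-- Snapshot the optimizer statistics once, after the initial ANALYZE
-- (superuser only; '/tmp/pg_statistic.snap' is a hypothetical path):
COPY pg_statistic TO '/tmp/pg_statistic.snap';

-- Before each test run, restore the identical statistics:
DELETE FROM pg_statistic;
COPY pg_statistic FROM '/tmp/pg_statistic.snap';
```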

Stephan Szabo kindly responded to our earlier queries suggesting we look
at default_statistics_target and ALTER TABLE ALTER COLUMN SET
STATISTICS.
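For anyone following along, the knobs in question look like this (the
table and column names are just examples from a DBT-3-style schema):

```sql
-- Per-column override, then re-gather stats for that table:
ALTER TABLE lineitem ALTER COLUMN l_shipdate SET STATISTICS 1000;
ANALYZE lineitem;

-- Or raise the default for all columns before analyzing:
SET default_statistics_target = 1000;
ANALYZE;
```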

These determine the number of histogram bins kept for a given column.
But for a table with a large number of rows (for example, 6 million),
even the maximum value (1000) does not guarantee that ANALYZE will do a
full scan of the table.  We do not see a way to guarantee the same
statistics from run to run without forcing ANALYZE to examine every row
of every table.
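Our reading of the sampling code is that ANALYZE examines on the order
of 300 * statistics_target rows regardless of table size (the factor of
300 is our assumption from the source, not documented behavior), so even
at the maximum target the sample is a small slice of a 6M-row table:

```sql
-- Back-of-the-envelope, assuming ANALYZE samples ~300 * target rows:
SELECT 300 * 1000             AS sampled_rows,    -- 300,000 at target 1000
       (300 * 1000) / 60000.0 AS pct_of_6m_rows;  -- ~5% of a 6M-row table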

Are we wrong in our analysis?

Are there main-stream alternatives we have missed?

How do you do testing on large tables and make the execution plans
consistent?

Is there a change to ANALYZE in 7.4 that solves our problem?

TIA.


********************************************************************
Background information:

Database Test 3 (DBT-3) is a decision support workload.

The test kit itself has been running on PostgreSQL for some time; it is
available on SourceForge and is implemented on our Scalable Test
Platform (STP).


A bit of background:  The test
(1) builds a database from load files and gathers statistics,
(2) runs a single stream of the 22 queries plus a set of inserts and
deletes (the power test), then
(3) runs multiple streams of the queries with one added stream of
inserts/deletes (the throughput test).


--
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab

