AW: Call for alpha testing: planner statistics revisions - Mailing list pgsql-hackers

From Zeugswetter Andreas SB
Subject AW: Call for alpha testing: planner statistics revisions
Msg-id 11C1E6749A55D411A9670001FA68796336832B@sdexcsrv1.f000.d0188.sd.spardat.at
Responses Re: AW: Call for alpha testing: planner statistics revisions  (Alex Pilosov <alex@pilosoft.com>)
Re: AW: Call for alpha testing: planner statistics revisions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
First of all, thanks for the great effort; it will surely be appreciated :-)

> * On large tables, ANALYZE uses a random sample of rows rather than
> examining every row, so that it should take a reasonably short time
> even on very large tables.  Possible downside: inaccurate stats.
> We need to find out if the sample size is large enough.
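The "is the sample size large enough" question from the quoted proposal can be illustrated with a toy Python simulation (not PostgreSQL code; the column data, distribution, and sizes here are invented): count how many distinct values a random sample actually sees as the sample grows.

```python
import random

random.seed(42)
# Hypothetical column of 100,000 rows drawn from a skewed distribution,
# so a few values dominate (a common real-world shape).
column = [int(random.paretovariate(1.2)) % 10_000 for _ in range(100_000)]
true_ndistinct = len(set(column))

results = {}
for sample_size in (1_000, 10_000, 50_000):
    sample = random.sample(column, sample_size)
    # Distinct values seen in the sample; small samples miss rare values,
    # which is exactly the kind of inaccuracy the proposal worries about.
    results[sample_size] = len(set(sample))
    print(f"sample={sample_size:>6}: distinct values seen "
          f"{results[sample_size]} of {true_ndistinct}")
```

The point of the sketch: distinct-value counts (unlike averages) converge slowly under sampling, so the adequate sample size depends on the statistic being estimated, not just the table size.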

Imho that is not optimal :-) ** ducks head, to evade flying hammer **
1. the random sample approach should be explicitly requested with some
syntax extension
2. the sample size should also be tunable with some analyze syntax
extension (the dba chooses the tradeoff between accuracy and runtime)
3. if anything, an automatic analyze should take samples on small tables
and gather accurate stats on large tables

The reasoning behind this is that when the optimizer makes a "mistake"
on a small table, the runtime penalty is small and probably even beats
the cost of an accurate statistics lookup. (A 3-page table needs no stats
except the table size.)

When, on the other hand, the optimizer makes a "mistake" on a huge table,
the difference is easily a matter of hours, so you want accurate stats.
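The policy argued for above could be sketched like this (a minimal Python sketch; the thresholds and return labels are hypothetical, not anything from PostgreSQL):

```python
def analyze_strategy(table_pages: int, full_scan_requested: bool = False) -> str:
    """Decide how to gather statistics for a table.

    Hypothetical policy following the argument above: on tiny tables the
    planner cannot lose much, so cheap (or no) stats suffice; on huge
    tables a bad plan can cost hours, so exact statistics are worth it.
    The page thresholds are illustrative only.
    """
    if full_scan_requested:
        return "full scan"          # dba explicitly asked for accuracy
    if table_pages <= 3:
        return "size only"          # table size is the only stat needed
    if table_pages <= 100_000:
        return "random sample"      # a planner mistake here is cheap
    return "full scan"              # a planner mistake here costs hours
```

Note that this is the opposite of sampling only the large tables: the expected cost of a bad plan, not the cost of ANALYZE itself, drives the choice.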

Because we do not want to force the dba to decide which statistics are
optimal, there should probably be an analyze helper application, invoked
with "vacuum analyze database optimal" or some such, that also decides
whether a table has changed enough to justify gathering new stats or
vacuuming it. The decision about what to do could also be based on a
runtime limit that the dba specifies ("do the most important stats/vacuums
you can do within ~3 hours").
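The runtime-limited helper could be sketched as a greedy budgeted selection over per-table maintenance tasks (all table names, cost estimates, and the "staleness" metric below are hypothetical; a real helper would derive them from system catalogs and activity counters):

```python
def plan_maintenance(tasks, budget_seconds):
    """Greedily pick the tasks with the best staleness-per-second ratio
    until the runtime budget is exhausted.

    tasks: list of (table, est_seconds, staleness), where staleness is the
    hypothetical fraction of rows changed since stats were last gathered.
    """
    ranked = sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
    chosen, spent = [], 0
    for table, cost, _ in ranked:
        if spent + cost <= budget_seconds:
            chosen.append(table)
            spent += cost
    return chosen

tasks = [
    ("orders",    7200, 0.40),   # big, heavily updated
    ("customers", 1800, 0.12),
    ("audit_log", 9000, 0.05),   # huge but append-mostly
    ("lookup",      60, 0.90),   # tiny, completely rewritten
]
print(plan_maintenance(tasks, budget_seconds=3 * 3600))
```

With a 3-hour budget, the tiny-but-stale table and the two moderately stale tables fit, while the huge, barely changed log is deferred to a later run.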

These points are also based on experience with huge SAP R/3 installations
and the way statistics are gathered there.

Andreas

