Greg Stark <gsstark@mit.edu> writes:
> I'm not about to run analyze in the middle of the data generation
> (which wouldn't work anyways since it's in a transaction).
Since PostgreSQL 7.3 or 7.4, you *can* run ANALYZE in the middle of a
transaction.
The cached-plan business is a problem, I agree, but I think it's
orthogonal to this particular discussion (and you can always use EXECUTE
if you have to).
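
For instance (a minimal sketch; the table and function names here are
invented for illustration), you can ANALYZE freshly loaded data before
the transaction commits, so later statements in the same transaction
are planned with up-to-date statistics:

    BEGIN;
    COPY big_tab FROM '/tmp/data.txt';   -- bulk-load mid-transaction
    ANALYZE big_tab;                     -- legal inside the transaction
    SELECT count(*) FROM big_tab;        -- planned with fresh stats
    COMMIT;

And in plpgsql, EXECUTE builds and plans the query string at call time,
so it always sees current statistics instead of reusing a cached plan:

    CREATE FUNCTION row_count(tab text) RETURNS bigint AS '
    DECLARE
        n bigint;
    BEGIN
        -- EXECUTE re-plans on every call, sidestepping the
        -- cached-plan problem for plpgsql queries
        FOR n IN EXECUTE ''SELECT count(*) FROM '' || quote_ident(tab) LOOP
            RETURN n;
        END LOOP;
    END;
    ' LANGUAGE plpgsql;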
>> In the absence of any ANALYZE data the tuples-per-page estimate might be
>> pretty bogus, but it couldn't be off by more than an order of magnitude or
>> so either way.
> I don't see why it couldn't. If you have a table badly in need of vacuuming
> (or had one at the time of the last analyze) it could be off by way more than
> an order of magnitude.
Well, I was actually thinking of the physical tuples-per-page stat
(perhaps better expressed as an average tuple size), but you are right
that the fraction of dead tuples is also something to think about.
We don't model that explicitly at the moment, but maybe we should. The
original VACUUM-based stats code couldn't really do much with it, since
VACUUM would leave no dead tuples behind in the first place; but a
standalone ANALYZE could certainly estimate the fraction of dead tuples.
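
Just to illustrate (a rough sketch; 8192 assumes the default block
size, and the table name is made up), the physical stat in question is
essentially derivable from what ANALYZE already leaves in pg_class:

    SELECT relname,
           reltuples / relpages          AS tuples_per_page,
           relpages * 8192.0 / reltuples AS avg_bytes_per_tuple
    FROM pg_class
    WHERE relname = 'big_tab' AND relpages > 0;

Note that avg_bytes_per_tuple computed this way includes dead-tuple and
free-space overhead, which is exactly why the dead-tuple fraction would
matter to such a model.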
> Ideally I would want a guarantee that every query would *always*
> result in the same plan. Once I've tested them and approved the plans
> I want to know that only those approved plans will ever run, and I
> want to be present and be able to verify new plans before they go into
> production.
> I doubt I'm going to convince anyone today,
Nope, you aren't. The above seems to me to be a recipe for performance
degradation over time, precisely because the plans could never adapt as
the data changes. I've resisted adding "planner hints" to the language
for this reason, and I'm certainly not eager to offer any hard
guarantees about plans not changing.
regards, tom lane