On Sat, 2006-03-25 at 16:24 +0100, Martijn van Oosterhout wrote:
> I agree. However, if it's the overhead of calling gettimeofday() that
> slows everything down, perhaps we should tackle that end. For example,
> have a sampling mode that only times say 5% of the executed nodes.
>
> EXPLAIN ANALYZE SAMPLE blah;
I like this idea. Why not do this all the time? I'd say we don't need
the SAMPLE clause at all, just do this for all EXPLAIN ANALYZEs.
> And then in InstrStart have a quick test that skips the gettimeofday
> for this iteration sometimes. You'd probably need some heuristics
> because you always want to catch the first iteration but after the
> 10,000th tuple in an indexscan, you're probably not going to learn
> anything new.
> How does this sound?
How about something even simpler: time the first 40 iterations, then a
5% random sample after that? I'd prefer a random sample so we have the
highest level of trust in the numbers produced. Otherwise we might
accidentally introduce bias from systematic effects, such as nested
loop queries speeding up towards the end of their run. (I know that
always timing the first 40 introduces exactly that kind of bias at the
start, but we are stuck with it: we don't know the population size
ahead of time, and we know we need a reasonable number of data points.)
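
A minimal sketch of the "first 40, then 5% at random" rule, in C. All
names here (SampleState, sample_should_time, the two constants) are
hypothetical and are not existing instrumentation code; rand() is only
a stand-in for whatever random source would really be used:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tuning knobs taken from the proposal above. */
#define SAMPLE_ALWAYS_FIRST 40  /* always time the first N iterations */
#define SAMPLE_PERCENT       5  /* then time roughly this % at random */

/* Hypothetical per-node counters; not an existing struct. */
typedef struct SampleState
{
    unsigned long iterations;   /* iterations seen so far */
    unsigned long timed;        /* iterations chosen for timing */
} SampleState;

/*
 * Decide whether the current iteration should be timed.  The caller
 * would skip gettimeofday() when this returns false.
 *
 * rand() % 100 has a slight modulo bias; that is accepted here for
 * the sake of a short example.
 */
static bool
sample_should_time(SampleState *st)
{
    bool take;

    st->iterations++;

    if (st->iterations <= SAMPLE_ALWAYS_FIRST)
        take = true;
    else
        take = (rand() % 100) < SAMPLE_PERCENT;

    if (take)
        st->timed++;

    return take;
}
```

Because the first 40 iterations are timed with probability 1 and the
rest with probability 0.05, any code that scales the sampled times up
to a total would presumably need to weight the two groups separately.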
Best Regards, Simon Riggs