On Mon, 05 Apr 2004 15:37:07 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>I wouldn't bother with a GUC variable for the production patch.
Among other things the GUC variable will be thrown out for the final
version.
>> Once a block is selected for inspection, all tuples of this
>> block are accessed to get a good estimation of the live : dead row
>> ratio.
>
>Why are you bothering to compute the live:dead ratio?
That was basically bad wording. It should have been "... to get a good
estimation of live rows per page." Counting dead rows turned out to be
trivial, so I just did it and included the number in the debug messages.
Then it happened to be useful for method 2.
>> Because I was afraid that method 1 might be too expensive in terms of
>> CPU cycles, I implemented a small variation that skips tuples without
>> checking them for visibility; this is sampling_method 2.
>
>And? Does it matter?
There's a clearly visible difference for mid-size relations. I'm not
sure whether this can be attributed to visibility bit updating or other
noise-contributing factors.
Method 2 gives a row count estimation error between 10 and 17% where
method 1 is less than 1% off. (My updates generated dead tuples at very
evenly distributed locations by using conditions like WHERE mod(twenty,
7) = 0).
>If that's as bad as it gets I think we are OK. You should redo the test
>with larger sample size though (try stats target = 100) to see if the
>answer changes.
Will do.
>I find -u diffs close to unreadable for reviewing purposes. Please
>submit diffs in -c format in future.
De gustibus non est disputandum :-)
Fortunately this patch wouldn't look much different. There is just a
bunch of "+" lines.
>AFAICS the rows will *always* be sorted already, and so this qsort is an
>utter waste of cycles. No?
I don't think so. We get the *blocks* in the correct order. But tuples
are still sampled by the Vitter method which starts to replace random
tuples after the pool is filled.
BTW, thanks for the paper!
Servus
Manfred