Jenny Zhang <jenny@osdl.org> writes:
> ... It seems to me that a small
> effective_cache_size favors the choice of nested loop joins (NLJ)
> while a big effective_cache_size favors merge joins (MJ).
No, I wouldn't think that, because a nestloop plan involves repeated
fetches of the same tuples, whereas a merge join doesn't (at least not
when it sorts its inner input, as this plan does). A larger cache
improves the odds that a repeated fetch won't have to do I/O, so if
anything a larger effective_cache_size ought to make the nestloop look
cheaper relative to the merge join, not the other way around. In
practice a larger cache area would also have some effect on access
costs for the sort's temp file, but I don't think the planner's cost
model for sorting takes that into account.
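For instance, something along these lines would let you watch the plan
choice move as the setting changes (the table and column names are only
illustrative, and the unit-suffix syntax assumes a server that accepts
it; on older releases effective_cache_size is given as a number of 8K
pages):

    -- Hypothetical sketch: compare the plans chosen for the same join
    -- under a small and a large effective_cache_size.
    SET effective_cache_size = '64MB';
    EXPLAIN
    SELECT *
      FROM orders o
      JOIN lineitem l ON l.l_orderkey = o.o_orderkey;

    SET effective_cache_size = '8GB';
    EXPLAIN
    SELECT *
      FROM orders o
      JOIN lineitem l ON l.l_orderkey = o.o_orderkey;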
As Matt Clark points out nearby, the real question is whether these
planner estimates have anything to do with reality. EXPLAIN ANALYZE
results would be far more interesting than plain EXPLAIN.
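Something like this (the query is made up purely for illustration)
would show, for each plan node, the planner's estimated row counts and
costs next to the actual row counts and timings:

    -- Hypothetical sketch: EXPLAIN ANALYZE actually executes the query
    -- and reports real times and row counts alongside the estimates.
    EXPLAIN ANALYZE
    SELECT o.o_orderkey, count(*)
      FROM orders o
      JOIN lineitem l ON l.l_orderkey = o.o_orderkey
     GROUP BY o.o_orderkey;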
> However, within the same run set consisting of 6 runs, we see 2-3%
> standard deviation for the run metrics associated with the
> multiple-stream part of the test (as opposed to the single-stream part).
<python> Och, laddie, we useta *dream* of 2-3% variation </python>
> We would like to reduce the variation to be less than 1% so that a
> 2% change between two different kernels would be significant.
I think this is a pipe dream. Variation in where the data gets laid
down on your disk drive would alone create more than that kind of delta.
I'm frankly amazed you could get repeatability within 2-3%.
regards, tom lane