On Thu, Jul 15, 2004 at 12:52:38AM -0400, Tom Lane wrote:
> No, it's not missing anything. The number being reported here is the
> number of rows pulled from the plan node --- but this plan node is on
> the inside of a merge join, and one of the properties of merge join is
> that it will do partial rescans of its inner input in the presence of
> equal keys in the outer input. If you have, say, 10 occurrences of
> "42" in the outer input, then any "42" rows in the inner input have to
> be rescanned 10 times. EXPLAIN ANALYZE will count each of them as 10
> rows returned by the input node.
OK, that makes sense, although it seems to me as if loops= should have
been something larger than 1 if the data was scanned multiple times.
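
If I understand it right, the effect would look something like this
(table and column names made up, not my actual schema):

    -- made-up stand-in for my real query, just to illustrate:
    EXPLAIN ANALYZE
    SELECT * FROM a LEFT JOIN b ON a.key = b.key;
    -- with a mergejoin plan and many duplicate a.key values, the rows=
    -- figure on the scan of b counts every rescanned row again, while
    -- loops= stays at 1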
> The large multiple here (80-to-one overscan) says that you've got a lot
> of duplicate values in the outer input. This is generally a good
> situation to *not* use a mergejoin in ;-). We do have some logic in the
> planner that attempts to estimate the extra cost involved in such
> rescanning, but I'm not sure how accurate the cost model is.
Hm, I'm not sure I've got the terminology right here -- the "outer input"
in "A left join B" is A, right? But yes, I do have a lot of duplicates;
that matches my data well.
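
For what it's worth, the duplication is easy to confirm with something
like this (again with made-up names):

    -- count the most common join-key values on the outer side
    SELECT key, count(*) AS n
    FROM a
    GROUP BY key
    ORDER BY n DESC
    LIMIT 10;

which in my case does show a handful of very common values.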
> Raising shared_buffers seems unlikely to help. I do agree with raising
> sort_mem --- not so much to make the merge faster as to encourage the
> thing to try a hash join instead.
sort_mem is already 16384, which I thought would be plenty -- I tried
increasing it to 65536, which made exactly zero difference. :-)
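
I suppose the next experiment is to push the planner away from the merge
join entirely and see whether a hash join actually does better, along the
lines of (same made-up query as above):

    SET sort_mem = 65536;          -- what I tried above; no difference
    SET enable_mergejoin = off;    -- nudge the planner toward a hash join
    EXPLAIN ANALYZE
    SELECT * FROM a LEFT JOIN b ON a.key = b.key;

If the hash join comes out faster, that would at least confirm that the
rescanning is the problem.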
/* Steinar */
--
Homepage: http://www.sesse.net/