Re: Nested loop vs merge join: inconsistencies between estimated and actual time - Mailing list pgsql-performance

From Tom Lane
Subject Re: Nested loop vs merge join: inconsistencies between estimated and actual time
Date
Msg-id 7386.1204871746@sss.pgh.pa.us
Whole thread Raw
In response to Nested loop vs merge join: inconsistencies between estimated and actual time  (Vlad Arkhipov <arhipov@dc.baikal.ru>)
Responses Re: Nested loop vs merge join: inconsistencies between estimated and actual time  (Vlad Arkhipov <arhipov@dc.baikal.ru>)
List pgsql-performance
Vlad Arkhipov <arhipov@dc.baikal.ru> writes:
> I've came across this issue while writing report-like query for 2 not
> very large tables. I've tried several methods to resolve this one (see
> below). But now I'm really stuck...

It looks like you are wishing to optimize for all-in-memory situations,
in which case the traditional advice is to reduce random_page_cost to
something close to 1.  AFAICS all the rowcount estimates you're seeing
are spot on, or as close to spot on as you could realistically hope for,
and so the problem lies with the cost parameters.  Fooling with the
statistics is not going to help if the rowcount estimates are already
good.

(Note: the apparent undercounts you're seeing on indexscans on the outer
side of a mergejoin seem to be because the mergejoin terminates early
due to limited range of the other input join key.  The planner is
expecting this, as we can see because the predicted cost of the join is
actually much less than the predicted cost of running the input
indexscan to completion.  The cost ratio is about consistent with the
rowcount ratio, which makes me think it got these right too.)

            regards, tom lane

pgsql-performance by date:

Previous
From: Vlad Arkhipov
Date:
Subject: Nested loop vs merge join: inconsistencies between estimated and actual time
Next
From: "Pavel Rotek"
Date:
Subject: Toast space grows