Re: tweaking costs to favor nestloop - Mailing list pgsql-performance

From Tom Lane
Subject Re: tweaking costs to favor nestloop
Date
Msg-id 23557.1055362673@sss.pgh.pa.us
Whole thread Raw
In response to tweaking costs to favor nestloop  (Vincent van Leeuwen <pgsql.spam@vinz.nl>)
Responses Re: tweaking costs to favor nestloop
Re: tweaking costs to favor nestloop
List pgsql-performance
Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> I'm unable to tweak the various _cost settings in such a way that attached
> query will use the right plan.

You aren't going to be able to.  You've already overshot a reasonable
random_page_cost setting --- to judge by the relative actual costs of
the merge and hash join, a value somewhere around 3 is appropriate for
your setup.  (Assuming I did the math right --- if you set it to 3,
do you get a ratio of merge and hash estimated costs that agrees with
the ratio of actual runtimes?)

The problem here is that the costing of the repeated inner index scans
isn't realistic: 35417 probes into "auth" are clearly taking much less
than 35417 times what a single probe could be expected to take.  We
talked about how repeated scans would win from caching of the upper
btree levels, but I think there's more to it than that.  It occurs to me
that the probes you are making are probably not random and uncorrelated.
They are driven by the values of reportuser.idreporter ... is it fair
to guess that most of the reportuser rows link to just a small fraction
of the total auth population?  If so, the caching could be eliminating
most of the reads, not just the upper btree levels, because we're
mostly hitting only small parts of the index and auth tables.

I'm beginning to think that the only reasonable way to model this is to
cost the entire nestloop join as a unit, so that we have access to
statistics about the outer table as well as the indexed table.  That
would give us a shot at estimating how much of the index is likely to
get touched.

As of 7.3 I think all you can do is force nestloop by disabling the
other two join types.

            regards, tom lane

pgsql-performance by date:

Previous
From: Vincent van Leeuwen
Date:
Subject: tweaking costs to favor nestloop
Next
From: Tom Lane
Date:
Subject: Re: left join performance problem