Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> I'm unable to tweak the various _cost settings in such a way that the
> attached query will use the right plan.

You aren't going to be able to. You've already overshot a reasonable
random_page_cost setting --- to judge by the relative actual costs of
the merge and hash join, a value somewhere around 3 is appropriate for
your setup. (Assuming I did the math right --- if you set it to 3,
do you get a ratio of merge and hash estimated costs that agrees with
the ratio of actual runtimes?)
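A quick sketch of how to check that (your query abbreviated to
"SELECT ..." below; the enable_* switches just force one join type at
a time, so confirm in the EXPLAIN output which plan each run picked):

    SET random_page_cost = 3;
    SET enable_nestloop = off;
    SET enable_hashjoin = off;
    EXPLAIN ANALYZE SELECT ...;   -- merge join: note estimated cost, actual time
    SET enable_hashjoin = on;
    SET enable_mergejoin = off;
    EXPLAIN ANALYZE SELECT ...;   -- hash join: note estimated cost, actual time
    SET enable_mergejoin = on;

If the ratio of the two estimated costs is close to the ratio of the
two actual runtimes, 3 is about right.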
The problem here is that the costing of the repeated inner index scans
isn't realistic: 35417 probes into "auth" are clearly taking much less
than 35417 times what a single probe could be expected to take. We
talked about how repeated scans would win from caching of the upper
btree levels, but I think there's more to it than that. It occurs to me
that the probes you are making are probably not random and uncorrelated.
They are driven by the values of reportuser.idreporter ... is it fair
to guess that most of the reportuser rows link to just a small fraction
of the total auth population? If so, the caching could be eliminating
most of the reads, not just the upper btree levels, because we're
mostly hitting only small parts of the index and auth tables.
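You could check that guess directly; something like the following
(column names other than reportuser.idreporter are assumptions ---
substitute whatever auth's key column is actually called):

    -- how concentrated are the probes into auth?
    SELECT count(*) AS reportuser_rows,
           count(DISTINCT idreporter) AS distinct_auth_keys
    FROM reportuser;

    SELECT count(*) AS total_auth_rows FROM auth;

If distinct_auth_keys is a small fraction of total_auth_rows, the
repeated inner scans are mostly revisiting already-cached index and
heap pages.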
I'm beginning to think that the only reasonable way to model this is to
cost the entire nestloop join as a unit, so that we have access to
statistics about the outer table as well as the indexed table. That
would give us a shot at estimating how much of the index is likely to
get touched.
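For what it's worth, the raw material for that kind of estimate is
already visible in pg_stats (the attname for auth below is just a
guess at your key column):

    -- statistics a whole-join cost model could consult
    SELECT tablename, attname, n_distinct, correlation
    FROM pg_stats
    WHERE (tablename = 'reportuser' AND attname = 'idreporter')
       OR (tablename = 'auth' AND attname = 'id');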
As of 7.3 I think all you can do is force nestloop by disabling the
other two join types.
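That is, something like this before running the query (and turn them
back on afterwards, since the settings persist for the rest of the
session):

    SET enable_mergejoin = off;
    SET enable_hashjoin = off;
    -- run the query here
    SET enable_mergejoin = on;
    SET enable_hashjoin = on;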
regards, tom lane