Ahh, thanks Tom.
I hadn't seen your email before I posted my own followup.
I guess the clustering approach managed to work around the need to mess with the statistics target. I did previously
increasethe target to 1000 (from 100) for that field and had no impact, but this is an aspect of tuning I'm not so
familiarwith - I didn't consider pushing it all the way to 11.
On 11 Aug 2013, at 00:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Timothy Kane <tim.kane@gmail.com> writes:
>> I seem to be having some grief with the 9.1.9 query planner favouring an index scan + merge join, over a sequential
scan+ hash join.
>
> I believe the reason it's preferring the merge join plan is that it thinks
> the executor will be able to terminate the merge join early as a
> consequence of the range of join keys in "addresses" being only a fraction
> of the range of join keys in "users". Notice that the total estimated
> cost for the merge join is just a fraction of the full estimated cost of
> the indexscan on "users"; the only way that's possible is if the indexscan
> on "users" doesn't have to run through all of the table. Probably, the
> range of join keys is wider than the planner thinks and so the merge join
> can't terminate early. The fix therefore is to crank the stats target for
> "addresses" up high enough that you get a reasonable value in pg_statistic
> for the largest address_id value (look at the last histogram entry).
>
>> Interestingly, on another instance of this same database running on postgres 8.3.8, the query planner correctly
choosesthe sequential scan method - having more sane cost estimates for the index scan method.
>
> I think the 8.3 planner didn't take this effect into account. Or maybe it
> did, but by chance the upper histogram value is closer to reality on the
> older database.
>
> regards, tom lane