Re: Plan uses wrong index, preferring to scan pkey index instead - Mailing list pgsql-performance

From Yuri Kunde Schlesner
Subject Re: Plan uses wrong index, preferring to scan pkey index instead
Date
Msg-id 1416355354.465985.192652933.31058871@webmail.messagingengine.com
Whole thread Raw
In response to Re: Plan uses wrong index, preferring to scan pkey index instead  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Sun, Nov 16, 2014, at 03:18 PM, Tom Lane wrote:
> I suspect that the reason the planner likes the backlog_pkey is that it's
> almost perfectly correlated with table order, which greatly reduces the
> number of table fetches that need to happen over the course of a
> indexscan
> compared to using the less-well-correlated bufferid+messageid index.
> So that way is estimated to be cheaper than using the less-correlated
> index ... and that may even be true except for outlier bufferid values
> with no recent messages.
Indeed, and I can imagine that this is more advantageous in the general
case, as I described in my last message. The problem is that the
variance is too high, with a 500x slowdown between the best and the
worst cases for that plan.

> What I think might be a workable solution, assuming you can stand a
> little
> downtime to do it, is to CLUSTER the table on the bufferid+messageid
> index.  This would reverse the correlation advantage and thereby solve
> your problem.  Now, ordinarily CLUSTER is only a temporary solution
> because the cluster-induced ordering degrades over time.  But I think it
> would likely be a very long time until you accumulate so many new
> messages
> that the table as a whole looks well-correlated on messageid alone.
I tried this and it seems to have solved my problem! The better plan is
consistently chosen now, and it's as fast as former plan on the fast
cases, and much faster on the slow case. I will continue monitoring the
DB to see if it eventually switches back to the former scheme, and if it
does I can just include a re-cluster on my maintenance schedule. Thanks
so much for the suggestion.


pgsql-performance by date:

Previous
From: Dave Johansen
Date:
Subject: Re: Partitions and work_mem?
Next
From: Stuart Bishop
Date:
Subject: A pessimistic planner