"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> where chat_post_timeuser_idx is defined on the columns (time,poster_id)
> and chat_post_usertime_idx is defined on the columns (poster_id,time)
> Why is the planner not choosing the user_time index [for]
> avid_chat_archive=> explain analyze select * from chat_post where poster_id = '1600' order by time desc limit 2;
> NOTICE: QUERY PLAN:
> Limit (cost=0.00..32.40 rows=2 width=46) (actual time=96204.53..96204.71 rows=2 loops=1)
> -> Index Scan Backward using chat_post_time_idx on chat_post (cost=0.00..42370.93 rows=2616 width=46) (actual
time=96204.49..96204.64rows=3 loops=1)
> Total runtime: 96205.18 msec
If you'd said "order by poster_id desc, time desc" then that index would be
considered to match the ORDER BY clause, and so would be usable in this
same type of plan. As-is, the index is only useful for matching
poster_id and not for obtaining the required order, so the only plan
type considered for it involves an explicit sort step, which isn't
considered a win for the estimated number of rows matching the poster_id.
> My plan now is to maintain my own set of poster_id stats and use one
> of several query variants depending on what they say but this requires
> at least some understanding of the choices made by the planner.
Rather than maintaining your own stats, consider boosting the statistics
target for the poster_id column. You probably want the pg_stats info to
cover all the poster_ids that account for more than 1% of the entries.
The n_distinct value should improve too, producing a better estimate for
the infrequent poster_ids even though they're not explicitly stored.
regards, tom lane