Re: postgres chooses objectively wrong index - Mailing list pgsql-performance
| From | Merlin Moncure |
|---|---|
| Subject | Re: postgres chooses objectively wrong index |
| Date | |
| Msg-id | CAHyXU0zhdKYDvt_iON5kVSa4AXa=p_vLP1JMs3u0FFA29rP-aQ@mail.gmail.com Whole thread Raw |
| In response to | Re: postgres chooses objectively wrong index (Alexey Ermakov <alexius.work@gmail.com>) |
| Responses |
Re: postgres chooses objectively wrong index
Re: postgres chooses objectively wrong index |
| List | pgsql-performance |
On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov <alexius.work@gmail.com> wrote:
On 2026-03-18 04:52, Merlin Moncure wrote:Hello. I think planner doesn't have information about distribution of async.task_execution_state(task) unless it's part of any full index. I would try to give that with extended statistics (postgresql 14+):create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task; analyze async.task;If that won't help - please show distribution from pg_stats_ext view for extended statistic above.
This unfortunately fails, probably because the table type includes system columns (despite not using them).orchestrator_service_user@orchestrator=> create statistics task_stats (mcv) on (async.task_execution_state(task)) from async.task;
ERROR: statistics creation on system columns is not supportedThis would require some refactoring to fix.Interesting... In that case functional index should help (as it also makes statistic for the planner):
create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));analyze async.task;
This can't help performance, as the index...
CREATE INDEX ON async.task(concurrency_pool, priority, entered)
WHERE async.task_execution_state(task) = 'READY';
WHERE async.task_execution_state(task) = 'READY';
...is very precisely configured to provide exactly what's needed; I need tasks for that exact pool in that exact order if and only if ready. The partial predicate is designed to keep the index nice and small since only a small portion of tasks would be eligible at any specific time.
@Tom Lane I'm pretty sure you were following me, but my abstraction earlier was a bit off;
Simplified, the query is:
SELECT * FROM foo WHERE a=? AND b=K ORDER BY c, d LIMIT N;
CREATE INDEX ON foo(a,b,c) WHERE b=K;
Should have been:
SELECT * FROM foo WHERE a=? AND d=K ORDER BY b, c LIMIT N;
CREATE INDEX ON foo(a,b,c) WHERE d=K;
Point being, the index match in on (=, order, order). If a contains any less than 100% of the total records, and N is small relative to table size, this ought to be the ideal index for just about any case, the exact match on partial qual is just gravy.
I think the planner is not giving enough bonus for an exact match versus an inexact match on partial index mathcing, (A=A should be better than A IN(A,B,C)), and it's unclear why the planner things bitmap heap + sort is outperforming a raw read off the index base on marginal estimated row counts. Lowering random_page_cost definitely biases the plan I like, but it skews both estimates.
If you're interested in more context, see:
merlin
pgsql-performance by date: