Re: postgres chooses objectively wrong index - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: postgres chooses objectively wrong index
Date
Msg-id CAHyXU0zhdKYDvt_iON5kVSa4AXa=p_vLP1JMs3u0FFA29rP-aQ@mail.gmail.com
Whole thread Raw
In response to Re: postgres chooses objectively wrong index  (Alexey Ermakov <alexius.work@gmail.com>)
Responses Re: postgres chooses objectively wrong index
Re: postgres chooses objectively wrong index
List pgsql-performance
On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov <alexius.work@gmail.com> wrote:
On 2026-03-18 04:52, Merlin Moncure wrote:
Hello. I think planner doesn't have information about distribution of async.task_execution_state(task) unless it's part of any full index. I would try to give that with extended statistics (postgresql 14+):
create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task;
analyze async.task;

If that won't help - please show distribution from pg_stats_ext view for extended statistic above.


This unfortunately fails, probably because the table type includes system columns (despite not using them).
 
orchestrator_service_user@orchestrator=> create statistics  task_stats (mcv) on (async.task_execution_state(task)) from async.task;
ERROR:  statistics creation on system columns is not supported

This would require some refactoring to fix.

Interesting... In that case functional index should help (as it also makes statistic for the planner):

create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));
analyze async.task;

This can't help performance, as the index...
CREATE INDEX ON async.task(concurrency_pool, priority, entered)
WHERE async.task_execution_state(task) = 'READY';

...is very precisely configured to provide exactly what's needed; I need tasks for that exact pool in that exact order if and only if ready.  The partial predicate is designed to keep the index nice and small since only a small portion of tasks would be eligible at any specific time.

@Tom Lane I'm pretty sure you were following me, but my abstraction earlier was a bit off;

Simplified, the query is:
SELECT * FROM foo WHERE a=? AND b=K ORDER BY c, d LIMIT N;
CREATE INDEX ON foo(a,b,c) WHERE b=K;

Should have been:
SELECT * FROM foo WHERE a=? AND d=K ORDER BY b, c LIMIT N;
CREATE INDEX ON foo(a,b,c) WHERE d=K;

Point being, the index match in on (=, order, order). If a contains any less than 100% of the total records, and N is small relative to table size, this ought to be the ideal index for just about any case, the exact match on partial qual is just gravy.

I think the planner is not giving enough bonus for an exact match versus an inexact match on partial index mathcing,  (A=A should be better than A IN(A,B,C)), and it's unclear why the planner things bitmap heap + sort is outperforming a raw read off the index base on marginal estimated row counts.  Lowering random_page_cost definitely biases the plan I like, but it skews both estimates.


If you're interested in more context, see:


merlin

 

pgsql-performance by date:

Previous
From: Alexey Ermakov
Date:
Subject: Re: postgres chooses objectively wrong index
Next
From: Alexey Ermakov
Date:
Subject: Re: postgres chooses objectively wrong index