postgres chooses objectively wrong index - Mailing list pgsql-performance

From Merlin Moncure
Subject postgres chooses objectively wrong index
Date
Msg-id CAHyXU0xkTBW4p2N9QNLwye3O5SgzSGSMiPjFb7P8MyXCrEhhwg@mail.gmail.com
List pgsql-performance
I've been maintaining an Airflow-style orchestrator in PL/pgSQL, and it has revealed a performance issue I just can't solve.  There is a table, task, which may normally contain billions of rows, but only a tiny portion is interesting at any given time, a common pattern in task-type systems.

CREATE TABLE async.task
(
  task_id BIGSERIAL PRIMARY KEY,
  target TEXT REFERENCES async.target ON UPDATE CASCADE ON DELETE CASCADE,
  priority INT DEFAULT 0,
  entered TIMESTAMPTZ DEFAULT clock_timestamp(),
  consumed TIMESTAMPTZ,
  processed TIMESTAMPTZ,
  yielded TIMESTAMPTZ,
  times_up TIMESTAMPTZ,
  concurrency_pool TEXT
);

CREATE OR REPLACE FUNCTION async.task_execution_state(t async.task)
  RETURNS async.task_execution_state_t AS
$$
  SELECT
    CASE
      WHEN t.processed IS NOT NULL THEN 'FINISHED'
      WHEN t.consumed IS NULL AND t.yielded IS NULL THEN 'READY'
      WHEN t.yielded IS NOT NULL THEN 'YIELDED'
      WHEN t.consumed IS NOT NULL AND t.yielded IS NULL THEN 'RUNNING'
    END::async.task_execution_state_t;
$$ LANGUAGE SQL IMMUTABLE;
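For context, the return type isn't shown above; it's presumably an enum along these lines (a hypothetical reconstruction; the actual definition may differ):

```sql
-- Hypothetical reconstruction of the enum returned by
-- async.task_execution_state(); not shown in the original post.
CREATE TYPE async.task_execution_state_t AS ENUM
  ('READY', 'RUNNING', 'YIELDED', 'FINISHED');
```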

"processed IS NULL" defines the 'needle', let's say typically <0.01% of rows.  For those rows, a few access patterns need defending from a performance standpoint. Naturally, partial indexes are used because we don't want to index the entire table.

/* supports fetching eligible tasks */
CREATE INDEX ON async.task(concurrency_pool, priority, entered)
WHERE async.task_execution_state(task) = 'READY';

/* look up expired tasks.  The times_up qual is there to prevent the index
 * from being used for any other purpose.
 */
CREATE INDEX ON async.task(times_up)
  WHERE
    async.task_execution_state(task) IN('READY', 'RUNNING', 'YIELDED')
    AND times_up IS NOT NULL;

/* supports cleaning up dead tasks on startup and other needs for
 * processing unfinished tasks.
 */
CREATE INDEX ON async.task(task_id)
  WHERE async.task_execution_state(task) IN('READY', 'RUNNING', 'YIELDED');

These indexes support queries called in a tight loop, for example:

SELECT *
FROM async.task
WHERE 
  async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t  
  AND concurrency_pool = 'xyz' 
ORDER BY priority, entered 
LIMIT 10;

Usually, we get a plan that looks like this: 

 Limit  (cost=0.38..39.74 rows=10 width=563) (actual time=0.054..0.054 rows=0 loops=1)
   ->  Index Scan using task_concurrency_pool_priority_entered_idx on task  (cost=0.38..705.08 rows=179 width=563) (actual time=0.053..0.053 rows=0 loops=1)
         Index Cond: (concurrency_pool = 'xyz'::text)
 Planning Time: 0.234 ms
 Execution Time: 0.072 ms

Let's note that the partial index predicate exactly matches the where clause, and that the index from left to right matches in terms of equality and ordering.  No sorting is required, and the results are excellent.  The final costing here is IMNSHO very high: 39.74, and I believe that is the fundamental issue.

Sometimes, based on a certain data distribution, we get results like this:

Limit  (cost=25.75..25.78 rows=10 width=563) (actual time=8.909..8.911 rows=0 loops=1)
  ->  Sort  (cost=25.75..26.20 rows=179 width=563) (actual time=8.908..8.909 rows=0 loops=1)
        Sort Key: priority, entered
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on task  (cost=9.10..21.89 rows=179 width=563) (actual time=8.902..8.903 rows=0 loops=1)
              Recheck Cond: ((async.task_execution_state(task.*) = ANY ('{READY,RUNNING,YIELDED}'::async.task_execution_state_t[])) AND (concurrency_pool = 'xyz'::text) AND (async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t))
              ->  BitmapAnd  (cost=9.10..9.10 rows=3 width=0) (actual time=8.883..8.883 rows=0 loops=1)
                    ->  Bitmap Index Scan on task_task_id_idx  (cost=0.00..4.38 rows=575191 width=0) (actual time=8.828..8.828 rows=16 loops=1)
                    ->  Bitmap Index Scan on task_concurrency_pool_priority_entered_idx  (cost=0.00..4.38 rows=179 width=0) (actual time=0.053..0.053 rows=0 loops=1)
                          Index Cond: (concurrency_pool = 'xyz'::text)
Planning Time: 0.262 ms
Execution Time: 8.946 ms

In this case, we get an explicit sort and other unnecessary work for a 100x degradation in runtime.  My basic issue is that I do not believe there is any data distribution that would allow plan #2 to beat plan #1, given the more specific predicate and the index order matching the result order.  I suspect this is a very long-standing issue concerning insufficient weight given to partial indexes, predicate matching, and possibly index-supported sorting.  I've dealt with some variant of this problem for many years.
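One way to confirm the bitmap plan is the culprit is to discourage it locally; a sketch (note the enable_* GUCs only add a large cost penalty, they don't hard-disable a plan shape):

```sql
-- Sketch: disable bitmap scans for this transaction only and re-run the
-- query; if the planner falls back to the ordered index scan, the
-- BitmapAnd plan is what's hurting.
BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE)
SELECT *
FROM async.task
WHERE
  async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t
  AND concurrency_pool = 'xyz'
ORDER BY priority, entered
LIMIT 10;
COMMIT;
```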

Sometimes, there can be even worse plans, running into 10-20 seconds, a miss of several orders of magnitude. I can manage this at the query level by:
* turning off various planner options (e.g. enable_bitmapscan) around the query
* adding faux columns to the table to support forcing index selection in particular cases (CREATE INDEX ON foo WHERE this_case IS NULL....SELECT * FROM foo WHERE this_case IS NULL...)
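The faux-column trick in the second bullet looks roughly like this (the column name is illustrative, not from my actual schema):

```sql
-- Illustrative sketch of the faux-column workaround: a column that is
-- never populated, referenced in both the partial index predicate and
-- the query, so only this exact query shape can match this index.
ALTER TABLE async.task ADD COLUMN force_ready_idx BOOLEAN;  -- always NULL

CREATE INDEX ON async.task (concurrency_pool, priority, entered)
  WHERE async.task_execution_state(task) = 'READY'
    AND force_ready_idx IS NULL;

SELECT *
FROM async.task
WHERE
  async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t
  AND force_ready_idx IS NULL          -- matches only the index above
  AND concurrency_pool = 'xyz'
ORDER BY priority, entered
LIMIT 10;
```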

I'm wondering if there are other tricks that might apply here, for example multi-column (extended) statistics...curious if anyone has thoughts on that.
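For reference, the extended-statistics idea would be something like the following (a sketch only; I haven't verified it helps here, and statistics on expressions require PostgreSQL 14+):

```sql
-- Sketch: extended statistics covering the state expression together
-- with concurrency_pool, so the planner estimates their combined
-- selectivity rather than multiplying independent guesses.
-- Expression statistics require PostgreSQL 14 or later.
CREATE STATISTICS async.task_state_pool_stats
  ON (async.task_execution_state(task)), concurrency_pool
  FROM async.task;

ANALYZE async.task;  -- populate the new statistics
```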

Any suggestions?

merlin
