postgres chooses objectively wrong index - Mailing list pgsql-performance

From Merlin Moncure
Subject postgres chooses objectively wrong index
Date
Msg-id CAHyXU0xkTBW4p2N9QNLwye3O5SgzSGSMiPjFb7P8MyXCrEhhwg@mail.gmail.com
List pgsql-performance
I've been maintaining an Airflow-style orchestrator in PL/pgSQL, and it has revealed a performance issue I just can't solve.  There is a table, task, which may normally contain billions of rows, but only a tiny portion is interesting at any given time, a common pattern in task-type systems.

CREATE TABLE async.task
(
  task_id BIGSERIAL PRIMARY KEY,
  target TEXT REFERENCES async.target ON UPDATE CASCADE ON DELETE CASCADE,
  priority INT DEFAULT 0,
  entered TIMESTAMPTZ DEFAULT clock_timestamp(),
  consumed TIMESTAMPTZ,
  processed TIMESTAMPTZ,
  yielded TIMESTAMPTZ,
  times_up TIMESTAMPTZ,
  concurrency_pool TEXT
);

CREATE OR REPLACE FUNCTION async.task_execution_state(t async.task)
  RETURNS async.task_execution_state_t AS
$$
  SELECT
    CASE
      WHEN t.processed IS NOT NULL THEN 'FINISHED'
      WHEN t.consumed IS NULL AND t.yielded IS NULL THEN 'READY'
      WHEN t.yielded IS NOT NULL THEN 'YIELDED'
      WHEN t.consumed IS NOT NULL AND t.yielded IS NULL THEN 'RUNNING'
    END::async.task_execution_state_t;
$$ LANGUAGE SQL IMMUTABLE;
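For context, the return type isn't shown above; it's presumably an enum along these lines (a hypothetical reconstruction; the actual definition may differ):

```sql
-- Hypothetical reconstruction of the enum returned by
-- async.task_execution_state(); not shown in the original post.
CREATE TYPE async.task_execution_state_t AS ENUM
  ('READY', 'RUNNING', 'YIELDED', 'FINISHED');
```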

"processed IS NULL" defines the 'needle', let's say typically <0.01% of rows.  For those rows, a few access patterns need defending from a performance standpoint. Naturally, partial indexes are used because we don't want to index the entire table.

/* supports fetching eligible tasks */
CREATE INDEX ON async.task(concurrency_pool, priority, entered)
WHERE async.task_execution_state(task) = 'READY';

/* look up expired tasks.  The times_up qual is there to prevent the index
 * from being used for any other purpose.
 */
CREATE INDEX ON async.task(times_up)
  WHERE
    async.task_execution_state(task) IN('READY', 'RUNNING', 'YIELDED')
    AND times_up IS NOT NULL;

/* supports cleaning up dead tasks on startup and other needs for
 * processing unfinished tasks.
 */
CREATE INDEX ON async.task(task_id)
  WHERE async.task_execution_state(task) IN('READY', 'RUNNING', 'YIELDED');

These indexes support queries called in a tight loop, for example:

SELECT *
FROM async.task
WHERE 
  async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t  
  AND concurrency_pool = 'xyz' 
ORDER BY priority, entered 
LIMIT 10;

Usually, we get a plan that looks like this: 

 Limit  (cost=0.38..39.74 rows=10 width=563) (actual time=0.054..0.054 rows=0 loops=1)
   ->  Index Scan using task_concurrency_pool_priority_entered_idx on task  (cost=0.38..705.08 rows=179 width=563) (actual time=0.053..0.053 rows=0 loops=1)
         Index Cond: (concurrency_pool = 'xyz'::text)
 Planning Time: 0.234 ms
 Execution Time: 0.072 ms

Let's note that the partial index predicate exactly matches the where clause, and that the index from left to right matches in terms of equality and ordering.  No sorting is required, and the results are excellent.  The final costing here is IMNSHO very high: 39.74, and I believe that is the fundamental issue.

Sometimes, based on a certain data distribution, we get results like this:

Limit  (cost=25.75..25.78 rows=10 width=563) (actual time=8.909..8.911 rows=0 loops=1)
  ->  Sort  (cost=25.75..26.20 rows=179 width=563) (actual time=8.908..8.909 rows=0 loops=1)
        Sort Key: priority, entered
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on task  (cost=9.10..21.89 rows=179 width=563) (actual time=8.902..8.903 rows=0 loops=1)
              Recheck Cond: ((async.task_execution_state(task.*) = ANY ('{READY,RUNNING,YIELDED}'::async.task_execution_state_t[])) AND (concurrency_pool = 'xyz'::text) AND (async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t))
              ->  BitmapAnd  (cost=9.10..9.10 rows=3 width=0) (actual time=8.883..8.883 rows=0 loops=1)
                    ->  Bitmap Index Scan on task_task_id_idx  (cost=0.00..4.38 rows=575191 width=0) (actual time=8.828..8.828 rows=16 loops=1)
                    ->  Bitmap Index Scan on task_concurrency_pool_priority_entered_idx  (cost=0.00..4.38 rows=179 width=0) (actual time=0.053..0.053 rows=0 loops=1)
                          Index Cond: (concurrency_pool = 'xyz'::text)
Planning Time: 0.262 ms
Execution Time: 8.946 ms

In this case, we get an explicit sort and other unnecessary work for a 100x degradation in runtime.  My basic issue is that I do not believe there is any data distribution that would allow plan #2 to beat plan #1, given the more specific predicate and the index order matching the result order.  I suspect this is a very long-standing issue concerning insufficient weight given to partial indexes, predicate matching, and possibly index-supported sorting.  I've dealt with some variant of this problem for many years.
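One way to confirm the bitmap plan is the culprit is to discourage it locally; a sketch (note the enable_* GUCs only add a large cost penalty, they don't hard-disable a plan shape):

```sql
-- Sketch: disable bitmap scans for this transaction only and re-run the
-- query; if the planner falls back to the ordered index scan, the
-- BitmapAnd plan is what's hurting.
BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE)
SELECT *
FROM async.task
WHERE
  async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t
  AND concurrency_pool = 'xyz'
ORDER BY priority, entered
LIMIT 10;
COMMIT;
```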

Sometimes, there can be even worse plans, running into 10-20 seconds, a miss of several orders of magnitude. I can manage this at the query level by:
* turning off various planner options (e.g. enable_bitmapscan) around the query
* adding faux columns to the table to support forcing index selection in particular cases (CREATE INDEX ON foo WHERE this_case IS NULL....SELECT * FROM foo WHERE this_case IS NULL...)
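The faux-column trick in the second bullet looks roughly like this (the column name is illustrative, not from my actual schema):

```sql
-- Illustrative sketch of the faux-column workaround: a column that is
-- never populated, referenced in both the partial index predicate and
-- the query, so only this exact query shape can match this index.
ALTER TABLE async.task ADD COLUMN force_ready_idx BOOLEAN;  -- always NULL

CREATE INDEX ON async.task (concurrency_pool, priority, entered)
  WHERE async.task_execution_state(task) = 'READY'
    AND force_ready_idx IS NULL;

SELECT *
FROM async.task
WHERE
  async.task_execution_state(task.*) = 'READY'::async.task_execution_state_t
  AND force_ready_idx IS NULL          -- matches only the index above
  AND concurrency_pool = 'xyz'
ORDER BY priority, entered
LIMIT 10;
```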

I'm wondering if there are other tricks that might apply here, for example multi-column (extended) statistics...curious if anyone has thoughts on that.
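For reference, the extended-statistics idea would be something like the following (a sketch only; I haven't verified it helps here, and statistics on expressions require PostgreSQL 14+):

```sql
-- Sketch: extended statistics covering the state expression together
-- with concurrency_pool, so the planner estimates their combined
-- selectivity rather than multiplying independent guesses.
-- Expression statistics require PostgreSQL 14 or later.
CREATE STATISTICS async.task_state_pool_stats
  ON (async.task_execution_state(task)), concurrency_pool
  FROM async.task;

ANALYZE async.task;  -- populate the new statistics
```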

Any suggestions?

merlin
