Re: postgres chooses objectively wrong index - Mailing list pgsql-performance

From Tom Lane
Subject Re: postgres chooses objectively wrong index
Date
Msg-id 574349.1773786268@sss.pgh.pa.us
In response to postgres chooses objectively wrong index  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: postgres chooses objectively wrong index
List pgsql-performance
Merlin Moncure <mmoncure@gmail.com> writes:
> I've been maintaining an airflow style orchestrator in pl/pgsql, and it's
> revealed a performance issue I just can't solve.  There is a table, task,
> which may normally contain billions of rows, but only a tiny portion is
> interesting for specific reasons—a common pattern in task-type systems.
> ...

> Usually, we get a plan that looks like this:

>  Limit  (cost=0.38..39.74 rows=10 width=563) (actual time=0.054..0.054 rows=0 loops=1)
>    ->  Index Scan using task_concurrency_pool_priority_entered_idx on task  (cost=0.38..705.08 rows=179 width=563) (actual time=0.053..0.053 rows=0 loops=1)

> Sometimes, based on a certain data distribution, we get results like this:

> Limit  (cost=25.75..25.78 rows=10 width=563) (actual time=8.909..8.911 rows=0 loops=1)
>   ->  Sort  (cost=25.75..26.20 rows=179 width=563) (actual time=8.908..8.909 rows=0 loops=1)

I think the fundamental problem here is that the planner is estimating
179 matching rows when the true count is 0.  Getting that estimate
down by, say, an order of magnitude would probably fix your issue.
However, if the selectivity is already epsilon (are there really
billions of rows?) it may be hard to get it down to a smaller epsilon.
What statistics target are you using?
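(Editor's note: a minimal sketch of raising the statistics target for one column, in case that helps; the column name `state` and target value are assumptions, not from the thread.)

```sql
-- Hypothetical example: raise the per-column statistics target so ANALYZE
-- samples more of the table and builds a finer-grained histogram/MCV list.
-- The column name "state" and the target 1000 are illustrative.
ALTER TABLE task ALTER COLUMN state SET STATISTICS 1000;
ANALYZE task;

-- Then re-check the planner's row estimate with EXPLAIN on the slow query.
```

The default target is controlled by `default_statistics_target` (100 by default); the per-column setting overrides it for that column only.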

How often do tasks change state?  Could it be reasonable to partition
the task table on state, rather than rely on an index?
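(Editor's note: a hedged sketch of the list-partitioning idea above; the table layout and state values are invented for illustration and do not come from the thread.)

```sql
-- Hypothetical sketch: partition task by state so the "interesting" rows
-- live in their own small partition that the planner can scan cheaply.
-- Column names and state values are assumptions.
CREATE TABLE task (
    task_id   bigint,
    state     text NOT NULL,
    priority  int,
    entered   timestamptz
) PARTITION BY LIST (state);

CREATE TABLE task_pending PARTITION OF task FOR VALUES IN ('pending');
CREATE TABLE task_running PARTITION OF task FOR VALUES IN ('running');
CREATE TABLE task_done    PARTITION OF task FOR VALUES IN ('done');
```

Note that an UPDATE changing `state` physically moves the row between partitions (supported since PostgreSQL 11), so frequent state changes carry that write cost.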

            regards, tom lane


