For example, our "user" table has an id primary key, an "org_id" column and
a "disabled" column. The table has millions of rows, and for each org_id
there are usually only a few hundred rows.
It would be helpful to know precisely how many millions of rows. We know it actually removed 596003 rows from the ordered index scan, but we don't know how many it thought it would need to remove. I reckon it thought it would remove (rows in table) / 837, but I don't know what that division comes out to without knowing the numerator.
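If you don't want to run a count(*) over millions of rows just to get that numerator, the planner's own notion of the table size is in the catalog. A minimal sketch, assuming the table really is named "user":

    -- reltuples is the row count the planner works from; it is refreshed
    -- by VACUUM, ANALYZE, and autovacuum, so it is only an estimate.
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'user' AND relkind = 'r';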
->  Index Scan using user_org_disabled_idx on user  (cost=0.43..3141.43 rows=837 width=236) (actual time=0.049..1.407 rows=166 loops=1)
So this estimate is quite wrong: 837/166 is about 5. Do you know why? This bad estimate makes this plan look 5 times too expensive and the competing one look 5 times too cheap, for a combined ratio of 25. That is more than the current ratio between the two plan cost estimates, so fixing this could drive the difference. (The ratio of actual times is more than 25, so there is more to the problem than just this, but fixing this alone should be enough to drive the correct choice.) So why is this estimate that bad? Is the selectivity estimate of `org_id = 123456` alone that far off, or only when combined with `disabled = false`?
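One quick way to narrow that down is to compare the estimated and actual row counts for each clause on its own and for the two together. A rough sketch, reusing the literal 123456 from your filter:

    EXPLAIN ANALYZE SELECT * FROM "user" WHERE org_id = 123456;
    EXPLAIN ANALYZE SELECT * FROM "user" WHERE org_id = 123456 AND disabled = false;

Comparing estimated rows to actual rows in each plan should show whether the org_id estimate is off by itself or only once disabled = false is added.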
A more robust solution is to add an index on (org_id, disabled, id). That way it can combine the two strategies, jumping to just the part of the index it needs and then reading it already in order. Not only will this be much faster than either of the two plans you show, it will also be more resilient to estimation errors.
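For concreteness, something like the following (the index name is just my invention; CONCURRENTLY avoids blocking writes while it builds on a table that size):

    CREATE INDEX CONCURRENTLY user_org_disabled_id_idx
        ON "user" (org_id, disabled, id);

With equality conditions on org_id and disabled, rows come back from that index already ordered by id, so no separate sort and no filtering of non-matching rows is needed.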
Anyway, these just look like well-known estimation difficulties, nothing that seems like an actual bug. Estimation is hard, and sometimes there is no way to know the correct value to use until after the query is already underway.
Cheers,
Jeff