Home > mailing lists

Re: Odd (slow) plan choice with min/max - Mailing list pgsql-performance

From	Rick Otten
Subject	Re: Odd (slow) plan choice with min/max
Date	March 23, 2021 13:07:11
Msg-id	CAMAYy4JNU6hSJZc-gQZwKqF1Z2z-jH00ySg+u959gbQZ3GJvKw@mail.gmail.com Whole thread
In response to	Re: Odd (slow) plan choice with min/max (Paul McGarry <paul@paulmcgarry.com>)
Responses	Re: Odd (slow) plan choice with min/max
List	pgsql-performance

Tree view

On Tue, Mar 23, 2021 at 2:52 AM Paul McGarry <paul@paulmcgarry.com> wrote:

On Tue, 23 Mar 2021 at 16:13, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Mar 23, 2021 at 03:00:38PM +1100, Paul McGarry wrote:
> I have a query where Postgresql (11.9 at the moment) is making an odd plan
> choice, choosing to use index scans which require filtering out millions of
> rows, rather than "just" doing an aggregate over the rows the where clause
> targets which is much faster.
> AFAICT it isn't a statistics problem, at least increasing the stats target
> and analyzing the table doesn't seem to fix the problem.

> explain analyze select min(risk_id),max(risk_id) from risk where
> time>='2020-01-20 15:00:07+00' and time < '2020-01-21 15:00:08+00';

I'm guessing the time and ID columns are highly correlated...

So the planner thinks it can get the smallest ID by scanning the ID index, but
then ends up rejecting the first 161e6 rows for which the time is too low, and
fails the >= condition.

And thinks it can get the greatest ID by backward scanning the ID idx, but ends
up rejecting/filtering the first 41e6 rows, for which the time is too high,
failing the < condition.

Yes, the columns are highly correlated, but that alone doesn't seem like it should be sufficient criteria to choose this plan.
Ie the selection criteria (1 day of data about a year ago) has a year+ worth of data after it and probably a decade of data before it, so anything walking a correlated index from top or bottom is going to have to walk past a lot of data before it gets to data that fits the criteria.

I assume you have a statistic on the correlated columns, ie `create statistic` ?

If you can't use partitions on your date column, can you use partial indexes instead? Or a functional index with min() over day and max() over day?

pgsql-performance by date:

From: Paul McGarry
Date: 23 March 2021, 06:51:49
Subject: Re: Odd (slow) plan choice with min/max

From: Chris Stephens
Date: 23 March 2021, 15:21:55
Subject: Re: SQL performance issue (postgresql chooses a bad plan when a better one is available)

Re: Odd (slow) plan choice with min/max - Mailing list pgsql-performance

Previous

Next