Home > mailing lists

Re: Bad selectivity estimate when using a sub query to determineWHERE condition - Mailing list pgsql-performance

From	Chris Borckholder
Subject	Re: Bad selectivity estimate when using a sub query to determineWHERE condition
Date	February 12, 2020 08:09:25
Msg-id	CADPUTkTmoCoWnV7rZciY002g=xsDGoxSNpC8kBqWWMJ8L9nD=g@mail.gmail.com Whole thread
In response to	Re: Bad selectivity estimate when using a sub query to determineWHERE condition (Justin Pryzby <pryzby@telsasoft.com>)
List	pgsql-performance

Tree view

Using a column to mark rolled up rows might have been a better choice, but there are unfortunately some regulatory requirements

that require that table to be immutable. I'm not sure about the implications w.r.t. auto vacuum, which is already a consideration for us due to the sheer size of the table.

I'm planning to partition the table as soon as we finish upgrading to v11.

Thanks for your insight!

Best Regards

Chris

On Mon, Feb 10, 2020 at 8:13 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Feb 10, 2020 at 11:34:01AM +0100, Chris Borckholder wrote:
> I have a large table of immutable events that need to be aggregated
> regularly to derive statistics. To improve the performance, that table is
> rolled up every 15minutes, so that online checks can aggregate rolled up
> data and combine it with latest events created after the last roll up.
>
> To implement this a query is executed that selects only events after the
> time of the last rollup.
> That time is determined dynamically based on a log table.

Perhaps that could be done as an indexed column in the large table, rather
than querying a 2nd log table.
Possibly with a partial index on that column: WHERE unprocessed='t'.

> When using a sub select or CTE to get the latest roll up time, the query
> planner fails to recognize that a most of the large table would be filtered
> out by the condition and tries a sequential scan instead of an index scan.
> When using the literal value for the WHERE condition, the plan correctly
> uses an index scan, which is much faster.
>
> I analyzed the involved tables and increased the collected histogram, but
> the query plan did not improve. Is there a way to help the query planner
> recognize this in the dynamic case?

Also, if you used partitioning with pgostgres since v11, then I think most
partitions would be excluded:

https://www.postgresql.org/docs/12/release-12.html
|Allow partition elimination during query execution (David Rowley, Beena Emerson)
|Previously, partition elimination only happened at planning time, meaning many joins and prepared queries could not use partition elimination.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=499be013de65242235ebdde06adb08db887f0ea5

https://www.postgresql.org/about/featurematrix/detail/332/

Justin

pgsql-performance by date:

From: Amol Tarte
Date: 12 February 2020, 05:28:15
Subject: Re: Writing 1100 rows per second

From: Chris Borckholder
Date: 12 February 2020, 08:12:34
Subject: Re: Bad selectivity estimate when using a sub query to determineWHERE condition

Re: Bad selectivity estimate when using a sub query to determineWHERE condition - Mailing list pgsql-performance

Previous

Next