Re: Query Slow After 2018 - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Query Slow After 2018
Date
Msg-id 20180128175110.GA18115@telsasoft.com
Whole thread Raw
In response to Query Slow After 2018  (Nur Agus <nuragus.linux@gmail.com>)
List pgsql-performance
On Mon, Jan 29, 2018 at 12:32:59AM +0700, Nur Agus wrote:
> The following query run in just 9 ms:

>  "distrib_reports"."month" = 1 AND
>  "distrib_reports"."year" = 2017 AND
>  "distrib_reports"."state" = 'SUBMITTED' AND
>  "distrib_report_groups"."distrib_report_group_type_id" =
> '559a5fdc-418d-4494-aebf-80ecf8743d35'

> The explain analyze of the 2 queries are resulting on really different
> query plan, here are the links to depesz:
> 2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
> 2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

> The question is, why the query planner choose such very different path just
> by changing one parameter?

Looks like this badly underestimates its rowcount:

Index Scan using index_distrib_reports_on_year on distrib_reports (cost=0.42..40.62 rows=8 width=32) (actual
time=0.034..50.452rows=17,055 loops=1)
 
    Index Cond: (year = 2018)
    Filter: ((month = 1) AND ((state)::text = 'SUBMITTED'::text))
    Rows Removed by Filter: 1049

Maybe because "if year==2018" then, month=1 does essentialy nothing ..
..but postgres thinks it'll filters out some 90% of the rows.

And possibly the same for SUBMITTED (?)
You should probably use timestamp column rather than integer year+month.

On PG10, you could probably work around it using "CREATE STATISTICS".

> This behaviour is *not-reproducable* on postgres-10. On postgres-10, the
> query plan are consistent, and both have very acceptable time:
> 2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
> 2018 --> --> explain result on postgres-10:
> https://explain.depesz.com/s/Tf5K
..I think default max_parallel_workers_per_gather=3 by chance causes the plan
to be the same.

I think there's still a underestimate rowcount with PG10 (without CREATE
STATISTICS), but it's masked by "rows=1 actual rows=0" roundoff error with
high loop count.

Justin


pgsql-performance by date:

Previous
From: Nur Agus
Date:
Subject: Query Slow After 2018
Next
From: "David G. Johnston"
Date:
Subject: Re: Query Slow After 2018