Thread: Query Slow After 2018

Query Slow After 2018

From
Nur Agus
Date:
Hello,

I would like to report a strange behaviour on postgresql 9.4.4.

The following query run in just 9 ms:

SELECT SUM("distrib_report_items"."qty") AS sum_id
FROM
 "distrib_report_items" INNER JOIN
 "retailers" ON "retailers"."id" = "distrib_report_items"."retailer_id" INNER JOIN
 "distrib_reports" ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id" INNER JOIN
 "distrib_report_groups" ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id"
WHERE
 "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND
 "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND
 "distrib_reports"."month" = 1 AND
 "distrib_reports"."year" = 2017 AND
 "distrib_reports"."state" = 'SUBMITTED' AND
 "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35'

But changing just one parameter (the year) from 2017 to 2018, the "exactly same query", become incredebly slow, at 8 seconds. This is the full query after changing the year:

SELECT SUM("distrib_report_items"."qty") AS sum_id
FROM
 "distrib_report_items" INNER JOIN
 "retailers" ON "retailers"."id" = "distrib_report_items"."retailer_id" INNER JOIN
 "distrib_reports" ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id" INNER JOIN
 "distrib_report_groups" ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id"
WHERE
 "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND
 "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND
 "distrib_reports"."month" = 1 AND
 "distrib_reports"."year" = 2018 AND
 "distrib_reports"."state" = 'SUBMITTED' AND
 "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35'

The explain analyze of the 2 queries are resulting on really different query plan, here are the links to depesz:
2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

The table growth itself are normal. distrib_report_items table are growing from 1.9++ millions row on december 2017 to 2.3++ million rows on january 2018. Not a really significant growth.
The distrib_reports table (on which the year is filtered) has even less rows on 2018 (10k rows) compared to 400.000++ rows on 2017, which is very obvious.

The question is, why the query planner choose such very different path just by changing one parameter?

The table structures are below:
https://pastebin.com/T6AmtQ3z

This behaviour is *not-reproducable* on postgres-10. On postgres-10, the query plan are consistent, and both have very acceptable time:
2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
2018 --> --> explain result on postgres-10: https://explain.depesz.com/s/Tf5K

Is this a bug on postgres-9.4.4 ? 

We are considering upgrade to postgres-10 but since this is a very critical system, it takes a lot of test and approval :)

Thank you very much.

Re: Query Slow After 2018

From
Justin Pryzby
Date:
On Mon, Jan 29, 2018 at 12:32:59AM +0700, Nur Agus wrote:
> The following query run in just 9 ms:

>  "distrib_reports"."month" = 1 AND
>  "distrib_reports"."year" = 2017 AND
>  "distrib_reports"."state" = 'SUBMITTED' AND
>  "distrib_report_groups"."distrib_report_group_type_id" =
> '559a5fdc-418d-4494-aebf-80ecf8743d35'

> The explain analyze of the 2 queries are resulting on really different
> query plan, here are the links to depesz:
> 2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
> 2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

> The question is, why the query planner choose such very different path just
> by changing one parameter?

Looks like this badly underestimates its rowcount:

Index Scan using index_distrib_reports_on_year on distrib_reports (cost=0.42..40.62 rows=8 width=32) (actual
time=0.034..50.452rows=17,055 loops=1)
 
    Index Cond: (year = 2018)
    Filter: ((month = 1) AND ((state)::text = 'SUBMITTED'::text))
    Rows Removed by Filter: 1049

Maybe because "if year==2018" then, month=1 does essentialy nothing ..
..but postgres thinks it'll filters out some 90% of the rows.

And possibly the same for SUBMITTED (?)
You should probably use timestamp column rather than integer year+month.

On PG10, you could probably work around it using "CREATE STATISTICS".

> This behaviour is *not-reproducable* on postgres-10. On postgres-10, the
> query plan are consistent, and both have very acceptable time:
> 2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
> 2018 --> --> explain result on postgres-10:
> https://explain.depesz.com/s/Tf5K
..I think default max_parallel_workers_per_gather=3 by chance causes the plan
to be the same.

I think there's still a underestimate rowcount with PG10 (without CREATE
STATISTICS), but it's masked by "rows=1 actual rows=0" roundoff error with
high loop count.

Justin


Re: Query Slow After 2018

From
"David G. Johnston"
Date:
On Sunday, January 28, 2018, Nur Agus <nuragus.linux@gmail.com> wrote:

Is this a bug on postgres-9.4.4 ? 

We are considering upgrade to postgres-10 but since this is a very critical system, it takes a lot of test and approval :)
 

Upgrade to 9.4.15.  Asking if 9.4.4 might have a bug is a question most people here aren't going to be inclined to answer given the 11 bug fixes that version has received since.

David J.