Query Slow After 2018 - Mailing list pgsql-performance

From Nur Agus
Subject Query Slow After 2018
Date
Msg-id CAM7u+C6gueE1Ry=n+1ErHe-medq6zJiMRzOwFe8S8eywDWpZUA@mail.gmail.com
Whole thread Raw
Responses Re: Query Slow After 2018  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Query Slow After 2018  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-performance
Hello,

I would like to report a strange behaviour on postgresql 9.4.4.

The following query run in just 9 ms:

SELECT SUM("distrib_report_items"."qty") AS sum_id
FROM
 "distrib_report_items" INNER JOIN
 "retailers" ON "retailers"."id" = "distrib_report_items"."retailer_id" INNER JOIN
 "distrib_reports" ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id" INNER JOIN
 "distrib_report_groups" ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id"
WHERE
 "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND
 "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND
 "distrib_reports"."month" = 1 AND
 "distrib_reports"."year" = 2017 AND
 "distrib_reports"."state" = 'SUBMITTED' AND
 "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35'

But changing just one parameter (the year) from 2017 to 2018, the "exactly same query", become incredebly slow, at 8 seconds. This is the full query after changing the year:

SELECT SUM("distrib_report_items"."qty") AS sum_id
FROM
 "distrib_report_items" INNER JOIN
 "retailers" ON "retailers"."id" = "distrib_report_items"."retailer_id" INNER JOIN
 "distrib_reports" ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id" INNER JOIN
 "distrib_report_groups" ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id"
WHERE
 "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND
 "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND
 "distrib_reports"."month" = 1 AND
 "distrib_reports"."year" = 2018 AND
 "distrib_reports"."state" = 'SUBMITTED' AND
 "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35'

The explain analyze of the 2 queries are resulting on really different query plan, here are the links to depesz:
2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

The table growth itself are normal. distrib_report_items table are growing from 1.9++ millions row on december 2017 to 2.3++ million rows on january 2018. Not a really significant growth.
The distrib_reports table (on which the year is filtered) has even less rows on 2018 (10k rows) compared to 400.000++ rows on 2017, which is very obvious.

The question is, why the query planner choose such very different path just by changing one parameter?

The table structures are below:
https://pastebin.com/T6AmtQ3z

This behaviour is *not-reproducable* on postgres-10. On postgres-10, the query plan are consistent, and both have very acceptable time:
2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
2018 --> --> explain result on postgres-10: https://explain.depesz.com/s/Tf5K

Is this a bug on postgres-9.4.4 ? 

We are considering upgrade to postgres-10 but since this is a very critical system, it takes a lot of test and approval :)

Thank you very much.

pgsql-performance by date:

Previous
From: Lars Aksel Opsahl
Date:
Subject: SV: copy csv into partitioned table with unique index
Next
From: Justin Pryzby
Date:
Subject: Re: Query Slow After 2018