Thread: PG Query Planner

PG Query Planner

From
Kenny Bachman
Date:
Hello,

I wonder how the query planner works in postgresql. So, I have a query that takes 6 seconds with an index scan. However, the same query takes 0.1ms when I set disable index scan parameter.

How do I get the planner to make the right decision? Also, I ran ANALYZE command many times.

And I have an index on job_next_process_time  column but did not use it by the planner.

Limit  (cost=0.56..29.04 rows=1 width=695) (actual time=6386.751..6386.753 rows=0 loops=1)
   ->  Nested Loop  (cost=0.56..692987.46 rows=24331 width=695) (actual time=6386.749..6386.751 rows=0 loops=1)
         Join Filter: (subs.offer_id = offer1.id)
         ->  Index Scan using subs_pkey on subscription subs  (cost=0.56..572151.65 rows=8053633 width=695) (actual time=0.008..5554.872 rows=8021769 loops=1)
               Filter: ((NOT job_in_progress) AND (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time zone))
               Rows Removed by Filter: 72039
         ->  Materialize  (cost=0.00..31.31 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=8021769)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1 width=8) (actual time=0.087..0.088 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
 Planning Time: 1.335 ms
 Execution Time: 6386.792 ms

SET enable_indexscan = OFF;

  Limit  (cost=84760.55..84760.55 rows=1 width=695) (actual time=0.092..0.093 rows=0 loops=1)
   ->  Sort  (cost=84760.55..84822.63 rows=24832 width=695) (actual time=0.092..0.092 rows=0 loops=1)
         Sort Key: subs.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=955.54..84636.39 rows=24832 width=695) (actual time=0.090..0.090 rows=0 loops=1)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1 width=8) (actual time=0.089..0.089 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
               ->  Bitmap Heap Scan on subscription subs  (cost=955.54..83681.53 rows=92355 width=695) (never executed)
                     Recheck Cond: (offer_id = offer1.id)
                     Filter: ((NOT job_in_progress) AND (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time zone))
                     ->  Bitmap Index Scan on i_fk_subscription_offer  (cost=0.00..932.45 rows=93029 width=0) (never executed)
                           Index Cond: (offer_id = offer1.id)
 Planning Time: 0.266 ms
 Execution Time: 0.126 ms

Re: PG Query Planner

From
Gaurav Anand
Date:
Looks like your Index has gone wrong, instead of 92355 records it is scanning     8021769 rows which is why it is taking 6s.

Share the sql too

On Tue, 19 Apr 2022 at 11:07 PM, Kenny Bachman <kenny.bachman17@gmail.com> wrote:
Hello,

I wonder how the query planner works in postgresql. So, I have a query that takes 6 seconds with an index scan. However, the same query takes 0.1ms when I set disable index scan parameter.

How do I get the planner to make the right decision? Also, I ran ANALYZE command many times.

And I have an index on job_next_process_time  column but did not use it by the planner.

Limit  (cost=0.56..29.04 rows=1 width=695) (actual time=6386.751..6386.753 rows=0 loops=1)
   ->  Nested Loop  (cost=0.56..692987.46 rows=24331 width=695) (actual time=6386.749..6386.751 rows=0 loops=1)
         Join Filter: (subs.offer_id = offer1.id)
         ->  Index Scan using subs_pkey on subscription subs  (cost=0.56..572151.65 rows=8053633 width=695) (actual time=0.008..5554.872 rows=8021769 loops=1)
               Filter: ((NOT job_in_progress) AND (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time zone))
               Rows Removed by Filter: 72039
         ->  Materialize  (cost=0.00..31.31 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=8021769)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1 width=8) (actual time=0.087..0.088 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
 Planning Time: 1.335 ms
 Execution Time: 6386.792 ms

SET enable_indexscan = OFF;

  Limit  (cost=84760.55..84760.55 rows=1 width=695) (actual time=0.092..0.093 rows=0 loops=1)
   ->  Sort  (cost=84760.55..84822.63 rows=24832 width=695) (actual time=0.092..0.092 rows=0 loops=1)
         Sort Key: subs.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=955.54..84636.39 rows=24832 width=695) (actual time=0.090..0.090 rows=0 loops=1)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1 width=8) (actual time=0.089..0.089 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
               ->  Bitmap Heap Scan on subscription subs  (cost=955.54..83681.53 rows=92355 width=695) (never executed)
                     Recheck Cond: (offer_id = offer1.id)
                     Filter: ((NOT job_in_progress) AND (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time zone))
                     ->  Bitmap Index Scan on i_fk_subscription_offer  (cost=0.00..932.45 rows=93029 width=0) (never executed)
                           Index Cond: (offer_id = offer1.id)
 Planning Time: 0.266 ms
 Execution Time: 0.126 ms
--

Thanks.

Regards,
Gaurav Anand

logo

This communication is confidential and subject to and governed by Saama’s Electronic Communications Disclaimer.

 

Re: PG Query Planner

From
Kenny Bachman
Date:
Hello,

My query is :

SELECT subs.id AS id1_109,
scriber_id AS subs_109
FROM subscription subs
LEFT OUTER JOIN offer offer1
ON subs.offer_id = offer1.id
WHERE offer1.is_external_lifecycle_management = FALSE
AND subs.job_next_process_time < '2022-04-19 09:25:25.535'
AND subs.job_in_progress = FALSE
ORDER BY subs.id ASC LIMIT 1 ;


Gaurav Anand <gaurav.anand@saama.com>, 19 Nis 2022 Sal, 20:45 tarihinde şunu yazdı:
Looks like your Index has gone wrong, instead of 92355 records it is scanning     8021769 rows which is why it is taking 6s.

Share the sql too

On Tue, 19 Apr 2022 at 11:07 PM, Kenny Bachman <kenny.bachman17@gmail.com> wrote:
Hello,

I wonder how the query planner works in postgresql. So, I have a query that takes 6 seconds with an index scan. However, the same query takes 0.1ms when I set disable index scan parameter.

How do I get the planner to make the right decision? Also, I ran ANALYZE command many times.

And I have an index on job_next_process_time  column but did not use it by the planner.

Limit  (cost=0.56..29.04 rows=1 width=695) (actual time=6386.751..6386.753 rows=0 loops=1)
   ->  Nested Loop  (cost=0.56..692987.46 rows=24331 width=695) (actual time=6386.749..6386.751 rows=0 loops=1)
         Join Filter: (subs.offer_id = offer1.id)
         ->  Index Scan using subs_pkey on subscription subs  (cost=0.56..572151.65 rows=8053633 width=695) (actual time=0.008..5554.872 rows=8021769 loops=1)
               Filter: ((NOT job_in_progress) AND (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time zone))
               Rows Removed by Filter: 72039
         ->  Materialize  (cost=0.00..31.31 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=8021769)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1 width=8) (actual time=0.087..0.088 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
 Planning Time: 1.335 ms
 Execution Time: 6386.792 ms

SET enable_indexscan = OFF;

  Limit  (cost=84760.55..84760.55 rows=1 width=695) (actual time=0.092..0.093 rows=0 loops=1)
   ->  Sort  (cost=84760.55..84822.63 rows=24832 width=695) (actual time=0.092..0.092 rows=0 loops=1)
         Sort Key: subs.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=955.54..84636.39 rows=24832 width=695) (actual time=0.090..0.090 rows=0 loops=1)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1 width=8) (actual time=0.089..0.089 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
               ->  Bitmap Heap Scan on subscription subs  (cost=955.54..83681.53 rows=92355 width=695) (never executed)
                     Recheck Cond: (offer_id = offer1.id)
                     Filter: ((NOT job_in_progress) AND (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time zone))
                     ->  Bitmap Index Scan on i_fk_subscription_offer  (cost=0.00..932.45 rows=93029 width=0) (never executed)
                           Index Cond: (offer_id = offer1.id)
 Planning Time: 0.266 ms
 Execution Time: 0.126 ms
--

Thanks.

Regards,
Gaurav Anand

logo

This communication is confidential and subject to and governed by Saama’s Electronic Communications Disclaimer.

 

Re: PG Query Planner

From
"David G. Johnston"
Date:
On Tue, Apr 19, 2022 at 10:57 AM Kenny Bachman <kenny.bachman17@gmail.com> wrote:
SELECT subs.id AS id1_109,
scriber_id AS subs_109
FROM subscription subs
LEFT OUTER JOIN offer offer1
ON subs.offer_id = offer1.id
WHERE offer1.is_external_lifecycle_management = FALSE

You've made the left outer join specification pointless here.

What version?

The fundamental issue is the first query is optimistic - it expects it will find at least one result, potentially fairly quickly, if it just walks the possible subscriptions in order.  But it doesn't ever find a result after scanning the whole thing.

The second query figures it can prove that there are no possible results because the offer1 restriction makes the final result an empty set.

I'm not sure how to rework the query here - though I'd at least clean up the lie that you are doing an outer join even if that doesn't impact the query plan (though it may very well do that).

The statistics seem OK, and if you don't actually make changes to the data, repeatedly running analyze is no better than running it once.

You may find some success using a materialized CTE explicitly to scan offer and then semi-join that result to subscriptions; the empty materialized CTE needed for the semi-join would then enable the short-circuit that is seen in the second query.

The convention on these lists is to inline or bottom post, trimming context.

David J.


Re: PG Query Planner

From
Jeff Janes
Date:
On Tue, Apr 19, 2022 at 1:57 PM Kenny Bachman <kenny.bachman17@gmail.com> wrote:
Hello,

My query is :

SELECT subs.id AS id1_109,
scriber_id AS subs_109
FROM subscription subs
LEFT OUTER JOIN offer offer1
ON subs.offer_id = offer1.id
WHERE offer1.is_external_lifecycle_management = FALSE
AND subs.job_next_process_time < '2022-04-19 09:25:25.535'
AND subs.job_in_progress = FALSE
ORDER BY subs.id ASC LIMIT 1 ;

An easy (if you can control the queries) way to force the faster plan is to prohibit it from using using the index to fulfill the ORDER BY, by changing it to "ORDER BY subs.id+0 ASC LIMIT 1"

As for getting the planner to get a better plan on its own, I don't think there is much hope.  The difference between 0 rows with is_external_lifecycle_management=FALSE, and 1 row meeting that, is the smallest possible difference. Yet still the ratio between them is infinite.  Changing a histogram bin count or adding a multivariate statistic is not going to change that.

Maybe the executor should be smart enough to cut off the nested loop once it sees the Materialize will never return a row.  But that is not a change you can make in user-land.

Cheers,

Jeff