Thread: Interpreting postgres execution plan along with AND/OR precedence

Interpreting postgres execution plan along with AND/OR precedence

From
"Dirschel, Steve"
Date:

Table definition:

 

                                          Table "load.lm_queue"

         Column         |              Type              | Collation | Nullable |         Default

------------------------+--------------------------------+-----------+----------+-------------------------

guid                   | character(33)                  |           | not null |

host_name              | character varying(40)          |           |          |

priority               | numeric(11,0)                  |           | not null |

request_time           | timestamp(6) without time zone |           | not null |

collection_name        | character varying(40)          |           | not null |

stage_id               | numeric(11,0)                  |           |          |

source_file            | character varying(250)         |           |          |

lm_id                  | numeric(11,0)                  |           |          |

start_time             | timestamp(6) without time zone |           |          |

status_text            | character varying(225)         |           | not null | NULL::character varying

load_data_id           | character varying(500)         |           |          |

docs_in_load           | numeric(11,0)                  |           |          |

client_name            | character varying(50)          |           |          |

status_code            | numeric(11,0)                  |           |          |

email_address          | character varying(2000)        |           |          |

hold_flag              | character(1)                   |           |          |

process_type           | character varying(40)          |           |          |

cancel_flag            | character(1)                   |           |          |

file_type              | character varying(6)           |           |          |

lm_data                | character varying(4000)        |           |          |

ds_request_time        | timestamp(6) without time zone |           |          |

ds_id                  | numeric(11,0)                  |           |          |

ds_start_time          | timestamp(6) without time zone |           |          |

auto_promote_flag      | character(1)                   |           |          |

extract_out_file       | character varying(250)         |           |          |

last_upd_time          | timestamp(6) without time zone |           |          |

ds_fail_count          | numeric(11,0)                  |           |          |

cc_collection          | character varying(40)          |           |          |

cc_environment         | character varying(40)          |           |          |

cc_fail_on_db_mismatch | character(1)                   |           |          |

cc_tracking_guid       | character varying(33)          |           |          |

cc_numrows             | character varying(50)          |           |          |

cc_owner               | character varying(30)          |           |          |

cc_password            | character varying(30)          |           |          |

parent_guid            | character varying(33)          |           |          |

Indexes:

    "xpklm_queue" PRIMARY KEY, btree (guid)

    "idx_hn_cn_dsid_sc_dst_dfc" btree (host_name, collection_name, ds_id, status_code, ds_start_time, ds_fail_count)

    "ix_lm_cc" btree (collection_name, client_name)

    "ix_lm_chl" btree (client_name, host_name, lm_id)

 

 

Query and plan:

 

explain

SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, PROCESS_TYPE, PARENT_GUID

FROM LOAD.LM_QUEUE lmq1

WHERE CLIENT_NAME='WLCASES'

AND HOLD_FLAG='Y'

AND HOST_NAME='WLLOADB'

AND STATUS_CODE in (1)

AND NOT EXISTS

(SELECT COLLECTION_NAME

   FROM LOAD.LM_QUEUE lmq2

  WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME

    AND LM_ID <> 0

    AND PROCESS_TYPE NOT IN('EXTRACT'))

OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')

   AND LM_ID = 0

   AND CLIENT_NAME='WLCASES'

   AND HOST_NAME= 'WLLOADB' )

ORDER BY PRIORITY DESC, REQUEST_TIME ASC;

 

 

Sort  (cost=1578.99..1579.00 rows=1 width=120)

   Sort Key: lmq1.priority DESC, lmq1.request_time

   ->  Index Scan using ix_lm_chl on lm_queue lmq1  (cost=0.40..1578.98 rows=1 width=120)

         Index Cond: (((client_name)::text = 'WLCASES'::text) AND ((host_name)::text = 'WLLOADB'::text))

         Filter: (((hold_flag = 'Y'::bpchar) AND (status_code = '1'::numeric) AND (NOT (alternatives: SubPlan 1 or hashed SubPlan 2))) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric)))

         SubPlan 1

           ->  Index Scan using ix_lm_cc on lm_queue lmq2  (cost=0.40..177.93 rows=1 width=0)

                 Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)

                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

         SubPlan 2

           ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 width=32)

                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

 

Question on the plan above:

I understand SubPlan 1 above-  it is joining into the NOT EXISTS via the lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME and then applying the other filtering inside the NOT EXISTS.  But I don’t understand SubPlan 2.  Given the filter conditions under SubPlan 2 it is also coming from the NOT EXISTS because that is where LM_ID <> 0 and PROCESS_TYPE <> EXTRACT exist but I don’t understand the scenario where this query would need to use SubPlan 2.  Would anyone be able to explain under what condition(s) SubPlan 2 would get executed?

 

I’m trying to understand the precedence of AND/OR operations when everything is not tied together with ()’s.  I see the following filtering in the query:

 

  1. CLIENT_NAME='WLCASES'
  2. AND HOLD_FLAG='Y'
  3. AND HOST_NAME='WLLOADB'
  4. AND STATUS_CODE in (1)
  5. AND NOT EXISTS

(SELECT COLLECTION_NAME

    FROM LOAD.LM_QUEUE lmq2

 WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME

      AND LM_ID <> 0

      AND PROCESS_TYPE NOT IN('EXTRACT'))

  1. OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')

   AND LM_ID = 0

   AND CLIENT_NAME='WLCASES'

   AND HOST_NAME= 'WLLOADB' )

 

All of the conditions in #6 above go together in that OR clause because of the ().

 

According to this link:

 

https://www.postgresql.org/docs/14/sql-syntax-lexical.html#SQL-PRECEDENCE

 

AND has higher precedence over OR.  So in my mind that would imply the order of the filters in the WHERE clause shouldn’t matter-  if there are multiple AND filters and OR filters regardless of the order in the FROM clause the AND’s are applied first.

 

But if I change the order of the filters-  moved AND HOLD_FLAG='Y' AND HOST_NAME='WLLOADB' AND STATUS_CODE in (1) to the bottom of the query:

 

explain

SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, PROCESS_TYPE, PARENT_GUID

FROM LOAD.LM_QUEUE lmq1

WHERE CLIENT_NAME='WLCASES'

AND NOT EXISTS

(SELECT COLLECTION_NAME

   FROM LOAD.LM_QUEUE lmq2

  WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME

    AND LM_ID <> 0

    AND PROCESS_TYPE NOT IN('EXTRACT'))

OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')

   AND LM_ID = 0

   AND CLIENT_NAME='WLCASES'

   AND HOST_NAME= 'WLLOADB' )

AND HOLD_FLAG='Y'

AND HOST_NAME='WLLOADB'

AND STATUS_CODE in (1)

ORDER BY PRIORITY DESC, REQUEST_TIME ASC;

 

The plan changes:

 

Sort  (cost=9382.94..9382.97 rows=12 width=169)

   Sort Key: lmq1.priority DESC, lmq1.request_time

   ->  Bitmap Heap Scan on lm_queue lmq1  (cost=4572.59..9382.73 rows=12 width=169)

         Recheck Cond: ((client_name)::text = 'WLCASES'::text)

         Filter: ((NOT (alternatives: SubPlan 1 or hashed SubPlan 2)) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric) AND ((host_name)::text = 'WLLOADB'::text) AND (hold_flag = 'Y'::bpchar) AND ((host_name)::text = 'WLLOADB'::text) AND (status_code = '1'::numeric)))

         ->  Bitmap Index Scan on ix_lm_chl  (cost=0.00..4572.58 rows=25 width=0)

               Index Cond: ((client_name)::text = 'WLCASES'::text)

         SubPlan 1

           ->  Bitmap Heap Scan on lm_queue lmq2  (cost=164.44..188.42 rows=1 width=0)

                 Recheck Cond: ((lmq1.collection_name)::text = (collection_name)::text)

                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

                 ->  Bitmap Index Scan on ix_lm_cc  (cost=0.00..164.44 rows=6 width=0)

                       Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)

         SubPlan 2

           ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.49 rows=25 width=32)

                 Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

 

In the original plan above I believe the query drives off index ix_lm_chl applying both client_name = WLCASES and host_name = WLLOADB to the index cond.  In the plan directly above I believe it also drives off index ix_lm_chl but it is only applying client_name = WLCASES to the index cond.

 

If AND’s are applied first then why wouldn’t the modified query apply both client_name = WLCASES and host_name = WLLOADB to the index cond?  Is it because those are moved below the OR condition?

 

Thanks in advance.

Steve

This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html

Re: Interpreting postgres execution plan along with AND/OR precedence

From
Tom Lane
Date:
"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes:
>          Filter: ... (alternatives: SubPlan 1 or hashed SubPlan 2) ...
>          SubPlan 1
>            ->  Index Scan using ix_lm_cc on lm_queue lmq2  (cost=0.40..177.93 rows=1 width=0)
>                  Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
>                  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
>          SubPlan 2
>            ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 width=32)
>                  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

> I understand SubPlan 1 above- it is joining into the NOT EXISTS via the
> lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME and then applying the other
> filtering inside the NOT EXISTS.  But I don't understand SubPlan 2.
> Given the filter conditions under SubPlan 2 it is also coming from the
> NOT EXISTS because that is where LM_ID <> 0 and PROCESS_TYPE <> EXTRACT
> exist but I don't understand the scenario where this query would need to
> use SubPlan 2.  Would anyone be able to explain under what condition(s)
> SubPlan 2 would get executed?

The key is the "alternatives:" notation.  SubPlan 1 can be used in a
"retail" fashion by invoking it once per outer row, passing a new
value of lmq1.collection_name each time.  SubPlan 2 is meant to be
invoked just once, and its output (ie, all the relevant values of
lmq2.collection_name) will be loaded into an in-memory hash table
which is then probed for each outer row.  At the point where these
subplans are made, we don't have enough information about how many
outer rows there will be to decide which way is better, so we create
both subplans and postpone the decision till execution.  That's all
just related to the EXISTS clause, though.

(Since v14 we don't do it like that anymore, so that this confusing
EXPLAIN notation is gone.)

> I'm trying to understand the precedence of AND/OR operations when
> everything is not tied together with ()'s.

The OR is lower priority than all the ANDs, so yeah moving some
clauses to be after the OR would change the semantics.  I think
you probably need some more parentheses here; it's not clear
exactly what semantics you are after.

            regards, tom lane



Re: Interpreting postgres execution plan along with AND/OR precedence

From
Erik Wienhold
Date:
> On 18/01/2023 18:56 CET Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes:
> >          Filter: ... (alternatives: SubPlan 1 or hashed SubPlan 2) ...
> >          SubPlan 1
> >            ->  Index Scan using ix_lm_cc on lm_queue lmq2  (cost=0.40..177.93 rows=1 width=0)
> >                  Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
> >                  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
> >          SubPlan 2
> >            ->  Seq Scan on lm_queue lmq2_1  (cost=0.00..124999.06 rows=12 width=32)
> >                  Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
> 
> > I understand SubPlan 1 above- it is joining into the NOT EXISTS via the
> > lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME and then applying the other
> > filtering inside the NOT EXISTS.  But I don't understand SubPlan 2.
> > Given the filter conditions under SubPlan 2 it is also coming from the
> > NOT EXISTS because that is where LM_ID <> 0 and PROCESS_TYPE <> EXTRACT
> > exist but I don't understand the scenario where this query would need to
> > use SubPlan 2.  Would anyone be able to explain under what condition(s)
> > SubPlan 2 would get executed?
> 
> The key is the "alternatives:" notation.  SubPlan 1 can be used in a
> "retail" fashion by invoking it once per outer row, passing a new
> value of lmq1.collection_name each time.  SubPlan 2 is meant to be
> invoked just once, and its output (ie, all the relevant values of
> lmq2.collection_name) will be loaded into an in-memory hash table
> which is then probed for each outer row.  At the point where these
> subplans are made, we don't have enough information about how many
> outer rows there will be to decide which way is better, so we create
> both subplans and postpone the decision till execution.  That's all
> just related to the EXISTS clause, though.
>
> (Since v14 we don't do it like that anymore, so that this confusing
> EXPLAIN notation is gone.)

EXPLAIN ANALYZE shows which subplan gets executed.  Look for "never executed".

--
Erik



Re: Interpreting postgres execution plan along with AND/OR precedence

From
Erik Wienhold
Date:
> On 18/01/2023 18:03 CET Dirschel, Steve <steve.dirschel@thomsonreuters.com> wrote:
>
> The plan changes:
>
> Sort (cost=9382.94..9382.97 rows=12 width=169)
> Sort Key: lmq1.priority DESC, lmq1.request_time
> -> Bitmap Heap Scan on lm_queue lmq1 (cost=4572.59..9382.73 rows=12 width=169)
> Recheck Cond: ((client_name)::text = 'WLCASES'::text)
> Filter: ((NOT (alternatives: SubPlan 1 or hashed SubPlan 2)) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK
RERUN"}'::text[]))AND (lm_id = '0'::numeric) AND ((host_name)::text = 'WLLOADB'::text) AND (hold_flag = 'Y'::bpchar)
AND((host_name)::text = 'WLLOADB'::text) AND (status_code = '1'::numeric))) 
> -> Bitmap Index Scan on ix_lm_chl (cost=0.00..4572.58 rows=25 width=0)
> Index Cond: ((client_name)::text = 'WLCASES'::text)
> SubPlan 1
> -> Bitmap Heap Scan on lm_queue lmq2 (cost=164.44..188.42 rows=1 width=0)
> Recheck Cond: ((lmq1.collection_name)::text = (collection_name)::text)
> Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
> -> Bitmap Index Scan on ix_lm_cc (cost=0.00..164.44 rows=6 width=0)
> Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
> SubPlan 2
> -> Seq Scan on lm_queue lmq2_1 (cost=0.00..124999.49 rows=25 width=32)
> Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
>
> In the original plan above I believe the query drives off index ix_lm_chl
> applying both client_name = WLCASES and host_name = WLLOADB to the index cond.
> In the plan directly above I believe it also drives off index ix_lm_chl but it
> is only applying client_name = WLCASES to the index cond.
>
> If AND’s are applied first then why wouldn’t the modified query apply both
> client_name = WLCASES and host_name = WLLOADB to the index cond? Is it because
> those are moved below the OR condition?

Yes because those two conditions are combined with OR and that's handled by the
bitmap scans in your second execution plan.  See also:
https://www.postgresql.org/docs/14/indexes-bitmap-scans.html

--
Erik