Thread: Query performance going from Oracle to Postgres

Query performance going from Oracle to Postgres

From

"Dirschel, Steve"

Date:

06 September 2023, 20:06:58

We are in the process of converting from Oracle to Postgres and I have a query that is using the same index in Postgres as is used in Oracle but in Postgres the query does 16x more buffer/logical reads. I’d like to understand why. The query is hitting a partitioned table but to simply things I changed it to hit a single partition. This partition has the same number of rows in Oracle as in Postgres.

Here is the Postgres query, partition definition, execution plan. I will also include the Oracle execution plan below in case anyone is interested.

explain (analyze, buffers)

select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_ from hist28.history_event_display_timestamp_20230301 historyeve0_ where historyeve0_.IS_DELETED=0

and historyeve0_.PRISM_GUID='i0accd6a20000018405f095ee669dc5b4'

and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW'

and (historyeve0_.EVENT_TYPE not in ('versionsSearchWithinQueryEvent','notesOfDecisionsSearchWithinQueryEvent','citingReferencesSearchWithinQueryEvent','tocSearchWithinQueryEvent','searchWithinDocumentEvent','coCitesSearchWithinQueryEvent'))

and (historyeve0_.PRODUCT_VIEW in ('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA')

or historyeve0_.PRODUCT_VIEW is null)

and historyeve0_.CLIENT_ID='WLNCE_VNJXL7'

and (historyeve0_.DISPLAY_TIMESTAMP between '2022-03-01 00:00:00' and '2023-03-01 23:59:59.999');

Aggregate (cost=56.64..56.65 rows=1 width=8) (actual time=0.930..0.931 rows=1 loops=1)

Buffers: shared hit=341

-> Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on history_event_display_timestamp_20230301 historyeve0_ (cost=0.42..56.64 rows=1 width=33) (actual time=0.034..0.897 rows=332 loops=1)

Index Cond: (((prism_guid)::text = 'i0accd6a20000018405f095ee669dc5b4'::text) AND (display_timestamp >= '2022-03-01 00:00:00'::timestamp without time zone) AND (display_timestamp <= '2023-03-01 23:59:59.999'::timestamp without time zone) AND ((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND (is_deleted = '0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text))

Filter: (((event_type)::text <> ALL ('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::text[])) AND (((product_view)::text = ANY ('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[])) OR (product_view IS NULL)))

Buffers: shared hit=341

Planning:

Buffers: shared hit=6

Planning Time: 0.266 ms

Execution Time: 0.965 ms

(10 rows)

*** 341 logical reads to find 332 rows.

Oracle will find the same 332 rows using the same index but in Oracle it only does 20 logical reads. I thought maybe the index was fragmented so I reindexed that index:

reindex index hist28.history_event_display_timesta_prism_guid_display_timestamp_idx1;

Plan after that:

Aggregate (cost=40.64..40.65 rows=1 width=8) (actual time=0.707..0.708 rows=1 loops=1)

Buffers: shared hit=328

-> Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on history_event_display_timestamp_20230301 historyeve0_ (cost=0.42..40.64 rows=1 width=33) (actual time=0.032..0.683 rows=332 loops=1)

Buffers: shared hit=328

Planning:

Buffers: shared hit=27

Planning Time: 0.321 ms

Execution Time: 0.741 ms

(10 rows)

Shared hit came down a little but is still 16x more than Oracle. The actual query will hit 12 partitions so the extra amount of work it does in Postgres adds up (note some data points will find significantly more rows than 332 so performance is much worse than shown here but this data point is used just to show the difference between Oracle and Postgres). The interesting part it is seems Postgres is doing 1 shared hit per row. I don’t know anyone on this list knows Postgres’s internal index implementations vs Oracle’s but is Postgres’s Index Scan operation significantly different than Oracle’s index range scan? There is something implemented much less efficiently here vs Oracle and I don’t know what I can do to get this query to perform more closer to Oracle.

Thanks in advance.

Here is the partition definition in Postgres:

\d hist28.history_event_display_timestamp_20230301

Table "hist28.history_event_display_timestamp_20230301"

Column | Type | Collation | Nullable | Default

------------------------+--------------------------------+-----------+----------+-----------------------------------------

display_timestamp | timestamp(6) without time zone | | not null |

history_event_sid | character varying(32) | | not null |

event_type | character varying(50) | | not null |

event_sub_type | character varying(100) | | |

content_type | character varying(100) | | |

related_entity_type | character varying(50) | | |

related_entity_id | character varying(256) | | |

prism_guid | character varying(50) | | not null |

client_id | character varying(250) | | |

matter_id | character varying(50) | | |

delimiter | character varying(20) | | |

session_id | character varying(35) | | not null |

is_deleted | numeric(1,0) | | not null | 0

created | timestamp(6) without time zone | | not null |

changed | timestamp(6) without time zone | | not null |

cobalt_event_timestamp | timestamp(6) without time zone | | |

product_sid | character varying(32) | | not null |

cobalt_event_id | character varying(256) | | |

unique_lookup_key | character varying(200) | | not null |

display_name | character varying(3000) | | not null |

event_payload | text | | |

sub_content_type | character varying(250) | | |

product_view | character varying(64) | | | 'DefaultProductView'::character varying

Partition of: hist28.history_event FOR VALUES FROM ('2023-02-01 00:00:00') TO ('2023-03-01 00:00:00')

Indexes:

"history_event_display_timestamp_20230301_pkey" PRIMARY KEY, btree (history_event_sid, display_timestamp)

"history_event_display_timest_prism_guid_display_timestamp_idx39" btree (prism_guid, display_timestamp, content_type, event_type, product_sid, is_deleted, related_entity_id)

"history_event_display_timesta_prism_guid_display_timestamp_idx1" UNIQUE, btree (prism_guid, display_timestamp, unique_lookup_key, event_type, session_id, related_entity_id, product_sid, is_deleted, client_

id, product_view)

"history_event_display_timesta_prism_guid_is_deleted_changed_idx" btree (prism_guid, is_deleted, changed)

"history_event_display_timestamp_20230301_changed_idx" btree (changed)

The index the query is using- history_event_display_timesta_prism_guid_display_timestamp_idx1- it is the same index being used in Oracle, just named differently in Oracle.

Here is the Oracle plan and execution statistics hitting the same partition:

select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_ from hist28.HISTORY_EVENT partition(DISPLAY_TIMESTAMP_20230301) historyeve0_ where historyeve0_.IS_DELETED=0

and historyeve0_.PRISM_GUID='i0accd6a20000018405f095ee669dc5b4'

and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW'

and (historyeve0_.PRODUCT_VIEW in ('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA')

or historyeve0_.PRODUCT_VIEW is null)

and historyeve0_.CLIENT_ID='WLNCE_VNJXL7'

and (historyeve0_.DISPLAY_TIMESTAMP between to_timestamp('2022-03-01 00:00:00','YYYY-MM-DD HH24:MI:SS') and to_timestamp('2023-03-01 23:59:59','YYYY-MM-DD HH24:MI:SS'));

Plan hash value: 3361538278

----------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 20 |

| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 20 |

|* 2 | FILTER | | 1 | | 332 |00:00:00.01 | 20 |

| 3 | PARTITION RANGE SINGLE| | 1 | 1 | 332 |00:00:00.01 | 20 |

|* 4 | INDEX RANGE SCAN | HISTORY_EVENT_IDX02 | 1 | 1 | 332 |00:00:00.01 | 20 |

----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter(TO_TIMESTAMP(:SYS_B_16,:SYS_B_17)<=TO_TIMESTAMP(:SYS_B_18,:SYS_B_19))

4 - access("HISTORYEVE0_"."PRISM_GUID"=:SYS_B_01 AND

"HISTORYEVE0_"."DISPLAY_TIMESTAMP">=TO_TIMESTAMP(:SYS_B_16,:SYS_B_17) AND

"HISTORYEVE0_"."PRODUCT_SID"=:SYS_B_02 AND "HISTORYEVE0_"."IS_DELETED"=:SYS_B_00 AND

"HISTORYEVE0_"."CLIENT_ID"=:SYS_B_15 AND "HISTORYEVE0_"."DISPLAY_TIMESTAMP"<=TO_TIMESTAMP(:SYS_B_1

8,:SYS_B_19))

filter(("HISTORYEVE0_"."CLIENT_ID"=:SYS_B_15 AND "HISTORYEVE0_"."PRODUCT_SID"=:SYS_B_02

AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_03 AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_04 AND

"HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_05 AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_06 AND

"HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_07 AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_08 AND

"HISTORYEVE0_"."IS_DELETED"=:SYS_B_00 AND (INTERNAL_FUNCTION("HISTORYEVE0_"."PRODUCT_VIEW") OR

"HISTORYEVE0_"."PRODUCT_VIEW" IS NULL)))

It finds the same 332 rows but it does only 20 logical reads. The Oracle “access” step and the Postgres “Index Cond” columns used for index access appears to be the same for Postgres and Oracle.

Thanks

This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html

Re: Query performance going from Oracle to Postgres

From

Laurenz Albe

Date:

06 September 2023, 21:14:37

On Wed, 2023-09-06 at 20:06 +0000, Dirschel, Steve wrote:
> We are in the process of converting from Oracle to Postgres and I have a query that is using
> the same index in Postgres as is used in Oracle but in Postgres the query does 16x more
> buffer/logical reads.  I’d like to understand why.
>  
> explain (analyze, buffers)
> select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_
> from hist28.history_event_display_timestamp_20230301 historyeve0_ where historyeve0_.IS_DELETED=0
> and historyeve0_.PRISM_GUID='i0accd6a20000018405f095ee669dc5b4'
> and historyeve0_.PRODUCT_SID='CARSWELL.WESTLAW'
> and (historyeve0_.EVENT_TYPE not in 
>
('versionsSearchWithinQueryEvent','notesOfDecisionsSearchWithinQueryEvent','citingReferencesSearchWithinQueryEvent','tocSearchWithinQueryEvent','searchWithinDocumentEvent','coCitesSearchWithinQueryE
> vent'))
> and (historyeve0_.PRODUCT_VIEW in ('DefaultProductView','TNPPlus','PLCUS','Indigo','IndigoPremium','INDIGOCA')
> or historyeve0_.PRODUCT_VIEW is null)
> and historyeve0_.CLIENT_ID='WLNCE_VNJXL7'
> and (historyeve0_.DISPLAY_TIMESTAMP between '2022-03-01 00:00:00' and '2023-03-01 23:59:59.999');
>  
> Aggregate  (cost=56.64..56.65 rows=1 width=8) (actual time=0.930..0.931 rows=1 loops=1)
>    Buffers:shared hit=341
>    ->  Index Scan using history_event_display_timesta_prism_guid_display_timestamp_idx1 on
history_event_display_timestamp_20230301historyeve0_  (cost=0.42..56.64 rows=1 width=33) (actual 
> time=0.034..0.897 rows=332 loops=1)
>          Index Cond: (((prism_guid)::text = 'i0accd6a20000018405f095ee669dc5b4'::text) AND (display_timestamp >=
'2022-03-0100:00:00'::timestamp without time zone) AND (display_timestamp <= '2023- 
> 03-01 23:59:59.999'::timestamp without time zone) AND ((product_sid)::text = 'CARSWELL.WESTLAW'::text) AND
(is_deleted= '0'::numeric) AND ((client_id)::text = 'WLNCE_VNJXL7'::text)) 
>          Filter: (((event_type)::text <> ALL
>
('{versionsSearchWithinQueryEvent,notesOfDecisionsSearchWithinQueryEvent,citingReferencesSearchWithinQueryEvent,tocSearchWithinQueryEvent,searchWithinDocumentEvent,coCitesSearchWithinQueryEvent}'::t
> ext[])) AND (((product_view)::text = ANY
('{DefaultProductView,TNPPlus,PLCUS,Indigo,IndigoPremium,INDIGOCA}'::text[]))OR (product_view IS NULL))) 
>          Buffers: shared hit=341
> Planning:
>    Buffers: shared hit=6
> Planning Time: 0.266 ms
> Execution Time: 0.965 ms
> (10 rows)
>  
> *** 341 logical reads to find 332 rows.
>  
> Plan hash value: 3361538278
>  
> ----------------------------------------------------------------------------------------------------------
> | Id  | Operation                | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
> ----------------------------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT         |                     |      1 |        |      1 |00:00:00.01 |     20 |
> |   1 |  SORT AGGREGATE          |                     |      1 |      1 |      1 |00:00:00.01 |      20 |
> |*  2 |   FILTER                 |                     |      1 |        |   332 |00:00:00.01 |      20 |
> |   3 |    PARTITION RANGE SINGLE|                     |      1 |      1 |    332 |00:00:00.01 |      20 |
> |*  4 |     INDEX RANGE SCAN     |HISTORY_EVENT_IDX02 |      1 |      1 |    332 |00:00:00.01 |      20 |
> ----------------------------------------------------------------------------------------------------------
>  
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>  
>    2 - filter(TO_TIMESTAMP(:SYS_B_16,:SYS_B_17)<=TO_TIMESTAMP(:SYS_B_18,:SYS_B_19))
>    4 - access("HISTORYEVE0_"."PRISM_GUID"=:SYS_B_01 AND
>               "HISTORYEVE0_"."DISPLAY_TIMESTAMP">=TO_TIMESTAMP(:SYS_B_16,:SYS_B_17) AND
>               "HISTORYEVE0_"."PRODUCT_SID"=:SYS_B_02 AND "HISTORYEVE0_"."IS_DELETED"=:SYS_B_00 AND
>               "HISTORYEVE0_"."CLIENT_ID"=:SYS_B_15 AND "HISTORYEVE0_"."DISPLAY_TIMESTAMP"<=TO_TIMESTAMP(:SYS_B_1
>               8,:SYS_B_19))
>        filter(("HISTORYEVE0_"."CLIENT_ID"=:SYS_B_15 AND "HISTORYEVE0_"."PRODUCT_SID"=:SYS_B_02
>               AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_03 AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_04 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_05 AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_06 AND
>               "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_07 AND "HISTORYEVE0_"."EVENT_TYPE"<>:SYS_B_08 AND
>               "HISTORYEVE0_"."IS_DELETED"=:SYS_B_00 AND (INTERNAL_FUNCTION("HISTORYEVE0_"."PRODUCT_VIEW") OR
>               "HISTORYEVE0_"."PRODUCT_VIEW" IS NULL)))
>  
> It finds the same 332 rows but it does only 20 logical reads.

In PostgreSQL, the 332 matching rows seem to me stored mostly on different pages, while they are
better clustered in your Oracle table.

If it is really very important for you, and the 57 milliseconds for the index scan is too much,
you can reorganize the table with

 CLUSTER hist28.history_event_display_timestamp_20230301 USING
history_event_display_timesta_prism_guid_display_timestamp_idx1;

That should reduce the number of pages read.

Yours,
Laurenz Albe

Re: Query performance going from Oracle to Postgres

From

David Rowley

Date:

07 September 2023, 02:28:17

On Thu, 7 Sept 2023 at 11:14, Dirschel, Steve
<steve.dirschel@thomsonreuters.com> wrote:
> select count(historyeve0_.HISTORY_EVENT_SID) as col_0_0_ from hist28.history_event_display_timestamp_20230301
historyeve0_where historyeve0_.IS_DELETED=0

> history_event_sid      | character varying(32)          |           | not null |

Do you get an Index Only Scan if you use count(*) instead of
count(historyeve0_.HISTORY_EVENT_SID)?

It seems that column is the only one missing from the index and that
might be the only thing that's stopping PostgreSQL using an Index Only
Scan. Perhaps Oracle realises that the column is NOT NULL so can do
count(*) instead of counting the non-nulls from that column.  That may
allow it to Index Only Scan? I'm not versed in reading Oracle plans so
don't know if it's possible to tell.

David

Re: Query performance going from Oracle to Postgres

From

Peter Geoghegan

Date:

07 September 2023, 03:07:47

On Wed, Sep 6, 2023 at 1:07 PM Dirschel, Steve
<steve.dirschel@thomsonreuters.com> wrote:
> Oracle will find the same 332 rows using the same index but in Oracle it only does 20 logical reads.  I thought maybe
theindex was fragmented so I reindexed that index: 

It seems likely that the problem here is that some of the predicates
appear as so-called "Filter:" conditions, as opposed to true index
quals. For reasons that aren't particularly fundamental, Postgres
B-Trees cannot push down != (or <>) to the index level. Strangely
enough, this is possible with IS NOT NULL. I'm working to improve
things in this area.

That difference is easy to see in the following example. The first
query does far fewer buffer accesses than the. second query, despite
being very similar:

regression=# explain (analyze, buffers) select * from multi_test where
a = 1 and b is not null;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using multi_test_idx on multi_test  (cost=0.29..50.10
rows=2191 width=8) (actual time=0.095..0.100 rows=14 loops=1)
   Index Cond: ((a = 1) AND (b IS NOT NULL))
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning Time: 0.056 ms
 Execution Time: 0.145 ms
(6 rows)

regression=# explain (analyze, buffers) select * from multi_test where
a = 1 and b != 42;
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using multi_test_idx on multi_test
(cost=0.29..222.57 rows=2191 width=8) (actual time=0.087..2.982
rows=14 loops=1)
   Index Cond: (a = 1)
   Filter: (b <> 42)
   Rows Removed by Filter: 10000
   Heap Fetches: 0
   Buffers: shared hit=11
 Planning Time: 0.076 ms
 Execution Time: 3.204 ms
(8 rows)

(There are lots of index tuples matching (a, b) = "(1, NULL)" here, you see.)

--
Peter Geoghegan

Re: Query performance going from Oracle to Postgres

From

David Rowley

Date:

07 September 2023, 10:48:06

On Thu, 7 Sept 2023 at 19:17, Peter Geoghegan <pg@bowt.ie> wrote:
> It seems likely that the problem here is that some of the predicates
> appear as so-called "Filter:" conditions, as opposed to true index
> quals.

hmm, if that were true we'd see "Rows Removed by Filter" in the
explain analyze.

I think all that's going on is that each tuple is on a different page
and the heap accesses are just causing many buffers to be accessed. It
seems to me that an IOS would likely fix that problem which is why I
suggested count(*) instead of
count(<not_null_column_thats_the_only_col_used_thats_not_in_the_index>)

David

Re: Query performance going from Oracle to Postgres

From

Peter Geoghegan

Date:

08 September 2023, 01:22:21

On Thu, Sep 7, 2023 at 3:48 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 7 Sept 2023 at 19:17, Peter Geoghegan <pg@bowt.ie> wrote:
> > It seems likely that the problem here is that some of the predicates
> > appear as so-called "Filter:" conditions, as opposed to true index
> > quals.
>
> hmm, if that were true we'd see "Rows Removed by Filter" in the
> explain analyze.

That's why I hedged, with "seems likely". The problem with using
filter conditions rather than true index quals isn't limited to the
problem of extra heap accesses. It happened to be convenient to make
my point that way, but that isn't particularly fundamental here.

I deliberately chose to make my example involve an index-only scan
(that doesn't require any heap accesses) for this reason.

> I think all that's going on is that each tuple is on a different page
> and the heap accesses are just causing many buffers to be accessed.

This index is an absolute monstrosity. I find it far easier to believe
that the real explanation is the one that Steve intuited: that there
is an issue with the way that the physical data structures (which are
more or less comparable in both systems) are accessed in Postgres.

The index in question
("history_event_display_timesta_prism_guid_display_timestamp_idx1")
has certain columns that are omitted from the query. These columns
nevertheless appear between other columns that the query filters on.
The leading two columns ("prism_guid" and "display_timestamp") are
made into index quals by Postgres, but the next index column after
those two ("unique_lookup_key") is omitted by the query, and so isn't an index
qual. In fact *four* columns are omitted after that one. But, the very-low-order
index column "product_sid" *does* appear in the query, and so also
appears as Postgres index quals.

There is every chance that the true underlying explanation is that
Oracle is able to skip over significant parts of the index structure
dynamically. In fact I'm practically certain that that's the case,
since the "product_sid" column appears as an "access predicate", rather
than as a "filter predicate". These terms are explained here:

https://use-the-index-luke.com/sql/explain-plan/oracle/filter-predicates

https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates

How could "product_sid" be used as an "access predicate" given the
omitted index columns? It seems very likely that parts of the index
can be skipped in Oracle, but not in Postgres -- at least not yet.

Like Markus Winand, I think that it's a real problem that EXPLAIN
doesn't yet expose the difference between access predicates and filter
predicates. Some of the index quals shown by EXPLAIN are marked
SK_BT_REQFWD/SK_BT_REQBKWD by nbtree (meaning that they're what Oracle
calls "access predicates"), while other are not (meaning that they're
what Oracle calls "filter predicates"). That issue makes it far harder to spot
these sorts of problems.

> It seems to me that an IOS would likely fix that problem which is why I
> suggested count(*) instead of
> count(<not_null_column_thats_the_only_col_used_thats_not_in_the_index>)

Probably, but why should we have to use an index-only scan? And what
about my example, that already used one, and still showed a big
disparity where there is no inherently reason why there had to be? My
example didn't involve jumping to another part of the index because the
problem seems to be more general than that.

--
Peter Geoghegan

RE: [EXT] Re: Query performance going from Oracle to Postgres

From

"Dirschel, Steve"

Date:

13 September 2023, 17:58:00

Thanks to all who replied on this issue.  I overlooked the difference in the execution plan between Oracle and
Postgres. Oracle in fact does an index range scan only with no table lookup.  When I changed the query to be a count(*)
Postgreswent from 332 logical reads to 19.  Oracle did 20.  Oracle did 20 regardless if it was count(*) or
count(non-indexed-column).

Regards
Steve

-----Original Message-----
From: Peter Geoghegan <pg@bowt.ie> 
Sent: Thursday, September 7, 2023 8:22 PM
To: David Rowley <dgrowleyml@gmail.com>
Cc: Dirschel, Steve <steve.dirschel@thomsonreuters.com>; pgsql-general@lists.postgresql.org; Wong, Kam Fook (TR
Technology)<kamfook.wong@thomsonreuters.com>

Subject: [EXT] Re: Query performance going from Oracle to Postgres

External Email: Use caution with links and attachments.

On Thu, Sep 7, 2023 at 3:48 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 7 Sept 2023 at 19:17, Peter Geoghegan <pg@bowt.ie> wrote:
> > It seems likely that the problem here is that some of the predicates 
> > appear as so-called "Filter:" conditions, as opposed to true index 
> > quals.
>
> hmm, if that were true we'd see "Rows Removed by Filter" in the 
> explain analyze.

That's why I hedged, with "seems likely". The problem with using filter conditions rather than true index quals isn't
limitedto the problem of extra heap accesses. It happened to be convenient to make my point that way, but that isn't
particularlyfundamental here.

I deliberately chose to make my example involve an index-only scan (that doesn't require any heap accesses) for this
reason.

> I think all that's going on is that each tuple is on a different page 
> and the heap accesses are just causing many buffers to be accessed.

This index is an absolute monstrosity. I find it far easier to believe that the real explanation is the one that Steve
intuited:that there is an issue with the way that the physical data structures (which are more or less comparable in
bothsystems) are accessed in Postgres.

The index in question
("history_event_display_timesta_prism_guid_display_timestamp_idx1")
has certain columns that are omitted from the query. These columns nevertheless appear between other columns that the
queryfilters on.

The leading two columns ("prism_guid" and "display_timestamp") are made into index quals by Postgres, but the next
indexcolumn after those two ("unique_lookup_key") is omitted by the query, and so isn't an index qual. In fact *four*
columnsare omitted after that one. But, the very-low-order index column "product_sid" *does* appear in the query, and
soalso appears as Postgres index quals.

There is every chance that the true underlying explanation is that Oracle is able to skip over significant parts of the
indexstructure dynamically. In fact I'm practically certain that that's the case, since the "product_sid" column
appearsas an "access predicate", rather than as a "filter predicate". These terms are explained here:

https://urldefense.com/v3/__https://use-the-index-luke.com/sql/explain-plan/oracle/filter-predicates__;!!GFN0sa3rsbfR8OLyAw!fZJXNap0mP7xLTcE_5unwlR5eDfBHgw2F5LTBDGtjV_btV6Zze0MSWnTHamU16Fmu-kII-FwrC4WK7WLP60$

https://urldefense.com/v3/__https://use-the-index-luke.com/sql/explain-plan/postgresql/filter-predicates__;!!GFN0sa3rsbfR8OLyAw!fZJXNap0mP7xLTcE_5unwlR5eDfBHgw2F5LTBDGtjV_btV6Zze0MSWnTHamU16Fmu-kII-FwrC4WCXw7ubs$

How could "product_sid" be used as an "access predicate" given the omitted index columns? It seems very likely that
partsof the index can be skipped in Oracle, but not in Postgres -- at least not yet.

Like Markus Winand, I think that it's a real problem that EXPLAIN doesn't yet expose the difference between access
predicatesand filter predicates. Some of the index quals shown by EXPLAIN are marked SK_BT_REQFWD/SK_BT_REQBKWD by
nbtree(meaning that they're what Oracle calls "access predicates"), while other are not (meaning that they're what
Oraclecalls "filter predicates"). That issue makes it far harder to spot these sorts of problems.

> It seems to me that an IOS would likely fix that problem which is why 
> I suggested count(*) instead of
> count(<not_null_column_thats_the_only_col_used_thats_not_in_the_index>
> )

Probably, but why should we have to use an index-only scan? And what about my example, that already used one, and still
showeda big disparity where there is no inherently reason why there had to be? My example didn't involve jumping to
anotherpart of the index because the problem seems to be more general than that.

--
Peter Geoghegan