Thread: Poor performance with row-wise comparisons
Hi everyone,
I wrote a query that iterates over two columns. These columns have a composite unique index on them, and I'm only retrieving those two columns. I found a very large difference in performance between a row-wise comparison and an equivalent column-by-column comparison: the column comparison was much faster, while the row-wise comparison showed an increase in both CPU usage and shared buffer hits.
My expectation is that the following two queries would have roughly the same performance. Both use the same index-only scan and return the same 100 rows of data. The main difference I see in the explain output is that the row-wise comparison has about 3,000 times as many shared buffer hits, but it is unclear to me why it would need more.
I've also found that when I add a limit 100 clause to the row-wise comparison I get roughly equivalent performance, but with limit 101 the extra shared hits return, which leads me to believe that the <= condition is not stopping the index scan.
We are running AWS Aurora PostgreSQL version 16 on a db.r5.24xlarge instance.
I've attached 3 explains:
- Well-performing query without a row-wise comparison
- Poorly performing query with a row-wise comparison
- Well-performing query with a row-wise comparison and a limit
explain (analyze, buffers)
select data_model_id, primary_key
from entity
WHERE data_model_id = 123 AND primary_key >= 'ABC' AND primary_key <= 'DEF';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using entity_data_model_id_primary_key_uniq on entity (cost=0.70..2.92 rows=1 width=31) (actual time=0.088..0.154 rows=100 loops=1)
Index Cond: ((data_model_id = 123) AND (primary_key >= 'ABC'::text) AND (primary_key <= 'DEF'::text))
Heap Fetches: 21
Buffers: shared hit=37
Planning Time: 0.111 ms
Execution Time: 0.175 ms
(6 rows)
explain (analyze, buffers)
select data_model_id, primary_key
from entity
WHERE (data_model_id, primary_key) >= (123, 'ABC') AND (data_model_id, primary_key) <= (123, 'DEF');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using entity_data_model_id_primary_key_uniq on entity (cost=0.70..7135372.50 rows=15347234 width=31) (actual time=0.116..2779.874 rows=100 loops=1)
Index Cond: ((ROW(data_model_id, primary_key) >= ROW(123, 'ABC'::text)) AND (ROW(data_model_id, primary_key) <= ROW(123, 'DEF'::text)))
Heap Fetches: 21
Buffers: shared hit=97261
Planning:
Buffers: shared hit=112
Planning Time: 0.324 ms
Execution Time: 2779.915 ms
(8 rows)
explain (analyze, buffers)
select data_model_id, primary_key
from entity
WHERE (data_model_id, primary_key) BETWEEN (123, 'ABC') AND (123, 'DEF') limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.70..6.31 rows=100 width=31) (actual time=0.075..0.175 rows=100 loops=1)
Buffers: shared hit=34
-> Index Only Scan using entity_data_model_id_primary_key_uniq on entity (cost=0.70..873753.60 rows=15581254 width=31) (actual time=0.075..0.167 rows=100 loops=1)
Index Cond: ((ROW(data_model_id, primary_key) >= ROW(123, 'ABC'::text)) AND (ROW(data_model_id, primary_key) <= ROW(123, 'DEF'::text)))
Heap Fetches: 4
Buffers: shared hit=34
Planning:
Buffers: shared hit=104
Planning Time: 0.264 ms
Execution Time: 0.198 ms
(10 rows)
A truncated schema of our table:
Table "public.entity"
Column | Type | Collation | Nullable | Default
------------------------------+--------------------------+-----------+----------+----------------------------------------
id | bigint | | not null | nextval('api_entity_id_seq'::regclass)
updated_at | timestamp with time zone | | not null |
created_at | timestamp with time zone | | not null |
entity_id | uuid | | not null |
primary_key | character varying(255) | | not null |
data_model_id | bigint | | |
Indexes:
"api_entity_pkey" PRIMARY KEY, btree (id)
"entity_data_model_id_primary_key_uniq" UNIQUE, btree (data_model_id, primary_key)
"api_entity_entity_id_key" UNIQUE CONSTRAINT, btree (entity_id)
"api_entity_updated_045756_idx" btree (updated_at DESC)
Check constraints:
"api_entity_data_model_v3_not_null" CHECK (data_model_v3_id IS NOT NULL)
Best,
Jon
On Fri, Feb 7, 2025 at 2:05 AM Jon Emord <jon@peregrine.io> wrote:
> but with limit 101, the extra shared hits return
Can you show the explain analyze for the limit 101 case?
Cheers,
Greg
Jon Emord <jon@peregrine.io> writes:
> -> Index Only Scan using entity_data_model_id_primary_key_uniq on entity (cost=0.70..873753.60 rows=15581254 width=31) (actual time=0.093..2712.836 rows=100 loops=1)
> Index Cond: ((ROW(data_model_id, primary_key) >= ROW(123, 'ABC'::text)) AND (ROW(data_model_id, primary_key) <= ROW(123, 'DEF'::text)))
> Heap Fetches: 4
> Buffers: shared hit=97259

> 2.
> data_model_id = 123 is the 15th most common value of data_model_id with 10.8 million records

Hm.  I think your answer is in this comment in nbtree's key-preprocessing logic:

 * Row comparison keys are currently also treated without any smarts:
 * we just transfer them into the preprocessed array without any
 * editorialization.  We can treat them the same as an ordinary inequality
 * comparison on the row's first index column, for the purposes of the logic
 * about required keys.

That is, for the purposes of deciding when the index scan can stop, the "<= ROW" condition acts like "data_model_id <= 123".  So it will run through all of the data_model_id = 123 entries before stopping.

			regards, tom lane
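In other words, only the leading column of the row comparison is used to decide when the scan can stop. One possible rewrite (a sketch, untested against this table) is to add an explicit equality on the leading index column so the scan ends at the close of the data_model_id = 123 range, which is effectively what the fast column-wise query at the top of the thread already does:

-- Possible rewrite (a sketch, not verified on this table): pin the leading
-- index column with an equality so the B-tree scan stops once data_model_id
-- moves past 123, while the row comparisons still bound primary_key.
select data_model_id, primary_key
from entity
WHERE data_model_id = 123
  AND (data_model_id, primary_key) >= (123, 'ABC')
  AND (data_model_id, primary_key) <= (123, 'DEF');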
On Fri, 2025-02-07 at 16:16 +0000, Jon Emord wrote:
> explain (analyze, buffers)
> select data_model_id, primary_key
> from entity
> WHERE (data_model_id, primary_key) BETWEEN (123, 'ABC') AND (123, 'DEF')
> limit 101;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.70..6.37 rows=101 width=31) (actual time=0.094..2712.844 rows=100 loops=1)
> Buffers: shared hit=97259
> -> Index Only Scan using entity_data_model_id_primary_key_uniq on entity (cost=0.70..873753.60 rows=15581254 width=31) (actual time=0.093..2712.836 rows=100 loops=1)
> Index Cond: ((ROW(data_model_id, primary_key) >= ROW(123, 'ABC'::text)) AND (ROW(data_model_id, primary_key) <= ROW(123, 'DEF'::text)))
> Heap Fetches: 4
> Buffers: shared hit=97259
> Planning:
> Buffers: shared hit=104
> Planning Time: 0.204 ms
> Execution Time: 2712.873 ms

How is the index "entity_data_model_id_primary_key_uniq" defined?

Yours,
Laurenz Albe