Home > mailing lists

Re: Index filter instead of index condition w/ IN / ANY queries above certain set size - Mailing list pgsql-general

From	Danny Shemesh
Subject	Re: Index filter instead of index condition w/ IN / ANY queries above certain set size
Date	November 23, 2022 16:38:47
Msg-id	CAFZC=QrNdG+6+K1tWyZTRtiAz6+VG_8hQrnhTYV-Dac2eFaanQ@mail.gmail.com Whole thread Raw
In response to	Re: Index filter instead of index condition w/ IN / ANY queries above certain set size (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses	Re: Index filter instead of index condition w/ IN / ANY queries above certain set size (Laurenz Albe <laurenz.albe@cybertec.at>) Re: Index filter instead of index condition w/ IN / ANY queries above certain set size (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general

Tree view

Hey Laurenz, thanks for the prompt response !

What I meant is this - the plan consists of either an index scan that uses all passed columns in the index condition, or an index scan

that uses only one column as an index condition, with an additional filter step.

The below are explain (analyze, buffers) outputs of both queries from our system, redacted:

This one uses = ANY({element}) and has an index cond + filter (which is less favorable on our data set, performance wise):

EXPLAIN (ANALYZE, BUFFERS)

SELECT distinct(pid) FROM hashes WHERE tid = '13371337-1337-1337-1337-133713371337' AND hid = any(WITH RECURSIVE cte AS ((SELECT pidh FROM refs WHERE tid = '13371337-1337-1337-1337-133713371337' AND tidh = ANY('{13391339-1339-1339-1339-133913391339}')

ORDER BY pidh LIMIT 1

) UNION ALL SELECT lateral_join.* FROM cte CROSS JOIN LATERAL (SELECT pidh FROM refs WHERE tid = '13371337-1337-1337-1337-133713371337' AND tidh = ANY('{13391339-1339-1339-1339-133913391339}') pidh > cte.pidh ORDER BY pidh LIMIT 1) lateral_join)

SELECT pidh FROM cte);
Unique (cost=2557.89..2560.30 rows=65 width=16) (actual time=105369.476..105369.498 rows=37 loops=1)
Buffers: shared hit=336506 read=902254
I/O Timings: read=2423376.942
-> Sort (cost=2557.89..2559.09 rows=482 width=16) (actual time=105369.474..105369.484 rows=57 loops=1)
Sort Key: hashes.pid
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=336506 read=902254
I/O Timings: read=2423376.942
-> Hash Semi Join (cost=2449.10..2536.41 rows=482 width=16) (actual time=105358.077..105369.411 rows=57 loops=1)
Hash Cond: (hashes.hid = cte.pidh)
Buffers: shared hit=336503 read=902254
I/O Timings: read=2423376.942
-> Seq Scan on hashes (cost=0.00..73.28 rows=3302 width=32) (actual time=0.865..11.736 rows=3400 loops=1)
Filter: (tid = '13371337-1337-1337-1337-133713371337'::uuid)
Buffers: shared read=32
I/O Timings: read=39.112
-> Hash (cost=2447.84..2447.84 rows=101 width=16) (actual time=105357.200..105357.204 rows=39 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=336503 read=902222
I/O Timings: read=2423337.830
-> CTE Scan on cte (cost=2444.81..2446.83 rows=101 width=16) (actual time=1727.214..105357.126 rows=39 loops=1)
Buffers: shared hit=336503 read=902222
I/O Timings: read=2423337.830
CTE cte
-> Recursive Union (cost=0.56..2444.81 rows=101 width=16) (actual time=1727.212..105357.035 rows=39 loops=1)
Buffers: shared hit=336503 read=902222
I/O Timings: read=2423337.830
-> Limit (cost=0.56..22.00 rows=1 width=16) (actual time=1727.210..1727.210 rows=1 loops=1)
Buffers: shared hit=13051 read=14561
I/O Timings: read=53405.294
-> Index Only Scan using idx_hashes on refs (cost=0.56..722735.47 rows=33715 width=16) (actual time=1727.208..1727.208 rows=1 loops=1)
Index Cond: (tid = '13371337-1337-1337-1337-133713371337'::uuid)
Filter: (tidh = ANY ('{13391339-1339-1339-1339-133913391339}'::uuid[])) <<<<<<<<<<<<<<<- Note this line
Rows Removed by Filter: 109087
Heap Fetches: 16976
Buffers: shared hit=13051 read=14561
I/O Timings: read=53405.294
-> Nested Loop (cost=0.56..242.08 rows=10 width=16) (actual time=2657.169..2657.171 rows=1 loops=39)
Buffers: shared hit=323452 read=887661
I/O Timings: read=2369932.536
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=16) (actual time=0.000..0.001 rows=1 loops=39)
-> Limit (cost=0.56..24.17 rows=1 width=16) (actual time=2657.167..2657.167 rows=1 loops=39)
Buffers: shared hit=323452 read=887661
I/O Timings: read=2369932.536
-> Index Only Scan using idx_hashes on refs refs_1 (cost=0.56..265306.68 rows=11238 width=16) (actual time=2657.162..2657.162 rows=1 loops=39)
Index Cond: ((tid = '13371337-1337-1337-1337-133713371337'::uuid) AND (pidh > cte_1.pidh))
Filter: (tidh = ANY ('{13391339-1339-1339-1339-133913391339}'::uuid[])) <<<<<<<<<<<<<<<- Note this line
Rows Removed by Filter: 346024
Heap Fetches: 1506359
Buffers: shared hit=323452 read=887661
I/O Timings: read=2369932.536
Planning:
Buffers: shared hit=8
Planning Time: 0.537 ms
Execution Time: 105369.560 ms
Time: 105.378s (1 minute 45 seconds), executed in: 105.378s (1 minute 45 seconds)

Now, this is a plan that only changes the = ANY to IN, where we could note three things:

1. The IN operator was converted to a plain comparison of x = y

2. The filter step is now gone, and the column now appears in the Index Cond step

3. The query is 350x ~ faster

EXPLAIN (ANALYZE, BUFFERS)
SELECT distinct(pid) FROM hashes WHERE tid = '13371337-1337-1337-1337-133713371337' AND hid = any(WITH RECURSIVE cte AS ((SELECT pidh FROM refs WHERE tid = '13371337-1337-1337-1337-133713371337' AND tidh in('13391339-1339-1339-1339-133913391339')
ORDER BY pidh LIMIT 1
) UNION ALL SELECT lateral_join.* FROM cte CROSS JOIN LATERAL (SELECT pidh FROM refs WHERE tid = '13371337-1337-1337-1337-133713371337' AND tidh in ('13391339-1339-1339-1339-133913391339') pidh > cte.pidh ORDER BY pidh LIMIT 1) lateral_join)
SELECT pidh FROM cte);
HashAggregate (cost=854.80..855.45 rows=65 width=16) (actual time=297.874..297.883 rows=37 loops=1)
Group Key: hashes.pid
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=91843 read=164
I/O Timings: read=154.439
-> Hash Semi Join (cost=766.29..853.60 rows=482 width=16) (actual time=297.009..297.857 rows=57 loops=1)
Hash Cond: (hashes.hid = cte.pidh)
Buffers: shared hit=91843 read=164
I/O Timings: read=154.439
-> Seq Scan on hashes (cost=0.00..73.28 rows=3302 width=32) (actual time=0.006..0.449 rows=3400 loops=1)
Filter: (tid = '13371337-1337-1337-1337-133713371337'::uuid)
Buffers: shared hit=32
I/O Timings: read=3.289
-> Hash (cost=765.03..765.03 rows=101 width=16) (actual time=296.995..296.998 rows=39 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=91811 read=164
I/O Timings: read=151.151
-> CTE Scan on cte (cost=762.00..764.02 rows=101 width=16) (actual time=1.940..296.958 rows=39 loops=1)
Buffers: shared hit=91811 read=164
I/O Timings: read=151.151
CTE cte
-> Recursive Union (cost=0.56..762.00 rows=101 width=16) (actual time=1.938..296.919 rows=39 loops=1)
Buffers: shared hit=91811 read=164
I/O Timings: read=151.151
-> Limit (cost=0.56..6.83 rows=1 width=16) (actual time=1.936..1.938 rows=1 loops=1)
Buffers: shared hit=1791
-> Index Only Scan using idx_hashes on refs (cost=0.56..211281.01 rows=33715 width=16) (actual time=1.936..1.936 rows=1 loops=1)
Index Cond: ((tid = '13371337-1337-1337-1337-133713371337'::uuid) AND (tidh = '13391339-1339-1339-1339-133913391339'::uuid)) <<<<<<<<<<<<<<<- Note this line
Heap Fetches: 0
Buffers: shared hit=1791
-> Nested Loop (cost=0.56..75.32 rows=10 width=16) (actual time=7.561..7.562 rows=1 loops=39)
Buffers: shared hit=90020 read=164
I/O Timings: read=151.151
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=16) (actual time=0.000..0.000 rows=1 loops=39)
-> Limit (cost=0.56..7.49 rows=1 width=16) (actual time=7.559..7.560 rows=1 loops=39)
Buffers: shared hit=90020 read=164
I/O Timings: read=151.151
-> Index Only Scan using idx_hashes on refs refs_1 (cost=0.56..77897.32 rows=11238 width=16) (actual time=7.557..7.557 rows=1 loops=39)
Index Cond: ((tid = '13371337-1337-1337-1337-133713371337'::uuid) AND (pidh > cte_1.pidh) AND (tidh = '13391339-1339-1339-1339-133913391339'::uuid)) <<<<<<<<<<<<<<<- Note this line
Heap Fetches: 10
Buffers: shared hit=90020 read=164
I/O Timings: read=151.151
Planning:
Buffers: shared hit=8
Planning Time: 0.505 ms
Execution Time: 297.948 ms

I can also note that if I change the IN expression to compare on 2 or more items - it is converted to an ANY expression,

and the previous plan is chosen again.

Thanks a ton !

Danny

On Wed, Nov 23, 2022 at 11:22 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2022-11-23 at 10:49 +0200, Danny Shemesh wrote:
> I'm trying to understand when the planner decides to use an index condition vs an index filter

I am not sure what you mean by "index filter".

If you could send the result of EXPLAIN (ANALYZE, BUFFERS) for the queries,
that would be most useful.

Yours,
Laurenz Albe

pgsql-general by date:

From: Karsten Hilbert
Date: 23 November 2022, 14:56:36
Subject: Re: Getting PSQL in Windows to support TAB/Autocomplete via modified readline...

From: Laurenz Albe
Date: 23 November 2022, 17:24:40
Subject: Re: Index filter instead of index condition w/ IN / ANY queries above certain set size

Re: Index filter instead of index condition w/ IN / ANY queries above certain set size - Mailing list pgsql-general

Previous

Next