Home > mailing lists

Re: Index filter instead of index condition w/ IN / ANY queries above certain set size - Mailing list pgsql-general

From	Danny Shemesh
Subject	Re: Index filter instead of index condition w/ IN / ANY queries above certain set size
Date	November 23, 2022 19:23:49
Msg-id	CAFZC=Qqn+v-EjPszVP5y7XMowhjJyKYpy36S9akO1i4zJyh_-g@mail.gmail.com Whole thread
In response to	Re: Index filter instead of index condition w/ IN / ANY queries above certain set size (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general

Tree view

Hey Laurenz, Tom - thanks again !

> that it is cheaper to use the index that supports the ORDER BY

Thing is, that both queries use the exact same index (idx_hashes), but one uses it w/ the filter and one does not.

> This doesn't match up terribly well with the table definition you showed before

Yeah.. it was a bit hard to reproduce exactly, but the fiddle does showcase that there's some threshold to the ANY set-size

where it stops using the column in the index condition, and moves it to the filter step - I thought it might originate

from similar reasons.

> but I wonder whether tidh is a low-order index column.

The index indeed uses tidh as a low order column, and it's better to have it the other way around -

currently, it's: (tid, pidh, tidh) - where (tid, tidh, pidh) would've probably worked better.

We've already optimized the query itself - but for pure understanding of the planner decision here,

I'd really still like to understand, if possible, the difference between ANY and IN,

and why, even though the column order isn't optimal - one plan still successfully uses the index more efficiently than another.

Any idea where I could zone-in in the source code to look for hints, maybe ?

Appreciate it !

Danny

On Wed, Nov 23, 2022 at 4:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Danny Shemesh <dany74q@gmail.com> writes:
> -> Index Only Scan using
> idx_hashes on refs (cost=0.56..722735.47 rows=33715 width=16) (actual
> time=1727.208..1727.208 rows=1 loops=1)
> Index Cond: (tid =
> '13371337-1337-1337-1337-133713371337'::uuid)
> * Filter: (tidh = ANY
> ('{13391339-1339-1339-1339-133913391339}'::uuid[])) <<<<<<<<<<<<<<<-
> Note this line* Rows Removed
> by Filter: 109087
> Heap Fetches: 16976
> Buffers: shared hit=13051
> read=14561
> I/O Timings: read=53405.294

This doesn't match up terribly well with the table definition
you showed before, but I wonder whether tidh is a low-order
index column. If you need to optimize this specific shape
of query you need to pay attention to the index column order, per

https://www.postgresql.org/docs/current/indexes-multicolumn.html

That is, tid and tidh need to be the first two index columns.

regards, tom lane

pgsql-general by date:

From: Simon Riggs
Date: 23 November 2022, 19:11:52
Subject: Re: MERGE RETURNING

From: Ron
Date: 23 November 2022, 19:24:15
Subject: Re: table inheritance partition and indexes

Re: Index filter instead of index condition w/ IN / ANY queries above certain set size - Mailing list pgsql-general

Previous

Next