Thread: Index only scans for expressional indices when querying for the expression

Index only scans for expressional indices when querying for the expression

From
Danny Shemesh
Date:
Hello everyone,

Quick question here about index-only scans and expressional indices, tested on postgres <= 14,
Say I have a column, and an expressional index on said column (fiddle) - e.g.

create table t1 (x text);
create index idx_upper on t1 (upper(x));

I see that even if I query for upper(x) in itself, I won't achieve an index-only scan without creating a covering index that includes the manipulated field:
create index idx_upper_covering on t1 (upper(x)) include (x);

I wonder if it would've been possible to have an index-only scan for similar cases without having to include the base column ? 
I believe the expressional index in itself could've been considered as covering, when querying for the expression explicitly.

The thing is, indices with include clauses do not support page deduplication, which causes the size of the index to bloat in our case, over 20x in size at times.
Also, it could've been beneficial when creating indices on complex types, say - indexing the first element on an array, or a specific nested element of a jsonb column, et-al.

Appreciate your time !
Danny

Re: Index only scans for expressional indices when querying for the expression

From
"David G. Johnston"
Date:
On Thursday, August 4, 2022, Danny Shemesh <dany74q@gmail.com> wrote:
I believe the expressional index in itself could've been considered as covering, when querying for the expression explicitly.

This belief is wrong.  When storing f(x) there is no way to recover the value of x.

David J.
 

Re: Index only scans for expressional indices when querying for the expression

From
Danny Shemesh
Date:
Hey David - thanks for the prompt response !

That is of course correct, but what I mean is that, I think that if one would explicitly query f(x), and never for x directly, it would've been
theoretically possible to say that the index is covering for every f(x), wouldn't it ?

Or in other words, if one only ever queries f(x), and the only available index is on f(x), then the index will hold all f(x) values
and would never need to reverse engineer the value of x to answer such a specific query.

Danny

On Thu, Aug 4, 2022 at 3:28 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, August 4, 2022, Danny Shemesh <dany74q@gmail.com> wrote:
I believe the expressional index in itself could've been considered as covering, when querying for the expression explicitly.

This belief is wrong.  When storing f(x) there is no way to recover the value of x.

David J.
 
Danny Shemesh <dany74q@gmail.com> writes:
> That is of course correct, but what I mean is that, I think that if one
> would explicitly query f(x), and never for x directly, it would've been
> theoretically possible to say that the index is covering for every f(x),
> wouldn't it ?

Theoretically, yeah, but we don't support that: an index-only scan
will only be considered if x itself is available from the index.
There are a couple of reasons for that, but the main one is that
detecting whether an index matches the query would be far more expensive
if it had to consider expression subtrees not just the base Vars.

            regards, tom lane



Re: Index only scans for expressional indices when querying for the expression

From
Danny Shemesh
Date:
A-ha, interesting !

I think we have some specific use cases where it'd be worth the overhead, I'd need to measure it, though; 

Do you think there'd be room to accept a contribution for such functionality with a disabled-by-default pg setting,
or are you skeptical it would ever be worth the trade-off ?

Thanks again,
Danny

On Thu, Aug 4, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Danny Shemesh <dany74q@gmail.com> writes:
> That is of course correct, but what I mean is that, I think that if one
> would explicitly query f(x), and never for x directly, it would've been
> theoretically possible to say that the index is covering for every f(x),
> wouldn't it ?

Theoretically, yeah, but we don't support that: an index-only scan
will only be considered if x itself is available from the index.
There are a couple of reasons for that, but the main one is that
detecting whether an index matches the query would be far more expensive
if it had to consider expression subtrees not just the base Vars.

                        regards, tom lane
Danny Shemesh <dany74q@gmail.com> writes:
> Do you think there'd be room to accept a contribution for such
> functionality with a disabled-by-default pg setting,
> or are you skeptical it would ever be worth the trade-off ?

If you can show me a matching algorithm with non-exponential runtime,
I'd be interested.

            regards, tom lane