Thread: GIN indexes on an = ANY(array) clause

GIN indexes on an = ANY(array) clause

From
Corey Huinker
Date:
(moving this over from pgsql-performance) 

A client had an issue with a where that had a where clause something like this:

WHERE 123456 = ANY(integer_array_column)

I was surprised that this didn't use the pre-existing GIN index on integer_array_column, whereas recoding as

WHERE ARRAY[123456] <@ integer_array_column

did cause the GIN index to be used. Is this a known/expected behavior? If so, is there any logical reason why we couldn't have the planner pick up on that?

Flo Rance (tourance@gmail.com) was nice enough to show that yes, this is expected behavior.

Which leaves the questions: 
- is the transformation I made is algebraically correct in a general case?
- if so, could we have the planner do that automatically when in the presence of a matching GIN index?

This seems like it might tie in with the enforcement of foreign keys within an array thread (which I can't presently find...).

Re: GIN indexes on an = ANY(array) clause

From
Tom Lane
Date:
Corey Huinker <corey.huinker@gmail.com> writes:
> A client had an issue with a where that had a where clause something like
> this:
> WHERE 123456 = ANY(integer_array_column)
> I was surprised that this didn't use the pre-existing GIN index on
> integer_array_column, whereas recoding as
> WHERE ARRAY[123456] <@ integer_array_column
> did cause the GIN index to be used. Is this a known/expected behavior? If
> so, is there any logical reason why we couldn't have the planner pick up on
> that?
> Flo Rance (tourance@gmail.com) was nice enough to show that yes, this is
> expected behavior.

The planner doesn't know enough about the semantics of array <@ to make
such a transformation.  (As pointed out in the stackoverflow article Flo
pointed you to, the equivalence might not even hold, depending on which
version of <@ we're talking about.)

Since the GIN index type is heavily oriented towards array-related
operators, I spent some time wondering whether we could get any mileage
by making ScalarArrayOpExpr indexquals be natively supported by GIN
(right now they aren't).  But really I don't see where the GIN AM would
get the knowledge from, either.  What it knows about the array_ops
opclass is basically the list of associated operators:

regression=# select amopopr::regoperator from pg_amop where amopfamily = 2745;
        amopopr        
-----------------------
 &&(anyarray,anyarray)
 @>(anyarray,anyarray)
 <@(anyarray,anyarray)
 =(anyarray,anyarray)
(4 rows)

and none of those are obviously related to the =(int4,int4) operator that
is in the ScalarArrayOp.  The only way to get from point A to point B is
to know very specifically that =(anyarray,anyarray) is related to any
scalar-type btree equality operator, which is not the kind of thing the
GIN AM ought to know either.

Really the array_ops opclass itself is the widest scope where it'd be
reasonable to embed knowledge about this sort of thing --- but we lack
any API at all whereby opclass-specific code could affect planner behavior
at this level.  Even if we had one, there's no obvious reason why we
should be consulting a GIN opclass about a ScalarArrayOp that does not
contain an operator visibly related to the opclass.  That path soon
leads to consulting everybody about everything and planner performance
going into the tank.

Extensibility is a harsh mistress.  

            regards, tom lane


Re: GIN indexes on an = ANY(array) clause

From
Andreas Karlsson
Date:
On 3/13/19 5:38 PM, Tom Lane wrote:
> regression=# select amopopr::regoperator from pg_amop where amopfamily = 2745;
>          amopopr
> -----------------------
>   &&(anyarray,anyarray)
>   @>(anyarray,anyarray)
>   <@(anyarray,anyarray)
>   =(anyarray,anyarray)
> (4 rows)
> 
> and none of those are obviously related to the =(int4,int4) operator that
> is in the ScalarArrayOp.  The only way to get from point A to point B is
> to know very specifically that =(anyarray,anyarray) is related to any
> scalar-type btree equality operator, which is not the kind of thing the
> GIN AM ought to know either.

In the discussions for the patch for foreign keys from arrays[1] some 
people proposed add a new operator, <<@(anyelement,anyarray), to avoid 
having to construct left hand side arrays. Would that help here or does 
it still have the same issues?

1. 
https://www.postgresql.org/message-id/CAJvoCut7zELHnBSC8HrM6p-R6q-NiBN1STKhqnK5fPE-9%3DGq3g%40mail.gmail.com

Andreas