Thread: GIN indexes on an = ANY(array) clause
(moving this over from pgsql-performance)
A client had an issue with a where that had a where clause something like this:
WHERE 123456 = ANY(integer_array_column)
I was surprised that this didn't use the pre-existing GIN index on integer_array_column, whereas recoding as
WHERE ARRAY[123456] <@ integer_array_column
did cause the GIN index to be used. Is this a known/expected behavior? If so, is there any logical reason why we couldn't have the planner pick up on that?
Flo Rance (tourance@gmail.com) was nice enough to show that yes, this is expected behavior.
Which leaves the questions:
- is the transformation I made is algebraically correct in a general case?
- if so, could we have the planner do that automatically when in the presence of a matching GIN index?
- is the transformation I made is algebraically correct in a general case?
- if so, could we have the planner do that automatically when in the presence of a matching GIN index?
This seems like it might tie in with the enforcement of foreign keys within an array thread (which I can't presently find...).
Corey Huinker <corey.huinker@gmail.com> writes: > A client had an issue with a where that had a where clause something like > this: > WHERE 123456 = ANY(integer_array_column) > I was surprised that this didn't use the pre-existing GIN index on > integer_array_column, whereas recoding as > WHERE ARRAY[123456] <@ integer_array_column > did cause the GIN index to be used. Is this a known/expected behavior? If > so, is there any logical reason why we couldn't have the planner pick up on > that? > Flo Rance (tourance@gmail.com) was nice enough to show that yes, this is > expected behavior. The planner doesn't know enough about the semantics of array <@ to make such a transformation. (As pointed out in the stackoverflow article Flo pointed you to, the equivalence might not even hold, depending on which version of <@ we're talking about.) Since the GIN index type is heavily oriented towards array-related operators, I spent some time wondering whether we could get any mileage by making ScalarArrayOpExpr indexquals be natively supported by GIN (right now they aren't). But really I don't see where the GIN AM would get the knowledge from, either. What it knows about the array_ops opclass is basically the list of associated operators: regression=# select amopopr::regoperator from pg_amop where amopfamily = 2745; amopopr ----------------------- &&(anyarray,anyarray) @>(anyarray,anyarray) <@(anyarray,anyarray) =(anyarray,anyarray) (4 rows) and none of those are obviously related to the =(int4,int4) operator that is in the ScalarArrayOp. The only way to get from point A to point B is to know very specifically that =(anyarray,anyarray) is related to any scalar-type btree equality operator, which is not the kind of thing the GIN AM ought to know either. Really the array_ops opclass itself is the widest scope where it'd be reasonable to embed knowledge about this sort of thing --- but we lack any API at all whereby opclass-specific code could affect planner behavior at this level. Even if we had one, there's no obvious reason why we should be consulting a GIN opclass about a ScalarArrayOp that does not contain an operator visibly related to the opclass. That path soon leads to consulting everybody about everything and planner performance going into the tank. Extensibility is a harsh mistress. regards, tom lane
On 3/13/19 5:38 PM, Tom Lane wrote: > regression=# select amopopr::regoperator from pg_amop where amopfamily = 2745; > amopopr > ----------------------- > &&(anyarray,anyarray) > @>(anyarray,anyarray) > <@(anyarray,anyarray) > =(anyarray,anyarray) > (4 rows) > > and none of those are obviously related to the =(int4,int4) operator that > is in the ScalarArrayOp. The only way to get from point A to point B is > to know very specifically that =(anyarray,anyarray) is related to any > scalar-type btree equality operator, which is not the kind of thing the > GIN AM ought to know either. In the discussions for the patch for foreign keys from arrays[1] some people proposed add a new operator, <<@(anyelement,anyarray), to avoid having to construct left hand side arrays. Would that help here or does it still have the same issues? 1. https://www.postgresql.org/message-id/CAJvoCut7zELHnBSC8HrM6p-R6q-NiBN1STKhqnK5fPE-9%3DGq3g%40mail.gmail.com Andreas