Thread: Overloaded && operator from intarray module prevents index usage.

Overloaded && operator from intarray module prevents index usage.

From
Thomas Kellerer
Date:
While testing a query on an integer array with a GIN index, I stumbled over a behaviour which surprised me and which I
wouldconsider a bug - but maybe I am wrong. 
 

Consider the following table:

    create table idlist (ids int[], ... other columns ...);
    create index on idlist using gin (ids array_ops);

Now the following query works fine and uses the index:

  select *
  from idlist
  where ids && array[1,2,3,4,6];

The above yields the following execution plan:

    Bitmap Heap Scan on public.idlist  (cost=1748.30..70780.07 rows=423281 width=240) (actual time=81.321..161.195
rows=423281loops=1)
 
      Output: ids
      Recheck Cond: (idlist.ids && '{1,2,3,4,6}'::integer[])
      Heap Blocks: exact=67084
      Buffers: shared hit=67196
      ->  Bitmap Index Scan on idlist_ids_idx  (cost=0.00..1642.48 rows=423281 width=0) (actual time=70.764..70.764
rows=423281loops=1)
 
            Index Cond: (idlist.ids && '{1,2,3,4,6}'::integer[])
            Buffers: shared hit=112
    Planning Time: 0.178 ms
    Execution Time: 171.245 ms


But when I ran that on a database where the intarray extension is installed (and part of the search_path), Postgres
usesthe intarray operator which can't use the GIN index with the array_ops opclass, so there the query yields the
followingexecution plan:
 

    Seq Scan on public.idlist  (cost=0.00..76127.00 rows=423281 width=240) (actual time=0.021..5046.396 rows=423281
loops=1)
      Output: ids
      Filter: (idlist.ids && '{1,2,3,4,6}'::integer[])
      Rows Removed by Filter: 1576719
      Buffers: shared hit=67127
    Planning Time: 0.123 ms
    Execution Time: 5056.144 ms

I can work around that, using "OPERATOR(pg_catalog.&&)" instead of "&&", but that seems like a kludge to me. 
The above happens even if the intarray extension is a the end of the search path, e.g. "set search_path = public,
intarray".
If I set the search path to only "public", the the index is used again. 

I tried the above with Postgres 11.2 on Windows and CentOS

Is this expected behaviour? Is this caused by the Postgres core (e.g. the optimizer to taking the opclass into account)
oris it a "problem" in the way the intarray module defines its operators? 
 

I would have expected that the optimizer uses the operator that matches the opclass for the index, or at least the
"first"one found in the search path. 
 

Any ideas?
Thomas



Re: Overloaded && operator from intarray module prevents index usage.

From
Andrew Gierth
Date:
>>>>> "Thomas" == Thomas Kellerer <spam_eater@gmx.net> writes:

[intarray woes]

 Thomas> Is this expected behaviour? Is this caused by the Postgres core
 Thomas> (e.g. the optimizer to taking the opclass into account) or is
 Thomas> it a "problem" in the way the intarray module defines its
 Thomas> operators?

It's basically a conflict between intarray (which is historically older)
and the built-in array indexing support.

The reason it happens is that the operator resolution logic matches an
(integer[] && integer[]) operator in preference to (anyarray && anyarray)
regardless of their relative position on the search_path. This
resolution happens before anything is known about any indexes that might
be applicable. Then later, at planning time, an index is chosen based on
the operator, not the reverse.

My own recommendation for most cases is to never install intarray on the
search path, and invoke its functions via explicit qualification or wrap
them in your own functions.

-- 
Andrew (irc:RhodiumToad)


Re: Overloaded && operator from intarray module prevents index usage.

From
Thomas Kellerer
Date:
Andrew Gierth schrieb am 28.02.2019 um 10:29:
> [intarray woes]
> 
>  Thomas> Is this expected behaviour? Is this caused by the Postgres core
>  Thomas> (e.g. the optimizer to taking the opclass into account) or is
>  Thomas> it a "problem" in the way the intarray module defines its
>  Thomas> operators?
> 
> It's basically a conflict between intarray (which is historically older)
> and the built-in array indexing support.
> 
> The reason it happens is that the operator resolution logic matches an
> (integer[] && integer[]) operator in preference to (anyarray && anyarray)
> regardless of their relative position on the search_path. This
> resolution happens before anything is known about any indexes that might
> be applicable. Then later, at planning time, an index is chosen based on
> the operator, not the reverse.

That makes sense, thanks. 



Re: Overloaded && operator from intarray module prevents index usage.

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> While testing a query on an integer array with a GIN index, I stumbled over a behaviour which surprised me and which
Iwould consider a bug - but maybe I am wrong.  

It's not a bug --- the problem is that that operator is capturing your
query reference, and it's not a member of the opclass for the index
you have, so no index scan for you.

> The above happens even if the intarray extension is a the end of the search path, e.g. "set search_path = public,
intarray".

Yeah, because it's an exact datatype match while the core operator
is anyarray && anyarray which is not.

Ideally, perhaps, the extension could add its operator to the core
gin-arrays opclass, but we lack any reasonable way to manage that.

Something that's maybe more likely to happen is to remove that
operator from the extension altogether; but that will break things
for some people too, no doubt :-(

            regards, tom lane


Re: Overloaded && operator from intarray module prevents index usage.

From
Michael Lewis
Date:
Yeah, because it's an exact datatype match while the core operator
is anyarray && anyarray which is not.

Can you dumb down how to change the index or column type such that an index will be used for the && operator while intarray extension is installed? We have the intarray extension installed and I doubt that I can get it removed. I've added gin index on a column declared as type "integer array" and had expected the index to be used, but I did not test it yet since there is some bad data with NULLs stored as a value in the integer array and I am waiting on that data cleanup to be done before directly using && operator. I expect that when I do test it, I will be impacted by this same concern. It is simplest to change the column type from integer array to anyarray? Is there a con to doing so?

Re: Overloaded && operator from intarray module prevents index usage.

From
Thomas Kellerer
Date:
Michael Lewis schrieb am 28.02.2019 um 21:23:
>     Yeah, because it's an exact datatype match while the core operator
>     is anyarray && anyarray which is not.
> 
> 
> Can you dumb down how to change the index or column type such that an
> index will be used for the && operator while intarray extension is
> installed? We have the intarray extension installed and I doubt that
> I can get it removed. I've added gin index on a column declared as
> type "integer array" and had expected the index to be used, but I did
> not test it yet since there is some bad data with NULLs stored as a
> value in the integer array and I am waiting on that data cleanup to
> be done before directly using && operator. I expect that when I do
> test it, I will be impacted by this same concern. It is simplest to
> change the column type from integer array to anyarray? Is there a con
> to doing so?

I see two options: create the gin index with the opclass provided by intarray:
    create index ... using gin (... gin__int_ops);

Or if you use the regular opclass for the GIN index, make sure to use the built-in operator,
e.g.: OPERATOR(pg_catalog.&&) instead of && or OPERATOR(pg_catalog.@>) instead of @>

Thomas


Re: Overloaded && operator from intarray module prevents index usage.

From
Tom Lane
Date:
Michael Lewis <mlewis@entrata.com> writes:
> Can you dumb down how to change the index or column type such that an index
> will be used for the && operator while intarray extension is installed? We
> have the intarray extension installed and I doubt that I can get it
> removed.

There's no magic nice solution to this, or we'd have told you about it.

Possible options:

1. Remove intarray extension.
2. Move intarray extension to a schema that's not in your search path.
3. Create an index using intarray's opclass, instead of or in addition
   to the core-opclass index.
4. Rename intarray's && operator to something else (will bite you at
   next dump/reload, where the renaming will be lost).
5. Always schema-qualify references to the core && operator.

All of these have obvious downsides, especially if you're actively
using the intarray extension for other purposes.

            regards, tom lane


Re: Overloaded && operator from intarray module prevents index usage.

From
Michael Lewis
Date:

On Thu, Feb 28, 2019 at 3:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Lewis <mlewis@entrata.com> writes:
> Can you dumb down how to change the index or column type such that an index
> will be used for the && operator while intarray extension is installed? We
> have the intarray extension installed and I doubt that I can get it
> removed.

There's no magic nice solution to this, or we'd have told you about it.

Possible options:

1. Remove intarray extension.
2. Move intarray extension to a schema that's not in your search path.
3. Create an index using intarray's opclass, instead of or in addition
   to the core-opclass index.

Thank you so much for you time in enumerating the options. What's the concern/problem/cost to re-creating the index with the intarray's opclass? If that's what will be used by && with the extension installed, then what's the downside?

I see significant code refactor for option 1 and 2 as it will have system wide impact rather than specific to the use of this particular column which is limited.

 
4. Rename intarray's && operator to something else (will bite you at
   next dump/reload, where the renaming will be lost).
5. Always schema-qualify references to the core && operator.

Would a sixth option be to re-create the column as array type and keep the index as is and ensure that in queries, I am using conditions like ARRAY[1] && table.column_name rather than '{1}'::integer[] && table.column_name? Or would I still need to schema qualify references to the core && operator for it to be used?

If I created a table and disabled sequential scan and such, I suppose I could easily test that the index get used or not.
 

All of these have obvious downsides, especially if you're actively
using the intarray extension for other purposes.

                        regards, tom lane
On 2/28/19 4:53 PM, Michael Lewis wrote:
[snip]
Would a sixth option be to re-create the column as array type

Codd is spinning in his grave...

--
Angular momentum makes the world go 'round.

Re: Overloaded && operator from intarray module prevents index usage.

From
Michael Lewis
Date:


On Thu, Feb 28, 2019 at 4:57 PM Ron <ronljohnsonjr@gmail.com> wrote:
On 2/28/19 4:53 PM, Michael Lewis wrote:
[snip]
Would a sixth option be to re-create the column as array type

Codd is spinning in his grave...

I'd hope he would be fine with people asking questions to learn. I'm open to studying any suggested resources. I also love to learn from those with experience who are further down this path, just as I love to share my experience with people who don't from a WHERE from a HAVING. If there is anything specific you can point me to, please do chime in with something constructive.
On 2/28/19 7:53 PM, Michael Lewis wrote:


On Thu, Feb 28, 2019 at 4:57 PM Ron <ronljohnsonjr@gmail.com> wrote:
On 2/28/19 4:53 PM, Michael Lewis wrote:
[snip]
Would a sixth option be to re-create the column as array type

Codd is spinning in his grave...

I'd hope he would be fine with people asking questions to learn. I'm open to studying any suggested resources. I also love to learn from those with experience who are further down this path, just as I love to share my experience with people who don't from a WHERE from a HAVING. If there is anything specific you can point me to, please do chime in with something constructive.

Michael,

E. F. Codd developed the relational model, and Normal Forms to structure the attributes (columns), tuples (rows) and relations (tables).  They work to prevent insert, update and delete anomalies.

Quoting him in The Relational Model for Database Management Version 2 from 1990: "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS."

Arrays are -- by definition -- not atomic, and so they fundamentally break the model that relational databases are founded upon.  If you want to be a good database designer, don't use arrays.

(Darwen and Date deconstruct "atomic value" into meaninglessness by claiming that, for example, strings are arrays of characters and thus arrays are ok.  I think that's bollocks.)

--
Angular momentum makes the world go 'round.

Re: Overloaded && operator from intarray module prevents index usage.

From
Michael Lewis
Date:
Arrays are -- by definition -- not atomic, and so they fundamentally break the model that relational databases are founded upon.  If you want to be a good database designer, don't use arrays.

Thanks. I was reading about Codd after your last email, but couldn't guess at which point was objectionable. I'll have to keep that in mind and always question when I come across an array or if I am ever tempted to use one.

While I can make more minor modifications to schema and suggest best practices, I am not in the position to set or enforce policy within my organization. The table and field in question are absolutely a mess and this field in particular is actually an array of IDs of other records on the same table (that actually represent another object that is similar but not the same as the one containing the array of IDs).

So, I am just looking to make the best of a bad situation currently and understand how best to use the various tools available to me, with little impact on the existing codebase but still getting performance benefits from proper indexing. Thanks for sharing the additional context.
On 2/28/19 10:26 PM, Michael Lewis wrote:
Arrays are -- by definition -- not atomic, and so they fundamentally break the model that relational databases are founded upon.  If you want to be a good database designer, don't use arrays.

Thanks. I was reading about Codd after your last email, but couldn't guess at which point was objectionable. I'll have to keep that in mind and always question when I come across an array or if I am ever tempted to use one.

While I can make more minor modifications to schema and suggest best practices, I am not in the position to set or enforce policy within my organization. The table and field in question are absolutely a mess and this field in particular is actually an array of IDs of other records on the same table (that actually represent another object that is similar but not the same as the one containing the array of IDs).

So, I am just looking to make the best of a bad situation currently and understand how best to use the various tools available to me, with little impact on the existing codebase but still getting performance benefits from proper indexing. Thanks for sharing the additional context.

Don't worry.  I'm just in an especially "geezer" mood today.

--
Angular momentum makes the world go 'round.

Re: Overloaded && operator from intarray module prevents index usage.

From
Michael Lewis
Date:
I'll try to stay off your lawn.

Re: Overloaded && operator from intarray module prevents index usage.

From
Andrew Gierth
Date:
>>>>> "Ron" == Ron  <ronljohnsonjr@gmail.com> writes:

 Ron> Arrays are -- by definition -- not atomic, and so they
 Ron> fundamentally break the model that relational databases are
 Ron> founded upon.  If you want to be a good database designer, don't
 Ron> use arrays.

"In theory there is no difference between theory and practice, but in
practice there is."

Sometimes a good database designer has to tell the theoreticians where
to get off, and do something more pragmatic.

--
Andrew (irc:RhodiumToad)