Re: intarray GiST index gets wrong answers for '{}' <@ anything - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: intarray GiST index gets wrong answers for '{}' <@ anything
Date
Msg-id CAPpHfducs5m5XxxgSi5K=Ugcm0Y7LyYCvytyJDvOsi7mgt1XHg@mail.gmail.com
Whole thread Raw
In response to intarray GiST index gets wrong answers for '{}' <@ anything  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: intarray GiST index gets wrong answers for '{}' <@ anything
List pgsql-hackers
Hi!

On Tue, Aug 6, 2019 at 8:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The reason appears to be that the condition for descending through a
> non-leaf index key for the RTContainedBy case is incorrectly optimistic:
> it supposes that we only need to descend into subtrees whose union key
> overlaps the query array.  But this does not guarantee to find subtrees
> that contain empty-array entries.  Worse, such entries could be anywhere
> in the tree, and because of the way that the insertion penalty is
> calculated, they probably are.  (We will compute a zero penalty to add
> an empty array item to any subtree.)  The reason it sometimes works
> seems to be that GiST randomizes its insertion decisions when there are
> equal penalties (cf gistchoose()), and sometimes by luck it puts all
> of the empty-array entries into subtrees that the existing rule will
> search.

Right, existing logic could work correctly, when dataset contains no
empty arrays.  But it clearly doesn't handle empty arrays.

> So as far as I can see, we have little choice but to lobotomize the
> RTContainedBy case and force a whole-index search.  This applies to
> both the gist__int_ops and gist__intbig_ops opclasses.  This is
> pretty awful for any applications that are depending on such queries
> to be fast, but it's hard to argue with "it gets the wrong answer,
> and not even reproducibly so".

+1 for pushing this

> In the future we might think about removing <@ from these opclasses,
> or making a non-backward-compatible change to segregate empty arrays
> from everything else in the index.  But neither answer seems very
> back-patchable, and I'm not really sure I want to put so much work
> into a second-class-citizen contrib module anyway.

+1 for removing <@ from opclasses.  Trying to segregate empty arrays
looks like invention of new opclass rather than bugfix for current
one.  One, who is interested in this piece of work, can implement this
new opclass.

Users, who likes existing behavior of handling <@ operator in intarray
opclasses, may be advised to rewrite their queries as following.

"col <@ const" => "col <@ const AND col && const"

New queries would have opclass support and handle non-empty arrays in
the same way.  It will be slightly slower because of evaluation of two
operators instead of one.  But this doesn't seem critical.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: How am I supposed to fix this?
Next
From: Larry Rosenman
Date:
Subject: Re: How am I supposed to fix this?