tsquery @> operator bugs - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject tsquery @> operator bugs
Msg-id 544C03E8.4020402@vmware.com
List pgsql-bugs
While looking at all the places where we currently use CRC, I bumped
into this:

postgres=# select 'penomaha'::tsquery @> 'lbgimpca'::tsquery;
  ?column?
----------
  t
(1 row)

The @> operator is supposed to return true if the first query contains
all the terms of the second query. The above result is bogus; the
strings are completely different. It returns true because both terms
have the same CRC (with our funky CRC algorithm), and the tsq_mcontains
function only compares the CRCs, not the actual values.
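To illustrate the failure mode described above, here is a minimal sketch in Python. It is not the PostgreSQL code: toy_checksum is a deliberately weak, hypothetical stand-in for the CRC, and a query is modelled as a plain list of terms. It only shows why comparing checksums without comparing the values gives false positives.

```python
def toy_checksum(term):
    """Deliberately weak 8-bit checksum (hypothetical, NOT PostgreSQL's CRC)."""
    return sum(term.encode("utf-8")) % 256


def contains_by_checksum(first, second):
    """Flawed: compares only checksums, never the terms themselves."""
    first_sums = {toy_checksum(t) for t in first}
    return all(toy_checksum(t) in first_sums for t in second)


def contains_by_value(first, second):
    """Compares the actual terms, so a checksum collision cannot fool it."""
    return set(second) <= set(first)


# "ab" and "ba" are different terms that share the same toy checksum.
print(contains_by_checksum(["ab"], ["ba"]))  # True (false positive)
print(contains_by_value(["ab"], ["ba"]))     # False
```

A checksum can still be useful as a cheap pre-filter, but a match has to be confirmed by comparing the strings.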

Another bug is that the function performs a length check first, and
returns false if the second query is larger than the first. The
thinking goes that the first query cannot possibly contain the second
if the second is larger. But that doesn't take into account that the
second query can contain duplicate terms (this is basically the same
bug that was recently fixed in jsonb):

postgres=# select 'a & b' @> 'a & a'::tsquery; /* CORRECT */
  ?column?
----------
  t
(1 row)

postgres-# select 'a' @> 'a & a'::tsquery; /* WRONG */
  ?column?
----------
  f
(1 row)
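The two results above can be reproduced with a small Python model. This is only a sketch under assumptions: each query is reduced to the list of its operand strings, and "length" is taken to mean the number of operands, which may not be exactly what the C function measures. The function names are made up for the illustration.

```python
def contains_with_length_check(first, second):
    """Flawed: gives up early when `second` has more operands than `first`."""
    if len(second) > len(first):
        return False
    return all(term in first for term in second)


def contains_ignoring_duplicates(first, second):
    """Duplicates in `second` do not matter; only distinct terms count."""
    return set(second) <= set(first)


# 'a & b' @> 'a & a': the length check passes, so the answer is right.
print(contains_with_length_check(["a", "b"], ["a", "a"]))  # True

# 'a' @> 'a & a': the length check rejects it, although 'a' is contained.
print(contains_with_length_check(["a"], ["a", "a"]))       # False
print(contains_ignoring_duplicates(["a"], ["a", "a"]))     # True
```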

I propose the attached fix. It completely rewrites the tsq_mcontains
function, so that it first extracts all the strings from both tsqueries,
then sorts them and removes duplicates, and then compares the arrays.
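The approach described above (extract, sort, remove duplicates, compare) can be sketched in Python as follows. This is not the attached patch, just an illustration of the idea; the extraction step is assumed to have already produced plain lists of operand strings, and the function names are hypothetical.

```python
def sorted_unique(terms):
    """Sort the terms and drop duplicates."""
    return sorted(set(terms))


def sorted_array_contains(first_terms, second_terms):
    """True if every distinct term of the second query is in the first."""
    first = sorted_unique(first_terms)
    second = sorted_unique(second_terms)
    i = 0
    for term in second:
        # Both arrays are sorted, so skip entries of `first` that sort lower.
        while i < len(first) and first[i] < term:
            i += 1
        # Compare the actual strings, not a checksum of them.
        if i == len(first) or first[i] != term:
            return False
        i += 1
    return True


print(sorted_array_contains(["a"], ["a", "a"]))            # True
print(sorted_array_contains(["penomaha"], ["lbgimpca"]))   # False
```

Because both arrays are sorted and free of duplicates, a single pass over them is enough, and no length pre-check is needed.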

(I actually find the whole operator pretty useless. What is it good for?
But that's a different story..)

- Heikki

