Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries
Msg-id CA+HiwqGnEqXZr131aGrBPRu2T87Rk=XJmNoXBqSj4eHSK2To+g@mail.gmail.com
In response to Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries  (Sawada Masahiko <sawada.mshk@gmail.com>)
List pgsql-hackers
On Fri, May 31, 2013 at 4:25 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> On Thu, May 30, 2013 at 12:49 PM, Sawada Masahiko <sawada.mshk@gmail.com>
> wrote:
>>
>> The following emails discuss partial matching in pg_trgm.  I hope
>> this helps.
>>
>> <http://www.postgresql.org/message-id/CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com>
>> As you may know, if the search string contains multibyte characters,
>> the trigram is converted to a 4-byte CRC, which is used as the key
>> (only the upper 3 bytes of the CRC are used),
>> so we can do partial matching if KEEPONLYALNUM is enabled.
>
>
> Please read the further discussion on that thread. Because of the CRC,
> we can't do partial matching, regardless of KEEPONLYALNUM.
>
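
To make the CRC point concrete, here is a rough sketch of the key
compaction as I understand it from the quoted description. It is not the
pg_trgm code: zlib.crc32 stands in for PostgreSQL's own CRC routine, the
name compact_trigram is used only for illustration, and keeping the
"upper" 3 bytes simply follows the wording quoted above.

```python
import zlib


def compact_trigram(trigram: str) -> bytes:
    """Return a 3-byte index key for a 3-character trigram (illustrative only)."""
    raw = trigram.encode("utf-8")
    if len(raw) == 3:
        # Three single-byte characters: the bytes themselves are the key.
        return raw
    # At least one multi-byte character: hash the bytes and keep 3 bytes
    # of the 32-bit CRC. zlib.crc32 is an assumption standing in for
    # PostgreSQL's CRC; which 3 bytes are kept is also an assumption.
    crc = zlib.crc32(raw) & 0xFFFFFFFF
    return crc.to_bytes(4, "big")[:3]


ascii_key = compact_trigram(" ab")    # stored verbatim
hashed_key = compact_trigram(" 日本")  # stored as a truncated hash
```

Since the hashed key no longer contains the original characters, two
trigrams that share a prefix need not have keys that share a prefix, which
is presumably why prefix-style partial matching on the keys cannot work.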

Also, a few more questions:

1) When building a trgm index, are multi-byte character strings handled
any differently? For example, would a 2-character Japanese string
(multi-byte, of course) produce exactly 3 trigrams to be stored in the
index and used later during look-up?

2) If so, is there a problem in gin_extract_query_trgm(), that is, in
generating trigrams from the query search term, that causes those
trigrams (stored in the index if the answer to (1) is yes) NOT to be
used in such a partial matching case?
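
To illustrate what I am asking, here is a small sketch of my
understanding. It assumes the documented pg_trgm rule that each word is
treated as having two spaces prefixed and one space suffixed, and it
assumes that a LIKE pattern gets no such padding next to a '%' wildcard.
The function names are hypothetical, and case folding, '_' wildcards,
escapes and word splitting are all left out.

```python
def word_trigrams(word: str) -> set:
    """Trigrams of a single indexed word, padded as '  word '."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def like_trigrams(pattern: str) -> set:
    """Trigrams every match of a LIKE pattern must contain (only '%' handled)."""
    trigrams = set()
    parts = pattern.split("%")
    for i, part in enumerate(parts):
        if not part:
            continue
        # Pad only at a pattern edge that is not next to a wildcard.
        left = "  " if i == 0 else ""
        right = " " if i == len(parts) - 1 else ""
        text = left + part + right
        trigrams |= {text[j:j + 3] for j in range(len(text) - 2)}
    return trigrams


indexed = word_trigrams("日本")      # three trigrams for a 2-character word
searched = like_trigrams("%日本%")   # empty: nothing to look up in the index
```

Under these assumptions the 2-character word does yield 3 trigrams at
index time, but every one of them contains padding, while the query
'%日本%' yields none at all, so there is nothing to match against the
index.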

--
Amit Langote


