Re: pg_trgm partial-match - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: pg_trgm partial-match
Date
Msg-id CAPpHfdvuQPDgckJZWcgq=ggQGO1whvPQBKp+pEdzyW=4jVyK=Q@mail.gmail.com
In response to pg_trgm partial-match  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_trgm partial-match  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hi!

On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
> i.e., if query key contains multibyte characters. In this case, byte length of
> the trigram string might be larger than three, and its CRC is used as a
> trigram key instead of the trigram string itself. Because of using CRC, we
> cannot do a partial-match. Attached patch extends pg_trgm so that it
> compares a partial-match query key only when KEEPONLYALNUM is
> enabled.

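I didn't get this point. How does KEEPONLYALNUM guarantee that every character of a trigram is single-byte? Consider this example, where the index scan misses a row that the sequential scan returns: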

CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val  
------
 aa
 aaa
 шaaш
(3 rows)
test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val 
-----
 aa
 aaa
(2 rows)

I think we can use partial match only for single-byte encodings or, at most, in cases where all alphanumeric characters are single-byte (I have no idea how to check this).
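
To illustrate the byte-length issue outside the server, here is a minimal Python sketch. It is not pg_trgm's C code. It assumes the usual padding of two leading blanks and one trailing blank, treats the whole input as one word, and uses zlib.crc32 as a stand-in hash, so the hashed values will not necessarily match the ones pg_trgm stores. The helper names (trigrams, index_key, prefix_hits) are made up for this example.

```python
import zlib


def trigrams(word):
    """Return the 3-character windows of one word, padded pg_trgm-style.

    Assumption: two leading blanks and one trailing blank, input is a single word.
    """
    padded = "  " + word.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def index_key(trigram, encoding="utf-8"):
    """Return a 3-byte key for a trigram.

    A trigram that encodes to exactly 3 bytes is kept as-is. Anything longer
    is replaced by 3 bytes of a hash (zlib.crc32 here, a stand-in only).
    """
    raw = trigram.encode(encoding)
    if len(raw) == 3:
        return raw
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


def prefix_hits(word, prefix, encoding="utf-8"):
    """List the trigrams of word whose stored key starts with the prefix bytes."""
    p = prefix.encode(encoding)
    return [t for t in trigrams(word) if index_key(t, encoding).startswith(p)]


if __name__ == "__main__":
    for word in ("aa", "aaa", "шaaш"):
        for t in trigrams(word):
            n = len(t.encode("utf-8"))
            kind = "raw" if n == 3 else "hashed"
            print(repr(word), repr(t), n, "bytes,", kind)
```

In UTF-8 every trigram of 'шaaш' contains the two-byte 'ш', so all of its keys are hashed and a prefix comparison against the bytes of 'aa' has nothing meaningful to compare with. In a single-byte encoding such as KOI8-R the trigram 'aaш' is exactly three bytes and its key still starts with 'aa'.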

------
With best regards,
Alexander Korotkov.
