Note that we cannot do a partial match if KEEPONLYALNUM is disabled, i.e., if the query key contains multibyte characters. In this case, the byte length of the trigram string might be larger than three, and its CRC is used as the trigram key instead of the trigram string itself. Because the CRC is used, we cannot do a partial match. The attached patch extends pg_trgm so that it compares a partial-match query key only when KEEPONLYALNUM is enabled.
I didn't get this point. How does KEEPONLYALNUM guarantee that each trigram character is single-byte? In the example below, the query returns a wrong result once sequential scans are disabled:
CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
test=# SELECT * FROM test WHERE val LIKE '%aa%';
val
------
aa
aaa
шaaш
(3 rows)
test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
val
-----
aa
aaa
(2 rows)
I think we can use partial match only for single-byte encodings. Or, at most, in cases when all alphanumeric characters are single-byte (I have no idea how to check this).
Actually, I was also toying with the idea of partial match on trigrams when I was working on the initial LIKE patch, but I concluded that we would need a separate opclass that always keeps the full trigram in the entry.
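
To illustrate why hashed keys defeat partial match, here is a rough Python sketch of the behaviour described above. It is not the pg_trgm source: the function names are hypothetical, the padding (two leading blanks, one trailing blank) is a simplification, and zlib.crc32 with the first three bytes kept only stands in for whatever CRC and byte layout pg_trgm really uses.

```python
import zlib


def trigrams(word):
    """Split a word into 3-character trigrams (simplified padding: 2 blanks before, 1 after)."""
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def is_hashed(trigram):
    """A trigram whose UTF-8 form is longer than 3 bytes cannot be stored verbatim."""
    return len(trigram.encode("utf-8")) > 3


def compact_trigram(trigram):
    """Return a 3-byte index key: the raw bytes if they fit, else part of a CRC.

    Assumption: zlib.crc32 and the 'first three bytes' layout are stand-ins
    for illustration only.
    """
    raw = trigram.encode("utf-8")
    if not is_hashed(trigram):
        return raw
    return zlib.crc32(raw).to_bytes(4, "big")[:3]


def prefix_match(word, prefix):
    """Emulate a partial match: does any stored key start with the query prefix?"""
    return any(compact_trigram(t).startswith(prefix) for t in trigrams(word))


if __name__ == "__main__":
    for word in ("aa", "aaa", "шaaш"):
        hashed = [t for t in trigrams(word) if is_hashed(t)]
        print(word, "prefix 'aa' found:", prefix_match(word, b"aa"),
              "hashed trigrams:", len(hashed), "of", len(trigrams(word)))
```

In this sketch every trigram of 'шaaш' contains a two-byte character, so all of its keys are CRC-derived, and a prefix comparison against the bytes "aa" can only succeed by accident. That matches the missing row in the index scan above. Note that the word consists solely of alphanumeric characters, so KEEPONLYALNUM does not change the outcome.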