Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries - Mailing list pgsql-hackers

From Amit Langote
Subject Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries
Date
Msg-id CA+HiwqH7F1xZAq7KeRGOzRYBruiMb++wMQfxxXPLMiAv_wLJkg@mail.gmail.com
Whole thread Raw
Responses Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hello,

I have been trying to understand how pg_trgm works. As part of that, I
was looking at gin_extract_query_trgm(), which I think, extracts
trigrams from a search query string. So, I debugged for 3 cases:

1) column_name LIKE '%緊急%'

in this case, inside gin_extract_query_trgm(), after a call to
generate_wildcard_trgm(), returned trglen is 0, hence
GIN_SEARCH_MODE_ALL search mode is used.

2) column_name LIKE '%os%'

same as in case (1)

3) column_name LIKE '%ost%'

returned trglen is > 0, things proceed differently. May be, trigrams
have been generated and cane be used for index search.

I later commented out #define KEEPONLYALNUM from
contrib/pg_trgm/trgm.h (following from a related discussion on
-hackers viz.
http://www.postgresql.org/message-id/flat/CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com#CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com),
but things didn't change.

So, it appears, for search strings consisting of 2 (or < 3)
characters, trigrams can not be utilized. No?

NOTE: Using the master branch. The indexed column is a text field and
data consists of mix of Japanese, alphanumeric characters.

--
Amit Langote



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: removing PD_ALL_VISIBLE
Next
From: Robert Haas
Date:
Subject: Re: all_visible replay aborting due to uninitialized pages