pgsql: Improve contrib/pg_trgm's heuristics for regexp index searches. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Improve contrib/pg_trgm's heuristics for regexp index searches.
Msg-id E1WWbGj-0007uX-QM@gemulon.postgresql.org
List pgsql-committers
Improve contrib/pg_trgm's heuristics for regexp index searches.

When extracting trigrams from a regular expression for search of a GIN or
GiST trigram index, it's useful to penalize (preferentially discard)
trigrams that contain whitespace, since those are typically far more common
in the index than trigrams not containing whitespace.  Of course, this
should only be a preference, not a hard rule, since we might otherwise end
up with no trigrams to search for.  The previous coding tended to produce
fairly inefficient trigram search sets for anchored regexp patterns, as
reported by Erik Rijkers.  This patch penalizes whitespace-containing
trigrams, and also reduces the target number of extracted trigrams, since
experience suggests that the original coding tended to select too many
trigrams to search for.
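
As a rough illustration of the "preference, not a hard rule" idea, here is a
minimal Python sketch.  It is not the code in trgm_regexp.c, which works on a
graph of trigrams derived from the regexp and is considerably more involved.
The function names, the penalty (a count of whitespace characters), and the
target parameter are all assumptions made for this example only.

```python
from typing import Iterable, List


def trigram_penalty(trigram: str) -> int:
    """Hypothetical penalty: the number of whitespace characters in the trigram."""
    return sum(1 for ch in trigram if ch.isspace())


def select_trigrams(candidates: Iterable[str], target: int) -> List[str]:
    """Keep at most `target` trigrams, discarding the most-penalized ones first.

    Whitespace-containing trigrams are only deprioritized, never forbidden,
    so the result is non-empty whenever there is at least one candidate.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    # Remove duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(candidates))
    # sorted() is stable, so trigrams with equal penalty keep their order.
    ranked = sorted(unique, key=trigram_penalty)
    return ranked[:target]
```

With the padded trigrams of "foobar" as candidates and a target of 3, this
sketch keeps "foo", "oob" and "oba" and drops the ones touching the word
boundary.  If every candidate contains whitespace, it still returns the
least-penalized ones rather than an empty set.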

Alexander Korotkov, reviewed by Tom Lane

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/80a5cf643adb496abe577a1ca6dc0c476d849c19

Modified Files
--------------
contrib/pg_trgm/trgm_regexp.c |  104 ++++++++++++++++++++++++++++++-----------
1 file changed, 76 insertions(+), 28 deletions(-)
