Re: pg_trgm: unicode string not working - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: pg_trgm: unicode string not working
Date
Msg-id 0A0622C3-C778-4C6A-9345-0A6D81929BCF@phlo.org
Whole thread Raw
In response to pg_trgm: unicode string not working  (Sushant Sinha <sushant354@gmail.com>)
Responses Re: pg_trgm: unicode string not working
List pgsql-hackers
Hi

Next time, please post questions regarding the usage of postgres
to the -general list, not to -hackers. The purpose of -hackers is
to discuss the development of postgres proper, not the development
of applications using postgres.

On Jun12, 2011, at 13:33 , Sushant Sinha wrote:
> I am using pg_trgm for spelling correction as prescribed in the
> documentation. But I see that it does not work for unicode sring. The
> database was initialized with utf8 encoding and the C locale.

I think you need to use a locale (more precisely, a CTYPE) in which
'क', 'त', 'द' are considered to be alphanumeric.

You can specify the CTYPE when creating the database with CREATE DATABASE ... LC_CTYPE = ...

> Here is the table:
> \d words
>     Table "public.words"
> Column |  Type   | Modifiers
> --------+---------+-----------
> word   | text    |
> ndoc   | integer |
> nentry | integer |
> Indexes:
>    "words_idx" gin (word gin_trgm_ops)
>
> Query: select word from words where word % 'कतद';
>
> I get an error:
>
> ERROR:  GIN indexes do not support whole-index scans


pg_trgm probably ignores non-alphanumeric characters during
comparison, so you end up with an empty search string, which
translates to a whole-index scan. Postgres up to 9.0 does
not support such scans for GIN indices.

Note that this restriction was removed in postgres 9.1 which
is currently in beta. However, GIT indices must be re-created
with REINDEX after upgrading from 9.0 to leverage that
improvement.

best regards.
Florian Pflug



pgsql-hackers by date:

Previous
From: Sushant Sinha
Date:
Subject: pg_trgm: unicode string not working
Next
From: Tom Lane
Date:
Subject: Re: Creating new remote branch in git?