Re: pg_trgm - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: pg_trgm
Msg-id 20100528.085439.93468861.t-ishii@sraoss.co.jp
In response to Re: pg_trgm  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: pg_trgm
List pgsql-hackers
> I think the problem at hand has nothing at all to do with agglutination
> or CJK-specific issues.  You will get the same problem with other
> languages *if* you set a locale that does not adequately support the
> characters in use.  E.g., Russian with locale C and encoding UTF8:
> 
> select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E\u043D\u044B');
>  similarity
> ────────────
>         NaN
> (1 row)

Of course. That's why I started this thread.

With my patch:

test=# select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E\u043D\u044B');
 similarity 
------------
       0.75
(1 row)

Or you could just #undef KEEPONLYALNUM in trgm.h. But I'm not sure
this is the right thing for you.
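
To illustrate why the unpatched call can return NaN, here is a rough Python sketch. It is not the pg_trgm source and not my patch. The function names, the padding rule (two blanks in front, one behind) and the shared/union formula are simplifying assumptions made for illustration only. The point is that if the "is this a word character" test rejects every character, as an ASCII-only test does for Cyrillic, then no trigrams are extracted and the ratio becomes 0/0.

```python
import math


def ascii_only_alnum(ch):
    # Hypothetical stand-in for a character test that only
    # recognizes ASCII letters and digits (as under locale C).
    return ord(ch) < 128 and ch.isalnum()


def trigrams(text, keep_only_alnum=True, is_word_char=str.isalnum):
    # With keep_only_alnum set, anything that is not a word
    # character becomes a separator (assumed analogue of KEEPONLYALNUM).
    if keep_only_alnum:
        cleaned = "".join(ch if is_word_char(ch) else " " for ch in text)
    else:
        cleaned = text
    grams = set()
    for word in cleaned.split():
        padded = "  " + word + " "  # assumed padding rule
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b, **kw):
    ta, tb = trigrams(a, **kw), trigrams(b, **kw)
    shared = len(ta & tb)
    union = len(ta) + len(tb) - shared
    # C float division gives NaN for 0/0; Python would raise,
    # so the empty case is mapped to NaN explicitly.
    return shared / union if union else math.nan


a = "\u0441\u043b\u043e\u043d"
b = "\u0441\u043b\u043e\u043d\u044b"
print(similarity(a, b, is_word_char=ascii_only_alnum))  # no trigrams at all
print(similarity(a, b))                                 # Unicode-aware test
print(similarity(a, b, keep_only_alnum=False,
                 is_word_char=ascii_only_alnum))        # filter switched off
```

The first call yields NaN because both trigram sets are empty. The other two yield a finite value. That value is not expected to match the 0.75 shown above, since the sketch does not reproduce the real trigram extraction or the real similarity formula.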
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

