On Fri, 2010-05-28 at 10:04 +0900, Tatsuo Ishii wrote:
> > I think the problem at hand has nothing at all to do with agglutination
> > or CJK-specific issues. You will get the same problem with other
> > languages *if* you set a locale that does not adequately support the
> > characters in use. E.g., Russian with locale C and encoding UTF8:
> >
> > select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E\u043D\u044B');
> >  similarity
> > ────────────
> >         NaN
> > (1 row)
>
> Wait. This works fine for me with stock pg_trgm. The locale is C and
> encoding is UTF8. What version of PostgreSQL are you using? Mine is
> 8.4.4.
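
This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. In
8.4 an unrecognized backslash escape is taken literally, so E'\u0441' is
simply the string u0441. If you run this in 8.4, you're just comparing
sequences of ASCII letters and digits.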
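The NaN is consistent with a 0/0 division: if locale C classifies none of the Cyrillic characters as alphanumeric, neither string yields any trigrams. Below is a toy Python model, not pg_trgm's actual code. It assumes pg_trgm's documented rule that each word is padded with two leading spaces and one trailing space. It also assumes two simplifications: similarity is shared trigrams divided by the size of the union, and "locale C" means only ASCII letters and digits count. The function names `trigrams` and `similarity` here are hypothetical.

```python
import math
import re


def trigrams(text: str, ascii_only: bool) -> set[str]:
    """Toy trigram extraction loosely modeled on pg_trgm (an assumption).

    Words are runs of characters treated as alphanumeric. ascii_only=True
    mimics locale C, where only ASCII letters and digits qualify; otherwise
    any Unicode letter or digit counts. Each word is lowercased and padded
    with two leading spaces and one trailing space before slicing.
    """
    pattern = r"[A-Za-z0-9]+" if ascii_only else r"[^\W_]+"
    result: set[str] = set()
    for word in re.findall(pattern, text):
        padded = "  " + word.lower() + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str, ascii_only: bool) -> float:
    """Shared trigrams over the union (an assumed formula).

    Mirrors IEEE float division: 0/0 yields NaN instead of raising.
    """
    ta, tb = trigrams(a, ascii_only), trigrams(b, ascii_only)
    shared = len(ta & tb)
    union = len(ta) + len(tb) - shared
    if union == 0:
        return math.nan  # no trigrams on either side: 0.0 / 0.0
    return shared / union


# 9.0 reading of the literals: real Cyrillic text.
slon, slony = "\u0441\u043B\u043E\u043D", "\u0441\u043B\u043E\u043D\u044B"
print(similarity(slon, slony, ascii_only=True))   # nan: no word characters
print(similarity(slon, slony, ascii_only=False))  # 0.5714285714285714 (4/7)

# 8.4 reading: "\u" is just "u", so both literals are plain ASCII.
old_a, old_b = "u0441u043Bu043Eu043D", "u0441u043Bu043Eu043Du044B"
print(similarity(old_a, old_b, ascii_only=True))  # finite, between 0 and 1
```

Under these assumptions, the same pair of words gives NaN only when the character classification rejects every character. The 8.4 strings always produce trigrams because they consist entirely of ASCII letters and digits.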