Re: daitch_mokotoff module - Mailing list pgsql-hackers

From Dag Lem
Subject Re: daitch_mokotoff module
Msg-id ygeo84tvugy.fsf@sid.nimrod.no
In response to Re: daitch_mokotoff module  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: daitch_mokotoff module
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Thomas Munro <thomas.munro@gmail.com> writes:
>> Erm, it looks like something weird is happening somewhere in cfbot's
>> pipeline, because Dag's patch says:
>
>> +SELECT daitch_mokotoff('Straßburg');
>> + daitch_mokotoff
>> +-----------------
>> + 294795
>> +(1 row)
>
> ... so, that test case is guaranteed to fail in non-UTF8 encodings,
> I suppose?  I wonder what the LANG environment is in that cfbot
> instance.
>
> (We do have methods for dealing with non-ASCII test cases, but
> I can't see that this patch is using any of them.)
>
>             regards, tom lane
>

I naively assumed that tests would be run in a UTF8 environment.

Running "ack -l '[\x80-\xff]'" in the contrib/ directory reveals that
two other modules use non-ASCII (UTF8) characters in their tests:
citext and unaccent.

The citext tests seem to be commented out - "Multibyte sanity
tests. Uncomment to run."

Looking into the unaccent module, I don't quite understand how it
works with various encodings, since it doesn't seem to decode its
input. Will it fail if run under anything but ASCII or UTF8?

In any case, I see that unaccent.sql starts as follows:


CREATE EXTENSION unaccent;

-- must have a UTF8 database
SELECT getdatabaseencoding();

SET client_encoding TO 'UTF8';


Would doing the same thing in fuzzystrmatch.sql fix the problem with
failing tests? Should I prepare a new patch?
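
For illustration, a minimal sketch of what that could look like at the
top of fuzzystrmatch.sql. It assumes the unaccent.sql preamble is simply
copied over, and it uses the test case from the patch; whether this is
sufficient for non-UTF8 databases is exactly the open question:

```sql
CREATE EXTENSION fuzzystrmatch;

-- must have a UTF8 database (assumption: same convention as unaccent.sql)
SELECT getdatabaseencoding();

-- read the non-ASCII literals in this script as UTF8
SET client_encoding TO 'UTF8';

-- test case from the patch, containing the non-ASCII character 'ß'
SELECT daitch_mokotoff('Straßburg');
```

With such a preamble, a database in another encoding would presumably
show up as a difference in the getdatabaseencoding() output, which at
least makes the cause of the failure visible.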


Best regards

Dag Lem


