plpython_unicode test (was Re: buildfarm / handling (undefined) locales) - Mailing list pgsql-hackers

From Tom Lane
Subject plpython_unicode test (was Re: buildfarm / handling (undefined) locales)
Date
Msg-id 6789.1401655517@sss.pgh.pa.us
In response to Re: buildfarm / handling (undefined) locales  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: plpython_unicode test (was Re: buildfarm / handling (undefined) locales)
List pgsql-hackers
Tomas Vondra <tv@fuzzy.cz> writes:
> On 13.5.2014 20:58, Tom Lane wrote:
>> Tomas Vondra <tv@fuzzy.cz> writes:
>>> Yeah, not really what we were shooting for. I've fixed this by
>>> defining the missing locales, and indeed - magpie now fails in
>>> plpython tests.

>> I saw that earlier today (tho right now the buildfarm server seems
>> to not be responding :-().  Probably we should use some
>> more-widely-used character code in that specific test?

> Any idea what other character could be used in those tests? ISTM fixing
> this universally would mean using ASCII characters - the subset of UTF-8
> common to all the encodings. But I'm afraid that'd contradict the very
> purpose of those tests ...

We really ought to resolve this issue so that we can get rid of some of
the red in the buildfarm.  ISTM there are three possible approaches:

1. Decide that we're not going to support running the plpython regression
tests under "weird" server encodings, in which case Tomas should just
remove cs_CZ.WIN-1250 from the set of encodings his buildfarm animals
test.  Don't much care for this, but it has the attraction of being
minimal work.

2. Change the plpython_unicode test to use some ASCII character in place
of \u0080.  We could keep on using the \u syntax to create the character,
but as stated above, this still seems like it's losing a significant
amount of test coverage.

3. Try to select some "more portable" non-ASCII character, perhaps U+00A0
(non-breaking space) or U+00E1 (a-acute).  I think this would probably
work for most encodings but it might still fail in the Far East.  Another
objection is that the expected/plpython_unicode.out file would contain
that character in UTF8 form.  In principle that would work, since the test
sets client_encoding = utf8 explicitly, but I'm worried about accidental
corruption of the expected file by text editors, file transfers, etc.
(The current usage of U+0080 doesn't suffer from this risk because psql
special-cases printing of multibyte UTF8 control characters, so that we
get exactly "\u0080".)
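
To make the trade-off between options 2 and 3 a bit more concrete, here is a
minimal sketch that checks which of the candidate characters can be
represented in a few encodings, and whether a given string is ASCII-only (the
property that keeps the expected file safe from editors and file transfers).
Note the assumptions: it uses Python's own codec tables (cp1250, latin-1,
euc_jp) as stand-ins for server encodings, and those tables are not
guaranteed to match PostgreSQL's conversion routines.  The helper names
encodable and is_ascii_only are made up for illustration.

```python
# Candidate characters discussed above; the labels are only for display.
CANDIDATES = {
    "U+0080 (C1 control)": "\u0080",
    "U+00A0 (non-breaking space)": "\u00a0",
    "U+00E1 (a-acute)": "\u00e1",
    "U+0061 (ASCII 'a')": "a",
}

# Python codec names standing in for server encodings.  Assumption: these
# are Python's tables, which may differ from PostgreSQL's conversions.
ENCODINGS = ["cp1250", "latin-1", "euc_jp"]


def encodable(ch, encoding):
    """Return True if ch can be encoded with the given Python codec."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def is_ascii_only(text):
    """Return True if every character in text is 7-bit ASCII."""
    return all(ord(c) < 128 for c in text)


if __name__ == "__main__":
    for label, ch in CANDIDATES.items():
        row = ", ".join(
            f"{enc}={'yes' if encodable(ch, enc) else 'no'}"
            for enc in ENCODINGS
        )
        print(f"{label}: {row}")
```

With Python's cp1250 table, U+0080 is not encodable while U+00A0 and U+00E1
are, which is consistent with the failure seen under WIN-1250.  The six-byte
escaped form "\u0080" that psql prints passes is_ascii_only, whereas a literal
a-acute in the expected file would not.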

Thoughts?
        regards, tom lane


