Re: pl/perl and utf-8 in sql_ascii databases - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: pl/perl and utf-8 in sql_ascii databases
Msg-id 1342201377-sup-3678@alvh.no-ip.org
In response to Re: [SPAM] [MessageLimit][lowlimit] Re: pl/perl and utf-8 in sql_ascii databases  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: pl/perl and utf-8 in sql_ascii databases  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Excerpts from Kyotaro HORIGUCHI's message of Thu Jul 12 00:09:19 -0400 2012:
>
> Hmm... Sorry for the immature patch.

No need to apologize.

> > ... and this story hasn't ended yet, because one of the new tests is
> > failing.  See here:
> >
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
> >
> > The interesting part of the diff is:
> ...
> >   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> > ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> > ! CONTEXT:  PL/Perl function "perl_utf_inout"
> >
> >
> > I am not sure what we can do here other than remove this function and
> > query from the test.
>
> I ran the regression tests only in an environment capable of handling
> the character U+5DDD (a Japanese character meaning "river")...
>
> Which byte sequences can be decoded, and which byte sequences
> result from encoding a given Unicode character, vary among
> encodings.

Right.  I only ran the test in C and UTF8, not Latin1, so I didn't see
it fail either.
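
To make the failure concrete, here is a minimal sketch in Python. Python
is used only for illustration: it is not part of the PL/Perl test, its
codecs are not PostgreSQL's conversion routines, and the helper name
can_encode is made up. It checks that U+5DDD has the UTF-8 byte sequence
quoted in the error above, that Latin1 cannot represent it, and that a
Japanese encoding such as EUC_JP can.

```python
# U+5DDD is the "river" character mentioned in the thread.
river = "\u5ddd"

# Its UTF-8 form is the byte sequence reported in the error message.
utf8_bytes = river.encode("utf-8")
print(utf8_bytes.hex(" "))  # e5 b7 9d


def can_encode(ch: str, encoding: str) -> bool:
    """Return True if ch has a representation in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Latin1 has no equivalent for this character; EUC_JP does.
print(can_encode(river, "latin-1"))  # False
print(can_encode(river, "euc_jp"))   # True
```

So the same query can only succeed in a database whose encoding can
represent the character, and the bytes it returns depend on that encoding.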

> The problem that this thread is about could be covered without
> the additional test. That test confirms whether
> encoding/decoding is done as expected when the language
> handler is called.

Right.

> I suppose that testing those two cases, plus one additional
> case that runs pg_do_encoding_conversion(), say latin1,
> would be enough to confirm that encoding/decoding is done
> properly, since the concrete conversion scheme is not significant
> in this case.
>
> So I recommend that we add the test for latin1 and omit the
> test for encodings other than sql_ascii, utf8 and latin1. This
> might be achieved by creating empty plperl_lc.sql and plperl_lc.out
> files for those encodings.
>
> What do you think about that?

I think that's probably too much engineering for something that doesn't
really warrant it.  A real solution to this problem could be to create
yet another new test file containing just this function definition and
the query that calls it, and have one expected file for each encoding;
but that's too much work and too many files, I'm afraid.

I can see us supporting tests that require a small number of expected
files.  No Make tricks with file copying, though.  If we can't get
some easy way to test this without that, I submit we should just remove
the test.
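
As a rough illustration of the "small number of expected files" idea, the
test output is accepted if it equals any one of a few stored variants.
The sketch below is Python for illustration only and is not pg_regress
code. The function name and the variant list are hypothetical. One
variant reuses the error text from the buildfarm diff above and the
other is an explicit placeholder, not real output.

```python
from typing import Optional, Sequence


def first_matching_expected(actual: str, variants: Sequence[str]) -> Optional[int]:
    """Return the index of the first variant equal to actual, or None."""
    for index, expected in enumerate(variants):
        if actual == expected:
            return index
    return None


# Variant taken from the error quoted earlier in this message.
latin1_variant = (
    'ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" '
    'has no equivalent in encoding "LATIN1"\n'
    'CONTEXT:  PL/Perl function "perl_utf_inout"\n'
)
# Placeholder only: stands in for output under an encoding that can
# represent the character.
other_variant = "<placeholder: output when the character is representable>\n"

variants = [other_variant, latin1_variant]
matched = first_matching_expected(latin1_variant, variants)
print(matched)  # 1
```

Every additional encoding adds another variant to keep up to date, which
is the maintenance cost being weighed here.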

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

