Thread: Error and wrong length of non-ASCII Unicode string in plpythonu

Error and wrong length of non-ASCII Unicode string in plpythonu

From
Volker Paul
Date:
Hi,

the following plpythonu function shows a wrong length calculation
and an error when accessing single characters in a Unicode string.

Is the problem solved in more recent versions?

Or is there a way to work around it?


CREATE OR REPLACE FUNCTION test_plpython_unicode()
RETURNS TEXT AS $$
     s = u"abc"
     plpy.notice("s:%s len:%d; 2+:%s" % (s, len(s), s[2:]))
     # output: NOTICE:  s:abc len:3; 2+:c
     s = u"äbc"
     plpy.notice("len:%d; 2+:%s" % (len(s), s[2:]))
     # output:   NOTICE:  len:4; 2+:bc
     # expected: NOTICE:  len:3; 2+:c
     # Note that including s:%s as above leads to an error
     for c in u"abcÄÖÜ":
         plpy.notice(c)
     # output: (the first umlaut leads to error, even to error in error reporting)
     #NOTICE:  a
     #NOTICE:  b
     #NOTICE:  c
     #ERROR:  plpy.Error: could not parse error message in plpy.elog
     # platform info:
     # psql (9.3.10, server 9.1.18)
     # Ubuntu 14.04.3 LTS
$$ LANGUAGE plpythonu;

Re: Error and wrong length of non-ASCII Unicode string in plpythonu

From
Peter Eisentraut
Date:
On 1/18/16 12:59 PM, Volker Paul wrote:
> the following plpythonu function reveals wrong length calculation
> and error on accessing single characters in a Unicode string.

The problem is that in Python 2, source text must be ASCII unless an
encoding is declared.  So you need to write any non-ASCII characters
using \x or \u escapes.
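
A minimal sketch of the escape workaround, assuming the function body is kept pure ASCII. The helper name describe is hypothetical and only there to make the result easy to check; inside a plpythonu function the same literals would be passed to plpy.notice instead.

```python
# Sketch only: write non-ASCII characters as \u escapes so the source stays ASCII.

def describe(s):
    """Return (length, tail from index 2) for a Unicode string."""
    return len(s), s[2:]

# U+00E4 is a-umlaut; the literal is three characters long.
length, tail = describe(u"\u00e4bc")

# Iterating yields whole characters: a, b, c, then U+00C4, U+00D6, U+00DC.
chars = [c for c in u"abc\u00c4\u00d6\u00dc"]
```

With the escapes, the length is 3 and the tail is "c", which is what the original post expected.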

I've tried patching PL/Python so that it automatically inserts a
"coding" declaration per PEP-263, but I haven't gotten that to work yet.

Your example works fine with Python 3, which expects UTF-8 source
encoding, so maybe that's an option for you.  (Although now that I think
about it, that might have failure scenarios in other database encodings.)
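
To illustrate what a PEP 263 coding declaration does, here is a rough sketch that runs on Python 3, where compile() accepts source as bytes and honors the declaration. It is not the PL/Python patch mentioned above; the function name run_with_declaration and the file name "<function body>" are made up for the example. The latin-1 case happens to reproduce the len:4 / "bc" result from the original report, but the thread does not confirm that this is what Python 2 does internally, so treat that as an assumption.

```python
# Sketch: how a coding declaration changes the way source bytes are decoded.

def run_with_declaration(encoding):
    """Compile UTF-8 encoded source bytes under the declared encoding."""
    # The a-umlaut (U+00E4) is stored as two bytes in UTF-8.
    body = u's = "\u00e4bc"\n'.encode("utf-8")
    header = ("# -*- coding: %s -*-\n" % encoding).encode("ascii")
    namespace = {}
    exec(compile(header + body, "<function body>", "exec"), namespace)
    return namespace["s"]

as_utf8 = run_with_declaration("utf-8")      # bytes decoded as intended
as_latin1 = run_with_declaration("latin-1")  # each byte becomes one character
```

Here as_utf8 has three characters, while as_latin1 has four because the two UTF-8 bytes of the umlaut are read as two separate characters.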