Re: plpython_unicode test (was Re: buildfarm / handling (undefined) locales) - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: plpython_unicode test (was Re: buildfarm / handling (undefined) locales)
Date:
Msg-id: 25903.1401751642@sss.pgh.pa.us
In response to: Re: plpython_unicode test (was Re: buildfarm / handling (undefined) locales)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
I wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Let's just stick to ASCII.

> The more I think about it, the more I think that using a plain-ASCII
> character would defeat most of the purpose of the test.  Non-breaking
> space seems like the best bet here, not least because it has several
> different representations among the encodings we support.

I've confirmed that a patch as attached behaves per expectation
(in particular, it passes with WIN1250 database encoding).

I think that the worry I expressed about UTF8 characters in expected-files
is probably overblown: we have such in the collate.linux.utf8 test, and we've
not heard reports of that one breaking.  (Though of course, it's not run
by default :-(.)  It's still annoying that the test would fail in EUC_xx
encodings, but I see no way around that without largely lobotomizing the
test.
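
For reference, here's a quick Python sketch (not part of the patch) of why
U+00A0 works well for this: it has distinct byte representations in UTF8 and
the single-byte Latin/Windows encodings, and none at all in the EUC_xx family.
The codec names below are Python's spellings of the corresponding database
encodings (cp1250 ~ WIN1250, iso-8859-1 ~ LATIN1, and so on):

    # How U+00A0 (no-break space) maps into various encodings.
    nbsp = u"\xA0"
    for codec in ["utf-8", "iso-8859-1", "cp1250", "cp1252", "euc_jp", "euc_kr"]:
        try:
            print("%-10s -> %s" % (codec, repr(nbsp.encode(codec))))
        except UnicodeEncodeError:
            print("%-10s -> no equivalent character" % codec)

UTF8 yields the two bytes C2 A0, the single-byte encodings yield the single
byte A0, and the EUC codecs report no equivalent.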

So I propose to apply this, and back-patch to 9.1 (not 9.0, because its
version of this test is different anyway --- so Tomas will have to drop
testing cs_CZ.WIN-1250 in 9.0).

            regards, tom lane

diff --git a/src/pl/plpython/expected/plpython_unicode.out b/src/pl/plpython/expected/plpython_unicode.out
index 859edbb..c7546dd 100644
*** a/src/pl/plpython/expected/plpython_unicode.out
--- b/src/pl/plpython/expected/plpython_unicode.out
***************
*** 1,22 ****
  --
  -- Unicode handling
  --
  SET client_encoding TO UTF8;
  CREATE TABLE unicode_test (
      testvalue  text NOT NULL
  );
  CREATE FUNCTION unicode_return() RETURNS text AS E'
! return u"\\x80"
  ' LANGUAGE plpythonu;
  CREATE FUNCTION unicode_trigger() RETURNS trigger AS E'
! TD["new"]["testvalue"] = u"\\x80"
  return "MODIFY"
  ' LANGUAGE plpythonu;
  CREATE TRIGGER unicode_test_bi BEFORE INSERT ON unicode_test
    FOR EACH ROW EXECUTE PROCEDURE unicode_trigger();
  CREATE FUNCTION unicode_plan1() RETURNS text AS E'
  plan = plpy.prepare("SELECT $1 AS testvalue", ["text"])
! rv = plpy.execute(plan, [u"\\x80"], 1)
  return rv[0]["testvalue"]
  ' LANGUAGE plpythonu;
  CREATE FUNCTION unicode_plan2() RETURNS text AS E'
--- 1,27 ----
  --
  -- Unicode handling
  --
+ -- Note: this test case is known to fail if the database encoding is
+ -- EUC_CN, EUC_JP, EUC_KR, or EUC_TW, for lack of any equivalent to
+ -- U+00A0 (no-break space) in those encodings.  However, testing with
+ -- plain ASCII data would be rather useless, so we must live with that.
+ --
  SET client_encoding TO UTF8;
  CREATE TABLE unicode_test (
      testvalue  text NOT NULL
  );
  CREATE FUNCTION unicode_return() RETURNS text AS E'
! return u"\\xA0"
  ' LANGUAGE plpythonu;
  CREATE FUNCTION unicode_trigger() RETURNS trigger AS E'
! TD["new"]["testvalue"] = u"\\xA0"
  return "MODIFY"
  ' LANGUAGE plpythonu;
  CREATE TRIGGER unicode_test_bi BEFORE INSERT ON unicode_test
    FOR EACH ROW EXECUTE PROCEDURE unicode_trigger();
  CREATE FUNCTION unicode_plan1() RETURNS text AS E'
  plan = plpy.prepare("SELECT $1 AS testvalue", ["text"])
! rv = plpy.execute(plan, [u"\\xA0"], 1)
  return rv[0]["testvalue"]
  ' LANGUAGE plpythonu;
  CREATE FUNCTION unicode_plan2() RETURNS text AS E'
*************** return rv[0]["testvalue"]
*** 27,46 ****
  SELECT unicode_return();
   unicode_return
  ----------------
!  \u0080
  (1 row)

  INSERT INTO unicode_test (testvalue) VALUES ('test');
  SELECT * FROM unicode_test;
   testvalue
  -----------
!  \u0080
  (1 row)

  SELECT unicode_plan1();
   unicode_plan1
  ---------------
!  \u0080
  (1 row)

  SELECT unicode_plan2();
--- 32,51 ----
  SELECT unicode_return();
   unicode_return
  ----------------
!   
  (1 row)

  INSERT INTO unicode_test (testvalue) VALUES ('test');
  SELECT * FROM unicode_test;
   testvalue
  -----------
!   
  (1 row)

  SELECT unicode_plan1();
   unicode_plan1
  ---------------
!   
  (1 row)

  SELECT unicode_plan2();
diff --git a/src/pl/plpython/sql/plpython_unicode.sql b/src/pl/plpython/sql/plpython_unicode.sql
index bdd40c4..a11e5ee 100644
*** a/src/pl/plpython/sql/plpython_unicode.sql
--- b/src/pl/plpython/sql/plpython_unicode.sql
***************
*** 1,6 ****
--- 1,11 ----
  --
  -- Unicode handling
  --
+ -- Note: this test case is known to fail if the database encoding is
+ -- EUC_CN, EUC_JP, EUC_KR, or EUC_TW, for lack of any equivalent to
+ -- U+00A0 (no-break space) in those encodings.  However, testing with
+ -- plain ASCII data would be rather useless, so we must live with that.
+ --

  SET client_encoding TO UTF8;

*************** CREATE TABLE unicode_test (
*** 9,19 ****
  );

  CREATE FUNCTION unicode_return() RETURNS text AS E'
! return u"\\x80"
  ' LANGUAGE plpythonu;

  CREATE FUNCTION unicode_trigger() RETURNS trigger AS E'
! TD["new"]["testvalue"] = u"\\x80"
  return "MODIFY"
  ' LANGUAGE plpythonu;

--- 14,24 ----
  );

  CREATE FUNCTION unicode_return() RETURNS text AS E'
! return u"\\xA0"
  ' LANGUAGE plpythonu;

  CREATE FUNCTION unicode_trigger() RETURNS trigger AS E'
! TD["new"]["testvalue"] = u"\\xA0"
  return "MODIFY"
  ' LANGUAGE plpythonu;

*************** CREATE TRIGGER unicode_test_bi BEFORE IN
*** 22,28 ****

  CREATE FUNCTION unicode_plan1() RETURNS text AS E'
  plan = plpy.prepare("SELECT $1 AS testvalue", ["text"])
! rv = plpy.execute(plan, [u"\\x80"], 1)
  return rv[0]["testvalue"]
  ' LANGUAGE plpythonu;

--- 27,33 ----

  CREATE FUNCTION unicode_plan1() RETURNS text AS E'
  plan = plpy.prepare("SELECT $1 AS testvalue", ["text"])
! rv = plpy.execute(plan, [u"\\xA0"], 1)
  return rv[0]["testvalue"]
  ' LANGUAGE plpythonu;

