Thread: pl/perl and utf-8 in sql_ascii databases

pl/perl and utf-8 in sql_ascii databases

From
Christoph Berg
Date:
Hi,

we have a database that is storing strings in various encodings (and
non-encodings, namely the arbitrary byte soup that you might see in
email headers from the internet). For this reason, the database uses
sql_ascii encoding. The columns are text, as most characters are
ascii, so bytea didn't seem the right way to go.

Currently we are on 8.3 and try to upgrade to 9.1, but the plperlu
functions we have are acting up.

Old behavior on 8.3 .. 9.0:

sql_ascii =# create or replace function whitespace(text) returns text
language plperlu as $$ $a = shift; $a =~ s/[\t ]+/ /g; return $a; $$;
CREATE FUNCTION

sql_ascii =# select whitespace (E'\200'); -- 0x80 is not valid utf-8whitespace
------------

sql_ascii =# select whitespace (E'\200')::bytea;whitespace
------------\x80

New behavior on 9.1.2:

sql_ascii =# select whitespace (E'\200');
ERROR:  XX000: Malformed UTF-8 character (fatal) at line 1.
KONTEXT:  PL/Perl function "whitespace"
ORT:  plperl_call_perl_func, plperl.c:2037

A crude workaround is:

sql_ascii =# create or replace function whitespace_utf8_off(text)
returns text language plperlu as $$ use Encode; $a = shift;
Encode::_utf8_off($a); $a =~ s/[\t ]+/ /g; return $a; $$;
CREATE FUNCTION

sql_ascii =# select whitespace_utf8_off (E'\200');whitespace_utf8_off
---------------------\u0080

sql_ascii =# select whitespace_utf8_off (E'\200')::bytea;whitespace_utf8_off
---------------------\xc280

(Note that the workaround is not perfect as the resulting 0x80..0xff
bytes are still tagged to be utf8.)


I think the bug is in plperl_helpers.h:

/** Create a new SV from a string assumed to be in the current database's* encoding.*/

static inline SV *
cstr2sv(const char *str)
{       SV                 *sv;       char       *utf8_str = utf_e2u(str);
       sv = newSVpv(utf8_str, 0);       SvUTF8_on(sv);
       pfree(utf8_str);
       return sv;
}

In sql_ascii databases, utf_e2u does not do any recoding, but then
SvUTF8_on still marks the string as utf-8, while it isn't.

(Returned values might also need fixing.)

In my view, this is clearly a bug in pl/perl on sql_ascii databases.

Christoph
--
cb@df7cb.de | http://www.df7cb.de/

Re: pl/perl and utf-8 in sql_ascii databases

From
Alex Hunsaker
Date:
On Thu, Feb 9, 2012 at 03:21, Christoph Berg <cb@df7cb.de> wrote:
> Hi,
>
> we have a database that is storing strings in various encodings (and
> non-encodings, namely the arbitrary byte soup [ ... ]
> For this reason, the database uses
> sql_ascii encoding

> ...snip...

> In sql_ascii databases, utf_e2u does not do any recoding, but then
> SvUTF8_on still marks the string as utf-8, while it isn't.
>
> (Returned values might also need fixing.)
>
> In my view, this is clearly a bug in pl/perl on sql_ascii databases.

Yeah, there was some musing about this over in:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg01142.php

Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
and SvPVUTF8() when turning a perl string into a cstring.

With the attached I get:
=> create or replace function perl_white(a text) returns text as $$
return shift; $$ language plperlu;
=> select perl_white(E'\200'), perl_white(E'\200')::bytea,
coalesce(perl_white(E'\200'), 'null');
 perl_white | perl_white | coalesce
------------+------------+----------
            | \x80       |

=> select perl_white(E'\401');
 perl_white
------------
 \x01
(1 row)

Does the attached fix the issue for you?

Ill note that all the pls seem to behave a bit differently:

=> create or replace function py_white(a text) returns text as $$
return a; $$ language plpython3u;
=> select py_white(E'\200'), py_white(E'\200')::bytea,
coalesce(py_white(E'\200'), 'null');
py_white | py_white | coalesce
----------+----------+----------
          |          | null
(1 row)

=>select py_white(E'\401');
 py_white
----------
 \x01
(1 row)

=> create or replace function tcl_white(text) returns text as $$
return $1; $$ language pltcl;
=> select tcl_white(E'\200'), tcl_white(E'\200')::bytea,
coalesce(tcl_white(E'\200'), 'null');
 tcl_white | tcl_white | coalesce
-----------+-----------+----------
           | \x80      |

 => select tcl_white(E'\402');
 tcl_white
-----------
 \x02
(1 row)

Attachment

Re: pl/perl and utf-8 in sql_ascii databases

From
Christoph Berg
Date:
Re: Alex Hunsaker 2012-02-10 <CAFaPBrR9y1fu6gpVu+8TA8vTY6QVCm3DfarKT8JG_EhGeTXnDA@mail.gmail.com>
> Does the attached fix the issue for you?

Yes. :)

Christoph
--
cb@df7cb.de | http://www.df7cb.de/

Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:

> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> and SvPVUTF8() when turning a perl string into a cstring.

Hmm, this patch belongs into back branches too, right?  Not just the
current development tree?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pl/perl and utf-8 in sql_ascii databases

From
Robert Haas
Date:
On Mon, Jun 18, 2012 at 3:30 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
>
>> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
>> and SvPVUTF8() when turning a perl string into a cstring.
>
> Hmm, this patch belongs into back branches too, right?  Not just the
> current development tree?

It seems like a bug fix to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pl/perl and utf-8 in sql_ascii databases

From
Alex Hunsaker
Date:
On Mon, Jun 18, 2012 at 1:30 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
> Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
>
>> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
>> and SvPVUTF8() when turning a perl string into a cstring.
>
> Hmm, this patch belongs into back branches too, right?  Not just the
> current development tree?

(Have been out of town, sorry for the late reply)

Yeah, back to 9.1.


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of mar jun 19 11:36:41 -0400 2012:
>
> On Mon, Jun 18, 2012 at 3:30 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
> >
> >> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> >> and SvPVUTF8() when turning a perl string into a cstring.
> >
> > Hmm, this patch belongs into back branches too, right?  Not just the
> > current development tree?
>
> It seems like a bug fix to me.

That's what I thought.  I will commit to both branches soon, then.
Mind you, this should have been an "open item", not a commitfest item.
(Actually not even an open item.  We should have committed it right
away.)

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pl/perl and utf-8 in sql_ascii databases

From
Robert Haas
Date:
On Tue, Jun 19, 2012 at 3:03 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> That's what I thought.  I will commit to both branches soon, then.

I think there might be three branches involved.

> Mind you, this should have been an "open item", not a commitfest item.
> (Actually not even an open item.  We should have committed it right
> away.)

True, but it's not a perfect world, and I didn't feel qualified to
commit it myself.  Adding it the CommitFest at least prevented it from
getting lost forever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:

> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> and SvPVUTF8() when turning a perl string into a cstring.

Right.

So I played a bit with this patch, and touched it a bit mainly just to
add some more comments; and while at it I noticed that some of the
functions in Util.xs might leak some memory, so I made an attempt to
plug them, as in the attached patch (which supersedes yours).

Now, with my version of the patch applied and using a SQL_ASCII database
to test the problem in the original report, I notice that we now have a
regression failure:

*** /pgsql/source/HEAD/src/pl/plperl/expected/plperl.out        2012-05-16 13:38:02.495647259 -0400
--- /home/alvherre/Code/pgsql/build/HEAD/src/pl/plperl/results/plperl.out       2012-06-20 15:09:19.869778824 -0400
***************
*** 658,664 ****
    return "abcd\0efg";
  $$ LANGUAGE plperl;
  SELECT perl_zerob();
! ERROR:  invalid byte sequence for encoding "UTF8": 0x00
  CONTEXT:  PL/Perl function "perl_zerob"
  -- make sure functions marked as VOID without an explicit return work
  CREATE OR REPLACE FUNCTION myfuncs() RETURNS void AS $$
--- 658,664 ----
    return "abcd\0efg";
  $$ LANGUAGE plperl;
  SELECT perl_zerob();
! ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
  CONTEXT:  PL/Perl function "perl_zerob"
  -- make sure functions marked as VOID without an explicit return work
  CREATE OR REPLACE FUNCTION myfuncs() RETURNS void AS $$

======================================================================


I'm not really sure what to do here -- maybe have a second expected file
for that test is a good enough answer?  Or should I just take the test
out?  Opinions please.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello,

> > Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> > and SvPVUTF8() when turning a perl string into a cstring.
> 
> Right.

I spent a bit longer time catching on pl/perl and now understand
what is the problem...

> So I played a bit with this patch, and touched it a bit mainly just to
> add some more comments; and while at it I noticed that some of the
> functions in Util.xs might leak some memory, so I made an attempt to
> plug them, as in the attached patch (which supersedes yours).

Ok, Is it ok to look into the newer patch including fix of leaks
at first?

-- Coding and styles.

This also seems to have polished the previous one on some codes,
styles and comments which generally look reasonable. And patch
style was corrected into unified.

-- Functions
I seems to work properly on the database the encodings of which
are SQL_ASCII and UTF8 (and EUC-JP) as below,

=================
=> create or replace function foo(text) returns text language plperlu as $$ $a = shift; return "BOO!" if ($a !=
"a\x80cあ");return $a; $$;
 
SQL_ASCII=> select foo(E'a\200cあ') = E'a\200cあ';?column? 
----------t
UTF8=> select foo(E'a\200cあ');
ERROR:  invalid byte sequence for encoding "UTF8": 0x80
UTF8=> select foo(E'a\302\200cあ') = E'a\u0080cあ';?column? 
----------t
=================

This looks quite valid according to the definition of the
encodings and perl's nature as far as I see.


-- The others

Variable naming in util_quote_*() seems a bit confusing,

>      text *arg = sv2text(sv);
>      text *ret = DatumGetTextP(..., PointerGetDatum(arg)));
>      char *str;
>      pfree(arg);
>      str = text_to_cstring(ret);
>      RETVAL = cstr2sv(str);
>      pfree(str);

Renaming ret to quoted and str to ret as the patch attached might
make it easily readable.


> Now, with my version of the patch applied and using a SQL_ASCII database
> to test the problem in the original report, I notice that we now have a
> regression failure:
snip.
> I'm not really sure what to do here -- maybe have a second expected file
> for that test is a good enough answer?  Or should I just take the test
> out?  Opinions please.


The attached ugly patch does it. We seem should put NO_LOCALE=1
on the 'make check' command line for the encodings not compatible
with the environmental locale, although it looks work.

# UtfToLocal() seems to have a bug that always report illegal
# encoding was "UTF8" regardless of the real encoding. But
# plper_lc_*.(sql|out) increases if the bug is fixed.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..4b4b680 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -67,8 +67,11 @@ static text *sv2text(SV *sv){    char       *str = sv2cstr(sv);
+    text       *text;
-    return cstring_to_text(str);
+    text = cstring_to_text(str);
+    pfree(str);
+    return text;}MODULE = PostgreSQL::InServer::Util PREFIX = util_
@@ -113,10 +116,12 @@ util_quote_literal(sv)    }    else {        text *arg = sv2text(sv);
-        text *ret = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
-        char *str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        text *quoted = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
+        char *ret;
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    }    OUTPUT:    RETVAL
@@ -132,10 +137,12 @@ util_quote_nullable(sv)    else    {        text *arg = sv2text(sv);
-        text *ret = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
-        char *str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        text *quoted = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
+        char *ret;
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    }    OUTPUT:    RETVAL
@@ -145,14 +152,15 @@ util_quote_ident(sv)    SV *sv    PREINIT:        text *arg;
-        text *ret;
-        char *str;
+        text *quoted;
+        char *ret;    CODE:        arg = sv2text(sv);
-        ret = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
-        str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        quoted = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    OUTPUT:    RETVAL
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1b6648b..ed99194 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -3,21 +3,29 @@/* * convert from utf8 to database encoding
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *
-utf_u2e(const char *utf8_str, size_t len)
+utf_u2e(char *utf8_str, size_t len){    int            enc = GetDatabaseEncoding();
-
-    char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str, len, PG_UTF8, enc);
+    char       *ret;    /*
-     * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
-     * will not do any conversion or verification. we need to do it manually
-     * instead.
+     * When we are in a PG_UTF8 or SQL_ASCII database
+     * pg_do_encoding_conversion() will not do any conversion (which is good)
+     * or verification (not so much), so we need to run the verification step
+     * separately.     */    if (enc == PG_UTF8 || enc == PG_SQL_ASCII)
-        pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+    {
+        pg_verify_mbstr_len(enc, utf8_str, len, false);
+        ret = utf8_str;
+    }
+    else
+        ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
+                                                 len, PG_UTF8, enc);    if (ret == utf8_str)        ret =
pstrdup(ret);
@@ -27,11 +35,15 @@ utf_u2e(const char *utf8_str, size_t len)/* * convert from database encoding to utf8
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *utf_e2u(const char *str){
-    char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) str, strlen(str), GetDatabaseEncoding(),
PG_UTF8);
+    char       *ret =
+        (char *) pg_do_encoding_conversion((unsigned char *) str, strlen(str),
+                                           GetDatabaseEncoding(), PG_UTF8);    if (ret == str)        ret =
pstrdup(ret);
@@ -41,6 +53,8 @@ utf_e2u(const char *str)/* * Convert an SV to a char * in the current database encoding
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *sv2cstr(SV *sv)
@@ -51,7 +65,9 @@ sv2cstr(SV *sv)    /*     * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
-     *
+     */
+
+    /*     * SvPVutf8() croaks nastily on certain things, like typeglobs and     * readonly objects such as $^V.
That'sa perl bug - it's not supposed to     * happen. To avoid crashing the backend, we make a copy of the sv before
 
@@ -63,18 +79,27 @@ sv2cstr(SV *sv)        (SvTYPE(sv) > SVt_PVLV && SvTYPE(sv) != SVt_PVFM))        sv = newSVsv(sv);
 else
 
-
+    {        /*         * increase the reference count so we can just SvREFCNT_dec() it when         * we are done
   */        SvREFCNT_inc_simple_void(sv);
 
+    }
-    val = SvPVutf8(sv, len);
+    /*
+     * Request the string from Perl, in UTF-8 encoding; but if we're in a
+     * SQL_ASCII database, just request the byte soup without trying to make it
+     * UTF8, because that might fail.
+     */
+    if (GetDatabaseEncoding() == PG_SQL_ASCII)
+        val = SvPV(sv, len);
+    else
+        val = SvPVutf8(sv, len);    /*
-     * we use perl's length in the event we had an embedded null byte to
-     * ensure we error out properly
+     * Now convert to database encoding.  We use perl's length in the event we
+     * had an embedded null byte to ensure we error out properly.     */    res = utf_u2e(val, len);
@@ -88,16 +113,20 @@ sv2cstr(SV *sv) * Create a new SV from a string assumed to be in the current database's *
encoding.*/
 
-static inline SV *cstr2sv(const char *str){    SV           *sv;
-    char       *utf8_str = utf_e2u(str);
+    char       *utf8_str;
+
+    /* no conversion when SQL_ASCII */
+    if (GetDatabaseEncoding() == PG_SQL_ASCII)
+        return newSVpv(str, 0);
+
+    utf8_str = utf_e2u(str);    sv = newSVpv(utf8_str, 0);    SvUTF8_on(sv);
-    pfree(utf8_str);    return sv;
diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index 188d7d2..8ab90a6 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -44,7 +44,9 @@ PERLCHUNKS = plc_perlboot.pl plc_trusted.plSHLIB_LINK = $(perl_embed_ldflags)REGRESS_OPTS =
--dbname=$(PL_TESTDB)--load-extension=plperl  --load-extension=plperlu
 
-REGRESS = plperl plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array
+REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo $(ENCODING) | tr "A-Z-" "a-z_").sql
2>/dev/null))
+REGRESS_LC = $(if $(REGRESS_LC0),$(REGRESS_LC0),plperl_lc)
+REGRESS = plperl $(REGRESS_LC) plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array#
ifPerl can support two interpreters in one backend,# test plperl-and-plperlu casesifneq ($(PERL),)
 
diff --git a/src/pl/plperl/expected/plperl.out b/src/pl/plperl/expected/plperl.out
index df54937..906dc15 100644
--- a/src/pl/plperl/expected/plperl.out
+++ b/src/pl/plperl/expected/plperl.out
@@ -650,16 +650,6 @@ CONTEXT:  PL/Perl anonymous code blockDO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort
@y;1; $do$ LANGUAGE plperl;ERROR:  Useless use of sort in scalar context at line 1.CONTEXT:  PL/Perl anonymous code
block
---
--- Make sure strings are validated
--- Should fail for all encodings, as nul bytes are never permitted.
---
-CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
-  return "abcd\0efg";
-$$ LANGUAGE plperl;
-SELECT perl_zerob();
-ERROR:  invalid byte sequence for encoding "UTF8": 0x00
-CONTEXT:  PL/Perl function "perl_zerob"-- make sure functions marked as VOID without an explicit return workCREATE OR
REPLACEFUNCTION myfuncs() RETURNS void AS $$   $_SHARED{myquote} = sub {
 
diff --git a/src/pl/plperl/expected/plperl_lc.out b/src/pl/plperl/expected/plperl_lc.out
new file mode 100644
index 0000000..4f8c08f
--- /dev/null
+++ b/src/pl/plperl/expected/plperl_lc.out
@@ -0,0 +1,10 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+ERROR:  invalid byte sequence for encoding "UTF8": 0x00
+CONTEXT:  PL/Perl function "perl_zerob"
diff --git a/src/pl/plperl/expected/plperl_lc_sql_ascii.out b/src/pl/plperl/expected/plperl_lc_sql_ascii.out
new file mode 100644
index 0000000..022c3e2
--- /dev/null
+++ b/src/pl/plperl/expected/plperl_lc_sql_ascii.out
@@ -0,0 +1,10 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
+CONTEXT:  PL/Perl function "perl_zerob"
diff --git a/src/pl/plperl/sql/plperl.sql b/src/pl/plperl/sql/plperl.sql
index 84af1fd..a5e3840 100644
--- a/src/pl/plperl/sql/plperl.sql
+++ b/src/pl/plperl/sql/plperl.sql
@@ -423,15 +423,6 @@ DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;-- yields "ERROR:
Uselessuse of sort in scalar context."DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE
plperl;
---
--- Make sure strings are validated
--- Should fail for all encodings, as nul bytes are never permitted.
---
-CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
-  return "abcd\0efg";
-$$ LANGUAGE plperl;
-SELECT perl_zerob();
--- make sure functions marked as VOID without an explicit return workCREATE OR REPLACE FUNCTION myfuncs() RETURNS void
AS$$   $_SHARED{myquote} = sub {
 
diff --git a/src/pl/plperl/sql/plperl_lc.sql b/src/pl/plperl/sql/plperl_lc.sql
new file mode 100644
index 0000000..a4a06e7
--- /dev/null
+++ b/src/pl/plperl/sql/plperl_lc.sql
@@ -0,0 +1,8 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
diff --git a/src/pl/plperl/sql/plperl_lc_sql_ascii.sql b/src/pl/plperl/sql/plperl_lc_sql_ascii.sql
new file mode 120000
index 0000000..9da97db
--- /dev/null
+++ b/src/pl/plperl/sql/plperl_lc_sql_ascii.sql
@@ -0,0 +1 @@
+plperl_lc.sql
\ No newline at end of file

Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Ouch!

> # UtfToLocal() seems to have a bug that always report illegal
> # encoding was "UTF8" regardless of the real encoding. But
> # plper_lc_*.(sql|out) increases if the bug is fixed.

This is not a bug. The error message "invalid byte sequence for
encoding "UTF8"" meant "invalid INPUT byte sequence as encoding
"UTF8"" not encoding for output.. So the regtest patch covers all
of possible patterns - UTF8 and SQL_ASCII..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Alex Hunsaker
Date:
On Wed, Jun 20, 2012 at 1:15 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
>
>> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
>> and SvPVUTF8() when turning a perl string into a cstring.
>
> Right.
>
> So I played a bit with this patch, and touched it a bit mainly just to
> add some more comments; and while at it I noticed that some of the
> functions in Util.xs might leak some memory, so I made an attempt to
> plug them, as in the attached patch (which supersedes yours).

I think most of these leaks go back to 9.0. Dunno if its worth
backpatching them...

> to test the problem in the original report, I notice that we now have a
> regression failure:

> I'm not really sure what to do here -- maybe have a second expected file
> for that test is a good enough answer?  Or should I just take the test
> out?  Opinions please.

I think we have broken that check twice so it seems like it would be
nice to keep. But I don't feel *to* strongly about it.

The comment and cleanups all look good to me.

Thanks!


Re: pl/perl and utf-8 in sql_ascii databases

From
Alex Hunsaker
Date:
On Thu, Jun 21, 2012 at 5:22 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

>> So I played a bit with this patch, and touched it a bit mainly
>> [...] functions in Util.xs might leak some memory, so I made an attempt to

> Ok, Is it ok to look into the newer patch including fix of leaks
> at first?

Yeah :-).

> Variable naming in util_quote_*() seems a bit confusing,
>
> Renaming ret to quoted and str to ret as the patch attached might
> make it easily readable.

Ok.

>> [ regression failure on a  SQL_ASCII database ]
>> I'm not really sure what to do here -- maybe have a second expected file
>> for that test is a good enough answer?  Or should I just take the test
>> out?  Opinions please.

> The attached ugly patch does it. We seem should put NO_LOCALE=1
> on the 'make check' command line for the encodings not compatible
> with the environmental locale, although it looks work.

+REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo
$(ENCODING) | tr "A-Z-" "a-z_").sql 2>/dev/null))
+REGRESS_LC = $(if $(REGRESS_LC0),$(REGRESS_LC0),plperl_lc)
+REGRESS = plperl $(REGRESS_LC) plperl_trigger plperl_shared
plperl_elog plperl_util plperl_init plperlu plperl_array

Hrm, that's quite cute. I dunno if there is a more cannon way of doing
the above-- but it seems to work. I'm not sure this regression test is
worth it. I'm thinking maybe we should just remove
theegressionegression test instead.

There is a minor issue with the patch where
sql/plperl_lc_sql_ascii.sql contains the text "plperl_lc.sql". After
copying sql/plperl_lc.sql to sql/plperl_lc_sql_ascii.sql everything
worked as described.

Thanks for the review!


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Kyotaro HORIGUCHI's message of jue jun 21 08:02:58 -0400 2012:
>
> Ouch!
>
> > # UtfToLocal() seems to have a bug that always report illegal
> > # encoding was "UTF8" regardless of the real encoding. But
> > # plper_lc_*.(sql|out) increases if the bug is fixed.
>
> This is not a bug. The error message "invalid byte sequence for
> encoding "UTF8"" meant "invalid INPUT byte sequence as encoding
> "UTF8"" not encoding for output.. So the regtest patch covers all
> of possible patterns - UTF8 and SQL_ASCII..

Right, that confused me too for a while.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Kyotaro HORIGUCHI's message of jue jun 21 07:22:43 -0400 2012:
> Hello,
>
> > > Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> > > and SvPVUTF8() when turning a perl string into a cstring.
> >
> > Right.
>
> I spent a bit longer time catching on pl/perl and now understand
> what is the problem...

Hi,

Thanks for the review.

> > So I played a bit with this patch, and touched it a bit mainly just to
> > add some more comments; and while at it I noticed that some of the
> > functions in Util.xs might leak some memory, so I made an attempt to
> > plug them, as in the attached patch (which supersedes yours).
>
> Ok, Is it ok to look into the newer patch including fix of leaks
> at first?

Yeah, I'm going to have to backpatch the whole of it so having full
review is good.  Thanks.

> -- The others
>
> Variable naming in util_quote_*() seems a bit confusing,
>
> >      text *arg = sv2text(sv);
> >      text *ret = DatumGetTextP(..., PointerGetDatum(arg)));
> >      char *str;
> >      pfree(arg);
> >      str = text_to_cstring(ret);
> >      RETVAL = cstr2sv(str);
> >      pfree(str);
>
> Renaming ret to quoted and str to ret as the patch attached might
> make it easily readable.

I think I'm going to refrain from this because it will be more painful
to backpatch.

> > Now, with my version of the patch applied and using a SQL_ASCII database
> > to test the problem in the original report, I notice that we now have a
> > regression failure:
> snip.
> > I'm not really sure what to do here -- maybe have a second expected file
> > for that test is a good enough answer?  Or should I just take the test
> > out?  Opinions please.
>
>
> The attached ugly patch does it. We seem should put NO_LOCALE=1
> on the 'make check' command line for the encodings not compatible
> with the environmental locale, although it looks work.

The idea of separating the test into its own file has its merit; but
instead of having two different tests, I'm going to have a single test
and two expected files.  That seems simpler than messing around in the
makefile.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Alex Hunsaker's message of jue jun 21 10:27:41 -0400 2012:
>
> On Wed, Jun 20, 2012 at 1:15 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
> >
> >> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> >> and SvPVUTF8() when turning a perl string into a cstring.
> >
> > Right.
> >
> > So I played a bit with this patch, and touched it a bit mainly just to
> > add some more comments; and while at it I noticed that some of the
> > functions in Util.xs might leak some memory, so I made an attempt to
> > plug them, as in the attached patch (which supersedes yours).
>
> I think most of these leaks go back to 9.0. Dunno if its worth
> backpatching them...

Well, nobody has ever complained about them so maybe they're just not
worth the effort.

> > to test the problem in the original report, I notice that we now have a
> > regression failure:
>
> > I'm not really sure what to do here -- maybe have a second expected file
> > for that test is a good enough answer?  Or should I just take the test
> > out?  Opinions please.
>
> I think we have broken that check twice so it seems like it would be
> nice to keep. But I don't feel *to* strongly about it.

Hmm, if we're broken it then it's probably best to keep it.

> The comment and cleanups all look good to me.

Thanks for the review.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello.

> > The attached ugly patch does it. We seem should put NO_LOCALE=1
> > on the 'make check' command line for the encodings not compatible
> > with the environmental locale, although it looks work.
> 
> +REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo
snip.
> Hrm, that's quite cute. I dunno if there is a more cannon way of doing
> the above-- but it seems to work. I'm not sure this regression test is
> worth it. I'm thinking maybe we should just remove
> theegressionegression test instead.

I agree. That is the fundamental question. I've coded just for my
fun but I don't see not so much signicance to do that. We might
omit the test for this which is non-ciritical and corner cases.

I'll leave it to your decision whether to do that.

> There is a minor issue with the patch where
> sql/plperl_lc_sql_ascii.sql contains the text "plperl_lc.sql". After
> copying sql/plperl_lc.sql to sql/plperl_lc_sql_ascii.sql everything
> worked as described.

Ah. It is what was a simbolic link. I made the patch with doubt
whether symlink could be encoded into diff, and saw the doubious
result but left as it is :-p. I leaned that no meaningful
symbolic-link cannot be used in source tree managed by git.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello.

> > Renaming ret to quoted and str to ret as the patch attached might
> > make it easily readable.
> 
> I think I'm going to refrain from this because it will be more painful
> to backpatch.

I've felt hesitation to do so, too.

The new patch is indeed avoid leaks although which does not lasts
permanently. This will preserve more heap for future work but has
no necessity to do.

On the other hand, the return value of DatumGetTextP() is also
may (and 'should' in this case) palloc'ed but not pfree'd like
other part of PostgreSQL source (as far as I saw..) because of, I
suppose, the nature of these functions that it is
difficult/unable to be predicted/determined whether returning
memory block is palloc'ed ones or not. And the pain to maintain
such codes unrobust for future modification.

From such a point of view, we might be good to refrain to
backport this.


> > The attached ugly patch does it. We seem should put NO_LOCALE=1
> > on the 'make check' command line for the encodings not compatible
> > with the environmental locale, although it looks work.
> 
> The idea of separating the test into its own file has its merit; but
> instead of having two different tests, I'm going to have a single test
> and two expected files.  That seems simpler than messing around in the
> makefile.

Yes, you're right. But it was easier to add pairs of .sql and
.out to do that. Plus, as I wrote in another message, I'm
unwilling to push it nevertheless I've wrote it:-(

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
>>> +REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo

>> Hrm, that's quite cute. I dunno if there is a more cannon way of doing
>> the above-- but it seems to work. I'm not sure this regression test is
>> worth it. I'm thinking maybe we should just remove
>> theegressionegression test instead.

> I agree. That is the fundamental question. I've coded just for my
> fun but I don't see not so much signicance to do that. We might
> omit the test for this which is non-ciritical and corner cases.

We need these tests to work on Windows too, so fancy gmake tricks are
probably not the way to deal with varying results.
        regards, tom lane


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello,  thank you for your sugestion.

> > I agree. That is the fundamental question. I've coded just for my
> > fun but I don't see not so much signicance to do that. We might
> > omit the test for this which is non-ciritical and corner cases.
> 
> We need these tests to work on Windows too, so fancy gmake tricks are
> probably not the way to deal with varying results.

Hmm. I understand that you suggested that we should do this in
normal regression test.

Ok. Since there found to be only two patterns in the regression
test. The fancy thing is no more needed. I will unfold them and
make sure to work on mingw build environment.

And for one more environment, on the one with VC++.. I'll need a
bit longer time to make out what vcregress.pl does.


On the other things, I will decide as following and sent to
committer as soon as the above is finished.

- The main patch fixes the sql-ascii handling itself shoud ported into 9.2 and 9.1. Someone shoud work for this. (me?)

- The remainder of the patch whic fixes the easy fixable leakes of palloc'ed memory won't be ported into 9.1. This is
onlyfor 9.3dev.
 

- The patch for 9.3dev will be provided with the new regression test. It will be easily ported into 9.1 and 9.2 and
thereseems to be no problem technically, but a bit unsure from the other points of view...
 

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello, Here is regression test runs on pg's also built with
cygwin-gcc and VC++.

The patches attached following,

- plperl_sql_ascii-4.patch         : fix for pl/perl utf8 vs sql_ascii
- plperl_sql_ascii_regress-1.patch : regression test for this patch.                                    I added some
testson encoding to this.
 

I will mark this patch as 'ready for committer' after this.

For the continuity of the behavior for sql_ascii and the chars
like \x80, It might be better if the main patch is back ported
into 9.1 and 9.2. New regression tests seems to have less
necessity to do since it has not been there from the first..


This regression test runs for all of Build with gcc3 / Linux(CentOS6.2-64) Built with Cygwin gcc3 / Windows7-64 Built
withVC++2008 / ActivePerl5.12 / Windows7-64
 

==========

I've been stuck in mud trying to plperl work on windows
environment. I saw many messages complaining that plperl wouldn't
be built to work. For the convenience of those and myself, I
describe the process of building postgresql with plperl on
Windows with cygwin and VC++ I've done below.

> Ok. Since there found to be only two patterns in the regression
> test. The fancy thing is no more needed. I will unfold them and
> make sure to work on mingw build environment.
> 
> And for one more environment, on the one with VC++.. I'll need a
> bit longer time to make out what vcregress.pl does.

I could understand what you meant after I managed to build plperl
to run properly. vcregress.pl reads $REGRESS in GNUmakefile so
variable substitution of make does not work on Windows'
regression. I resolved this problem by copying plperl_lc_*.out
files into plperl_lc.out before it runs pg_regress.

> - The main patch fixes the sql-ascii handling itself shoud ported
>   into 9.2 and 9.1. Someone shoud work for this. (me?)

done.

> - The remainder of the patch whic fixes the easy fixable leakes
>   of palloc'ed memory won't be ported into 9.1. This is only for
>   9.3dev.
> 
> - The patch for 9.3dev will be provided with the new regression
>   test. It will be easily ported into 9.1 and 9.2 and there seems
>   to be no problem technically, but a bit unsure from the other
>   points of view...

What should I do for this?

regards,

========
Addition - Building Windows binary for plperl

NOTE: This is ONE example I tried and turned out a success.

A. Cygwin
Versions: Windows 7 64bit          Cygwin 1.7.15          gcc 3.4.4 (cygwin-server is running)
1. Build perl aside system-installed one   perl-5.16.0$ export PATH=/usr/local/bin:/bin:/usr/bin:/usr/sbin
perl-5.16.0$./Configure --   perl-5.16.0$ ./Configure -d   perl-5.16.0$ make   perl-5.16.0$ make install
 
 - The first line needed to avoid the Makefile of perl stops by   parens in search path.
2. Build postgresql with --with-perl   pg93dev$ ./configure --with-perl   pg93dev$ make all
3. Run the regression test for plperl   pg93dev/src/pl/plperl$ make check   ....   pg93dev/src/pl/plperl$ make check
ENCODING=sql-ascii  ....
 

B. VC++   
Versions: Windows 7 64bit          Microsoft Visual C++ 2008 Express Edition          Active Perl v6.12.4 x86
1. Install Active Perl normally.  Assuming the install location   is "c:\Perl" and I did all operation on cmd.exe after
this.
2. Create config.pl in src/tools/msvc   pg> cd src\tools\msvc   msvc> copy config_default.pl config.pl   .... Edit as
follows
   -   perl=>undef,             # --with-perl   +   perl=>'c:\Perl',             # --with-perl
3. Build it
   msvc> C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\vcvars32.bat   msvc> build   msvc> install c:\pgsql
4. Run the regression tests
   4.1 Create test database and run postgres for UTF8 test
   A> set PATH=c:\pgsql\bin;c:\pgsql\lib;%PATH%   A> initdb -D <pgdata dir> --no-locale --encoding=utf8   A> postgres
-D<pgdata dir>
 
   ... on another cmd.exe
   B> set PATH=c:\pgsql\bin;c:\pgsql\lib;%PATH%   pg> cd src\tools\msvc   msvc> vcregress plcheck
   4.2 Run regression test for SQL-ASCII.   A> (delete <pgdata dir> and its contents.)   A> initdb -D <pgdata dir>
--no-locale--encoding=sql_ascii   A> postgres -D <pgdata dir>   ... same as 4.1 here after...
 


-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 7d0102b..4b4b680 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -67,8 +67,11 @@ static text *sv2text(SV *sv){    char       *str = sv2cstr(sv);
+    text       *text;
-    return cstring_to_text(str);
+    text = cstring_to_text(str);
+    pfree(str);
+    return text;}MODULE = PostgreSQL::InServer::Util PREFIX = util_
@@ -113,10 +116,12 @@ util_quote_literal(sv)    }    else {        text *arg = sv2text(sv);
-        text *ret = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
-        char *str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        text *quoted = DatumGetTextP(DirectFunctionCall1(quote_literal, PointerGetDatum(arg)));
+        char *ret;
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    }    OUTPUT:    RETVAL
@@ -132,10 +137,12 @@ util_quote_nullable(sv)    else    {        text *arg = sv2text(sv);
-        text *ret = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
-        char *str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        text *quoted = DatumGetTextP(DirectFunctionCall1(quote_nullable, PointerGetDatum(arg)));
+        char *ret;
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    }    OUTPUT:    RETVAL
@@ -145,14 +152,15 @@ util_quote_ident(sv)    SV *sv    PREINIT:        text *arg;
-        text *ret;
-        char *str;
+        text *quoted;
+        char *ret;    CODE:        arg = sv2text(sv);
-        ret = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
-        str = text_to_cstring(ret);
-        RETVAL = cstr2sv(str);
-        pfree(str);
+        quoted = DatumGetTextP(DirectFunctionCall1(quote_ident, PointerGetDatum(arg)));
+        pfree(arg);
+        ret = text_to_cstring(quoted);
+        RETVAL = cstr2sv(ret);
+        pfree(ret);    OUTPUT:    RETVAL
diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index 1b6648b..ed99194 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -3,21 +3,29 @@/* * convert from utf8 to database encoding
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *
-utf_u2e(const char *utf8_str, size_t len)
+utf_u2e(char *utf8_str, size_t len){    int            enc = GetDatabaseEncoding();
-
-    char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str, len, PG_UTF8, enc);
+    char       *ret;    /*
-     * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
-     * will not do any conversion or verification. we need to do it manually
-     * instead.
+     * When we are in a PG_UTF8 or SQL_ASCII database
+     * pg_do_encoding_conversion() will not do any conversion (which is good)
+     * or verification (not so much), so we need to run the verification step
+     * separately.     */    if (enc == PG_UTF8 || enc == PG_SQL_ASCII)
-        pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+    {
+        pg_verify_mbstr_len(enc, utf8_str, len, false);
+        ret = utf8_str;
+    }
+    else
+        ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
+                                                 len, PG_UTF8, enc);    if (ret == utf8_str)        ret =
pstrdup(ret);
@@ -27,11 +35,15 @@ utf_u2e(const char *utf8_str, size_t len)/* * convert from database encoding to utf8
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *utf_e2u(const char *str){
-    char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) str, strlen(str), GetDatabaseEncoding(),
PG_UTF8);
+    char       *ret =
+        (char *) pg_do_encoding_conversion((unsigned char *) str, strlen(str),
+                                           GetDatabaseEncoding(), PG_UTF8);    if (ret == str)        ret =
pstrdup(ret);
@@ -41,6 +53,8 @@ utf_e2u(const char *str)/* * Convert an SV to a char * in the current database encoding
+ *
+ * Returns a palloc'ed copy of the original string */static inline char *sv2cstr(SV *sv)
@@ -51,7 +65,9 @@ sv2cstr(SV *sv)    /*     * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
-     *
+     */
+
+    /*     * SvPVutf8() croaks nastily on certain things, like typeglobs and     * readonly objects such as $^V.
That'sa perl bug - it's not supposed to     * happen. To avoid crashing the backend, we make a copy of the sv before
 
@@ -63,18 +79,27 @@ sv2cstr(SV *sv)        (SvTYPE(sv) > SVt_PVLV && SvTYPE(sv) != SVt_PVFM))        sv = newSVsv(sv);
 else
 
-
+    {        /*         * increase the reference count so we can just SvREFCNT_dec() it when         * we are done
   */        SvREFCNT_inc_simple_void(sv);
 
+    }
-    val = SvPVutf8(sv, len);
+    /*
+     * Request the string from Perl, in UTF-8 encoding; but if we're in a
+     * SQL_ASCII database, just request the byte soup without trying to make it
+     * UTF8, because that might fail.
+     */
+    if (GetDatabaseEncoding() == PG_SQL_ASCII)
+        val = SvPV(sv, len);
+    else
+        val = SvPVutf8(sv, len);    /*
-     * we use perl's length in the event we had an embedded null byte to
-     * ensure we error out properly
+     * Now convert to database encoding.  We use perl's length in the event we
+     * had an embedded null byte to ensure we error out properly.     */    res = utf_u2e(val, len);
@@ -88,16 +113,20 @@ sv2cstr(SV *sv) * Create a new SV from a string assumed to be in the current database's *
encoding.*/
 
-static inline SV *cstr2sv(const char *str){    SV           *sv;
-    char       *utf8_str = utf_e2u(str);
+    char       *utf8_str;
+
+    /* no conversion when SQL_ASCII */
+    if (GetDatabaseEncoding() == PG_SQL_ASCII)
+        return newSVpv(str, 0);
+
+    utf8_str = utf_e2u(str);    sv = newSVpv(utf8_str, 0);    SvUTF8_on(sv);
-    pfree(utf8_str);    return sv;
diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index 188d7d2..533b41d 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -44,7 +44,12 @@ PERLCHUNKS = plc_perlboot.pl plc_trusted.plSHLIB_LINK = $(perl_embed_ldflags)REGRESS_OPTS =
--dbname=$(PL_TESTDB)--load-extension=plperl  --load-extension=plperlu
 
-REGRESS = plperl plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array
+ifeq ($(shell echo $(ENCODING) | tr "A-Z-" "a-z_"),sql_ascii)
+REGRESS_LC_OUT_ORG=expected/plperl_lc_sql_ascii.out
+else
+REGRESS_LC_OUT_ORG=expected/plperl_lc_utf8.out
+endif
+REGRESS = plperl plperl_lc plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array# if
Perlcan support two interpreters in one backend,# test plperl-and-plperlu casesifneq ($(PERL),)
 
@@ -98,6 +103,8 @@ uninstall-data:check: all submake
+    @echo Setting up per-locale regression
+    cp $(REGRESS_LC_OUT_ORG) expected/plperl_lc.out    $(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)installcheck:
submake
diff --git a/src/pl/plperl/expected/plperl.out b/src/pl/plperl/expected/plperl.out
index df54937..906dc15 100644
--- a/src/pl/plperl/expected/plperl.out
+++ b/src/pl/plperl/expected/plperl.out
@@ -650,16 +650,6 @@ CONTEXT:  PL/Perl anonymous code blockDO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort
@y;1; $do$ LANGUAGE plperl;ERROR:  Useless use of sort in scalar context at line 1.CONTEXT:  PL/Perl anonymous code
block
---
--- Make sure strings are validated
--- Should fail for all encodings, as nul bytes are never permitted.
---
-CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
-  return "abcd\0efg";
-$$ LANGUAGE plperl;
-SELECT perl_zerob();
-ERROR:  invalid byte sequence for encoding "UTF8": 0x00
-CONTEXT:  PL/Perl function "perl_zerob"-- make sure functions marked as VOID without an explicit return workCREATE OR
REPLACEFUNCTION myfuncs() RETURNS void AS $$   $_SHARED{myquote} = sub {
 
diff --git a/src/pl/plperl/expected/plperl_lc_sql_ascii.out b/src/pl/plperl/expected/plperl_lc_sql_ascii.out
new file mode 100644
index 0000000..c454c44
--- /dev/null
+++ b/src/pl/plperl/expected/plperl_lc_sql_ascii.out
@@ -0,0 +1,41 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
+CONTEXT:  PL/Perl function "perl_zerob"
+CREATE OR REPLACE FUNCTION perl_0x80_in(text) RETURNS BOOL AS $$
+  return ($_[0] eq "abc\x80de" ? "true" : "false");
+$$ LANGUAGE plperl;
+SELECT perl_0x80_in(E'abc\x80de');
+ perl_0x80_in 
+--------------
+ t
+(1 row)
+
+CREATE OR REPLACE FUNCTION perl_0x80_out() RETURNS TEXT AS $$
+  return "abc\x80de";
+$$ LANGUAGE plperl;
+SELECT perl_0x80_out() = E'abc\x80de';
+ ?column? 
+----------
+ t
+(1 row)
+
+CREATE OR REPLACE FUNCTION perl_utf_inout(text) RETURNS TEXT AS $$
+  $str = $_[0]; $code = "NotUTF8:"; $match = "ab\xe5\xb1\xb1cd";
+  if (utf8::is_utf8($str)) {
+    $code = "UTF8:"; utf8::decode($str); $match="ab\x{5c71}cd";
+  }
+  return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
+$$ LANGUAGE plperl;
+SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
+          encode          
+--------------------------
+ NotUTF8:ab\345\267\235cd
+(1 row)
+
diff --git a/src/pl/plperl/expected/plperl_lc_utf8.out b/src/pl/plperl/expected/plperl_lc_utf8.out
new file mode 100644
index 0000000..8557b46
--- /dev/null
+++ b/src/pl/plperl/expected/plperl_lc_utf8.out
@@ -0,0 +1,33 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+ERROR:  invalid byte sequence for encoding "UTF8": 0x00
+CONTEXT:  PL/Perl function "perl_zerob"
+CREATE OR REPLACE FUNCTION perl_0x80_in(text) RETURNS BOOL AS $$
+  return ($_[0] eq "abc\x80de" ? "true" : "false");
+$$ LANGUAGE plperl;
+SELECT perl_0x80_in(E'abc\x80de');
+ERROR:  invalid byte sequence for encoding "UTF8": 0x80
+CREATE OR REPLACE FUNCTION perl_0x80_out() RETURNS TEXT AS $$
+  return "abc\x80de";
+$$ LANGUAGE plperl;
+SELECT perl_0x80_out() = E'abc\x80de';
+ERROR:  invalid byte sequence for encoding "UTF8": 0x80
+CREATE OR REPLACE FUNCTION perl_utf_inout(text) RETURNS TEXT AS $$
+  $str = $_[0]; $code = "NotUTF8:"; $match = "ab\xe5\xb1\xb1cd";
+  if (utf8::is_utf8($str)) {
+    $code = "UTF8:"; utf8::decode($str); $match="ab\x{5c71}cd";
+  }
+  return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
+$$ LANGUAGE plperl;
+SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
+        encode         
+-----------------------
+ UTF8:ab\345\267\235cd
+(1 row)
+
diff --git a/src/pl/plperl/sql/plperl.sql b/src/pl/plperl/sql/plperl.sql
index 84af1fd..a5e3840 100644
--- a/src/pl/plperl/sql/plperl.sql
+++ b/src/pl/plperl/sql/plperl.sql
@@ -423,15 +423,6 @@ DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;-- yields "ERROR:
Uselessuse of sort in scalar context."DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE
plperl;
---
--- Make sure strings are validated
--- Should fail for all encodings, as nul bytes are never permitted.
---
-CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
-  return "abcd\0efg";
-$$ LANGUAGE plperl;
-SELECT perl_zerob();
--- make sure functions marked as VOID without an explicit return workCREATE OR REPLACE FUNCTION myfuncs() RETURNS void
AS$$   $_SHARED{myquote} = sub {
 
diff --git a/src/pl/plperl/sql/plperl_lc.sql b/src/pl/plperl/sql/plperl_lc.sql
new file mode 100644
index 0000000..fd75bc0
--- /dev/null
+++ b/src/pl/plperl/sql/plperl_lc.sql
@@ -0,0 +1,24 @@
+--
+-- Make sure strings are validated
+-- Should fail for all encodings, as nul bytes are never permitted.
+--
+CREATE OR REPLACE FUNCTION perl_zerob() RETURNS TEXT AS $$
+  return "abcd\0efg";
+$$ LANGUAGE plperl;
+SELECT perl_zerob();
+CREATE OR REPLACE FUNCTION perl_0x80_in(text) RETURNS BOOL AS $$
+  return ($_[0] eq "abc\x80de" ? "true" : "false");
+$$ LANGUAGE plperl;
+SELECT perl_0x80_in(E'abc\x80de');
+CREATE OR REPLACE FUNCTION perl_0x80_out() RETURNS TEXT AS $$
+  return "abc\x80de";
+$$ LANGUAGE plperl;
+SELECT perl_0x80_out() = E'abc\x80de';
+CREATE OR REPLACE FUNCTION perl_utf_inout(text) RETURNS TEXT AS $$
+  $str = $_[0]; $code = "NotUTF8:"; $match = "ab\xe5\xb1\xb1cd";
+  if (utf8::is_utf8($str)) {
+    $code = "UTF8:"; utf8::decode($str); $match="ab\x{5c71}cd";
+  }
+  return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
+$$ LANGUAGE plperl;
+SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index ef70350..58ca5b2 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -183,11 +183,19 @@ sub plcheck        }        print
"============================================================\n";       print "Checking $lang\n";
 
+        my @encoding =();
+        if ($lang eq 'plperl')
+        {
+            setupPlperlRegress("../../../$Config/psql/psql.exe");
+        }
+        my @args = (            "../../../$Config/pg_regress/pg_regress",
"--psqldir=../../../$Config/psql",
-            "--dbname=pl_regression",@lang_args,@tests
+            "--dbname=pl_regression",@encoding,@lang_args,@tests        );
+        print join('//', @args), '\n';
+        system(@args);        my $status = $? >> 8;        exit $status if $status;
@@ -197,6 +205,44 @@ sub plcheck    chdir "../../..";}
+sub setupPlperlRegress
+{
+    my $psql_cmd = shift;
+    my $encoding = `$psql_cmd -t -d postgres -c "select encoding from pg_database where datname='template0'"`;
+    my $status = $? >> 8;
+    my $expectfile = "expected/plperl_lc_utf8.out";
+    my @ret = ();
+    exit $status if $status;
+    
+    if ($encoding == 0)
+    {
+        $expectfile = "expected/plperl_lc_sql_ascii.out";
+        @ret = ("--encoding=SQL_ASCII");
+    }
+
+    print "Copy $expectfile as expected/plperl_lc.out\n";
+    simpleCopyFile($expectfile, 'expected/plperl_lc.out');
+
+    return @ret;
+}
+
+sub simpleCopyFile
+{
+    my ($from, $to) = @_;
+    my $inf;
+    my $outf;
+
+    # Copy expect file.
+    open($inf, "<$from") ||
+        die "Failed to open file \'$from\': $!";
+    open($outf, ">expected/plperl_lc.out") ||
+        die "Failed to open file \'$to\': $!";
+    print $outf $_ while(<$inf>);
+    close($inf);
+    close($outf);
+    
+}
+sub contribcheck{    chdir "../../../contrib";

Re: pl/perl and utf-8 in sql_ascii databases

From
Alex Hunsaker
Date:
On Tue, Jul 3, 2012 at 2:59 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, Here is regression test runs on pg's also built with
> cygwin-gcc and VC++.

Thank you!

> The patches attached following,
>
> - plperl_sql_ascii-4.patch         : fix for pl/perl utf8 vs sql_ascii
> - plperl_sql_ascii_regress-1.patch : regression test for this patch.
>                                      I added some tests on encoding to this.
>
> I will mark this patch as 'ready for committer' after this.

Both patches look good to me..

> I've been stuck in mud trying to plperl work on windows
> environment. I saw many messages complaining that plperl wouldn't
> be built to work. For the convenience of those and myself, I
> describe the process of building postgresql with plperl on
> Windows with cygwin and VC++ I've done below.

Hrm, I don't develop on windows here, but out of curiosity, what were
the messages like?

>> - The remainder of the patch whic fixes the easy fixable leakes
>>   of palloc'ed memory won't be ported into 9.1. This is only for
>>   9.3dev.

> What should I do for this?

Just let the commiter decide? :-)


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello,

> > I've been stuck in mud trying to plperl work on windows
> > environment. I saw many messages complaining that plperl wouldn't
> > be built to work. For the convenience of those and myself, I
> > describe the process of building postgresql with plperl on
> > Windows with cygwin and VC++ I've done below.
> 
> Hrm, I don't develop on windows here, but out of curiosity, what were
> the messages like?

My memory about that has already become faint.. As far as I
remember, I saw two patterns of crash.

One is caused by gcc-4's stack-protector in cygperl5_10.dll. It
caused crash on "create function", (or create language). Building
postgresql with gcc-4 did not help for me. Finally, I gave up to
use pre-installed dll and built all including perl with GCC-3 to
make it work.

The another is 0xC0000005 (Access Violation) on 'create language
plperl' for VC10(:-p) vs ActivePerl5.14. This happenend at ERRSV
in plperl_(un)trasted_init(). Replacing ERRSV with
get_sv("@",FALSE) had put down that (but also I don't know if it
works) but finally I had a error "didn't get a CODE reference
from compiling function" on "create function .. language plperl"
which was the sign of dead end for me. I decided to behave well
to use ActivePerl5.12 and VC8 at last. I suppose this is a kind
of so-called "DLL HELL" related to memory allocation. ActivePerl
5.12 links the system's msvcrt.dll but VC links its output with
msvcrxx.dll. MS says memory allocaltion/deallocation across DLL
bounary should cause crash. But I don't know why the pair of
AP5.12 and VC8 results in success.

http://msdn.microsoft.com/en-us/library/ms235460.aspx

badalex> >> - The remainder of the patch whic fixes the easy fixable leakes
badalex> >>   of palloc'ed memory won't be ported into 9.1. This is only for
badalex> >>   9.3dev.
badalex> 
badalex> > What should I do for this?
badalex> 
badalex> Just let the commiter decide? :-)

Agreed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Excerpts from Kyotaro HORIGUCHI's message of mar jul 03 04:59:38 -0400 2012:
> Hello, Here is regression test runs on pg's also built with
> cygwin-gcc and VC++.
>
> The patches attached following,
>
> - plperl_sql_ascii-4.patch         : fix for pl/perl utf8 vs sql_ascii
> - plperl_sql_ascii_regress-1.patch : regression test for this patch.
>                                      I added some tests on encoding to this.
>
> I will mark this patch as 'ready for committer' after this.

I have pushed these changes to HEAD, 9.2 and 9.1.  Instead of the games
with plperl_lc_*.out being copied around, I just used the ASCII version
as plperl_lc_1.out and the UTF8 one as plperl_lc.out.

I chose to backpatch the whole thing instead of cherry-picking parts of
it; that was turning into a tedious and pointless exercise.  We'll see
how does the buildfarm like the whole thing -- including on MSVC, which
I did not test at all.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Excerpts from Alvaro Herrera's message of mar jul 10 16:23:57 -0400 2012:
> Excerpts from Kyotaro HORIGUCHI's message of mar jul 03 04:59:38 -0400 2012:
> > Hello, Here is regression test runs on pg's also built with
> > cygwin-gcc and VC++.
> >
> > The patches attached following,
> >
> > - plperl_sql_ascii-4.patch         : fix for pl/perl utf8 vs sql_ascii
> > - plperl_sql_ascii_regress-1.patch : regression test for this patch.
> >                                      I added some tests on encoding to this.
> >
> > I will mark this patch as 'ready for committer' after this.
>
> I have pushed these changes to HEAD, 9.2 and 9.1.  Instead of the games
> with plperl_lc_*.out being copied around, I just used the ASCII version
> as plperl_lc_1.out and the UTF8 one as plperl_lc.out.

... and this story hasn't ended yet, because one of the new tests is
failing.  See here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04

The interesting part of the diff is:

***************
*** 34,41 ****   return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd"); $$ LANGUAGE plperl; SELECT
encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea,'escape') 
!           encode
! --------------------------
!  NotUTF8:ab\345\267\235cd
! (1 row)
!
--- 34,38 ----   return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd"); $$ LANGUAGE plperl; SELECT
encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea,'escape') 
! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
! CONTEXT:  PL/Perl function "perl_utf_inout"


I am not sure what can we do here other than remove this function and
query from the test.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


On Wed, Jul 11, 2012 at 1:42 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
>> I have pushed these changes to HEAD, 9.2 and 9.1.  Instead of the games
>> with plperl_lc_*.out being copied around, I just used the ASCII version
>> as plperl_lc_1.out and the UTF8 one as plperl_lc.out.
>
> ... and this story hasn't ended yet, because one of the new tests is
> failing.  See here:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04

> [...]
>   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> ! CONTEXT:  PL/Perl function "perl_utf_inout"
>
>
> I am not sure what can we do here other than remove this function and
> query from the test.

Hrm, me neither. I say drop em.


Re: [SPAM] [MessageLimit][lowlimit] Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hmm... Sorry for immature patch..

> ... and this story hasn't ended yet, because one of the new tests is
> failing.  See here:
> 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
> 
> The interesting part of the diff is:
...
>   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> ! CONTEXT:  PL/Perl function "perl_utf_inout"
> 
> 
> I am not sure what can we do here other than remove this function and
> query from the test.

I've run the regress only for the environment capable to handle
the character U+5ddd (Japanese character which means river)...

The byte sequences which can be decoded and the result byte
sequences of encoding from a unicode character vary among the
encodings.

The problem itself which is the aim of this thread could be
covered without the additional test. That confirms if
encoding/decoding is done as expected on calling the language
handler. I suppose that testing for the two cases and additional
one case which runs pg_do_encoding_conversion(), say latin1,
would be enough to confirm that encoding/decoding is properly
done, since the concrete conversion scheme is not significant
this case.

So I recommend that we should add the test for latin1 and omit
the test from other than sql_ascii, utf8 and latin1. This might
be archieved by create empty plperl_lc.sql and plperl_lc.out
files for those encodings.

What do you think about that?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Very sorry for rotten subject. I resent the message with correct subject.
# Our mail server insisted that the message is spam. sigh..
====
Hmm... Sorry for immature patch..

> ... and this story hasn't ended yet, because one of the new tests is
> failing.  See here:
> 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
> 
> The interesting part of the diff is:
...
>   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> ! CONTEXT:  PL/Perl function "perl_utf_inout"
> 
> 
> I am not sure what can we do here other than remove this function and
> query from the test.

I've run the regress only for the environment capable to handle
the character U+5ddd (Japanese character which means river)...

The byte sequences which can be decoded and the result byte
sequences of encoding from a unicode character vary among the
encodings.

The problem itself which is the aim of this thread could be
covered without the additional test. That confirms if
encoding/decoding is done as expected on calling the language
handler. I suppose that testing for the two cases and additional
one case which runs pg_do_encoding_conversion(), say latin1,
would be enough to confirm that encoding/decoding is properly
done, since the concrete conversion scheme is not significant
this case.

So I recommend that we should add the test for latin1 and omit
the test from other than sql_ascii, utf8 and latin1. This might
be archieved by create empty plperl_lc.sql and plperl_lc.out
files for those encodings.

What do you think about that?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Kyotaro HORIGUCHI's message of jue jul 12 00:09:19 -0400 2012:
>
> Hmm... Sorry for immature patch..

No need to apologize.

> > ... and this story hasn't ended yet, because one of the new tests is
> > failing.  See here:
> >
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
> >
> > The interesting part of the diff is:
> ...
> >   SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> > ! ERROR:  character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> > ! CONTEXT:  PL/Perl function "perl_utf_inout"
> >
> >
> > I am not sure what can we do here other than remove this function and
> > query from the test.
>
> I've run the regress only for the environment capable to handle
> the character U+5ddd (Japanese character which means river)...
>
> The byte sequences which can be decoded and the result byte
> sequences of encoding from a unicode character vary among the
> encodings.

Right.  I only ran the test in C and UTF8, not Latin1, so I didn't see
it fail either.

> The problem itself which is the aim of this thread could be
> covered without the additional test. That confirms if
> encoding/decoding is done as expected on calling the language
> handler.

Right.

> I suppose that testing for the two cases and additional
> one case which runs pg_do_encoding_conversion(), say latin1,
> would be enough to confirm that encoding/decoding is properly
> done, since the concrete conversion scheme is not significant
> this case.
>
> So I recommend that we should add the test for latin1 and omit
> the test from other than sql_ascii, utf8 and latin1. This might
> be archieved by create empty plperl_lc.sql and plperl_lc.out
> files for those encodings.
>
> What do you think about that?

I think that's probably too much engineering for something that doesn't
really warrant it.  A real solution to this problem could be to create
yet another new test file containing just this function definition and
the query that calls it, and have one expected file for each encoding;
but that's too much work and too many files, I'm afraid.

I can see us supporting tests that require a small number of expected
files.  No Make tricks with file copying, though.  If we can't get
some easy way to test this without that, I submit we should just remove
the test.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: pl/perl and utf-8 in sql_ascii databases

From
Kyotaro HORIGUCHI
Date:
Hello,

> > I suppose that testing for the two cases and additional
> > one case which runs pg_do_encoding_conversion(), say latin1,
> > would be enough to confirm that encoding/decoding is properly
> > done, since the concrete conversion scheme is not significant
> > this case.
> > 
> > So I recommend that we should add the test for latin1 and omit
> > the test from other than sql_ascii, utf8 and latin1. This might
> > be archieved by create empty plperl_lc.sql and plperl_lc.out
> > files for those encodings.
> > 
> > What do you think about that?
> 
> I think that's probably too much engineering for something that doesn't
> really warrant it.  A real solution to this problem could be to create
> yet another new test file containing just this function definition and
> the query that calls it, and have one expected file for each encoding;
> but that's too much work and too many files, I'm afraid.

I agree completely. The balance between the additional complexity
of regress and the what we would get from the complexity...

> I can see us supporting tests that require a small number of expected
> files.  No Make tricks with file copying, though.  If we can't get
> some easy way to test this without that, I submit we should just remove
> the test.

Ok I agree to do so.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.


Re: pl/perl and utf-8 in sql_ascii databases

From
Alvaro Herrera
Date:
Excerpts from Kyotaro HORIGUCHI's message of mar jul 17 05:01:10 -0400 2012:

> > I think that's probably too much engineering for something that doesn't
> > really warrant it.  A real solution to this problem could be to create
> > yet another new test file containing just this function definition and
> > the query that calls it, and have one expected file for each encoding;
> > but that's too much work and too many files, I'm afraid.
>
> I agree completely. The balance between the additional complexity
> of regress and the what we would get from the complexity...

I had to remove both that test and the one about the 0x80, because it
wasn't working for me in either SQL_ASCII or Latin1, I forget which.
I'm not sure I understand the reason for the failure -- I was getting a
false result instead of true, which was unexpected.  Maybe there's a
trivial explanation for this .. or maybe it really is broken.

In any case, maybe it'd be a good idea to have more tests related to
encodings, if we can write them in some reasonable manner.  But only in
HEAD, I guess, because having to backpatch stuff and test every branch
in at least three encodings is just too annoying.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support