Thread: utf8 encoding problem with plperlu

utf8 encoding problem with plperlu

From
Ronald Peterson
Date:
The following short function illustrates a problem I'm having with the plperlu module.

CREATE OR REPLACE FUNCTION
doublezero ()
RETURNS VOID
AS $$
use Encode qw/encode decode/;
$pass = "double00";
elog( INFO, "$pass" );
$mspass = encode( 'UTF-16LE', qq("$pass") );
elog( INFO, "$mspass" );
$$ LANGUAGE plperlu
STRICT;

# select * from doublezero();
INFO:  double00
CONTEXT:  PL/Perl function "doublezero"
ERROR:  invalid byte sequence for encoding "UTF8": 0x00 at line 8, <DATA> line 558.
CONTEXT:  PL/Perl function "doublezero"

I don't understand this.  I need to pass $mspass to Active Directory, and the encoding is exactly as it should be, which is to say, it works for strings that don't include two consecutive zeros.  Is this a bug?

-R-




Re: utf8 encoding problem with plperlu

From
Adrian Klaver
Date:
On 07/15/2015 07:14 AM, Ronald Peterson wrote:
> The following short function illustrates a problem I'm having with the
> plperlu module.
>
> CREATE OR REPLACE FUNCTION
> doublezero ()
> RETURNS VOID
> AS $$
> use Encode qw/encode decode/;
> $pass = "double00";
> elog( INFO, "$pass" );
> $mspass = encode( 'UTF-16LE', qq("$pass") );
> elog( INFO, "$mspass" );
> $$ LANGUAGE plperlu
> STRICT;
>
> # select * from doublezero();
> INFO:  double00
> CONTEXT:  PL/Perl function "doublezero"
> ERROR:  invalid byte sequence for encoding "UTF8": 0x00 at line 8,
> <DATA> line 558.
> CONTEXT:  PL/Perl function "doublezero"
>
> I don't understand this.  I need to pass $mspass to Active Directory,
> and it the encoding is exactly as it should be, which is to say, it
> works for strings that don't include two consecutive zeros.  Is this a bug?

I am not a Perl user, but the question that came to mind is-

Does this:

$mspass = encode( 'UTF-16LE', qq("$pass") )

work in Perl outside of plperlu?



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: utf8 encoding problem with plperlu

From
"Daniel Verite"
Date:
    Ronald Peterson wrote:

> # select * from doublezero();
> INFO:  double00
> CONTEXT:  PL/Perl function "doublezero"
> ERROR:  invalid byte sequence for encoding "UTF8": 0x00 at line 8, <DATA>
> line 558.
> CONTEXT:  PL/Perl function "doublezero"
>
> I don't understand this.  I need to pass $mspass to Active Directory, and it
> the encoding is exactly as it should be, which is to say, it works for
> strings that don't include two consecutive zeros.  Is this a bug?

When replacing the literal "double00" with "foobar" in your function,
the same error occurs for me:

    test=# select doublezero();
    INFO:  foobar
    CONTEXT:  PL/Perl function "doublezero"
    ERROR:  invalid byte sequence for encoding "UTF8": 0x00 at line 6.
    CONTEXT:  fonction PL/Perl « doublezero »

Anyway it's not clear what you expect. PG doesn't support UTF-16,
and even if it did, it wouldn't accept such strings when the current
encoding is UTF-8.
If Active Directory wants UTF-16LE, you have to do that conversion, but
don't pass the result back to postgres in this format.


Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
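
To see why the literal does not matter, it may help to run the same conversion in plain Perl, outside plperlu. The following is a minimal sketch, assuming only the core Encode module. It suggests that UTF-16LE encodes each ASCII character as its byte followed by 0x00, so any ASCII password produces NUL bytes, which a UTF8 database will not accept as text.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same conversion as in the function above, with the "foobar" literal
my $pass   = "foobar";
my $mspass = encode('UTF-16LE', qq("$pass"));

# Hex dump of the encoded bytes: every second byte is 00
my $hex = unpack('H*', $mspass);
print "$hex\n";

# Count the NUL bytes in the encoded string
my $nul_count = () = $mspass =~ /\x00/g;
print "NUL bytes: $nul_count\n";
```

The quoted password is 8 characters long, so the result is 16 bytes, half of them 0x00.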


Re: utf8 encoding problem with plperlu

From
Ronald Peterson
Date:
That's interesting.  What I'm really doing, instead of the second elog statement, is this:

$ret = $ldap->modify( $dn,
                      replace => {
                         unicodePwd => $mspass
                  } );

This does work for strings that don't contain consecutive zeroes.  I'm not really passing the string to PostgreSQL, but to Net::LDAP, but it must hit PostgreSQL anyway?  Active Directory requires this encoding, so I'm not sure what to do here.


--
-R-




Re: utf8 encoding problem with plperlu

From
Pavel Stehule
Date:


2015-07-15 20:20 GMT+02:00 Ronald Peterson <ron@hub.yellowbank.com>:
> That's interesting.  What I'm really doing, instead of the second elog statement, is this:
>
> $ret = $ldap->modify( $dn,
>                       replace => {
>                          unicodePwd => $mspass
>                   } );
>
> This does work for strings that don't contain consecutive zeroes.  I'm not really passing the string to PostgreSQL, but to Net::LDAP, but it must hit PostgreSQL anyway?  Active Directory requires this encoding, so I'm not sure what to do here.

I had some issues when I used some Perl libraries with UTF-8 strings: some require the UTF8 flag to be set on the string, and some require it to be off. And Postgres didn't set this UTF8 flag correctly.

http://blog.endpoint.com/2014/02/dbdpg-utf-8-perl-postgresql.html

Maybe you have a similar issue on the server side.

Pavel
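
The flag Pavel mentions can be inspected with the built-in utf8::is_utf8 function. The following is a minimal sketch in plain Perl, assuming only the core Encode module; the sample input is an assumption chosen for illustration. It shows that a decoded character string carries the flag, while the byte string returned by encode() does not.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample input (an assumption for illustration): the UTF-8 bytes of
# "cafe" with an acute accent on the final letter
my $octets = "caf\xc3\xa9";

# decode() turns bytes into a character string (4 characters here)
my $chars = decode('UTF-8', $octets);

# encode() turns characters back into bytes (2 bytes per character here)
my $utf16 = encode('UTF-16LE', $chars);

printf "chars: length=%d utf8_flag=%s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
printf "utf16: length=%d utf8_flag=%s\n",
    length($utf16), utf8::is_utf8($utf16) ? 'on' : 'off';
```

A library that expects raw bytes, such as the value of unicodePwd, should be given the encode() result as is.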

 



Re: utf8 encoding problem with plperlu

From
Ronald Peterson
Date:
Thanks Pavel, this looks promising.  I didn't know about the Data::Peek module - that might help me figure out what is going on.

--
-R-




Re: utf8 encoding problem with plperlu

From
Tom Lane
Date:
Ronald Peterson <ron@hub.yellowbank.com> writes:
> This does work for strings that don't contain consecutive zeroes.  I'm not
> really passing the string to PostgreSQL, but to Net::LDAP, but it must hit
> PostgreSQL anyway?  Active Directory requires this encoding, so I'm not
> sure what to do here.

Hm, well, the concrete example you showed involved passing the string to
elog(), which definitely will complain if what it's fed isn't legal data
according to the database encoding; as would any other attempt to push
data into the Postgres server environment.  I don't see why operations
that are strictly within Perl would have a problem, though.

            regards, tom lane
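
If the encoded value still needs to appear in the server log for debugging, one option is to convert it to printable hex before it reaches elog(). The following is a minimal sketch in plain Perl; bytes_for_log is a hypothetical helper name, not part of PL/Perl or Encode. The raw $mspass bytes stay inside Perl for the LDAP call, and only the hex form would be handed to Postgres.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render arbitrary bytes as lowercase ASCII hex,
# which is valid text in any database encoding
sub bytes_for_log {
    my ($bytes) = @_;
    return unpack('H*', $bytes);
}

my $pass     = "double00";
my $mspass   = encode('UTF-16LE', qq("$pass"));
my $loggable = bytes_for_log($mspass);

# Inside a plperlu function this line would instead be:
#   elog(INFO, $loggable);
print "$loggable\n";
```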


Re: utf8 encoding problem with plperlu

From
Ronald Peterson
Date:
Still trying to figure this out, still confused, but as with most frustrating programming problems, I think I may be looking in the wrong place for the source of this error.  Perhaps.

--
-R-