Thread: plperl: enable UTF-8 support

plperl: enable UTF-8 support

From
David Kamholz
Date:
Hello,

Here's a patch I wrote for plperl, originally against beta5, now
against rc1. It simply checks with GetDatabaseEncoding() whether the
current database is in UTF-8 and, if so, sets the UTF-8 flag on the
arguments that are passed to perl. This means it is no longer necessary
to utf8::upgrade() every string: perl has no way of knowing offhand
that a string is UTF-8, but postgres does, because the database
encoding is specified, so it makes sense to turn the flag on. You
should also be able to manipulate UTF-8 strings properly from plperl
now, rather than only from plperlu, because otherwise you'd have to
"use encoding 'utf8'", which is not allowed in plperl. It could also
eliminate some unexpected bugs in code that assumes perl knows the
string is unicode. It is enabled only for perl 5.6 and higher, so
earlier versions will not be affected.
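
As a rough illustration of why the flag matters, here is a minimal
sketch in Python rather than the patch's own C code. It is only an
analogy: Python decodes bytes into a separate text object, whereas the
patch is described as setting a flag on the existing perl value. The
function name prepare_argument and the list of encoding names are
assumptions made up for this example.

```python
def prepare_argument(raw: bytes, database_encoding: str):
    """Hypothetical stand-in for the argument-passing step.

    If the database encoding is UTF-8, hand the value over as text
    (the analogue of a perl string with the UTF-8 flag on); otherwise
    hand over the raw bytes untouched.
    """
    # Assumed spellings of the UTF-8 encoding name, for illustration only.
    if database_encoding.upper() in ("UTF8", "UTF-8", "UNICODE"):
        return raw.decode("utf-8")
    return raw


# "naive" with a diaeresis on the i: 5 characters, 6 bytes in UTF-8.
raw = "na\u00efve".encode("utf-8")

as_bytes = prepare_argument(raw, "LATIN1")  # not flagged: byte semantics
as_text = prepare_argument(raw, "UTF8")     # flagged: character semantics

print(len(as_bytes), len(as_text))
print(as_bytes.upper(), as_text.upper())
```

Without the text interpretation, length and case operations work on
individual bytes, so the two-byte character is counted twice and is not
uppercased. With it, the same operations work on characters.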

I have been assured by crab that the patch is quite harmless and will
not break anything. It would be great to see it in 8 final! :-)

Regards,
Dave


Attachment

Re: plperl: enable UTF-8 support

From
Bruce Momjian
Date:
I need someone who understands UTF8 and perl to review this before it is
applied.

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: plperl: enable UTF-8 support

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I need someone who understands UTF8 and perl to review this before it is
> applied.

I looked at it and felt that we couldn't risk it for 8.0 --- although
the patch looks simple enough, we don't really know the consequences for
compatibility with different Perl versions.  I wouldn't object to
applying it for 8.1; I just think it's too risky at this point in the
8.0 cycle.

            regards, tom lane

Re: plperl: enable UTF-8 support

From
Bruce Momjian
Date:
Moved:


This has been saved for the 8.1 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches2


Re: plperl: enable UTF-8 support

From
Bruce Momjian
Date:
No problem.  Moved to 8.1 patches queue.


Re: plperl: enable UTF-8 support

From
Bruce Momjian
Date:
It seems the plperl code has changed in the areas you are modifying.
Would you update your patch against current CVS?  Thanks.


Re: plperl: enable UTF-8 support

From
Bruce Momjian
Date:
Newest patch applied.  Thanks.
