Thread: plperl: enable UTF-8 support
Hello,

Here's a patch I added against plperl, originally against beta5, now against rc1. It simply checks with GetDatabaseEncoding() if the current database is in UTF-8, and if so, sets the UTF-8 flag on the arguments that are passed to perl. This means that it isn't necessary to utf8::upgrade() every string, as perl has no way of knowing offhand that a string is UTF-8 -- but postgres does, because the database encoding is specified, so it makes sense to turn the flag on.

You should also be able to properly manipulate UTF-8 strings now from plperl as opposed to plperlu, because otherwise you'd have to use encoding 'utf8' which was not allowed. It could also eliminate some unexpected bugs if you assume that perl knows the string is unicode. It is enabled only for perl 5.6 and higher, so earlier versions will not be affected.

I have been assured by crab that the patch is quite harmless and will not break anything. It would be great to see it in 8 final! :-)

Regards,
Dave
Attachment
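For readers without the attachment, here is a toy model of the idea described in the message above: a string argument is flagged as UTF-8 only when the database encoding is UTF-8, and string operations then count characters rather than bytes. This is not the patch itself. `ToyScalar`, `ToyEncoding`, `toy_make_arg` and `toy_length` are hypothetical names invented for illustration; the real patch works on Perl's own scalars and asks the server via GetDatabaseEncoding().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the database encoding reported by the server. */
typedef enum { TOY_ENC_SQL_ASCII, TOY_ENC_UTF8 } ToyEncoding;

/* Hypothetical stand-in for a Perl scalar: raw bytes plus a UTF-8 flag. */
typedef struct {
    const char *bytes;
    size_t      nbytes;
    bool        utf8;   /* true: interpret bytes as UTF-8 characters */
} ToyScalar;

/* Build a function argument; turn the flag on only for a UTF-8 database. */
ToyScalar toy_make_arg(const char *s, ToyEncoding db_encoding)
{
    assert(s != NULL);
    ToyScalar sv = { s, strlen(s), db_encoding == TOY_ENC_UTF8 };
    return sv;
}

/* Length as string code would see it: characters if flagged, else bytes. */
size_t toy_length(ToyScalar sv)
{
    if (!sv.utf8)
        return sv.nbytes;

    size_t chars = 0;
    for (size_t i = 0; i < sv.nbytes; i++) {
        /* Count every byte that is not a UTF-8 continuation byte (10xxxxxx). */
        if (((unsigned char) sv.bytes[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

With the bytes of "naïve" encoded as UTF-8 ("na\xc3\xafve", six bytes), `toy_length` reports 5 when the argument was built with `TOY_ENC_UTF8` and 6 when it was built with `TOY_ENC_SQL_ASCII`. That gap is the kind of surprise the message refers to when code assumes perl already knows the string is unicode.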
I need someone who understands UTF8 and perl to review this before being applied.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I need someone who understands UTF8 and perl to review this before being
> applied.

I looked at it and felt that we couldn't risk it for 8.0 --- although the patch looks simple enough, we don't really know the consequences for compatibility with different Perl versions. I wouldn't object to applying it for 8.1, I just think it's too risky at this point in the 8.0 cycle.

regards, tom lane
Moved: This has been saved for the 8.1 release:

http://momjian.postgresql.org/cgi-bin/pgpatches2

-- Bruce Momjian
No problem. Moved to 8.1 patches queue.

-- Bruce Momjian
It seems the plperl code has changed in the areas you are modifying. Would you update your patch against current CVS? Thanks.

-- Bruce Momjian
Newest patch applied. Thanks.

-- Bruce Momjian