Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding. - Mailing list pgsql-hackers

From	Amit Khandekar
Subject	Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date	October 5, 2011 02:46:41
Msg-id	CACoZds04PaVcuV_GWdqBuBoaaUO52w-wiPROeRp9ZN+f6BvVtQ@mail.gmail.com
In response to	Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding. (Alex Hunsaker <badalex@gmail.com>)
Responses	Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
List	pgsql-hackers

On 4 October 2011 22:57, Alex Hunsaker <badalex@gmail.com> wrote:
> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
> <amit.khandekar@enterprisedb.com> wrote:
>> On 4 October 2011 14:04, Alex Hunsaker <badalex@gmail.com> wrote:
>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>>> <amit.khandekar@enterprisedb.com> wrote:
>>>
>>>> When GetDatabaseEncoding() != PG_UTF8, ret will not be equal to
>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>>
>>> Consider a latin1 database where utf8_str was a string of ascii
>>> characters. [...]
>
>>> [Patch] Look ok to you?
>>>
>>
>> +       if(GetDatabaseEncoding() == PG_UTF8)
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>
>> In your patch, the above will again skip mb-validation if the database
>> encoding is SQL_ASCII. Note that pg_do_encoding_conversion returns
>> the un-converted string even if *one* of the src and dest encodings is
>> SQL_ASCII.
>
> *scratches head* I thought the point of SQL_ASCII was no encoding
> conversion was done and so there would be nothing to verify.
>
> Ahh, I see: it looks like pg_verify_mbstr_len() will make sure there are no
> NUL bytes in the string when we are a single-byte encoding.
>
>> I think :
>>        if (ret == utf8_str)
>> +       {
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>                ret = pstrdup(ret);
>> +       }
>>
>> This (ret == utf8_str) condition would be a reliable way for knowing
>> whether pg_do_encoding_conversion() has done the conversion at all.
>
> Yes. However (and maybe I'm nitpicking here), I don't see any reason to
> verify certain strings twice if we can avoid it.
>
> What do you think about:
> +    /*
> +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
> +    * will not do any conversion or verification. we need to do it
> +    * manually instead.
> +    */
> +       if( GetDatabaseEncoding() == PG_UTF8 ||
> +              GetDatabaseEncoding() == SQL_ASCII)
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>

You mean the final changes in plperl_helpers.h would look something
like this, right?
static inline char *
utf_u2e(const char *utf8_str, size_t len)
{
       char       *ret = (char *) pg_do_encoding_conversion((unsigned char *) utf8_str,
                                       len, PG_UTF8, GetDatabaseEncoding());
       if (ret == utf8_str)
+       {
+               if (GetDatabaseEncoding() == PG_UTF8 ||
+                       GetDatabaseEncoding() == PG_SQL_ASCII)
+               {
+                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+               }
+               ret = pstrdup(ret);
+       }
       return ret;
}


Yeah I am ok with that. It's just an additional check besides (ret ==
utf8_str) to know if we really require validation.
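
For illustration, the control flow of the utf_u2e() change discussed above
can be sketched in standalone C. This is a minimal sketch under stated
assumptions, not the PostgreSQL source: sketch_encoding, sketch_utf8_ok(),
sketch_convert() and sketch_utf_u2e() are hypothetical stand-ins for the
encoding ids, pg_verify_mbstr_len(), pg_do_encoding_conversion() and
utf_u2e(). The stand-in validator is simplified (it does not reject overlong
forms or surrogates), the stand-in converter only handles Latin-1, malloc()
replaces pstrdup(), and invalid input yields NULL where the real code would
raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the database encoding ids. */
typedef enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1 } sketch_encoding;

/* Simplified validator: true if buf[0..len) has no NUL bytes and every
 * multibyte sequence has a valid lead byte and enough continuation bytes. */
static bool sketch_utf8_ok(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t need;

        if (c == 0) return false;                    /* embedded NUL */
        else if (c < 0x80) need = 0;
        else if ((c & 0xE0) == 0xC0) need = 1;
        else if ((c & 0xF0) == 0xE0) need = 2;
        else if ((c & 0xF8) == 0xF0) need = 3;
        else return false;                           /* bad lead byte */

        if (need > len - i - 1) return false;        /* truncated sequence */
        for (size_t k = 1; k <= need; k++)
            if ((p[i + k] & 0xC0) != 0x80) return false;
        i += need + 1;
    }
    return true;
}

/* Stand-in converter: hands back the input pointer untouched (and
 * unverified) for UTF8 and SQL_ASCII targets; otherwise returns a newly
 * allocated Latin-1 string, or NULL if the input cannot be converted. */
static char *sketch_convert(const char *src, size_t len, sketch_encoding dest)
{
    char *out;
    size_t o = 0;

    if (dest == ENC_UTF8 || dest == ENC_SQL_ASCII)
        return (char *) src;

    out = malloc(len + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) src[i];

        if (c != 0 && c < 0x80) {
            out[o++] = (char) c;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char) src[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out[o++] = (char) (((c & 0x03) << 6) |
                               ((unsigned char) src[i + 1] & 0x3F));
            i++;
        } else {
            free(out);
            return NULL;
        }
    }
    out[o] = '\0';
    return out;
}

/* Mirrors the agreed logic: verify by hand only when the converter returned
 * the input unchanged AND the database encoding is one it never checks. */
char *sketch_utf_u2e(const char *utf8_str, size_t len, sketch_encoding db)
{
    char *ret;

    assert(utf8_str != NULL);
    ret = sketch_convert(utf8_str, len, db);
    if (ret == utf8_str) {
        if ((db == ENC_UTF8 || db == ENC_SQL_ASCII) &&
            !sketch_utf8_ok(utf8_str, len))
            return NULL;                 /* real code would report an error */
        ret = malloc(len + 1);           /* plays the role of pstrdup() */
        if (ret == NULL) return NULL;
        memcpy(ret, utf8_str, len);
        ret[len] = '\0';
    }
    return ret;
}
```

In this sketch the pointer comparison answers "was anything converted?",
while the encoding test answers "has anyone verified the bytes yet?", so a
string that went through a real conversion is not checked a second time.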
