Thread: UTF8 encoding and non-text data types

UTF8 encoding and non-text data types

From
"Medi Montaseri"
Date:
I understand PG supports UTF-8 encoding and I have sucessfully inserted Unicode text into columns. I was wondering
aboutother data types such as numbers, decimal, dates<br /><br />That is, say I have a table t1 with<br />create table
t1{ name text, cost decimal } <br />I can insert UTF8 text datatype into this table with no problem<br />But if my
applicationattempts to insert numbers encloded in UTF8, then I get wrong datatype error<br /><br />Is the solution for
theapplication layer (not database) to convert the non-text UTF8 numbers to ASCII and then insert it into database ?
<br/><br />Thanks<br />Medi<br /> 

Re: UTF8 encoding and non-text data types

From
Steve Midgley
Date:
At 02:22 PM 1/13/2008, pgsql-sql-owner@postgresql.org wrote:
>Date: Sat, 12 Jan 2008 14:21:00 -0800
>From: "Medi Montaseri" <montaseri@gmail.com>
>To: pgsql-sql@postgresql.org
>Subject: UTF8 encoding and non-text data types
>Message-ID: 
><8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com>
>
>I understand PG supports UTF-8 encoding and I have sucessfully 
>inserted
>Unicode text into columns. I was wondering about other data types such 
>as
>numbers, decimal, dates
>
>That is, say I have a table t1 with
>create table t1 { name text, cost decimal }
>I can insert UTF8 text datatype into this table with no problem
>But if my application attempts to insert numbers encloded in UTF8, 
>then I
>get wrong datatype error
>
>Is the solution for the application layer (not database) to convert 
>the
>non-text UTF8 numbers to ASCII and then insert it into database ?
>
>Thanks
>Medi

Hi Medi,

I have only limited experience in this area, but it sounds like you 
sending your numbers as strings? In your example:

>create table t1 { name text, cost decimal };

insert into t1 (name, cost) values ('name1', '1');

I can't think of how else you're sending numeric values as UTF8? I know 
that Pg will accept numbers as strings and convert internally (that has 
worked for me in some object relational environments where I don't 
choose to cope with data types), but I think it would be better if you 
simply didn't send your numeric data in quotations, whether as UTF8 or 
ASCII. If you don't have control over this layer (that quotes your 
values), then I'd say converting to ASCII would solve the problem. But 
better to convert to numeric and not ship quoted strings at all.

I may be totally off-base and missing something fundamental and I'm 
very open to correction (by anyone), but that's what I can see here.

Best regards,

Steve



Re: UTF8 encoding and non-text data types

From
"Medi Montaseri"
Date:
Thanks Steve,

Actually I do not insert text data into my numeric field.
As I mentioned given
create table t1 { name text, cost decimal }
then I would like to insert numeric data into column "cost" because then I can later benefit from numerical operators like SUM, AVG, etc

More specifically, I am using HTML, Perl and PG. So from the HTML point of view a textfield is just some strings. So my user would enter 12345 but expressed in UTF8. Perl would get this and use DBI to insert it into PG

What I am experiencing now is that DB errors that I am trying to insert an incorrect data into column "cost" which is numeric and the data is coming in from HTML in UTF8

Mybe I have to convert it to ASCII numbers in Perl before inserting  them into PG

Thanks
Medi

On Jan 13, 2008 8:51 PM, Steve Midgley <public@misuse.org> wrote:
At 02:22 PM 1/13/2008, pgsql-sql-owner@postgresql.org wrote:
>Date: Sat, 12 Jan 2008 14:21:00 -0800
>From: "Medi Montaseri" < montaseri@gmail.com>
>To: pgsql-sql@postgresql.org
>Subject: UTF8 encoding and non-text data types
>Message-ID:
>< 8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com>
>
>I understand PG supports UTF-8 encoding and I have sucessfully
>inserted
>Unicode text into columns. I was wondering about other data types such
>as
>numbers, decimal, dates
>
>That is, say I have a table t1 with
>create table t1 { name text, cost decimal }
>I can insert UTF8 text datatype into this table with no problem
>But if my application attempts to insert numbers encloded in UTF8,
>then I
>get wrong datatype error
>
>Is the solution for the application layer (not database) to convert
>the
>non-text UTF8 numbers to ASCII and then insert it into database ?
>
>Thanks
>Medi

Hi Medi,

I have only limited experience in this area, but it sounds like you
sending your numbers as strings? In your example:

>create table t1 { name text, cost decimal };

insert into t1 (name, cost) values ('name1', '1');

I can't think of how else you're sending numeric values as UTF8? I know
that Pg will accept numbers as strings and convert internally (that has
worked for me in some object relational environments where I don't
choose to cope with data types), but I think it would be better if you
simply didn't send your numeric data in quotations, whether as UTF8 or
ASCII. If you don't have control over this layer (that quotes your
values), then I'd say converting to ASCII would solve the problem. But
better to convert to numeric and not ship quoted strings at all.

I may be totally off-base and missing something fundamental and I'm
very open to correction (by anyone), but that's what I can see here.

Best regards,

Steve


Re: UTF8 encoding and non-text data types

From
Tom Lane
Date:
"Medi Montaseri" <montaseri@gmail.com> writes:
> More specifically, I am using HTML, Perl and PG. So from the HTML point of
> view a textfield is just some strings. So my user would enter 12345 but
> expressed in UTF8. Perl would get this and use DBI to insert it into PG

> What I am experiencing now is that DB errors that I am trying to insert an
> incorrect data into column "cost" which is numeric and the data is coming in
> from HTML in UTF8

> Mybe I have to convert it to ASCII numbers in Perl before inserting  them
> into PG

Uh, there is *no* difference between the ASCII and UTF8 representations
of decimal digits, nor of any other character that would be allowed in
input for a decimal field.  I can't tell what your problem really is,
but you have certainly misunderstood or misexplained it.
        regards, tom lane


Re: UTF8 encoding and non-text data types

From
dmp
Date:
Hi Steve,
Have you tried converting to a decimal type or cast for the cost field? 
If you
are gathering this data from a text field and  placing in a variable of 
type string
then using that variable in the insert statement it may be rejected 
because it is not
type decimal. This  has been my experience with trying to get input data 
from
user's textfields and placing in the db.

dana.

> Thanks Steve,
>
> Actually I do not insert text data into my numeric field.
> As I mentioned given
> create table t1 { name text, cost decimal }
> then I would like to insert numeric data into column "cost" because 
> then I can later benefit from numerical operators like SUM, AVG, etc
>
> More specifically, I am using HTML, Perl and PG. So from the HTML 
> point of view a textfield is just some strings. So my user would enter 
> 12345 but expressed in UTF8. Perl would get this and use DBI to insert 
> it into PG
>
> What I am experiencing now is that DB errors that I am trying to 
> insert an incorrect data into column "cost" which is numeric and the 
> data is coming in from HTML in UTF8
>
> Mybe I have to convert it to ASCII numbers in Perl before inserting  
> them into PG
>
> Thanks
> Medi
>
> >
> >I understand PG supports UTF-8 encoding and I have sucessfully
> >inserted
> >Unicode text into columns. I was wondering about other data types such
> >as
> >numbers, decimal, dates
> >
> >That is, say I have a table t1 with
> >create table t1 { name text, cost decimal }
> >I can insert UTF8 text datatype into this table with no problem
> >But if my application attempts to insert numbers encloded in UTF8,
> >then I
> >get wrong datatype error
> >
> >Is the solution for the application layer (not database) to convert
> >the
> >non-text UTF8 numbers to ASCII and then insert it into database ?
> >
> >Thanks
> >Medi
>
> Hi Medi,
>
> I have only limited experience in this area, but it sounds like you
> sending your numbers as strings? In your example:
>
> >create table t1 { name text, cost decimal };
>
> insert into t1 (name, cost) values ('name1', '1');
>
> I can't think of how else you're sending numeric values as UTF8? I know
> that Pg will accept numbers as strings and convert internally (that has
> worked for me in some object relational environments where I don't
> choose to cope with data types), but I think it would be better if you
> simply didn't send your numeric data in quotations, whether as UTF8 or
> ASCII. If you don't have control over this layer (that quotes your
> values), then I'd say converting to ASCII would solve the problem. But
> better to convert to numeric and not ship quoted strings at all.
>
> I may be totally off-base and missing something fundamental and I'm
> very open to correction (by anyone), but that's what I can see here.
>
> Best regards,
>
> Steve



Re: UTF8 encoding and non-text data types

From
dmp
Date:
Sorry this should have been addressed to Medi
dana.

> Hi Steve,
> Have you tried converting to a decimal type or cast for the cost 
> field? If you
> are gathering this data from a text field and  placing in a variable 
> of type string
> then using that variable in the insert statement it may be rejected 
> because it is not
> type decimal. This  has been my experience with trying to get input 
> data from
> user's textfields and placing in the db.
>
> dana.
>
>> Thanks Steve,
>>
>> Actually I do not insert text data into my numeric field.
>> As I mentioned given
>> create table t1 { name text, cost decimal }
>> then I would like to insert numeric data into column "cost" because 
>> then I can later benefit from numerical operators like SUM, AVG, etc
>>
>> More specifically, I am using HTML, Perl and PG. So from the HTML 
>> point of view a textfield is just some strings. So my user would 
>> enter 12345 but expressed in UTF8. Perl would get this and use DBI to 
>> insert it into PG
>>
>> What I am experiencing now is that DB errors that I am trying to 
>> insert an incorrect data into column "cost" which is numeric and the 
>> data is coming in from HTML in UTF8
>>
>> Mybe I have to convert it to ASCII numbers in Perl before inserting  
>> them into PG
>>
>> Thanks
>> Medi
>>
>> >
>> >I understand PG supports UTF-8 encoding and I have sucessfully
>> >inserted
>> >Unicode text into columns. I was wondering about other data types such
>> >as
>> >numbers, decimal, dates
>> >
>> >That is, say I have a table t1 with
>> >create table t1 { name text, cost decimal }
>> >I can insert UTF8 text datatype into this table with no problem
>> >But if my application attempts to insert numbers encloded in UTF8,
>> >then I
>> >get wrong datatype error
>> >
>> >Is the solution for the application layer (not database) to convert
>> >the
>> >non-text UTF8 numbers to ASCII and then insert it into database ?
>> >
>> >Thanks
>> >Medi
>>
>> Hi Medi,
>>
>> I have only limited experience in this area, but it sounds like you
>> sending your numbers as strings? In your example:
>>
>> >create table t1 { name text, cost decimal };
>>
>> insert into t1 (name, cost) values ('name1', '1');
>>
>> I can't think of how else you're sending numeric values as UTF8? I know
>> that Pg will accept numbers as strings and convert internally (that has
>> worked for me in some object relational environments where I don't
>> choose to cope with data types), but I think it would be better if you
>> simply didn't send your numeric data in quotations, whether as UTF8 or
>> ASCII. If you don't have control over this layer (that quotes your
>> values), then I'd say converting to ASCII would solve the problem. But
>> better to convert to numeric and not ship quoted strings at all.
>>
>> I may be totally off-base and missing something fundamental and I'm
>> very open to correction (by anyone), but that's what I can see here.
>>
>> Best regards,
>>
>> Steve
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org



Re: UTF8 encoding and non-text data types

From
Steve Midgley
Date:
<br /><blockquote cite="" class="cite" type="cite">On Jan 13, 2008 8:51 PM, Steve Midgley <<a
href="mailto:public@misuse.org">public@misuse.org</a>>wrote:<br /><dl><dd>At 02:22 PM 1/13/2008, <a
href="mailto:pgsql-sql-owner@postgresql.org">pgsql-sql-owner@postgresql.org</a> wrote:<br /><dd>>Date: Sat, 12 Jan
200814:21:00 -0800<br /><dd>>From: "Medi Montaseri" <<a href="mailto:montaseri@gmail.com">
montaseri@gmail.com</a>><br/><dd>>To: <a href="mailto:pgsql-sql@postgresql.org">pgsql-sql@postgresql.org</a><br
/><dd>>Subject:UTF8 encoding and non-text data types<br /><dd>>Message-ID:<br /><dd> ><<a
href="mailto:8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com">
8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com</a>><br/><dd>><br /><dd>>I understand PG supports
UTF-8encoding and I have sucessfully<br /><dd>>inserted<br /><dd>>Unicode text into columns. I was wondering
aboutother data types such <br /><dd>>as<br /><dd>>numbers, decimal, dates<br /><dd>><br /><dd>>That is,
sayI have a table t1 with<br /><dd>>create table t1 { name text, cost decimal }<br /><dd>>I can insert UTF8 text
datatypeinto this table with no problem<br /><dd>>But if my application attempts to insert numbers encloded in UTF8,
<br/><dd>>then I<br /><dd>>get wrong datatype error<br /><dd>><br /><dd>>Is the solution for the
applicationlayer (not database) to convert<br /><dd>>the<br /><dd>>non-text UTF8 numbers to ASCII and then insert
itinto database ?<br /><dd>> <br /><dd>>Thanks<br /><dd>>Medi<br /><br /><dd>Hi Medi,<br /><br /><dd>I have
onlylimited experience in this area, but it sounds like you<br /><dd>sending your numbers as strings? In your
example:<br/><br /><dd>>create table t1 { name text, cost decimal }; <br /><br /><dd>insert into t1 (name, cost)
values('name1', '1');<br /><br /><dd>I can't think of how else you're sending numeric values as UTF8? I know<br
/><dd>thatPg will accept numbers as strings and convert internally (that has <br /><dd>worked for me in some object
relationalenvironments where I don't<br /><dd>choose to cope with data types), but I think it would be better if you<br
/><dd>simplydidn't send your numeric data in quotations, whether as UTF8 or <br /><dd>ASCII. If you don't have control
overthis layer (that quotes your<br /><dd>values), then I'd say converting to ASCII would solve the problem. But<br
/><dd>betterto convert to numeric and not ship quoted strings at all.<br /><br /><dd>I may be totally off-base and
missingsomething fundamental and I'm<br /><dd>very open to correction (by anyone), but that's what I can see here.<br
/><br/><dd>Best regards,<br /><font color="#888888"><br /></font><dd>Steve<br /></dl>At 11:01 AM 1/14/2008, Medi
Montaseriwrote:<br /> Thanks Steve,<br /><br /> Actually I do not insert text data into my numeric field.<br /> As I
mentionedgiven <br /> create table t1 { name text, cost decimal }<br /> then I would like to insert numeric data into
column"cost" because then I can later benefit from numerical operators like SUM, AVG, etc <br /><br /> More
specifically,I am using HTML, Perl and PG. So from the HTML point of view a textfield is just some strings. So my user
wouldenter 12345 but expressed in UTF8. Perl would get this and use DBI to insert it into PG <br /><br /> What I am
experiencingnow is that DB errors that I am trying to insert an incorrect data into column "cost" which is numeric and
thedata is coming in from HTML in UTF8<br /><br /> Mybe I have to convert it to ASCII numbers in Perl before inserting 
theminto PG <br /><br /> Thanks<br /> Medi<br /></blockquote><br /> Hi Medi,<br /><br /> I agree that you should
convertyour values in Perl before handing to DBI. I'm not familiar with DBI but presumably if you're sending it UTF8
valuesit's attempting to quote them or do something with them, that a numeric field in Pg can't handle. Can you
trap/monitorthe exact sql statement that is generated by DBI and sent to Pg? That would help a lot in knowing what it
isdoing, but I suspect if you just convert your numbers from the HTML/UTF8 source values into actual Perl numeric
valuesand then ship to DBI you'll be better off. And you'll get some input validation for free.<br /><br /> I hope this
helps,<br/><br /> Steve<br /> 

Re: UTF8 encoding and non-text data types

From
"Medi Montaseri"
Date:
Here is my traces from perl CGI code, I'll include two samples one in ASCII and one UTF so we know what to expect

Here is actual SQL statement being executed in Perl and DBI. I do not quote the numerical value, just provided to DBI raw.

insert into t1 (c1, cost) values ('tewt', 1234)
this works find....
insert into t1 (c1, cost) values ('&#1588;&#1583;', &#1777;&#1778;&#1779;&#1780;)
 DBD::Pg::db do failed: ERROR:  syntax error at or near ";" at character 59,

And the PG log itself is very similar and says
ERROR:  syntax error at or near ";" at character 59

Char 59 by the way is the first accurance of semi-colon as in &#17777; which is being caught by PG parser.

Medi


On Jan 14, 2008 12:18 PM, Steve Midgley <public@misuse.org> wrote:

On Jan 13, 2008 8:51 PM, Steve Midgley <public@misuse.org> wrote:
At 02:22 PM 1/13/2008, pgsql-sql-owner@postgresql.org wrote:
>Date: Sat, 12 Jan 2008 14:21:00 -0800
>From: "Medi Montaseri" < montaseri@gmail.com>
>To: pgsql-sql@postgresql.org
>Subject: UTF8 encoding and non-text data types
>Message-ID:
>< 8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com>
>
>I understand PG supports UTF-8 encoding and I have sucessfully
>inserted
>Unicode text into columns. I was wondering about other data types such
>as
>numbers, decimal, dates
>
>That is, say I have a table t1 with
>create table t1 { name text, cost decimal }
>I can insert UTF8 text datatype into this table with no problem
>But if my application attempts to insert numbers encloded in UTF8,
>then I
>get wrong datatype error
>
>Is the solution for the application layer (not database) to convert
>the
>non-text UTF8 numbers to ASCII and then insert it into database ?
>
>Thanks
>Medi

Hi Medi,

I have only limited experience in this area, but it sounds like you
sending your numbers as strings? In your example:

>create table t1 { name text, cost decimal };

insert into t1 (name, cost) values ('name1', '1');

I can't think of how else you're sending numeric values as UTF8? I know
that Pg will accept numbers as strings and convert internally (that has
worked for me in some object relational environments where I don't
choose to cope with data types), but I think it would be better if you
simply didn't send your numeric data in quotations, whether as UTF8 or
ASCII. If you don't have control over this layer (that quotes your
values), then I'd say converting to ASCII would solve the problem. But
better to convert to numeric and not ship quoted strings at all.

I may be totally off-base and missing something fundamental and I'm
very open to correction (by anyone), but that's what I can see here.

Best regards,

Steve
At 11:01 AM 1/14/2008, Medi Montaseri wrote:
Thanks Steve,

Actually I do not insert text data into my numeric field.
As I mentioned given
create table t1 { name text, cost decimal }
then I would like to insert numeric data into column "cost" because then I can later benefit from numerical operators like SUM, AVG, etc

More specifically, I am using HTML, Perl and PG. So from the HTML point of view a textfield is just some strings. So my user would enter 12345 but expressed in UTF8. Perl would get this and use DBI to insert it into PG

What I am experiencing now is that DB errors that I am trying to insert an incorrect data into column "cost" which is numeric and the data is coming in from HTML in UTF8

Mybe I have to convert it to ASCII numbers in Perl before inserting  them into PG

Thanks
Medi

Hi Medi,

I agree that you should convert your values in Perl before handing to DBI. I'm not familiar with DBI but presumably if you're sending it UTF8 values it's attempting to quote them or do something with them, that a numeric field in Pg can't handle. Can you trap/monitor the exact sql statement that is generated by DBI and sent to Pg? That would help a lot in knowing what it is doing, but I suspect if you just convert your numbers from the HTML/UTF8 source values into actual Perl numeric values and then ship to DBI you'll be better off. And you'll get some input validation for free.

I hope this helps,

Steve

Re: UTF8 encoding and non-text data types

From
Steve Midgley
Date:
At 12:43 PM 1/14/2008, Medi Montaseri wrote:<br /><blockquote cite="" class="cite" type="cite">Here is my traces from
perlCGI code, I'll include two samples one in ASCII and one UTF so we know what to expect<br /><br /> Here is actual
SQLstatement being executed in Perl and DBI. I do not quote the numerical value, just provided to DBI raw. <br /><br />
insertinto t1 (c1, cost) values ('tewt', 1234)<br /> this works find....<br /> insert into t1 (c1, cost) values
('&#1588;&#1583;',&#1777;&#1778;&#1779;&#1780;)<br />  DBD::Pg::db do failed: ERROR:  syntax
errorat or near ";" at character 59, <br /><br /> And the PG log itself is very similar and says<br /> ERROR:  syntax
errorat or near ";" at character 59<br /><br /> Char 59 by the way is the first accurance of semi-colon as in
&#17777;which is being caught by PG parser. <br /><br /> Medi<br /><br /><br /> On Jan 14, 2008 12:18 PM, Steve
Midgley<<a href="mailto:public@misuse.org">public@misuse.org</a>> wrote:<br /><dl><br /><blockquote cite=""
class="cite"type="cite"></blockquote><dd>On Jan 13, 2008 8:51 PM, Steve Midgley <<a
href="mailto:public@misuse.org">public@misuse.org</a>>wrote: <dl><dd>At 02:22 PM 1/13/2008, <a
href="mailto:pgsql-sql-owner@postgresql.org">pgsql-sql-owner@postgresql.org</a> wrote: <dd>>Date: Sat, 12 Jan 2008
14:21:00-0800 <dd>>From: "Medi Montaseri" <<a href="mailto:montaseri@gmail.com"> montaseri@gmail.com</a>>
<dd>>To:<a href="mailto:pgsql-sql@postgresql.org">pgsql-sql@postgresql.org</a><dd>>Subject: UTF8 encoding and
non-textdata types <dd>>Message-ID: <dd> ><<a
href="mailto:8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com">
8078a1730801121421l1c9b90c1lc65d36cf6e752a6f@mail.gmail.com</a>><dd>> <dd>>I understand PG supports UTF-8
encodingand I have sucessfully <dd>>inserted <dd>>Unicode text into columns. I was wondering about other data
typessuch <dd>>as <dd>>numbers, decimal, dates <dd>> <dd>>That is, say I have a table t1 with
<dd>>createtable t1 { name text, cost decimal } <dd>>I can insert UTF8 text datatype into this table with no
problem<dd>>But if my application attempts to insert numbers encloded in UTF8, <dd>>then I <dd>>get wrong
datatypeerror <dd>> <dd>>Is the solution for the application layer (not database) to convert <dd>>the
<dd>>non-textUTF8 numbers to ASCII and then insert it into database ? <dd>> <dd>>Thanks <dd>>Medi<br
/><dd>HiMedi,<br /><dd>I have only limited experience in this area, but it sounds like you <dd>sending your numbers as
strings?In your example:<br /><dd>>create table t1 { name text, cost decimal }; <br /><dd>insert into t1 (name,
cost)values ('name1', '1');<br /><dd>I can't think of how else you're sending numeric values as UTF8? I know <dd>that
Pgwill accept numbers as strings and convert internally (that has <dd>worked for me in some object relational
environmentswhere I don't <dd>choose to cope with data types), but I think it would be better if you <dd>simply didn't
sendyour numeric data in quotations, whether as UTF8 or <dd>ASCII. If you don't have control over this layer (that
quotesyour <dd>values), then I'd say converting to ASCII would solve the problem. But <dd>better to convert to numeric
andnot ship quoted strings at all.<br /><dd>I may be totally off-base and missing something fundamental and I'm
<dd>veryopen to correction (by anyone), but that's what I can see here.<br /><dd>Best regards,<font color="#888888">
</font><dd>Steve</dl><dd>At 11:01 AM 1/14/2008, Medi Montaseri wrote:<br /><dd>Thanks Steve,<br /><br /><dd>Actually I
donot insert text data into my numeric field.<br /><dd>As I mentioned given <br /><dd>create table t1 { name text, cost
decimal}<br /><dd>then I would like to insert numeric data into column "cost" because then I can later benefit from
numericaloperators like SUM, AVG, etc <br /><br /><dd>More specifically, I am using HTML, Perl and PG. So from the HTML
pointof view a textfield is just some strings. So my user would enter 12345 but expressed in UTF8. Perl would get this
anduse DBI to insert it into PG <br /><br /><dd>What I am experiencing now is that DB errors that I am trying to insert
anincorrect data into column "cost" which is numeric and the data is coming in from HTML in UTF8<br /><br /><dd>Mybe I
haveto convert it to ASCII numbers in Perl before inserting  them into PG <br /><br /><dd>Thanks<br
/><dd>Medi</dl></blockquote><br/><dl><dd>Hi Medi,<br /><br /><dd>I agree that you should convert your values in Perl
beforehanding to DBI. I'm not familiar with DBI but presumably if you're sending it UTF8 values it's attempting to
quotethem or do something with them, that a numeric field in Pg can't handle. Can you trap/monitor the exact sql
statementthat is generated by DBI and sent to Pg? That would help a lot in knowing what it is doing, but I suspect if
youjust convert your numbers from the HTML/UTF8 source values into actual Perl numeric values and then ship to DBI
you'llbe better off. And you'll get some input validation for free.<br /><br /><dd>I hope this helps,<br /><br
/><dd>Steve<br/><br /></dl>Hi Medi,<br /><br /> That structure for numeric values is never going to work, as best as I
understandPostgres (and other sql pipes). You have to convert those UTF chars to straight numeric format. Hopefully
thatsolves your problem? I hope it's not too hard for you to get at the code which is sending the numbers as UTF?<br
/><br/> Steve<br /><br /><br /> 

Re: UTF8 encoding and non-text data types

From
Tom Lane
Date:
"Medi Montaseri" <montaseri@gmail.com> writes:
> insert into t1 (c1, cost) values ('tewt', 1234)
> this works find....
> insert into t1 (c1, cost) values ('شد',
> ۱۲۳۴)
>  DBD::Pg::db do failed: ERROR:  syntax error at or near ";" at character 59,

Well, you've got two problems there.  The first and biggest is that
&#NNN; is an HTML notation, not a SQL notation; no SQL database is going
to think that that string in its input is a representation of a single
Unicode character.  The other problem is that even if this did happen,
code points 1777 and nearby are not digits; they're something or other
in Arabic, apparently.  So I think you've got a problem in your Unicode
conversions as well as a notational problem.
        regards, tom lane


Re: UTF8 encoding and non-text data types

From
Joe
Date:
Tom Lane wrote:
> "Medi Montaseri" <montaseri@gmail.com> writes:
>   
>> insert into t1 (c1, cost) values ('tewt', 1234)
>> this works find....
>> insert into t1 (c1, cost) values ('شد',
>> ۱۲۳۴)
>>  DBD::Pg::db do failed: ERROR:  syntax error at or near ";" at character 59,
>>     
>
> Well, you've got two problems there.  The first and biggest is that
> &#NNN; is an HTML notation, not a SQL notation; no SQL database is going
> to think that that string in its input is a representation of a single
> Unicode character.  The other problem is that even if this did happen,
> code points 1777 and nearby are not digits; they're something or other
> in Arabic, apparently.
>   
Precisely. 1777 through 1780 decimal equate to code points U+06F1 
through U+06F4, which correspond to the Arabic numerals 1 through 4.

Joe


Re: UTF8 encoding and non-text data types

From
Tom Lane
Date:
Joe <dev@freedomcircle.net> writes:
> Tom Lane wrote:
>> Well, you've got two problems there.  The first and biggest is that
>> &#NNN; is an HTML notation, not a SQL notation; no SQL database is going
>> to think that that string in its input is a representation of a single
>> Unicode character.  The other problem is that even if this did happen,
>> code points 1777 and nearby are not digits; they're something or other
>> in Arabic, apparently.
>> 
> Precisely. 1777 through 1780 decimal equate to code points U+06F1 
> through U+06F4, which correspond to the Arabic numerals 1 through 4.

Oh?  Interesting.  But even if we wanted to teach Postgres about that,
wouldn't there be a pretty strong risk of getting confused by Arabic's
right-to-left writing direction?  Wouldn't be real helpful if the entry
came out as 4321 when the user wanted 1234.  Definitely seems like
something that had better be left to the application side, where there's
more context about what the string means.
        regards, tom lane


Re: UTF8 encoding and non-text data types

From
Joe
Date:
Tom Lane wrote:
> Oh?  Interesting.  But even if we wanted to teach Postgres about that,
> wouldn't there be a pretty strong risk of getting confused by Arabic's
> right-to-left writing direction?  Wouldn't be real helpful if the entry
> came out as 4321 when the user wanted 1234.  Definitely seems like
> something that had better be left to the application side, where there's
> more context about what the string means.
>   
The Arabic language is written right-to-left, except ... when it comes 
to numbers.

http://www2.ignatius.edu/faculty/turner/arabic/anumbers.htm

I agree that it's application specific.  The HTML/Perl script ought to 
convert to Western numerals.

Joe


Re: UTF8 encoding and non-text data types

From
Gregory Stark
Date:
"Joe" <dev@freedomcircle.net> writes:

> Tom Lane wrote:
>> Oh?  Interesting.  But even if we wanted to teach Postgres about that,
>> wouldn't there be a pretty strong risk of getting confused by Arabic's
>> right-to-left writing direction?  Wouldn't be real helpful if the entry
>> came out as 4321 when the user wanted 1234.  Definitely seems like
>> something that had better be left to the application side, where there's
>> more context about what the string means.
>>   
> The Arabic language is written right-to-left, except ... when it comes to
> numbers.

I don't think that matters anyways. Unicode strings are always in "logical"
order, not display order. Displaying the string in the right order is up to
the display engine in the Unicode world-view.

I'm not sure what to think about this though. It may be that Arabic notation
are close enough that it would be straightforward (IIRC decimal notation was
invented in the Arabic world after all). But other writing systems have some
pretty baroque notations which would be far more difficult to convert.

If anything I would expect this kind of conversion to live in the same place
as things like roman numerals or other more flexible formatting.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


Re: UTF8 encoding and non-text data types

From
John Hasler
Date:
Joe writes:
> The Arabic language is written right-to-left, except ... when it comes to
> numbers.

Perhaps they read their numbers right to left but use a little-endian
notation.
-- 
John Hasler 
john@dhh.gt.org
Elmwood, WI USA