Thread: encoding

encoding

From
"Joel Fradkin"
Date:

I just wanted to document a recent issue; it may be that I am not aware of the proper way to use encoding with the 8.0 versions of the ODBC driver.

With 7.4 I was getting character codes correctly from ODBC.

With version 8.0 (just downloaded) I had an issue on my Windows 2000 servers: they displayed question marks instead of the French characters.

I was testing on Win2003 with 7.4, so I switched the Win2k machines back to the 7.4 driver and they display correctly (I am using ASP).

 

Joel Fradkin

 

Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel.  941-753-7111 ext 305

 

jfradkin@wazagua.com
www.wazagua.com
Powered by Wazagua
Providing you with the latest Web-based technology & advanced tools.
© 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc
 This email message is for the use of the intended recipient(s) and may contain confidential and privileged information.  Any unauthorized review, use, disclosure or distribution is prohibited.  If you are not the intended recipient, please contact the sender by reply email and delete and destroy all copies of the original message, including attachments.

Re: encoding

From
Marko Ristola
Date:

Hi

The database's charset must be something other than plain ASCII.
(The same applies on the Windows side.)

The client charset is defined by environment variables.
The PostgreSQL server charset is defined at least at database creation time.

When the charsets are defined correctly, PostgreSQL knows both of them
and can convert between the server and client charsets.
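A toy Python model can make the conversion step concrete. It is a sketch under stated assumptions, not PostgreSQL's implementation: the `PG_TO_PY` name mapping and the `to_client` function are hypothetical, and only two encodings are covered. It does reflect the documented rule that no conversion happens when SQL_ASCII is involved on either side.

```python
# Toy model of server-to-client charset conversion (hypothetical, not
# PostgreSQL's code). Maps a few PostgreSQL encoding names to Python codecs.
PG_TO_PY = {"LATIN1": "latin-1", "UTF8": "utf-8"}


def to_client(stored: bytes, server_enc: str, client_enc: str) -> bytes:
    """Return the bytes a client would receive for a stored text value."""
    if server_enc == "SQL_ASCII" or client_enc == "SQL_ASCII":
        # No declared charset: the bytes pass through unconverted and the
        # client has to guess what they mean.
        return stored
    text = stored.decode(PG_TO_PY[server_enc])  # server bytes -> characters
    return text.encode(PG_TO_PY[client_enc])    # characters -> client bytes


stored = "café".encode("latin-1")              # b"caf\xe9" in a LATIN1 database
print(to_client(stored, "LATIN1", "UTF8"))     # b'caf\xc3\xa9'
print(to_client(stored, "SQL_ASCII", "UTF8"))  # b'caf\xe9', unchanged
```

With a declared server charset the client gets properly converted bytes; with SQL_ASCII it gets whatever was stored and must interpret it on its own.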

The newest Windows ODBC driver requires correct locale settings.
Perhaps the older PostgreSQL server + ODBC driver combination doesn't do
any conversions, so it just works in cases where no charset conversion
is needed.

What is your PostgreSQL server's database locale setting?
Please see the documentation for CREATE DATABASE and the initdb
command-line tool for charset selection.


I hope this helps. I'm interested in charset changes in ODBC, but
I don't know the psqlodbc charset history, or the latest version's
functionality, well enough to give robust answers.

Marko Ristola


Re: encoding

From
"Joel Fradkin"
Date:
The database is SQL_ASCII.
I guess the locale is whatever it defaults to when you install from RPM on
Red Hat AS4; I'm not sure.
lc_messages = 'en_US.UTF-8'        # locale for system error message strings
lc_monetary = 'en_US.UTF-8'        # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'         # locale for number formatting
lc_time = 'en_US.UTF-8'            # locale for time formatting

The client is a Win2k box.

I can see the characters look OK when I view them using pgAdmin.
.NET was displaying them OK.
The old ODBC driver was displaying them OK.

Just the new ODBC driver is doing something to them to make them appear as
question marks.

In any event, I switched back to the old driver and the site is OK.
I am very busy with post-conversion repairs, but maybe later I can take a
closer look at whether there is a better way (I am brain dead at the moment:
75 hours last week, and this week is looking the same).

Unfortunately, I am still having severe speed issues and may need to use
my 2-processor SQL Server machine for some reporting.

Joel Fradkin

-----Original Message-----
From: Marko Ristola [mailto:marko.ristola@kolumbus.fi]
Sent: Wednesday, May 11, 2005 1:17 PM
To: Joel Fradkin
Cc: pgsql-odbc@postgresql.org
Subject: Re: [ODBC] encoding




Re: encoding

From
Marko Ristola
Date:

So the new ODBC driver does a 7-bit ASCII -> Unicode conversion.
The Windows Unicode conversion functions seem to turn 8-bit non-ASCII
characters into question marks. That is correct behaviour for charset
conversion functions.
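The question-mark effect is easy to reproduce in a few lines of Python. The `ascii_to_unicode` function below is a hypothetical stand-in for the conversion described above, not the Windows API or the driver's code; it assumes the application stored French text as single Latin-1 bytes.

```python
def ascii_to_unicode(raw: bytes) -> str:
    """Treat raw as 7-bit ASCII, replacing every byte >= 0x80 with '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)


raw = "Crème brûlée".encode("latin-1")  # è, û, é are one byte each (>= 0x80)
print(ascii_to_unicode(raw))             # Cr?me br?l?e
print(raw.decode("latin-1"))             # Crème brûlée, once the charset is known
```

The bytes themselves are intact; only the claim that they are 7-bit ASCII is wrong, which is why declaring the real charset fixes the display.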

For new database installs, I strongly recommend using something other
than SQL_ASCII, because you also use non-US characters. Latin1 (ISO-8859-1)
or similar is fine for workability. UTF-8 is a very good alternative,
because everybody is moving to it in the long term.
You get more portability with UTF-8, but it is a bit slower than Latin1.
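One measurable difference between the two choices can be shown in Python: accented Latin letters take one byte in Latin-1 but two in UTF-8, while Latin-1 cannot store characters outside its 256-character range at all (the euro sign is one example). Whether this accounts for the speed difference mentioned above is not established here; `fits_latin1` is a hypothetical helper.

```python
def fits_latin1(text: str) -> bool:
    """Return True if text can be stored in a Latin-1 (ISO-8859-1) database."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


sample = "Déjà vu à Québec"
print(len(sample), len(sample.encode("latin-1")), len(sample.encode("utf-8")))  # 16 16 20
print(fits_latin1(sample), fits_latin1("Prix : 10 €"))  # True False
```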

The new driver has improved Unicode support. That is why the
7-bit ASCII -> Unicode conversion is done in the new (failing) driver.

About ten years ago, Unicode was not used much, and all programs worked
well with 7-bit ASCII settings.

Nowadays you need to tell applications which charset you are using.
Otherwise you might run into a program that does charset conversions, and
the characters will turn into question marks, as yours did.

So the first step is to tell the database which charset you are
using :)


So, you still have performance issues to solve.

In my opinion, different databases may need somewhat different optimization:
if you optimize for MSSQL, it might be slow with PostgreSQL, and perhaps
vice versa. This rule applies to many databases, although I don't have
experience with MSSQL in this regard.

If both databases use a similar query plan, they may run at similar speed
(they are algorithmically similar). I don't know how the number of CPUs
affects this with these databases: algorithmically, the work to be done is
the same for similar plans, but there are two workers. A query might run up
to twice as fast as on one CPU (if the query in question can be
parallelized nicely at both the software and hardware levels).
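The "up to twice as fast" figure is the ideal case. Amdahl's law, a standard model that is not mentioned in this thread, gives the bound when only part of a query's work can run in parallel; the function name and the fractions below are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


for p in (1.0, 0.9, 0.5):
    print(f"{p:.0%} parallel on 2 CPUs: {amdahl_speedup(p, 2):.2f}x")
# 100% parallel on 2 CPUs: 2.00x
# 90% parallel on 2 CPUs: 1.82x
# 50% parallel on 2 CPUs: 1.33x
```

Only a fully parallel workload reaches 2x on two CPUs; any serial portion pulls the gain down quickly.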

It is sometimes a good idea to use more than one database server if
performance is not good enough otherwise: for example, using different
databases for different tasks to balance the load. This week there was an
interesting thread on the PostgreSQL performance list about a 100-computer
web server system with many databases and caches to avoid unnecessary
database usage.
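Splitting the load can be sketched as two small pieces: a router that sends reporting work to its own server, and a time-limited cache in front of repeated queries. Everything here is hypothetical (the server names, connection strings, task labels, and the `TTLCache` class); it shows the shape of the idea, not the setup from the performance-list thread mentioned above.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Server:
    name: str
    dsn: str  # hypothetical connection string


# Hypothetical servers: the live site's primary and a separate reporting box.
PRIMARY = Server("primary", "host=db1 dbname=app")
REPORTING = Server("reporting", "host=db2 dbname=app")


def pick_server(task: str) -> Server:
    """Route heavy read-only reporting to its own server, everything else to the primary."""
    return REPORTING if task == "report" else PRIMARY


class TTLCache:
    """Keep computed results for ttl_seconds to avoid repeating the same query."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: skip the database
        value = compute()
        self._entries[key] = (now, value)
        return value
```

For example, `pick_server("report")` returns the reporting server, and wrapping a slow dashboard query in `get_or_compute` means it reaches the database at most once per TTL window.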

Good luck.

Marko Ristola


Re: encoding

From
"Joel Fradkin"
Date:
I originally tried a Unicode database, but the .NET application I wrote
to move the data from MSSQL to Postgres blew up on the French characters.

I am live on Postgres now. Is there a simple way to move from SQL_ASCII to
Unicode, assuming the new 8.0 ODBC driver will correctly present the data if
the database is Unicode?
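For the question of moving an existing SQL_ASCII database to Unicode, one possible route is to dump the data, re-encode the dump to UTF-8, and restore it into a database created with a Unicode encoding. The Python sketch below covers only the re-encoding step. Its key assumption, which must be checked against the real data, is that the application wrote Windows-1252 bytes; an SQL_ASCII database does not record which charset the bytes are in. It also assumes the dump declares `SET client_encoding = 'SQL_ASCII';`, and `transcode_dump` is a hypothetical helper.

```python
def transcode_dump(dump: bytes, legacy_codec: str = "cp1252") -> bytes:
    """Re-encode a text dump from an assumed legacy charset to UTF-8.

    errors="strict" makes undecodable bytes raise UnicodeDecodeError,
    so bad data is reported instead of silently turning into '?'.
    """
    text = dump.decode(legacy_codec, errors="strict")
    # Assumed declaration line; 'UNICODE' matches the encoding name used in this thread.
    text = text.replace("SET client_encoding = 'SQL_ASCII';",
                        "SET client_encoding = 'UNICODE';")
    return text.encode("utf-8")


sample = b"SET client_encoding = 'SQL_ASCII';\nINSERT INTO t VALUES ('caf\xe9');\n"
print(transcode_dump(sample).decode("utf-8"))
```

If the data turns out to be Latin-1 rather than Windows-1252, pass `legacy_codec="latin-1"`; the two differ only in bytes 0x80 to 0x9F.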

It was the older ODBC driver that gave me the error writing to the Unicode
database. I guess the lib connection is OK, since I could cut and paste
French characters into the Unicode database, but when I used the program
(with the 7.4 ODBC driver) it gave me an error trying to update the database.

That is why I switched to SQL_ASCII at the time.
I do plan on setting up a second Postgres server for reporting.
I am hoping I can figure out how to use Slony to replicate the first server
onto the second (I can start with a restore; I just need to keep the data
synced up).

I am a bit worried about the replication slowing things down even more.

Joel Fradkin

-----Original Message-----
From: Marko Ristola [mailto:marko.ristola@kolumbus.fi]
Sent: Thursday, May 12, 2005 3:28 PM
To: Joel Fradkin
Cc: pgsql-odbc@postgresql.org
Subject: Re: [ODBC] encoding




Re: encoding

From
Andreas Pflug
Date:
Joel Fradkin wrote:
> I originally tried a Unicode database, but when the .net application I wrote
> to move the data from mssql to postgres blew up on the french characters.

You probably hit a (non-)conversion problem in the driver, which gives the
message "invalid byte sequence for unicode" or "unicode char > 100000
not allowed" when the server receives an unconverted character where it
expects Unicode. See my message from January (the 4th or so?) about this;
if that patch helps you too, it really should go into psqlodbc.
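The server-side error is easy to trigger outside PostgreSQL: a Latin-1 "é" is the single byte 0xE9, which opens a multi-byte sequence in UTF-8 and is not followed by the continuation bytes UTF-8 requires. The hypothetical helper below finds the first offending byte, the kind of check a client could run before sending data to a Unicode database; it is not part of psqlodbc or the patch mentioned above.

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_invalid_utf8("café au lait".encode("latin-1")))  # 3
print(first_invalid_utf8("café au lait".encode("utf-8")))    # None
```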

Regards,
Andreas