Thread: Re: psqlODBC-Driver Test / text fields

Re: psqlODBC-Driver Test / text fields

From
"Dave Page"
Date:

> -----Original Message-----
> From: Johann Zuschlag [mailto:zuschlag2@online.de]
> Sent: 27 March 2006 13:46
> To: Dave Page
> Cc: pgsql-odbc@postgresql.org
> Subject: psqlODBC-Driver Test / text fields
>
> Hi Dave,
>
> here is another one, same problem like Miguel:

Miguel's problem is somewhat unique to BDE from what I recall -
specifically it ignores Unicode text columns which the Unicode driver
will offer by default. The same problem has been reported with Oracle
and other DBMS's on usenet.

> PostgreSQL 8.0X on Debian Sarge
> Unicode Database
> Application with psqlodbc+ 7.02.02.60 on Win XP Sp2
>
> Operation: Reading and writing of a, A, u, U, o, O-Umlaut and
> sz (german
> characters)
> Data type: text
> Remarks: none
> Protocol: 7.4+
> Result: failed, insert works, querying doesn't work

OK, that sounds like the old bug we were seeing. What is the difference
between those and the previous results you reported a few minutes ago?
Is it just the text/varchar difference?

Regards, Dave.

Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Dave Page schrieb:

>
> OK, that sounds like the old bug we were seeing. What is the difference
> between those and the previous results you reported a few minutes ago?
> Is it just the text/varchar difference?
>
>
>
Yes, I forgot to test the text field. I shall set debug higher to see
what is going on.

Regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Dave Page schrieb:

>OK, that sounds like the old bug we were seeing. What is the difference
>between those and the previous results you reported a few minutes ago?
>Is it just the text/varchar difference?
>
>Regards, Dave.
>
>
>

Hi Dave,

the problem doesn't seem to be related to the text field. Please note
the examples below:

1. Searching for a string starting with a 't' seems to work fine:

2006-03-27 16:49:48 [2931] LOG:  statement: declare "SQL_CUR0210FD50"
cursor with hold for SELECT t6.* FROM KUNDE  t6 WHERE t6.name >= 't' AND
t6.name <= 'tz' ORDER BY t6.name ASC, t6.kundenid ASC

2. Searching for a string starting with o-Umlaut (german character)
doesn't return any results (in my app.).

2006-03-27 16:50:37 [2931] LOG:  statement: declare "SQL_CUR0210FD50"
cursor with hold for SELECT t6.* FROM KUNDE  t6 WHERE t6.name >= 'ö'
AND t6.name <= 'öz' ORDER BY t6.name ASC, t6.kundenid ASC

Maybe the WHERE-statement is not parsed by the driver. But the hex
representation of 'ö' is 'C3B6', that is the correct UTF8 code (not
unicode) of o-Umlaut.

3. Furthermore I noticed that I wouldn't get anything back if the
searched string contains any "Umlauts". So I guess there could be a
problem with the result set too.

SELECT without a WHERE-statement works.

Any ideas?

A mylog ist available, as well as the tiny data set (6 lines)..

Hope that helps.

Regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

> Dave Page schrieb:
>
>> OK, that sounds like the old bug we were seeing. What is the difference
>> between those and the previous results you reported a few minutes ago?
>> Is it just the text/varchar difference?
>>
>> Regards, Dave.
>>
>>
>>
>
> Hi Dave,
>
> the problem doesn't seem to be related to the text field. Please note
> the examples below:
>
> 2. Searching for a string starting with o-Umlaut (german character)
> doesn't return any results (in my app.).
>
> 2006-03-27 16:50:37 [2931] LOG:  statement: declare "SQL_CUR0210FD50"
> cursor with hold for SELECT t6.* FROM KUNDE  t6 WHERE t6.name >= 'ö'
> AND t6.name <= 'öz' ORDER BY t6.name ASC, t6.kundenid ASC
>
> Maybe the WHERE-statement is not parsed by the driver. But the hex
> representation of 'ö' is 'C3B6', that is the correct UTF8 code (not
> unicode) of o-Umlaut.

 Hi Johann,
Could you try the same query using psql with the client_encoding 'UTF8' ?

regards,
Hiroshi Inoue

Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:

>
>>
>> 2. Searching for a string starting with o-Umlaut (german character)
>> doesn't return any results (in my app.).
>>
>> 2006-03-27 16:50:37 [2931] LOG:  statement: declare "SQL_CUR0210FD50"
>> cursor with hold for SELECT t6.* FROM KUNDE  t6 WHERE t6.name >= 'ö'
>> AND t6.name <= 'öz' ORDER BY t6.name ASC, t6.kundenid ASC
>>
>> Maybe the WHERE-statement is not parsed by the driver. But the hex
>> representation of 'ö' is 'C3B6', that is the correct UTF8 code (not
>> unicode) of o-Umlaut.
>
>
> Hi Johann,
> Could you try the same query using psql with the client_encoding 'UTF8' ?
>
> regards,
> Hiroshi Inoue
>

Hi Hiroshi,

Do you mean psql on the Linux-server?

my locales:
de_DE.ISO-8859-1 (default)
de_DE.UTF-8
de_DE.UTF-8@euro
de_DE.ISO-8859-15@euro

I call psql:
- set client_encoding='UTF8';
- select name from kunde;
result e.g: 'öä-test' (=o-Umlaut, a-Umlaut, -test)

- select name from kunde where name >= 'ö' and name <= 'öz' order by
name asc;

result: ERROR:  Unicode characters greater than or equal to 0x10000 are
not supported

export LANG=de_DE.UTF-8 doesn't change the behavior.

(select name from kunde where name >= 'ö' and name <= 'öz' order by
name asc; works of course)

regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Johann Zuschlag schrieb:

>
>
> (select name from kunde where name >= 'ö' and name <= 'öz' order by
> name asc; works of course)
>
>
Maybe that is more precise:

select name from kunde where name >= 'ö' and name <= 'öz' order by
name asc;

Does not give an error, but 0 result lines.

regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

> Johann Zuschlag schrieb:
>
>>
>>
>> (select name from kunde where name >= 'ö' and name <= 'öz' order by
>> name asc; works of course)
>>
>>
> Maybe that is more precise:
>
> select name from kunde where name >= 'ö' and name <= 'öz' order by
> name asc;
>
> Does not give an error, but 0 result lines.


Thanks.
Could you issue the following 2 queries

  select name from kunde where name >= 'ö' order by name asc;

  select name from kunde where name <= 'öz' order by name asc;

and see the results ?

regards,
Hiroshi Inoue

Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:
>
> Thanks.
> Could you issue the following 2 queries
>
>  select name from kunde where name >= 'ö' order by name asc;
>
   name
--------
öä-test
 öäüÃÃ
          ÃÃ-test
(2 Zeilen)

>  select name from kunde where name <= 'öz' order by name asc;
>
  name
--------
Hühne
 Müller
 Täst
 test
 test-2
(5 Zeilen)


The complete "data" set:

select name from kunde order by name asc;

     name
---------------
 Hühne
 Müller
 Täst
 test
 test-2
 öä-test
 öäüÃÃ
          ÃÃ-test
(7 Zeilen)

Seems, that the query is not correct my application is sending.

Regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

> Hiroshi Inoue schrieb:
>
>>
>> Thanks.
>> Could you issue the following 2 queries
>>
>>  select name from kunde where name >= 'ö' order by name asc;
>>
>   name
> --------
> öä-test
> öäüÃÃ
>          ÃÃ-test
> (2 Zeilen)
>
>>  select name from kunde where name <= 'öz' order by name asc;
>>
>  name
> --------
> Hühne
> Müller
> Täst
> test
> test-2
> (5 Zeilen)


Hmm utf8 code of a-umlaut seems bigger than 'z' .

Well how is the result of the following(original ?) query
    under default encoding ?

  select name from kunde where name >= 'ö' and name <= 'öz' order by
name asc;

regards,
Hiroshi inoue

Re: psqlODBC-Driver Test / text fields

From
"Campbell, Greg"
Date:
When I look at a character map, for instance with Windows Start, Run, All Programs, Accessories, System
Tools, Character Map and a extended character set like Arial.
I see that all the accented characters (umlauts, graves, etc.) are higher that unaccented letters. I also
can see the UTF8 equivalents.



Hiroshi Inoue wrote:
> Johann Zuschlag wrote:
>
>> Hiroshi Inoue schrieb:
>>
>>>
>>> Thanks.
>>> Could you issue the following 2 queries
>>>
>>>  select name from kunde where name >= 'ö' order by name asc;
>>>
>>   name
>> --------
>> öä-test
>> öäüÃÃ
>>          ÃÃ-test
>> (2 Zeilen)
>>
>>>  select name from kunde where name <= 'öz' order by name asc;
>>>
>>  name
>> --------
>> Hühne
>> Müller
>> Täst
>> test
>> test-2
>> (5 Zeilen)
>
>
>
> Hmm utf8 code of a-umlaut seems bigger than 'z' .
>
> Well how is the result of the following(original ?) query
>    under default encoding ?
>
>  select name from kunde where name >= 'ö' and name <= 'öz' order by name
> asc;
>
> regards,
> Hiroshi inoue
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Attachment

Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:
>
> Hmm utf8 code of a-umlaut seems bigger than 'z' .
>
> Well how is the result of the following(original ?) query
>    under default encoding ?
>
>  select name from kunde where name >= 'ö' and name <= 'öz' order by
> name asc;
>
ERROR:  Unicode characters greater than or equal to 0x10000 are not
supported

:-)

Regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

> Hiroshi Inoue schrieb:
>
>>
>> Hmm utf8 code of a-umlaut seems bigger than 'z' .
>>
>> Well how is the result of the following(original ?) query
>>    under default encoding ?
>>
>>  select name from kunde where name >= 'ö' and name <= 'öz' order by
>> name asc;
>>
> ERROR:  Unicode characters greater than or equal to 0x10000 are not
> supported


With the default encoding(ISO-8859-1 ?) ?

regards,
Hiroshi Inoue

Re: psqlODBC-Driver Test / text fields

From
Hiroshi Inoue
Date:
Campbell, Greg wrote:

> When I look at a character map, for instance with Windows Start, Run,
> All Programs, Accessories, System Tools, Character Map and a extended
> character set like Arial.
> I see that all the accented characters (umlauts, graves, etc.) are
> higher that unaccented letters. I also can see the UTF8 equivalents.
>

Do you mean  the behabior reported by Johann is reasonable ?
I'm not familiar with LATIN encoding.

regards,
Hiroshi Inoue

Re: psqlODBC-Driver Test / text fields

From
"Campbell, Greg"
Date:
I think, yes.
When I look at characters and numeric representations of them, SORTS and comparison operators seem to be
behaving correctly.

Hiroshi Inoue wrote:

> Campbell, Greg wrote:
>
>> When I look at a character map, for instance with Windows Start, Run,
>> All Programs, Accessories, System Tools, Character Map and a extended
>> character set like Arial.
>> I see that all the accented characters (umlauts, graves, etc.) are
>> higher that unaccented letters. I also can see the UTF8 equivalents.
>>
>
> Do you mean  the behabior reported by Johann is reasonable ?
> I'm not familiar with LATIN encoding.
>
> regards,
> Hiroshi Inoue
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Attachment

Re: psqlODBC-Driver Test / text fields

From
Marc Herbert
Date:
"Campbell, Greg" <greg.campbell@us.michelin.com> writes:

> When I look at a character map, for instance with Windows Start, Run,
> All Programs, Accessories, System Tools, Character Map and a extended
> character set like Arial.
> I see that all the accented characters (umlauts, graves, etc.) are
> higher that unaccented letters. I also can see the UTF8 equivalents.
>

Not 100% sure I understood this issue, but it seems you are confusing
encoding and collation.

See strcmp() and strcoll().



Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:
> Johann Zuschlag wrote:
>
>> Hiroshi Inoue schrieb:
>>
>>>
>>> Hmm utf8 code of a-umlaut seems bigger than 'z' .
>>>
>>> Well how is the result of the following(original ?) query
>>>    under default encoding ?
>>>
>>>  select name from kunde where name >= 'ö' and name <= 'öz' order by
>>> name asc;
>>>
>> ERROR:  Unicode characters greater than or equal to 0x10000 are not
>> supported
>
>
> With the default encoding(ISO-8859-1 ?) ?
>
Oh, I misunderstood you. Do you mean the database default encoding?
IIRC ISO-8859-1 would be equivalent to LATIN1?
I generated a new database with the same data in LATIN1:

test-latin1=# select name from kunde order by name asc;
 name
--------------
 Hühne
 Müller
 Täst
 test
 test-2
 öä-test
 öäüÖÄÜß-test
(7 Zeilen)

test-latin1=# select name from kunde where name >= 'ö' and name <= 'öz'
order by name asc;
 name
------
(0 Zeilen)

And please note the following:

test-latin1=# select name from kunde where name >= 'ö' and name <= 'ö'
order by name asc;
 name
------
(0 Zeilen)

Hmm...

Regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Ludek Finstrle
Date:
> test-latin1=# select name from kunde order by name asc;
> name
> --------------
> Hühne
> Müller
> Täst
> test
> test-2
> öä-test
> öäüÖÄÜß-test
> (7 Zeilen)
>
> test-latin1=# select name from kunde where name >= 'ö' and name <= 'öz'
> order by name asc;
> name
> ------
> (0 Zeilen)
>
> And please note the following:
>
> test-latin1=# select name from kunde where name >= 'ö' and name <= 'ö'
> order by name asc;
> name
> ------
> (0 Zeilen)
>
> Hmm...

The last query is equivalent with:
select name from kunde where name = 'ö'

So I'm not surprised with the result ;-)

What is the locale for backend? Isn't this the real problem?

When you have the problem in psql client feel free to ask
in another pgsql-* mailing list. It isn't odbc related bug.
Maybe we solve your problem but maybe you get the right advice
faster in another list.

Regards,

Luf

Re: psqlODBC-Driver Test / text fields

From
"Dave Page"
Date:

> -----Original Message-----
> From: Ludek Finstrle [mailto:luf@pzkagis.cz]
> Sent: 29 March 2006 15:35
> To: Johann Zuschlag
> Cc: Hiroshi Inoue; Dave Page; pgsql-odbc@postgresql.org
> Subject: Re: [ODBC] psqlODBC-Driver Test / text fields
>
> When you have the problem in psql client feel free to ask
> in another pgsql-* mailing list. It isn't odbc related bug.
> Maybe we solve your problem but maybe you get the right advice
> faster in another list.

Hi Luf,

Johann is trying out some queries in psql to help Hiroshi Inoue track
down a psqlODBC problem (the bug that caused us to release both ANSI and
Unicode builds of the driver).

Regards, Dave

Re: psqlODBC-Driver Test / text fields

From
Ludek Finstrle
Date:
> > When you have the problem in psql client feel free to ask
> > in another pgsql-* mailing list. It isn't odbc related bug.
> > Maybe we solve your problem but maybe you get the right advice
> > faster in another list.
>
> Johann is trying out some queries in psql to help Hiroshi Inoue track
> down a psqlODBC problem (the bug that caused us to release both ANSI and
> Unicode builds of the driver).

Hello,

  maybe I remember it wrong but I remember that there is same problem
in psql client.

Regards,

Luf

Re: psqlODBC-Driver Test / text fields

From
"Dave Page"
Date:

> -----Original Message-----
> From: Ludek Finstrle [mailto:luf@pzkagis.cz]
> Sent: 29 March 2006 15:40
> To: Dave Page
> Cc: Ludek Finstrle; Johann Zuschlag; Hiroshi Inoue;
> pgsql-odbc@postgresql.org
> Subject: Re: [ODBC] psqlODBC-Driver Test / text fields
>
> > > When you have the problem in psql client feel free to ask
> > > in another pgsql-* mailing list. It isn't odbc related bug.
> > > Maybe we solve your problem but maybe you get the right advice
> > > faster in another list.
> >
> > Johann is trying out some queries in psql to help Hiroshi
> Inoue track
> > down a psqlODBC problem (the bug that caused us to release
> both ANSI and
> > Unicode builds of the driver).
>
> Hello,
>
>   maybe I remember it wrong but I remember that there is same problem
> in psql client.

Haven't heard of it. I would think that's the sort of thing that would
get fixed pretty quickly to be honest.

Regards, Dave

Re: psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Ludek Finstrle schrieb:
>> The last query is equivalent with:
>> select name from kunde where name = 'ö'
>>
>> So I'm not surprised with the result ;-)
>>
>> What is the locale for backend? Isn't this the real problem?
>>
Hi Luf,

sure, that is the same, but the result is 0 lines for both. :-)

test-latin1=# select name from kunde where name = 'ö';
 name
------
(0 Zeilen)

the psql driver is definitely better than ever. In my (humble) opinion
we don't need two drivers anymore. Thanks a lot for your work. So far
all my tests with UNICODE, LATIN1 and SQL-ASCII databases seem to work
with the new 7.02.260 driver on Win XP, (i.e. insert, update, select)

My set-up is:

PostgreSQL 8.0X on Debian Sarge
Unicode or LATIN1 or SQL-ASCII Database

locale=de_DE

de_DE.ISO-8859-1 (default)
de_DE.UTF-8
de_DE.UTF-8@euro
de_DE.ISO-8859-15@euro

changing to de_DE didn't change anything, client_encoding didn't help,
since the problem doesn't seem to be related to the driver.

Hiroshi tracked down the problem by doing the same query with the
psql.-client. So you may be right that this an old problem.

regards,
Johann


Re: psqlODBC-Driver Test / text fields

From
Ludek Finstrle
Date:
> > > > When you have the problem in psql client feel free to ask
> > > > in another pgsql-* mailing list. It isn't odbc related bug.
> > > > Maybe we solve your problem but maybe you get the right advice
> > > > faster in another list.
> > >
> > > Johann is trying out some queries in psql to help Hiroshi
> > Inoue track
> > > down a psqlODBC problem (the bug that caused us to release
> > both ANSI and
> > > Unicode builds of the driver).
> >
> > Hello,
> >
> >   maybe I remember it wrong but I remember that there is same problem
> > in psql client.
>
> Haven't heard of it. I would think that's the sort of thing that would
> get fixed pretty quickly to be honest.

Let's try read and it's ancestor:
http://archives.postgresql.org/pgsql-odbc/2006-03/msg00188.php

Regards,

Luf

Re: psqlODBC-Driver Test / text fields

From
"Dave Page"
Date:

> -----Original Message-----
> From: Ludek Finstrle [mailto:luf@pzkagis.cz]
> Sent: 29 March 2006 16:23
> To: Dave Page
> Cc: Ludek Finstrle; Johann Zuschlag; Hiroshi Inoue;
> pgsql-odbc@postgresql.org
> Subject: Re: [ODBC] psqlODBC-Driver Test / text fields
>
> Let's try read and it's ancestor:
> http://archives.postgresql.org/pgsql-odbc/2006-03/msg00188.php

I'm not sure I understand that that test is actually valid anyway. Consider the test query:

select name from kunde where name >= 'ö';

If 'ö' is 'ö', then isn't the query above mixing single and a multibyte encoding? Ie. It should all be single byte -
e.g.

select name from kunde where name >= 'ö' order by name asc;

Or all multibyte (displayed byte by byte) whatever that results in:

s*e*l*e*c*t* *n*a*m*e* *f*r*o*m* *k*u*n*d*e* *w*h*e*r*e* *n*a*m*e* *>*=* *'*ö'*;*

Of course, we all know how well I grok encoding issues :-)

Regards, Dave.

Re: psqlODBC-Driver Test / text fields

From
Ludek Finstrle
Date:
> >>The last query is equivalent with:
> >>select name from kunde where name = 'ö'
> >>
> >>So I'm not surprised with the result ;-)
> >>
> >>What is the locale for backend? Isn't this the real problem?
>
> sure, that is the same, but the result is 0 lines for both. :-)

It's ok result. I see no 'ö' in data you have posted. I see only
'ösomething'.

> we don't need two drivers anymore. Thanks a lot for your work. So far

It's not my work. I only fixed 08.01 branch since 08.01.0102.
The enahnced branch is Hiroshi (Inoue and Saito) work.

> PostgreSQL 8.0X on Debian Sarge
> Unicode or LATIN1 or SQL-ASCII Database

You tried this all?

> locale=de_DE

I think it could be the problem. You have unicode data but you try
sort it in de_DE.ISO-8859-1.
Could you try change the locale for _backend_ process to de_DE.UTF-8 or
de_DE.UTF-8@euro? (Maybe also in postmaster.conf configuration file
of postgresql). I'm not sure if it isn't even initdb time related.

> de_DE.ISO-8859-1 (default)
> de_DE.UTF-8
> de_DE.UTF-8@euro
> de_DE.ISO-8859-15@euro
>
> changing to de_DE didn't change anything, client_encoding didn't help,
> since the problem doesn't seem to be related to the driver.

I don't speak about psql (client or driver) locale. I speak about
backend locale (and settings in postmaster.conf).
The sort operation is performed in backend process.

I suggest to try ask for this (with psql client problem) in pgsql-general.

Regards,

Luf

Re: psqlODBC-Driver Test / text fields

From
Ludek Finstrle
Date:
> I'm not sure I understand that that test is actually valid anyway.

...

> Of course, we all know how well I grok encoding issues :-)

I know the encoding issue (UTF8, UNICODE, ...) at same level as you.
My good knowledge ending at single byte encoding.

I see no progress for some time so I hope someone in pgsql-general
(maybe peter_e) could help when the (or similar) problem is in psql
client too.

Regards,

Luf

psqlODBC-Driver Test, ok and closed

From
Johann Zuschlag
Date:
Hi Luv,<br /> Hi Hiroshi,<br /> Hi Dave,<br /><br /> I'm sorry, but I made a mistake during my last tests.<br
/><blockquotecite="mid20060329153519.GL18148@soptik.pzkagis.cz" type="cite"><pre wrap="">It's not my work. I only fixed
08.01branch since 08.01.0102.
 
The enahnced branch is Hiroshi (Inoue and Saito) work.
 </pre></blockquote> Yeah, I know. But I hope your work for 8.01.0102 can be included in the future driver<br
/><blockquotecite="mid20060329153519.GL18148@soptik.pzkagis.cz" type="cite"><pre wrap=""></pre><blockquote
type="cite"><prewrap="">PostgreSQL 8.0X on Debian Sarge
 
Unicode or LATIN1 or SQL-ASCII Database   </pre></blockquote><pre wrap="">
You tried this all?
 </pre></blockquote> Yes, and while doing so many initdb's I made a mistake with my locales for the last tests.<br />
Thanksfor pointing me in the right direction.<br /><br /> So my first test results where indeed correct. I'm sorry for
keepingyou all so busy. But I wanted to supply you with complete test results. So finally: concerning the ANSI/UTF8
problemthe driver seems to work (at least for me :-) ). WinME test will follow later.<br /><br /> Only some minor
issuesare left:<br /><br /> - sometimes my application gives an (memory) error when leaving. (Will try to trace
that)<br/> - continuously I get 'unknown configuration parameter "max_identifier_length"<br /> - Typ "lo" is not
existing(I don't think my app is doing that)<br /><br /> I never got the last two with the former drivers. But I'll
checkthe mylog.<br /><br /> For further tests I switched back my Debian test system to PostgreSQL 7.4.X since my
productionserver is still running that version.<br /><br /> Regards,<br /> Johann<br /><br /> 

Re: psqlODBC-Driver Test, ok and closed

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

>Hi Luv,
>Hi Hiroshi,
>Hi Dave,
>
>I'm sorry, but I made a mistake during my last tests.
>
>>It's not my work. I only fixed 08.01 branch since 08.01.0102.
>>The enahnced branch is Hiroshi (Inoue and Saito) work.
>>
>>
>>
>Yeah, I know. But I hope your work for 8.01.0102 can be included in the future
>driver
>
>>>PostgreSQL 8.0X on Debian Sarge
>>>Unicode or LATIN1 or SQL-ASCII Database
>>>
>>>
>>
>>You tried this all?
>>
>>
>>
>Yes, and while doing so many initdb's I made a mistake with my locales for the
>last tests.
>Thanks for pointing me in the right direction.
>
>So my first test results where indeed correct. I'm sorry for keeping you all so
>busy. But I wanted to supply you with complete test results. So finally:
>concerning the ANSI/UTF8 problem the driver seems to work (at least for me :-)
>). WinME test will follow later.
>
>

Thanks for your testing.
What we've had so far is *locale* support which I've never used and
we've had
 no *collation* support unfortunaltely.

>Only some minor issues are left:
>- sometimes my application gives an (memory) error when leaving. (Will try to
>trace that)
>
>

I think this is fixed in the dll at
    http://www.geocities.jp/inocchichichi/psqlodbc/index.html .

>- continuously I get 'unknown configuration parameter "max_identifier_length"
>
>

7.4 doesn't seems to have the max_identifier_length parameter.
This may be fixed in the abov dll also..

>- Typ "lo" is not existing (I don't think my app is doing that)
>
>

This isn't an error at all, just checking the existence of lo type when
connecting to the server.

regards,
Hiroshi Inoue

Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields

From
Johann Zuschlag
Date:
Dave Page schrieb:
> If 'ö' is 'ö', then isn't the query above mixing single and a multibyte encoding? Ie. It should all be single byte -
e.g.
>
> select name from kunde where name >= 'ö' order by name asc;
>
> Or all multibyte (displayed byte by byte) whatever that results in:
>
> s*e*l*e*c*t* *n*a*m*e* *f*r*o*m* *k*u*n*d*e* *w*h*e*r*e* *n*a*m*e* *>*=* *'*ö'*;*
>
> Of course, we all know how well I grok encoding issues :-)
>
Hi Dave,

I can understand you. This encoding issues drive me also crazy some
times. :-)

The problem with UTF-8 is that all ASCII characters are represented by
one byte and all non ASCII characters, e.g. German Umlauts, are
represented by two bytes. That's why UTF-8 is called a "variable-length
multibyte encoding". In a pure Unicode world, e.g. U+xxxx with two
bytes, every character is represented by two bytes (fixed-length
multibyte encoding). So Unicode is not equal to UTF-8, even though the
PostgreSQL documentation is stating that.

If you like, see: http://www.utf8-chartable.de/ or some explanation at
http://czyborra.com/utf/

Windows XP supports ANSI, UTF-8, Unicode and Unicode Big Endian.
Unfortunately (or fortunately?) Windows seems to use UTF-8 for European
languages. Hiroshi can you explain that? I guess the Japanese edition of
Windows XP is using pure 2 byte Unicode.

I can't say anything about psql. But the new  psqlodbc driver 7.03.26X
seems to handle that situation very well.

So I suppose the test was valid to a certain extend, since the
characters are handled in this mixed way in Win XP. I still have some
funny behaviour with Unicode in psql (even after setting LC_COLLATE
correctly :-) ).

For my production machines I will anyway use ISO-8859-1 (or
ISO-8859-15). Then the driver will convert all characters to single byte
avoiding all kind of problems.

But feel free to ask me for tests... ;-)

Regards,
Johann


Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text fields

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

> Dave Page schrieb:
>
>> If 'ö' is 'ö', then isn't the query above mixing single and a
>> multibyte encoding? Ie. It should all be single byte - e.g.
>>
>> select name from kunde where name >= 'ö' order by name asc;
>>
>> Or all multibyte (displayed byte by byte) whatever that results in:
>>
>> s*e*l*e*c*t* *n*a*m*e* *f*r*o*m* *k*u*n*d*e* *w*h*e*r*e* *n*a*m*e*
>> *>*=* *'*ö'*;*
>>
>> Of course, we all know how well I grok encoding issues :-)
>>
>
> Hi Dave,
>
> I can understand you. This encoding issues drive me also crazy some
> times. :-)
>
> The problem with UTF-8 is that all ASCII characters are represented by
> one byte and all non ASCII characters, e.g. German Umlauts, are
> represented by two bytes. That's why UTF-8 is called a
> "variable-length multibyte encoding". In a pure Unicode world, e.g.
> U+xxxx with two bytes, every character is represented by two bytes
> (fixed-length multibyte encoding). So Unicode is not equal to UTF-8,
> even though the PostgreSQL documentation is stating that.
>
> If you like, see: http://www.utf8-chartable.de/ or some explanation at
> http://czyborra.com/utf/
>
> Windows XP supports ANSI, UTF-8, Unicode and Unicode Big Endian.
> Unfortunately (or fortunately?) Windows seems to use UTF-8 for
> European languages. Hiroshi can you explain that? I guess the Japanese
> edition of Windows XP is using pure 2 byte Unicode.


Unicode ODBC drivers handle UCS-2 not UTF-8 even in European environemt.
Unfortunately PostgreSQL doesn't handle UCS-2
directly(because it could contain NULL bytes in the string), the unicode
driver sets the client_encoding to UTF-8 automatically and
converts from UCS-2 data to UTF-8 data which  the PostgreSQL  backend
can understands when sending  queries. So what you
can see in the backend log  is UTF-8. Then the backend converts from
UTF-8 data to the server encoding data. After all,  the locale
(especially LC_COLLATE) setting you need is the one which matches the
backend encoding.

>
> I can't say anything about psql. But the new  psqlodbc driver 7.03.26X
> seems to handle that situation very well.
>
> So I suppose the test was valid to a certain extend,


Yes thanks. I can't test the LATINxx encoding by myself.

regards,
Hiroshi Inoue

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Bart Samwel
Date:
Johann Zuschlag wrote:
> The problem with UTF-8 is that all ASCII characters are represented by
> one byte and all non ASCII characters, e.g. German Umlauts, are
> represented by two bytes. That's why UTF-8 is called a "variable-length
> multibyte encoding". In a pure Unicode world, e.g. U+xxxx with two
> bytes, every character is represented by two bytes (fixed-length
> multibyte encoding). So Unicode is not equal to UTF-8, even though the
> PostgreSQL documentation is stating that.

Well, it's actually even more complicated, because Unicode is actually a
32-bit character set. There is actually UTF8 (variable-length multibyte,
8 bits per unit), UTF16 (variable-length multibyte) and UTF32
(fixed-length multibyte). There is also UCS2 (fixed-length 16-bit),
which is limited to the 16 bits of the Basic Multilingual Plane, and
UCS4, which is functionally identical to UTF32. UTF-8 actually supports
up to 4 bytes per character, so it is more complete than the purely
16-bit UCS-2. Any of the variable-length encodings, and the 32-bit
UTF-32 and UCS-4 encodings can represent the whole of the character set.
A pure Unicode world can use any of those encodings, so it's a tradeoff.
If you want a direct relationship between the number of characters in a
string and the number of bytes taken, use a fixed-length encoding. If
you want to be able to encode everything, use a variable-length encoding
or a 32-bit encoding. If you want to use little space, use an 8-bit
encoding. That's it.

 > Windows XP supports ANSI, UTF-8, Unicode and Unicode Big Endian.
 > Unfortunately (or fortunately?) Windows seems to use UTF-8 for European
 > languages. Hiroshi can you explain that? I guess the Japanese edition of
 > Windows XP is using pure 2 byte Unicode.

In fact, the Win32 API is UTF-16 even in European languages(started out
as UCS-2 but became UTF-16 when Unicode went 32-bit :-) ), but it
provides an 8-bit compatibility interface. Don't know if te 8-bit
encoding is UTF-8 or plain 8-bit code pages though.

Reference: http://en.wikipedia.org/wiki/Unicode

Cheers,
Bart

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:
>
> Unicode ODBC drivers handle UCS-2 not UTF-8 even in European
> environemt. Unfortunately PostgreSQL doesn't handle UCS-2
> directly(because it could contain NULL bytes in the string), the
> unicode driver sets the client_encoding to UTF-8 automatically and
> converts from UCS-2 data to UTF-8 data which  the PostgreSQL  backend
> can understands when sending  queries. So what you
> can see in the backend log  is UTF-8. Then the backend converts from
> UTF-8 data to the server encoding data. After all,  the locale
> (especially LC_COLLATE) setting you need is the one which matches the
> backend encoding.
>
Hmm..., so Windows  XP uses UCS-2 or do be more correct (like Bart
mentioned) UTF-16 (which is nearly the same, except for the surrogates).
That is converted to UTF-8, sent to the backend and then converted to
the proper locale and stored. I've read about the problems with the NULL
bytes on Unix machines.

Let's have two examples:
1.
backend-1 = ISO8859-1
backend-2 = UTF-8

'A' = U+0041 (does windows use big-endian?)

Win UCS-2: U+0041
ODBC UTF-8: U+41
backend-1 stores = 0x41
backend-2 stores = U+41

2.
'Ä' = U+00C4 (german A-Umlaut)

Win UCS-2: U+00C4
ODBC UTF-8: U+C384
backend-1 stores = 0xC4
backend-2 stores = U+C384

Did I get that right? So I have to be really careful when testing.

Regards,
Johann


Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Johann Zuschlag
Date:
Johann Zuschlag schrieb:
> Let's have two examples:
> 1.
> backend-1 = ISO8859-1
> backend-2 = UTF-8
>
> 'A' = U+0041 (does windows use big-endian?)
>
> Win UCS-2: U+0041
> ODBC UTF-8: U+41
> backend-1 stores = 0x41
> backend-2 stores = U+41
>
> 2.
> 'Ä' = U+00C4 (german A-Umlaut)
>
> Win UCS-2: U+00C4
> ODBC UTF-8: U+C384
> backend-1 stores = 0xC4
> backend-2 stores = U+C384
>
> Did I get that right? So I have to be really careful when testing.
>
No, again wrong. Or is it more like this:

1.
a) locale = ISO8859-1
    backend-1 = LATIN1

b) locale = UTF-8
    backend-2 = Unicode

'A' = U+0041 (does windows use big-endian?)

Win UCS-2: U+0041
ODBC UTF-8: U+41
backend-1 stores = U+41
backend-2 stores = U+0041

2.
'Ä' = U+00C4 (german A-Umlaut)

Win UCS-2: U+00C4
ODBC UTF-8: U+C384
backend-1 stores = 0xC4
backend-2 stores = U+00C4


Did I get that right?

Regards,
Johann


Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Marc Herbert
Date:
Johann Zuschlag <zuschlag2@online.de> writes:

> 'A' = U+0041 (does windows use big-endian?)

Argh, please do not make it even more complex than it needs to be!

Endianness is by chance an _independent_ issue. You just care about it
at the low-low level when dealing with files or network sockets, but
then it's over and you never want to hear about it anymore at a higher
level.

So U+0041 is an integer whose value is:

    zero thousand zero hundred forty one

and this is always true, whatever is the byte ordering used by the
processor. You don't need to know more than this, even when converting
to UTF-8 or anything else.



Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Marc Herbert
Date:
Johann Zuschlag <zuschlag2@online.de> writes:

> Hmm..., so Windows  XP uses UCS-2 or do be more correct (like Bart
> mentioned) UTF-16 (which is nearly the same, except for the
> surrogates).

It's nearly the same... but that makes a huge difference.

The reason why you use fixed-character length encoding in memory is
speed. This saves you a lot of time when computing string lengths,
look for some characters (isalnum(),...), collating etc.

If don't care about all this speed then you better stay in a
variable-length encoding like UTF-8 which saves you A LOT of space,
especially with small occidental alphabets.

I think that by moving from UCS-2 to UTF-16 you lose on BOTH sides
[insert some missing benchmarks here]

And you can be sure that it brings a lot of bugs: one bug every
time some string code has been "forgotten" and not updated, still
assuming UCS-2.

Anyway those bugs are only for far-away and unknown countries out of
the BMP so who cares? :-/

So it really looks like a poor compatibility hack to me (java does it
too).




Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Marc Herbert
Date:
Johann Zuschlag <zuschlag2@online.de> writes:

> I've read about the problems with the NULL bytes on Unix machines.

This problem is not related to Unix at all but to the programming
language used. Most standard C functions use the zero byte convention
as a string terminator, so it becomes a forbidden character in C.

On the other hand String objects in C++ and Java use a separate length
field, and having NULLs inside a string is a no brainer there.

The ODBC API has been designed for C and Cobol. Cobol does not forbid
zero as a character either. When browsing the ODBC spec you'll notice
it carefully caters for the two ways.


Guess which programming language is used PostgreSQL.


I suspect unicode does not care at all about this. After all unicode
is just about characters not about strings.



Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Bart Samwel
Date:
Marc Herbert wrote:
> Johann Zuschlag <zuschlag2@online.de> writes:
>
>> I've read about the problems with the NULL bytes on Unix machines.
>
> This problem is not related to Unix at all but to the programming
> language used. Most standard C functions use the zero byte convention
> as a string terminator, so it becomes a forbidden character in C.
>
> On the other hand String objects in C++ and Java use a separate length
> field, and having NULLs inside a string is a no brainer there.
>
> The ODBC API has been designed for C and Cobol. Cobol does not forbid
> zero as a character either. When browsing the ODBC spec you'll notice
> it carefully caters for the two ways.
>
>
> Guess which programming language is used PostgreSQL.

C++ even introduced a special alternative character type "wchar_t" for
this, just so that people could handle both 8-bit char* and 16-bit
wchar_t* strings. In wchar_t* strings, 8-bit NULs are not a problem
because only 16-bit NULs count (and AFAIK the Unicode standard does
allows this to be interpreted as a NUL aka end-of-string). The downside
of this solution is that no application actually uses it, and everybody
is stuck with 8-bit ASCII plus a random local codepage unless special
support is added. Why didn't they just upgrade chars to 32 bits and be
done with it... :-/

Cheers,
Bart

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:
> Johann Zuschlag schrieb:
>
>>
> No, again wrong. Or is it more like this:
>
> 1.
> a) locale = ISO8859-1
>    backend-1 = LATIN1
>
> b) locale = UTF-8
>    backend-2 = Unicode

What do you mean by the Unicode and are you really setting b)
as above ?

First note that in PostgreSQL the encoding has nothing to do
with the locale setting. Though PostgreSQL manages the encoding
settings by itself, as for the locale setting it completely relies
on the OS environment. There exists an essential flaw from the first.
Anyway you can change the encoding as you like per database at
createdb time but the locale setting LC_COLLATE and LC_CTYPE are
fixed at initdb time.

regards,
Hiroshi Inoue

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Marc Herbert
Date:
Bart Samwel <bart@samwel.tk> writes:
>
> C++ even introduced a special alternative character type "wchar_t" for
> this, just so that people could handle both 8-bit char* and 16-bit
> wchar_t* strings. In wchar_t* strings, 8-bit NULs are not a problem
> because only 16-bit NULs count (and AFAIK the Unicode standard does
> allows this to be interpreted as a NUL aka end-of-string). The
> downside of this solution is that no application actually uses it, and
> everybody is stuck with 8-bit ASCII plus a random local codepage
> unless special support is added.

wchar_t is not defined as 16-bits, but as "wide enough to hold any
character of the platform". For instance if the platform uses UCS-4,
then wchar_t is 32 bits wide.

(UTF-16 wchar_t violates this)


I don't clearly see how you want to use a 8-bit NULL to terminate a
(wider) wchar_t array... ?


> Why didn't they just upgrade chars to 32 bits and be done with
> it... :-/

Because "char" was and is still used to store multibyte /
variable-length / encoded characters.


Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Bart Samwel
Date:
Marc Herbert wrote:
> Bart Samwel <bart@samwel.tk> writes:
> wchar_t is not defined as 16-bits, but as "wide enough to hold any
> character of the platform". For instance if the platform uses UCS-4,
> then wchar_t is 32 bits wide.
>
> (UTF-16 wchar_t violates this)

Ahhh, this explains a lot. The same assumption used to be true for char
until they came up with UTF-8 char. And they couldn't just upgrade char
because too much code assumed that char was one byte. Then platforms
started to use UCS-2 wchar_t, then upgraded those to UTF-16 because they
couldn't just upgrade wchar_t because too much code assumed that wchar_t
was two bytes. Same pattern. Time to introduce wwchar_t_t. :-)

> I don't clearly see how you want to use a 8-bit NULL to terminate a
> (wider) wchar_t array... ?

This was a backreference to a situation mentioned earlier in the
discussion, where wchar_t buffers couldn't be "tunneled through" a layer
that used char*, as the wider wchar_t characters may contain NUL bytes.

Cheers,
Bart

Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:
> Johann Zuschlag wrote:
>> Johann Zuschlag schrieb:
>>
>>>
>> No, again wrong. Or is it more like this:
>>
>> 1.
>> a) locale = ISO8859-1
>>    backend-1 = LATIN1
>>
>> b) locale = UTF-8
>>    backend-2 = Unicode
>
> What do you mean by the Unicode and are you really setting b)
> as above ?
>
Oh, typo! On my system (Debian Sarge) it is in fact:

b) locale = de_DE.UTF-8
   backend-2 = Unicode

Regards,
Johann


Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:

> Hiroshi Inoue schrieb:
>
>> Johann Zuschlag wrote:
>>
>>> Johann Zuschlag schrieb:
>>>
>>>>
>>> No, again wrong. Or is it more like this:
>>>
>>> 1.
>>> a) locale = ISO8859-1
>>>    backend-1 = LATIN1
>>>
>>> b) locale = UTF-8
>>>    backend-2 = Unicode
>>
>>
>> What do you mean by the Unicode and are you really setting b)
>> as above ?
>>
> Oh, typo! On my system (Debian Sarge) it is in fact:
>
> b) locale = de_DE.UTF-8
>   backend-2 = Unicode


What's the result of
    show server_encoding
?

regards,
Hiroshi Inoue


Re: Unicode is not UTF-8

From
Johann Zuschlag
Date:
Hiroshi Inoue schrieb:
> b) locale = de_DE.UTF-8
>>   backend-2 = Unicode
>
>
> What's the result of
>    show server_encoding
Hi Hiroshi,

when I use an unicode database it is UNICODE,
for LATIN1 it's of course LATIN1.
(and for LATIN9 it's LATIN9)

Thank you for your explanations. Now I'm understanding what is happening
with a windows client on one end and PostgreSQL on the other end.

I am just experimenting with the three above settings (initdb with
ISO8859-15, locale ISO8859-15 only). So far it seems to work pretty
well. No problems with an application using psqlodbc driver. All German
characters including euro (LATIN9, UNICODE) seem to work. Sorting is ok.
pgAdminIII is producing the same output. psql works as well, except psql
in conjunction with unicode behaves still a little bit strange. Maybe my
locale setting is not appropriate, but de_DE.UTF-8 doesn't change it.
Will check my console later.

Unfortunately I couldn't test 7.03.262 (psqlodbc35W.dll) since the zip
file has some kind of error.

Regards,
Johann


Re: Unicode is not UTF-8

From
Hiroshi Inoue
Date:
Johann Zuschlag wrote:
> Hiroshi Inoue schrieb:
>
>> b) locale = de_DE.UTF-8
>>
>>>   backend-2 = Unicode
>>
>>
>>
>> What's the result of
>>    show server_encoding
>
> Hi Hiroshi,

Hi Johann,

> Unfortunately I couldn't test 7.03.262 (psqlodbc35W.dll) since the zip
> file has some kind of error.

Sorry. Fixed.

regards,
Hiroshi Inoue