Thread: Troubles using German Umlauts with JDBC

Troubles using German Umlauts with JDBC

From
"Alexander Troppmann"
Date:
Hi,

  we have trouble with German umlauts (e.g. äüÖ) using the PostgreSQL JDBC
  driver from the 7.1.2 distribution. We already tried to debug our Java
  software, but it seems that the database driver modifies the umlauts in
  some way: debug output before any INSERT or after a SELECT query shows that
  the umlaut "ü", for example, gets lost on the way through the JDBC driver.

  So e.g. the attribute city='München' becomes "M\?nchen" when testing the JDBC
  driver using a simple Java program.

  Any idea what happens?

  Best regards,
  Alex T.





Re: Troubles using German Umlauts with JDBC

From
Dave Cramer
Date:
Alexander,

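You have to set the encoding when you make the connection:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", user);
props.setProperty("password", password);
props.setProperty("charSet", encoding);  // overrides the encoding reported by the server
Connection con = DriverManager.getConnection(url, props);

where encoding is the proper encoding for your database.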

Dave
On Tue, 2001-09-04 at 09:16, Alexander Troppmann wrote:
> Hi,
>
>   we have trouble with German umlauts (e.g. äüÖ) using the PostgreSQL JDBC
>   driver from the 7.1.2 distribution. We already tried to debug our Java
>   software, but it seems that the database driver modifies the umlauts in
>   some way: debug output before any INSERT or after a SELECT query shows that
>   the umlaut "ü", for example, gets lost on the way through the JDBC driver.
>
>   So e.g. the attribute city='München' becomes "M\?nchen" when testing the JDBC
>   driver using a simple Java program.
>
>   Any idea what happens?




Re: Troubles using German Umlauts with JDBC

From
Rene Pijlman
Date:
[forwarding to pgsql-hackers and Bruce as Todo list maintainer,
see comment below]

[insert with JDBC converts Latin-1 umlaut to ?]
On 04 Sep 2001 09:54:27 -0400, Dave Cramer wrote:
>You have to set the encoding when you make the connection.
>
>Properties props = new Properties();
>props.put("user",user);
>props.put("password",password);
>props.put("charSet",encoding);
>Connection con = DriverManager.getConnection(url,props);
>where encoding is the proper encoding for your database

For completeness, I quote the answer Barry Lind gave yesterday.

"[the driver] asks the server what character set is being used
for the database.  Unfortunately the server only knows about
character sets if multibyte support is compiled in. If the
server is compiled without multibyte, then it always reports to
the client that the character set is SQL_ASCII (where SQL_ASCII
is 7-bit ASCII).  Thus if you don't have multibyte enabled on the
server you can't support 8-bit characters through the JDBC
driver, unless you specifically tell the connection what
character set to use (i.e. override the default obtained from
the server)."

This really is confusing and I think PostgreSQL should be able
to support single byte encoding conversions without enabling
multi-byte.

At the very least there should be an --enable-encoding-conversion
or something similar, even if it just enables the current
multibyte support.

Bruce, can this be put on the TODO list one way or the other?
This problem has appeared 4 times in two months or so on the
JDBC list.

Regards,
René Pijlman <rene@lab.applinet.nl>
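The symptom from the start of the thread can be reproduced without a database. The following is only a minimal sketch, assuming the driver encodes strings with whatever character set the server reported; the class UmlautRoundTrip and its helper are made up for illustration and are not part of the JDBC driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UmlautRoundTrip {
    // Encode a string to bytes and decode it again with the same charset,
    // roughly what happens to a value on its way to the server and back.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "München", written with an escape so the source file encoding does not matter.
        String city = "M\u00fcnchen";

        // 7-bit ASCII cannot represent U+00FC, so Java substitutes '?': prints M?nchen
        System.out.println(roundTrip(city, StandardCharsets.US_ASCII));

        // Latin-1 has a byte for the umlaut, so the value survives: prints true
        System.out.println(roundTrip(city, StandardCharsets.ISO_8859_1).equals(city));
    }
}
```

This matches the reported "M?nchen" result: once the connection is treated as 7-bit ASCII, the umlaut is already lost on the Java side.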

Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Barry Lind
Date:
Rene,

I would like to add one additional comment.  In current sources the jdbc
driver detects (through a hack) that the server doesn't have multibyte
enabled and then ignores the SQL_ASCII return value and defaults to the
JVM's character set instead of using SQL_ASCII.

The problem boils down to the fact that without multibyte enabled, the
server has no way of specifying which 8-bit character set is being
used for a particular database.  Thus a client like JDBC doesn't know
what character set to use when converting to UNICODE.  Thus the best we
can do in JDBC is use our best guess (JVM character set is probably the
best default), and allow the user to explicitly specify something else
if necessary.

thanks,
--Barry
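The fallback order described in this message could be sketched roughly as follows. This is not the driver's actual code: the class EncodingChooser, its method and the small name mapping are hypothetical, and the sketch simply treats any SQL_ASCII report as "unknown", whereas the message says the driver detects the missing multibyte support through a hack.

```java
import java.nio.charset.Charset;
import java.util.Map;

final class EncodingChooser {
    // Illustrative mapping from server encoding names to Java charset names.
    // Deliberately incomplete; an assumption made for this sketch only.
    private static final Map<String, String> SERVER_TO_JAVA = Map.of(
            "LATIN1", "ISO-8859-1",
            "UNICODE", "UTF-8");

    // explicitCharSet: value of the charSet connection parameter, or null if unset.
    // serverEncoding: encoding name reported by the server, e.g. "SQL_ASCII".
    static Charset choose(String explicitCharSet, String serverEncoding) {
        // 1. An explicit charSet from the user always wins.
        if (explicitCharSet != null) {
            return Charset.forName(explicitCharSet);
        }
        // 2. Trust the server if it reports an encoding we can map.
        String javaName = SERVER_TO_JAVA.get(serverEncoding);
        if (javaName != null) {
            return Charset.forName(javaName);
        }
        // 3. SQL_ASCII (or anything unknown): best guess is the JVM default.
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(choose("ISO-8859-1", "SQL_ASCII")); // explicit override
        System.out.println(choose(null, "UNICODE"));           // reported by the server
        System.out.println(choose(null, "SQL_ASCII"));         // JVM default, platform dependent
    }
}
```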




Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Rene Pijlman
Date:
I've added a new section "Character encoding" to
http://lab.applinet.nl/postgresql-jdbc/, based on the
information from Dave and Barry.

I haven't seen a confirmation from pgsql-hackers or Bruce yet
that this issue will be added to the Todo list. I'm under the
impression that the backend developers don't see this as a
problem.

Regards,
René Pijlman

On Tue, 04 Sep 2001 10:40:36 -0700, Barry Lind wrote:
>I would like to add one additional comment.  In current sources the jdbc
>driver detects (through a hack) that the server doesn't have multibyte
>enabled and then ignores the SQL_ASCII return value and defaults to the
>JVM's character set instead of using SQL_ASCII.
>
>The problem boils down to the fact that without multibyte enabled, the
>server has no way of specifying which 8-bit character set is being
>used for a particular database.  Thus a client like JDBC doesn't know
>what character set to use when converting to UNICODE.  Thus the best we
>can do in JDBC is use our best guess (JVM character set is probably the
>best default), and allow the user to explicitly specify something else
>if necessary.
>
>thanks,
>--Barry

Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Bruce Momjian
Date:
I can add something if people agree there is an issue here.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Rene Pijlman
Date:
On Sun, 9 Sep 2001 10:24:32 -0400 (EDT), Bruce Momjian wrote:
>I can add something if people agree there is an issue here.

IMO the issue is twofold. Without multibyte compiled in:

1) the server cannot tell the client which single byte character
encoding is being used, so a client like JDBC cannot properly
convert to its native encoding

2) it's not possible to create a database with a single byte
encoding other than ASCII (see my posting
http://fts.postgresql.org/db/mw/msg.html?mid=1029462)

I'm not sure to what extent these issues are related.

Also, client/server character conversion is coupled to multibyte
support (see Peter's reply to my posting). This may be a
limitation for other clients, but I'm not sure about that.

Basically, it seems that multibyte support is adding features
that are needed in single byte environments as well. Perhaps the
problem can be solved by documentation (recommending to enable
multibyte support in non-ASCII singlebyte environments), perhaps
by an alias (--enable-character-encoding), perhaps the
functionality needs to be split into a true multibyte part and a
generic part. I don't know what's best, this probably depends on
the "price" of compiling in multibyte support.

Regards,
René Pijlman



Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Barry Lind
Date:
Rene,

Two comments on your writeup about the problem:

1) Depending on version you will see different behavior:
    7.0 - default client character set is used
    7.1 - database character set is used (although it may be reported
          incorrectly as SQL_ASCII)
    7.2 - database character set is used if multibyte, else use the
          client character set.

In all versions it is possible to set the character set explicitly via
the charSet parameter.


2) The charSet parameter (like any parameter the driver expects) can
also be set in the connection URL. For example,
jdbc:postgresql://localhost/dbname?charSet=UTF-8&user=foo&password=bar
passes the charSet, user and password in the URL.

thanks,
--Barry
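As a small illustration of point 2, the URL from the message can be assembled from its parts like this. It is only a sketch: JdbcUrlExample and withParams are hypothetical helpers, and the host, database name and credentials are the placeholder values from the message.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class JdbcUrlExample {
    // Append key=value pairs to a base JDBC URL as a query string.
    // Values are not URL-escaped here; real credentials may need escaping.
    static String withParams(String baseUrl, Map<String, String> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        params.forEach((key, value) -> query.add(key + "=" + value));
        return baseUrl + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("charSet", "UTF-8");
        params.put("user", "foo");
        params.put("password", "bar");

        String url = withParams("jdbc:postgresql://localhost/dbname", params);
        // prints jdbc:postgresql://localhost/dbname?charSet=UTF-8&user=foo&password=bar
        System.out.println(url);

        // With the PostgreSQL driver on the classpath, the connection would then be opened with:
        // java.sql.Connection con = java.sql.DriverManager.getConnection(url);
    }
}
```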





Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Barry Lind
Date:
Bruce,

I think the TODO item should be:

Ability to set character set for a database without multibyte enabled

Currently createdb -E (and the corresponding create database sql
command) only works if multibyte is enabled.  However it is useful to
know which single byte character set is being used even when multibyte
isn't enabled.  Currently there is no way to specify which single byte
character set a database is using (unless you compile with multibyte).

thanks,
--Barry





Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Bruce Momjian
Date:

Added to TODO.




Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Bruce Momjian
Date:
Is this a jdbc issue or a general backend issue?


> Bruce,
>
> I think the TODO item should be:
>
> Ability to set character set for a database without multibyte enabled
>
> Currently createdb -E (and the corresponding create database sql
> command) only works if multibyte is enabled.  However it is useful to
> know which single byte character set is being used even when multibyte
> isn't enabled.  Currently there is no way to specify which single byte
> character set a database is using (unless you compile with multibyte).
>
> thanks,
> --Barry
>
>
> Bruce Momjian wrote:
> > I can add something if people agree there is an issue here.
> >
> >
> >>I've added a new section "Character encoding" to
> >>http://lab.applinet.nl/postgresql-jdbc/, based on the
> >>information from Dave and Barry.
> >>
> >>I haven't seen a confirmation from pgsql-hackers or Bruce yet
> >>that this issue will be added to the Todo list. I'm under the
> >>impression that the backend developers don't see this as a
> >>problem.
> >>
> >>Regards,
> >>Ren? Pijlman
> >>
> >>On Tue, 04 Sep 2001 10:40:36 -0700, Barry Lind wrote:
> >>
> >>>I would like to add one additional comment.  In current sources the jdbc
> >>>driver detects (through a hack) that the server doesn't have multibyte
> >>>enabled and then ignores the SQL_ASCII return value and defaults to the
> >>>JVM's character set instead of using SQL_ASCII.
> >>>
> >>>The problem boils down to the fact that without multibyte enabled, the
> >>>server has know way of specifiying which 8bit character set is being
> >>>used for a particular database.  Thus a client like JDBC doesn't know
> >>>what character set to use when converting to UNICODE.  Thus the best we
> >>>can do in JDBC is use our best guess (JVM character set is probably the
> >>>best default), and allow the user to explicitly specify something else
> >>>if necessary.
> >>>
> >>>thanks,
> >>>--Barry
> >>>
> >>>Rene Pijlman wrote:
> >>>
> >>>>[forwarding to pgsql-hackers and Bruce as Todo list maintainer,
> >>>>see comment below]
> >>>>
> >>>>[insert with JDBC converts Latin-1 umlaut to ?]
> >>>>On 04 Sep 2001 09:54:27 -0400, Dave Cramer wrote:
> >>>>
> >>>>
> >>>>>You have to set the encoding when you make the connection.
> >>>>>
> >>>>>Properties props = new Properties();
> >>>>>props.put("user",user);
> >>>>>props.put("password",password);
> >>>>>props.put("charSet",encoding);
> >>>>>Connection con = DriverManager.getConnection(url,props);
> >>>>>where encoding is the proper encoding for your database
> >>>>>
> >>>>>
> >>>>For completeness, I quote the answer Barry Lind gave yesterday.
> >>>>
> >>>>"[the driver] asks the server what character set is being used
> >>>>for the database.  Unfortunately the server only knows about
> >>>>character sets if multibyte support is compiled in. If the
> >>>>server is compiled without multibyte, then it always reports to
> >>>>the client that the character set is SQL_ASCII (where SQL_ASCII
> >>>>is 7bit ascii).  Thus if you don't have multibyte enabled on the
> >>>>server you can't support 8bit characters through the jdbc
> >>>>driver, unless you specifically tell the connection what
> >>>>character set to use (i.e. override the default obtained from
> >>>>the server)."
> >>>>
> >>>>This really is confusing and I think PostgreSQL should be able
> >>>>to support single byte encoding conversions without enabling
> >>>>multi-byte.
> >>>>
> >>>>At the very least there should be a --enable-encoding-conversion
> >>>>or something similar, even if it just enables the current
> >>>>multibyte support.
> >>>>
> >>>>Bruce, can this be put on the TODO list one way or the other?
> >>>>This problem has appeared 4 times in two months or so on the
> >>>>JDBC list.
> >>>>
> >>>>Regards,
> >>>>René Pijlman <rene@lab.applinet.nl>
> >>>>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
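
The "M?nchen" symptom from the original report can be reproduced without a database. The following is a minimal sketch, assuming the driver ends up encoding strings as 7-bit ASCII when the server reports SQL_ASCII, as Barry describes above. The class and method names are hypothetical and are not part of the JDBC driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautDemo {
    // Encode text with one charset and decode the bytes with another,
    // mimicking a client and a server that each pick their own encoding.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith); // unmappable characters become '?'
        return new String(wire, decodeWith);
    }

    public static void main(String[] args) {
        String city = "M\u00fcnchen"; // "München", written with an escape to avoid source-encoding issues

        // 7-bit ASCII cannot represent the umlaut, so it is replaced on encoding.
        System.out.println(roundTrip(city, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII));

        // An 8-bit charset that contains the umlaut preserves it.
        System.out.println(roundTrip(city, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
    }
}
```

The first call yields "M?nchen": the umlaut is lost at encoding time, before anything reaches the server, which matches what the debug output in the original report suggests.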

Re: [HACKERS] Troubles using German Umlauts with JDBC

From
Barry Lind
Date:
General backend issue.

--Barry

Bruce Momjian wrote:
> Is this a jdbc issue or a general backend issue?
>
>
>
>>Bruce,
>>
>>I think the TODO item should be:
>>
>>Ability to set character set for a database without multibyte enabled
>>
>>Currently createdb -E (and the corresponding CREATE DATABASE SQL
>>command) only works if multibyte is enabled.  However it is useful to
>>know which single byte character set is being used even when multibyte
>>isn't enabled.  Currently there is no way to specify which single byte
>>character set a database is using (unless you compile with multibyte).
>>
>>thanks,
>>--Barry
>>
>>
>>Bruce Momjian wrote:
>>
>>>I can add something if people agree there is an issue here.
>>>
>>>
>>>
>>>>I've added a new section "Character encoding" to
>>>>http://lab.applinet.nl/postgresql-jdbc/, based on the
>>>>information from Dave and Barry.
>>>>
>>>>I haven't seen a confirmation from pgsql-hackers or Bruce yet
>>>>that this issue will be added to the Todo list. I'm under the
>>>>impression that the backend developers don't see this as a
>>>>problem.
>>>>
>>>>Regards,
>>>>René Pijlman
>>>>
>>>>On Tue, 04 Sep 2001 10:40:36 -0700, Barry Lind wrote:
>>>>
>>>>
>>>>>I would like to add one additional comment.  In current sources the jdbc
>>>>>driver detects (through a hack) that the server doesn't have multibyte
>>>>>enabled and then ignores the SQL_ASCII return value and defaults to the
>>>>>JVM's character set instead of using SQL_ASCII.
>>>>>
>>>>>The problem boils down to the fact that without multibyte enabled, the
>>>>>server has no way of specifying which 8bit character set is being
>>>>>used for a particular database.  Thus a client like JDBC doesn't know
>>>>>what character set to use when converting to UNICODE.  Thus the best we
>>>>>can do in JDBC is use our best guess (JVM character set is probably the
>>>>>best default), and allow the user to explicitly specify something else
>>>>>if necessary.
>>>>>
>>>>>thanks,
>>>>>--Barry
>>>>>
>>>>>Rene Pijlman wrote:
>>>>>
>>>>>
>>>>>>[forwarding to pgsql-hackers and Bruce as Todo list maintainer,
>>>>>>see comment below]
>>>>>>
>>>>>>[insert with JDBC converts Latin-1 umlaut to ?]
>>>>>>On 04 Sep 2001 09:54:27 -0400, Dave Cramer wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>You have to set the encoding when you make the connection.
>>>>>>>
>>>>>>>Properties props = new Properties();
>>>>>>>props.put("user",user);
>>>>>>>props.put("password",password);
>>>>>>>props.put("charSet",encoding);
>>>>>>>Connection con = DriverManager.getConnection(url,props);
>>>>>>>where encoding is the proper encoding for your database
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>For completeness, I quote the answer Barry Lind gave yesterday.
>>>>>>
>>>>>>"[the driver] asks the server what character set is being used
>>>>>>for the database.  Unfortunately the server only knows about
>>>>>>character sets if multibyte support is compiled in. If the
>>>>>>server is compiled without multibyte, then it always reports to
>>>>>>the client that the character set is SQL_ASCII (where SQL_ASCII
>>>>>>is 7bit ascii).  Thus if you don't have multibyte enabled on the
>>>>>>server you can't support 8bit characters through the jdbc
>>>>>>driver, unless you specifically tell the connection what
>>>>>>character set to use (i.e. override the default obtained from
>>>>>>the server)."
>>>>>>
>>>>>>This really is confusing and I think PostgreSQL should be able
>>>>>>to support single byte encoding conversions without enabling
>>>>>>multi-byte.
>>>>>>
>>>>>>At the very least there should be a --enable-encoding-conversion
>>>>>>or something similar, even if it just enables the current
>>>>>>multibyte support.
>>>>>>
>>>>>>Bruce, can this be put on the TODO list one way or the other?
>>>>>>This problem has appeared 4 times in two months or so on the
>>>>>>JDBC list.
>>>>>>
>>>>>>Regards,
>>>>>>René Pijlman <rene@lab.applinet.nl>
>>>>>>
>
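
Until the backend can report a single-byte character set on its own, the workaround quoted above is to override the encoding on the client. The following is a rough sketch of Dave Cramer's snippet as a self-contained class. The "charSet" property name comes from this thread and applies to the driver discussed here. The class name, the method names, and the example value "ISO-8859-1" are assumptions for illustration, so use whatever encoding your database actually stores.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class CharSetConnection {
    // Collect the connection properties, including the explicit
    // character set override described in the thread.
    public static Properties buildProps(String user, String password, String encoding) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("charSet", encoding); // property name as given in the thread
        return props;
    }

    // Open the connection with the overriding properties.
    // The caller supplies the JDBC URL of the database.
    public static Connection connect(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }
}
```

A caller would do something like connect(url, buildProps(user, password, "ISO-8859-1")), where url points at the database. Keeping the property setup separate from the connect call makes it easy to check which encoding is being requested before any connection is opened.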