Thread: multibyte-support

multibyte-support

From
Ulf Mehlig
Date:
[ I hope this message is "right" in the interfaces list! ]

Hello,

I discovered the multibyte-support of postgreSQL (the documentation is
not easy to find ...). After re-compiling with "--with-mb" and
"--enable-locale", re-creating the data directory with "initdb -e
LATIN1" and re-building my database with "-e LATIN1" as well, I'm able
e.g. to get +-proper sorting of strings with German umlauts
etc. 

However, when I try to use the precompiled psql client for Windows
from "ftp://ftp.postgresql.org" over a network connection, it is not
able to understand the "special" characters like "u umlaut" etc.,
whether I set CLIENT_ENCODING to either 'WIN', 'ALT', or 'LATIN1' (I
think, multibyte support is compiled in the client, at least my unix
psql won't let me set this variable if it has no mb). 

Which encodings do I have to use?  The client runs on a vmware virtual
PC with German NT 4.0/SP5, the server is a Linux (i386/2.2.14) with
postgreSQL 6.5.3.

Many thanks for your attention + help,
Ulf

P.S.: Please CC: me, I'm not on this list at the moment!

-- 
======================================================================
Ulf Mehlig    <umehlig@zmt.uni-bremen.de>
              Center for Tropical Marine Ecology/ZMT, Bremen, Germany
----------------------------------------------------------------------
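
One way to see why the encoding choice matters here is to compare the byte that different single-byte encodings use for "u umlaut". The sketch below uses Python's codec names, not PostgreSQL's encoding names, and it assumes that a console program on German Windows works in an OEM code page such as CP850; the thread does not confirm which code page the Windows psql actually uses. The helper names `encoded_bytes` and `misread` are made up for this illustration.

```python
# Compare how a few single-byte encodings represent "ü".
# Codec names are Python's; the CP850 case is an assumption about
# the Windows console, not something stated in the thread.

def encoded_bytes(text, codec):
    """Return the bytes that `codec` uses for `text`."""
    return text.encode(codec)

def misread(text, sent_as, read_as):
    """Encode with one codec and decode with another, the way a
    client and server with mismatched encodings would."""
    return text.encode(sent_as).decode(read_as, errors="replace")

if __name__ == "__main__":
    for codec in ("latin-1", "cp1252", "cp850"):
        print(codec, encoded_bytes("ü", codec).hex())
    # A CP850 byte interpreted as LATIN1 is no longer "ü".
    print(repr(misread("Mühle", "cp850", "latin-1")))
```

If the client really does send CP850 bytes, no conversion between LATIN1 and a Windows ANSI code page will repair them, which would match the symptom described above.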


Re: [INTERFACES] multibyte-support

From
Tatsuo Ishii
Date:
> not easy to find ...). After re-compiling with "--with-mb" and

Sorry for the inconvenience. I would like to write docs for the
multi-byte support in PostgreSQL for the next release.

> "--enable-locale", re-creating the data directory with "initdb -e
> LATIN1" and re-building my database with "-e LATIN1" as well, I'm able
> e.g. to get +-proper sorting of strings with German umlauts
> etc. 
> 
> However, when I try to use the precompiled psql client for Windows
> from "ftp://ftp.postgresql.org" over a network connection, it is not

I am not sure whether that binary was compiled with mb support.

> able to understand the "special" characters like "u umlaut" etc.,
> whether I set CLIENT_ENCODING to either 'WIN', 'ALT', or 'LATIN1' (I
> think, multibyte support is compiled in the client, at least my unix
> psql won't let me set this variable if it has no mb). 

Sounds strange. I assume you use "set client encoding to 'LATIN1'" or
something like that. To psql that is just a query. Since psql doesn't
parse the contents of the query, even a non-mb psql should be able to
send it.

Anyway, to make sure that you are using an mb-enabled psql, start
postmaster with the "-d 3" flag to get debug output (turn off -S,
otherwise the debug output will not appear). You should get lots of
messages, including the following portions:

query: select getdatabaseencoding()
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
query: SET client_encoding = 'LATIN1'
ProcessUtility: SET client_encoding = 'LATIN1'

If not, you are in trouble: you need to build an mb-enabled psql on
the PC yourself. Unfortunately I cannot build it for you, since I know
very little about Windows. Maybe Hiroki Kataoka has some
suggestions...
--
Tatsuo Ishii
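
The check described above can be automated by scanning the postmaster debug output for the `SET client_encoding` query that an mb-enabled psql sends at startup. This is only a sketch: the function name `client_encoding_from_log` is hypothetical, the two sample logs are abbreviated from the excerpts quoted in this thread, and the exact log format of other PostgreSQL versions is an assumption. A `SET client_encoding` typed by hand would match as well, so the result is only meaningful for a fresh connection.

```python
import re

# Matches the debug line triggered by an mb-enabled client, e.g.
#   query: SET client_encoding = 'LATIN1'
_SET_ENCODING = re.compile(
    r"^\s*query:\s*SET\s+client_encoding\s*=\s*'([^']+)'",
    re.IGNORECASE,
)

def client_encoding_from_log(log_text):
    """Return the encoding named in the first matching line, or None."""
    for line in log_text.splitlines():
        match = _SET_ENCODING.match(line)
        if match:
            return match.group(1)
    return None

# Sample logs abbreviated from the messages in this thread.
MB_LOG = """query: select getdatabaseencoding()
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
query: SET client_encoding = 'LATIN1'
ProcessUtility: SET client_encoding = 'LATIN1'
"""

NON_MB_LOG = """InitPostgres
reset_client_encoding()..
reset_client_encoding() done.
"""

if __name__ == "__main__":
    print(client_encoding_from_log(MB_LOG))      # LATIN1
    print(client_encoding_from_log(NON_MB_LOG))  # None
```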


Re: [INTERFACES] multibyte-support

From
Ulf Mehlig
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> wrote:

> > not easy to find ...). After re-compiling with "--with-mb" and
>
> Sorry for the inconvenience. I would like to write docs for the
> multi-byte support in PostgreSQL for the next release.

I wouldn't call it "inconvenience" -- you are not paid providers of
some service who have to care about their customers' "convenience" :-)
Many thanks for writing any documentation at all! :) However, maybe it
is possible to generally include links to the relevant (?) README
files in the HTML documentation or the main README/INSTALL files?

> Anyway, to make sure that you are using mb enabled psql, start
> postmaster with "-d 3" flag to get debug output (turn off
> -S. otherwise debug out will not appear). You should get lots of
> messages including following portions [...]

You are right, it seems that the Windows client is *not* mb
enabled; I get only the following debug messages:

--------------------------------------------------
debug info:
         [...]
         query echo   = f
InitPostgres
         reset_client_encoding()..
         reset_client_encoding() done.
--------------------------------------------------

When starting the mb-enabled Unix client, I get among others the
messages you described.

My assumption that psql without mb is not able to set CLIENT_ENCODING
was probably wrong -- I forgot that I compiled backend and psql at the
same time, so the error message I got was probably a *backend* message
...

> If not, you gotta into trouble. You need to build mb enabled psql on
> PC by yourself.

The trouble is not that big; I was just playing around, and at least
for me there is no urgent need for a Windows psql. But wouldn't it be
a good idea to include mb support by default? Or is there any reason
not to include mb support in the standard configure process?

Anyway, many thanks for your help and for providing multibyte support! 
Regards,
Ulf

-- 
======================================================================
Ulf Mehlig    <umehlig@zmt.uni-bremen.de>
              Center for Tropical Marine Ecology/ZMT, Bremen, Germany
----------------------------------------------------------------------


Re: [INTERFACES] multibyte-support

From
Michael Meskes
Date:
On Sat, Jan 29, 2000 at 11:15:30PM +0100, Ulf Mehlig wrote:
> I discovered the multibyte-support of postgreSQL (the documentation is
> not easy to find ...). After re-compiling with "--with-mb" and
> "--enable-locale", re-creating the data directory with "initdb -e
> LATIN1" and re-building my database with "-e LATIN1" as well, I'm able
> e.g. to get +-proper sorting of strings with German umlauts
> etc. 

Why exactly do you need multibyte? I have been using PostgreSQL for quite
some time now with German umlauts using locale de_DE, and it works pretty
well, except that pgaccess does not display them.

But then you mention correct sort order. Do you mean you get it to sort
a, "a (my way of writing the umlaut), b in this order, instead of a, b, "a,
which is ASCII order? If so, I should take a look at it.

Michael
-- 
Michael Meskes                         | Go SF 49ers!
Th.-Heuss-Str. 61, D-41812 Erkelenz    | Go Rhein Fire!
Tel.: (+49) 2431/72651                 | Use Debian GNU/Linux!
Email: Michael@Fam-Meskes.De           | Use PostgreSQL!
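
The two orderings asked about above can be illustrated with a small sketch. It does not use a real de_DE locale; the folding table `_FOLD` and the key function `german_key` are hypothetical stand-ins that treat each umlaut like its base vowel, which is only a rough approximation of German dictionary collation.

```python
# Hypothetical folding table: sort each umlaut next to its base vowel.
# This approximates German dictionary order; it is not a real locale.
_FOLD = str.maketrans({
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U",
    "ß": "ss",
})

def german_key(word):
    """Sort key: folded form first, original form as a tie-breaker."""
    return (word.translate(_FOLD), word)

if __name__ == "__main__":
    words = ["b", "ä", "a"]
    # Plain code-point order puts the umlaut after every ASCII letter.
    print(sorted(words))                  # ['a', 'b', 'ä']
    # With the folding key the umlaut sorts directly after its base vowel.
    print(sorted(words, key=german_key))  # ['a', 'ä', 'b']
```

In LATIN1 the byte value of "ä" is also larger than that of every ASCII letter, so a plain byte comparison gives the same a, b, "a order as the first call.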


Re: [INTERFACES] multibyte-support

From
Thomas Lockhart
Date:
> Sorry for the inconvenience. I would like to write docs for the
> multi-byte support in PostgreSQL for the next release.

Maybe we can set aside time to do the full SQL92 character type
support. Then the docs will change a lot anyway :)
                      - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California