Thread: Encoding problem with 7.4

Encoding problem with 7.4

From
"E.Rodichev"
Date:
Hi,

I just noticed some incorrect behaviour for postgresql-7.4 related
to locale.

After installing 7.4 I created database completely from scratch
with cyrillic locale:

su postgres
export LC_CTYPE=ru_RU.KOI8-R
export LC_COLLATE=ru_RU.KOI8-R
/usr/local/pgsql/bin/initdb -D /db2/pgdata
/usr/local/pgsql/bin/createuser -d er

Then I switched to my normal account. At this point I have:

/e:1>psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
(2 rows)


Then I created new db:

/e:2>createdb test
CREATE DATABASE
/e:3>psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
 test      | er       | SQL_ASCII   <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.
DB test is really in ru_RU.KOI8-R, not ASCII. I can create tables
with ASCII characters, and with non-ASCII (Cyrillic) ones as well,
and ORDER BY, SELECT upper(), etc. work in the ru_RU.KOI8-R locale.

After the first initdb, the database is no longer affected by my
LC_CTYPE and LC_COLLATE settings. I may set

export LC_CTYPE=ru_RU.KOI8-R
export LC_COLLATE=ru_RU.KOI8-R

or

export LC_CTYPE=C
export LC_COLLATE=C

but ORDER BY and SELECT upper() still work in the Cyrillic locale.


As I see it, there are two points here:

1. Reporting the encoding as SQL_ASCII is incorrect: all the databases
are in KOI8, not in SQL_ASCII;

2. More generally, this kind of fixed locale behaviour is not very
convenient. A more natural way would be for each database to get
the encoding specified at the moment createdb is issued.
That way it would be possible to have different databases with
different encodings.

Best regards,  Evgeny Rodichev


_________________________________________________________________________
Evgeny Rodichev                          Sternberg Astronomical Institute
email: er@sai.msu.su                              Moscow State University
Phone: 007 (095) 939 2383
Fax:   007 (095) 932 8841                       http://www.sai.msu.su/~er


Re: Encoding problem with 7.4

From
Peter Eisentraut
Date:
E.Rodichev writes:

> I just noticed some incorrect behaviour for postgresql-7.4 related
> to locale.

Maybe you should first read the documentation to understand how it
actually works.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Encoding problem with 7.4

From
Jean-Michel POURE
Date:
On Thursday 27 November 2003 20:56, E.Rodichev wrote:
> After installing 7.4 I created database completely from scratch
> with cyrillic locale:

Dear Evgeny,

If you want to go 'fast', do not hesitate to install the pgAdmin3 GUI from
http://www.pgadmin.org. You will be able to create and manage a database in
KOI8 encoding. You can choose a UTF-8 encoding as well.

pgAdmin3 displays the needed SQL. Therefore you can learn the PostgreSQL/SQL99
syntax quite fast. Also, we provide the full PostgreSQL documentation.

Cheers,
Jean-Michel



Re: Encoding problem with 7.4

From
Christopher Kings-Lynne
Date:
> After installing 7.4 I created database completely from scratch
> with cyrillic locale:
> 
> su postgres
> export LC_CTYPE=ru_RU.KOI8-R
> export LC_COLLATE=ru_RU.KOI8-R
> /usr/local/pgsql/bin/initdb -D /db2/pgdata

You need to go:

/usr/local/pgsql/bin/initdb -D /db2/pgdata -E KOI8

To set the default encoding to KOI8.

> Then I switched to my normal account. At this point I have:
> 
> /e:1>psql -l
>         List of databases
>    Name    |  Owner   | Encoding
> -----------+----------+-----------
>  template0 | postgres | SQL_ASCII
>  template1 | postgres | SQL_ASCII
> (2 rows)

Locale and encoding are two quite different things.

Chris




Re: Encoding problem with 7.4

From
Tom Lane
Date:
"E.Rodichev" <er@sai.msu.su> writes:
> /e:2>createdb test

>  test      | er       | SQL_ASCII   <----- Incorrect!
> (3 rows)

> Let's note that the last line is in fact completely incorrect.

What's incorrect about it?  You didn't ask for any other encoding
than SQL_ASCII.

You can set the default encoding at initdb time, IIRC, but you didn't.
        regards, tom lane


Re: Encoding problem with 7.4

From
"E.Rodichev"
Date:
On Fri, 28 Nov 2003, Tom Lane wrote:

> "E.Rodichev" <er@sai.msu.su> writes:
> > /e:2>createdb test
>
> >  test      | er       | SQL_ASCII   <----- Incorrect!
> > (3 rows)
>
> > Let's note that the last line is in fact completely incorrect.
>
> What's incorrect about it?  You didn't ask for any other encoding
> than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

Best wishes,
E.R.

>
> You can set the default encoding at initdb time, IIRC, but you didn't.
>
>             regards, tom lane



Re: Encoding problem with 7.4

From
Peter Eisentraut
Date:
E.Rodichev writes:

> It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
> in this example, as I explained in my mail.

The encoding is only a declaration of your intentions.  What you actually
put into the database is your responsibility.
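To make the point concrete, here is a minimal sketch, assuming a database declared SQL_ASCII whose contents were written by a KOI8-R client. The declared label does not describe the bytes: a simple 7-bit check (the helper `is_plain_ascii` is hypothetical, not a PostgreSQL function) shows that Cyrillic text stored this way is not ASCII at all.

```python
def is_plain_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


# Bytes a KOI8-R client would send for a Latin word and a Cyrillic word.
latin = "hello".encode("koi8_r")      # KOI8-R is ASCII-compatible below 0x80
cyrillic = "привет".encode("koi8_r")  # every Cyrillic letter is >= 0x80

print(is_plain_ascii(latin))     # True
print(is_plain_ascii(cyrillic))  # False
```

Nothing in the bytes themselves records which encoding was intended, which is why the declaration has to be made explicitly.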




Re: Encoding problem with 7.4

From
Andrew Dunstan
Date:
E.Rodichev wrote:

>On Fri, 28 Nov 2003, Tom Lane wrote:
>
>>"E.Rodichev" <er@sai.msu.su> writes:
>>
>>>/e:2>createdb test
>>>
>>> test      | er       | SQL_ASCII   <----- Incorrect!
>>>(3 rows)
>>>
>>>Let's note that the last line is in fact completely incorrect.
>>
>>What's incorrect about it?  You didn't ask for any other encoding
>>than SQL_ASCII.
>
>It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
>in this example, as I explained in my mail.
>
>Best wishes,
>E.R.
>
>>You can set the default encoding at initdb time, IIRC, but you didn't.

You can set the default at initdb time, or per database at createdb
time, but it has to be done explicitly. You seem to think it should be
picked up from the environment, but it isn't: you must use the
-E|--encoding flag on either createdb or initdb or, if creating directly
from SQL, use the ENCODING option on "create database" to get something
other than the default set by initdb.

examples:

[andrew@Thor bin]$ ./initdb /tmp/enctry
The files belonging to this database system will be owned by user "andrew".
This user must also own the server process.

The database cluster will be initialized with locales:
  COLLATE:  ru_RU.KOI8-R
  CTYPE:    ru_RU.KOI8-R
  MESSAGES: en_US.iso885915
  MONETARY: en_US.iso885915
  NUMERIC:  en_US.iso885915
  TIME:     en_US.iso885915

creating directory /tmp/enctry... ok
creating directory /tmp/enctry/base... ok
creating directory /tmp/enctry/global... ok
creating directory /tmp/enctry/pg_xlog... ok
creating directory /tmp/enctry/pg_clog... ok
selecting default max_connections... 100
selecting default shared_buffers... 1000
creating configuration files... ok
creating template1 database in /tmp/enctry/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ok
setting privileges on built-in objects... ok
creating information schema... ok
vacuuming database template1... ok
copying template1 to template0... ok

Success. You can now start the database server using:
   ./postmaster -D /tmp/enctry
or   ./pg_ctl -D /tmp/enctry -l logfile start

[andrew@Thor bin]$ ./pg_ctl -D /tmp/enctry -l /tmp/enclog -o '-p 5433' start
postmaster successfully started
[andrew@Thor bin]$ ./createdb -E KOI8-R -p 5433 testme
CREATE DATABASE
[andrew@Thor bin]$ ./psql -p 5433 -l
        List of databases
   Name    | Owner  | Encoding
-----------+--------+-----------
 template0 | andrew | SQL_ASCII
 template1 | andrew | SQL_ASCII
 testme    | andrew | KOI8
(3 rows)

[andrew@Thor bin]$ ./pg_ctl -D /tmp/enctry  -o '-p 5433' stop
waiting for postmaster to shut down......done
postmaster successfully shut down
[andrew@Thor bin]$ rm -rf /tmp/enctry
[andrew@Thor bin]$ ./initdb -E KOI8-R /tmp/enctry
The files belonging to this database system will be owned by user "andrew".
This user must also own the server process.

The database cluster will be initialized with locales:
  COLLATE:  ru_RU.KOI8-R
  CTYPE:    ru_RU.KOI8-R
  MESSAGES: en_US.iso885915
  MONETARY: en_US.iso885915
  NUMERIC:  en_US.iso885915
  TIME:     en_US.iso885915

creating directory /tmp/enctry... ok
creating directory /tmp/enctry/base... ok
creating directory /tmp/enctry/global... ok
creating directory /tmp/enctry/pg_xlog... ok
creating directory /tmp/enctry/pg_clog... ok
selecting default max_connections... 100
selecting default shared_buffers... 1000
creating configuration files... ok
creating template1 database in /tmp/enctry/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ok
setting privileges on built-in objects... ok
creating information schema... ok
vacuuming database template1... ok
copying template1 to template0... ok

Success. You can now start the database server using:
   ./postmaster -D /tmp/enctry
or   ./pg_ctl -D /tmp/enctry -l logfile start

[andrew@Thor bin]$ ./pg_ctl -D /tmp/enctry -l /tmp/enclog -o '-p 5433' start
postmaster successfully started
[andrew@Thor bin]$ ./createdb -p 5433 testme
CREATE DATABASE
[andrew@Thor bin]$ ./psql -p 5433 -l
       List of databases
   Name    | Owner  | Encoding
-----------+--------+----------
 template0 | andrew | KOI8
 template1 | andrew | KOI8
 testme    | andrew | KOI8
(3 rows)

[andrew@Thor bin]$


cheers

andrew



Re: Encoding problem with 7.4

From
Stephan Szabo
Date:
On Wed, 3 Dec 2003, E.Rodichev wrote:

> On Fri, 28 Nov 2003, Tom Lane wrote:
>
> > "E.Rodichev" <er@sai.msu.su> writes:
> > > /e:2>createdb test
> >
> > >  test      | er       | SQL_ASCII   <----- Incorrect!
> > > (3 rows)
> >
> > > Let's note that the last line is in fact completely incorrect.
> >
> > What's incorrect about it?  You didn't ask for any other encoding
> > than SQL_ASCII.
>
> It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
> in this example, as I explained in my mail.

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time.  You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.


Re: Encoding problem with 7.4

From
"E.Rodichev"
Date:
On Wed, 3 Dec 2003, Stephan Szabo wrote:

>
> On Wed, 3 Dec 2003, E.Rodichev wrote:
>
> > On Fri, 28 Nov 2003, Tom Lane wrote:
> >
> > > "E.Rodichev" <er@sai.msu.su> writes:
> > > > /e:2>createdb test
> > >
> > > >  test      | er       | SQL_ASCII   <----- Incorrect!
> > > > (3 rows)
> > >
> > > > Let's note that the last line is in fact completely incorrect.
> > >
> > > What's incorrect about it?  You didn't ask for any other encoding
> > > than SQL_ASCII.
> >
> > It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
> > in this example, as I explained in my mail.
>
> No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
> since you didn't override the default encoding at initdb time or at
> createdb time.  You did choose LC_ values that seem to want KOI8, but
> locale and encoding are separate, if you want KOI8 encoding, you have to
> say so.

Yes, it is!

If db "test" is SQL_ASCII, AND all LC_* env are set to "C", the sorting of
ASCII characters is, for example,

a
A
b
B
c
C

not

A
B
C
a
b
c

(the first order is true for ru_RU.KOI8-R, the latter one - for C).
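The two orders above can be reproduced without a database. A minimal sketch: C-locale collation is plain byte (code point) order, which puts every capital letter before every small one because 'A' is 0x41 and 'a' is 0x61. The second key is only a rough stand-in for a dictionary-style locale such as ru_RU.KOI8-R (compare letters case-insensitively, small before capital on ties); it is an assumption for illustration, not glibc's actual collation algorithm.

```python
letters = ["a", "A", "b", "B", "c", "C"]

# C locale: compare raw code point values; 'A' (0x41) < 'a' (0x61).
c_order = sorted(letters)


def dictionary_key(s: str) -> tuple:
    # Rough approximation of a dictionary-style locale: group by letter
    # first, then put the lowercase form before the uppercase form.
    return (s.casefold(), s.isupper())


dictionary_order = sorted(letters, key=dictionary_key)

print(" ".join(c_order))           # A B C a b c
print(" ".join(dictionary_order))  # a A b B c C
```

Getting aAbBcC from the server therefore indicates that the server is collating with a non-C locale, whatever the client's shell says.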

To summarize briefly:

- initdb _without_ -E flag, but with ru_RU.KOI8-R environment;
- createdb with any environment;
- psql indicates SQL_ASCII;
- sorting and upper/lowercasing are in ru_RU.KOI8-R, even with the LC_*
environment set to "C".

Where is the logic?

Best wishes,
E.R.


Re: Encoding problem with 7.4

From
Alvaro Herrera
Date:
On Wed, Dec 03, 2003 at 11:42:34PM +0300, E.Rodichev wrote:
> On Wed, 3 Dec 2003, Stephan Szabo wrote:
> 
> > No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
> > since you didn't override the default encoding at initdb time or at
> > createdb time.  You did choose LC_ values that seem to want KOI8, but
> > locale and encoding are separate, if you want KOI8 encoding, you have to
> > say so.
> 
> Yes, it is!

What apparently you haven't picked up yet is that the _locale_ is a
different and unrelated configuration setting from the _encoding_.
Sort order is locale related; you already got that one right.  Now you
need to go after the encoding.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Fate shuffles the cards, and we play." (A. Schopenhauer)


Re: Encoding problem with 7.4

From
Andrew Dunstan
Date:
E.Rodichev wrote:

>On Wed, 3 Dec 2003, Stephan Szabo wrote:
>
>>On Wed, 3 Dec 2003, E.Rodichev wrote:
>>
>>>On Fri, 28 Nov 2003, Tom Lane wrote:
>>>
>>>>"E.Rodichev" <er@sai.msu.su> writes:
>>>>
>>>>>/e:2>createdb test
>>>>>
>>>>> test      | er       | SQL_ASCII   <----- Incorrect!
>>>>>(3 rows)
>>>>>
>>>>>Let's note that the last line is in fact completely incorrect.
>>>>
>>>>What's incorrect about it?  You didn't ask for any other encoding
>>>>than SQL_ASCII.
>>>
>>>It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
>>>in this example, as I explained in my mail.
>>
>>No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
>>since you didn't override the default encoding at initdb time or at
>>createdb time.  You did choose LC_ values that seem to want KOI8, but
>>locale and encoding are separate, if you want KOI8 encoding, you have to
>>say so.
>
>Yes, it is!
>
>If db "test" is SQL_ASCII, AND all LC_* env are set to "C", the sorting of
>ASCII characters is, for example,
>
>a
>A
>b
>B
>c
>C
>
>not
>
>A
>B
>C
>a
>b
>c
>
>(the first order is true for ru_RU.KOI8-R, the latter one - for C).
>
>To summarize briefly:
>
>- initdb _without_ -E flag, but with ru_RU.KOI8-R environment;
>- createdb with any environment;
>- psql indicates SQL_ASCII;
>- sorting and upper/lowercasing are in ru_RU.KOI8-R, even with the LC_*
>environment set to "C".
>
>Where is the logic?


Encoding and collation order are two different things. LC_* settings 
have no effect on encoding.

see http://www.postgresql.org/docs/current/static/charset.html

cheers

andrew




Re: Encoding problem with 7.4

From
Stephan Szabo
Date:
On Wed, 3 Dec 2003, E.Rodichev wrote:

> On Wed, 3 Dec 2003, Stephan Szabo wrote:
>
> >
> > On Wed, 3 Dec 2003, E.Rodichev wrote:
> >
> > > On Fri, 28 Nov 2003, Tom Lane wrote:
> > >
> > > > "E.Rodichev" <er@sai.msu.su> writes:
> > > > > /e:2>createdb test
> > > >
> > > > >  test      | er       | SQL_ASCII   <----- Incorrect!
> > > > > (3 rows)
> > > >
> > > > > Let's note that the last line is in fact completely incorrect.
> > > >
> > > > What's incorrect about it?  You didn't ask for any other encoding
> > > > than SQL_ASCII.
> > >
> > > It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
> > > in this example, as I explained in my mail.
> >
> > No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
> > since you didn't override the default encoding at initdb time or at
> > createdb time.  You did choose LC_ values that seem to want KOI8, but
> > locale and encoding are separate, if you want KOI8 encoding, you have to
> > say so.
>
> Yes, it is!

*sigh*

> (the first order is true for ru_RU.KOI8-R, the latter one - for C).
>
> To summarize briefly:
>
> - initdb _without_ -E flag, but with ru_RU.KOI8-R environment;
> - createdb with any environment;
> - psql indicates SQL_ASCII;
> - sorting and upper/lowercasing are in ru_RU.KOI8-R, even with the LC_*
> environment set to "C".

Only the locale settings at initdb time matter.  Changing the LC_* later
is not going to change what the database does.  Encoding and locale are
separate (but related) and it is your responsibility to make sure the
choices are consistent. If you do not specify an encoding, SQL_ASCII is
used for the encoding. If the characters happen to line up appropriately
for what your ru_RU.KOI8-R locale expects it'll even happen to appear to
work for sorting and case changes (and things like isprint). Which part of
this are you not understanding?
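The "happen to appear to work" case can be illustrated with case conversion. A minimal sketch, assuming the server simply hands the stored bytes to locale-aware character routines (roughly a per-byte toupper() under the initdb locale): the hypothetical helper `upper_in_charset` interprets the same bytes under a chosen single-byte character set. Under KOI8-R the Cyrillic word is upper-cased correctly; read as ISO-8859-1, the same bytes are already capital letters and a symbol, so nothing changes.

```python
def upper_in_charset(data: bytes, charset: str) -> bytes:
    """Upper-case raw bytes as if they were text in the given single-byte charset."""
    return data.decode(charset).upper().encode(charset)


stored = "привет".encode("koi8_r")  # bytes as a KOI8-R client sent them

as_koi8 = upper_in_charset(stored, "koi8_r")
as_latin1 = upper_in_charset(stored, "latin-1")

print(as_koi8.decode("koi8_r"))  # ПРИВЕТ
print(as_latin1 == stored)       # True: no lowercase letters under ISO-8859-1
```

The result depends only on which character set the locale assumes, not on what the database's encoding label says.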


Re: Encoding problem with 7.4

From
"E.Rodichev"
Date:
On Wed, 3 Dec 2003, Alvaro Herrera wrote:

> On Wed, Dec 03, 2003 at 11:42:34PM +0300, E.Rodichev wrote:
> > On Wed, 3 Dec 2003, Stephan Szabo wrote:
> >
> > > No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
> > > since you didn't override the default encoding at initdb time or at
> > > createdb time.  You did choose LC_ values that seem to want KOI8, but
> > > locale and encoding are separate, if you want KOI8 encoding, you have to
> > > say so.
> >
> > Yes, it is!
>
> What apparently you haven't picked up yet is that the _locale_ is a
> different and unrelated configuration setting from the _encoding_.
> Sort order is locale related; you already got that one right.  Now you

Sorry, I got it WRONG!

Sort order for the C locale MUST be ABCabc, not aAbBcC.

But I got aAbBcC.

Best wishes,
E.R.

> need to go after the encoding.



Re: Encoding problem with 7.4

From
"E.Rodichev"
Date:
On Wed, 3 Dec 2003, Andrew Dunstan wrote:

> Encoding and collation order are two different things. LC_* settings
> have no effect on encoding.
>
> see http://www.postgresql.org/docs/current/static/charset.html

I am trying to point out the reverse dependency:

(1) the encoding has an effect on the LC_* settings, and (2) the indication
of the encoding is incorrect.

Is that right?

Regards,
E.R.

>
> cheers
>
> andrew



Re: Encoding problem with 7.4

From
"E.Rodichev"
Date:
On Wed, 3 Dec 2003, Stephan Szabo wrote:

> Only the locale settings at initdb time matter.  Changing the LC_* later
> is not going to change what the database does.  Encoding and locale are
> separate (but related) and it is your responsibility to make sure the
> choices are consistent. If you do not specify an encoding, SQL_ASCII is
> used for the encoding. If the characters happen to line up appropriately
> for what your ru_RU.KOI8-R locale expects it'll even happen to appear to
> work for sorting and case changes (and things like isprint). Which part of
> this are you not understanding?


Thank you, that is a much more consistent answer. But again, things are
not going exactly the way you wrote.

In your view, the chain is

data -> encoding transform -> locale transform -> output

It looks clean and reasonable.

The encoding transform may be set during initdb or createdb (is that true?)

But when is the locale transform defined? In the usual Unix fashion it should
depend on the LC_* settings (is that true?)

As I described in my first posting, the situation is different. Namely,
the locale setting now defines the _encoding transform_ (and the data
representation in storage), but the _locale transform_ doesn't depend on LC_*.

Best wishes,
E.R.



Re: Encoding problem with 7.4

From
Stephan Szabo
Date:
On Thu, 4 Dec 2003, E.Rodichev wrote:

> On Wed, 3 Dec 2003, Stephan Szabo wrote:
>
> > Only the locale settings at initdb time matter.  Changing the LC_* later
> > is not going to change what the database does.  Encoding and locale are
> > separate (but related) and it is your responsibility to make sure the
> > choices are consistent. If you do not specify an encoding, SQL_ASCII is
> > used for the encoding. If the characters happen to line up appropriately
> > for what your ru_RU.KOI8-R locale expects it'll even happen to appear to
> > work for sorting and case changes (and things like isprint). Which part of
> > this are you not understanding?
>
>
> Thank you, that is a much more consistent answer. But again, things are
> not going exactly the way you wrote.
>
> In your view, the chain is
>
> data -> encoding transform -> locale transform -> output
>
> It looks clean and reasonable.
>
> The encoding transform may be set during initdb or createdb (is that true?)
>
> But when is the locale transform defined? In the usual Unix fashion it should
> depend on the LC_* settings (is that true?)
>
> As I described in my first posting, the situation is different. Namely,
> the locale setting now defines the _encoding transform_ (and the data
> representation in storage), but the _locale transform_ doesn't depend on LC_*.

The locale settings depend on LC_* at initdb time only. When the
postmaster starts it sets the locale based on the stored values from
initdb, not on the current environment.
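A toy model of the behaviour just described, assuming nothing about PostgreSQL internals beyond what is stated here: the cluster records LC_COLLATE and LC_CTYPE once, when it is initialized, and later sessions use the recorded values regardless of the caller's environment. All names (`Cluster`, `init_cluster`, `session_collation`) are hypothetical, and the "C" fallback is a simplification: the real environment lookup also consults LC_ALL and LANG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Locale settings fixed when the cluster is created (toy model)."""
    lc_collate: str
    lc_ctype: str


def init_cluster(env: dict) -> Cluster:
    # Like initdb: read the locale from the environment once and store it.
    return Cluster(lc_collate=env.get("LC_COLLATE", "C"),
                   lc_ctype=env.get("LC_CTYPE", "C"))


def session_collation(cluster: Cluster, client_env: dict) -> str:
    # Like the postmaster: use the stored value; client_env is deliberately ignored.
    return cluster.lc_collate


cluster = init_cluster({"LC_COLLATE": "ru_RU.KOI8-R", "LC_CTYPE": "ru_RU.KOI8-R"})
print(session_collation(cluster, {"LC_COLLATE": "C"}))  # ru_RU.KOI8-R
```

Exporting LC_COLLATE=C before running psql corresponds to changing `client_env`, which this model, like the server, never reads.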

With an SQL_ASCII database being accessed from a client with
client_encoding set to SQL_ASCII (which it should be if you aren't setting
it) the byte values of a string are passed along with no conversion for
the encoding.  This means that from within one environment you should get
back what you put in, so it might *look* like it's KOI8-R if that's what
you're in, but it's not because someone accessing it from say an ISO8859-1
system may see something different.
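What "no conversion" means can be shown with the bytes alone. A minimal sketch, assuming both clients talk to an SQL_ASCII database with client_encoding also SQL_ASCII, so the server returns exactly the stored bytes; each client then renders them with its own character set.

```python
# Bytes stored by a client whose terminal uses KOI8-R.
stored = "Привет".encode("koi8_r")

# With SQL_ASCII on both sides, every client receives these same bytes.
koi8_client_sees = stored.decode("koi8_r")
latin1_client_sees = stored.decode("iso-8859-1")

print(koi8_client_sees)    # Привет
print(latin1_client_sees)  # ðÒÉ×ÅÔ
```

The bytes are identical; only the reader's assumption differs. Declaring the real encoding (KOI8) is what would give the server enough information to convert for a client that sets a different client_encoding.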


Re: Encoding problem with 7.4

From
"E.Rodichev"
Date:
On Wed, 3 Dec 2003, Stephan Szabo wrote:

> The locale settings depend on LC_* at initdb time only. When the
> postmaster starts it sets the locale based on the stored values from
> initdb, not on the current environment.
>
> With an SQL_ASCII database being accessed from a client with
> client_encoding set to SQL_ASCII (which it should be if you aren't setting
> it) the byte values of a string are passed along with no conversion for
> the encoding.  This means that from within one environment you should get
> back what you put in, so it might *look* like it's KOI8-R if that's what
> you're in, but it's not because someone accessing it from say an ISO8859-1
> system may see something different.

As a result, the possibility to control encodings and locales looks as
follows:

            initdb   createdb     psql
Encoding:      Y         Y          Y
Locale:        Y         N          N

It seems that a more natural scheme would be

            initdb   createdb     psql
Encoding:      Y         Y          Y
Locale:        Y         Y          Y

Now the possibility to use different encodings for createdb and psql is
a bit strange... Also, it is impossible to have different locales
for different databases within one cluster, and it is impossible to use
different locales within one database. The latter is even more critical.
One reason is that sorting under the C locale is much more efficient than
sorting under other locales (10-50 times faster for some implementations!).
Another reason is that for some applications it is _necessary_ to use a different
sort order for different tables. For example, I may have two tables,
russian_persons and forein_persons, and I'd like to print the sorted list
of persons. The russian_persons names must be sorted with the ru_RU.KOI8-R locale,
and the forein_persons names with the C locale.
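The requirement can be sketched outside the database. A minimal sketch, assuming names are stored as KOI8-R bytes: sorting the raw bytes (what a C-locale comparison does) does not give Russian alphabetical order, because KOI8-R places Cyrillic letters by Latin transliteration rather than alphabetically (Ю is 0xE0, А is 0xE1). Decoding first and comparing Unicode code points gives the expected order for these names, but it only approximates a real ru_RU collation (it mis-places Ё, for instance). The name lists and `russian_key` are hypothetical.

```python
russian_names = [name.encode("koi8_r") for name in ["Юрий", "Анна", "Борис"]]
foreign_names = [b"smith", b"Adams", b"baker"]


def russian_key(name: bytes) -> str:
    # Approximate Russian order: decode KOI8-R, then compare code points.
    return name.decode("koi8_r")


# C-locale style: compare raw bytes (default ordering of bytes objects).
print([n.decode("koi8_r") for n in sorted(russian_names)])
# ['Юрий', 'Анна', 'Борис']  (byte order, wrong alphabetically)
print([n.decode("koi8_r") for n in sorted(russian_names, key=russian_key)])
# ['Анна', 'Борис', 'Юрий']
print(sorted(foreign_names))
# [b'Adams', b'baker', b'smith']
```

Choosing the comparison per list is the idea behind per-table or per-column locales; in a single 7.4 cluster, every comparison uses the one locale fixed at initdb.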

Best wishes,
E.R.


Re: Encoding problem with 7.4

From
Andrew Dunstan
Date:
E.Rodichev wrote:

>On Wed, 3 Dec 2003, Stephan Szabo wrote:
>
>>The locale settings depend on LC_* at initdb time only. When the
>>postmaster starts it sets the locale based on the stored values from
>>initdb, not on the current environment.
>>
>>With an SQL_ASCII database being accessed from a client with
>>client_encoding set to SQL_ASCII (which it should be if you aren't setting
>>it) the byte values of a string are passed along with no conversion for
>>the encoding.  This means that from within one environment you should get
>>back what you put in, so it might *look* like it's KOI8-R if that's what
>>you're in, but it's not because someone accessing it from say an ISO8859-1
>>system may see something different.
>
>As a result, the possibility to control encodings and locales looks as
>follows:
>
>            initdb   createdb     psql
>Encoding:      Y         Y          Y
>Locale:        Y         N          N
>
>It seems that a more natural scheme would be
>
>            initdb   createdb     psql
>Encoding:      Y         Y          Y
>Locale:        Y         Y          Y
>
>Now the possibility to use different encodings for createdb and psql is
>a bit strange... Also, it is impossible to have different locales
>for different databases within one cluster, and it is impossible to use
>different locales within one database. The latter is even more critical.
>One reason is that sorting under the C locale is much more efficient than
>sorting under other locales (10-50 times faster for some implementations!).
>Another reason is that for some applications it is _necessary_ to use a different
>sort order for different tables. For example, I may have two tables,
>russian_persons and forein_persons, and I'd like to print the sorted list
>of persons. The russian_persons names must be sorted with the ru_RU.KOI8-R locale,
>and the forein_persons names with the C locale.
>

See the Multi-Language Support section of the TODO list at
http://developer.postgresql.org/todo.php. Note that it specifies
per-column locales rather than per-table ones, which should be even more useful.

Most of these items have no names against them, meaning you could work 
on them ...

cheers

andrew




Re: Encoding problem with 7.4

From
Stephan Szabo
Date:
On Thu, 4 Dec 2003, E.Rodichev wrote:

> On Wed, 3 Dec 2003, Stephan Szabo wrote:
>
> > The locale settings depend on LC_* at initdb time only. When the
> > postmaster starts it sets the locale based on the stored values from
> > initdb, not on the current environment.
> >
> > With an SQL_ASCII database being accessed from a client with
> > client_encoding set to SQL_ASCII (which it should be if you aren't setting
> > it) the byte values of a string are passed along with no conversion for
> > the encoding.  This means that from within one environment you should get
> > back what you put in, so it might *look* like it's KOI8-R if that's what
> > you're in, but it's not because someone accessing it from say an ISO8859-1
> > system may see something different.
>
> As a result, the possibility to control encodings and locales looks as
> follows:
>
>             initdb   createdb     psql
> Encoding:      Y         Y          Y

As a note you can change the *client* encoding from psql, not the *server*
encoding.  They're also two separate notions.

Andrew already commented on the TODO list.  You may also wish to look
through the archives for a recent message from Peter E on the subject as
he was looking into starting towards multiple collations and such.