Thread: Proposal for detecting encoding mismatch in initdb

Proposal for detecting encoding mismatch in initdb

From
Peter Eisentraut
Date:
I've worked out a scheme that should adequately detect encoding 
mismatches in initdb.  Please comment on the following behavior.

The locale is still taken from the environment or the command line; no 
change.

If the locale is C or POSIX, then we set the encoding to SQL_ASCII or 
whatever was specified on the command line, and do nothing further.  
(No useful matching can be done in this case.)

If the locale is not C or POSIX:

If the encoding is specified, check for compatibility.  If not 
compatible, print a warning.  Continue in any case.

If the encoding was not specified, pick a matching one, print it out, 
and continue.  (This is probably the most common case.)

If no matching encoding could be found, print an error message asking 
the user to set one explicitly, and exit.
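
The rules above can be sketched as a small decision function.  This is 
only an illustration in Python, not the initdb code: the names 
choose_encoding, NoMatchingEncoding and TABLE are hypothetical, and the 
lookup table stands in for whatever matching initdb actually performs.  
It also assumes that a locale with no known encoding counts as a 
mismatch when an encoding is given explicitly.

```python
class NoMatchingEncoding(Exception):
    """Raised when no encoding can be derived from the locale."""


def choose_encoding(locale, requested, locale_to_encoding):
    """Return (encoding, warnings) following the proposed rules.

    locale_to_encoding is a hypothetical lookup table that maps a
    locale name to the encoding considered to match it.
    """
    warnings = []

    # C or POSIX locale: no useful matching can be done.
    if locale in ("C", "POSIX"):
        return (requested or "SQL_ASCII"), warnings

    matching = locale_to_encoding.get(locale)

    if requested is not None:
        # Explicit encoding: check it, warn on mismatch, continue anyway.
        # Assumption: an unknown locale encoding also triggers the warning.
        if matching != requested:
            warnings.append("encoding mismatch")
        return requested, warnings

    # No encoding given: pick the matching one, or give up with an error.
    if matching is None:
        raise NoMatchingEncoding(
            'could not find suitable encoding for locale "%s"' % locale)
    return matching, warnings


# Hypothetical table covering only the locale used in the examples below.
TABLE = {"de_DE@euro": "LATIN9"}
```

With this table, choose_encoding("de_DE@euro", None, TABLE) selects 
LATIN9, passing "UNICODE" explicitly keeps UNICODE but records a 
warning, and "japanese.sjis" without an explicit encoding raises the 
error.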


Here are some "screenshots":

$ initdb -D pg-install/var/data --locale=de_DE@euro
[...]
The database cluster will be initialized with locale de_DE@euro.
The default database encoding has accordingly been set to LATIN9.


$ initdb -D pg-install/var/data --locale=de_DE@euro --encoding=UNICODE
[...]
The database cluster will be initialized with locale de_DE@euro.
initdb: warning: encoding mismatch
The encoding you selected (UNICODE) and the encoding that the selected
locale uses (ISO-8859-15) are not known to match.  This may lead to
misbehavior in various character string processing functions.  To fix
this situation, rerun initdb and either do not specify an encoding
explicitly, or choose a matching combination.
[continues...]


$ initdb -D pg-install/var/data --locale=japanese.sjis
[...]
The database cluster will be initialized with locale japanese.sjis.
initdb: could not find suitable encoding for locale "japanese.sjis"
Rerun initdb with the -E option.
Try "initdb --help" for more information.
[exit 1]

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: Proposal for detecting encoding mismatch in initdb

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I've worked out a scheme that should adequately detect encoding 
> mismatches in initdb.  Please comment on the following behavior.

The behavioral description sounds fine, but I was eagerly awaiting your
description of exactly how you'd test for compatibility or search for
a compatible encoding ... without that algorithm the whole thing's moot.

BTW, what happens if there is more than one apparently-matching
encoding?  (It might be best to error out in this case, on the theory
that we evidently don't have a correct matching.)
        regards, tom lane


Re: Proposal for detecting encoding mismatch in initdb

From
Peter Eisentraut
Date:
Tom Lane wrote:
> The behavioral description sounds fine, but I was eagerly awaiting
> your description of exactly how you'd test for compatibility or
> search for a compatible encoding ... without that algorithm the whole
> thing's moot.

It's just an explicit list of names that are spelled similarly.  There's 
not much more we can do, but I don't see any obvious candidates where 
this could lead to trouble.
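
A minimal sketch of what such an explicit list could look like, in 
Python for brevity.  The names ENCODING_MATCH_LIST, 
find_matching_encoding and is_compatible are hypothetical; only the 
LATIN9 / ISO-8859-15 pair appears in the examples earlier in the 
thread, the remaining entries are illustrative, and the 
case-insensitive comparison is an assumption.

```python
# Hypothetical list of (database encoding, locale codeset spelling).
# Only LATIN9 / ISO-8859-15 comes from the thread; the other rows are
# illustrative assumptions about alternative spellings.
ENCODING_MATCH_LIST = [
    ("LATIN9", "ISO-8859-15"),
    ("LATIN9", "ISO8859-15"),
    ("LATIN1", "ISO-8859-1"),
    ("LATIN1", "ISO8859-1"),
    ("UNICODE", "UTF-8"),
    ("UNICODE", "utf8"),
]


def find_matching_encoding(codeset):
    """Return the encoding listed for a codeset name, or None."""
    wanted = codeset.lower()  # assumption: spellings compare case-insensitively
    for encoding, spelling in ENCODING_MATCH_LIST:
        if spelling.lower() == wanted:
            return encoding
    return None


def is_compatible(encoding, codeset):
    """True if the list pairs this encoding with the locale's codeset."""
    return find_matching_encoding(codeset) == encoding
```

A codeset that is not in the list, such as one for an SJIS locale, 
simply yields None, which corresponds to the "could not find suitable 
encoding" case.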

> BTW, what happens if there is more than one apparently-matching
> encoding?  (It might be best to error out in this case, on the theory
> that we evidently don't have a correct matching.)

I just won't put something like that into the list.
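
That property can also be checked mechanically.  The following is a 
sketch under the assumption that the list is a sequence of (encoding, 
spelling) pairs compared case-insensitively; ambiguous_spellings is a 
hypothetical helper, not something that exists in initdb.

```python
def ambiguous_spellings(match_list):
    """Return the spellings that are listed for more than one encoding."""
    seen = {}
    for encoding, spelling in match_list:
        # Group encodings by lower-cased spelling.
        seen.setdefault(spelling.lower(), set()).add(encoding)
    return sorted(s for s, encodings in seen.items() if len(encodings) > 1)
```

An empty result means every spelling in the list points at exactly one 
encoding.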

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: Proposal for detecting encoding mismatch in initdb

From
Peter Eisentraut
Date:
I wrote:
> I've worked out a scheme that should adequately detect encoding
> mismatches in initdb.

Done.

Karel pointed me to some other projects that are trying to do the same 
thing, and they are no smarter than what we have now.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/