> I can see only one advantage for NCHAR - those fields that aren't NCHAR
> will not use strcoll() for comparison.
> But I cannot remember a single field in my database that does not
> contain Russian characters. Even my WWW logs contain them.
> So in any case I am forced to make all my fields NCHAR, and this is
> exactly what we have now - Postgres compiled with --enable-locale
> effectively makes every CHAR an NCHAR.
Yes, and that is how we got the implementation we have. Implementing
NCHAR is a step on the road toward having fully flexible character set
capabilities in a single database. By itself, NCHAR probably does not
offer tremendous advantages for anyone running a fully "localized"
database.
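
To make the quoted tradeoff concrete, here is a minimal sketch (plain
C, not the actual backend routines) of the two comparison paths; with
--enable-locale every CHAR comparison takes the strcoll() path:

    #include <stdio.h>
    #include <string.h>
    #include <locale.h>

    int
    main(void)
    {
        const char *a = "a";
        const char *b = "B";

        /* SQL-standard CHAR: plain byte-wise comparison
         * ('B' is 0x42, 'a' is 0x61, so strcmp says a > b) */
        printf("strcmp:  %d\n", strcmp(a, b));

        /* NCHAR (or CHAR under --enable-locale): collation comes
         * from LC_COLLATE, set here from the environment; many
         * locales interleave upper and lower case, so the sign
         * of the result can flip */
        setlocale(LC_COLLATE, "");
        printf("strcoll: %d\n", strcoll(a, b));
        return 0;
    }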
So some of the real questions might be:
1) Is implementing NCHAR, and reverting CHAR to the SQL-standard
ASCII-ish behavior, acceptable, or does it introduce fatal flaws for
implementers? e.g. do any third-party tools know about NCHAR? I would
assume that the ODBC interface could just map NCHAR to CHAR if ODBC
knows nothing about NCHAR...
2) Solving various problems for specific data sets will require new
specialized support routines. If so, isn't the Postgres type system
the natural way to introduce these specialized capabilities? Doesn't
Unicode, for example, work better as a new data type than when
shoehorned into all areas of the backend with #ifdefs? (A sketch of
this approach follows the list.)
3) Do the SQL92-defined features help us solve the problem, or do they
just get in the way? It seems to me that they address some of the
features we would need, and have sufficient fuzz around the edges to
allow a successful implementation.
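
As a sketch of the type-system approach in (2): the encoding- and
locale-specific behavior would live entirely in the support routines
registered for the new type, so the rest of the backend never sees it.
Something like the following (plain C, all names hypothetical, and
glossing over the fmgr/CREATE TYPE plumbing):

    #include <stdlib.h>
    #include <wchar.h>

    /*
     * Comparison support routine for a hypothetical "unicode" type.
     * Assumes NUL-terminated multibyte input in the current locale's
     * encoding; the fixed buffers are for the sketch only.
     */
    static int
    unicode_cmp(const char *a, const char *b)
    {
        wchar_t wa[256], wb[256];

        if (mbstowcs(wa, a, 256) == (size_t) -1 ||
            mbstowcs(wb, b, 256) == (size_t) -1)
            return 0;           /* invalid input; real code would elog() */

        return wcscoll(wa, wb);
    }

CREATE TYPE and CREATE OPERATOR would then tie such a routine to the
type's comparison operators, and nothing outside the type itself needs
an #ifdef.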
For example, we could include both Russian/Cyrillic and Japanese
regression tests in the main regression suite, since they could
coexist with the other tests.
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California