Thread: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
ZEUGSWETTER Andreas IZ5
Date:
> > Perhaps the locale data can be used to gather this information?
> 
> It's certainly there ! locale data contains all information about
> specific character set and encoding. Is it possible to use it to create
> indices ? It should be slower but benefits of using indices will cover
> expenses for non-US people. I didn't notice such behaivour in Informix
> and Oracle. 
> 
Informix has the national character handling, and it is indexable in
Informix.
But it is not done for the standard types char and varchar. In Informix
you use nchar and nvarchar, and have one locale defined per database. 
In Oracle you have national characters, but access is not indexable. 
Actually the SQL standard has something to say about national char 
and varchar. I think it is wrong that char and varchar change their behavior
in postgresql, if you enable locale. A locale sensitive column needs to be
specified
as such in the create table statement according to the standard.
I never enable locale.

Andreas


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Oleg Bartunov
Date:
On Wed, 9 Jun 1999, ZEUGSWETTER Andreas IZ5 wrote:

> Date: Wed, 9 Jun 1999 19:16:14 +0200 
> From: ZEUGSWETTER Andreas IZ5 <Andreas.Zeugswetter@telecom.at>
> To: 'Oleg Bartunov' <oleg@sai.msu.su>
> Cc: "'hackers@postgresql.org'" <hackers@postgresql.org>
> Subject: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem 
> 
> 
> > > Perhaps the locale data can be used to gather this information?
> > 
> > It's certainly there ! locale data contains all information about
> > specific character set and encoding. Is it possible to use it to create
> > indices ? It should be slower but benefits of using indices will cover
> > expenses for non-US people. I didn't notice such behaivour in Informix
> > and Oracle. 
> > 
> Informix has the national character handling, and it is indexable in
> Informix.
> But it is not done for the standard types char and varchar. In Informix
> you use nchar and nvarchar, and have one locale defined per database. 
> In Oracle you have national characters, but access is not indexable. 
> Actually the SQL standard has something to say about national char 
> and varchar. I think it is wrong that char and varchar change their behavior
> in postgresql, if you enable locale. A locale sensitive column needs to be
> specified
> as such in the create table statement according to the standard.

Thanks for explanations. It would be great if I could specify for specific
columns if it needs locale. For now I have to pay decreasing in speed
just to enable locale for the only column of one database !!!
I always dream to be able to specify on fly 'SET LOCALE to ON|OFF'
It's a bit separate problem, but probably more easy to implement.
Regards,           Oleg


> I never enable locale.
> 
> Andreas
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Hannu Krosing
Date:
Oleg Bartunov wrote:
> 
> 
> Thanks for explanations. It would be great if I could specify for specific
> columns if it needs locale. For now I have to pay decreasing in speed
> just to enable locale for the only column of one database !!!
> I always dream to be able to specify on fly 'SET LOCALE to ON|OFF'
> It's a bit separate problem, but probably more easy to implement.

Actually the locale should be part of field definition (like length and 
precision currently are) with default of ascii. 

And also it may be probably safest to maintain our own locale defs (as
Oracle 
currently does)so that things don't break if somebody changes the system
ones.

-------------
Hannu


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Bruce Momjian
Date:
> Oleg Bartunov wrote:
> > 
> > 
> > Thanks for explanations. It would be great if I could specify for specific
> > columns if it needs locale. For now I have to pay decreasing in speed
> > just to enable locale for the only column of one database !!!
> > I always dream to be able to specify on fly 'SET LOCALE to ON|OFF'
> > It's a bit separate problem, but probably more easy to implement.
> 
> Actually the locale should be part of field definition (like length and 
> precision currently are) with default of ascii. 
> 
> And also it may be probably safest to maintain our own locale defs (as
> Oracle 
> currently does)so that things don't break if somebody changes the system
> ones.

Added to TODO:
* Allow LOCALE on a per-column basis, default to ASCII

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> Actually the SQL standard has something to say about national char
> and varchar. I think it is wrong that char and varchar change their 
> behavior in postgresql, if you enable locale. A locale sensitive column 
> needs to be specified as such in the create table statement according to 
> the standard. I never enable locale.

I had some discussions on the list a while ago (6 months?) about this
topic, but never really got feedback from "locale-using" people that
NATIONAL CHARACTER and collation sequences are an acceptable solution.
istm that Postgres' extensibility would make this *very* easy to
implement and extend, and that then everyone would get the same
behavior from CHAR while being able to get the behaviors they need
from a variety of other character sets.

I do have an interest in implementing or helping with something, but
since I don't have to live with the consequences of the results
(coming from an ASCII country :) it seemed to be poor form to push it
without feedback from others...
                     - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Oleg Broytmann
Date:
On Thu, 10 Jun 1999, Thomas Lockhart wrote:
> I had some discussions on the list a while ago (6 months?) about this
> topic, but never really got feedback from "locale-using" people that
> NATIONAL CHARACTER and collation sequences are an acceptable solution.
  What feedback do you want? I am pretty sure two Olegs on this list are
ready to continue the discussion.

> istm that Postgres' extensibility would make this *very* easy to
> implement and extend, and that then everyone would get the same
> behavior from CHAR while being able to get the behaviors they need
> from a variety of other character sets.
> 
> I do have an interest in implementing or helping with something, but
> since I don't have to live with the consequences of the results
> (coming from an ASCII country :) it seemed to be poor form to push it
> without feedback from others...
> 
>                       - Thomas
> 
> -- 
> Thomas Lockhart                lockhart@alumni.caltech.edu
> South Pasadena, California

Oleg.
----    Oleg Broytmann     http://members.xoom.com/phd2/     phd2@earthling.net          Programmers don't die, they
justGOSUB without RETURN.
 



Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> What feedback do you want? I am pretty sure two Olegs on this list are
> ready to continue the discussion.

istm that the Russian and Japanese contingents could represent the
needs of multibyte and locale concerns very well. So, we should ask
ourselves some questions to see if we can make *progress* in evolving
our text handling, rather than just staying the same forever.

SQL92 suggests some specific text handling features to help with
non-ascii applications. "national character" is, afaik, the feature
which would hold an installation-wide local text type. "collations"
would allow other text types in the same installation, but SQL92 is a
bit fuzzier about how to make them work.

Would these mechanisms work for people? Or are they so fundamentally
flawed or non-standard (it is from a standard, but I'm not sure who
implements it)?
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Tatsuo Ishii
Date:
> istm that the Russian and Japanese contingents could represent the
> needs of multibyte and locale concerns very well. So, we should ask
> ourselves some questions to see if we can make *progress* in evolving
> our text handling, rather than just staying the same forever.
> 
> SQL92 suggests some specific text handling features to help with
> non-ascii applications. "national character" is, afaik, the feature
> which would hold an installation-wide local text type. "collations"
> would allow other text types in the same installation, but SQL92 is a
> bit fuzzier about how to make them work.
> 
> Would these mechanisms work for people? Or are they so fundamentally
> flawed or non-standard (it is from a standard, but I'm not sure who
> implements it)?

In my opinion, introducing NCHAR is useful at least for single byte
codes. Although I'm not familiar with single byte languages, I see
strong demands for NCHAR through recent discussions.

I don't mean it's useless for multibyte, however. As far as I know
locales for multibyte provided by any OS are totally broken especially
in COLLATE(I seriously doubt existing locale framework work for
multibyte). It would be nice to have our own locale data for COLLATE
somewhere in our system like some commercial dbms do, or even better
user defined collations allowed (this is already in the standard).

I have a feeling that I'm going to implement CREATE CHARSET through
slightly modifying the multibyte support code that currently only
allows predefined charset. That will be the first step toward CREATE
COLLATE, NCHAR etc....
---
Tatsuo Ishii


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Oleg Broytmann
Date:
On Thu, 10 Jun 1999, Thomas Lockhart wrote:
> istm that the Russian and Japanese contingents could represent the
> needs of multibyte and locale concerns very well. So, we should ask
> ourselves some questions to see if we can make *progress* in evolving
> our text handling, rather than just staying the same forever.
  Ok, we are here.  And what a pros and cons for NCHAR?

> SQL92 suggests some specific text handling features to help with
> non-ascii applications. "national character" is, afaik, the feature
  What the help?

> which would hold an installation-wide local text type. "collations"
> would allow other text types in the same installation, but SQL92 is a
> bit fuzzier about how to make them work.
> 
> Would these mechanisms work for people? Or are they so fundamentally
> flawed or non-standard (it is from a standard, but I'm not sure who
> implements it)?
> 
>                     - Thomas
> 
> -- 
> Thomas Lockhart                lockhart@alumni.caltech.edu
> South Pasadena, California

Oleg.
----    Oleg Broytmann     http://members.xoom.com/phd2/     phd2@earthling.net          Programmers don't die, they
justGOSUB without RETURN.
 



Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Dmitry Samersoff
Date:
On 11-Jun-99 Oleg Broytmann wrote:
> On Thu, 10 Jun 1999, Thomas Lockhart wrote:
>> istm that the Russian and Japanese contingents could represent the
>> needs of multibyte and locale concerns very well. So, we should ask
>> ourselves some questions to see if we can make *progress* in evolving
>> our text handling, rather than just staying the same forever.
> 
>    Ok, we are here.
>    And what a pros and cons for NCHAR?
> 
>> SQL92 suggests some specific text handling features to help with
>> non-ascii applications. "national character" is, afaik, the feature
> 
>    What the help?
> 
>> which would hold an installation-wide local text type. "collations"
>> would allow other text types in the same installation, but SQL92 is a
>> bit fuzzier about how to make them work.
>> 
>> Would these mechanisms work for people? Or are they so fundamentally
>> flawed or non-standard (it is from a standard, but I'm not sure who
>> implements it)?

There are two different problems under "locale" cover.

First is national charset handling, mostly sorting,
I try to find official recomendation for it. 
Another one is encoding conversion -   it's unlimited field for discussion.

IMHO, conversion table have to be stored on per host basic, 
some where near from HBA. 

Sort table is to be placed near from createdb. 
Custom sort table is also very attractive for me.


---
Dmitry Samersoff, dms@wplus.net, ICQ:3161705
http://devnull.wplus.net
* There will come soft rains ...


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Oleg Broytmann
Date:
Hi!

On Fri, 11 Jun 1999, Dmitry Samersoff wrote:
> IMHO, conversion table have to be stored on per host basic, 
> some where near from HBA. 
  Currently you can do this with per-host (SQL command) SET
CLIENT_ENCODING or (env var) export PGCLIENTENCODING=...

> ---
> Dmitry Samersoff, dms@wplus.net, ICQ:3161705
> http://devnull.wplus.net
> * There will come soft rains ...

Oleg.
----    Oleg Broytmann     http://members.xoom.com/phd2/     phd2@earthling.net          Programmers don't die, they
justGOSUB without RETURN.
 



Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> > istm that the Russian and Japanese contingents could represent the
> > needs of multibyte and locale concerns very well. So, we should ask
> > ourselves some questions to see if we can make *progress* in evolving
> > our text handling, rather than just staying the same forever.
>    Ok, we are here.
>    And what a pros and cons for NCHAR?

I was hoping you would tell me! :)

> > SQL92 suggests some specific text handling features to help with
> > non-ascii applications.
>    What the help?

OK, SQL92 defines two kinds of native character sets: those we already
have (char, varchar) and those which can be locale customized (nchar,
national character varying, and others). char and varchar always
default to the "SQL" behavior (which I think corresponds to ascii
(called "SQL_TEXT") but I didn't bother looking for the details).

So, at its simplest, there would be two sets of character types, with
char, varchar, etc, always the same on every system (just like
Postgres w/o multibyte or locale), and nchar, nvarchar, etc configured
as your locale/multibyte environment would want.

However, there are many more features in SQL92 to deal with text
customization. I'll mention a few (well, most of them, but not in
detail):

o You can define a "character set" and, independently, a "collation".
The syntax for the type definition is CHARACTER [ VARYING ] [ (length) ]   [ CHARACTER SET your-character-set ]   [
COLLATEyour-collation-sequence ]
 

o You can specify a character type for string literals: _your-character-set 'literal string' e.g. _ESPANOL 'Que pasa?'
(forgivemy omission of a leading upside down question mark :)
 
We already have some support for this in that character string
literals can have a type specification (e.g. "DATETIME 'now'") and
presumably we can use the required underscore to convert the
"_ESPANOL" to a character set and collation, all within the existing
Postgres type system.

o You can specify collation behavior in a strange way: 'Yo dude!' COLLATE collation-method
(which we could convert in the parser to a function call).

o You can translate between character sets, *if* there is a reasonable
mapping available: TRANSLATE(string USING method)
and you can define translations in a vague way (since no one but
Postgres implemented a type system back then): CREATE TRANSLATION translation   FOR source-charset   TO target-charset
FROM { EXTERNAL('external-translation') | IDENTITY |
 
existing-translation } DROP TRANSLATION translation

o You can convert character strings which have the same character
"repertoire" from one to the other: CONVERT(string USING conversion-method)
(e.g. we could define a method "EBCDIC_TO_ASCII" once we have an
"EBCDIC" character set).

o You can specify identifiers (column names, etc) with a specific
character set/collation by: _charset colname (e.g. _FRANCAIS Francais where the second "c" is
allowed to be "c-cedilla", a character in the French/latin character
set; sorry I didn't type it).

> > Would these mechanisms work for people? Or are they so fundamentally
> > flawed or non-standard (it is from a standard, but I'm not sure who
> > implements it)?

Fully implementing these features (or a reasonable subset) would give
us more capabilities than we have now, and imho can be fit into our
existing type system. *Only* implementing NCHAR etc gives us the
ability to carry SQL_TEXT and multibyte/locale types in the same
database, which may not be a huge benefit to those who never want to
mix them in the same installation. I don't know who those folks might
be but Tatsuo and yourself probably do.

Comments?
                      - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Oleg Broytmann
Date:
Hi!

On Fri, 11 Jun 1999, Thomas Lockhart wrote:
> >    And what a pros and cons for NCHAR?
> I was hoping you would tell me! :)
  I can see only one advantage for NCHAR - those fields that aren't NCHAR
will not use strcoll() for comparison.  But I cannot remember one filed in my database that does not contain
russian characters. Even my WWW logs contain them.  So in any case I am forced to make all my fields NCHAR, and this
is
exactly what we have now - postgres compiled with --enable-locale makes all
char NCHAR.

>                        - Thomas
> -- 
> Thomas Lockhart                lockhart@alumni.caltech.edu
> South Pasadena, California

Oleg.
----    Oleg Broytmann     http://members.xoom.com/phd2/     phd2@earthling.net          Programmers don't die, they
justGOSUB without RETURN.
 



Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> I can see only one advantage for NCHAR - those fields that aren't NCHAR
> will not use strcoll() for comparison.
> But I cannot remember one filed in my database that does not contain
> russian characters. Even my WWW logs contain them.
> So in any case I am forced to make all my fields NCHAR, and this is
> exactly what we have now - postgres compiled with --enable-locale makes 
> all char NCHAR.

Yes, and that is how we got the implementation we have. Implementing
NCHAR is a step on the road toward having fully flexible character set
capabilities in a single database. By itself, NCHAR probably does not
offer tremendous advantages for anyone running a fully "localized"
database.

So some the questions really might be:
1) Is implementing NCHAR, and reverting CHAR back to the SQL-standard
ascii-ish behavior, acceptable, or does it introduce fatal flaws for
implementers? e.g. do any third party tools know about NCHAR? I would
assume that the odbc interface could just map NCHAR to CHAR if odbc
knows nothing about NCHAR...

2) Solving various problems for specific datasets will require new
specialized support routines. If this is true, then isn't the Postgres
type system the way to introduce these specialized capabilities?
Doesn't Unicode, for example, work well as a new data type, as opposed
to shoehorning it into all areas of the backend with #ifdefs?

3) Do the SQL92-defined features help us solve the problem, or do they
just get in the way? istm that they address some of the features we
would need, and have sufficient fuzz around the edges to allow a
successful implementation.

An example of what we could do would be to have both Russian/Cyrillic
and Japanese regression tests in the main regression suite, since they
could coexist with the other tests.
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Hannu Krosing
Date:
Oleg Broytmann wrote:
> 
> Hi!
> 
> On Fri, 11 Jun 1999, Thomas Lockhart wrote:
> > >    And what a pros and cons for NCHAR?
> > I was hoping you would tell me! :)
> 
>    I can see only one advantage for NCHAR - those fields that aren't NCHAR
> will not use strcoll() for comparison.
>    But I cannot remember one filed in my database that does not contain
> russian characters. Even my WWW logs contain them.

what about the tables beginning with pg_ ?

Are the system tables currently affected by --enable-locale ?

>    So in any case I am forced to make all my fields NCHAR, and this is
> exactly what we have now - postgres compiled with --enable-locale makes all
> char NCHAR.

Well, the problem is that while I do occasionally need cyrillic chars, 
I also need English, Estonian, Finnish/Swedish, Latvian and Lithuanian.

The only two of them that don't have overlapping character codes are
Russian (all chars >127) and English (all < 128)

My current solution is to run without --enable-locale and do all the
sorting
in the client. But it would be often useful to have language specific
columns.

--------------------
Hannu


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> Well, the problem is that while I do occasionally need cyrillic chars,
> I also need English, Estonian, Finnish/Swedish, Latvian and Lithuanian.
> The only two of them that don't have overlapping character codes are
> Russian (all chars >127) and English (all < 128)
> My current solution is to run without --enable-locale and do all the
> sorting
> in the client. But it would be often useful to have language specific
> columns.

Great! You're our representative for multi-charset usage :)

Perhaps you can think about and comment on the SQL92 features which
support this, and whether that kind of capability would suit your
needs. Of course, we would need to start getting specific about how to
map those generic features into a Postgres implementation...

btw, istm that we could make things somewhat backward compatible by
allowing the backend to be built with CHAR and NCHAR always mapped to
NCHAR.
                   - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Tatsuo Ishii
Date:
> OK, SQL92 defines two kinds of native character sets: those we already
> have (char, varchar) and those which can be locale customized (nchar,
> national character varying, and others). char and varchar always
> default to the "SQL" behavior (which I think corresponds to ascii
> (called "SQL_TEXT") but I didn't bother looking for the details).

This seems to be a little bit different from the standard. First,
SQL_TEXT is not equal to ascii. It's a subset of ascii. Second, the
default charset for char and varchar might be implemenation dependent,
not neccesarily limited to SQL_TEXT. The only requirement is the
charset must contain the repertoire SQL_TEXT has. I think any charsets
including ascii I've ever seen satisfies the requirement. Third, the
standards says nothing about locale.
---
Tatsuo Ishii


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Tatsuo Ishii
Date:
> Well, the problem is that while I do occasionally need cyrillic chars, 
> I also need English, Estonian, Finnish/Swedish, Latvian and Lithuanian.

Probably this is the first example ever appeared on this list for the
demand of i18n database, that should be the future direction of
PostgreSQL, in my opinion.

Currently MB has two encodings for this kind of purpose: Unicode and
mule-internal-code. Both of them allows mixed languages even in the
same column. This might give you a partial solution.

> The only two of them that don't have overlapping character codes are
> Russian (all chars >127) and English (all < 128)
> 
> My current solution is to run without --enable-locale and do all the
> sorting
> in the client. But it would be often useful to have language specific
> columns.

This is another reason why we cannot rely on the locale mechanism
supported by os. Basically locale is a global data for the entire
process. If each column has different language, we have to switch
locales. That would be inefficient and painful.
---
Tatsuo Ishii


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> This seems to be a little bit different from the standard. First,
> SQL_TEXT is not equal to ascii. It's a subset of ascii.

Yes, sorry. I was lazy in my posting.

> Second, the
> default charset for char and varchar might be implemenation dependent,
> not neccesarily limited to SQL_TEXT. The only requirement is the
> charset must contain the repertoire SQL_TEXT has. I think any charsets
> including ascii I've ever seen satisfies the requirement.

Yow! I certainly misremembered the definition. Date and Darwen, 1997,
point out that the SQL implementation *must* support at least one
character set, SQL_TEXT, whose repertoire must contain:

1) Every character that is used in the SQL language itself (this is
the part I remembered), and

2) Every character that is included in *any other character set*
supported by the SQL implementation (Postgres).

This second requirement is presumably to enable text handling of
multiple character sets, but would seem to put severe restrictions on
how we would implement things. Or can it act only as a placeholder,
allowing us to define new character sets as different types in
Postgres? Otherwise, we would have to retrofit capabilities into
SQL_TEXT anytime we defined a new character set??

> Third, the
> standards says nothing about locale.

You are referring to the Unix-style system support for "locale"?
Certainly the NCHAR and character set support in SQL92 would qualify
as locale support in the generic sense...

Regards.
                      - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Tatsuo Ishii
Date:
> > Second, the
> > default charset for char and varchar might be implemenation dependent,
> > not neccesarily limited to SQL_TEXT. The only requirement is the
> > charset must contain the repertoire SQL_TEXT has. I think any charsets
> > including ascii I've ever seen satisfies the requirement.
> 
> Yow! I certainly misremembered the definition. Date and Darwen, 1997,
> point out that the SQL implementation *must* support at least one
> character set, SQL_TEXT, whose repertoire must contain:
> 
> 1) Every character that is used in the SQL language itself (this is
> the part I remembered), and
> 
> 2) Every character that is included in *any other character set*
> supported by the SQL implementation (Postgres).
> 
> This second requirement is presumably to enable text handling of
> multiple character sets, but would seem to put severe restrictions on
> how we would implement things. Or can it act only as a placeholder,
> allowing us to define new character sets as different types in
> Postgres? Otherwise, we would have to retrofit capabilities into
> SQL_TEXT anytime we defined a new character set??

I don't think so. 2) can be read as:

Any other character set must contain every character included in
SQL_TEXT.

This seems extremely easy to implement. We could define SQL_TEXT be a
subset of ASCII and almost any character set contains ASCII chars. As
a result, any character set satisfies above that is logically same as
2). No?

> > Third, the
> > standards says nothing about locale.
> 
> You are referring to the Unix-style system support for "locale"?

Yes.

> Certainly the NCHAR and character set support in SQL92 would qualify
> as locale support in the generic sense...
---
Tatsuo Ishii


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Thomas Lockhart
Date:
> > SQL_TEXT ... repertoire must contain:
> > 1) Every character that is used in the SQL language itself (this is
> > the part I remembered), and
> > 2) Every character that is included in *any other character set*
> > supported by the SQL implementation (Postgres).
> > This second requirement is presumably to enable text handling of
> > multiple character sets, but would seem to put severe restrictions on
> > how we would implement things. Or can it act only as a placeholder,
> > allowing us to define new character sets as different types in
> > Postgres? Otherwise, we would have to retrofit capabilities into
> > SQL_TEXT anytime we defined a new character set??
> I don't think so. 2) can be read as:
> Any other character set must contain every character included in
> SQL_TEXT.

Here is the text from the July, 1992 SQL92 draft standard:
        The <implementation-defined character repertoire name>
SQL_TEXT        specifies the name of a character repertoire and implied
form-of-        use that can represent every character that is in <SQL
language        character> and all other characters that are in character
sets        supported by the implementation.

and later in the same doc:
        11)The character set named SQL_TEXT is an
implementation-defined           character set whose character repertoire is SQL_TEXT.

I'm reading this to say that SQL_TEXT must contain the union of all
characters in the character sets in the implementation, rather than an
intersection between that union and the characters required by the SQL
language itself.

But I'm not really sure what they mean by this, or whether it is a
problem or not. Clearly different character sets and collations can be
mixed only when that can preserve meaning, so saying that SQL_TEXT has
a repertoire which contains ASCII characters and Japanese characters
doesn't seem to help much.

So istm that "SQL_TEXT" might be just a container class for all
characters in the installation, which still doesn't make complete
sense to me wrt a Postgres implementation.
                     - Tom

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California