Thread: Re: Proposal for fixing numeric type-resolution issues
> Here is a proposal for fixing these problems.

Sounds good. We would be looking up this info in a table, right? So we
can integrate this type hierarchy fully into our type extensibility
system.

Another 7.1 project is to work on alternate languages and character
sets, to decouple multibyte and locale from the default SQL_TEXT
character set. This will probably bring up issues similar to the numeric
problems, and since these character sets will be added as user-defined
types it will be important for the backend to understand how to convert
them for comparison operations, for example.

                    - Thomas

--
Thomas Lockhart   lockhart@alumni.caltech.edu   South Pasadena, California
Thomas Lockhart writes:

> Another 7.1 project is to work on alternate languages and character
> sets, to decouple multibyte and locale from the default SQL_TEXT
> character set. This will probably bring up issues similar to the
> numeric problems, and since these character sets will be added as
> user-defined types it will be important for the backend to understand
> how to convert them for comparison operations, for example.

Really? I always thought the character set would be some separate entity,
and perhaps an oid reference would be stored with every character string
and attribute. That would get you around any type conversion, as long as
the functions acting on character types take this "header" field into
account.

If you want to go the data-type route then you'd need to have some sort
of most general character set to cast to. That could be Unicode, but
that would require that every user-defined character set be a subset of
Unicode, which is perhaps not a good assumption to make. Also, I wonder
how collations would fit in there. Collations can't really be ordered at
all, so casting can't be done in a controlled fashion.

Just wondering...

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
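Peter's "header field" idea could be sketched in C roughly as follows. To be clear, everything here (the `CharsetId` type, the struct layout, the function names) is made up for illustration and is not actual Postgres backend code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for a charset-tagged string: an identifier
 * (an oid reference, in Peter's proposal) stored alongside the
 * bytes.  None of these names exist in the Postgres backend. */
typedef uint32_t CharsetId;

typedef struct
{
    CharsetId   charset;        /* which character set the bytes are in */
    int32_t     len;            /* byte length of data */
    const char *data;
} TaggedText;

/*
 * A comparison function that consults the header: strings in the same
 * charset compare byte-wise; mixed charsets would first need a
 * conversion step (not shown), so this sketch simply refuses them.
 * Returns 0 on success (ordering stored in *result), -1 if the
 * charsets differ.
 */
static int
tagged_compare(const TaggedText *a, const TaggedText *b, int *result)
{
    int         minlen;
    int         cmp;

    if (a->charset != b->charset)
        return -1;              /* caller must convert one side first */

    minlen = (a->len < b->len) ? a->len : b->len;
    cmp = memcmp(a->data, b->data, minlen);
    *result = (cmp != 0) ? cmp : (a->len - b->len);
    return 0;
}
```

The point of the sketch is just that the tag travels with the datum, so every string-handling function can check it at run time instead of the type system checking it at parse time.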
> Thomas Lockhart writes:
>
> > Another 7.1 project is to work on alternate languages and character
> > sets, to decouple multibyte and locale from the default SQL_TEXT
> > character set. This will probably bring up issues similar to the
> > numeric problems, and since these character sets will be added as
> > user-defined types it will be important for the backend to understand
> > how to convert them for comparison operations, for example.
>
> Really? I always thought the character set would be some separate entity
> and perhaps an oid reference would be stored with every character string
> and attribute. That would get you around any type conversion as long as
> the functions acting on character types take this "header" field into
> account.

I think that way too. If what Thomas is suggesting is that to make a
user-defined charset one would need to define everything a new data type
requires (operators, the charset itself, functions to make indexes work,
and so on), that would be too painful.

> If you want to go the data type way then you'd need to have some sort of
> most general character set to cast to. That could be Unicode but that
> would require that every user-defined character set be a subset of
> Unicode, which is perhaps not a good assumption to make.

Right. But the problem is that SQL92 actually requires such a charset,
called "SQL_TEXT." For me, the only candidate for SQL_TEXT at this point
seems to be the "mule internal code." Basically it is a variant of
ISO 2022 and can adapt to most of the charsets defined in ISO 2022. I
think we could extend it so that it could become a superset even of
Unicode. Of course, the problem is that the mule internal code is an
"internal code" and is not widely used in the world. Even so, we could
use it for purely internal purposes (the parse tree, etc.).

> Also, I wonder
> how collations would fit in there. Collations definitely can't be ordered
> at all, so casting can't be done in a controlled fashion.

Hmm... Collations seem to be a different issue. I think there's no such
idea as "collation casting" in SQL92.

--
Tatsuo Ishii
All good ideas and thoughts. I have been thinking that essentially
separate types per character set is the right thing, but we'll have
plenty of time to talk about it.

One point is that SQL92 assigns a specific character set and collation
sequence to every character string and every column definition; if we
embedded this "type" identification into every string then we would be
replicating the existing Postgres type system one layer down (at least
for argument's sake ;)

There also need to be well-defined conversions between character
sets/collations, and some or most combinations will be illegal (e.g. how
do you collate American English against Japanese?). The Postgres type
system can enforce this simply by not providing conversion or comparison
functions for the relevant mixture of types.

                    - Thomas

--
Thomas Lockhart   lockhart@alumni.caltech.edu   South Pasadena, California
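Thomas's last point (a mixture of charsets is legal exactly when a conversion or comparison function has been provided for it) could look something like the C sketch below. The table, the charset ids, and the converter are all invented for illustration; they are not real Postgres catalog entries:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t CharsetId;

/* Illustrative charset ids, not real Postgres ones. */
enum { CS_ASCII = 1, CS_LATIN1 = 2, CS_EUC_JP = 3 };

typedef int (*ConvFn)(const char *src, int srclen, char *dst, int dstlen);

/* ASCII is a subset of Latin-1, so this conversion is a plain copy. */
static int
ascii_to_latin1(const char *src, int srclen, char *dst, int dstlen)
{
    if (srclen > dstlen)
        return -1;
    memcpy(dst, src, srclen);
    return srclen;
}

/*
 * The "type system as gatekeeper" idea: a pair of charsets is
 * convertible exactly when an entry exists in this table.  There is
 * deliberately no CS_LATIN1 -> CS_EUC_JP entry, so that mixture is
 * rejected outright rather than converted badly.
 */
static const struct
{
    CharsetId   from;
    CharsetId   to;
    ConvFn      fn;
} conv_table[] = {
    {CS_ASCII, CS_LATIN1, ascii_to_latin1},
};

static ConvFn
lookup_conversion(CharsetId from, CharsetId to)
{
    size_t      i;

    for (i = 0; i < sizeof(conv_table) / sizeof(conv_table[0]); i++)
        if (conv_table[i].from == from && conv_table[i].to == to)
            return conv_table[i].fn;
    return NULL;                /* no function => illegal combination */
}
```

The nice property is that "illegal" needs no special machinery: a missing table entry (or, in real Postgres terms, a missing function for that pair of types) is the enforcement.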