Re: Bug or not about ASCII and Multi-Byte character set - Mailing list pgsql-odbc
From | Marc Herbert |
---|---|
Subject | Re: Bug or not about ASCII and Multi-Byte character set |
Date | |
Msg-id | 20050819180503.GK16062@emicnetworks.com |
In response to | Re: Bug or not about ASCII and Multi-Byte character set (Andreas Pflug <pgadmin@pse-consulting.de>) |
List | pgsql-odbc |
On Fri, Aug 19, 2005 at 04:11:48PM +0200, Andreas Pflug wrote:
> Marc Herbert wrote:
>
> >If SQL_ASCII is/was equivalent to "ignoring encoding", then it
> >looks/looked pretty misnamed!
> >
> It's not. It should be used for ASCII only, but the database system will
> not barf if you offer it a byte with the upper bit set. You're simply on
> your own.

Well, this still looks like what I called a "BINARY/don't touch it"
accidental mode.

> >Encoding ignorance should rather be called SQL_BINARY. A BINARY setting
> >for strings makes sense, just like when transfering text files using
> >FTP: you just don't trust FTP for encodings and use it like a
> >filesystem. BINARY just means that: "don't mess-up with encodings and
> >let something else deal with the issue".
> >
> No, binary would include 0x00

This seems irrelevant to me, see below.

> which is definitely *not* a character but the string terminator.

Not everyone in the world uses 0x00 as a string terminator. C does,
Postgres also, but Java does not, and I don't think database standards,
and even less encoding standards, say anything about this (please prove
me wrong, I'd really like to have a definitive answer on this).

I just tried to insert a string containing a null character into hsqldb
using JDBC and it worked perfectly fine. The Postgres JDBC driver is
also "strings with null-character"-ready, so this seems to be only a
limitation of Postgres.

By the way, many ODBC function calls ask for the length of string
arguments, _optionally_ being SQL_NTS (Null-Terminated String). So it
seems some people here catered for strings with null characters even in
C!

In any case, whether 0x00 is The String Terminator or not is not
relevant to the fact that there was an accidental "BINARY" string
encoding before. If we learn that 0x00 is really The Database String
Terminator, then it can also be interpreted as a terminator even in
"encoding ignorance" mode, since it translates into 0x00 for every known
encoding.
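
To make the length-counted versus null-terminated distinction concrete,
here is a minimal Java sketch. It is only an illustration: the class and
method names (NulInString, javaLength, cStyleLength) are made up for
this example, it talks to no database or driver, and it checks only
UTF-8 and ISO-8859-1, not "every known encoding".

```java
import java.nio.charset.StandardCharsets;

public class NulInString {
    // A Java String carries its own length, so U+0000 is an ordinary char.
    static int javaLength(String s) {
        return s.length();
    }

    // Hypothetical helper: counts bytes up to the first 0x00, which is
    // what a null-terminated (C-style, SQL_NTS-like) reader would see.
    static int cStyleLength(byte[] bytes) {
        int n = 0;
        while (n < bytes.length && bytes[n] != 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "abc\0def"; // null character in the middle

        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(javaLength(s));        // 7
        System.out.println(utf8.length);          // 7
        System.out.println(utf8[3] == 0);         // true
        System.out.println(latin1[3] == 0);       // true
        System.out.println(cStyleLength(utf8));   // 3
    }
}
```

The length-counted view keeps all seven characters, while the
null-terminated view silently stops after "abc". That is the gap between
a caller passing an explicit length and a caller passing SQL_NTS.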
> >I guess some people knew what they did and simply did not mixed
> >driver/apps, or in a way they mastered and that worked.
>
> The latter, with the obvious chance to break if the next app accesses
> the data. This is certainly not the design goal of a RDBMS.

There was a time, not so long ago, when everything encoding-related was
under-specified, every piece of software was buggy, etc., so people had
to cope with it. They were probably pleased at that time to have this
accidental "BINARY" workaround available. One can easily understand that
they complain a little bit about the sudden removal of this workaround
and the unplanned migration to The Right Solution.

Of course, on the other hand, everyone can understand that the Postgres
developers want to get rid of this accidental BINARY string mode, and
that they are free to do what they want.

> >Well while reading at the complaints it seems this BINARY mode was
> >there before (by "accident"?),
>
> No.

Well, I am still waiting for some proof of the opposite (since this 0x00
stuff does not seem really related to it). I was just reformulating Tom
Lane's "SQL_ASCII ignorance" quote above, which looked quite informed.

> >PS: BTW "unicode" is not one encoding but many different ones.
>
> Doesn't matter. Always means the current Unicode for the system: in the
> backend UTF-8, on Win32 UCS16, Linux UCS32 or UTF-8 dependent on
> interface definition.

Interesting. I hope this "current unicode for the system" concept is
well documented, because just saying "unicode" is not clear at all, even
if not ambiguous.

Regards,

Marc.