Thread: US-ASCII on Mac OS X
Hi,

I'm trying to make gnucash work with PostgreSQL on Mac OS X. It is supposed to work, but I'm finding that it has problems, at least on the Mac.

My first problem is that the gnucash code calls nl_langinfo(CODESET) to get the name of the code page. On Mac OS X, this returns "US-ASCII". gnucash also has this value hard-coded if various compile flags are set. gnucash then passes this value as the encoding to use when it creates a database. But my version of PostgreSQL (7.4.3) does not know about US-ASCII.

I did a tiny bit of research, and "US-ASCII" is mentioned as an alias in RFC 1345, but I can't figure out what it is an alias for.

I'm wondering how to solve this problem. Should a "usascii" alias be added to encnames.c? Should gnucash change "US-ASCII" into something else? Should I try and get Apple to change their code since it is not really a code page? Should I just remove the "ENCODING ..." phrase from the command that gnucash creates to create a database, under the logic that postgres will use the encoding specified by the LANG variable by default anyway (which I'm just guessing is what it does)?

Any help or suggestions are welcome.

Thanks,
Perry
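As a rough illustration of the lookup described above, here is a minimal Python sketch. It calls the same nl_langinfo(CODESET) facility through Python's locale module and then resolves a charset name through Python's own codec registry. That registry is not PostgreSQL's encnames.c, so it only shows how one alias table treats the name: "US-ASCII" is the name registered for plain 7-bit ASCII. The function names current_codeset and canonical_codec are made up for this example.

```python
import codecs
import locale


def current_codeset() -> str:
    """Return what nl_langinfo(CODESET) reports for the user's locale."""
    try:
        # Adopt the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unknown locale in the environment: keep the current one.
        pass
    return locale.nl_langinfo(locale.CODESET)


def canonical_codec(name: str) -> str:
    """Resolve a charset name or alias to Python's canonical codec name."""
    # Raises LookupError if the registry does not know the name.
    return codecs.lookup(name).name
```

Here canonical_codec("US-ASCII") resolves to Python's "ascii" codec. What current_codeset() returns depends on the platform and the environment, so no particular value should be assumed.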
Perry Smith wrote:
> I'm wondering how to solve this problem. Should a "usascii" alias be
> added to encnames.c?

Probably not, considering that PostgreSQL does not really support US-ASCII as such. (Supersets of US-ASCII are supported, but when you select a charset, you don't want a superset of that charset.)

> Should gnucash change "US-ASCII" into something else?

Yes.

> Should I try and get Apple to change their code since it is
> not really a code page?

It is a code page (or an encoding, or a charset, or something of that sort). Several operating systems seem to agree.

> Should I just remove the "ENCODING ..."
> phrase from the command that gnucash creates to create a database

I would probably create the database without an encoding specification and thus use the default encoding, since that will cooperate best with the character processing functions and the user's expectations in general. Gnucash should, like any client, set the client encoding to the encoding it actually uses on the frontend; the encoding used on the server side then does not need to be of concern.

> under the logic that postgres will use the encoding specified by
> the LANG variable by default anyway (which I'm just guessing is what
> it does)?

No, scrap that logic. You need to set the client encoding yourself.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
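The advice above amounts to two statements: a CREATE DATABASE without an ENCODING clause, and a SET client_encoding issued by the client on its own connection. A minimal sketch of building those statements follows. The helper names are hypothetical, and the database name "gnucash" and the value 'LATIN1' (PostgreSQL's name for ISO 8859-1) are only illustrative assumptions, not something gnucash is known to use.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_database_sql(dbname: str) -> str:
    """Build CREATE DATABASE with no ENCODING clause (server default applies)."""
    return f"CREATE DATABASE {quote_ident(dbname)}"


def set_client_encoding_sql(encoding: str) -> str:
    """Build the statement a client sends to declare its own encoding."""
    return f"SET client_encoding TO {quote_literal(encoding)}"
```

For example, create_database_sql("gnucash") yields CREATE DATABASE "gnucash", and set_client_encoding_sql("LATIN1") yields SET client_encoding TO 'LATIN1'. The second statement would be sent once per connection, before any data is exchanged.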
I think I understand, but I wanted to ask a few more questions.

Is any code page like iso-8859-1 or IBM850 different in the range that US-ASCII covers (from 0x00 to 0x7f)? If that is the case, then gnucash could change US-ASCII to practically anything.

The other question is about the client/server relationship. I'm assuming that gnucash is the client in this case. It does set the locale (although it seems to do it incorrectly because eventually the locale gets blown away -- but that's another problem). But what about sorting? Is that done in the server or the client? If gnucash does a select with an order by phrase, the ordering is done in the server, right? If so, that would mean that the server needs to be told the proper encoding, since that will affect the sort order. Is that correct?

Thanks,
Perry

On Jul 20, 2004, at 11:34 AM, Peter Eisentraut wrote:
> [snip]
Perry Smith wrote:
> Is any code page like iso-8859-1 or IBM850 different in the range
> that US-ASCII covers (from 0x00 to 0x7f)? If that is the case, then
> gnucash could change US-ASCII to practically anything.

Considering the character repertoire (i.e., the abstract set of characters provided), US-ASCII is a subset of most character repertoires. But considering the encoding (i.e., the binary representation of the characters), it is not a subset of most encodings (e.g., not of UTF-8), but of some, such as the ISO 8859 series. So if your client application (e.g., gnucash) is sending its data in US-ASCII, you can declare, say, ISO-8859-1 as the PostgreSQL client encoding (assuming that US-ASCII is encoded in 8 bits, but we'll take that as a given).

> The other question is about the client/server relationship. I'm
> assuming that gnucash is the client in this case.

Yes.

> It does set the
> locale (although it seems to do it incorrectly because eventually the
> locale gets blown away -- but that's another problem). But what about
> sorting? Is that done in the server or the client?

In the server.

> If gnucash does
> a select with an order by phrase, the ordering is done in the server,
> right?

Yes, in the server.

> If so, that would mean that the server needs to be told the proper
> encoding, since that will affect the sort order.

No, the locale affects the sort order. The locale is set when the database cluster is initialized by initdb and cannot be set by client applications no matter how hard you try. The encoding merely has to be compatible with that locale. (This is a mess, but it's a result of somewhat incomplete OS functionality being replicated by PostgreSQL.) So overriding the default server encoding is only going to lead you to trouble, because you cannot guarantee compatibility with the locale.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
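To make the point about locale and sort order concrete, here is a small Python sketch. It is not how the PostgreSQL server is implemented; it only shows the general effect that the same strings sort differently depending on the collation locale in force. The helper name sort_in_locale is made up for this example, and any locale other than "C" is an assumption about what happens to be installed on the machine.

```python
import locale


def sort_in_locale(words, locale_name):
    """Sort words using the collation rules of the named locale."""
    # Calling setlocale with one argument only queries the current setting.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the named locale is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares per the locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the previous collation so the caller is unaffected.
        locale.setlocale(locale.LC_COLLATE, saved)
```

With the "C" locale, sort_in_locale(["b", "A", "a", "B"], "C") orders by character code, so the uppercase letters come first. A language-specific locale, where one is installed, will typically order the same list differently even though the bytes being compared are identical.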
On Tue, 20.07.2004, at 23:39, Peter Eisentraut wrote:
> Perry Smith wrote:
> > Is any code page like iso-8859-1 or IBM850 different in the range
> > that US-ASCII covers (from 0x00 to 0x7f)? If that is the case, then
> > gnucash could change US-ASCII to practically anything.
>
> Considering the character repertoire (i.e., that abstract set of
> characters provided), then US ASCII is a subset of most character
> repertoires. But considering the encoding (i.e., the binary
> representation of the characters), then it is not a subset of most
> encodings (e.g., not of UTF-8)

Where's the incompatibility here? I always thought UTF-8 was binary compatible with ASCII for characters in ASCII.

-- 
Markus Bertheau <twanger@bluetwanger.de>
Markus Bertheau wrote:
> Where's the incompatibility here? I always thought UTF-8 was binary
> compatible with ASCII for characters in ASCII.

It is for characters in ASCII, but not for the whole 8-bit range. It depends on the particular circumstances whether it's appropriate to use.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
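The distinction drawn above can be checked directly. The following Python sketch encodes the same text as ISO-8859-1 and as UTF-8: pure ASCII text comes out byte-for-byte identical, while a character from the upper half of the 8-bit range does not, and its single ISO-8859-1 byte is not even valid UTF-8 on its own. The helper names encode_all and is_valid_utf8 are made up for this example.

```python
def encode_all(text, encodings=("iso-8859-1", "utf-8")):
    """Encode text in each encoding and return a name -> bytes mapping."""
    return {name: text.encode(name) for name in encodings}


def is_valid_utf8(data: bytes) -> bool:
    """Report whether a byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# ASCII-only text: both encodings produce the same bytes.
ascii_result = encode_all("GnuCash")

# U+00E9 (e with acute accent): one byte in ISO-8859-1, two in UTF-8.
accent_result = encode_all("\u00e9")
```

Here ascii_result holds b"GnuCash" under both names, while accent_result holds b"\xe9" for ISO-8859-1 and b"\xc3\xa9" for UTF-8, and is_valid_utf8(b"\xe9") is False.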