Thread: US-ASCII on Mac OS X

US-ASCII on Mac OS X

From: Perry Smith
Date: 20 July 2004, 12:49:16

Hi,

I'm trying to make gnucash work with PostgreSQL on Mac OS X.  It is
supposed to work, but I'm finding that it has problems, at least on the
Mac.

My first problem is that the gnucash code calls nl_langinfo(CODESET) to
get the name of the code page.  On Mac OS X, this returns "US-ASCII".
gnucash also has this value hard-coded if various compile flags are
set.  gnucash then passes this value as the encoding to use when it
creates a database.  But my version of PostgreSQL (7.4.3) does not
know about US-ASCII.
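The pattern described above can be sketched in Python rather than gnucash's C. Python's `locale.nl_langinfo` (Unix only) wraps the same `nl_langinfo(CODESET)` call. The helper names and the database name `gnucash` are hypothetical. The only point is that the OS codeset name flows unchanged into the ENCODING clause, which PostgreSQL 7.4.3 then rejects.

```python
import locale


def current_codeset() -> str:
    """Return the codeset name of the current LC_CTYPE locale."""
    try:
        # Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment names an unavailable locale; keep the current one
    # The same C call gnucash makes: nl_langinfo(CODESET).
    return locale.nl_langinfo(locale.CODESET)


def create_database_sql(dbname: str, codeset: str) -> str:
    """Hypothetical helper: pass the OS codeset name through unchanged."""
    return f"CREATE DATABASE {dbname} ENCODING '{codeset}'"


if __name__ == "__main__":
    print(create_database_sql("gnucash", current_codeset()))
```

With the "US-ASCII" codeset reported above, `create_database_sql("gnucash", "US-ASCII")` yields `CREATE DATABASE gnucash ENCODING 'US-ASCII'`, a name PostgreSQL's encnames.c does not list.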

I did a tiny bit of research and "US-ASCII" is mentioned as an alias
in RFC 1345 but I can't figure out what it is an alias for.

I'm wondering how to solve this problem.  Should a "usascii" alias be
added to encnames.c?  Should gnucash change "US-ASCII" into something
else?  Should I try and get Apple to change their code since it is not
really a code page?  Should I just remove the "ENCODING ..." phrase
from the command that gnucash creates to create a database under the
logic that postgres will use the encoding specified by the LANG
variable by default anyway (which I'm just guessing is what it does)?

Any help or suggestions are welcome.

Thanks,
Perry

Re: US-ASCII on Mac OS X

From: Peter Eisentraut
Date: 20 July 2004, 13:34:10

Perry Smith wrote:
> I'm wondering how to solve this problem.  Should a "usascii" alias be
> added to encnames.c?

Probably not, considering that PostgreSQL does not really support
US-ASCII as such.  (Supersets of US-ASCII are supported, but when you
select a charset, you don't want a superset of that charset.)

> Should gnucash change "US-ASCII" into something else?

Yes.

> Should I try and get Apple to change their code since it is
> not really a code page?

It is a code page (or an encoding, or a charset, or something of that
sort).  Several operating systems seem to agree.

> Should I just remove the "ENCODING ..."
> phrase from the command that gnucash creates to create a database

I would probably create the database without an encoding specification
and thus use the default encoding, since that will cooperate best with
the character processing functions and the user's expectations in
general.  Gnucash should, like any client, set the client encoding to
its actual encoding used on the frontend, and then the encoding used on
the server side does not need to be of concern.

> under the logic that postgres will use the encoding specified by
> the LANG variable by default anyway (which I'm just guessing is what
> it does)?

No, scrap that logic.  You need to set the client encoding yourself.
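Here is a minimal sketch of this advice: create the database with no ENCODING clause, then declare the client's real encoding on every session. The codeset-to-PostgreSQL mapping and the helper names are hypothetical. Mapping US-ASCII to LATIN1 (ISO 8859-1) assumes, as Peter suggests further down the thread, that a superset whose 0x00-0x7F range matches US-ASCII byte for byte is acceptable.

```python
# Hypothetical mapping from OS codeset names (as returned by
# nl_langinfo(CODESET)) to PostgreSQL encoding names.
CODESET_TO_PG = {
    "US-ASCII": "LATIN1",        # ISO 8859-1 matches US-ASCII in 0x00-0x7F
    "ANSI_X3.4-1968": "LATIN1",  # glibc's name for US-ASCII
    "ISO-8859-1": "LATIN1",
    "UTF-8": "UTF8",
}


def create_db_statement(dbname: str) -> str:
    """Create the database with the server's default encoding (no ENCODING clause)."""
    return f"CREATE DATABASE {dbname}"


def client_encoding_statement(codeset: str) -> str:
    """Per-session statement declaring the encoding the client actually sends."""
    pg_encoding = CODESET_TO_PG.get(codeset.upper())
    if pg_encoding is None:
        raise ValueError(f"no PostgreSQL encoding known for codeset {codeset!r}")
    return f"SET client_encoding TO '{pg_encoding}'"


if __name__ == "__main__":
    print(create_db_statement("gnucash"))
    print(client_encoding_statement("US-ASCII"))
```

From C, libpq's `PQsetClientEncoding(conn, "LATIN1")` called on each new connection serves the same purpose as issuing the SET statement.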

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: US-ASCII on Mac OS X

From: Perry Smith
Date: 20 July 2004, 17:37:20

I think I understand but wanted to ask a few more questions.

Is any code page like iso-8859-1 or IBM850 different in the range that
US-ASCII covers (from 0x00 to 0x7f)?  If not, then gnucash could
change US-ASCII to practically anything.

The other question is about the client/server relationship.  I'm
assuming that gnucash is the client in this case.  It does set the
locale (although it seems to do it incorrectly, because eventually the
locale gets blown away -- but that's another problem).  But what about
sorting?  Is that done in the server or the client?  If gnucash does a
SELECT with an ORDER BY clause, the ordering is done in the server,
right?

If so, that would mean that the server needs to be told the proper
encoding, since that will affect the sort order.

Is that correct?

Thanks
Perry

Re: US-ASCII on Mac OS X

From: Peter Eisentraut
Date: 20 July 2004, 18:40:11

Perry Smith wrote:
> Is any code page like iso-8859-1 or IBM850 different in the range
> that US-ASCII covers (from 0x00 to 0x7f)?  If not, then
> gnucash could change US-ASCII to practically anything.

Considering the character repertoire (i.e., the abstract set of
characters provided), US-ASCII is a subset of most character
repertoires.  But considering the encoding (i.e., the binary
representation of the characters), it is not a subset of most
encodings (e.g., not of UTF-8), only of some, such as the ISO 8859
series.  So if your client application (e.g., gnucash) is sending its
data in US-ASCII, you can declare, say, ISO-8859-1 as the PostgreSQL
client encoding (assuming that US-ASCII is encoded in 8 bits, but we'll
take that as a given).
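To make the repertoire-versus-encoding distinction concrete, here is a small Python sketch that tests codecs one byte at a time. The helper names are hypothetical, and the codec names are Python's: `latin-1` for ISO 8859-1 and `cp500` for an EBCDIC code page. The first check asks whether bytes 0x00-0x7F mean the same as in US-ASCII. The second asks whether every 8-bit byte is a complete character on its own, which is what "US-ASCII encoded in 8 bits" needs if stray high bytes may appear.

```python
def low_half_matches_ascii(codec: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same characters as in US-ASCII."""
    for b in range(0x80):
        try:
            if bytes([b]).decode(codec) != chr(b):
                return False
        except UnicodeDecodeError:
            return False
    return True


def every_byte_decodes(codec: str) -> bool:
    """True if each of the 256 byte values decodes as a character on its own."""
    for b in range(0x100):
        try:
            bytes([b]).decode(codec)
        except UnicodeDecodeError:
            return False
    return True


if __name__ == "__main__":
    for codec in ("latin-1", "utf-8", "cp500"):
        print(codec, low_half_matches_ascii(codec), every_byte_decodes(codec))
```

With Python's codecs, ISO 8859-1 passes both checks. UTF-8 passes only the first, because a lone byte from 0x80 up is not valid UTF-8. The EBCDIC code page fails the first check (its "A" is 0xC1, not 0x41).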

> The other question is about the client/server relationship.  I'm
> assuming that gnucash is the client in this case.

Yes.

> It does set the
> locale (although it seems to do it incorrectly, because eventually the
> locale gets blown away -- but that's another problem).  But what about
> sorting?  Is that done in the server or the client?

server

> If gnucash does
> a SELECT with an ORDER BY clause, the ordering is done in the server,
> right?

server

> If so, that would mean that the server needs to be told the proper
> encoding, since that will affect the sort order.

No, the locale affects the sort order.  The locale is set when the
database cluster is initialized by initdb and cannot be changed by
client applications, no matter how hard you try.  The encoding merely
has to be compatible with that locale.  (This is a mess, but it's a
result of somewhat incomplete OS functionality being replicated by
PostgreSQL.)  So overriding the default server encoding is only going
to lead you into trouble, because you cannot guarantee compatibility
with the locale.
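Here is a client-side illustration of why the locale, not the encoding, decides the order. It assumes the server compares text with the C library's `strcoll` under the LC_COLLATE fixed at initdb time. The helper is hypothetical and uses only the "C" locale, so it does not depend on which other locales are installed.

```python
import functools
import locale


def sort_with_collation(words, collation="C"):
    """Sort words the way strcoll() orders them under the given LC_COLLATE locale."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collation)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the caller's locale


if __name__ == "__main__":
    # Same strings, same encoding: the collation locale decides the order.
    # Under "C", strcoll compares code points, so "Zebra" sorts before "apple".
    print(sort_with_collation(["apple", "Zebra", "banana"]))
```

A cluster initialized with, say, an English locale would typically order the same strings with "Zebra" last, even though nothing about the encoding has changed.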

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: US-ASCII on Mac OS X

From: Markus Bertheau
Date: 21 July 2004, 03:47:58

On Tue, 20.07.2004, at 23:39, Peter Eisentraut wrote:
> Perry Smith wrote:
> > Is any code page like iso-8859-1 or IBM850 different in the range
> > that US-ASCII covers (from 0x00 to 0x7f)?  If not, then
> > gnucash could change US-ASCII to practically anything.
>
> Considering the character repertoire (i.e., the abstract set of
> characters provided), US-ASCII is a subset of most character
> repertoires.  But considering the encoding (i.e., the binary
> representation of the characters), it is not a subset of most
> encodings (e.g., not of UTF-8)

Where's the incompatibility here? I always thought UTF-8 was binary
compatible with ASCII for characters in ASCII.

--
Markus Bertheau <twanger@bluetwanger.de>

Re: US-ASCII on Mac OS X

From: Peter Eisentraut
Date: 21 July 2004, 04:27:08

Markus Bertheau wrote:
> Where's the incompatibility here? I always thought UTF-8 was binary
> compatible with ASCII for characters in ASCII.

It is for characters in ASCII, but not for the whole 8-bit range.  It
depends on the particular circumstances whether it's appropriate to
use.
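Markus's point and Peter's qualification can both be seen in a short Python sketch. A byte string made only of bytes below 0x80 is identical in US-ASCII, ISO 8859-1 and UTF-8. Anything that uses the upper half of the 8-bit range is either valid UTF-8 or it is not, and that decides which client encoding fits. The helper `classify` and its labels are hypothetical.

```python
def classify(data: bytes) -> str:
    """Hypothetical helper: how a byte string relates to ASCII and UTF-8."""
    if all(b < 0x80 for b in data):
        return "ascii"  # identical bytes in US-ASCII, ISO 8859-1 and UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # e.g. ISO 8859-1 text: LATIN1 can carry it, UTF8 cannot
        return "8-bit, not UTF-8"
    # Valid multi-byte UTF-8; read as LATIN1, "é" would become two characters.
    return "utf-8"


if __name__ == "__main__":
    for sample in (b"Hello", "café".encode("utf-8"), "café".encode("latin-1")):
        print(sample, classify(sample))
```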

--
Peter Eisentraut
http://developer.postgresql.org/~petere/