Re: Postgres on IBM z/OS 2.2.0 and 2.3.0 - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Postgres on IBM z/OS 2.2.0 and 2.3.0
Msg-id: 3148.1574121538@sss.pgh.pa.us
In response to: Re: Postgres on IBM z/OS 2.2.0 and 2.3.0 (parveen mehta <sim_mehta@yahoo.com>)
List: pgsql-hackers

parveen mehta <sim_mehta@yahoo.com> writes:
> Thanks for providing valuable inputs into my concerns. In the last part you
> mentioned and i am quoting it here "would limit the parts of the system
> that would have to be cleansed of ASCII-isms to libpq and src/bin/.
> But that's already a nontrivial headache I suspect." I am not clear on the
> ASCII-isms to libpq and src/bin/. Can you share some knowledge on those
> items. Are those standard directory locations ? Sorry if i am being ignorant.

I'm just speaking of those subtrees of PG's source code:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=tree

Some of the concrete problems are likely to be the same ones mentioned
on Wikipedia's EBCDIC page:

https://en.wikipedia.org/wiki/EBCDIC

notably that in ASCII the upper-case English letters have consecutive
codes, as do the lower-case ones, but that's not the case in EBCDIC.
This'd break pg_toupper and pg_tolower, and perhaps other places.
(Or perhaps that's the only place, but there's a lot of code to be
audited to find out.)
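
To make the letter-range problem concrete, here is a minimal sketch,
not taken from the PG sources.  The function names are hypothetical.
It relies only on the EBCDIC layout in which lower-case letters
occupy three separate runs (0x81-0x89, 0x91-0x99, 0xA2-0xA9).  A
single range test of the kind ASCII-only case-folding code tends to
use would accept the non-letter bytes that sit in the gaps:

```c
#include <stdbool.h>

/*
 * ASCII-style test: assumes the lower-case letters form one
 * contiguous range from code point a to code point z.
 */
bool naive_is_lower(unsigned char ch, unsigned char a, unsigned char z)
{
    return ch >= a && ch <= z;
}

/*
 * EBCDIC-aware test: lower-case letters sit in three separate
 * runs (a-i, j-r, s-z), with non-letters in between.
 */
bool ebcdic_is_lower(unsigned char ch)
{
    return (ch >= 0x81 && ch <= 0x89) ||
           (ch >= 0x91 && ch <= 0x99) ||
           (ch >= 0xA2 && ch <= 0xA9);
}
```

With the EBCDIC values of 'a' (0x81) and 'z' (0xA9) as bounds, the
naive test accepts 0x8A, which falls in the gap after 'i' and is
not a letter; the three-run test rejects it.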

The lack of well-defined equivalents for a lot of common ASCII
punctuation is likely to be a problem as well.  psql's internal
parsing of SQL commands, for example, is entirely unprepared for
the idea that characters it needs to recognize might be encoding
dependent.  But unless you want to restrict yourself to just one
EBCDIC code page, something would have to be done about that.

Another issue, which is something that might be unique to
Postgres, is that we expect that bytes with the high bit set
are elements of multibyte characters, while bytes without are
plain ASCII.  I think you could set things up so that mblen()
returns 1 despite the high bit being set, but at the very least
this would result in an efficiency hit due to taking the slow
"multibyte" code paths even for plain English letters.  Perhaps
it wouldn't matter too much on the client side; hard to tell
without investing a lot of work to try it.
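
As a rough illustration of that efficiency concern, the sketch below
counts how many bytes of a string would be sent down a multibyte
slow path under the rule "high bit set means not plain ASCII".  The
macro and function names are hypothetical, and this is not the
actual PG code; it only shows that EBCDIC letters (for instance
'a' = 0x81) all have the high bit set, whereas ASCII letters do not:

```c
#include <stddef.h>

/* Hypothetical helper: true when the top bit of a byte is set. */
#define HIGH_BIT_SET(ch) (((unsigned char) (ch) & 0x80) != 0)

/*
 * Count the bytes of s that would take the multibyte-aware slow
 * path if "high bit set" is used as the test for "not plain ASCII".
 */
size_t count_slow_path_bytes(const unsigned char *s, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (HIGH_BIT_SET(s[i]))
            n++;
    }
    return n;
}
```

For the three letters "abc", the ASCII bytes 0x61 0x62 0x63 give a
count of 0, while the EBCDIC bytes 0x81 0x82 0x83 give a count of 3,
so every plain English letter would land on the slow path.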

The server-side code has a lot more of these assumptions buried
in it than the client side, which is why I doubt it's feasible
to get the server to run in EBCDIC.  But maybe you could create
encoding translation functions and treat EBCDIC code pages as
client-only encodings, which is a concept we already have.
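
A translation function of that sort might look, in very reduced
form, like the sketch below.  It is only an illustration under
stated assumptions: the function name is hypothetical, it handles
one byte at a time rather than whole strings, and it covers only
the space, digit, and unaccented letter code points, which are the
part of EBCDIC that is stable across code pages.  Punctuation, the
part that actually varies, is deliberately left unmapped.  ASCII
results are written as numeric codes so the sketch does not depend
on the compiler's own character set:

```c
/*
 * Map one EBCDIC byte to its ASCII code, or return -1 if the byte
 * is outside the small invariant subset handled here.
 */
int ebcdic_to_ascii(unsigned char ch)
{
    if (ch == 0x40)                       /* space */
        return 0x20;
    if (ch >= 0xF0 && ch <= 0xF9)         /* digits 0-9 */
        return 0x30 + (ch - 0xF0);
    if (ch >= 0x81 && ch <= 0x89)         /* a-i */
        return 0x61 + (ch - 0x81);
    if (ch >= 0x91 && ch <= 0x99)         /* j-r */
        return 0x6A + (ch - 0x91);
    if (ch >= 0xA2 && ch <= 0xA9)         /* s-z */
        return 0x73 + (ch - 0xA2);
    if (ch >= 0xC1 && ch <= 0xC9)         /* A-I */
        return 0x41 + (ch - 0xC1);
    if (ch >= 0xD1 && ch <= 0xD9)         /* J-R */
        return 0x4A + (ch - 0xD1);
    if (ch >= 0xE2 && ch <= 0xE9)         /* S-Z */
        return 0x53 + (ch - 0xE2);
    return -1;                            /* code-page dependent or unmapped */
}
```

A real client-only encoding would need a full table per EBCDIC code
page, in both directions, which is where most of the work would be.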

On the whole though, it seems like a lot of work for a dubious
goal, which is probably why nobody's tackled it.

            regards, tom lane


