Thread: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

"Tsunakawa, Takayuki"

Date:

11 December 2017, 10:09:41

Hello,

It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside the
PostgreSQLserver?  i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We need
torun on a certain mainframe.
 

I see almost no hope from the following, but let me make sure.

https://www.postgresql.org/message-id/flat/28548.1424884373%40sss.pgh.pa.us#28548.1424884373@sss.pgh.pa.us

"Hmm ... EBCDIC ... is that mandatory?  Because there are a pretty large
number of ASCII dependencies in PG, mostly arising from our support of
multibyte character sets, which are all expected to be ASCII supersets.
If there's not a way to run the server process in ASCII-land, you're
likely dead in the water.  I would not want to try to ferret out all
the dependencies."


Regards
Takayuki Tsunakawa

Re: Is it possible to sort strings in EBCDIC order in PostgreSQLserver?

From

Laurenz Albe

Date:

11 December 2017, 11:52:21

Tsunakawa, Takayuki wrote:
> It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside
thePostgreSQL server?

> i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We need to run on a
certainmainframe.

> 
> I see almost no hope from the following, but let me make sure.

That should not be too difficult.  PostgreSQL is extensible!

Just define operators that implement <, <=, =, >= and > for EBCDIC,
create an operator class for these (for indexing) and use
ORDER BY ... USING.

Yours,
Laurenz Albe

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

Tom Lane

Date:

11 December 2017, 17:46:19

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside
thePostgreSQL server?  i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We
needto run on a certain mainframe. 

> I see almost no hope from the following, but let me make sure.
> https://www.postgresql.org/message-id/flat/28548.1424884373%40sss.pgh.pa.us#28548.1424884373@sss.pgh.pa.us

> "Hmm ... EBCDIC ... is that mandatory?  Because there are a pretty large
> number of ASCII dependencies in PG, mostly arising from our support of
> multibyte character sets, which are all expected to be ASCII supersets.
> If there's not a way to run the server process in ASCII-land, you're
> likely dead in the water.  I would not want to try to ferret out all
> the dependencies."

As I recall, that question was about converting the server to use
EBCDIC strings everywhere.  That seems impractical for the reasons
I mentioned.  But you could probably sort according to EBCDIC order
if you were willing to accept some inefficiency: internal to the
comparison function, convert each string to EBCDIC on each call.

Another way you could imagine is to store EBCDIC strings embedded
in bytea values.

            regards, tom lane

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

Peter Geoghegan

Date:

11 December 2017, 22:00:36

On Sun, Dec 10, 2017 at 11:09 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside
thePostgreSQL server?  i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We
needto run on a certain mainframe. 

There is no need for embarrassment. I think that you mean that you'd
like to get behavior equivalent to the "C" locale with an EBCDIC
encoding. Right? I think that you could use ICU to do this with some
work. ICU collations are not tied to a particular encoding, unlike the
libc collations (they support most but not all collations).

That said, the idea of an "EBCDIC collation" seems limiting. Why
should a system like DB2 for the mainframe (that happens to use EBCDIC
as its encoding) not have a more natural, human-orientated collation
even while using EBCDIC? ISTM that the point of using the "C" locale
(with EBDIC or with UTF-8 or with any other encoding) is to get a
performance benefit where the actual collation's behavior doesn't
matter much to users. Are you sure it's really important to be
*exactly* compatible with EBCDIC order? As long as you're paying for a
custom collation, why not just use a collation that is helpful to
humans?

--
Peter Geoghegan

RE: Is it possible to sort strings in EBCDIC order in PostgreSQLserver?

From

"Tsunakawa, Takayuki"

Date:

12 December 2017, 11:17:31

Hi Laurenz, Tom, Peter,

Thanks for your suggestions.  The practical solution seems to be to override comparison operators of char, varchar and
textdata types with UDFs that behave as Tom mentioned.
 

From: Peter Geoghegan [mailto:pg@bowt.ie]
> That said, the idea of an "EBCDIC collation" seems limiting. Why
> should a system like DB2 for the mainframe (that happens to use EBCDIC
> as its encoding) not have a more natural, human-orientated collation
> even while using EBCDIC? ISTM that the point of using the "C" locale
> (with EBDIC or with UTF-8 or with any other encoding) is to get a
> performance benefit where the actual collation's behavior doesn't
> matter much to users. Are you sure it's really important to be
> *exactly* compatible with EBCDIC order? As long as you're paying for a
> custom collation, why not just use a collation that is helpful to
> humans?

You are right.  I'd like to ask the customer whether and why they need EBCDIC ordering.

Regards
Takayuki Tsunakawa

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

John McKown

Date:

12 December 2017, 16:18:10

On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:

Hi Laurenz, Tom, Peter,

Thanks for your suggestions. The practical solution seems to be to override comparison operators of char, varchar and text data types with UDFs that behave as Tom mentioned.

From: Peter Geoghegan [mailto:pg@bowt.ie]
> That said, the idea of an "EBCDIC collation" seems limiting. Why
> should a system like DB2 for the mainframe (that happens to use EBCDIC
> as its encoding) not have a more natural, human-orientated collation
> even while using EBCDIC? ISTM that the point of using the "C" locale
> (with EBDIC or with UTF-8 or with any other encoding) is to get a
> performance benefit where the actual collation's behavior doesn't
> matter much to users. Are you sure it's really important to be
> *exactly* compatible with EBCDIC order? As long as you're paying for a
> custom collation, why not just use a collation that is helpful to
> humans?

You are right. I'd like to ask the customer whether and why they need EBCDIC ordering.

This is a guess on my part, based on many years on an EBCDIC system. But I'll bet that they are doing a conversion off of the EBCDIC system (maybe Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They want to be able to compare the output from the existing system to the output on the new system. EBCDIC orders "lower case", "upper case", then "digits". The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ". Comparing identical data which is not presented in exactly the same order would be very difficult.

Regards
Takayuki Tsunakawa

I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

James Keener

Date:

12 December 2017, 18:11:10

The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int myCompare (const void * a, const void * b)
{
return strcmp (*(const char **) a, *(const char **) b);
}

void with_other_locale (char *new_locale,
const char **arr,
const int n
)
{
char *old_locale, *saved_locale;

/* Get the name of the current locale. */
old_locale = setlocale (LC_ALL, NULL);

/* Copy the name so it won’t be clobbered by setlocale. */
saved_locale = strdup (old_locale);

if (strlen(new_locale) == 0)
{
new_locale = saved_locale;
}

/* Now change the locale and do some stuff with it. */
setlocale (LC_ALL, new_locale);
qsort (arr, n, sizeof (const char *), myCompare);

printf("\nSorted array in locale %s is\n", new_locale);
for (int i = 0; i < n; i++)
printf("%d: %s \n", i, arr[i]);

/* Restore the original locale. */
setlocale (LC_ALL, saved_locale);
free (saved_locale);
}

int main ()
{
const char *arr[] = {"Jim", "job", "Anne", "aardvark", "Isaac", "island", "12 Days of Christmas", "12 drummers"};
int n = sizeof(arr)/sizeof(arr[0]);
int i;

printf("Given array is\n");
for (i = 0; i < n; i++)
printf("%d: %s \n", i, arr[i]);

with_other_locale("", arr, n);
with_other_locale("C", arr, n);
with_other_locale("en_US.UTF-8", arr, n);
with_other_locale("UTF-8", arr, n);
return 0;
}

Gives

Given array is
0: Jim
1: job
2: Anne
3: aardvark
4: Isaac
5: island
6: 12 Days of Christmas
7: 12 drummers

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale en_US.UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

I actually don't think there is a case-insensitive locale (bydefault?) on a unix machine.

Jim

On Tue, Dec 12, 2017 at 8:18 AM, John McKown <john.archie.mckown@gmail.com> wrote:

On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hi Laurenz, Tom, Peter,

Thanks for your suggestions. The practical solution seems to be to override comparison operators of char, varchar and text data types with UDFs that behave as Tom mentioned.

From: Peter Geoghegan [mailto:pg@bowt.ie]
> That said, the idea of an "EBCDIC collation" seems limiting. Why
> should a system like DB2 for the mainframe (that happens to use EBCDIC
> as its encoding) not have a more natural, human-orientated collation
> even while using EBCDIC? ISTM that the point of using the "C" locale
> (with EBDIC or with UTF-8 or with any other encoding) is to get a
> performance benefit where the actual collation's behavior doesn't
> matter much to users. Are you sure it's really important to be
> *exactly* compatible with EBCDIC order? As long as you're paying for a
> custom collation, why not just use a collation that is helpful to
> humans?

You are right. I'd like to ask the customer whether and why they need EBCDIC ordering.

This is a guess on my part, based on many years on an EBCDIC system. But I'll bet that they are doing a conversion off of the EBCDIC system (maybe Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They want to be able to compare the output from the existing system to the output on the new system. EBCDIC orders "lower case", "upper case", then "digits". The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ". Comparing identical data which is not presented in exactly the same order would be very difficult.

Regards
Takayuki Tsunakawa

--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

John McKown

Date:

12 December 2017, 18:24:10

On Tue, Dec 12, 2017 at 9:11 AM, James Keener <jim@jimkeener.com> wrote:

The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

Thanks for the correction. Turns out that I forgot that my default locale on Linux was en_US.utf8.

I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

James Keener

Date:

12 December 2017, 18:36:42

en_US.utf8. is still 0-9A-Za-z and in my example set (as it's my default too :))

You'd need a case insensitive collation to do what you described, and I'm not sure those exist in postgres. (I guess you could always build your own if you _really_ wanted to.

Jim

On Tue, Dec 12, 2017 at 10:24 AM, John McKown <john.archie.mckown@gmail.com> wrote:

On Tue, Dec 12, 2017 at 9:11 AM, James Keener <jim@jimkeener.com> wrote:
The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

Thanks for the correction. Turns out that I forgot that my default locale on Linux was en_US.utf8.

--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

James Keener

Date:

12 December 2017, 18:43:02

Sorry for spamming the list. It appears that I'm an idiot. Sorry :(

jim=# select * from test order by a collate "C";
a
----------------------
12 Days of Christmas
12 drummers
Anne
Isaac
Jim
a
aardvark
b
island
job
(10 rows)

jim=# select * from test order by a collate "en_US.utf8";
a
----------------------
12 Days of Christmas
12 drummers
a
aardvark
Anne
b
Isaac
island
Jim
job
(10 rows)

On Tue, Dec 12, 2017 at 10:36 AM, James Keener <jim@jimkeener.com> wrote:

en_US.utf8. is still 0-9A-Za-z and in my example set (as it's my default too :))

You'd need a case insensitive collation to do what you described, and I'm not sure those exist in postgres. (I guess you could always build your own if you _really_ wanted to.

Jim

On Tue, Dec 12, 2017 at 10:24 AM, John McKown <john.archie.mckown@gmail.com> wrote:
On Tue, Dec 12, 2017 at 9:11 AM, James Keener <jim@jimkeener.com> wrote:
The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

Thanks for the correction. Turns out that I forgot that my default locale on Linux was en_US.utf8.

--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

John McKown

Date:

12 December 2017, 18:47:28

On Tue, Dec 12, 2017 at 9:43 AM, James Keener <jim@jimkeener.com> wrote:

Sorry for spamming the list. It appears that I'm an idiot. Sorry :(

I guess we're even now. We both made a similar mistake.

But, despite my error, I still think the OP's need for an EBCDIC order is to compare output from parallel runs of an application from both an EBCDIC host and and ASCII host, to make sure they are "the same".

I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

James Keener

Date:

12 December 2017, 19:06:29

If that's the case, I wonder if OP could write a function that would convert from the ASCII code-point ot the EBCDIC codepoint. For instance, (using the function at https://wiki.postgresql.org/wiki/Binary_Replace) convert A to char 193, `select binary_replace('Anne'::bytea, 'A'::bytea, '\xc1'::bytea);` and then sort by the result of this function.

Jim

On Tue, Dec 12, 2017 at 10:47 AM, John McKown <john.archie.mckown@gmail.com> wrote:

On Tue, Dec 12, 2017 at 9:43 AM, James Keener <jim@jimkeener.com> wrote:
Sorry for spamming the list. It appears that I'm an idiot. Sorry :(

I guess we're even now. We both made a similar mistake.

But, despite my error, I still think the OP's need for an EBCDIC order is to compare output from parallel runs of an application from both an EBCDIC host and and ASCII host, to make sure they are "the same".

--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

George Neuner

Date:

12 December 2017, 21:15:31

On Tue, 12 Dec 2017 07:18:10 -0600, John McKown
<john.archie.mckown@gmail.com> wrote:

>?This is a guess on my part, based on many years on an EBCDIC system. But
>I'll bet that they are doing a conversion off of the EBCDIC system (maybe
>Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
>want to be able to compare the output from the existing system to the
>output on the new system. 

[he asks naively:]

If a port off the EBCDIC system really is the case, wouldn't it be
easier just to write an external compare tool?   Or maybe a PG
function to convert UTF strings as EBCDIC so as to compare string
column dumps?

Either of the above seems more rational than putting a whole lot of
effort into creating an EBCDIC collation.   

[Not that it is terribly hard to create a new collation.  It's more a
matter of "why bother?" when you are trying to move away from that
system in the 1st place.]

YMMV,
George

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From

Peter Geoghegan

Date:

12 December 2017, 21:21:07

On Tue, Dec 12, 2017 at 5:18 AM, John McKown
<john.archie.mckown@gmail.com> wrote:
> On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>>
>> Hi Laurenz, Tom, Peter,
>>
>> Thanks for your suggestions.  The practical solution seems to be to
>> override comparison operators of char, varchar and text data types with UDFs
>> that behave as Tom mentioned.
>>
>> From: Peter Geoghegan [mailto:pg@bowt.ie]
>> > That said, the idea of an "EBCDIC collation" seems limiting. Why
>> > should a system like DB2 for the mainframe (that happens to use EBCDIC
>> > as its encoding) not have a more natural, human-orientated collation
>> > even while using EBCDIC? ISTM that the point of using the "C" locale
>> > (with EBDIC or with UTF-8 or with any other encoding) is to get a
>> > performance benefit where the actual collation's behavior doesn't
>> > matter much to users. Are you sure it's really important to be
>> > *exactly* compatible with EBCDIC order? As long as you're paying for a
>> > custom collation, why not just use a collation that is helpful to
>> > humans?
>>
>> You are right.  I'd like to ask the customer whether and why they need
>> EBCDIC ordering.
>
>
> This is a guess on my part, based on many years on an EBCDIC system. But
> I'll bet that they are doing a conversion off of the EBCDIC system (maybe
> Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
> want to be able to compare the output from the existing system to the output
> on the new system. EBCDIC orders "lower case", "upper case", then "digits".

ICU supports creating custom collations that reorder upper and lower
case, or digits with scripts (e.g. Latin alphabet characters). See the
documentation -- "23.2.2.3.2. ICU collations". Advanced customization
is possible.

-- 
Peter Geoghegan

Re: Is it possible to sort strings in EBCDIC order in PostgreSQLserver?

From

Jeremy Schneider

Date:

24 January 2018, 03:08:40

On 12/12/17 10:21, Peter Geoghegan wrote:
> ICU supports creating custom collations that reorder upper and lower
> case, or digits with scripts (e.g. Latin alphabet characters). See the
> documentation -- "23.2.2.3.2. ICU collations". Advanced customization
> is possible.

I just gave this a try and I'm not sure I completely understand what
it's doing but it definitely doesn't look like it's sorting according to
EBCDIC byte orders. (This was on centos 7 with libicu-50.1.2) I might
have created the collation incorrectly. I'd love to hear if anyone else
finds a way to get this working outside of custom comparison operators.

pops-10.1 root@db1=# create collation "ebcdic" (provider=icu,
locale='cp037');
CREATE COLLATION

pops-10.1 root@db1=# create table test (data text);
CREATE TABLE

pops-10.1 root@db1=# insert into test
values('a'),('b'),('A'),('B'),('1'),('2'),('!'),('^');
INSERT 0 8

pops-10.1 root@db1=# select * from test order by data collate "ebcdic";
 data
------
 !
 ^
 1
 2
 a
 A
 b
 B
(8 rows)

-- 
http://about.me/jeremy_schneider