Thread: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
"Tsunakawa, Takayuki"
Date:
Hello,

It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside the
PostgreSQLserver?  i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We need
torun on a certain mainframe.
 

I see almost no hope from the following, but let me make sure.

https://www.postgresql.org/message-id/flat/28548.1424884373%40sss.pgh.pa.us#28548.1424884373@sss.pgh.pa.us

"Hmm ... EBCDIC ... is that mandatory?  Because there are a pretty large
number of ASCII dependencies in PG, mostly arising from our support of
multibyte character sets, which are all expected to be ASCII supersets.
If there's not a way to run the server process in ASCII-land, you're
likely dead in the water.  I would not want to try to ferret out all
the dependencies."


Regards
Takayuki Tsunakawa




Re: Is it possible to sort strings in EBCDIC order in PostgreSQLserver?

From
Laurenz Albe
Date:
Tsunakawa, Takayuki wrote:
> It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside
thePostgreSQL server?
 
> i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We need to run on a
certainmainframe.
 
> 
> I see almost no hope from the following, but let me make sure.

That should not be too difficult.  PostgreSQL is extensible!

Just define operators that implement <, <=, =, >= and > for EBCDIC,
create an operator class for these (for indexing) and use
ORDER BY ... USING.

Yours,
Laurenz Albe


Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
Tom Lane
Date:
"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside
thePostgreSQL server?  i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We
needto run on a certain mainframe. 

> I see almost no hope from the following, but let me make sure.
> https://www.postgresql.org/message-id/flat/28548.1424884373%40sss.pgh.pa.us#28548.1424884373@sss.pgh.pa.us

> "Hmm ... EBCDIC ... is that mandatory?  Because there are a pretty large
> number of ASCII dependencies in PG, mostly arising from our support of
> multibyte character sets, which are all expected to be ASCII supersets.
> If there's not a way to run the server process in ASCII-land, you're
> likely dead in the water.  I would not want to try to ferret out all
> the dependencies."

As I recall, that question was about converting the server to use
EBCDIC strings everywhere.  That seems impractical for the reasons
I mentioned.  But you could probably sort according to EBCDIC order
if you were willing to accept some inefficiency: internal to the
comparison function, convert each string to EBCDIC on each call.

Another way you could imagine is to store EBCDIC strings embedded
in bytea values.

            regards, tom lane


Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
Peter Geoghegan
Date:
On Sun, Dec 10, 2017 at 11:09 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> It is embarrassing to ask such a thing, but is there any way to sort character column values in EBCDIC order inside
thePostgreSQL server?  i.e. is it possible to use EBCDIC collation in PostgreSQL, say, by using ICU or something?  We
needto run on a certain mainframe. 

There is no need for embarrassment. I think that you mean that you'd
like to get behavior equivalent to the "C" locale with an EBCDIC
encoding. Right? I think that you could use ICU to do this with some
work. ICU collations are not tied to a particular encoding, unlike the
libc collations (they support most but not all collations).

That said, the idea of an "EBCDIC collation" seems limiting. Why
should a system like DB2 for the mainframe (that happens to use EBCDIC
as its encoding) not have a more natural, human-orientated collation
even while using EBCDIC? ISTM that the point of using the "C" locale
(with EBDIC or with UTF-8 or with any other encoding) is to get a
performance benefit where the actual collation's behavior doesn't
matter much to users. Are you sure it's really important to be
*exactly* compatible with EBCDIC order? As long as you're paying for a
custom collation, why not just use a collation that is helpful to
humans?

--
Peter Geoghegan


RE: Is it possible to sort strings in EBCDIC order in PostgreSQLserver?

From
"Tsunakawa, Takayuki"
Date:
Hi Laurenz, Tom, Peter,

Thanks for your suggestions.  The practical solution seems to be to override comparison operators of char, varchar and
textdata types with UDFs that behave as Tom mentioned.
 

From: Peter Geoghegan [mailto:pg@bowt.ie]
> That said, the idea of an "EBCDIC collation" seems limiting. Why
> should a system like DB2 for the mainframe (that happens to use EBCDIC
> as its encoding) not have a more natural, human-orientated collation
> even while using EBCDIC? ISTM that the point of using the "C" locale
> (with EBDIC or with UTF-8 or with any other encoding) is to get a
> performance benefit where the actual collation's behavior doesn't
> matter much to users. Are you sure it's really important to be
> *exactly* compatible with EBCDIC order? As long as you're paying for a
> custom collation, why not just use a collation that is helpful to
> humans?

You are right.  I'd like to ask the customer whether and why they need EBCDIC ordering.

Regards
Takayuki Tsunakawa


Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
John McKown
Date:
On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hi Laurenz, Tom, Peter,

Thanks for your suggestions.  The practical solution seems to be to override comparison operators of char, varchar and text data types with UDFs that behave as Tom mentioned.

From: Peter Geoghegan [mailto:pg@bowt.ie]
> That said, the idea of an "EBCDIC collation" seems limiting. Why
> should a system like DB2 for the mainframe (that happens to use EBCDIC
> as its encoding) not have a more natural, human-orientated collation
> even while using EBCDIC? ISTM that the point of using the "C" locale
> (with EBDIC or with UTF-8 or with any other encoding) is to get a
> performance benefit where the actual collation's behavior doesn't
> matter much to users. Are you sure it's really important to be
> *exactly* compatible with EBCDIC order? As long as you're paying for a
> custom collation, why not just use a collation that is helpful to
> humans?

You are right.  I'd like to ask the customer whether and why they need EBCDIC ordering.

​This is a guess on my part, based on many years on an EBCDIC system. But I'll bet that they are doing a conversion off of the EBCDIC system (maybe Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They want to be able to compare the output from the existing system to the output on the new system. EBCDIC orders "lower case", "upper case", then "digits". The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ". Comparing identical data which is not presented in exactly the same order would be very difficult. ​

 

Regards
Takayuki Tsunakawa




--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
James Keener
Date:
The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


static int myCompare (const void * a, const void * b)
{
  return strcmp (*(const char **) a, *(const char **) b);
}

void with_other_locale (char *new_locale,
    const char **arr,
    const int n
    )
{
  char *old_locale, *saved_locale;

  /* Get the name of the current locale.  */
  old_locale = setlocale (LC_ALL, NULL);

  /* Copy the name so it won’t be clobbered by setlocale. */
  saved_locale = strdup (old_locale);

  if (strlen(new_locale) == 0)
  {
    new_locale = saved_locale;
  }

  /* Now change the locale and do some stuff with it. */
  setlocale (LC_ALL, new_locale);
  qsort (arr, n, sizeof (const char *), myCompare);

  printf("\nSorted array in locale %s is\n", new_locale);
  for (int i = 0; i < n; i++)
    printf("%d: %s \n", i, arr[i]);

  /* Restore the original locale. */
  setlocale (LC_ALL, saved_locale);
  free (saved_locale);
}

int main ()
{
  const char *arr[] = {"Jim", "job", "Anne", "aardvark", "Isaac", "island", "12 Days of Christmas", "12 drummers"};
  int n = sizeof(arr)/sizeof(arr[0]);
  int i;

  printf("Given array is\n");
  for (i = 0; i < n; i++)
    printf("%d: %s \n", i, arr[i]);

  with_other_locale("", arr, n);
  with_other_locale("C", arr, n);
  with_other_locale("en_US.UTF-8", arr, n);
  with_other_locale("UTF-8", arr, n);
  return 0;
}

Gives

Given array is
0: Jim
1: job
2: Anne
3: aardvark
4: Isaac
5: island
6: 12 Days of Christmas
7: 12 drummers

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale C is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale en_US.UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

Sorted array in locale UTF-8 is
0: 12 Days of Christmas
1: 12 drummers
2: Anne
3: Isaac
4: Jim
5: aardvark
6: island
7: job

I actually don't think there is a case-insensitive locale (bydefault?) on a unix machine.

Jim

On Tue, Dec 12, 2017 at 8:18 AM, John McKown <john.archie.mckown@gmail.com> wrote:
On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
Hi Laurenz, Tom, Peter,

Thanks for your suggestions.  The practical solution seems to be to override comparison operators of char, varchar and text data types with UDFs that behave as Tom mentioned.

From: Peter Geoghegan [mailto:pg@bowt.ie]
> That said, the idea of an "EBCDIC collation" seems limiting. Why
> should a system like DB2 for the mainframe (that happens to use EBCDIC
> as its encoding) not have a more natural, human-orientated collation
> even while using EBCDIC? ISTM that the point of using the "C" locale
> (with EBDIC or with UTF-8 or with any other encoding) is to get a
> performance benefit where the actual collation's behavior doesn't
> matter much to users. Are you sure it's really important to be
> *exactly* compatible with EBCDIC order? As long as you're paying for a
> custom collation, why not just use a collation that is helpful to
> humans?

You are right.  I'd like to ask the customer whether and why they need EBCDIC ordering.

​This is a guess on my part, based on many years on an EBCDIC system. But I'll bet that they are doing a conversion off of the EBCDIC system (maybe Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They want to be able to compare the output from the existing system to the output on the new system. EBCDIC orders "lower case", "upper case", then "digits". The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ". Comparing identical data which is not presented in exactly the same order would be very difficult. ​

 

Regards
Takayuki Tsunakawa




--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
John McKown
Date:
On Tue, Dec 12, 2017 at 9:11 AM, James Keener <jim@jimkeener.com> wrote:
The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

​Thanks for the correction. Turns out that I forgot that my default locale on Linux was en_US.utf8.​


--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
James Keener
Date:
en_US.utf8. is still 0-9A-Za-z and in my example set (as it's my default too :))

You'd need a case insensitive collation to do what you described, and I'm not sure those exist in postgres. (I guess you could always build your own if you _really_ wanted to.

Jim

On Tue, Dec 12, 2017 at 10:24 AM, John McKown <john.archie.mckown@gmail.com> wrote:
On Tue, Dec 12, 2017 at 9:11 AM, James Keener <jim@jimkeener.com> wrote:
The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

​Thanks for the correction. Turns out that I forgot that my default locale on Linux was en_US.utf8.​


--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
James Keener
Date:
Sorry for spamming the list. It appears that I'm an idiot. Sorry :(

jim=# select * from test order by a collate "C";
          a          
----------------------
 12 Days of Christmas
 12 drummers
 Anne
 Isaac
 Jim
 a
 aardvark
 b
 island
 job
(10 rows)

jim=# select * from test order by a collate "en_US.utf8";
          a          
----------------------
 12 Days of Christmas
 12 drummers
 a
 aardvark
 Anne
 b
 Isaac
 island
 Jim
 job
(10 rows)


On Tue, Dec 12, 2017 at 10:36 AM, James Keener <jim@jimkeener.com> wrote:
en_US.utf8. is still 0-9A-Za-z and in my example set (as it's my default too :))

You'd need a case insensitive collation to do what you described, and I'm not sure those exist in postgres. (I guess you could always build your own if you _really_ wanted to.

Jim

On Tue, Dec 12, 2017 at 10:24 AM, John McKown <john.archie.mckown@gmail.com> wrote:
On Tue, Dec 12, 2017 at 9:11 AM, James Keener <jim@jimkeener.com> wrote:
The default C locale on Linux (I don't know Windows) will sort "digits", then alphabetic with the lower then upper case of each letter in order like: "aAbB...zZ"

That's no true at all! The C locales are 0-9A-Za-z

​Thanks for the correction. Turns out that I forgot that my default locale on Linux was en_US.utf8.​


--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown


Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
John McKown
Date:
On Tue, Dec 12, 2017 at 9:43 AM, James Keener <jim@jimkeener.com> wrote:
Sorry for spamming the list. It appears that I'm an idiot. Sorry :(

​I guess we're even now. We both made a similar mistake.​

​But, despite my error, I still think the OP's need for an EBCDIC order is to compare output from parallel runs of an application from both an EBCDIC host and and ASCII host, to make sure they are "the same".

--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
James Keener
Date:
If that's the case, I wonder if OP could write a function that would convert from the ASCII code-point ot the EBCDIC codepoint. For instance, (using the function at https://wiki.postgresql.org/wiki/Binary_Replace) convert A to char 193, `select binary_replace('Anne'::bytea, 'A'::bytea, '\xc1'::bytea);` and then sort by the result of this function.

Jim

On Tue, Dec 12, 2017 at 10:47 AM, John McKown <john.archie.mckown@gmail.com> wrote:
On Tue, Dec 12, 2017 at 9:43 AM, James Keener <jim@jimkeener.com> wrote:
Sorry for spamming the list. It appears that I'm an idiot. Sorry :(

​I guess we're even now. We both made a similar mistake.​

​But, despite my error, I still think the OP's need for an EBCDIC order is to compare output from parallel runs of an application from both an EBCDIC host and and ASCII host, to make sure they are "the same".

--
I have a theory that it's impossible to prove anything, but I can't prove it.

Maranatha! <><
John McKown

Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
George Neuner
Date:
On Tue, 12 Dec 2017 07:18:10 -0600, John McKown
<john.archie.mckown@gmail.com> wrote:

>?This is a guess on my part, based on many years on an EBCDIC system. But
>I'll bet that they are doing a conversion off of the EBCDIC system (maybe
>Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
>want to be able to compare the output from the existing system to the
>output on the new system. 

[he asks naively:]

If a port off the EBCDIC system really is the case, wouldn't it be
easier just to write an external compare tool?   Or maybe a PG
function to convert UTF strings as EBCDIC so as to compare string
column dumps?

Either of the above seems more rational than putting a whole lot of
effort into creating an EBCDIC collation.   

[Not that it is terribly hard to create a new collation.  It's more a
matter of "why bother?" when you are trying to move away from that
system in the 1st place.]

YMMV,
George



Re: Is it possible to sort strings in EBCDIC order in PostgreSQL server?

From
Peter Geoghegan
Date:
On Tue, Dec 12, 2017 at 5:18 AM, John McKown
<john.archie.mckown@gmail.com> wrote:
> On Tue, Dec 12, 2017 at 2:17 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
>>
>> Hi Laurenz, Tom, Peter,
>>
>> Thanks for your suggestions.  The practical solution seems to be to
>> override comparison operators of char, varchar and text data types with UDFs
>> that behave as Tom mentioned.
>>
>> From: Peter Geoghegan [mailto:pg@bowt.ie]
>> > That said, the idea of an "EBCDIC collation" seems limiting. Why
>> > should a system like DB2 for the mainframe (that happens to use EBCDIC
>> > as its encoding) not have a more natural, human-orientated collation
>> > even while using EBCDIC? ISTM that the point of using the "C" locale
>> > (with EBDIC or with UTF-8 or with any other encoding) is to get a
>> > performance benefit where the actual collation's behavior doesn't
>> > matter much to users. Are you sure it's really important to be
>> > *exactly* compatible with EBCDIC order? As long as you're paying for a
>> > custom collation, why not just use a collation that is helpful to
>> > humans?
>>
>> You are right.  I'd like to ask the customer whether and why they need
>> EBCDIC ordering.
>
>
> This is a guess on my part, based on many years on an EBCDIC system. But
> I'll bet that they are doing a conversion off of the EBCDIC system (maybe
> Db2 on z/OS) to an ASCII system (Linux or Windows) running PostgreSQL. They
> want to be able to compare the output from the existing system to the output
> on the new system. EBCDIC orders "lower case", "upper case", then "digits".

ICU supports creating custom collations that reorder upper and lower
case, or digits with scripts (e.g. Latin alphabet characters). See the
documentation -- "23.2.2.3.2. ICU collations". Advanced customization
is possible.

-- 
Peter Geoghegan


Re: Is it possible to sort strings in EBCDIC order in PostgreSQLserver?

From
Jeremy Schneider
Date:
On 12/12/17 10:21, Peter Geoghegan wrote:
> ICU supports creating custom collations that reorder upper and lower
> case, or digits with scripts (e.g. Latin alphabet characters). See the
> documentation -- "23.2.2.3.2. ICU collations". Advanced customization
> is possible.

I just gave this a try and I'm not sure I completely understand what
it's doing but it definitely doesn't look like it's sorting according to
EBCDIC byte orders. (This was on centos 7 with libicu-50.1.2) I might
have created the collation incorrectly. I'd love to hear if anyone else
finds a way to get this working outside of custom comparison operators.

pops-10.1 root@db1=# create collation "ebcdic" (provider=icu,
locale='cp037');
CREATE COLLATION

pops-10.1 root@db1=# create table test (data text);
CREATE TABLE

pops-10.1 root@db1=# insert into test
values('a'),('b'),('A'),('B'),('1'),('2'),('!'),('^');
INSERT 0 8

pops-10.1 root@db1=# select * from test order by data collate "ebcdic";
 data
------
 !
 ^
 1
 2
 a
 A
 b
 B
(8 rows)


-- 
http://about.me/jeremy_schneider