Re: [HACKERS] Bug #659: lower()/upper() bug on - Mailing list pgsql-bugs

From Enke, Michael
Subject Re: [HACKERS] Bug #659: lower()/upper() bug on
Date
Msg-id 3D06FCC2.D03F8006@wincor-nixdorf.com
In response to Re: [HACKERS] Bug #659: lower()/upper() bug on  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-bugs

Tatsuo Ishii wrote:
>
> > > > > There are "full width alphabets" in Japanese. Those include not only
> > > > > ASCII letters but also some European characters.
> > > >
> > > > Are these ASCII and European characters uppercased in some
> > > > Japanese-specific way ?
> > >
> > > Probably not, but I'm not sure since my Linux box does not have *.utf8
> > > locales.
> >
> > Could you give me the UTF-8 byte sequence for one Japanese upper case char
> > and for the same char in lower case?
> > I will check in the de_DE locale whether this translation works.
>
> Ok, here is the data you requested. The first three bytes (0xefbca1)
> represent full-width capital "A", the remaining three bytes (0xefbd81)
> represent full-width lower case "a".

Thank you for the data. The conversion works in both ja_JP.utf8 and de_DE.utf8.
I am sending you my test program as an attachment.

Regards,
Michael

#include <stdio.h>
#include <stdlib.h>   // for exit()
#include <wchar.h>
#include <wctype.h>
#include <locale.h>
#define LEN 7

int main() {
  char readInByte[LEN], writeOutByte[LEN];     // holds the character bytes
  const char *readInByteP[] = {readInByte};    // help pointer
  wchar_t readInWC[LEN], writeOutWC[LEN];      // holds the wide characters
  const wchar_t *writeOutWCP[] = {writeOutWC}; // help pointer
  wctrans_t wctransDesc;                       // holds the descriptor for conversion
  int i, ret;
  //const char myLocale[] = "ja_JP.utf8";
  const char myLocale[] = "de_DE.utf8";
  char *localeSet;

  readInByte[0] = 0xef; readInByte[1] = 0xbc; readInByte[2] = 0xa1; // full-width A (upper) in UTF-8
  readInByte[3] = 0xef; readInByte[4] = 0xbd; readInByte[5] = 0x81; // full-width a (lower) in UTF-8
  readInByte[6] = 0;

  // print out the input
  printf("full-width A (upper) UTF-8: %hhx %hhx %hhx\n", readInByte[0], readInByte[1], readInByte[2]);
  printf("full-width a (lower) UTF-8: %hhx %hhx %hhx\n", readInByte[3], readInByte[4], readInByte[5]);

  if((localeSet = setlocale(LC_CTYPE, myLocale)) == NULL) { perror("setlocale"); exit(1); }
  else printf("locale set: %s\n", localeSet);
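  ret = mbsrtowcs(readInWC, readInByteP, LEN, NULL); // convert bytes to wide chars
  if(ret < 0) { perror("mbsrtowcs"); exit(1); }      // (size_t)-1 is returned on an invalid byte sequence
  printf("number of wide chars: %i\n", ret);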
  wctransDesc = wctrans("tolower");            // get descriptor for wc operation
  if(wctransDesc == 0) { perror("wctransDesc"); exit(1); }

  // make the transformation according to descriptor
  i=0; while((writeOutWC[i] = towctrans(readInWC[i], wctransDesc)) != L'\0') i++;

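  ret = wcsrtombs(writeOutByte, writeOutWCP, LEN, NULL); // convert wide chars to bytes
  if(ret < 0) { perror("wcsrtombs"); exit(1); }          // (size_t)-1 is returned if a wide char cannot be converted
  printf("number of bytes: %i\n", ret);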

  // print out the result
  printf("full-width A tolower():     %hhx %hhx %hhx\n", writeOutByte[0], writeOutByte[1], writeOutByte[2]);
  printf("full-width a tolower():     %hhx %hhx %hhx\n", writeOutByte[3], writeOutByte[4], writeOutByte[5]);

  return 0;
}
