Tatsuo Ishii wrote:
>
> > > > > There are "full width alphabets" in Japanese. Those include not only
> > > > > ASCII letters but also some European characters.
> > > >
> > > > Are these ASCII and European characters uppercased in some
> > > > Japanese-specific way?
> > >
> > > Probably not, but I'm not sure since my Linux box does not have *.utf8
> > > locales.
> >
> > Could you give me the UTF-8 byte sequence for one Japanese upper-case char
> > and for the same char in lower case?
> > I will check in the de_DE locale whether this translation works.
>
> Ok, here is the data you requested. The first three bytes (0xefbca1)
> represent full-width capital "A", the remaining three bytes (0xefbd81)
> represent full-width lower case "a".
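For reference, these two sequences decode to the code points U+FF21 and U+FF41, which are 0x20 apart, just like ASCII "A" (0x41) and "a" (0x61). A minimal sketch of the decoding follows. The names decode_utf8_3 and print_fullwidth_a are hypothetical and used only for illustration, and the helper handles only three-byte sequences without checking for overlong or surrogate encodings:

```c
#include <stdio.h>

/* Hypothetical helper: decodes one three-byte UTF-8 sequence
 * (1110xxxx 10xxxxxx 10xxxxxx) into its code point.
 * Returns 0 if the bytes do not match that bit pattern. */
unsigned long decode_utf8_3(const unsigned char *b)
{
    if ((b[0] & 0xF0) != 0xE0 || (b[1] & 0xC0) != 0x80 || (b[2] & 0xC0) != 0x80)
        return 0;
    return ((unsigned long)(b[0] & 0x0F) << 12)   /* 4 payload bits */
         | ((unsigned long)(b[1] & 0x3F) << 6)    /* 6 payload bits */
         |  (unsigned long)(b[2] & 0x3F);         /* 6 payload bits */
}

/* Prints the code points of the two sequences quoted above. */
void print_fullwidth_a(void)
{
    static const unsigned char upper[] = { 0xEF, 0xBC, 0xA1 };
    static const unsigned char lower[] = { 0xEF, 0xBD, 0x81 };

    printf("upper: U+%04lX\n", decode_utf8_3(upper));  /* U+FF21 */
    printf("lower: U+%04lX\n", decode_utf8_3(lower));  /* U+FF41 */
}
```

This check does not depend on any locale, so it can be used to confirm the input bytes before testing the locale-dependent case conversion.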

Thank you for the data. The conversion works in both the ja_JP.utf8 and
de_DE.utf8 locales. I am sending you my test program as an attachment.

Regards,
Michael

#include <stdio.h>
#include <stdlib.h>   // exit()
#include <wchar.h>
#include <wctype.h>
#include <locale.h>

#define LEN 7

int main(void) {
    char readInByte[LEN], writeOutByte[LEN];       // hold the character bytes
    const char *readInByteP[] = {readInByte};      // helper pointer, updated by mbsrtowcs()
    wchar_t readInWC[LEN], writeOutWC[LEN];        // hold the wide characters
    const wchar_t *writeOutWCP[] = {writeOutWC};   // helper pointer, updated by wcsrtombs()
    wctrans_t wctransDesc;                         // descriptor for the conversion
    size_t i, ret;
    //const char myLocale[] = "ja_JP.utf8";
    const char myLocale[] = "de_DE.utf8";
    char *localeSet;

    // full-width A (upper) in UTF-8
    readInByte[0] = (char) 0xef; readInByte[1] = (char) 0xbc; readInByte[2] = (char) 0xa1;
    // full-width a (lower) in UTF-8
    readInByte[3] = (char) 0xef; readInByte[4] = (char) 0xbd; readInByte[5] = (char) 0x81;
    readInByte[6] = 0;

    // print out the input
    printf("full-width A (upper) UTF-8: %hhx %hhx %hhx\n", readInByte[0], readInByte[1], readInByte[2]);
    printf("full-width a (lower) UTF-8: %hhx %hhx %hhx\n", readInByte[3], readInByte[4], readInByte[5]);

    // setlocale() and wctrans() need not set errno, so report failures directly
    if ((localeSet = setlocale(LC_CTYPE, myLocale)) == NULL) {
        fprintf(stderr, "setlocale(%s) failed\n", myLocale);
        exit(1);
    }
    printf("locale set: %s\n", localeSet);

    ret = mbsrtowcs(readInWC, readInByteP, LEN, NULL);   // convert bytes to wide chars
    if (ret == (size_t) -1) { perror("mbsrtowcs"); exit(1); }
    printf("number of wide chars: %zu\n", ret);

    wctransDesc = wctrans("tolower");   // get descriptor for the wide-char operation
    if (wctransDesc == 0) { fprintf(stderr, "wctrans(\"tolower\") failed\n"); exit(1); }

    // apply the transformation described by the descriptor, including the terminator
    i = 0;
    while ((writeOutWC[i] = towctrans(readInWC[i], wctransDesc)) != L'\0')
        i++;

    ret = wcsrtombs(writeOutByte, writeOutWCP, LEN, NULL);   // convert wide chars to bytes
    if (ret == (size_t) -1) { perror("wcsrtombs"); exit(1); }
    printf("number of bytes: %zu\n", ret);

    // print out the result
    printf("full-width A tolower(): %hhx %hhx %hhx\n", writeOutByte[0], writeOutByte[1], writeOutByte[2]);
    printf("full-width a tolower(): %hhx %hhx %hhx\n", writeOutByte[3], writeOutByte[4], writeOutByte[5]);
    return 0;
}
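
The test program hard-codes its buffers. If the same conversion were needed for arbitrary strings, it could be wrapped roughly as follows. This is only a sketch under assumptions: mb_tolower is a hypothetical name, it uses towlower() (which the C standard defines as equivalent to towctrans() with the "tolower" descriptor), and it expects the caller to have selected the locale with setlocale() beforehand.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of the test program above:
 * lower-cases the multibyte string src into dst according to the
 * current LC_CTYPE locale. Returns the number of bytes written
 * (excluding the terminating NUL), or (size_t)-1 on an invalid
 * sequence, allocation failure, or a dst buffer that is too small. */
size_t mb_tolower(const char *src, char *dst, size_t dstsize)
{
    mbstate_t st;
    const char *p = src;
    const wchar_t *wp;
    wchar_t *wbuf;
    size_t nwc, nbytes, i;

    /* First pass with a NULL destination only counts the wide chars. */
    memset(&st, 0, sizeof st);
    nwc = mbsrtowcs(NULL, &p, 0, &st);
    if (nwc == (size_t)-1)
        return (size_t)-1;

    wbuf = malloc((nwc + 1) * sizeof *wbuf);
    if (wbuf == NULL)
        return (size_t)-1;

    /* Second pass converts the bytes, including the terminator. */
    memset(&st, 0, sizeof st);
    p = src;
    mbsrtowcs(wbuf, &p, nwc + 1, &st);

    for (i = 0; i < nwc; i++)
        wbuf[i] = (wchar_t)towlower((wint_t)wbuf[i]);

    /* Check the required output size before writing into dst. */
    memset(&st, 0, sizeof st);
    wp = wbuf;
    nbytes = wcsrtombs(NULL, &wp, 0, &st);
    if (nbytes != (size_t)-1 && nbytes < dstsize) {
        memset(&st, 0, sizeof st);
        wp = wbuf;
        wcsrtombs(dst, &wp, dstsize, &st);
    } else {
        nbytes = (size_t)-1;
    }

    free(wbuf);
    return nbytes;
}
```

With a UTF-8 locale selected, passing the six input bytes from the test program should give back the sequence 0xefbd81 twice, matching what the program prints. In the default "C" locale the helper still works for plain ASCII input.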