Re: UNICODE - Mailing list pgsql-general

From Tatsuo Ishii
Subject Re: UNICODE
Msg-id 20011030100139C.t-ishii@sra.co.jp
In response to Re: UNICODE  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-general
Could you please not send me personal mail?
Let's share information with everyone on the mailing list.
Anyway...

> I've tried that.  Still not writing the Chinese characters correctly.

I don't know which Chinese character set you are using, but your code
will at least fail if it is Big5, since the second byte of a Big5
character can fall in the ASCII range (0x40-0x7E).
To learn more about character sets, see, for example,
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
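
To illustrate the Big5 point, here is a minimal sketch (not taken from
the code quoted below) of a scan that skips two-byte characters as a
unit, so that a trail byte such as 0x5C is never mistaken for an ASCII
backslash. The names big5_is_lead and big5_find are hypothetical, and
the lead-byte range 0x81-0xFE is an assumption that covers common Big5
variants; adjust it to the variant actually in use.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Big5 lead-byte range; the byte that follows is a trail byte. */
static int big5_is_lead(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/*
 * Return the index of the first byte equal to `stop` that stands on its
 * own, i.e. is not the trail byte of a two-byte Big5 character.
 * Returns -1 if there is none. `stop` is assumed to be an ASCII byte.
 */
long big5_find(const unsigned char *s, unsigned char stop)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (big5_is_lead(s[i]) && s[i + 1] != '\0') {
            i += 2;            /* skip lead byte and trail byte together */
            continue;
        }
        if (s[i] == stop)
            return (long)i;
        i++;
    }
    return -1;
}
```

For the byte string 0xA5 0x5C 'a' 0x5C, a plain byte-wise search for
0x5C stops at index 1 (inside the two-byte character), while
big5_find returns 3, the real backslash.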
--
Tatsuo Ishii

> Here is the code:
>
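>   contentTypeFromPost = getenv("CONTENT_TYPE");
>   contentTypeLength = getenv("CONTENT_LENGTH");
>   /* CONTENT_LENGTH may be unset; atoi(NULL) is undefined */
>   icontentLength = contentTypeLength ? atoi(contentTypeLength) : 0;
>
>     if((queryString = malloc(icontentLength + 1)) == NULL)
>     {
>       postMessage("Cannot allocate memory", 0);
>       return(0);
>     }
>     /* read the POST body from stdin and terminate it */
>     icontentLength = fread(queryString, 1, icontentLength, stdin);
>     queryString[icontentLength] = '\0';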
>     for(i=0; *queryString; i++)
>     {
>       splitword(items.Item, queryString, '&');
>       unescape_url(items.Item);
>       splitword(items.name, items.Item, '=');
>
>  // items.Item contains double-byte characters.
>  // However, when it is written to the database I get unrecognizable data.
>     }
>
> void splitword(uchar *out, uchar *in, uchar stop)
> {
>    int i, j;
>
>    while(*in == ' ') in++; /* skip past any spaces */
>
>    for(i = 0; in[i] && (in[i] != stop); i++)
>       out[i] = in[i];
>
>    out[i] = '\0'; /* terminate it */
>    if(in[i]) ++i; /* position past the stop */
>
>    while(in[i] == ' ') i++; /* skip past any spaces */
>
>    /* shift the rest of in down, including its terminating '\0' */
>    for(j = 0; (in[j] = in[i]) != '\0'; j++, i++)
>       ;
> }
>
> uchar x2c(uchar *x)
> {
>    register uchar c;
>
>    /* note: (x & 0xdf) makes x upper case */
>    c  = (x[0] >= 'A' ? ((x[0] & 0xdf) - 'A') + 10 : (x[0] - '0'));
>    c *= 16;
>    c += (x[1] >= 'A' ? ((x[1] & 0xdf) - 'A') + 10 : (x[1] - '0'));
>    return(c);
> }
>
> void unescape_url(uchar *url)
> {
>    register int i, j;
>
>    for(i = 0, j = 0; url[j]; ++i, ++j)
>    {
>       if((url[i] = url[j]) == '%' && url[j + 1] && url[j + 2])
>       {
>          url[i] = x2c(&url[j + 1]);
>          j += 2;
>       }
>       else if (url[i] == '+')
>          url[i] = ' ';
>    }
>    url[i] = '\0';  /* terminate it at the new length */
> }
>
> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Sunday, October 28, 2001 4:57 PM
> To: jklcom@mindspring.com
> Cc: pgsql-general@postgresql.org
> Subject: RE: [GENERAL] UNICODE
>
>
> > I'm also trying to write some Chinese data to a postgresql database.  I'm
> > getting gibberish after it's written to the database.
> >
> > I recognize the problem is at the http request.  How do I retrieve double
> > byte characters through an http request using C/C++? And how do I write
> > them to the database?
>
> Nothing special. Just read/write the bytes one by one.
>
> > And how do I tell it what kind of encoding to use?
>
> set client_encoding.
> --
> Tatsuo Ishii
>
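
On the "set client_encoding" advice quoted above, the following is a
rough sketch of how a C client might build that statement before
sending it to the server. The helper name build_set_client_encoding is
hypothetical, and 'BIG5' is only an assumed example; use whatever
encoding the browser actually submits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "SET client_encoding TO '<enc>'" into buf.
 * Returns 0 on success, or -1 if enc is empty, contains anything other
 * than letters, digits, '_' or '-', or does not fit into buf.
 */
int build_set_client_encoding(char *buf, size_t buflen, const char *enc)
{
    size_t i;
    int n;

    assert(buf != NULL && enc != NULL);
    if (enc[0] == '\0')
        return -1;
    for (i = 0; enc[i] != '\0'; i++) {
        unsigned char c = (unsigned char)enc[i];
        if (!isalnum(c) && c != '_' && c != '-')
            return -1;      /* refuse quotes and other odd characters */
    }
    n = snprintf(buf, buflen, "SET client_encoding TO '%s'", enc);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The resulting string would then be sent once per connection, for
example with libpq's PQexec(); libpq also provides
PQsetClientEncoding() for the same purpose. The server must have
been built with support for the requested encoding.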
