Thread: International support

International support

From
Soma Interesting
Date:
Hello everyone,

I'm currently working on a project that is intended to handle Japanese
character sets - and now I'm told ideally iMode too. :) The iMode isn't
such an issue at the moment - but the article below has spooked me a
little. At an early point in the project we tested whether input put
into a web form, handled by PHP and then stored in Postgres, would come
back fully intact - and it did. This left me comfortable
that PHP and Postgres don't seem to care what language they're storing in
fields or variables. I'm 'guessing' that this is because the data, whether
it's English or Japanese, is being stored in binary (or something else?). Of
course I wouldn't be able to sort the data or do anything else that would
require PHP/Postgres to be able to interpret the data. However if I compile
Postgres with locale support for the character set/language in question -
then postgres will be able to sort Japanese. Is this right?

Have I got this all right so far? I have attempted to do my research on
this - but finding a real beginners guide to international web development
has been a trick. And the best sources I have found on this topic generally
are specific to Oracle. Any links would be appreciated.

Ok the next part of this message is an article I thought would be generally
interesting so I'm not hesitating to post it entirely. It was forwarded to
me so I'm not sure of the source.

For the postgres folks, these developers went with MySQL - I've chosen
Postgres. Is there anything MySQL does that Postgres doesn't in terms of
language support that I should be aware of?

>r-newbold.com
>
>George Baptista is riding the wave of the latest technology with his
>company's newest venture: r-newbold.com <http://www.r-newbold.com>, a
>wireless i-Mode site created with PHP.
>
>Based in Tokyo, George is a Web developer and co-partner with Izumi
>Hiroshima at Studio Omame <http://www.omame.com>. He
>was part of the creative team behind the Japanese-language site, and he is
>finding that i-Mode is not only fashionable, but also the face of the future.
>
>r-newbold.com was developed for British fashion designer Paul Smith. NTT
>DoCoMo's i-Mode Web-browsing cellular phones, which already have over 10
>million users, offer a fast wireless data service with Internet access.
>In Japan, it's been a hit with the younger generation. And since
>r-newbold.com's target population is young, hip and fashionable, the
>decision to create an i-Mode site rather than a regular Web site made
>perfect sense.
>
>PHP and i-Mode: The perfect match
>"i-Mode is exploding in Japan," George adds. "It offers new and
>interesting challenges to the developer. The i18n J version of PHP is an
>increasingly popular choice for developers [in Japan] because of its
>overall performance, speed of development, and character set capability
>features."
>
>George and his team at Studio Omame had already been using PHP for about
>a year prior to developing the r-newbold site, and were very pleased with
>its performance. He cites several reasons for his company's choice of PHP
>including:
>* PHP's overall performance and reliability
>* ease of prototyping
>* PHP's large and growing developer base
>"There are a growing number of developers in Tokyo using it. We also knew
>there was ample support on both main and Japanese PHP mailing lists for
>any problems that might come up," George says.
>"PHP became especially useful to us when Shigeru Kanemoto, Hironori Neal
>Sato and a few other developers worked to release a version of PHP
>3.0.15-i18n. We are looking forward to this same core of people to also
>soon release an i18n J version of PHP 4. We expect it will provide us with
>even greater speed and performance than PHP 3."
>Studio Omame's decision to use PHP was welcomed by Paul Smith's company.
>The open-source nature of the product was simply not an issue. "We
>suggested PHP, and there was no problem. They were happy about the
>cost-savings involved," George recalls.
>
>PHP's Japanese challenge
>Since r-newbold.com is in Japanese only, Studio Omame made sure to utilize
>PHP's Japanese character set conversion functions. However, this proved to
>be a challenge.

Is this available for v4 of PHP yet?

>"Dealing with character sets when developing for the Japanese market is a
>headache," George says. "There's not one, but 3 character sets in wide
>use: SJIS, EUC and JIS.
>
>"We used all of them for this project. All output for the i-Mode platform
>needs to be SJIS, while input to MySQL is in EUC, and data for mailing was
>encoded in JIS. Ensuring the integrity of data via character set
>conversion was a very important factor. Ultimately, PHP made this project
>much smoother and very enjoyable. Not having to worry about character set
>mangling is a godsend."
>
>There was also the task of adapting PHP from computers to telephones. "The
>unique nature of cellular phone interfaces affected our application
>development. For example, people browsing the Web using i-Mode phones tend
>to use the back button a lot, rather than 'back' or 'return' links.
>Figuring out how to deal with this while preserving session data integrity
>took a little thinking."
>
>George and his team did figure it out, although George points out that
>"Daryl Jones of TEA, a communications firm based in Silicon Valley, helped
>out tremendously with the network and server consulting and set-up, so
>that we could focus totally on PHP development."
>It took Studio Omame almost two months to write the PHP scripts for all of
>the site's functions, including mail and accessing MySQL. "As much as
>possible, we try to stick with *nix, Apache, PHP," George says.
>
>The current shop database holds approximately 60 records, and the
>membership section is designed to easily handle an initial 20,000 records.
>In addition, other miscellaneous tables are used for administrative purposes.
>
>PHP's functions
>PHP serves a variety of functions on the r-newbold site, including:
>* authentication and sessions via PHPLIB for a member-only section
>* PHP scripts + cron accessing MySQL database to send out daily e-mails
>* heavy use of mail functions for viral-marketing features
>* detecting specific browser/cellular handset and serving the appropriate
>interface
>* Japanese character set functions specific to i18n version, ensuring SJIS
>output, internal processing in EUC
>* mail operations in JIS
>
>As far as security purposes are concerned, Studio Omame decided to use
>PHPLIB for authentication. "At first we tried using the standard HTTP
>authentication method, but it turned out that certain i-Mode handsets had
>somewhat sketchy browser implementations, and they would need to log-in
>for each password-protected page! So we turned to PHPLIB, which worked
>very nicely."
>
>The site currently receives more than 3,000 page views per day, with most
>visitors spending about 5 minutes at a time on the site. "In Japan, people
>generally use i-Mode sites in mini-bursts, for example, while waiting for
>the train, riding the bus or walking about the street," George says.
>
>r-newbold.com hopes to receive approximately 200,000 visitors a day
>initially. "We tried to reduce the amount of calls to MySQL as much as
>possible, and instead created static pages that are updated from an
>administration screen."
>
>Data is collected on visitors in the members' section only. It's an
>entirely opt-in, voluntary feature, George notes.
>Studio Omame also used Adobe's GoLive 5 for the information design and
>page-template creation for this project. The WebDav and i-Mode eMoji
>features were particularly useful.
>
>As i-Mode becomes more and more popular, George says he hopes PHP will
>eventually provide more i18n compatibility.
>In the meantime, Studio Omame has launched omake.com as an experimental
>i-Mode site using PHP. Several other i-Mode projects are also in the
>pipeline. "We plan to use PHP extensively for all of them," George predicts.


Re: International support

From
Tatsuo Ishii
Date:
> I'm currently working on a project that is intended to handle Japanese
> character sets - and now I'm told ideally iMode too. :) The iMode isn't
> such an issue at the moment - but the article below has spooked me a
> little. At an early point in the project we tested whether input put
> into a web form, handled by PHP and then stored in Postgres, would come
> back fully intact - and it did. This left me comfortable
> that PHP and Postgres don't seem to care what language they're storing in
> fields or variables. I'm 'guessing' that this is because the data, whether
> it's English or Japanese, is being stored in binary (or something
> else?).

No. You are just lucky, I guess. If the data submitted by PHP is encoded
in EUC, it's ok, since EUC does not conflict with ASCII. However, if it
is encoded in SJIS, you are going to run into big problems. The second byte
of an SJIS character *sometimes* conflicts with ASCII meta characters such
as "\", and this will make the parser of PostgreSQL go crazy.
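
The collision described above can be checked directly. The following is a minimal sketch in Python (not PHP; chosen only because its standard codecs cover these encodings). The sample character 表 and the helper name are illustrative choices, not taken from the thread.

```python
# Show why Shift_JIS text can confuse a parser that treats "\" (0x5C) as special.
BACKSLASH = 0x5C

def contains_backslash_byte(text: str, encoding: str) -> bool:
    """Return True if any byte of the encoded text equals the ASCII backslash."""
    return BACKSLASH in text.encode(encoding)

sample = "表"  # in Shift_JIS this is the two bytes 0x95 0x5C

print(sample.encode("shift_jis").hex())               # second byte is 5c
print(contains_backslash_byte(sample, "shift_jis"))   # SJIS: collision
print(contains_backslash_byte(sample, "euc_jp"))      # EUC: both bytes have the high bit set
```

A byte-oriented parser that is not told the data is SJIS would read that second byte as an escape character, which is the failure mode Tatsuo warns about.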

Of course the i18n version of PHP will help (it does the conversion
SJIS <--> EUC), but beware that some characters in SJIS (such as the
user-defined characters used especially in i-mode) are not well supported
in it.
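
As a rough illustration of the kind of conversion being discussed, here is a sketch in Python that moves the same text between SJIS, EUC and JIS (ISO-2022-JP) by going through Unicode. It does not use the PHP i18n functions themselves; the `convert` helper and the sample string are assumptions made for the example.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one Japanese encoding to another via Unicode."""
    return data.decode(source).encode(target)

# Pretend this arrived from a form on an SJIS page.
form_input_sjis = "日本語のテスト".encode("shift_jis")

stored_euc = convert(form_input_sjis, "shift_jis", "euc_jp")    # for the database
mail_jis = convert(stored_euc, "euc_jp", "iso2022_jp")          # for e-mail bodies
back_to_sjis = convert(mail_jis, "iso2022_jp", "shift_jis")     # for i-mode output

print(back_to_sjis == form_input_sjis)
```

With the default error handling, characters that a codec cannot map raise an exception instead of being silently mangled, so unsupported characters (possibly including the user-defined ones mentioned above) surface as errors rather than as corrupted data.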

> Of
> course I wouldn't be able to sort the data or do anything else that would
> require PHP/Postgres to be able to interpret the data.

That would depend on how you define "sort". If you just do a normal sort,
as you already do with ASCII, you could get more or less
reasonable results, I guess. But if your client requires more "high
level sorts" such as "sorting by YOMIGANA (Japanese pronunciation)"
you need to do something... probably you need to define an extra
field in your table.
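
The extra-field idea can be sketched as follows, in Python rather than SQL. The record layout, the field name `yomi` and the sample names are hypothetical; in a real table the reading would be a separate column used in ORDER BY.

```python
# Each record carries a separate reading (yomigana) field used only for sorting.
members = [
    {"name": "山田", "yomi": "やまだ"},
    {"name": "鈴木", "yomi": "すずき"},
    {"name": "青木", "yomi": "あおき"},
]

# A "normal" sort compares the kanji by code value.
by_code_value = sorted(members, key=lambda m: m["name"])

# Sorting on the reading field gives the order a Japanese reader expects.
by_reading = sorted(members, key=lambda m: m["yomi"])

print([m["name"] for m in by_code_value])
print([m["name"] for m in by_reading])
```

Sorting hiragana by code value only approximates dictionary order (voiced and small kana need extra care), so this is a starting point rather than a full collation.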

> However if I compile
> Postgres with locale support for the character set/language in question -
> then postgres will be able to sort Japanese. Is this right?

No. Locale support is useless for Japanese; it just slows down
PostgreSQL. Turn it off.

>Have I got this all right so far? I have attempted to do my research on
>this - but finding a real beginners guide to international web development
>has been a trick. And the best sources I have found on this topic generally
>are specific to Oracle. Any links would be appreciated.

Try:
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

> For the postgres folks, these developers went with MySQL - I've chosen
> Postgres. Is there anything MySQL does that Postgres doesn't in terms of
> language support that I should be aware of?

I believe PostgreSQL's language support is much better than MySQL's
especially for Japanese. PostgreSQL can handle both EUC/SJIS on the
fly (and even Unicode for 7.1!), and has the ability to do an
automatic encoding conversion between them. Moreover, PostgreSQL has
many "multibyte aware" functions including regular expression search,
which MySQL cannot do, I think.

> >PHP's Japanese challenge
> >Since r-newbold.com is in Japanese only, Studio Omame made sure to utilize
> >PHP's Japanese character set conversion functions. However, this proved to
> >be a challenge.
>
> Is this available for v4 of PHP yet?

No.
--
Tatsuo Ishii

Re: International support

From
Karel Zak
Date:
On Fri, Feb 23, 2001 at 10:02:24AM +0900, Tatsuo Ishii wrote:
> > I'm currently working on a project that is intended to handle Japanese
> > character sets - and now I'm told ideally iMode too. :) The iMode isn't
> > such an issue at the moment - but the article below has spooked me a
> > little. At an early point in the project we tested whether input put
> > into a web form, handled by PHP and then stored in Postgres, would come
> > back fully intact - and it did. This left me comfortable
> > that PHP and Postgres don't seem to care what language they're storing in
> > fields or variables. I'm 'guessing' that this is because the data, whether
> > it's English or Japanese, is being stored in binary (or something
> > else?).
>

 I work on the same things, and (IMHO) it is good to store data in the DB in
MULE_INTERNAL encoding (or UNICODE in 7.1) and convert it on the fly to
EUC_JP or Shift_JIS (for i-mode) or Latin1, because I store Latin1
English strings in the same tables too :-) During data input the correct
encoding (EUC or Latin1) is always set. For data searching I do not use
string comparison ('like' or '='), only access by integer 'id' columns,
and everything is written in C, not PHP. I know it sounds curious, but is
there a better solution for storing multi-language and multi-encoding data
in one DB? Our global world needs applications like this....

> I believe PostgreSQL's language support is much better than MySQL's
> especially for Japanese. PostgreSQL can handle both EUC/SJIS on the
> fly (and even Unicode for 7.1!), and has the ability to do an
               ^^^^^^^^^^^^
 Great work!

> automatic encoding conversion between them. Moreover, PostgreSQL has
> many "multibyte aware" functions including regular expression search,
> which MySQL cannot do, I think.

 Sure, PostgreSQL is better (and not only for this).

    Karel

Re: International support

From
Soma Interesting
Date:
I just want to say thanks to all those who responded. I'm starting to
understand things well enough now.

My only big question left unanswered is how the client's browser decides
which encoding to use. I'm guessing that their browser determines this by
something in the document's header or a META tag - assuming they set it to
Auto detect. If they force it to SJIS, for example, could they potentially
cause problems?
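
On the question of how the browser picks an encoding: a browser left on auto-detect generally takes the charset from the HTTP Content-Type header, and otherwise from a META http-equiv tag in the page. The sketch below, in Python, declares the charset in both places and encodes the body to match. The function name and sample text are made up for illustration.

```python
from typing import Dict, Tuple

def build_page(body_text: str, charset: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) with the charset declared in header and META tag."""
    html = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "</head><body>" + body_text + "</body></html>"
    )
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # The bytes sent must really be in the declared encoding.
    return headers, html.encode(charset)

headers, body = build_page("こんにちは", "Shift_JIS")
print(headers["Content-Type"])
```

If a user forces a different encoding, the page will most likely display as garbage, and form data may be submitted in the forced encoding, so it is safer to validate or convert input on the server than to trust the declared charset.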

So I believe the following solution will work:

- Use Unicode support in postgres for storing data.
- Use the PHP function pg_set_client_encoding to handle
converting Unicode to JIS or SJIS in the case of email and iMode.
- Request that all clients' browsers use Unicode and not SJIS to prevent
the problem with the postgres parser (can someone explain how this is done?)
- Do all my string manipulation within postgres with stored procedures to
avoid requiring multi-byte string functions within PHP.
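
The plan above amounts to keeping one internal representation and encoding only at the output boundary. Here is a minimal sketch of that idea in Python. The channel names and the mapping are assumptions for illustration, and the conversion is done in application code, not through pg_set_client_encoding.

```python
# Hypothetical channel names; the encodings follow the ones named in the thread.
CHANNEL_ENCODINGS = {
    "web": "utf-8",         # ordinary browsers, per the Unicode plan
    "imode": "shift_jis",   # i-mode output is expected in SJIS
    "mail": "iso2022_jp",   # "JIS" as used for e-mail
}

def encode_for(channel: str, text: str) -> bytes:
    """Encode internal Unicode text for one output channel."""
    return text.encode(CHANNEL_ENCODINGS[channel])

greeting = "ようこそ"
outputs = {name: encode_for(name, greeting) for name in CHANNEL_ENCODINGS}

for name, data in outputs.items():
    print(name, len(data), "bytes")
```

Keeping the mapping in one table means adding another language or device later only requires a new entry, not changes scattered through the page code.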

I want to avoid switching to php3-i18n because I've already done a lot of
the project in PHP4 and make extensive use of its sessions. I can meet the
project's needs by using stored procedures and LIKE for what little string
parsing needs to be done.

Not using php-i18n would only have to be the case until a version for php4
comes out (hopefully it will happen). At which point we migrate to it and
enjoy the greater flexibility of being able to handle multi-byte string
parsing and encoding/decoding in PHP instead of just the database.

Unicode seems a better option since ideally we want to support other
languages also - although our immediate needs are strictly to make Japanese
character sets work.


RE: [PHP] International support

From
"PHPBeginner.com"
Date:
Hi (Soma Interesting?)

I am also a developer here in Tokyo and have done a few I-Mode websites using
PHP and mySQL. (I've used PostgreSQL with Japanese, but not for I-Mode)
( see www.japaninc.com/i )

There are no particular problems with storing the 2-byte data in your databases;
the only thing is that to do sorting you have to compile the databases with
the language needed ( regardless of which database you use - see their docs,
they all support that).

With PHP there's a little problem using string functions. strlen(), for
instance, will return the number of bytes. For Latin characters (ASCII
1-255) a character is a byte, but that doesn't apply to Japanese, which uses
double bytes (some characters are even three bytes), so strlen() will not
return the number of Hiragana & Kanji in a string - it will return the
number of bytes these Kanji were composed of - VERY, VERY UGLY....
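
The byte-versus-character difference can be shown with a short sketch in Python, where a text string counts characters and an encoded byte string counts bytes (the latter is what a byte-oriented strlen() sees). The sample text is an arbitrary choice.

```python
text = "日本語"

char_count = len(text)                        # characters
euc_bytes = len(text.encode("euc_jp"))        # 2 bytes per character here
sjis_bytes = len(text.encode("shift_jis"))    # also 2 bytes per character
utf8_bytes = len(text.encode("utf-8"))        # 3 bytes per character

print(char_count, euc_bytes, sjis_bytes, utf8_bytes)  # 3 6 6 9
```

Any length limit or substring operation applied to the byte form has to allow for this, or it may cut a character in half.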

You need to use the Japanese version of PHP3 to do that kind of task.

In our company we use a different server for Japanese characters running
PHP3 JIS, and the pages are included from there.

E.g. www.fusion-2000.net runs both PHP4 and PHP3 JIS with PostgreSQL at the
back end.

While www.japaninc.com/i runs on PHP4.0.1pl2 and mySQL (check out that
game).

So, no worries, it is all possible to do, except there's the pain of
choosing the PHP version for the server.


Sincerely,

 Maxim Maletsky
 Founder, Chief Developer

 PHPBeginner.com (Where PHP Begins)
 maxim@phpbeginner.com
 www.phpbeginner.com





-----Original Message-----
From: Soma Interesting [mailto:dfunct@telus.net]
Sent: Friday, February 23, 2001 3:23 AM
To: pgsql-general@postgresql.org; php-general@lists.php.net;
php-i18n@lists.php.net
Subject: [PHP] International support



