Re: Second byte of multibyte characters causing trouble - Mailing list pgsql-general

Tatsuo Ishii · September 22, 2001 06:48:15

> Now first I have to convert my existing data, which although sitting in a
> database that expects EUC, is actually SJIS-based text.  I found the
> following series of bash commands in a Japanese mailing list archive - does
> it look like this will work for me?  (It looks scary to just drop the whole
> database and hope that the .out file knows how to rebuild it with all the
> indexes, sequences, users, etc. in place - should I be nervous?)
> $ pg_dump -D dbname > db.out
> $ dropdb dbname
> $ createdb -E EUC_JP dbname
> $ export PGCLIENTENCODING=SJIS
> $ psql dbname < db.out
> $ export PGCLIENTENCODING=EUC_JP

Yes, the above procedure should convert your database, which holds SJIS
data by mistake, into a proper EUC_JP database.
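A rough illustration of why this helps follows. It is a Python sketch with hypothetical helper names, not PostgreSQL's conversion code. It assumes the dump keeps the stored SJIS bytes unchanged, and that restoring with PGCLIENTENCODING=SJIS makes the server read those bytes as SJIS and store them as EUC_JP. In Shift_JIS the second byte of a two-byte character can fall in the ASCII range 0x40-0x7E. For example, 表 is 0x95 0x5C, and 0x5C is the backslash. In EUC-JP, both bytes of a JIS X 0208 character are 0xA1 or higher. So after conversion, no byte inside a Japanese character can be mistaken for an ASCII character.

```python
# Hypothetical illustration only, not PostgreSQL's conversion code.
# Assumes the dump holds the original SJIS bytes unchanged and that the
# restore (PGCLIENTENCODING=SJIS) decodes them as SJIS and stores EUC-JP.


def sjis_to_euc_jp(raw: bytes) -> bytes:
    """Decode bytes as Shift_JIS and re-encode the text as EUC-JP."""
    return raw.decode("shift_jis").encode("euc_jp")


def ascii_range_bytes_in_multibyte(raw: bytes, encoding: str) -> list:
    """List the bytes below 0x80 that belong to multibyte characters."""
    found = []
    for ch in raw.decode(encoding):
        encoded = ch.encode(encoding)
        if len(encoded) > 1:  # only look inside multibyte characters
            found.extend(b for b in encoded if b < 0x80)
    return found


if __name__ == "__main__":
    sjis = "表".encode("shift_jis")  # two bytes, the second one is 0x5C
    euc = sjis_to_euc_jp(sjis)
    print(ascii_range_bytes_in_multibyte(sjis, "shift_jis"))  # [92]
    print(ascii_range_bytes_in_multibyte(euc, "euc_jp"))      # []
```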

> Regarding the user interface end, when I read the suggested solution of
> using jcode to convert everything in and out of the database, I thought,
> "That's tedious!  Why not just use EUC on the web pages, and the whole
> system will be in sync?"  But that seems to be almost as tedious.  The
> Windows-based editor I normally use to input the Japanese text portions of
> my code (I do most of the work in vi on my Linux box, but I can't input the
> Japanese that way)

You can't input Japanese using vi? Why?

> reads and writes in Shift-JIS unless I use pre- and
> post-processing filters, and it seems that other Windows programs also favor
> Shift-JIS.

Why not Emacs? It can read and write SJIS text directly.

> I did a totally unofficial, very-small-data-sample survey of
> Japanese web sites, and it seems that in general, sites that deal with
> ordinary consumers (and likely are written on Microsoft machines) use
> Shift-JIS (even ones that I figure must use databases, like search engines
> and e-commerce), Linux-related sites use JIS, and PostgreSQL-related sites
> use EUC.  I'm sure there's a grand story to explain how it got to be this
> messy, but for right now, I guess we have to live with all these different
> systems - apparently there is not one system that works nicely for all
> things, or else the others would gradually become obsolete, right?
>
> Before I add jcode function calls for every piece of data I get in or out of
> the database, or convert all my web page text to EUC-JP (I haven't decided
> yet which approach is more work, or more of a problem to maintain), are
> there any other thoughts on this?  For example, does someone know of one of
> the following: (a) a way to get the text-only console of a RedHat 6.1J box
> to actually display Japanese characters (if so, I not only wouldn't have to
> deal with the Windows box for editing, I could even read the output of
> queries in psql!),

Use the "kon" command.

> or (b) a text editor for Windows that can be configured
> to default to EUC, rather than having to remember to always select a filter
> to convert to and from Shift-JIS?

Again, why not Emacs?

> Or on the flip side of the discussion,
> can anyone imagine pitfalls associated with having a web site that is half
> EUC (the PHP and Perl files that deal with the database) and half Shift-JIS
> (the static HTML pages that are written by other people in who-knows-what
> Windows-based tools)?

Are you using PHP? Then I strongly recommend upgrading to PHP 4.0.6 or
higher. It supports Japanese very well: it automatically guesses the
input charset and does the necessary conversion, which is very
helpful. I also recommend that you always use EUC-JP to write PHP scripts.
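The idea of guessing the input charset and converting it at the boundary can be sketched in Python. The function names, the candidate order, and the "first encoding that decodes cleanly wins" rule are assumptions made for illustration. This is not how PHP implements its detection. Because EUC-JP and Shift_JIS byte ranges overlap, a short Shift_JIS string can occasionally decode cleanly as EUC-JP, so the guess can be wrong and the candidate order matters.

```python
from typing import Optional

# Hypothetical helpers; the candidate order is an assumption, not PHP's.
CANDIDATES = ("ascii", "euc_jp", "shift_jis")


def guess_encoding(raw: bytes) -> Optional[str]:
    """Return the first candidate that decodes raw without error, or None."""
    # ISO-2022-JP (JIS) is 7-bit and would also pass as ASCII; the ESC byte
    # that starts its shift sequences is used here to tell them apart.
    if b"\x1b" in raw:
        try:
            raw.decode("iso2022_jp")
            return "iso2022_jp"
        except UnicodeDecodeError:
            pass
    for enc in CANDIDATES:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


def to_euc_jp(raw: bytes) -> bytes:
    """Convert input in a guessed encoding to EUC-JP for the database."""
    enc = guess_encoding(raw)
    if enc is None:
        raise ValueError("could not guess the input encoding")
    return raw.decode(enc).encode("euc_jp")


def to_shift_jis(euc: bytes) -> bytes:
    """Convert EUC-JP text from the database for a Shift_JIS page."""
    return euc.decode("euc_jp").encode("shift_jis")
```

Converting once on the way in and once on the way out keeps the stored data in a single encoding. This is one possible layout, not the only one.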

Assuming you can read and write Japanese, I recommend that you subscribe
to the PHP-users list (http://ns1.php.gr.jp/mailman/listinfo/php-users).
--
Tatsuo Ishii

From	Tatsuo Ishii
Subject	Re: Second byte of multibyte characters causing trouble
Date	September 22, 2001 06:48:15
Msg-id	20010922194800Y.t-ishii@sra.co.jp
In response to	Re: Second byte of multibyte characters causing trouble ("Karen Ellrick" <k-ellrick@sctech.co.jp>)
Responses	Re: Second byte of multibyte characters causing trouble
List	pgsql-general
