Re: Space requirements (with respect to foriegn languages) - Mailing list pgsql-php

From Markus Bertheau
Subject Re: Space requirements (with respect to foriegn languages)
Msg-id 1093703976.2732.5.camel@teetnang
In response to Space requirements (with respect to foriegn languages)  (Gerard Samuel <php-db@trini0.org>)
Responses Re: Space requirements (with respect to foriegn languages)
List pgsql-php
On Thu, 26.08.2004, at 22:36, Gerard Samuel wrote:
> My site/code/database is developed primarily for the English language.
> I've had people from "The Far East" add content to my site using their
> native language, and it is displaying properly in the site.
> But I'm a bit concerned about the number of characters these languages use.
> For example, I've had someone enter ->
> chinese testing 中文
>
> It is saved in the database as ->
> chinese testing &#20013;&#25991;

Your web page uses a character set that does not contain Chinese
characters, so the browser sends them as their respective HTML
entities (numeric character references) instead. These entities, as
you correctly observed, each take up more than one (Latin, ASCII)
character.
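
To make the size difference concrete, here is a minimal sketch in
Python (not the poster's PHP code) that imitates the substitution.
The function name as_submitted and the choice of iso-8859-1 as the
page character set are assumptions for illustration; the thread does
not say which character set the site actually declares.

```python
# Imitates a browser that cannot represent a character in the page's
# character set and substitutes a numeric character reference instead.
# The iso-8859-1 default is an assumption, not taken from the thread.
def as_submitted(text, page_charset="iso-8859-1"):
    """Return text with unencodable characters replaced by &#NNNN; references."""
    encoded = text.encode(page_charset, errors="xmlcharrefreplace")
    return encoded.decode(page_charset)


original = "chinese testing 中文"
stored = as_submitted(original)

print(stored)                       # chinese testing &#20013;&#25991;
print(len(original), len(stored))   # 18 32
```

Each Chinese character becomes an 8-character reference here, so the
two characters occupy 16 positions in the column instead of 2.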

> Now, forgive my ignorance, but I have no idea what the additional
> Chinese characters mean, but from the values in the database, I'm
> assuming that it amounts to 3 characters.
> But if I'm correct that those are 3 characters, it is
> using up 24 characters in a column.
>
> My concern is that if I were to limit a column to, say, 25 "English"
> characters, and a Chinese fellow comes by and hypothetically says
> "Hello World" in Chinese and goes over the limit of the column, the data
> will be truncated.

PostgreSQL will not truncate the data but reject it with an error; the
general point is correct, though.

> Is there anything that can be done to overcome this shortcoming?
>
> I'm currently using PostgreSQL 7.4.2, using SQL_ASCII as the database
> character set, FreeBSD 4.10, PHP 4.3.6.

Change your site to use a character set that includes Chinese
characters, for example Unicode. The most common encoding of Unicode on
the web is UTF-8. It's also the encoding PostgreSQL uses when you use
UNICODE as the database encoding.

If you decide to switch your site to UTF-8 and want varchar(25) to mean
25 characters, and not 25 bytes, you have to change the database
encoding to UNICODE accordingly.
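
The difference between counting characters and counting bytes can be
sketched as follows. This is a Python illustration of the arithmetic
only, not a model of PostgreSQL's internals; the helper names
char_and_byte_length and fits_limit are hypothetical.

```python
def char_and_byte_length(text):
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(text), len(text.encode("utf-8"))


def fits_limit(text, limit, count_bytes):
    """Check text against a length limit, counting bytes or characters."""
    chars, nbytes = char_and_byte_length(text)
    return (nbytes if count_bytes else chars) <= limit


ten_chinese = "中" * 10  # ten Chinese characters, three UTF-8 bytes each

print(char_and_byte_length("Hello World"))             # (11, 11)
print(char_and_byte_length(ten_chinese))               # (10, 30)
print(fits_limit(ten_chinese, 25, count_bytes=False))  # True
print(fits_limit(ten_chinese, 25, count_bytes=True))   # False
```

ASCII text gives the same count either way, which is why the problem
only shows up once non-Latin text arrives.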

--
Markus Bertheau <twanger@bluetwanger.de>

