Thread: directional marks

directional marks

From
nngodinh@tiscali.it
Date:
Greetings,

I've run into some problems handling directional marks (e.g. U+200E and
U+200F in Unicode, used with Arabic text). When I exported the db from
Microsoft SQL 7.0 there were many directional marks, even inside words
(e.g. "foo" -> "f(200e)oo"). This is probably due to an external program
that was used to fill the db with values.

Directional marks are not shown in the client browser, but for PostgreSQL
each one is a character like any other. This is a problem when I try to
SELECT from the db:

SELECT * FROM table WHERE field = 'foo';

If "table" has a record that contains "foo" as the value of "field", but
with a directional mark in it (e.g. "f(200e)oo"), I can't get any
result.
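
The mismatch described above can be reproduced outside the database. The following is a minimal sketch in Python, not PostgreSQL itself; the value "f\u200eoo" is an assumed stand-in for a stored field that contains U+200E (LEFT-TO-RIGHT MARK).

```python
# Hypothetical stored value: "foo" with U+200E (LEFT-TO-RIGHT MARK) after the "f".
stored = "f\u200eoo"
wanted = "foo"

# The mark is invisible when rendered, but it is a real code point.
print(len(stored))              # 4
print(stored == wanted)         # False
print(stored.encode("utf-8"))   # b'f\xe2\x80\x8eoo'
```

Any exact, code-point-wise comparison treats the two strings as different, which matches the empty result of the SELECT.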

The only way to fix the problem is to remove any directional mark occurrence,
or to make PostgreSQL ignore that kind of characters during UNICODE queries.

What do you think about it?
Thx.

---
Nhan NGO DINH








Re: directional marks

From
Peter Eisentraut
Date:
nngodinh@tiscali.it writes:

> The only way to fix the problem is to remove any directional mark occurrence,
> or to make PostgreSQL ignore that kind of characters during UNICODE queries.
>
> What do you think about it?

Either remove the directional marks or consistently use them in all your
queries (or use wildcards to paint over the difference).  The directional
mark characters aren't just for amusement -- they contain real information
so they cannot be ignored.
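
The first option, removing the marks, could be done while the exported data is being prepared for import. This is only a sketch under that assumption; the function name strip_directional_marks and the sample rows are hypothetical. Because the marks can carry real information in mixed-direction text, the removal is lossy and is probably only appropriate where the marks are known to be stray.

```python
# Code points to drop: U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK.
DIRECTIONAL_MARKS = {0x200E: None, 0x200F: None}


def strip_directional_marks(text: str) -> str:
    """Return text with every LRM/RLM removed; all other characters are kept."""
    return text.translate(DIRECTIONAL_MARKS)


# Hypothetical exported values, some with stray marks.
rows = ["f\u200eoo", "bar\u200f", "baz"]
cleaned = [strip_directional_marks(r) for r in rows]
print(cleaned)  # ['foo', 'bar', 'baz']
```

Applying the same function to the search term before it is sent keeps queries and stored data consistent.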

-- 
Peter Eisentraut   peter_e@gmx.net




Re: directional marks

From
nngodinh@tiscali.it
Date:
I'm speaking about directional marks that are ignored by, for instance,
Microsoft SQL 7.0 because they're useless in that position (like when
they're in one-way text, either left-to-right or right-to-left). It may
happen that symbols of this kind are randomly inserted: for example...

The data entry user types English text like "test". At the end he switches
the keyboard layout to Arabic and types something in Arabic, but then he
realizes he doesn't want that and erases the Arabic text, switches the
keyboard back and types more English text after "test". Some directional
marks are inserted, but they're useless.

The problem is that sometimes the directional mark is inside a word, not
just at the end. Moreover, if you try to index using txt2txtidx,
directional marks are not recognized as delimiters (and indeed they
aren't), so the txtidx array will contain the neighboring word with a
directional mark appended.
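
The indexing effect can be imitated with plain whitespace splitting. This is a rough stand-in written in Python, not txt2txtidx; the sample text is an assumption modeled on the "test" example above.

```python
# U+200F (RIGHT-TO-LEFT MARK) left behind directly after the word "test".
text = "test\u200f more"

# str.split() splits on whitespace only; the mark is not whitespace,
# so it stays glued to the preceding word.
tokens = text.split()
print(ascii(tokens))        # ['test\u200f', 'more']
print("test" in tokens)     # False
```

A lookup for the plain word "test" in such an index would miss this entry.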

Maybe you will say that the source I exported the db from is malformed,
and you would be absolutely right. Still, I know that some programs
(especially Microsoft ones) make this mistake. I'm not speaking of PHP.

Bye.

>-- Original Message --
>Date: Mon, 16 Sep 2002 19:25:30 +0200 (CEST)
>From: Peter Eisentraut <peter_e@gmx.net>
>To: nngodinh@tiscali.it
>cc: pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] directional marks
>
>
>nngodinh@tiscali.it writes:
>
>> The only way to fix the problem is to remove any directional mark occurrence,
>> or to make PostgreSQL ignore that kind of characters during UNICODE queries.
>>
>> What do you think about it?
>
>Either remove the directional marks or consistently use them in all your
>queries (or use wildcards to paint over the difference).  The directional
>mark characters aren't just for amusement -- they contain real information
>so they cannot be ignored.
>
>--
>Peter Eisentraut   peter_e@gmx.net
>
>








Re: directional marks

From
Peter Eisentraut
Date:
nngodinh@tiscali.it writes:

> I'm speaking about directional marks that are ignored by, for instance,
> Microsoft SQL 7.0 because they're useless in that position (like when
> they're in one-way text, either left-to-right or right-to-left). It may
> happen that symbols of this kind are randomly inserted: for example...

To me this sounds analogous to inserting tons of <space><backspace>
sequences into a string and expecting the software to automatically figure
out that they cancel.  It would be possible, but it would probably add a
lot of overhead and it doesn't seem to be requested a lot.  The best
solution is probably to fix your data.  Unless you can point to a Unicode
standard that states that such cancellation should happen.
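
Before fixing the data it may help to see where the marks actually sit. The sketch below is one possible audit helper in Python; find_marks is a hypothetical name and the sample values are assumptions, not data from this thread.

```python
import unicodedata

# The two characters discussed in this thread.
MARKS = ("\u200e", "\u200f")


def find_marks(value):
    """Return (index, Unicode name) for each LRM/RLM found in value."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(value) if ch in MARKS]


print(find_marks("f\u200eoo"))  # [(1, 'LEFT-TO-RIGHT MARK')]
print(find_marks("foo"))        # []
```

Running a check like this over the exported values shows which fields need cleaning and whether the marks fall inside words or only at the ends.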

-- 
Peter Eisentraut   peter_e@gmx.net