
Re: another seemingly simple encoding question - Mailing list pgsql-general

From	John D. Burger
Subject	Re: another seemingly simple encoding question
Date	March 24, 2006 10:47:07
Msg-id	fff3e8bce85a8c49e8d81ea4b45e367e@mitre.org
In response to	Re: another seemingly simple encoding question (joseph <kmh496@kornet.net>)
List	pgsql-general


This doesn't sound like your problem, but I'll explain the
normalization issue using Korean as an example, since that seems to be
your data:  Unicode has codepoints both for precomposed Hangul syllables
and for the individual conjoining Jamo, so a Hangul syllable can be
represented either with the single corresponding codepoint, or as a
sequence of two or three Jamo codepoints.  A Unicode font should display
these two alternatives identically, but in any Unicode encoding,
including UTF8, the two strings are not byte-for-byte identical.  The
Unicode normalization forms (NFC, NFD, NFKC and NFKD) are four
algorithms for normalizing strings in such a way that they do compare
identically.
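To make the Hangul/Jamo point concrete, here is a minimal Python sketch (an illustration, not something from the original thread). It uses only the standard-library `unicodedata` module. The helper `decompose_syllable` and the sample syllables 한 and 가 are hypothetical choices. The helper reproduces the Hangul syllable decomposition arithmetic defined in the Unicode Standard, so you can see where the "two or three Jamo" comes from. It then checks that NFC and NFD map between the two representations.

```python
import unicodedata

# Constants of the Unicode Standard's Hangul syllable (de)composition arithmetic
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose_syllable(ch: str) -> str:
    """Split one precomposed Hangul syllable into 2 or 3 conjoining Jamo.

    Hypothetical helper for illustration; unicodedata.normalize("NFD", ...)
    does the same thing for real text.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable; leave it unchanged
    lead = chr(L_BASE + s_index // N_COUNT)                 # initial consonant
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)    # medial vowel
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""        # optional final consonant
    return lead + vowel + trail


precomposed = "\uD55C"                        # 한: one codepoint
decomposed = decompose_syllable(precomposed)  # three Jamo codepoints

print(len(precomposed), len(decomposed))                 # 1 3
print(len(decompose_syllable("\uAC00")))                 # 2 (가 has no final consonant)
print(precomposed == decomposed)                         # False
print(len(precomposed.encode("utf-8")),
      len(decomposed.encode("utf-8")))                   # 3 9
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

The design point is that neither form is "wrong". A comparison only behaves as a user expects once both sides have been normalized to the same form, typically NFC for storage.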

Anyway, it sounds like you have the opposite problem: two strings that
compare equal when you think they shouldn't.  I don't know that
anyone can help you unless you post an actual example of two such
strings.

- John D. Burger
   MITRE
