Re: New to PostgreSQL, performance considerations - Mailing list pgsql-performance

From Daniel van Ham Colchete
Subject Re: New to PostgreSQL, performance considerations
Date
Msg-id 8a0c7af10612110232y2fb416ffpdfa70f1b492388ae@mail.gmail.com
In response to Re: New to PostgreSQL, performance considerations  (Alexander Staubo <alex@purefiction.net>)
List pgsql-performance
On 12/11/06, Alexander Staubo <alex@purefiction.net> wrote:
> On Dec 11, 2006, at 02:47 , Daniel van Ham Colchete wrote:
>
> > I never understood why the choice between the ASCII, ISO-8859-1 and
> > UTF-8 charsets matters to a database. They're all simple C strings that
> > don't have a zero byte in the middle (as UTF-16 would) and that don't
> > require any different processing unless you are doing case-insensitive
> > search (then you would have a problem).
>
> That's not the whole story. UTF-8 and other variable-width encodings
> don't provide a 1:1 mapping of logical characters to single bytes; in
> particular, combining characters open the possibility of multiple
> different byte sequences representing the same logical character;
> therefore, string comparison in such encodings generally cannot be done
> at the byte level (unless, of course, you first ascertain that the
> strings involved are all normalized to an unambiguous subset of your
> encoding).
>
> PostgreSQL's use of strings is not limited to string comparison.
> Substring extraction, concatenation, regular expression matching,
> up/downcasing, tokenization and so on are all part of PostgreSQL's small
> library of text manipulation functions, and all deal with logical
> characters, meaning they must be Unicode-aware.
>
> Alexander.
>
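
To make the byte-level versus character-level distinction concrete, here is a small sketch in Python. The language is an editorial choice, since nothing in the thread uses code, and the sketch relies only on the standard `unicodedata` module. It shows three things. First, the precomposed "é" (U+00E9) and "e" followed by COMBINING ACUTE ACCENT (U+0301) encode to different UTF-8 bytes and only compare equal after NFC normalization. Second, byte length and character count diverge, and a byte-offset cut can split a character. Third, the sketch checks the remark that UTF-16, unlike UTF-8, puts zero bytes into ordinary text. It illustrates encoding behaviour only, not how PostgreSQL implements its text functions.

```python
import unicodedata

# Two spellings of the same visible character "é":
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT (two code points)

pre_bytes = precomposed.encode("utf-8")   # b'\xc3\xa9'
dec_bytes = decomposed.encode("utf-8")    # b'e\xcc\x81'

# A plain byte-level comparison treats them as different strings.
print(pre_bytes == dec_bytes)  # False

# After normalizing both to NFC, the byte sequences are identical.
nfc_a = unicodedata.normalize("NFC", precomposed).encode("utf-8")
nfc_b = unicodedata.normalize("NFC", decomposed).encode("utf-8")
print(nfc_a == nfc_b)  # True

# Byte length and character count diverge for non-ASCII text,
# which is why length/substring functions must decode UTF-8.
word = "café"
print(len(word), len(word.encode("utf-8")))  # 4 5

# Cutting at a byte offset can split a multi-byte character in half.
truncated = word.encode("utf-8")[:4]   # b'caf\xc3'
print(truncated.decode("utf-8", errors="replace"))  # caf plus U+FFFD

# The C-string point: UTF-8 only produces a zero byte for NUL itself,
# whereas UTF-16 produces zero bytes even for plain ASCII text.
print(b"\x00" in word.encode("utf-8"))        # False
print(b"\x00" in "cafe".encode("utf-16-le"))  # True
```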

You're right. I was thinking only about my own use cases, which take
Unicode normalization for granted and don't use
regexp/tokenization/...
Thanks
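
The assumption that input is already normalized can be enforced at the application boundary rather than taken for granted. The following is a minimal sketch that assumes Python 3.8+ (for `unicodedata.is_normalized`). `to_storage_form` is a hypothetical helper name and is not part of PostgreSQL or any driver. It only makes byte-level equality safe. As Alexander notes, case conversion, regular expressions and substring functions still need to be character-aware.

```python
import unicodedata


def to_storage_form(text: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as UTF-8.

    If every string passes through this before being stored, canonically
    equivalent inputs end up as identical byte sequences, which is the
    precondition for comparing them byte-by-byte.
    """
    # normalize() is a no-op on text that is already NFC; the check is
    # only a cheap fast path for the common already-normalized case.
    if not unicodedata.is_normalized("NFC", text):
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")


a = to_storage_form("re\u0301sume\u0301")   # decomposed accents
b = to_storage_form("r\u00e9sum\u00e9")     # precomposed accents
print(a == b)  # True
```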

Best
Daniel
