Re: unicode questions - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: unicode questions
Date
Msg-id 4B33997E.2040907@dunslane.net
In response to unicode questions  (- - <crossroads0000@googlemail.com>)
Responses Re: unicode questions  (- - <crossroads0000@googlemail.com>)
List pgsql-hackers

- - wrote:
> Dear PG hackers,
>
> I have two questions regarding Unicode support in PG:
>
> 1) If I set my database and connection encoding to UTF-8, does PG (and
> future versions of it) guarantee that Unicode code points are stored
> unmodified? Or could it be that PG does some Unicode
> normalization/manipulation with them before storing a string, or when
> retrieving a string?
>
> The reason why I'm asking is, I've built a little program that reads
> in and stores text and explicitly analyzes the text at a later point
> in time, also regarding things like if the text is in NFC, NFD or
> neither. And since I want to store them in the database, it is very
> important for PG not to fiddle around with the normalization unless my
> program explicitly told PG to do that.
>
> 2) How far along is normalization support in PG? When I checked a long time
> ago, there was no such support. Now that the SQL standard mandates a
> NORMALIZE function that may have changed. Any updates?
>   

We don't do any normalization. If the client gives us UTF8 then we store 
exactly what it gives us, and return exactly that.
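
As a rough client-side illustration of why this matters for the program
described above: the NFC and NFD spellings of the same visible text are
different UTF-8 byte sequences, so a store that returns exactly what it was
given preserves whichever form was sent. The sketch below is plain Python
and does not talk to PostgreSQL; the encode/decode pair only stands in for
an unmodified round trip, and normalization_form is a hypothetical helper
name, not anything PG provides.

```python
import unicodedata


def normalization_form(text: str) -> str:
    """Classify text as 'NFC', 'NFD', 'both', or 'neither'.

    Hypothetical helper for illustration only.
    """
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "both"
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"


composed = "\u00e9"        # e-acute as a single code point (NFC)
decomposed = "e\u0301"     # 'e' followed by a combining acute accent (NFD)
mixed = composed + decomposed  # mixes both spellings, so neither form

for sample in (composed, decomposed, mixed):
    sent = sample.encode("utf-8")      # bytes a UTF-8 client would send
    returned = sent.decode("utf-8")    # stand-in for an unmodified round trip
    # The code points survive untouched, so the classification is unchanged.
    assert returned == sample
    print(normalization_form(returned), sent.hex())
```

If the store were to normalize on the way in or out, the "neither" sample
would come back classified as NFC or NFD instead, which is the kind of
change the original poster wants to rule out.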

(This question is not really a -hackers question. The correct forum is 
pgsql-general. Please make sure you use the correct forum in future.)

cheers

andrew

