Re: Unicode mapping scripts cleanup - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Unicode mapping scripts cleanup
Date
Msg-id 55F8B94C.7050909@gmx.net
In response to Re: Unicode mapping scripts cleanup  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Unicode mapping scripts cleanup  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
On 9/1/15 7:27 PM, Tatsuo Ishii wrote:
>> On Tue, Sep 1, 2015 at 5:13 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>>   So apparently, the
>>> CJK to Unicode mappings are still evolving and should be updated
>>> occasionally.  Next steps would be to commit some or all of these
>>> differences after additional verification, and then update the scripts
>>> to use whatever the non-obsolete mapping sources are supposed to be.
>>
>> Would that pose a problem for databases which have data in them
>> already using the old mappings?
> 
> I think so. We must be very careful updating the maps. Adding new
> mapping data would cause fewer problems, but replacing existing
> mappings would definitely be a big problem for users.

Note that I'm not actually proposing to change the mappings, I just want
to get the scripts into working order, to put us into a position to
consider changes if necessary.

That said, I'm not sure what the problem with changes would be.  The
data stored in the databases doesn't change.  You just see different
data coming out of the conversion.  It is in the nature of encoding
conversion that you don't get the original data back, but an
approximation.  Then again, I don't have any knowledge about how to
handle such changes.  But the fact that the standards bodies are still
making changes indicates that such changes are to be expected and
should be handled.  I think this is similar to time zone changes, and
also similar, in different ways, to collation changes.
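
To make the "stored data doesn't change, only the output does" point
concrete, here is a minimal sketch.  The two tables are hand-written,
hypothetical stand-ins, not the actual map files, and `convert` is an
illustrative helper, not a server function.  The byte value and the two
code points are modeled on the well-known WAVE DASH versus FULLWIDTH
TILDE disagreement between Japanese mapping sources.

```python
from typing import Mapping

# Hypothetical "old" and "new" mapping tables for one 2-byte code.
# These are illustrative assumptions, not PostgreSQL's real maps.
OLD_MAP: Mapping[bytes, str] = {b"\xa1\xc1": "\u301c"}  # WAVE DASH
NEW_MAP: Mapping[bytes, str] = {b"\xa1\xc1": "\uff5e"}  # FULLWIDTH TILDE


def convert(stored: bytes, mapping: Mapping[bytes, str]) -> str:
    """Convert a run of 2-byte codes to Unicode using the given table."""
    return "".join(
        mapping[stored[i:i + 2]] for i in range(0, len(stored), 2)
    )


# The bytes on disk are the same before and after a map update.
stored = b"\xa1\xc1"

old_out = convert(stored, OLD_MAP)
new_out = convert(stored, NEW_MAP)

# Only the converted result differs between the two tables.
print(f"old map: U+{ord(old_out):04X}, new map: U+{ord(new_out):04X}")
```

Whether that difference matters to a user presumably depends on what
they do with the converted text, for example comparing it against
strings that were converted under the old table.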



