Home > mailing lists

Re: [GENERAL] Changing collate & ctype for an existing database - Mailing list pgsql-general

From	rihad
Subject	Re: [GENERAL] Changing collate & ctype for an existing database
Date	July 12, 2017 19:56:13
Msg-id	03d7f229-12d4-c037-b918-d76ef02d5986@mail.ru Whole thread Raw
In response to	Re: [GENERAL] Changing collate & ctype for an existing database (Albe Laurenz <laurenz.albe@wien.gv.at>)
Responses	Re: [GENERAL] Changing collate & ctype for an existing database
List	pgsql-general

Tree view

On 07/12/2017 01:54 PM, Albe Laurenz wrote:
> rihad wrote:
>> Hi there. We have a working database that was unfortunately created by
>> initdb with default ("C") collation & ctype. All other locale specific
>> settings have the value en_US.UTF-8 in postgresql.conf. The database
>> itself is multilingual and all its data is stored in UTF-8. Sorting
>> doesn't work correctly, though. To fix that, can I just do this:
>>
>>
>> update pg_database set datcollate='en_US.UTF-8', datctype='en_US.UTF-8'
>> where datname='mydb';
>>
>>
>> This does seem to work on a testing copy of the database, i.e. select
>> lower('БлаБлаБла') now works correctly when connected to that database.
>>
>>
>> Is there still any chance for corrupting data by doing this, or indexes
>> stopping working etc?
>>
>> p.s.: postgres 9.6.3
> As explained, yes.  Indexes on string columns will be corrupted.
>
> See this example:
>
> test=# CREATE DATABASE breakme LC_COLLATE "C" LC_CTYPE "C" TEMPLATE template0;
> test=# \c breakme
> breakme=# CREATE TABLE sort(id integer PRIMARY KEY, val text NOT NULL);
> breakme=# INSERT INTO sort VALUES (1, 'LITTLE'), (2, 'big'), (3, 'b-less');
> breakme=# CREATE INDEX ON sort(val);
> breakme=# SET enable_seqscan=off;  -- force index use
> breakme=# SELECT * FROM sort ORDER BY val;
> ┌────┬────────┐
> │ id │  val   │
> ├────┼────────┤
> │  1 │ LITTLE │
> │  3 │ b-less │
> │  2 │ big    │
> └────┴────────┘
> (3 rows)
>
> breakme=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='breakme';
> breakme=# \c breakme
> breakme=# SET enable_seqscan=off;  -- force index use
> breakme=# SELECT * FROM sort ORDER BY val;
> ┌────┬────────┐
> │ id │  val   │
> ├────┼────────┤
> │  1 │ LITTLE │
> │  3 │ b-less │
> │  2 │ big    │
> └────┴────────┘
> (3 rows)
>
> breakme=# SET enable_seqscan=on;  -- this and the following force sequential scan
> breakme=# SET enable_bitmapscan=off;
> breakme=# SET enable_indexscan=off;
> breakme=# SET enable_indexonlyscan=off;
> breakme=# SELECT * FROM sort ORDER BY val;  -- this returns the correct order
> ┌────┬────────┐
> │ id │  val   │
> ├────┼────────┤
> │  2 │ big    │
> │  3 │ b-less │
> │  1 │ LITTLE │
> └────┴────────┘
> (3 rows)
>
> As you see, your index is still sorted according to the C collation
> and scanning it returns wrong results.
>
> Yours,
> Laurenz Albe

This ordering issue can certainly be classified as an inconsistency, but
nothing to lose sleep over. Is this all that is normally meant when
saying "index corruption"? What about updating or deleting the wrong row
addressed by the textual index that hasn't been rebuilt after
datcollate/datctype change, complete table/database corruption, or other
scary night-time stories of this kind? Possible?

pgsql-general by date:

From: Christopher Browne
Date: 12 July 2017, 19:04:08
Subject: Re: [GENERAL] Please say it isn't so

From: Tom Lane
Date: 12 July 2017, 20:31:23
Subject: Re: [GENERAL] Changing collate & ctype for an existing database

Re: [GENERAL] Changing collate & ctype for an existing database - Mailing list pgsql-general

Previous

Next