Thread: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: "Maeldron T."
Hello,
I’ll be as short as I can as I have broken my arm and I’m not supposed to type. This isn’t a joke.
However, I learned the following the hard way and I want to save you from the same.
Upgrading FreeBSD from 10 to 11 might break your database. It probably won’t be corrupted, but it will be useless until you dump and re-import it, and you might need to do some manual fixing.
My app has more than 1000 automated tests, and the upgrade itself was explicitly tested. The affected columns are covered many times. It was tested on two different machines (including the 10 => 11 upgrade) before it was done in production. But the issue happens only at random and only at scale. I could not reproduce it by inserting a few rows; I could reproduce it with real data.
I didn’t debug much, as I did not sleep for two days until I fixed it (live systems, typing with my left hand).
I removed noise from the queries; the real tables have dozens of NOT NULL columns. The edited queries may have syntax errors, but they were copied from real-world output, as were the errors. I use 'xxx' and '123' to hide private information, but the strings are otherwise real. This matters, as the issue might be locale-related.
I started seeing these in the logs:
PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_email"
DETAIL: Key (lower(email::text))=(andy.mxxx@xxx.com) already exists.
: UPDATE "users" SET "online_at" = '2016-12-01 00:23:33.245594', "visible_online_at" = '2016-12-01 00:23:33.245633' WHERE "users"."id" = $1
PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(joyce1234) already exists.
: UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = $1
The table:

# \d users
Column | Type | Modifiers
-------------------------+--------------------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
username | character varying | not null
password | character varying | not null
email | character varying | not null
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"index_users_unique_on_lower_email" UNIQUE, btree (lower(email::text))
"index_users_unique_on_lower_username" UNIQUE, btree (lower(username::text))
Testing:

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaa', 'Ági');
INSERT 0 1
test2=# select id, username from users where lower(username) = 'ági';
id | username
--------+----------
123 | Ági
(1 row)
test2=# select id, username from users where username = 'Ági';
id | username
--------+----------
123 | Ági
456 | Ági
(2 rows)
test2=# select id, username from users where username = 'Mustafa';
id | username
-------+----------
123 | Mustafa
(1 row)
test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaab', 'Mustafa');
INSERT 0 1
test2=# select id, username from users where username = 'Mustafa';
id | username
--------+----------
123 | Mustafa
456 | Mustafa
(2 rows)
test2=# select id, username from users where username = 'bunny';
id | username
------+----------
123 | bunny
(1 row)
test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaac', 'bunny');
INSERT 0 1
test2=# select id, username from users where username = 'bunny';
id | username
--------+----------
123 | bunny
456 | bunny
(2 rows)
test2=# select id, username from users where username = 'edwin';
id | username
-------+----------
123 | edwin
(1 row)
test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaad', 'edwin');
INSERT 0 1
test2=# select id, username from users where username = 'edwin';
id | username
--------+----------
123 | edwin
456 | edwin
(2 rows)
test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaae', 'edwin');
ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(edwin) already exists.
test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaae', 'bunny');
ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(bunny) already exists.
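A sketch (not part of the original session) of how such collisions can be listed afterwards, aggregating over the whole table rather than trusting the expression indexes:

-- List every lower() key that now has more than one row despite the
-- unique index; array_agg shows which ids collide.
SELECT lower(username) AS key, count(*) AS copies, array_agg(id) AS ids
FROM users
GROUP BY lower(username)
HAVING count(*) > 1;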
The issue didn’t happen only with strings containing characters like 'é' or 'ő'; English names were also affected.
The application does have validation on these strings, but it didn’t see the existing rows, for the same reason the insert didn’t see them.
*Sometimes*
Also, the colliding (or rather, not-colliding) strings usually had the same casing (not 'edwin' vs 'Edwin', but two 'edwin's).
Sometimes only the second of these two updates triggered the violation:
UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589' = xxx
UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = xxx
Strange. I thought PG created a new tuple in both cases.
How to reproduce it:
0. Don’t break your arm. It’s not needed.
1. Install FreeBSD 10.3. Install PostgreSQL. I used to have 9.5. Version 9.6 has ICU turned on by default in the FreeBSD package. (Good morning, maintainer, yes, it’s the 21st century. It has been for a while.) The ICU build might not be affected.
2. Create a database cluster with UTF-8 encoding. (Yes, the year is 2016). I use "en_US.UTF-8" for LC_*.
3. Create a table that has a unique index (or indices) on lower(column).
4. Import a lot of data into the table; I think inserting 100 000 random English names will do fine (see the sketch after this list).
5. Upgrade to FreeBSD 11 the official way. This includes "upgrading" PostgreSQL, but it’s the same version built for the new platform; in my case it was postgresql95-server-9.5.4_1.
6. Try to insert existing values. You should succeed at random.
7. Try to query the duplicates with "where lower(column) = 'edwin'". Or bunny. Or whatever. You will see only the new row, even if the original string was already lower case.
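One hypothetical way to produce the filler data for step 4 (synthetic strings rather than real names, but 100 000 rows with distinct lower() keys):

-- Hypothetical filler for step 4: each row has a distinct lower(username),
-- so the expression indexes grow to a realistic size.
INSERT INTO users (username, password, email)
SELECT 'Name' || g, 'secret', 'user' || g || '@example.com'
FROM generate_series(1, 100000) AS g;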
This is not only an index and duplication issue. Querying with lower() also didn’t work for non-duplicated records. At random.
In my case, even non-duplicated users could not log in. They seemed to be deleted, but not all of them, and at random. My users thought they had been removed from the site; it certainly looked that way.
I fixed it with a dump and reload. And as I already had downtime, I loaded it into 9.6. I had to manually rename the duplicated records to do so.
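The manual rename could look roughly like this (a hypothetical sketch, not the script that was actually used; the email column would need the same treatment):

-- Keep the oldest row of each colliding group untouched and rename the rest,
-- so the dump can be restored into a cluster with working unique indexes.
UPDATE users u
SET username = u.username || '-dup-' || u.id
WHERE EXISTS (
    SELECT 1
    FROM users o
    WHERE lower(o.username) = lower(u.username)
      AND o.id < u.id
);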
I have already typed 10 times more than I should have; feel free to investigate further, but not on your production database.
(Maybe database clusters should have a header that wouldn’t allow incompatible server versions to process the existing data. I wonder if it would take more than 8 bytes per server. But I guess it was not known to be incompatible. Even my two CIs didn’t show it.)
M
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: "Maeldron T."
I forgot to add that when I turned off index scans, select with lower() still didn’t find some existing rows.
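That check can be repeated with session-level planner settings; a sketch, reusing 'edwin' from the transcript above:

-- Force the planner away from the suspect expression index for this session.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT id, username FROM users WHERE lower(username) = 'edwin';

-- Back to normal planning, which will prefer the expression index.
RESET enable_indexscan;
RESET enable_bitmapscan;
SELECT id, username FROM users WHERE lower(username) = 'edwin';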
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Thomas Munro
On Sat, Dec 3, 2016 at 2:52 PM, Maeldron T. <maeldron@gmail.com> wrote:
> Upgrading FreeBSD from 10 to 11 might break your database. It probably won’t
> be corrupted but it will be useless until you dump-import it, and you might
> need to do manual fixing.

Sorry to hear about your arm and your database.

The collation support was changed in FreeBSD 11 (in collaboration with Illumos and DragonflyBSD[1]). From FreeBSD's UPDATING:

  Databases administrators will need to reindex their databases given
  collation results will be different.

This is a problem that faces users of Postgres on all operating systems, too; for example, recently some GNU/Linux users had indexes silently corrupted because of a change to the glibc locale data for German. One approach is to use "amcheck"[2] (proposed for future inclusion in PostgreSQL) to check whether indexes are still sane any time the collation definition files or libc itself might have changed.

> (Maybe database clusters should have a header that wouldn’t allow
> incompatible server versions to process the existing data. I wonder if it
> would take more than 8 bytes per server. But I guess it was not known to be
> incompatible. Even my two CIs didn’t show it.)

I had some thoughts along those lines too[3]. I thought about checksumming libc and all relevant collation files (i.e. OS-specific files) so you could notice when anything that could bite you changes (that is just some sketched-out ideas, not production code). Some have suggested that PostgreSQL should stop using OS collations and adopt ICU[4] and then use its versioning metadata. Of course some people already use ICU on FreeBSD because the old strcoll implementation didn't play well with Unicode, but those users are still presumably exposed to silent corruption when ICU changes, because AFAIK that didn't keep track of ICU versions.

[1] http://blog.etoilebsd.net/post/This_is_how_I_like_opensource
[2] https://www.postgresql.org/message-id/flat/CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
[3] https://github.com/macdice/check_pg_collations
[4] https://www.postgresql.org/message-id/flat/85364fde-091f-bbc0-fec2-e3ede39840a6@2ndquadrant.com

--
Thomas Munro
http://www.enterprisedb.com
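amcheck was still an out-of-tree proposal when this was written; assuming the version later shipped in contrib (PostgreSQL 10 and up), checking the indexes from this thread would look roughly like this:

-- Assumes the contrib amcheck extension (PostgreSQL 10+).
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Raises an error if the btree ordering no longer matches what the current
-- collation rules say it should be.
SELECT bt_index_check('index_users_unique_on_lower_username');
SELECT bt_index_check('index_users_unique_on_lower_email');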
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: "Maeldron T."
Hello Thomas,
>> (Maybe database clusters should have a header that wouldn’t allow
>> incompatible server versions to process the existing data. I wonder if it
>> would take more than 8 bytes per server. But I guess it was not known to be
>> incompatible. Even my two CIs didn’t show it.)
>
> I had some thoughts along those lines too[3]. I thought about
> checksumming libc and all relevant collation files (ie OS specific
> files) so you could notice when anything that could bite you changes
> (that is just some sketched out ideas, not production code). Some
> have suggested that PostgreSQL should stop using OS collations and
> adopt ICU[4] and then use its versioning metadata. Of course some
> people already use ICU on FreeBSD because the old strcoll
> implementation didn't play well with Unicode, but those users are
> still presumably exposed to silent corruption when ICU changes because
> AFAIK that didn't keep track of ICU versions.
I have been using PostgreSQL for 12 years. *Every* issue that I personally ever had was locale-related.
I define "issue" as not totally my fault and it took more than a hour to fix. That’s why I was a bit sarcastic with my comments about the year 2016. In the 80’s I have grown up on sci-fi movies about spaceships make wormholes to travel to the unknown parts of the Universe, even capable of shifting dimensions, but today, one of the biggest issue in IT is when one wants to use UTF-8.
I think I have spent more time fixing locale-related issues than fixing everything else combined.
(Not counting the bugs in my own code.)
Once I tried DragonflyBSD. It could not compile ICU so I deleted it.
I used to run PostgreSQL with ICU on FreeBSD, but I switched from ports to binary packages. Mixing them isn’t fun. (Well, currently I have it mixed.) Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.
M
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Michael Sheaver
I would like to echo the sentiment on collation and expand it to character sets in general. When issues with them come up, they do take an incredible amount of time and effort to resolve, and are one of my own biggest pain points when dealing with databases and datasets from other sources. Case in point:
I have one dataset that I pull into my local database (on a Windows laptop) for analysis on a weekly basis. Invariably it is sprinkled with special characters that trip up the COPY command; these appear to come from people fat-fingering the data input and the system not checking it. It takes only one of these special characters to negate the entire result of the COPY command. After wrestling with this for a couple of days about a year ago, the workaround I found was to first import the data into a MySQL table, strip out the offending characters in MySQL, dump the data to a CSV, and finally bring the sanitized data into Postgres using COPY.
So yes, character sets and collations are the bane of DBAs, and yes, they are a bitch to troubleshoot, and yes, they often take very creative solutions to fix.
Michael
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Vick Khera
On Wed, Dec 7, 2016 at 8:33 AM, Michael Sheaver <msheaver@me.com> wrote:
> After wrestling with this for a couple of days about a year ago, the workaround I found was to first import the data into a MySQL table, strip out the offending characters in MySQL, dump the data to a CSV, and finally bring the sanitized data into Postgres using COPY.
Couldn't you load them into an SQL_ASCII postgres database and accomplish the same?
I have a utility I wrote to migrate our entire production DB from SQL_ASCII to UTF8. It works *really* well, and it introspects your database to figure out what to do, one table at a time. You'd avoid the need for CSV importing. Just dump/reload into a UTF8 database and be done.
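A sketch of that dump/reload idea; the LATIN1 client_encoding is an assumption and should be whatever the SQL_ASCII data really contains:

-- Create the target database with a proper encoding...
CREATE DATABASE app_utf8
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'en_US.UTF-8'
    LC_CTYPE 'en_US.UTF-8';

-- ...then, in the session (or dump file) that restores the data, declare what
-- the bytes coming from the SQL_ASCII dump actually are, and let the server
-- convert them to UTF-8 on the way in.
SET client_encoding = 'LATIN1';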
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Kevin Grittner
On Wed, Dec 7, 2016 at 7:33 AM, Michael Sheaver <msheaver@me.com> wrote:
> I would like to echo the sentiment on collation and expand it to
> character sets in general. When issues with them come up, they do
> take an incredible amount of time and effort to resolve, and are
> one of my own biggest pain points when dealing with databases and
> datasets from other sources.

I would be much happier if most developers understood the difference between a character set (e.g., Unicode) and a character encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts relate. If we reached a point where most DBAs understood the point of being able to set a client_encoding that is different from the server_encoding, I think I would need to pop the cork on some champagne.

Hm. Maybe a topic for a blog post....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
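A minimal illustration of that distinction in a psql session (LATIN1 is an arbitrary example; any encoding the server can convert works):

SHOW server_encoding;            -- fixed per database at CREATE DATABASE time, e.g. UTF8
SET client_encoding = 'LATIN1';  -- per session; the server converts to and from it
SHOW client_encoding;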
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Rob Sargent
On 12/07/2016 02:06 PM, Kevin Grittner wrote:
> I would be much happier if most developers understood the
> difference between a character set (e.g., Unicode) and a character
> encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts
> relate. If we reached a point where most DBAs understood the point
> of being able to set a client_encoding that is different from the
> server_encoding, I think I would need to pop the cork on some
> champagne.

+1 - Confuses me every time
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Thomas Munro
On Wed, Dec 7, 2016 at 10:40 PM, Maeldron T. <maeldron@gmail.com> wrote:
> Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.

Hmm, a curious choice, considering that FreeBSD finally has built-in collations that work! Using the port's ICU patch doesn't change anything about the risks here as far as I know. ICU continually pulls corrections and improvements from CLDR, and FreeBSD's libc is now doing the same, so actually both ICU and libc might be getting the very same stream of collation changes, just at different speeds.

IMHO we can't continue to ignore this problem: we need to teach Postgres to recognise when collations change. That could mean adopting something like ICU and then tracking when every index is potentially invalidated by a version change (see Peter Eisentraut's proposal), or adopting non-portable, implementation-specific techniques. For the latter, we'd probably not want to litter core Postgres with assumptions about how each OS does things. One fairly blunt approach I came up with goes like this:

1. Add a new optional GUC system_collation_version_command = '/some/user/supplied/script.sh'.
2. When postmaster starts, run it and remember the output in memory.
3. When a database is created, store it for this database.
4. When connecting to a database, complain loudly if the version at startup doesn't match the stored version.
5. Update the stored value to the startup value when you REINDEX DATABASE (i.e. that's how to clear the warning).

Then package maintainers could supply a script that knows how to do the right thing on this OS. For example, it could be the package version string from the currently installed locales package, or an MD5 hash of the contents of all files in /usr/share/locales/, or whatever suits. The reason for only running the script at postmaster startup is that there is a risk of libc caching data, so that a REINDEX would use old data but running the command would see new files on disk; we need to make sure that a cluster restart is necessary after upgrading your OS to clear the warning.

That's horribly blunt: it makes you reindex the whole database even if you don't use a collation that changed, or don't even use btrees, etc. You could do something more nuanced and complicated that works at the level of individual locales and indexes (see the link I posted earlier to check_pg_collations for some ideas), but the idea would be basically the same.

--
Thomas Munro
http://www.enterprisedb.com
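For completeness, the reindexing that FreeBSD's UPDATING entry calls for (and the action that would clear the warning in the scheme sketched above) looks like this when run in the affected database:

-- Rebuild every index in the current database against the new collation rules.
REINDEX DATABASE test2;

-- Or, more narrowly, only the expression indexes known to be affected:
REINDEX INDEX index_users_unique_on_lower_username;
REINDEX INDEX index_users_unique_on_lower_email;

Note that REINDEX will fail on the unique expression indexes if duplicate rows have already slipped in, as in the original report; those rows have to be cleaned up first.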
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
From: Tim Clarke
On 07/12/16 21:20, Rob Sargent wrote:
> On 12/07/2016 02:06 PM, Kevin Grittner wrote:
>> I would be much happier if most developers understood the
>> difference between a character set (e.g., Unicode) and a character
>> encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts
>> relate. If we reached a point where most DBAs understood the point
>> of being able to set a client_encoding that is different from the
>> server_encoding, I think I would need to pop the cork on some
>> champagne.
>
> +1 - Confuses me every time

+1 me too

--
Tim Clarke