Thread: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken

Hello,

I’ll be as brief as I can, as I’ve broken my arm and I’m not supposed to type. This isn’t a joke.

However, I learned the following the hard way and I want to save you from the same.

Upgrading FreeBSD from 10 to 11 might break your database. It probably won’t be corrupted but it will be useless until you dump-import it, and you might need to do manual fixing.

My app has more than 1000 automated tests, and the upgrade itself was explicitly tested. The affected columns are covered many times. The upgrade was tested on two different machines (including the 10 => 11 step) before it was done in production. But the issue happens only at random, at scale: I could not reproduce it by inserting a few rows, only with real data.

I didn’t debug much, as I didn’t sleep for two days until I fixed it (live systems, with my left hand).

I removed noise from the queries; the real tables have dozens of NOT NULL columns. The edited queries may have syntax errors, but they were copied from real-world output. So were the errors. I use 'xxx' and '123' to hide private info, but the strings are otherwise real. This matters, as the issue might be locale-related.

I started seeing these in the logs:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_email"
DETAIL: Key (lower(email::text))=(andy.mxxx@xxx.com) already exists.
: UPDATE "users" SET "online_at" = '2016-12-01 00:23:33.245594', "visible_online_at" = '2016-12-01 00:23:33.245633' WHERE "users"."id" = $1

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(joyce1234) already exists.
: UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = $1

The table:

# \d users
         Column          |           Type           |                     Modifiers                      
-------------------------+--------------------------+----------------------------------------------------
 id                      | integer                  | not null default nextval('users_id_seq'::regclass)
 username                | character varying        | not null
 password                | character varying        | not null
 email                   | character varying        | not null
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "index_users_unique_on_lower_email" UNIQUE, btree (lower(email::text))
    "index_users_unique_on_lower_username" UNIQUE, btree (lower(username::text))


Testing:

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaa', 'Ági');
INSERT 0 1

test2=# select id, username from users where lower(username) = 'ági';
  id  | username
------+----------
  123 | Ági
(1 row)

test2=# select id, username from users where username = 'Ági';
  id  | username
------+----------
  123 | Ági
  456 | Ági
(2 rows)

test2=# select id, username from users where username = 'Mustafa';
  id  | username
------+----------
  123 | Mustafa
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaab', 'Mustafa');
INSERT 0 1

test2=# select id, username from users where username = 'Mustafa';
  id  | username
------+----------
  123 | Mustafa
  456 | Mustafa
(2 rows)

test2=# select id, username from users where username = 'bunny';
  id  | username
------+----------
  123 | bunny
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaac', 'bunny');
INSERT 0 1

test2=# select id, username from users where username = 'bunny';
  id  | username
------+----------
  123 | bunny
  456 | bunny
(2 rows)

test2=# select id, username from users where username = 'edwin';
  id  | username
------+----------
  123 | edwin
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaad', 'edwin');
INSERT 0 1

test2=# select id, username from users where username = 'edwin';
  id  | username
------+----------
  123 | edwin
  456 | edwin
(2 rows)

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaae', 'edwin');
ERROR:  duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL:  Key (lower(username::text))=(edwin) already exists.

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaae', 'bunny');
ERROR:  duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL:  Key (lower(username::text))=(bunny) already exists.


The issue didn’t happen only with strings containing characters like 'é' or 'ő'; plain English names were also affected.

The application does have validation on these strings, but the validation didn’t find the existing rows, for the same reason the insert didn’t.

*Sometimes*

Also, the colliding (or rather, failing-to-collide) strings usually had the same casing. (Not 'edwin' vs 'Edwin', but two 'edwin's.)

Sometimes only the latter kind of statement (the two-column update) triggered the violation:

UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589' WHERE "users"."id" = xxx
UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = xxx

Strange. I thought PG created a new tuple in both cases.

How to reproduce it:

0. Don’t break your arm. It’s not needed.

1. Install FreeBSD 10.3 and PostgreSQL. I had 9.5. The 9.6 port has ICU turned on by default. (Good morning, maintainer, yes, it’s the 21st century. It has been for a while.) The ICU build might not be affected.

2. Create a database cluster with UTF-8 encoding. (Yes, the year is 2016). I use "en_US.UTF-8" for LC_*.

3. Create a table that has a unique index (or indices) on lower(column).

4. Import a lot of data into the table. I think inserting 100,000 random English names will do fine (see the sketch after this list).

5. Upgrade to FreeBSD 11 the official way. This includes "upgrading" PostgreSQL, but it’s the same version, rebuilt for the new platform. In my case it was postgresql95-server-9.5.4_1.

6. Try to insert existing values. You should succeed at random.

7. Try to query the duplicates with "where lower(column) = 'edwin'". Or 'bunny'. Or whatever. You will see only the new row, even if the original string was already lower case.
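
For steps 3 and 4, a minimal sketch (mirroring the table shown above; the generated names and emails are made up, and I’m assuming generated names are "random" enough to fill out the indexes):

create table users (
    id       serial primary key,
    username varchar not null,
    password varchar not null,
    email    varchar not null
);

create unique index index_users_unique_on_lower_username on users (lower(username));
create unique index index_users_unique_on_lower_email on users (lower(email));

-- ~100,000 distinct names
insert into users (username, password, email)
select 'user' || g || chr(97 + (g % 26)),
       'aaaaa',
       'user' || g || '@example.com'
from generate_series(1, 100000) as g;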

This is not only an index and duplication issue: querying with lower() also failed on non-duplicated records. At random.

In my case, even non-duplicated users could not log in. They seemed to be deleted, but not all of them, and at random. My users thought they had been removed from the site; it looked that way.

I fixed it with a dump and reload. And as I already had downtime, I reloaded into 9.6. I had to manually rename the duplicated records to do so.
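
For what it’s worth, the rows that needed manual renaming can be listed with something like this (a sketch; I force a sequential scan so the suspect index isn’t consulted):

set enable_indexscan = off;
set enable_bitmapscan = off;

select lower(username) as key, array_agg(id order by id) as ids
from users
group by lower(username)
having count(*) > 1;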

I have already typed ten times more than I should have. Feel free to explore this further, but not on your production database.

(Maybe database clusters should have a header that wouldn’t allow incompatible server versions to process the existing data. I wonder if it would take more than 8 bytes per server. But I guess it was not known to be incompatible. Even my two CIs didn’t show it.)

M


I forgot to add: even when I turned off index scans, SELECT with lower() still didn’t find some existing rows.
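
It was something like this (a sketch; I’m assuming the usual planner switches, my exact session wasn’t saved):

set enable_indexscan = off;
set enable_bitmapscan = off;  -- force a sequential scan
select id, username from users where lower(username) = 'edwin';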




On Sat, Dec 3, 2016 at 2:52 PM, Maeldron T. <maeldron@gmail.com> wrote:
> Upgrading FreeBSD from 10 to 11 might break your database. It probably won’t
> be corrupted but it will be useless until you dump-import it, and you might
> need to do manual fixing.

Sorry to hear about your arm and your database.

The collation support was changed in FreeBSD 11 (in collaboration with
Illumos and DragonflyBSD[1]).  From FreeBSD's UPDATING:

    Databases administrators will need to reindex their databases given
    collation results will be different.

This is a problem that faces users of Postgres on all operating
systems, too; for example, some GNU/Linux users recently had indexes
silently corrupted by a change to the glibc locale data for German.
One approach is to use "amcheck"[2] (proposed for future inclusion in
PostgreSQL) to check whether indexes are still sane any time the
collation definition files or libc itself might have changed.
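
Usage would be something like this (a sketch; the function comes from
the proposed extension, so treat the name as subject to change):

create extension amcheck;

select bt_index_check(c.oid), c.relname
from pg_index i
join pg_class c on c.oid = i.indexrelid
join pg_am am on am.oid = c.relam
where am.amname = 'btree';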

> (Maybe database clusters should have a header that wouldn’t allow
> incompatible server versions to process the existing data. I wonder if it
> would take more than 8 bytes per server. But I guess it was not known to be
> incompatible. Even my two CIs didn’t show it.)

I had some thoughts along those lines too[3].  I thought about
checksumming libc and all relevant collation files (ie OS specific
files) so you could notice when anything that could bite you changes
(that is just some sketched out ideas, not production code).  Some
have suggested that PostgreSQL should stop using OS collations and
adopt ICU[4] and then use its versioning metadata.  Of course some
people already use ICU on FreeBSD because the old strcoll
implementation didn't play well with Unicode, but those users are
still presumably exposed to silent corruption when ICU changes because
AFAIK that didn't keep track of ICU versions.

[1] http://blog.etoilebsd.net/post/This_is_how_I_like_opensource
[2] https://www.postgresql.org/message-id/flat/CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
[3] https://github.com/macdice/check_pg_collations
[4] https://www.postgresql.org/message-id/flat/85364fde-091f-bbc0-fec2-e3ede39840a6@2ndquadrant.com

--
Thomas Munro
http://www.enterprisedb.com


Hello Thomas,


>> (Maybe database clusters should have a header that wouldn’t allow
>> incompatible server versions to process the existing data. I wonder if it
>> would take more than 8 bytes per server. But I guess it was not known to be
>> incompatible. Even my two CIs didn’t show it.)

> I had some thoughts along those lines too[3].  I thought about
> checksumming libc and all relevant collation files (ie OS specific
> files) so you could notice when anything that could bite you changes
> (that is just some sketched out ideas, not production code).  Some
> have suggested that PostgreSQL should stop using OS collations and
> adopt ICU[4] and then use its versioning metadata.  Of course some
> people already use ICU on FreeBSD because the old strcoll
> implementation didn't play well with Unicode, but those users are
> still presumably exposed to silent corruption when ICU changes because
> AFAIK that didn't keep track of ICU versions.


I have been using PostgreSQL for 12 years. *Every* issue that I personally ever had was locale-related. 

I define "issue" as not totally my fault and it took more than a hour to fix. That’s why I was a bit sarcastic with my comments about the year 2016. In the 80’s I have grown up on sci-fi movies about spaceships make wormholes to travel to the unknown parts of the Universe, even capable of shifting dimensions, but today, one of the biggest issue in IT is when one wants to use UTF-8.

I think I spent more time fixing locale-related issues than fixing everything else combined.

(Not counting the bugs in my own code.)

Once I tried DragonflyBSD. It could not compile ICU so I deleted it.

I used to run PostgreSQL with ICU on FreeBSD, but I switched from ports to binary packages, and mixing the two isn’t fun. (Well, currently I do have them mixed.) Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.

M
I would like to echo the sentiment on collation and expand it to character sets in general. When issues with them come up, they take an incredible amount of time and effort to resolve, and they are one of my biggest pain points when dealing with databases and datasets from other sources. Case in point:

I have one dataset that I pull into my local machine (a Windows laptop) for analysis on a weekly basis. Invariably it is sprinkled with special characters that trip up the COPY command; these appear to come from people fat-fingering the data input and the system not checking it. It takes only one of these special characters to negate the entire result of the COPY. After wrestling with this for a couple days about a year ago, the workaround I found that works is to first import it into a MySQL table, strip out the characters in MySQL, dump the data out to a CSV and finally bring the sanitized data into Postgres using the copy command.
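
(One thing I have not fully explored: if the bad bytes are really just a mismatched encoding, e.g. Windows-1252 smart quotes, COPY can be told the source encoding so Postgres converts instead of choking. The table name and path here are made up, and the server runs on the same laptop, so a server-side path is fine:

copy mytable from 'C:/data/weekly.csv'
  with (format csv, header true, encoding 'WIN1252');

That only helps when the file is consistently in one encoding, which may not be true of fat-fingered input.)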

So yes, character sets and collations are the bane of DBAs; yes, they are a bitch to troubleshoot; and yes, they often take very creative solutions to fix.

Michael



On Wed, Dec 7, 2016 at 8:33 AM, Michael Sheaver <msheaver@me.com> wrote:
> After wrestling with this for a couple days about a year ago, the workaround
> I found that works is to first import it into a MySQL table, strip out the
> characters in MySQL, dump the data out to a CSV and finally bring the
> sanitized data into Postgres using the copy command.

Couldn't you load them into an SQL_ASCII postgres database and accomplish the same?

I have a utility I wrote to migrate our entire production DB from SQL_ASCII to UTF8. It works *really* well, and it introspects your database to figure out what to do, one table at a time. You'd avoid the need for CSV importing: just dump/reload into a UTF8 database and be done.
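
Something like this is what I mean by the scratch database (name made up; SQL_ASCII skips encoding validation, so pair it with the C locale):

create database scratch
    template template0
    encoding 'SQL_ASCII'
    lc_collate 'C'
    lc_ctype 'C';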

On Wed, Dec 7, 2016 at 7:33 AM, Michael Sheaver <msheaver@me.com> wrote:
> I would like to echo the sentiment on collation and expand it to
> character sets in general. When issues with them come up, they do
> take an incredible amount of time and effort to resolve, and are
> one of my own biggest pain points when dealing with databases and
> datasets from other sources.

I would be much happier if most developers understood the
difference between a character set (e.g., Unicode) and a character
encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts
relate.  If we reached a point where most DBAs understood the point
of being able to set a client_encoding that is different from the
server_encoding, I think I would need to pop the cork on some
champagne.
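
A minimal illustration (the encodings and the string are arbitrary):

show server_encoding;            -- say it reports UTF8
set client_encoding = 'LATIN1';  -- what this client sends and expects
select 'café' as greeting;       -- delivered to the client as LATIN1 bytes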

Hm.  Maybe a topic for a blog post....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 12/07/2016 02:06 PM, Kevin Grittner wrote:
> I would be much happier if most developers understood the
> difference between a character set (e.g., Unicode) and a character
> encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts
> relate.  If we reached a point where most DBAs understood the point
> of being able to set a client_encoding that is different from the
> server_encoding, I think I would need to pop the cork on some
> champagne.

+1 - Confuses me every time


On Wed, Dec 7, 2016 at 10:40 PM, Maeldron T. <maeldron@gmail.com> wrote:
> Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.

Hmm, a curious choice, considering that FreeBSD finally has built-in
collations that work!

Using the port's ICU patch doesn't change anything about the risks
here as far as I know.  ICU continually pulls corrections and
improvements from CLDR, and FreeBSD's libc is now doing the same, so
actually both ICU and libc might be getting the very same stream of
collation changes, just at different speeds.

IMHO we can't continue to ignore this problem: we need to teach
Postgres to recognise when collations change.  That could mean
adopting something like ICU and then tracking when every index is
potentially invalidated by a version change (see Peter Eisentraut's
proposal), or adopting non-portable, implementation-specific
techniques.  For the latter, we'd probably not want to litter core
Postgres with assumptions about how each OS does things.  One fairly
blunt approach I came up with goes like this:

1.  Add a new optional GUC system_collation_version_command =
'/some/user/supplied/script.sh'.
2.  When postmaster starts, run it and remember the output in memory.
3.  When a database is created, store it for this database.
4.  When connecting to a database, complain loudly if the version
recorded at startup doesn't match the stored version.
5.  Update the stored value to the startup value when you REINDEX
DATABASE (ie that's how to clear the warning).

Then package maintainers could supply a script that knows how to do
the right thing on a given OS.  For example, it could output the
version string of the currently installed locales package, or an MD5
hash of the contents of all files in /usr/share/locales/, or whatever
suits.  The reason for running the script only at postmaster startup
is that libc may cache locale data: a REINDEX could use stale data
while running the command would see the new files on disk.  So a
cluster restart has to be required after upgrading your OS before the
warning can be cleared.

That's horribly blunt: it makes you reindex the whole database even if
you don't use a collation that changed, or don't even use btrees, etc.
You could do something more nuanced and complicated that works at the
level of individual locales and indexes (see the link I posted earlier
to check_pg_collations for some ideas), but the idea would be
basically the same.
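
For instance, reindexing could be narrowed to the btree indexes that
have at least one collation-sensitive column or expression; a rough
sketch (not production code):

select indexrelid::regclass as "index", indrelid::regclass as "table"
from pg_index
where indcollation::text ~ '[1-9]';  -- any nonzero collation OID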

--
Thomas Munro
http://www.enterprisedb.com


On 07/12/16 21:20, Rob Sargent wrote:
> +1 - Confuses me every time

+1 me too

--
Tim Clarke

