FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken - Mailing list pgsql-general

From Maeldron T.
Subject FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Date
Msg-id CAKatfSnA6bZPav0W2mOZ0HhriDL5fK0ycYPkT7XQo=TGcF6oOw@mail.gmail.com
Whole thread Raw
Responses Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
List pgsql-general
Hello,

I’ll be as short as I can as I have broken my arm and I’m not supposed to type. This isn’t a joke.

However, I learned the following the hard way and I want to save you from the same.

Upgrading FreeBSD from 10 to 11 might break your database. It probably won’t be corrupted but it will be useless until you dump-import it, and you might need to do manual fixing.

My app has more than 1000 automated tests, and the upgrade itself was explicitly tested. The affected columns are tested many times. It was tested on two different machines (including the 10 => 11) before done in production. But the issue happens only at random on large scale. I could not reproduce it with inserting a few rows. I could reproduce it with real data.

I didn’t debug much as I did not sleep for two days until I fixed it (live systems, with left hand).

I removed noise from queries, the real tables have dozens of not null columns. The edited queries may have syntax errors, but they were copied from real world output. So were the errors. I use 'xxx' and '123' to hide private info but the strings are real in general. This matters as the issue might be locale related.

I started seeing these in the logs:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_email"
DETAIL: Key (lower(email::text))=(andy.mxxx@xxx.com) already exists.
: UPDATE "users" SET "online_at" = '2016-12-01 00:23:33.245594', "visible_online_at" = '2016-12-01 00:23:33.245633' WHERE "users"."id" = $1

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(joyce1234) already exists.
: UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = $1

The table:

# \d users
         Column          |           Type           |                     Modifiers                      
-------------------------+--------------------------+----------------------------------------------------
 id                      | integer                  | not null default nextval('users_id_seq'::regclass)
 username                | character varying        | not null
 password                | character varying        | not null
 email                   | character varying        | not null
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "index_users_unique_on_lower_email" UNIQUE, btree (lower(email::text))
    "index_users_unique_on_lower_username" UNIQUE, btree (lower(username::text))


Testing:

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaa', 'Ági');
INSERT 0 1

test2=# select id, username from users where lower(username) = 'ági';
   id   | username
--------+----------
 123 | Ági
(1 row)


test2=# select id, username from users where username = 'Ági';
   id   | username
--------+----------
  123 | Ági
 456 | Ági
(2 rows)


test2=# select id, username from users where username = 'Mustafa';

  id   | username
-------+----------
 123 | Mustafa
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaab',  'Mustafa');
INSERT 0 1

test2=# select id, username from users where username = 'Mustafa';

   id   | username
--------+----------

  123 | Mustafa
 456 | Mustafa
(2 rows)



test2=# select id, username from users where username = 'bunny';
  id  | username
------+----------
 123 | bunny
(1 row)


test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaac', 'bunny');
INSERT 0 1test2=# select id, username from users where username = 'bunny';
   id   | username
--------+----------
   123 | bunny
 456 | bunny
(2 rows)

test2=# select id, username from users where username = 'edwin';
  id   | username
-------+----------
 123 | edwin
(1 row)

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaad', 'edwin');
INSERT 0 1

test2=# select id, username from users where username = 'edwin';
   id   | username
--------+----------
  123 | edwin
 456 | edwin
(2 rows)


test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaae', 'edwin');
ERROR:  duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL:  Key (lower(username::text))=(edwin) already exists.

test2=# insert into users (password, email, username) values ('aaaaa', 'aaaaaae', 'bunny');
ERROR:  duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL:  Key (lower(username::text))=(bunny) already exists.


The issue didn’t happen only with strings that have characters like 'é', 'ő'. English names were also affected.

The application does have validation on the strings but I didn’t see the existing versions for the same reason the insert didn’t see. 

*Sometimes*

Also, the colliding (not colliding) strings usually had the same case of characters. (Not 'edwin' vs 'Edwin' but two 'edwin's)

Sometimes only the latter triggered the violation:

UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589' = xxx
UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = xxx

Strange. I though PG created a new tuple in both cases.

How to reproduce it:

0. Don’t break your arm. It’s not needed.

1. Install FreeBSD 10.3. Install PostgreSQL. I used to have 9.5. Version 9.6 has ICU turned on by default. (Good morning, maintainer, yes it’s the 21Th century. I has been for a while). The ICU version might be not affected.

2. Create a database cluster with UTF-8 encoding. (Yes, the year is 2016). I use "en_US.UTF-8" for LC_*.

3. Create a table that has unique index or indices on lower(column)

4. Import much data in the table. Or I think inserting 100 000 random English names will do fine.

5. Upgrade to FreeBSD 11 the official way. It includes "upgrading" the PostgreSQL. But it’s the same version for the new platform. In my case it was postgresql95-server-9.5.4_1 

6. Try to insert existing values. You should succeed at random.

7. Try to query the duplicates with "where lower(column) = 'edwin'". Or bunny. Or whatever. You will see only the new version. Even if the original string was already lower case.

This is not only an index and duplication issue. Querying with lower() also didn’t work with non duplicated records. At random.

In my case even non duplicated uses could not log in. They seemed to be deleted. But not all of them. At random. My users thought they were removed from the site. It seemed so.

I fixed it by dump and load. And as I already had downtime, I reloaded it to 9.6. I had to manually rename the duplicated records to do so.

I already typed 10 times more than I should have, feel free to discover it but not on your production database.

(Maybe database clusters should have a header that wouldn’t allow incompatible server versions to process the existing data. I wonder if it would take more than 8 bytes per server. But I guess it was not know to be incompatible. Even my two CIs didn’t show it.)

M


pgsql-general by date:

Previous
From: Kiriakos Georgiou
Date:
Subject: Announcement: application level undo/redo for PostgreSQL
Next
From: "Maeldron T."
Date:
Subject: Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken