Thread: Duplicate rows during pg_dump
I am seeing a duplicate, stale copy of a row when running pg_dump or copying a specific table, but not when selecting from the table directly. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client tools.
It's happening on a users table, which has a primary key and enforces a unique email address:
Table "public.users"
Column | Type | Modifiers
---------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
email | character varying(255) | not null default ''::character varying
last_activity | timestamp without time zone |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"users_unique_email" UNIQUE, btree (email)
I first noticed the problem when copying the table to another database. Roughly this:
% pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
% psql test_db < users.txt
[...]
ERROR: could not create unique index "users_pkey"
DETAIL: Key (id)=(123) is duplicated.
[...]
ERROR: could not create unique index "users_unique_email"
DETAIL: Key (email)=(this_user@xyz.com) is duplicated.
It appears there's some sort of duplicate record for a single user in the database. Checking the pg_dump output, I saw that a single user's record was being exported twice:
% grep -i this_user@xyz.com users.txt
INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-21 10:32:15.997887');
INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-02 11:32:58.615743');
The two rows were not exactly the same: the last_activity timestamps differ. Connecting to the source database directly, I tried this:
remote_db=> select count(1) from users where id = 123;
count
-------
1
(1 row)
remote_db=> select count(1) from users where email = 'this_user@xyz.com';
count
-------
1
(1 row)
To eliminate any risk of it being a weird locking issue, I restored a snapshot of the database into a new RDS instance but I got the same results. I then tried the following:
remote_db=> create table users_copy_with_indexes (like users including defaults including constraints including indexes including storage including comments);
CREATE TABLE
remote_db=> insert into users_copy_with_indexes select * from users;
ERROR: duplicate key value violates unique constraint "users_copy_with_indexes_pkey"
DETAIL: Key (id)=(123) already exists.
However, when I created a copy without the indexes, I could see the duplicate rows:
remote_db=> create table users_copy_without_indexes (like users);
CREATE TABLE
remote_db=> insert into users_copy_without_indexes select * from users;
INSERT 0 523342
remote_db=> select count(1) from users_copy_without_indexes where id = 123;
count
-------
2
(1 row)
remote_db=> select count(1) from users_copy_without_indexes where email = 'this_user@xyz.com';
count
-------
2
(1 row)
Any suggestions for what to look for next? Is it table corruption?
Chaz
On 10/24/2015 12:35 PM, Chaz Yoon wrote:
> I am seeing a duplicate, stale copy of a row when running pg_dump or
> copying a specific table, but not when selecting from the table
> directly. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10
> client tools.
> [...]
> However, when I created a copy without the indexes, I could see the
> duplicate rows:
> [...]
> Any suggestions for what to look for next? Is it table corruption?

I would say the smoking gun is that the copy without indexes shows both
records and the one with indexes only one.

I would DROP/CREATE the index on the original table, with the usual
caveat that this does place a load on the table. Using CONCURRENTLY
might help, but I would first read "Building Indexes Concurrently"
here:

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

--
Adrian Klaver
adrian.klaver@aklaver.com
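For reference, a sketch of the concurrent rebuild Adrian describes, assuming the duplicate rows have been cleaned up first (the unique builds will fail while they remain, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block; the `_new` names are illustrative):

```sql
-- Build a replacement primary-key index without a long exclusive lock.
CREATE UNIQUE INDEX CONCURRENTLY users_pkey_new ON users (id);

-- Swap it in for the old (possibly corrupt) index; the new index is
-- renamed to match the constraint name.
BEGIN;
ALTER TABLE users DROP CONSTRAINT users_pkey;
ALTER TABLE users
  ADD CONSTRAINT users_pkey PRIMARY KEY USING INDEX users_pkey_new;
COMMIT;

-- The plain unique index can be rebuilt and the old one dropped.
CREATE UNIQUE INDEX CONCURRENTLY users_unique_email_new ON users (email);
DROP INDEX users_unique_email;
ALTER INDEX users_unique_email_new RENAME TO users_unique_email;
```

Plain REINDEX would also rebuild the indexes, but it takes an exclusive lock on the table for the duration, which is the load caveat above.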
>> Any suggestions for what to look for next? Is it table corruption?

Most likely the index is corrupt, not the table. You should check for
further duplicates, fix them and, as Adrian writes, build a new index
and then drop the corrupt one.

I've seen this a few times before, and if I recall correctly it was
always after a disk got full. Is AWS running out of space? :)

regards,

Marc Mamin
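The check for further duplicates can be run against the index-free copy from the original post (a sketch using the `users_copy_without_indexes` table Chaz already created; scanning that copy avoids the corrupt indexes, which queries on `users` itself might use):

```sql
-- Find every id that appears more than once.
SELECT id, count(*)
FROM users_copy_without_indexes
GROUP BY id
HAVING count(*) > 1;

-- Same check on the other unique key.
SELECT email, count(*)
FROM users_copy_without_indexes
GROUP BY email
HAVING count(*) > 1;
```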
On 10/24/15 3:15 PM, Marc Mamin wrote:
>>> Any suggestions for what to look for next? Is it table corruption?
> Most likely the index is corrupt, not the table.
> [...]
> Is AWS running out of space? :)

You should report this to the RDS team, because an out-of-space
condition shouldn't leave multiple values in the index. I suspect
they've made a modification somewhere that is causing this. It could be
a base Postgres bug, but I'd think we'd have caught such a bug by now...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 10/24/15 3:15 PM, Marc Mamin wrote:
>> Most likely the index is corrupt, not the table.
> You should report this to the RDS team, because an out of space
> condition shouldn't leave multiple values in the index. I suspect
> they've made a modification somewhere that is causing this.

Notable also is that pg_dump invariably reads tables with a plain
"COPY foo" or "SELECT * FROM foo", which should ignore all indexes and
just read the table contents. So I doubt that reindexing will fix
anything: you almost certainly do have duplicate rows in the base
table.

It's highly likely that the index is corrupt, which is what would be
necessary to get into such a state ... but you will need to manually
remove the dup rows before rebuilding the unique index will succeed.

			regards, tom lane
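The manual removal Tom describes is commonly done with the hidden `ctid` system column, which identifies each physical tuple (a sketch; verify which version you want to keep before deleting anything):

```sql
-- The corrupt index may hide one copy from index scans, so force a
-- sequential scan for this session first.
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- Inspect both physical versions of the row, with their tuple
-- locations and inserting-transaction ids.
SELECT ctid, xmin, id, email, last_activity
FROM users
WHERE id = 123;

-- Then delete the stale copy by its physical location (a TID scan,
-- which does not touch the corrupt index), substituting the ctid
-- reported above for the row to discard, e.g.:
-- DELETE FROM users WHERE ctid = '(8531,42)';
```

Only after the duplicates are gone will rebuilding the unique indexes succeed.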
> -----Original Message-----
> From: Jim Nasby [mailto:Jim.Nasby@BlueTreble.com]
> Sent: Monday, 26 October 2015 01:55
> To: Marc Mamin; Adrian Klaver; Chaz Yoon; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Duplicate rows during pg_dump
>
> You should report this to the RDS team, because an out of space
> condition shouldn't leave multiple values in the index.
> [...]

The last time I ran into this trouble was 4-5 years ago...

regards,

Marc Mamin