Re: Duplicate rows during pg_dump - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Duplicate rows during pg_dump |
Date | |
Msg-id | 562BDFC8.8040908@aklaver.com Whole thread Raw |
In response to | Duplicate rows during pg_dump (Chaz Yoon <chaz@shopspring.com>) |
Responses |
Re: Duplicate rows during pg_dump
|
List | pgsql-general |
On 10/24/2015 12:35 PM, Chaz Yoon wrote: > I am seeing a duplicate, stale copy of the same row when performing a > pg_dump or copying a specific table, but not when directly selecting > from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client > tools. > > It's happening on a users table, which has a primary key and enforces a > unique email address: > > Table "public.users" > Column | Type | Modifiers > > ---------------+-----------------------------+--------------------------------------------------- > id | integer | not null default > nextval('users_id_seq'::regclass) > email | character varying(255) | not null default > ''::character varying > last_activity | timestamp without time zone | > Indexes: > "users_pkey" PRIMARY KEY, btree (id) > "users_unique_email" UNIQUE, btree (email) > > I first noticed the problem when doing copying the table to another > database. Roughly this: > > % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt > % psql test_db < users.txt > [...] > ERROR: could not create unique index "users_pkey" > DETAIL: Key (id)=(123) is duplicated. > [...] > ERROR: could not create unique index "users_unique_email" > DETAIL: Key (email)=(this_user@xyz.com <mailto:this_user@xyz.com>) > is duplicated. > > It appears there's some sort of duplicate record for a single user in > the database. Checking the pg_dump output, I saw that a single user's > record was being exported twice: > > % grep -i this_user@xyz.com <mailto:this_user@xyz.com> users.txt > INSERT INTO users (id, email, last_activity) VALUES (123, > 'this_user@xyz.com <mailto:this_user@xyz.com>', '2015-10-21 > 10:32:15.997887'); > INSERT INTO users (id, email, last_activity) VALUES (123, > 'this_user@xyz.com <mailto:this_user@xyz.com>', '2015-10-02 > 11:32:58.615743'); > > The rows were not exactly the same. Connecting to the source database > directly, I tried this: > > remote_db=> select count(1) from users where id = 123; > count > ------- > 1 > (1 row) > > remote_db=> select count(1) from users where email = > 'this_user@xyz.com <mailto:this_user@xyz.com>'; > count > ------- > 1 > (1 row) > > To eliminate any risk of it being a weird locking issue, I restored a > snapshot of the database into a new RDS instance but I got the same > results. I then tried the following: > > remote_db=> create table users_copy_with_indexes (like users > including defaults including constraints including indexes including > storage including comments); > CREATE TABLE > remote_db=> insert into users_copy_with_indexes select * from users; > ERROR: duplicate key value violates unique constraint > "users_copy_with_indexes_pkey" > DETAIL: Key (id)=(123) already exists. > > However, when I created a copy without the indexes, I can see the > duplicate rows: > > remote_db=> create table users_copy_without_indexes (like users); > CREATE TABLE > remote_db=> insert into users_copy_without_indexes select * from users; > INSERT 0 523342 > remote_db=> select count(1) from users_copy_without_indexes where id > = 123; > count > ------- > 2 > (1 row) > remote_db=> select count(1) from users_copy_without_indexes where > email = 'this_user@xyz.com <mailto:this_user@xyz.com>'; > count > ------- > 2 > (1 row) > > Any suggestions for what to look for next? Is it table corruption? I would say the smoking gun is the copy w/o indexes shows both records and the one with indexes only one. I would DROP/CREATE index on the original table, with the usual caveat that this does place a load on the table. Using Concurrently might help, but I would read the information here: http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY Building Indexes Concurrently > > Chaz > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: