Re: Duplicate rows during pg_dump - Mailing list pgsql-general

From Marc Mamin
Subject Re: Duplicate rows during pg_dump
Date
Msg-id B6F6FD62F2624C4C9916AC0175D56D884209D6A3@jenmbs01.ad.intershop.net
In response to Re: Duplicate rows during pg_dump  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Duplicate rows during pg_dump  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-general
>On 10/24/2015 12:35 PM, Chaz Yoon wrote:
>> I am seeing a duplicate, stale copy of the same row when performing a
>> pg_dump or copying a specific table, but not when directly selecting
>> from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client
>> tools.
>>
>> It's happening on a users table, which has a primary key and enforces a
>> unique email address:
>>
>>                                Table "public.users"
>>        Column     |            Type             |                      Modifiers
>>    ---------------+-----------------------------+-----------------------------------------------------
>>     id            | integer                     | not null default nextval('users_id_seq'::regclass)
>>     email         | character varying(255)      | not null default ''::character varying
>>     last_activity | timestamp without time zone |
>>    Indexes:
>>        "users_pkey" PRIMARY KEY, btree (id)
>>        "users_unique_email" UNIQUE, btree (email)
>>
>> I first noticed the problem when copying the table to another
>> database. Roughly this:
>>
>>    % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
>>    % psql test_db < users.txt
>>    [...]
>>    ERROR:  could not create unique index "users_pkey"
>>    DETAIL:  Key (id)=(123) is duplicated.
>>    [...]
>>    ERROR:  could not create unique index "users_unique_email"
>>    DETAIL:  Key (email)=(this_user@xyz.com) is duplicated.
>>
>> It appears there's some sort of duplicate record for a single user in
>> the database. Checking the pg_dump output, I saw that a single user's
>> record was being exported twice:
>>
>>    % grep -i this_user@xyz.com users.txt
>>    INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-21 10:32:15.997887');
>>    INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-02 11:32:58.615743');
>>
>> The rows were not exactly the same. Connecting to the source database
>> directly, I tried this:
>>
>>    remote_db=> select count(1) from users where id = 123;
>>     count
>>    -------
>>         1
>>    (1 row)
>>
>>    remote_db=> select count(1) from users where email = 'this_user@xyz.com';
>>     count
>>    -------
>>         1
>>    (1 row)
>>
>> To eliminate any risk of it being a weird locking issue, I restored a
>> snapshot of the database into a new RDS instance but I got the same
>> results. I then tried the following:
>>
>>    remote_db=> create table users_copy_with_indexes (like users including defaults including constraints including indexes including storage including comments);
>>    CREATE TABLE
>>    remote_db=> insert into users_copy_with_indexes select * from users;
>>    ERROR:  duplicate key value violates unique constraint "users_copy_with_indexes_pkey"
>>    DETAIL:  Key (id)=(123) already exists.
>>
>> However, when I created a copy without the indexes, I could see the
>> duplicate rows:
>>
>>    remote_db=> create table users_copy_without_indexes (like users);
>>    CREATE TABLE
>>    remote_db=> insert into users_copy_without_indexes select * from users;
>>    INSERT 0 523342
>>    remote_db=> select count(1) from users_copy_without_indexes where id = 123;
>>     count
>>    -------
>>         2
>>    (1 row)
>>    remote_db=> select count(1) from users_copy_without_indexes where email = 'this_user@xyz.com';
>>     count
>>    -------
>>         2
>>    (1 row)
>>
>> Any suggestions for what to look for next? Is it table corruption?

Most likely the index is corrupt, not the table.
You should check for further duplicates, fix them and, as Adrian writes,
build a new index and then drop the corrupt one.
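
For example, something along these lines should surface every duplicate
by forcing a sequential scan, so the planner cannot use the corrupt
indexes (just a sketch, untested here; the ctid in the DELETE is a
placeholder you would take from the SELECT output):

    BEGIN;
    -- keep this transaction away from the corrupt indexes
    SET LOCAL enable_indexscan = off;
    SET LOCAL enable_indexonlyscan = off;
    SET LOCAL enable_bitmapscan = off;

    SELECT id, count(*)    FROM users GROUP BY id    HAVING count(*) > 1;
    SELECT email, count(*) FROM users GROUP BY email HAVING count(*) > 1;

    -- inspect each duplicate by physical row id, then delete the stale copy
    SELECT ctid, xmin, * FROM users WHERE id = 123;
    DELETE FROM users WHERE id = 123 AND ctid = '(0,2)';  -- placeholder ctid
    COMMIT;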

I've seen this a few times before, and if I recall correctly it was
always after a disk got full. Is AWS running out of space? :)
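
Once the duplicates are gone, the rebuild Adrian describes could look
roughly like this on 9.3 (again only a sketch; note that CREATE INDEX
CONCURRENTLY fails and leaves an invalid index behind if any duplicate
is still there):

    -- build replacement indexes without blocking writes
    CREATE UNIQUE INDEX CONCURRENTLY users_pkey_new ON users (id);
    CREATE UNIQUE INDEX CONCURRENTLY users_unique_email_new ON users (email);

    -- swap the primary key onto the new index
    -- (PostgreSQL renames the index to match the constraint name)
    BEGIN;
    ALTER TABLE users DROP CONSTRAINT users_pkey;
    ALTER TABLE users ADD CONSTRAINT users_pkey
        PRIMARY KEY USING INDEX users_pkey_new;
    COMMIT;

    -- replace the corrupt unique index
    DROP INDEX users_unique_email;
    ALTER INDEX users_unique_email_new RENAME TO users_unique_email;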

regards,
Marc Mamin

>
>I would say the smoking gun is the copy w/o indexes shows both records
>and the one with indexes only one. I would DROP/CREATE index on the
>original table, with the usual caveat that this does place a load on the
>table. Using Concurrently might help, but I would read the information here:
>
>http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
>
>Building Indexes Concurrently
>>
>> Chaz
>>
>
>
>--
>Adrian Klaver
>adrian.klaver@aklaver.com
>
>
>--
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general
>
