Re: Duplicate rows during pg_dump - Mailing list pgsql-general

From: Adrian Klaver
Subject: Re: Duplicate rows during pg_dump
Date:
Msg-id: 562BDFC8.8040908@aklaver.com
In response to: Duplicate rows during pg_dump (Chaz Yoon <chaz@shopspring.com>)
Responses: Re: Duplicate rows during pg_dump
List: pgsql-general
On 10/24/2015 12:35 PM, Chaz Yoon wrote:
> I am seeing a duplicate, stale copy of the same row when performing a
> pg_dump or copying a specific table, but not when directly selecting
> from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client
> tools.
>
> It's happening on a users table, which has a primary key and enforces a
> unique email address:
>
>                                  Table "public.users"
>        Column     |            Type             |                     Modifiers
>    ---------------+-----------------------------+----------------------------------------------------
>     id            | integer                     | not null default nextval('users_id_seq'::regclass)
>     email         | character varying(255)      | not null default ''::character varying
>     last_activity | timestamp without time zone |
>    Indexes:
>        "users_pkey" PRIMARY KEY, btree (id)
>        "users_unique_email" UNIQUE, btree (email)
>
> I first noticed the problem when copying the table to another
> database. Roughly this:
>
>    % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
>    % psql test_db < users.txt
>    [...]
>    ERROR:  could not create unique index "users_pkey"
>    DETAIL:  Key (id)=(123) is duplicated.
>    [...]
>    ERROR:  could not create unique index "users_unique_email"
>    DETAIL:  Key (email)=(this_user@xyz.com) is duplicated.
>
> It appears there's some sort of duplicate record for a single user in
> the database. Checking the pg_dump output, I saw that a single user's
> record was being exported twice:
>
>    % grep -i this_user@xyz.com users.txt
>    INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-21 10:32:15.997887');
>    INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-02 11:32:58.615743');
>
> The rows were not exactly the same. Connecting to the source database
> directly, I tried this:
>
>    remote_db=> select count(1) from users where id = 123;
>     count
>    -------
>         1
>    (1 row)
>
>    remote_db=> select count(1) from users where email = 'this_user@xyz.com';
>     count
>    -------
>         1
>    (1 row)
>
> To eliminate any risk of it being a weird locking issue, I restored a
> snapshot of the database into a new RDS instance, but I got the same
> results. I then tried the following:
>
>    remote_db=> create table users_copy_with_indexes (like users including defaults including constraints including indexes including storage including comments);
>    CREATE TABLE
>    remote_db=> insert into users_copy_with_indexes select * from users;
>    ERROR:  duplicate key value violates unique constraint "users_copy_with_indexes_pkey"
>    DETAIL:  Key (id)=(123) already exists.
>
> However, when I created a copy without the indexes, I could see the
> duplicate rows:
>
>    remote_db=> create table users_copy_without_indexes (like users);
>    CREATE TABLE
>    remote_db=> insert into users_copy_without_indexes select * from users;
>    INSERT 0 523342
>    remote_db=> select count(1) from users_copy_without_indexes where id = 123;
>     count
>    -------
>         2
>    (1 row)
>    remote_db=> select count(1) from users_copy_without_indexes where email = 'this_user@xyz.com';
>     count
>    -------
>         2
>    (1 row)
>
> Any suggestions for what to look for next? Is it table corruption?

I would say the smoking gun is that the copy without indexes shows both
records while the one with indexes shows only one. I would DROP/CREATE the
indexes on the original table, with the usual caveat that this does place a
load on the table. Using CONCURRENTLY might help, but I would first read the
information here:

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

Building Indexes Concurrently
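
Roughly something like the following (an untested sketch; the index and table
names come from your \d output above, and users_unique_email_new is just a
placeholder name for the replacement index):

   -- Confirm the heap really holds two visible versions of the row by
   -- keeping the planner off the (suspect) indexes:
   SET enable_indexscan = off;
   SET enable_bitmapscan = off;
   SELECT ctid, xmin, id, email, last_activity FROM users WHERE id = 123;
   RESET enable_indexscan;
   RESET enable_bitmapscan;

   -- Rebuild the email index. CONCURRENTLY avoids holding a write lock for
   -- the whole build, at the cost of taking longer:
   CREATE UNIQUE INDEX CONCURRENTLY users_unique_email_new ON users (email);  -- placeholder name
   DROP INDEX users_unique_email;
   ALTER INDEX users_unique_email_new RENAME TO users_unique_email;

   -- The primary key index is simpler to rebuild in place (this takes an
   -- exclusive lock on the table for the duration):
   REINDEX INDEX users_pkey;

Note that if both row versions really are visible in the heap, the unique
index builds will fail with the same duplicate-key error you saw on restore
until the stale copy of the row is removed.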
>
> Chaz
>


--
Adrian Klaver
adrian.klaver@aklaver.com

