Violation of primary key constraint - Mailing list pgsql-bugs

From Toby Murray
Subject Violation of primary key constraint
Date
Msg-id CAJeqKgveXGDDNLo6PEShgf+TrdZ2JOXmjfg91CEE+TPG-963_Q@mail.gmail.com
Whole thread Raw
Responses Re: Violation of primary key constraint
List pgsql-bugs
I just had some interaction with RhodiumToad on IRC about a duplicated
primary key problem I ran into today. After some poking around he
suggested that I send this to -bugs since it seems like an interesting
error.

The short version is that I have a table ("ways") which has a primary
key ("id") that has managed to get 4 of its records duplicated (out of
~168 million):
select id, version, tstamp from ways where id in
(26245218,26709186,26709284,26926573) order by id, version;
    id    | version |       tstamp
----------+---------+---------------------
 26245218 |      12 | 2011-02-06 06:54:10
 26245218 |      13 | 2013-01-28 02:37:56
 26709186 |       4 | 2008-09-02 04:39:21
 26709186 |       4 | 2008-09-02 04:39:21
 26709284 |       4 | 2008-10-26 14:06:03
 26709284 |       5 | 2013-01-28 02:38:30
 26926573 |       4 | 2009-12-27 07:13:28
 26926573 |       4 | 2009-12-27 07:13:28

version and tstamp are just additional data to show that two of the
entries have identical values while two have different values. More on
that in a bit.

Here is the table definition:
    Column    |            Type             | Modifiers | Storage
--------------+-----------------------------+-----------+----------
 id           | bigint                      | not null  | plain
 version      | integer                     | not null  | plain
 user_id      | integer                     | not null  | plain
 tstamp       | timestamp without time zone | not null  | plain
 changeset_id | bigint                      | not null  | plain
 tags         | hstore                      |           | extended
 nodes        | bigint[]                    |           | extended
 linestring   | geometry(Geometry,4326)     |           | main
Indexes:
    "pk_ways" PRIMARY KEY, btree (id)
    "idx_ways_linestring" gist (linestring), tablespace "ssd"
    "idx_ways_tags" gin (tags)
Has OIDs: no
Tablespace: "ssd"

If you don't read your email with a monospaced font, here is a pastie'd version:
http://pastie.org/5979660


Some background:
I am running a postgis database filled with OpenStreetMap data. The
database was initially populated with data from the "planet" file in
early January. The process involves creating a set of data files and
then loading them into the database using \copy. The basic load
process can be seen in this file:
https://github.com/openstreetmap/osmosis/blob/master/package/script/pgsnapshot_load_0.6.sql

The file does have some sections that are optional, depending on your
setup so I did not execute everything in there. Taking out only the
statements that affect the problematic ways table:
ALTER TABLE ways DROP CONSTRAINT pk_ways;
DROP INDEX idx_ways_linestring;
\copy ways FROM 'ways.txt'
ALTER TABLE ONLY ways ADD CONSTRAINT pk_ways PRIMARY KEY (id);
CREATE INDEX idx_ways_linestring ON ways USING gist (linestring);
plus an index creation on the hstore "tags" column that isn't in that file.


After the copy and index building was done, I set it up to update from
the OpenStreetMap minutely diff files. This means that every minute,
updates are being applied to the database. This has been working fine
for about 3 weeks until this morning when the minutely update crashed
with a duplicate key error:

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key
value violates unique constraint "pk_ways"
  Detail: Key (id)=(26926573) already exists.

Knowing how the update software works, my first response was "that's
impossible" since it actually checks for the existence of an object in
the database and if it exists, it issues an UPDATE query. Otherwise it
does an INSERT. Of course the problem is actually that there are
already two copies of this object in the database. So the update
software finds an existing record and issues an UPDATE query which
fails with the duplicate key error.

As I said, I did the initial import at the beginning of January. So if
you look back up at the data you will see that at least all the
earlier versions of the objects have a timestamp long before the
import. My first intuition was that this meant that there were
duplicates in the ways.txt file that somehow went undetected when I
added the PK constraint. While I don't have the original files any
more I do still have the planet file and the same software that
generated the files so I just ran that again and I do not see any
duplicates in the file. This leaves me completely clueless as to how
the duplicates got into the database. Especially the one with the ID
26709186 since it hasn't been changed in OpenStreetMap in years so
there is no reason for it to have been touched in any way since the
import.

But looking at the timestamps again, it looks like an UPDATE query may
have successfully executed on two of the duplicates during minutely
diff application and updated one of the records on ways 26245218 and
26709284. The timestamp is when this object was last touched in
OpenStreetMap. These two ways were changed within seconds of each
other although they don't seem to share anything. Edited by different
users in different locations.

Here are some queries and their results that RhodiumToad had me run to
try and track things down:

select ctid,xmin,xmax,id from ways
where id in (26926573,26709284,26709186,26245218) order by id;
     ctid      |  xmin  |  xmax  |    id
---------------+--------+--------+----------
 (11249625,3)  | 979685 |      0 | 26245218
 (1501614,5)   | 955993 |      0 | 26245218
 (11247884,18) | 979686 |      0 | 26709186
 (1520052,19)  | 955993 |      0 | 26709186
 (1520056,15)  | 976128 |      0 | 26709284
 (11249780,8)  | 979686 |      0 | 26709284
 (1528888,14)  | 955993 |      0 | 26926573
 (11249622,6)  | 979685 | 982467 | 26926573

( http://pastie.org/5979497 )

The last one there seems to be showing the transaction that failed and
was rolled back earlier this morning.



select * from heap_page_items(get_raw_page('ways',11249622)) where lp=6;
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |
t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid

----+--------+----------+--------+--------+--------+----------+--------------+-------------+------------+--------+--------+-------
  6 |   7888 |        1 |    304 | 979685 | 982467 |      143 |
(11399071,5) |           8 |      10498 |     24 |        |

( http://pastie.org/pastes/5979566/text )


select i, (x).* from (select i, bt_page_items('pk_ways',i::integer) as x
from generate_series(1, pg_relation_size('pk_ways')/8192 - 1) i offset 0) s
where (x).ctid in

('(11249625,3)','(1501614,5)','(11247884,18)','(1520052,19)','(1520056,15)','(11249780,8)','(1528888,14)','(11249622,6)');
   i    | itemoffset |     ctid      | itemlen | nulls | vars |          data
--------+------------+---------------+---------+-------+------+-------------------------
  50039 |        181 | (11249625,3)  |      16 | f     | f    | 62 78
90 01 00 00 00 00
  50039 |        183 | (1501614,5)   |      16 | f     | f    | 62 78
90 01 00 00 00 00
 506579 |        214 | (11249622,6)  |      16 | f     | f    | ed dd
9a 01 00 00 00 00
 506579 |        219 | (1528888,14)  |      16 | f     | f    | ed dd
9a 01 00 00 00 00
 535456 |          4 | (11247884,18) |      16 | f     | f    | c2 8c
97 01 00 00 00 00
 535456 |          5 | (1520052,19)  |      16 | f     | f    | c2 8c
97 01 00 00 00 00
 535456 |         77 | (11249780,8)  |      16 | f     | f    | 24 8d
97 01 00 00 00 00
 535456 |         80 | (1520056,15)  |      16 | f     | f    | 24 8d
97 01 00 00 00 00

( http://pastie.org/5980159 )


Additional system information:
OS: Linux Mint 14 (based on Ubuntu 12.10)
Postgresql version: PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu,
compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit
Postgis version: 2.0.1 r9979
The database is running on a 4 disk software RAID5 (yeah... not built
for speed) plus one SSD which holds the ways table is along with its
linestring index (but not the PK index)

And one more thing I just realized: I did turn off fsync for the
import, intending to turn it back on for normal usage but it looks
like I never turned it back on. However I haven't had any ungraceful
shutdowns since the import so I don't think this should be a factor.
The nature of the database is that I can rebuild it at any time from
the source (although it does take several days) so I'm not overly
vigilant about this.

I have left the database in its current state for now so I can run
some more checks if needed.

Toby

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #7838: pg_dump major bug
Next
From: digoal@126.com
Date:
Subject: BUG #7840: PostgreSQL 9.3 devel select for no key share lock bug?