Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint - Mailing list pgsql-hackers

From Travis Cross
Subject Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint
Date
Msg-id 44854EB6.8020606@crosswirecorp.com
Whole thread Raw
Responses Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint  (Kenneth Marshall <ktm@it.is.rice.edu>)
List pgsql-hackers
I have a table that I am using to store email token data for DSPAM.

I'm noticing that a handful (4-16) of rows with duplicate columns
(uid,token) are sneaking into the table every day despite the
primary key constraint.

The server currently processes a few thousand emails per day, and
this particular table currently has about 4.5 million rows in it.

I feel as though I must be missing something here, because I have
always strongly assumed that postgresql prevents this sort of chaos
from happening by default.

When these duplicate pairs make their way into the table, all havoc
breaks loose ;)  The rows with the duplicate pairs seem to become
land mines.  The postgresql process handling a query that
subsequently runs across one of these rows dies, taking down
the DSPAM daemon with it, and sometimes corrupting the postgresql
shared memory enough that the postmaster has to shutdown the other
processes and restart everything anew [1].

I am usually able to clean things up by running the following, but
once or twice I've had to drop the unique constraint before
postgresql would process the request without choking:

delete from dspam_token_data
    where row(uid,token) in
            (select uid,token
              from dspam_token_data
              group by uid,token
              having count(*) > 1);

(I don't worry about preserving one of the duplicates here.)

I'm running postgresql-8.1.3.  Here is the table in question:

CREATE TABLE dspam.dspam_token_data
(
    uid int4 NOT NULL,
    token int8 NOT NULL,
    spam_hits int4,
    innocent_hits int4,
    last_hit date,
    CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token)
)
WITHOUT OIDS;

[2]

What steps can I take to fix this?  Is there any additional
information I can provide?

I've cleaned the table many times now.  I then run VACUUM ANALYZE.

My next step will probably be to hack the DSPAM sources to make the
application more careful about not trying to insert rows that would
violate the unique constraint.  Even still, it seems that would only
reduce the frequency of these occurrences, not eliminate them
completely.

Thanks!

Cheers,

-- Travis


----
Notes:
[1] A condensed log file showing off exactly what happens here is
attached.

[2]  Previously, the table lacked a primary key and instead used a
unique constraint and index.  This yielded the exact same results I
am currently seeing using a two-column primary key, as above.  The
old table schema was:

CREATE TABLE dspam.dspam_token_data
(
    uid int4,
    token int8,
    spam_hits int4,
    innocent_hits int4,
    last_hit date,
    CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token)
)
WITHOUT OIDS;

Jun  5 17:58:14 shadow postgres[28775]: [21-1] PANIC:  right sibling's left-link doesn't match
Jun  5 17:58:14 shadow postgres[28775]: [21-2] STATEMENT:  PREPARE dspam_update_plan (bigint) AS UPDATE
dspam_token_dataSET last_hit = CURRENT_DATE, innocent_hits = 
Jun  5 17:58:14 shadow postgres[28775]: [21-3]  innocent_hits + 1 WHERE uid = '1' AND token = $1;PREPARE
dspam_insert_plan(bigint, int, int) AS INSERT INTO dspam_token_data 
Jun  5 17:58:14 shadow postgres[28775]: [21-4]  (uid, token, spam_hits, innocent_hits, last_hit) VALUES (1, $1, $2, $3,
CURRENT_DATE);
Jun  5 17:58:14 shadow dspam[7780]: query error: (null): see sql.errors for more details
Jun  5 17:58:14 shadow postgres[10566]: [21-1] LOG:  server process (PID 28775) was terminated by signal 6
Jun  5 17:58:14 shadow postgres[10566]: [22-1] LOG:  terminating any other active server processes
Jun  5 17:58:14 shadow postgres[7219]: [21-1] WARNING:  terminating connection because of crash of another server
process
Jun  5 17:58:14 shadow postgres[7219]: [21-2] DETAIL:  The postmaster has commanded this server process to roll back
thecurrent transaction and exit, because another server 
Jun  5 17:58:14 shadow postgres[7219]: [21-3]  process exited abnormally and possibly corrupted shared memory.
....
Jun  5 17:58:14 shadow postgres[7525]: [24-1] LOG:  database system was interrupted at 2006-06-05 17:58:01 UTC
Jun  5 17:58:14 shadow postgres[7525]: [25-1] LOG:  checkpoint record is at 16/DA1D6868
Jun  5 17:58:14 shadow postgres[7525]: [26-1] LOG:  redo record is at 16/DA0081D8; undo record is at 0/0; shutdown
FALSE
Jun  5 17:58:14 shadow postgres[7525]: [27-1] LOG:  next transaction ID: 5691748; next OID: 148615
Jun  5 17:58:14 shadow postgres[7525]: [28-1] LOG:  next MultiXactId: 4; next MultiXactOffset: 10
Jun  5 17:58:14 shadow postgres[7525]: [29-1] LOG:  database system was not properly shut down; automatic recovery in
progress
Jun  5 17:58:14 shadow postgres[7525]: [30-1] LOG:  redo starts at 16/DA0081D8
Jun  5 17:58:14 shadow postgres[7525]: [31-1] LOG:  record with zero length at 16/DC2858F8
Jun  5 17:58:14 shadow postgres[7525]: [32-1] LOG:  redo done at 16/DC2858C8
Jun  5 17:58:16 shadow postgres[7542]: [24-1] FATAL:  the database system is starting up
Jun  5 17:58:16 shadow postgres[7543]: [24-1] FATAL:  the database system is starting up
Jun  5 17:58:24 shadow postgres[7525]: [33-1] LOG:  database system is ready
Jun  5 17:58:24 shadow postgres[7525]: [34-1] LOG:  transaction ID wrap limit is 1078513827, limited by database
"xmail"
Jun  5 17:59:24 shadow postgres[7632]: [24-1] LOG:  unexpected EOF on client connection
Jun  5 17:59:24 shadow postgres[7768]: [24-1] LOG:  unexpected EOF on client connection
Jun  5 17:59:37 shadow postgres[7836]: [24-1] WARNING:  there is no transaction in progress


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Why do we want to %Remove behavior of postmaster -o
Next
From: Joachim Wieland
Date:
Subject: AIX check in datetime.h