Thread: Adding PRIMARY KEY: Table contains duplicated values
Hello, when trying to add a forgotten primary key pair to a PostgreSQL 8.4.13 table I get the error: # \d pref_rep Table "public.pref_rep" Column | Type | Modifiers -----------+-----------------------------+----------------------------------------------------------- id | character varying(32) | author | character varying(32) | good | boolean | fair | boolean | nice | boolean | about | character varying(256) | stamp | timestamp without time zone | default now() author_ip | inet | rep_id | integer | not null default nextval('pref_rep_rep_id_seq'::regclass) Check constraints: "pref_rep_check" CHECK (id::text <> author::text) Foreign-key constraints: "pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES pref_users(id) ON DELETE CASCADE "pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON DELETE CASCADE # alter table pref_rep add primary key(id, author); NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "pref_rep_pkey" for table "pref_rep" ERROR: could not create unique index "pref_rep_pkey" DETAIL: Table contains duplicated values. How could I find those duplicated pairs of id and author? I've tried following, but this of course doesn't give me "pairs": # select id, count(id) from pref_rep group by id order by count desc limit 5; id | count ----------------+------- OK408547485023 | 706 OK261593357402 | 582 DE11198 | 561 DE13041 | 560 OK347613386893 | 556 (5 rows) Thank you Alex P.S. I've also asked my question also at SO, hope it is okay to "crosspost" that way :-) http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values
On 02/04/2013 06:17 AM, Alexander Farber wrote: > Hello, > > when trying to add a forgotten primary key pair > to a PostgreSQL 8.4.13 table I get the error: > > # \d pref_rep > Table "public.pref_rep" > Column | Type | Modifiers > -----------+-----------------------------+----------------------------------------------------------- > id | character varying(32) | > author | character varying(32) | > good | boolean | > fair | boolean | > nice | boolean | > about | character varying(256) | > stamp | timestamp without time zone | default now() > author_ip | inet | > rep_id | integer | not null default > nextval('pref_rep_rep_id_seq'::regclass) > Check constraints: > "pref_rep_check" CHECK (id::text <> author::text) > Foreign-key constraints: > "pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES > pref_users(id) ON DELETE CASCADE > "pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON > DELETE CASCADE > > # alter table pref_rep add primary key(id, author); > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > "pref_rep_pkey" for table "pref_rep" > ERROR: could not create unique index "pref_rep_pkey" > DETAIL: Table contains duplicated values. > > How could I find those duplicated pairs of id and author? > > I've tried following, but this of course doesn't give me "pairs": > > # select id, count(id) from pref_rep group by id order by count desc limit 5; > id | count > ----------------+------- > OK408547485023 | 706 > OK261593357402 | 582 > DE11198 | 561 > DE13041 | 560 > OK347613386893 | 556 > (5 rows) SELECT * FROM (SELECT count(*) AS ct, id, author FROM pref_rep GROUP BY id, author) AS dup WHERE dup.ct >1; > > Thank you > Alex -- Adrian Klaver adrian.klaver@gmail.com
Thank you - On Mon, Feb 4, 2013 at 3:26 PM, Andrew Jaimes <andrewjaimes@hotmail.com> wrote: > SELECT id, author, count(1) > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 >> From: alexander.farber@gmail.com >> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values this has worked and has delivered me 190 records (I still wonder how they could have happened, because I only used a stored procedure with UPDATE - if NOT FOUND - INSERT Is it maybe pgbouncer's fault?): id | author | count ------------------------+------------------------+------- DE10598 | OK495480409724 | 2 DE12188 | MR17925810634439466500 | 3 DE13529 | OK471161192902 | 2 DE13963 | OK434087948702 | 2 DE14037 | DE7692 | 2 ...... VK45132921 | DE3544 | 2 VK6152782 | OK261593357402 | 2 VK72883921 | OK506067284178 | 2 (190 rows) And then I'm trying to construct a query which would delete the older (the "stamp" column) of such pairs - but this also doesn't work: # SELECT id, author, count(1), stamp FROM pref_rep GROUP BY id, author, stamp HAVING count(1) >1; id | author | count | stamp ----------------+-----------------------+-------+---------------------------- OK14832267156 | OK419052078016 | 2 | 2012-04-11 12:54:02.980239 OK333460361587 | VK151946174 | 2 | 2012-07-04 07:08:22.172663 OK351109431016 | OK165881471481 | 2 | 2011-09-18 18:29:33.51362 OK367507493096 | OK342027384470 | 5 | 2012-02-10 20:58:11.488184 OK430882956135 | OK331014635822 | 2 | 2012-11-21 18:38:23.141298 OK446355841129 | OK353460633855 | 2 | 2012-06-15 21:31:56.791688 OK450700410618 | OK511055704249 | 2 | 2012-03-16 15:19:50.27776 OK458979640673 | OK165881471481 | 2 | 2011-08-18 22:31:17.540112 OK468333888972 | MR5100358507294433874 | 2 | 2012-12-05 14:16:15.870061 OK485109177380 | DE12383 | 2 | 2011-09-16 16:00:38.625038 OK505164304516 | OK165881471481 | 2 | 2012-03-24 13:54:27.968482 (11 rows) Any suggestions please? Should I use a temp table here? Thank you Alex
Trying to delete the older of the duplicated pairs: # SELECT id, author, count(1), max(stamp) as maxx FROM pref_rep GROUP BY id, author HAVING count(1) >1 and stamp < maxx; ERROR: column "maxx" does not exist LINE 4: HAVING count(1) >1 and stamp < maxx; ^ On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber <alexander.farber@gmail.com> wrote: >>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values >
On 02/04/2013 06:45 AM, Alexander Farber wrote: > Trying to delete the older of the duplicated pairs: > > # SELECT id, author, count(1), max(stamp) as maxx > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 and stamp < maxx; > ERROR: column "maxx" does not exist > LINE 4: HAVING count(1) >1 and stamp < maxx; How about: SELECT id, author, count(1), max(stamp) as maxx FROM pref_rep GROUP BY id, author HAVING count(1) >1 and stamp < max(stamp); > ^ > > On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber > <alexander.farber@gmail.com> wrote: >>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values >> > > -- Adrian Klaver adrian.klaver@gmail.com
Unfortunately that fails - On Mon, Feb 4, 2013 at 3:55 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On 02/04/2013 06:45 AM, Alexander Farber wrote: >> >> Trying to delete the older of the duplicated pairs: >> > > How about: > > SELECT id, author, count(1), max(stamp) as maxx > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 and stamp < max(stamp); >> On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber >> <alexander.farber@gmail.com> wrote: >>>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values # SELECT id, author, count(1), max(stamp) as maxx pref-> FROM pref_rep pref-> GROUP BY id, author pref-> HAVING count(1) >1 and stamp < max(stamp); ERROR: column "pref_rep.stamp" must appear in the GROUP BY clause or be used in an aggregate function LINE 4: HAVING count(1) >1 and stamp < max(stamp);
On 02/04/2013 06:45 AM, Alexander Farber wrote: > Trying to delete the older of the duplicated pairs: > > # SELECT id, author, count(1), max(stamp) as maxx > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 and stamp < maxx; > ERROR: column "maxx" does not exist > LINE 4: HAVING count(1) >1 and stamp < maxx; > ^ Caffeine has not reached critical mass yet, so test before using: SELECT * FROM pref_rep JOIN (SELECT id, author, count(1) AS ct, max(stamp) AS maxx FROM pref_rep GROUP BY id,author) max_time ON max_time.id=pref_rep.id WHERE ct > 1 AND stamp < maxx; > -- Adrian Klaver adrian.klaver@gmail.com
Alexander Farber <alexander.farber@gmail.com> wrote: > # alter table pref_rep add primary key(id, author); > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > "pref_rep_pkey" for table "pref_rep" > ERROR: could not create unique index "pref_rep_pkey" > DETAIL: Table contains duplicated values. > > How could I find those duplicated pairs of id and author? similar example: test=*# select * from foo; id1 | id2 -----+----- 1 | 1 1 | 2 1 | 3 2 | 1 2 | 2 2 | 3 1 | 2 3 | 1 3 | 2 3 | 3 3 | 1 (11 rows) Time: 0,151 ms test=*# alter table foo add primary key (id1,id2); NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo" ERROR: could not create unique index "foo_pkey" DETAIL: Key (id1, id2)=(1, 2) is duplicated. Time: 1,394 ms test=*# select id1, id2, count(*) as c from foo group by id1, id2 having count(*) > 1; id1 | id2 | c -----+-----+--- 3 | 1 | 2 1 | 2 | 2 (2 rows) Time: 0,331 ms HTH. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Andreas Kretschmer <akretschmer@spamfence.net> wrote: > Alexander Farber <alexander.farber@gmail.com> wrote: > > > # alter table pref_rep add primary key(id, author); > > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > > "pref_rep_pkey" for table "pref_rep" > > ERROR: could not create unique index "pref_rep_pkey" > > DETAIL: Table contains duplicated values. > > > > How could I find those duplicated pairs of id and author? > > similar example: > > test=*# select * from foo; > id1 | id2 > -----+----- > 1 | 1 > 1 | 2 > 1 | 3 > 2 | 1 > 2 | 2 > 2 | 3 > 1 | 2 > 3 | 1 > 3 | 2 > 3 | 3 > 3 | 1 > (11 rows) > > Time: 0,151 ms > test=*# alter table foo add primary key (id1,id2); > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo" > ERROR: could not create unique index "foo_pkey" > DETAIL: Key (id1, id2)=(1, 2) is duplicated. > Time: 1,394 ms > test=*# select id1, id2, count(*) as c from foo group by id1, id2 having count(*) > 1; > id1 | id2 | c > -----+-----+--- > 3 | 1 | 2 > 1 | 2 | 2 > (2 rows) > > Time: 0,331 ms If your next question is 'how to delete ...', my answer: (yeah, reading SO ;-) ) test=*# select ctid,* from foo; ctid | id1 | id2 --------+-----+----- (0,1) | 1 | 1 (0,2) | 1 | 2 (0,3) | 1 | 3 (0,4) | 2 | 1 (0,5) | 2 | 2 (0,6) | 2 | 3 (0,7) | 1 | 2 (0,8) | 3 | 1 (0,9) | 3 | 2 (0,10) | 3 | 3 (0,11) | 3 | 1 (11 rows) Time: 0,170 ms test=*# delete from foo where ctid in (select min(ctid) from foo where (id1,id2) in (select id1, id2 from foo group by id1,id2 having count(*) > 1) group by id1,id2); DELETE 2 Time: 0,559 ms test=*# select ctid,* from foo; ctid | id1 | id2 --------+-----+----- (0,1) | 1 | 1 (0,3) | 1 | 3 (0,4) | 2 | 1 (0,5) | 2 | 2 (0,6) | 2 | 3 (0,7) | 1 | 2 (0,9) | 3 | 2 (0,10) | 3 | 3 (0,11) | 3 | 1 (9 rows) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Thank you - On Mon, Feb 4, 2013 at 5:06 PM, Andreas Kretschmer <akretschmer@spamfence.net> wrote: >> # alter table pref_rep add primary key(id, author); >> NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index >> "pref_rep_pkey" for table "pref_rep" >> ERROR: could not create unique index "pref_rep_pkey" >> DETAIL: Table contains duplicated values. how do you get this DETAIL, is it a setting for psql prompt? I've got a nice answer for my question at Stackoverflow: DELETE FROM pref_rep p USING ( SELECT id, author, max(stamp) stamp FROM pref_rep GROUP BY id, author HAVING count(1) > 1) AS f WHERE p.id=f.id AND p.author=f.author AND p.stamp<f.stamp; and learnt about SQL Fiddle too - http://sqlfiddle.com/#!11/59fbc/11 Regards Alex
Alexander Farber <alexander.farber@gmail.com> wrote: > Thank you - > > On Mon, Feb 4, 2013 at 5:06 PM, Andreas Kretschmer > <akretschmer@spamfence.net> wrote: > >> # alter table pref_rep add primary key(id, author); > >> NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > >> "pref_rep_pkey" for table "pref_rep" > >> ERROR: could not create unique index "pref_rep_pkey" > >> DETAIL: Table contains duplicated values. > > how do you get this DETAIL, is it a setting for psql prompt? You means the 'DETAIL: Key (id1, id2)=(1, 2) is duplicated.'? I'm using 9.2, i think, that's the reason. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°