Thread: Adding PRIMARY KEY: Table contains duplicated values

Adding PRIMARY KEY: Table contains duplicated values

From
Alexander Farber
Date:
Hello,

when trying to add a forgotten primary key pair
to a PostgreSQL 8.4.13 table I get the error:

# \d pref_rep
                                       Table "public.pref_rep"
  Column   |            Type             |                         Modifiers
-----------+-----------------------------+-----------------------------------------------------------
 id        | character varying(32)       |
 author    | character varying(32)       |
 good      | boolean                     |
 fair      | boolean                     |
 nice      | boolean                     |
 about     | character varying(256)      |
 stamp     | timestamp without time zone | default now()
 author_ip | inet                        |
 rep_id    | integer                     | not null default
nextval('pref_rep_rep_id_seq'::regclass)
Check constraints:
    "pref_rep_check" CHECK (id::text <> author::text)
Foreign-key constraints:
    "pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES
pref_users(id) ON DELETE CASCADE
    "pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
DELETE CASCADE

# alter table pref_rep add primary key(id, author);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pref_rep_pkey" for table "pref_rep"
ERROR:  could not create unique index "pref_rep_pkey"
DETAIL:  Table contains duplicated values.

How could I find those duplicated pairs of id and author?

I've tried following, but this of course doesn't give me "pairs":

# select id, count(id) from pref_rep group by id order by count desc limit 5;
       id       | count
----------------+-------
 OK408547485023 |   706
 OK261593357402 |   582
 DE11198        |   561
 DE13041        |   560
 OK347613386893 |   556
(5 rows)

Thank you
Alex

P.S. I've also asked my question also at SO,
         hope it is okay to "crosspost" that way :-)

http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Adrian Klaver
Date:
On 02/04/2013 06:17 AM, Alexander Farber wrote:
> Hello,
>
> when trying to add a forgotten primary key pair
> to a PostgreSQL 8.4.13 table I get the error:
>
> # \d pref_rep
>                                         Table "public.pref_rep"
>    Column   |            Type             |                         Modifiers
> -----------+-----------------------------+-----------------------------------------------------------
>   id        | character varying(32)       |
>   author    | character varying(32)       |
>   good      | boolean                     |
>   fair      | boolean                     |
>   nice      | boolean                     |
>   about     | character varying(256)      |
>   stamp     | timestamp without time zone | default now()
>   author_ip | inet                        |
>   rep_id    | integer                     | not null default
> nextval('pref_rep_rep_id_seq'::regclass)
> Check constraints:
>      "pref_rep_check" CHECK (id::text <> author::text)
> Foreign-key constraints:
>      "pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES
> pref_users(id) ON DELETE CASCADE
>      "pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
> DELETE CASCADE
>
> # alter table pref_rep add primary key(id, author);
> NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
> "pref_rep_pkey" for table "pref_rep"
> ERROR:  could not create unique index "pref_rep_pkey"
> DETAIL:  Table contains duplicated values.
>
> How could I find those duplicated pairs of id and author?
>
> I've tried following, but this of course doesn't give me "pairs":
>
> # select id, count(id) from pref_rep group by id order by count desc limit 5;
>         id       | count
> ----------------+-------
>   OK408547485023 |   706
>   OK261593357402 |   582
>   DE11198        |   561
>   DE13041        |   560
>   OK347613386893 |   556
> (5 rows)


SELECT * FROM  (SELECT count(*) AS ct, id, author FROM pref_rep GROUP BY
id, author) AS dup WHERE dup.ct >1;

>
> Thank you
> Alex



--
Adrian Klaver
adrian.klaver@gmail.com

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Alexander Farber
Date:
Thank you -

On Mon, Feb 4, 2013 at 3:26 PM, Andrew Jaimes <andrewjaimes@hotmail.com> wrote:
> SELECT id, author, count(1)
>   FROM pref_rep
> GROUP BY id, author
> HAVING count(1) >1

>> From: alexander.farber@gmail.com
>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values

this has worked and has delivered me 190 records
(I still wonder how they could have happened,
because I only used a stored procedure
with UPDATE - if NOT FOUND - INSERT
Is it maybe pgbouncer's fault?):

           id           |         author         | count
------------------------+------------------------+-------
 DE10598                | OK495480409724         |     2
 DE12188                | MR17925810634439466500 |     3
 DE13529                | OK471161192902         |     2
 DE13963                | OK434087948702         |     2
 DE14037                | DE7692                 |     2
......
 VK45132921             | DE3544                 |     2
 VK6152782              | OK261593357402         |     2
 VK72883921             | OK506067284178         |     2
(190 rows)

And then I'm trying to construct a query which
would delete the older (the "stamp" column)
of such pairs - but this also doesn't work:

#  SELECT id, author, count(1), stamp
  FROM pref_rep
GROUP BY id, author, stamp
HAVING count(1) >1;
       id       |        author         | count |           stamp
----------------+-----------------------+-------+----------------------------
 OK14832267156  | OK419052078016        |     2 | 2012-04-11 12:54:02.980239
 OK333460361587 | VK151946174           |     2 | 2012-07-04 07:08:22.172663
 OK351109431016 | OK165881471481        |     2 | 2011-09-18 18:29:33.51362
 OK367507493096 | OK342027384470        |     5 | 2012-02-10 20:58:11.488184
 OK430882956135 | OK331014635822        |     2 | 2012-11-21 18:38:23.141298
 OK446355841129 | OK353460633855        |     2 | 2012-06-15 21:31:56.791688
 OK450700410618 | OK511055704249        |     2 | 2012-03-16 15:19:50.27776
 OK458979640673 | OK165881471481        |     2 | 2011-08-18 22:31:17.540112
 OK468333888972 | MR5100358507294433874 |     2 | 2012-12-05 14:16:15.870061
 OK485109177380 | DE12383               |     2 | 2011-09-16 16:00:38.625038
 OK505164304516 | OK165881471481        |     2 | 2012-03-24 13:54:27.968482
(11 rows)

Any suggestions please? Should I use a temp table here?

Thank you
Alex

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Alexander Farber
Date:
Trying to delete the older of the duplicated pairs:

# SELECT id, author, count(1), max(stamp) as maxx
  FROM pref_rep
GROUP BY id, author
HAVING count(1) >1 and stamp < maxx;
ERROR:  column "maxx" does not exist
LINE 4: HAVING count(1) >1 and stamp < maxx;
                                       ^

On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values
>

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Adrian Klaver
Date:
On 02/04/2013 06:45 AM, Alexander Farber wrote:
> Trying to delete the older of the duplicated pairs:
>
> # SELECT id, author, count(1), max(stamp) as maxx
>    FROM pref_rep
> GROUP BY id, author
> HAVING count(1) >1 and stamp < maxx;
> ERROR:  column "maxx" does not exist
> LINE 4: HAVING count(1) >1 and stamp < maxx;

How about:
SELECT id, author, count(1), max(stamp) as maxx
   FROM pref_rep
GROUP BY id, author
HAVING count(1) >1 and stamp < max(stamp);

>                                         ^
>
> On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber
> <alexander.farber@gmail.com> wrote:
>>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values
>>
>
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Alexander Farber
Date:
Unfortunately  that fails -

On Mon, Feb 4, 2013 at 3:55 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
> On 02/04/2013 06:45 AM, Alexander Farber wrote:
>>
>> Trying to delete the older of the duplicated pairs:
>>
>
> How about:
>
> SELECT id, author, count(1), max(stamp) as maxx
>   FROM pref_rep
> GROUP BY id, author
> HAVING count(1) >1 and stamp < max(stamp);

>> On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber
>> <alexander.farber@gmail.com> wrote:
>>>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values

# SELECT id, author, count(1), max(stamp) as maxx
pref->   FROM pref_rep
pref-> GROUP BY id, author
pref-> HAVING count(1) >1 and stamp < max(stamp);
ERROR:  column "pref_rep.stamp" must appear in the GROUP BY clause or
be used in an aggregate function
LINE 4: HAVING count(1) >1 and stamp < max(stamp);

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Adrian Klaver
Date:
On 02/04/2013 06:45 AM, Alexander Farber wrote:
> Trying to delete the older of the duplicated pairs:
>
> # SELECT id, author, count(1), max(stamp) as maxx
>    FROM pref_rep
> GROUP BY id, author
> HAVING count(1) >1 and stamp < maxx;
> ERROR:  column "maxx" does not exist
> LINE 4: HAVING count(1) >1 and stamp < maxx;
>                                         ^

Caffeine has not reached critical mass yet, so test before using:

  SELECT *  FROM pref_rep JOIN (SELECT id, author, count(1) AS ct,
max(stamp) AS maxx FROM pref_rep GROUP BY id,author) max_time ON
max_time.id=pref_rep.id WHERE ct > 1 AND stamp < maxx;
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Andreas Kretschmer
Date:
Alexander Farber <alexander.farber@gmail.com> wrote:

> # alter table pref_rep add primary key(id, author);
> NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
> "pref_rep_pkey" for table "pref_rep"
> ERROR:  could not create unique index "pref_rep_pkey"
> DETAIL:  Table contains duplicated values.
>
> How could I find those duplicated pairs of id and author?

similar example:

test=*# select * from foo;
 id1 | id2
-----+-----
   1 |   1
   1 |   2
   1 |   3
   2 |   1
   2 |   2
   2 |   3
   1 |   2
   3 |   1
   3 |   2
   3 |   3
   3 |   1
(11 rows)

Time: 0,151 ms
test=*# alter table foo add primary key (id1,id2);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
ERROR:  could not create unique index "foo_pkey"
DETAIL:  Key (id1, id2)=(1, 2) is duplicated.
Time: 1,394 ms
test=*# select id1, id2, count(*) as c from foo group by id1, id2 having count(*) > 1;
 id1 | id2 | c
-----+-----+---
   3 |   1 | 2
   1 |   2 | 2
(2 rows)

Time: 0,331 ms


HTH.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Andreas Kretschmer
Date:
Andreas Kretschmer <akretschmer@spamfence.net> wrote:

> Alexander Farber <alexander.farber@gmail.com> wrote:
>
> > # alter table pref_rep add primary key(id, author);
> > NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
> > "pref_rep_pkey" for table "pref_rep"
> > ERROR:  could not create unique index "pref_rep_pkey"
> > DETAIL:  Table contains duplicated values.
> >
> > How could I find those duplicated pairs of id and author?
>
> similar example:
>
> test=*# select * from foo;
>  id1 | id2
> -----+-----
>    1 |   1
>    1 |   2
>    1 |   3
>    2 |   1
>    2 |   2
>    2 |   3
>    1 |   2
>    3 |   1
>    3 |   2
>    3 |   3
>    3 |   1
> (11 rows)
>
> Time: 0,151 ms
> test=*# alter table foo add primary key (id1,id2);
> NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
> ERROR:  could not create unique index "foo_pkey"
> DETAIL:  Key (id1, id2)=(1, 2) is duplicated.
> Time: 1,394 ms
> test=*# select id1, id2, count(*) as c from foo group by id1, id2 having count(*) > 1;
>  id1 | id2 | c
> -----+-----+---
>    3 |   1 | 2
>    1 |   2 | 2
> (2 rows)
>
> Time: 0,331 ms

If your next question is 'how to delete ...', my answer:
(yeah, reading SO ;-) )


test=*# select ctid,* from foo;
  ctid  | id1 | id2
--------+-----+-----
 (0,1)  |   1 |   1
 (0,2)  |   1 |   2
 (0,3)  |   1 |   3
 (0,4)  |   2 |   1
 (0,5)  |   2 |   2
 (0,6)  |   2 |   3
 (0,7)  |   1 |   2
 (0,8)  |   3 |   1
 (0,9)  |   3 |   2
 (0,10) |   3 |   3
 (0,11) |   3 |   1
(11 rows)

Time: 0,170 ms
test=*# delete from foo where ctid in (select min(ctid) from foo where (id1,id2) in (select id1, id2 from foo group by
id1,id2 having count(*) > 1) group by id1,id2); 
DELETE 2
Time: 0,559 ms
test=*# select ctid,* from foo;
  ctid  | id1 | id2
--------+-----+-----
 (0,1)  |   1 |   1
 (0,3)  |   1 |   3
 (0,4)  |   2 |   1
 (0,5)  |   2 |   2
 (0,6)  |   2 |   3
 (0,7)  |   1 |   2
 (0,9)  |   3 |   2
 (0,10) |   3 |   3
 (0,11) |   3 |   1
(9 rows)





Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Alexander Farber
Date:
Thank you -

On Mon, Feb 4, 2013 at 5:06 PM, Andreas Kretschmer
<akretschmer@spamfence.net> wrote:
>> # alter table pref_rep add primary key(id, author);
>> NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
>> "pref_rep_pkey" for table "pref_rep"
>> ERROR:  could not create unique index "pref_rep_pkey"
>> DETAIL:  Table contains duplicated values.

how do you get this DETAIL, is it a setting for psql prompt?

I've got a nice answer for my question at Stackoverflow:

DELETE FROM pref_rep p USING (
  SELECT id, author, max(stamp) stamp
    FROM pref_rep
   GROUP BY id, author
  HAVING count(1) > 1) AS f
WHERE p.id=f.id AND p.author=f.author AND p.stamp<f.stamp;

and learnt about SQL Fiddle too -
http://sqlfiddle.com/#!11/59fbc/11

Regards
Alex

Re: Adding PRIMARY KEY: Table contains duplicated values

From
Andreas Kretschmer
Date:
Alexander Farber <alexander.farber@gmail.com> wrote:

> Thank you -
>
> On Mon, Feb 4, 2013 at 5:06 PM, Andreas Kretschmer
> <akretschmer@spamfence.net> wrote:
> >> # alter table pref_rep add primary key(id, author);
> >> NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
> >> "pref_rep_pkey" for table "pref_rep"
> >> ERROR:  could not create unique index "pref_rep_pkey"
> >> DETAIL:  Table contains duplicated values.
>
> how do you get this DETAIL, is it a setting for psql prompt?

You means the 'DETAIL:  Key (id1, id2)=(1, 2) is duplicated.'?

I'm using 9.2, i think, that's the reason.



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°