Thread: how could duplicate pkey exist in psql?

how could duplicate pkey exist in psql?

From
Yan Chunlu
Date:
recently I have found several tables has exactly the same pkey,  here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)


the data is like this:

   159292 | funnypics_link_point       | 41                                                                                                                     | num
   159292 | funnypics_link_point       | 40                                                                                                                     | num


I could not even update this record.

really confused about how could this happen... thanks!

Re: how could duplicate pkey exist in psql?

From
"Albe Laurenz"
Date:
Yan Chunlu wrote:
> recently I have found several tables has exactly the same pkey,  here
is the definition:
> "diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)
>
>
> the data is like this:
>
>    159292 | funnypics_link_point       | 41
> | num
>    159292 | funnypics_link_point       | 40
> | num
>
>
> I could not even update this record.

It could be a software bug or something else, but did you ever
restore the database after a storage problem?

This has happened to me once:
http://archives.postgresql.org/pgsql-general/2010-02/msg00971.php

Yours,
Laurenz Albe

Re: how could duplicate pkey exist in psql?

From
Szymon Guz
Date:


On 17 November 2011 06:19, Yan Chunlu <springrider@gmail.com> wrote:
recently I have found several tables has exactly the same pkey,  here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)


the data is like this:

   159292 | funnypics_link_point       | 41                                                                                                                     | num
   159292 | funnypics_link_point       | 40                                                                                                                     | num


I could not even update this record.

really confused about how could this happen... thanks!

Hi,
could you send us result of the query: 
select thing_id, '|'||key||'|' from table? 
Maybe there are some more spaces in the key column which were hidden by table alignment in the client?

regards
Szymon

Re: how could duplicate pkey exist in psql?

From
Edson Richter
Date:
Em 17-11-2011 03:19, Yan Chunlu escreveu:
> recently I have found several tables has exactly the same pkey,  here
> is the definition:
> "diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)
>
>
> the data is like this:
>
>    159292 | funnypics_link_point       | 41
>
>                     | num
>    159292 | funnypics_link_point       | 40
>
>                     | num
>
>
> I could not even update this record.
>
> really confused about how could this happen... thanks!

I know one scenario this can happen on Linux. In my case, it was caused
by a "rsync"... instead copy to a different location, script was copying
pg_xlog over own pg_xlog.

I did this stupidity once, and learned for a life time. Lost two hours
of work to recover everything (from backup, at least I had one).

Be careful with rsync and cp, since Linux does not block files from
being overwriten even when they are in use.


Regards,

Edson.


Re: how could duplicate pkey exist in psql?

From
Yan Chunlu
Date:
seems they are identical:
   159292 | |funnypicscn_link_karma|
   159292 | |funnypicscn_link_karma|

On Thu, Nov 17, 2011 at 4:07 PM, Szymon Guz <mabewlun@gmail.com> wrote:


On 17 November 2011 06:19, Yan Chunlu <springrider@gmail.com> wrote:
recently I have found several tables has exactly the same pkey,  here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)


the data is like this:

   159292 | funnypics_link_point       | 41                                                                                                                     | num
   159292 | funnypics_link_point       | 40                                                                                                                     | num


I could not even update this record.

really confused about how could this happen... thanks!

Hi,
could you send us result of the query: 
select thing_id, '|'||key||'|' from table? 
Maybe there are some more spaces in the key column which were hidden by table alignment in the client?

regards
Szymon

Re: how could duplicate pkey exist in psql?

From
Yan Chunlu
Date:
 I am using pgpool's replication feature, it does copy pg_xlog from one server to another, was that possible cause of the problem?

thanks for the help!

On Thu, Nov 17, 2011 at 5:38 PM, Edson Richter <richter@simkorp.com.br> wrote:

Em 17-11-2011 03:19, Yan Chunlu escreveu:

recently I have found several tables has exactly the same pkey,  here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)


the data is like this:

  159292 | funnypics_link_point       | 41                                                                                                                     | num
  159292 | funnypics_link_point       | 40                                                                                                                     | num


I could not even update this record.

really confused about how could this happen... thanks!

I know one scenario this can happen on Linux. In my case, it was caused by a "rsync"... instead copy to a different location, script was copying pg_xlog over own pg_xlog.

I did this stupidity once, and learned for a life time. Lost two hours of work to recover everything (from backup, at least I had one).

Be careful with rsync and cp, since Linux does not block files from being overwriten even when they are in use.


Regards,

Edson.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: how could duplicate pkey exist in psql?

From
hubert depesz lubaczewski
Date:
On Thu, Nov 17, 2011 at 01:19:30PM +0800, Yan Chunlu wrote:
> recently I have found several tables has exactly the same pkey,  here is
> the definition:
> "diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)

please check:
select thing_id, key, count(*) from diggcontent_data_account group by 1,2 having
count(*) > 1;

this will show if you have really duplicated values.

if you have - the index ( diggcontent_data_account_pkey ) is broken.
Exact reason can vary, any chance this database is hot-backup restored
from different system?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: how could duplicate pkey exist in psql?

From
Edson Richter
Date:
Em 17-11-2011 09:21, Yan Chunlu escreveu:
I am using pgpool's replication feature, it does copy pg_xlog from one server to another, was that possible cause of the problem?

I did not mean that this IS your problem, I just gave you a tip regarding a problem I had in the past, that eventually has same simptom.

This scenario only happens when your script is copy data over own data... like in "rsync -ar root@127.0.0.1:/var/lib/pgsql/9.0/data/* /var/lib/pgsql/9.0/data/"

the command above is highly dangerous because it copies data over the network link over its own data... if you have transactions runing during the command above, you will get a crash (and, in my case, I had duplicate primary keys).

Would be better to check if this could be happening to you... some script overwriting data using rsync, cp, etc... I had no other situation where Postgresql allowed duplicate keys.

Hope this helps,

Edson.



thanks for the help!

On Thu, Nov 17, 2011 at 5:38 PM, Edson Richter <richter@simkorp.com.br> wrote:

Em 17-11-2011 03:19, Yan Chunlu escreveu:

recently I have found several tables has exactly the same pkey,  here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)


the data is like this:

  159292 | funnypics_link_point       | 41                                                                                                                     | num
  159292 | funnypics_link_point       | 40                                                                                                                     | num


I could not even update this record.

really confused about how could this happen... thanks!

I know one scenario this can happen on Linux. In my case, it was caused by a "rsync"... instead copy to a different location, script was copying pg_xlog over own pg_xlog.

I did this stupidity once, and learned for a life time. Lost two hours of work to recover everything (from backup, at least I had one).

Be careful with rsync and cp, since Linux does not block files from being overwriten even when they are in use.


Regards,

Edson.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: how could duplicate pkey exist in psql?

From
Yan Chunlu
Date:
got it.  thank you very much for you help.   I found out this problem too late, and there is no backup.

luckily there was not too much data for this, and my app keeps running without error.

I am not sure if they are related but I could not use pg_restore to import data dumped by "pg_dump -Fc";  

pg_restore will print some error message about "duplicate primary key", and the table is empty. no data has been imported.

pg_restore supposed to import the data and ignore the errors.    does any one have the similar problem?

On Thu, Nov 17, 2011 at 11:08 PM, Edson Richter <richter@simkorp.com.br> wrote:
Em 17-11-2011 09:21, Yan Chunlu escreveu:
I am using pgpool's replication feature, it does copy pg_xlog from one server to another, was that possible cause of the problem?

I did not mean that this IS your problem, I just gave you a tip regarding a problem I had in the past, that eventually has same simptom.

This scenario only happens when your script is copy data over own data... like in "rsync -ar root@127.0.0.1:/var/lib/pgsql/9.0/data/* /var/lib/pgsql/9.0/data/"

the command above is highly dangerous because it copies data over the network link over its own data... if you have transactions runing during the command above, you will get a crash (and, in my case, I had duplicate primary keys).

Would be better to check if this could be happening to you... some script overwriting data using rsync, cp, etc... I had no other situation where Postgresql allowed duplicate keys.

Hope this helps,

Edson.




thanks for the help!

On Thu, Nov 17, 2011 at 5:38 PM, Edson Richter <richter@simkorp.com.br> wrote:

Em 17-11-2011 03:19, Yan Chunlu escreveu:

recently I have found several tables has exactly the same pkey,  here is the definition:
"diggcontent_data_account_pkey" PRIMARY KEY, btree (thing_id, key)


the data is like this:

  159292 | funnypics_link_point       | 41                                                                                                                     | num
  159292 | funnypics_link_point       | 40                                                                                                                     | num


I could not even update this record.

really confused about how could this happen... thanks!

I know one scenario this can happen on Linux. In my case, it was caused by a "rsync"... instead copy to a different location, script was copying pg_xlog over own pg_xlog.

I did this stupidity once, and learned for a life time. Lost two hours of work to recover everything (from backup, at least I had one).

Be careful with rsync and cp, since Linux does not block files from being overwriten even when they are in use.


Regards,

Edson.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: how could duplicate pkey exist in psql?

From
Adrian Klaver
Date:
On Monday, November 21, 2011 6:39:55 am Yan Chunlu wrote:
> got it.  thank you very much for you help.   I found out this problem too
> late, and there is no backup.
>
> luckily there was not too much data for this, and my app keeps running
> without error.
>
> I am not sure if they are related but I could not use pg_restore to import
> data dumped by "pg_dump -Fc";
>
> pg_restore will print some error message about "duplicate primary key", and
> the table is empty. no data has been imported.
>
> pg_restore supposed to import the data and ignore the errors.    does any
> one have the similar problem?
>

pg_restore may ignore the error and keep on going but the database will not. In
other words when pg_restore receives the error it will continue on to the next
item (unless you have the -e switch on). As far as the server(database) is
concerned duplicate primary key is still an error and the data will not be
loaded.


--
Adrian Klaver
adrian.klaver@gmail.com

Re: how could duplicate pkey exist in psql?

From
Yan Chunlu
Date:
and database will stop receiving the following data after detected an error?  
that means while using pg_restore, no error allowed to happen, otherwise the database will stop receiving data and the import will fail.

I found only one record in psql's log:

 duplicate key value violates unique constraint "account_pkey"

does that means one duplicate record will prevent all other records to import?

On Mon, Nov 21, 2011 at 10:55 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote:
On Monday, November 21, 2011 6:39:55 am Yan Chunlu wrote:
> got it.  thank you very much for you help.   I found out this problem too
> late, and there is no backup.
>
> luckily there was not too much data for this, and my app keeps running
> without error.
>
> I am not sure if they are related but I could not use pg_restore to import
> data dumped by "pg_dump -Fc";
>
> pg_restore will print some error message about "duplicate primary key", and
> the table is empty. no data has been imported.
>
> pg_restore supposed to import the data and ignore the errors.    does any
> one have the similar problem?
>

pg_restore may ignore the error and keep on going but the database will not. In
other words when pg_restore receives the error it will continue on to the next
item (unless you have the -e switch on). As far as the server(database) is
concerned duplicate primary key is still an error and the data will not be
loaded.


--
Adrian Klaver
adrian.klaver@gmail.com

Re: how could duplicate pkey exist in psql?

From
Adrian Klaver
Date:
On Monday, November 21, 2011 4:53:21 pm Yan Chunlu wrote:
> and database will stop receiving the following data after detected an
> error?
> that means while using pg_restore, no error allowed to happen, otherwise
> the database will stop receiving data and the import will fail.
>
> I found only one record in psql's log:
>
>  duplicate key value violates unique constraint "account_pkey"
>
> does that means one duplicate record will prevent all other records to
> import?

For that table yes. Though if that table is the parent in FK relationships with
other tables, those tables will fail to import also because the keys they refer
to do not exist.

To get around this you have several options:
1) Find the duplicate entry(s) in the original table and eliminate them before
dumping.
2) Dump the table by itself to a plain text format and eliminate the
duplicate(s) in the plain text file before restoring.
3) By default pg_dump uses COPY to load data into tables. As you have found out
that runs as a single transaction and rollbacks if there is an error. You can
specify --insert to the pg_dump command to get it to output INSERT(s) for each
row. The up side is each INSERT is a separate transaction. The down side is if
there is a lot of data it will take a long time to load because each INSERT is a
separate transaction.
4) Use pgloader (http://pgfoundry.org/projects/pgloader/). It is a Python
program that 'manages' COPY. It will kick out bad rows and keep loading data.


--
Adrian Klaver
adrian.klaver@gmail.com