Thread: duplicates

duplicates

From
Tsirkin Evgeny
Date:
Hello dear list!
Here is the problem I have:
I am using PostgreSQL 7.3.4. I have an application that updates 2 tables.
When it needs to update something, it does not select the rows that are
already in the table, find what it needs to update, and execute an
'update' query. Instead, it deletes all the old rows and inserts the new
ones. However, under heavy load we get duplicate rows in the table,
although we use a transaction and both the delete and the new insert are
in the same transaction. We are pretty sure there is no bug in the
application that inserts the data more than once.
Is this a known problem? What could be the problem?

--
Evgeny.

Re: duplicates

From
"Andrei Bintintan"
Date:
I am not sure that I understand your problem clearly. Are you sure that
your queries are written correctly?

For duplicates you can create a unique index; this will prevent any
duplicates in your table:
CREATE UNIQUE INDEX mytable_mycolumn_uniqueidx ON mytable (mycolumn);

If the rows are disappearing, please check your delete query, because
that is the only way to erase rows from the table.

If you still have problems, please post some queries and be more specific.

Best regards,
Andy.

----- Original Message -----
From: "Tsirkin Evgeny" <tsurkin@mail.jct.ac.il>
To: <pgsql-admin@postgresql.org>
Sent: Monday, September 06, 2004 9:33 AM
Subject: [ADMIN] duplicates


> Hello dear list!
> Here is the problem I have:
> I am using PostgreSQL 7.3.4. I have an application that updates 2 tables.
> When it needs to update something, it does not select the rows that are
> already in the table, find what it needs to update, and execute an
> 'update' query. Instead, it deletes all the old rows and inserts the new
> ones. However, under heavy load we get duplicate rows in the table,
> although we use a transaction and both the delete and the new insert are
> in the same transaction. We are pretty sure there is no bug in the
> application that inserts the data more than once.
> Is this a known problem? What could be the problem?
>
> --
> Evgeny.


Re: duplicates

From
Tsirkin Evgeny
Date:
On Mon, 6 Sep 2004, Andrei Bintintan wrote:

> If you still have problems, please post some queries and be more specific.
>
> Best regards,
> Andy.
>
OK, I will try:
CREATE TABLE schedule (
       studentid    decimal(9),
       groupid      decimal(10),
       maslulsignid decimal(7),
       tfusot        varchar(29)
);

That is the table for writing down the courses/groups a student takes.
Note there is NO unique constraint here (maybe there should be, but that
is the way it is and it probably can't be changed). When changing groups
for a student, the application deletes all the groups the student had:
delete from schedule where studentid=11111;
and then inserts the new groups (regular inserts).
Now sometimes we get duplicates of the same groupid and studentid in the
table (everything is the same). I thought that maybe the delete does not
delete but the insert succeeds?
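
The flow is roughly like this (the values here are made up just for
illustration):

BEGIN;
DELETE FROM schedule WHERE studentid = 11111;
INSERT INTO schedule VALUES (11111, 22, 1, 'a');
INSERT INTO schedule VALUES (11111, 33, 1, 'b');
COMMIT;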

Note: I know that a unique key and a constraint could be created, but
first we have to understand why the duplicates were created.

--
Evgeny.

Re: duplicates

From
"Andrei Bintintan"
Date:
----- Original Message -----
From: "Tsirkin Evgeny" <tsurkin@mail.jct.ac.il>
To: "Andrei Bintintan" <klodoma@ar-sd.net>
Cc: <pgsql-admin@postgresql.org>
Sent: Monday, September 06, 2004 10:57 AM
Subject: Re: [ADMIN] duplicates


> On Mon, 6 Sep 2004, Andrei Bintintan wrote:
>
> > If you still have problems, please post some queries and be more specific.
> >
> > Best regards,
> > Andy.
> >
> OK, I will try:
> CREATE TABLE schedule (
>        studentid    decimal(9),
>        groupid      decimal(10),
>        maslulsignid decimal(7),
>        tfusot        varchar(29)
> );

Use something like this:

 CREATE TABLE schedule (
    id serial PRIMARY KEY,
    studentid    decimal(9),
    groupid      decimal(10),
    maslulsignid decimal(7),
    tfusot        varchar(29)
 );

Now, ALWAYS use an id for a table. This id will always be unique (because
it is the primary key). You don't have to insert this field when you're
making an insert; the server fills it in automatically. And when you
delete, you reference this key and not the student id.

For example: you have student id = 1111 in 3 groups:
id studentid groupid maslulsignid
1    1111        22        some val
2    1111        33        some val
3    1111        44        some val

If you run "delete from schedule where studentid=1111", it deletes all
three rows from the table. If you want to delete only row 2, then you
run: delete from schedule where id=2.

Normally in this kind of table you should never have duplicates. I mean,
you should not have rows where all fields are the same (in your case the
same values for studentid, groupid, maslulsignid and tfusot) - that
indicates a design error.

Hope this helps.

Best regards.

>
> That is the table for writing down the courses/groups a student takes.
> Note there is NO unique constraint here (maybe there should be, but that
> is the way it is and it probably can't be changed). When changing groups
> for a student, the application deletes all the groups the student had:
> delete from schedule where studentid=11111;
> and then inserts the new groups (regular inserts).
> Now sometimes we get duplicates of the same groupid and studentid in the
> table (everything is the same). I thought that maybe the delete does not
> delete but the insert succeeds?
>
> Note: I know that a unique key and a constraint could be created, but
> first we have to understand why the duplicates were created.
>
> --
> Evgeny.


Re: duplicates

From
Tsirkin Evgeny
Date:
Yes, I understand that I can create a primary key/unique index etc.
However, my question is that I first have to understand whether and why
I got duplicate rows inserted.
So here is the picture:
an application is deleting rows and, right after that, inserting new
ones, some of which are the same as the old ones, and I am getting
duplicates! Shouldn't a transaction prevent this by not letting the
insert do anything in case the delete did not succeed?
Another option is that I have two clients deleting and then inserting the
same thing into the table, but again, shouldn't a transaction prevent
duplicates in that case?

> Use something like this:
>
>  CREATE TABLE schedule (
>     id serial PRIMARY KEY,
>     studentid    decimal(9),
>     groupid      decimal(10),
>     maslulsignid decimal(7),
>     tfusot        varchar(29)
>  );
>


--
Evgeny.

Re: duplicates

From
"Andrei Bintintan"
Date:
----- Original Message -----
From: "Tsirkin Evgeny" <tsurkin@mail.jct.ac.il>
To: "Andrei Bintintan" <klodoma@ar-sd.net>
Cc: <pgsql-admin@postgresql.org>
Sent: Monday, September 06, 2004 12:19 PM
Subject: Re: [ADMIN] duplicates


> Yes, I understand that I can create a primary key/unique index etc.
> However, my question is that I first have to understand whether and why
> I got duplicate rows inserted.
> So here is the picture:
> an application is deleting rows and, right after that, inserting new
> ones, some of which are the same as the old ones, and I am getting

Maybe you are inserting that duplicate pair more than once. Nothing in
your table definition stops this.

> duplicates! Shouldn't a transaction prevent this by not letting the
> insert do anything in case the delete did not succeed?

Only if you check each query's result and issue a "rollback" for the
transaction in case of an error.

> Another option is that I have two clients deleting and then inserting the
> same thing into the table, but again, shouldn't a transaction prevent
> duplicates in that case?

No, perhaps you are misunderstanding transactions. Transactions don't
prevent anything on their own - only if you tell them to.

For example:

BEGIN;
DELETE FROM schedule WHERE studentid = ... ;
    -- if this fails, issue ROLLBACK and do not run any inserts
INSERT INTO schedule (studentid, ...) VALUES ( ... );
    -- run the inserts only if the delete succeeded
COMMIT;
    -- COMMIT only if everything worked as you wanted!

Unique indexes help you prevent double inserts into the database.
For example:
CREATE UNIQUE INDEX schedule_studentid_groupid_uniqueidx ON schedule (studentid, groupid);
will ensure that you have only one ROW with this pair of data. By ensure
I mean that if you have the pair (1111, 22) in the table and you try to
insert another (1111, 22) pair, the insert query will fail.

In the transactions you will have to check the result of every
delete/insert/update query so that you know whether there was a failure.

You can also look in the database log file and see exactly what happened
there.


Hope this helps.

>
> > Use something like this:
> >
> >  CREATE TABLE schedule (
> >     id serial PRIMARY KEY,
> >     studentid    decimal(9),
> >     groupid      decimal(10),
> >     maslulsignid decimal(7),
> >     tfusot        varchar(29)
> >  );
> >
>
>
> --
> Evgeny.


Re: duplicates

From
"Michael Paesold"
Date:
Tsirkin Evgeny wrote:

> On Mon, 6 Sep 2004, Michael Paesold wrote:
> Doesn't the Serializable isolation level ensure that?
> What I thought is that when using this level, I am getting the BEGIN
> and COMMIT to behave the same as the code you wrote, since the second
> concurrent transaction (the one that started a little later) should
> wait for the first one and only then DELETE.
> Isn't that true?

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES (... );
INSERT INTO schedule (studentid, ...) VALUES (... );

COMMIT;

If you do it like in the above SQL code, there is still a problem. The
serializable check here only works if DELETE FROM schedule ... really
finds at least one row. If it does not, it will not recognize the
serialization problem. So it is still possible that your program creates
duplicates.

I have tested this here. I don't really know if this is just the case with
PostgreSQL serializable transactions (MVCC limitation) or a general problem.


> PS: Doesn't perl::DBI use the serializable level?

I don't really know about DBI; you should really check that:
* check that autocommit is off
* check that transaction isolation level is serializable
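
For example, you can verify the current level from within a session with:

SHOW TRANSACTION ISOLATION LEVEL;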


Still, the serializable transaction level is not enough. An approach that
allows more concurrency is possible, but you have to make sure that all
relevant code does the locking in the same way (assuming you have a
student table with studentid as its primary/unique key):

BEGIN;

SELECT * FROM student WHERE studentid = ... FOR UPDATE;
   -- lock the student record

DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES (... );
INSERT INTO schedule (studentid, ...) VALUES (... );

COMMIT;

This will not lock the whole schedule table, but only one row of the student
table.

I hope that helps.

Best Regards,
Michael Paesold





Re: duplicates

From
"Michael Paesold"
Date:
I wrote:

> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>
> DELETE FROM schedule WHERE studentid = ... ;
> INSERT INTO schedule (studentid, ...) VALUES (... );
> INSERT INTO schedule (studentid, ...) VALUES (... );
>
> COMMIT;
>
> If you do it like in the above SQL code, there is still a problem. The
> serializable check here only works if DELETE FROM schedule ... really
> finds at least one row. If it does not, it will not recognize the
> serialization problem. So it is still possible that your program creates
> duplicates.
>
> I have tested this here. I don't really know if this is just the case
> with PostgreSQL serializable transactions (MVCC limitation) or a
> general problem.

There is a detailed description of this problem in Section 12.2.2.1 of
the PostgreSQL 8 docs here:
http://developer.postgresql.org/docs/postgres/transaction-iso.html

(Serializable Isolation versus True Serializability)

Best Regards,
Michael Paesold


Re: duplicates

From
Tsirkin Evgeny
Date:
You are great, Michael!
Thanks.

On Mon, 6 Sep 2004, Michael Paesold wrote:

> I wrote:
>
> > BEGIN;
> > SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> >
> > DELETE FROM schedule WHERE studentid = ... ;
> > INSERT INTO schedule (studentid, ...) VALUES (... );
> > INSERT INTO schedule (studentid, ...) VALUES (... );
> >
> > COMMIT;
> >
> > If you do it like in the above SQL code, there is still a problem. The
> > serializable check here only works if DELETE FROM schedule ... really
> > finds at least one row. If it does not, it will not recognize the
> > serialization problem. So it is still possible that your program creates
> > duplicates.
> >
> > I have tested this here. I don't really know if this is just the case
> > with PostgreSQL serializable transactions (MVCC limitation) or a
> > general problem.
>
> There is a detailed description of this problem in Section 12.2.2.1 of
> the PostgreSQL 8 docs here:
> http://developer.postgresql.org/docs/postgres/transaction-iso.html
>
> (Serializable Isolation versus True Serializability)
>
> Best Regards,
> Michael Paesold
>

--
Evgeny.

Re: duplicates

From
Tsirkin Evgeny
Date:
One more question: how did you test it?

> I have tested this here.
>
> Best Regards,
> Michael Paesold

--
Evgeny.

Re: duplicates

From
"Michael Paesold"
Date:
Tsirkin Evgeny wrote:


> One more question: how did you test it?
>
> > I have tested this here.

First I created your schedule table. Then I opened two psql sessions...

Session A                Session B

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DELETE FROM schedule WHERE studentid = 1;
INSERT INTO schedule VALUES (1, 1, 0, 0);

                         BEGIN;
                         SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
                         DELETE FROM schedule WHERE studentid = 1;
                         -- if there existed records for this delete,
                         -- session B will now wait for session A to
                         -- commit/rollback and error out on commit;
                         -- otherwise no error


COMMIT;
                         INSERT INTO schedule VALUES (1, 1, 0, 0);
                         COMMIT;


You can also try rolling back the first transaction, etc.

Again, what really helps is:
ALTER TABLE schedule ADD CONSTRAINT uniq_schedule UNIQUE (studentid,
groupid);

Best Regards,
Michael Paesold



Re: duplicates

From
Jean-Luc Lachance
Date:
If the records you are inserting do not depend on data in the records you
want to delete, why not simply use a trigger?

Before insert, delete the record with the same key!
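
Something like this rough sketch (the function and trigger names are just
invented, and I have not tested it here):

CREATE FUNCTION schedule_delete_dup() RETURNS trigger AS '
BEGIN
    -- remove any existing row with the same (studentid, groupid) pair
    DELETE FROM schedule
     WHERE studentid = NEW.studentid
       AND groupid   = NEW.groupid;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER schedule_delete_dup_trg
    BEFORE INSERT ON schedule
    FOR EACH ROW EXECUTE PROCEDURE schedule_delete_dup();

Note that the trigger's DELETE cannot see a row another session has
inserted but not yet committed, so two concurrent transactions could
still produce a duplicate.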



Michael Paesold wrote:

> Tsirkin Evgeny wrote:
>
>> One more question: how did you test it?
>>
>>> I have tested this here.
>
> First I created your schedule table. Then I opened two psql sessions...
>
> Session A                Session B
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> DELETE FROM schedule WHERE studentid = 1;
> INSERT INTO schedule VALUES (1, 1, 0, 0);
>
>                          BEGIN;
>                          SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>                          DELETE FROM schedule WHERE studentid = 1;
>                          -- if there existed records for this delete,
>                          -- session B will now wait for session A to
>                          -- commit/rollback and error out on commit;
>                          -- otherwise no error
>
>
> COMMIT;
>                          INSERT INTO schedule VALUES (1, 1, 0, 0);
>                          COMMIT;
>
>
> You can also try rolling back the first transaction, etc.
>
> Again, what really helps is:
> ALTER TABLE schedule ADD CONSTRAINT uniq_schedule UNIQUE (studentid,
> groupid);
>
> Best Regards,
> Michael Paesold

Re: duplicates

From
Kostis Mentzelos
Date:
Tsirkin Evgeny wrote:
> Yes, I understand that I can create a primary key/unique index etc.
> However, my question is that I first have to understand whether and why
> I got duplicate rows inserted.
> So here is the picture:
> an application is deleting rows and, right after that, inserting new
> ones, some of which are the same as the old ones, and I am getting
> duplicates! Shouldn't a transaction prevent this by not letting the
> insert do anything in case the delete did not succeed?
> Another option is that I have two clients deleting and then inserting the
> same thing into the table, but again, shouldn't a transaction prevent
> duplicates in that case?
>
>
>>Use something like this:
>>
>> CREATE TABLE schedule (
>>    id serial PRIMARY KEY,
>>    studentid    decimal(9),
>>    groupid      decimal(10),
>>    maslulsignid decimal(7),
>>    tfusot        varchar(29)
>> );
>>
>
>
>
No, you should lock the table.

Re: duplicates

From
Kostis Mentzelos
Date:
> No, you should lock the table.

I mean that other sessions can still select from or insert into that
table even while a transaction is running.
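
For example, something like this (the lock mode here is just my
suggestion):

BEGIN;
LOCK TABLE schedule IN SHARE ROW EXCLUSIVE MODE;
    -- other sessions doing the same delete/insert pattern now block
    -- until this transaction commits or rolls back
DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES ( ... );
COMMIT;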

Unable to connect to a specific database

From
"Sally Sally"
Date:
I had to kill a vacuum in the middle with -9. I shut down and restarted
the postgres server several times after that, but I am unable to connect
to the db that I was initially running the vacuum on.
I'm doing "psql dbname" and it hangs for a while. I'm still waiting. Any
ideas?
Thanks



Re: Unable to connect to a specific database

From
Christopher Browne
Date:
> I had to kill a vacuum in the middle with -9. I shut down and
> restarted the postgres server several times after that, but I am unable
> to connect to the db that I was initially running the vacuum on.
> I'm doing "psql dbname" and it hangs for a while. I'm still
> waiting. Any ideas?

"Kill -9" is distinctly not recommended.

It is possible that you have corrupted the database by issuing "kill
-9."

You might want to look into what backups you have...
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com').
http://linuxdatabases.info/info/nonrdbms.html
"Very funny, Scotty. Now beam down my clothes."

Re: Unable to connect to a specific database

From
Andrew Sullivan
Date:
On Fri, Jan 13, 2006 at 10:20:07PM +0000, Sally Sally wrote:
> I had to kill a vacuum in the middle with -9. I shut down and restarted
> the postgres server several times after that, but I am unable to connect
> to the db that I was initially running the vacuum on.
> I'm doing "psql dbname" and it hangs for a while. I'm still waiting. Any
> ideas?

What do your database logs say?  I expect that you're in recovery
mode, but it's impossible to tell from this note.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.
        --Philip Greenspun

Re: Unable to connect to a specific database

From
Andrew Sullivan
Date:
On Fri, Jan 13, 2006 at 09:19:30PM -0500, Christopher Browne wrote:
> It is possible that you have corrupted the database by issuing "kill
> -9."

If that's true, then WAL doesn't work.  kill -9 shouldn't be any more
harmful than a machine crash.  That is to say, it's very bad news,
but it shouldn't actually corrupt the database.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to.  That actually seems sort of quaint now.
        --J.D. Baldwin

Re: Unable to connect to a specific database

From
"Sally Sally"
Date:
Well, after kill -9 it was in recovery mode, but I restarted the server
again. It took a while (~45 min) but it connected again. I have noticed
that after a vacuum it takes a while to connect for the first time. You
mentioned kill -9 is very bad news. Everything seems to be working fine
now; should I be worried? What are the consequences?
Thanks


>From: Andrew Sullivan <ajs@crankycanuck.ca>
>To: pgsql-admin@postgresql.org
>Subject: Re: [ADMIN] Unable to connect to a specific database
>Date: Mon, 16 Jan 2006 12:38:09 -0500
>
>On Fri, Jan 13, 2006 at 09:19:30PM -0500, Christopher Browne wrote:
> > It is possible that you have corrupted the database by issuing "kill
> > -9."
>
>If that's true, then WAL doesn't work.  kill -9 shouldn't be any more
>harmful than a machine crash.  That is to say, it's very bad news,
>but it shouldn't actually corrupt the database.
>
>A
>
>--
>Andrew Sullivan  | ajs@crankycanuck.ca
>I remember when computers were frustrating because they *did* exactly what
>you told them to.  That actually seems sort of quaint now.
>        --J.D. Baldwin
>



Re: Unable to connect to a specific database

From
Tom Lane
Date:
"Sally Sally" <dedeb17@hotmail.com> writes:
> Well after kill -9, it was in recovery mode but I restarted the server
> again. It took a while (~ 45min) but it connected again. I have noticed that
> after a vacuum it takes a while to connect for the first time. You mentioned
> kill -9 is very bad news. Everything seems to be working fine now, should I
> be worried? What are the consequences?

The problem with kill -9 is mainly if you use it to kill the postmaster;
then the postmaster is unable to release its shared memory segment
before quitting.  Depending on how your kernel handles shared memory,
the segment might stick around indefinitely (unless manually cleaned up
or you reboot the whole machine).  This will eventually run the system
out of memory and/or prevent launching a new postmaster.

kill -9 on an individual backend isn't so dangerous, but it's still
using a steam hammer to swat a fly.  Try less invasive methods first,
like kill -INT (ie, query cancel).

BTW, the unreasonably long restart time suggests that you've set
unreasonably long inter-checkpoint intervals.  Might want to think
about backing off the checkpoint parameters a bit.
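
For reference, the relevant settings live in postgresql.conf; the values
shown here are just the old defaults, for illustration:

    checkpoint_segments = 3    # max WAL segments between checkpoints
    checkpoint_timeout = 300   # seconds between automatic checkpoints

Smaller values mean more frequent checkpoints and therefore less WAL to
replay after a crash.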

            regards, tom lane