Thread: duplicates
Hello dear list!
Here is the problem I have: I am using PostgreSQL 7.3.4 and have an application that updates two tables. When it needs to update something, it does not select the rows that are already in the table, find the ones it has to change, and run an UPDATE query. Instead it deletes all the old rows and inserts the new ones. However, under heavy load we get duplicate rows in the table, even though we use a transaction and both the delete and the new inserts are in the same transaction. We are pretty sure there is no bug in the application that would insert the data more than once.
Is this a known problem? What could be the cause?

--
Evgeny.
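For context, the update pattern described above presumably boils down to something like the following sketch; the table and column names here are hypothetical, since the actual schema only appears later in the thread:

  BEGIN;
  -- replace the old rows for this key instead of running an UPDATE
  DELETE FROM some_table WHERE key_column = 12345;
  INSERT INTO some_table (key_column, payload) VALUES (12345, 'new value 1');
  INSERT INTO some_table (key_column, payload) VALUES (12345, 'new value 2');
  COMMIT;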
I am not sure that I understand your problem clearly. Are you sure your queries are written correctly? To avoid duplicates you can create a unique index, which will keep any duplicate rows out of the table:

CREATE UNIQUE INDEX table_column_uniqueidx ON table(column);

If rows are disappearing, check your DELETE query, because that is the only thing that erases rows from the table. If you still have problems, please post some queries and be more specific.

Best regards,
Andy.

----- Original Message -----
From: "Tsirkin Evgeny" <tsurkin@mail.jct.ac.il>
To: <pgsql-admin@postgresql.org>
Sent: Monday, September 06, 2004 9:33 AM
Subject: [ADMIN] duplicates

> Is this a known problem? What could be the cause?
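As a sketch of how such a unique index behaves (table and column names here are only illustrative), the second identical row is rejected once the index exists:

  CREATE TABLE some_table (some_column integer);
  CREATE UNIQUE INDEX some_table_some_column_uniqueidx ON some_table (some_column);
  INSERT INTO some_table (some_column) VALUES (1);   -- succeeds
  INSERT INTO some_table (some_column) VALUES (1);   -- fails with a duplicate-key error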
On Mon, 6 Sep 2004, Andrei Bintintan wrote:
> If you still have problems, please post some queries and be more specific.
>
> Best regards,
> Andy.

Ok, I will try:

CREATE TABLE schedule (
    studentid decimal(9),
    groupid decimal(10),
    maslulsignid decimal(7),
    tfusot varchar(29)
);

That is the table for writing down the courses/groups a student takes. Note that there is NO unique constraint here (maybe there should be, but that is the way it is and it probably can't be changed). When changing groups for a student, the application deletes all the groups the student had:

delete from schedule where studentid=11111;

and then inserts the new groups (regular inserts). Now sometimes we get duplicates of the same groupid and studentid in the table (everything is the same). I thought that maybe the delete does not delete but the insert succeeds?

Note: I know that a unique key and constraint could be created, but before that we have to understand why the duplicates were created.

--
Evgeny.
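A straightforward way to see which pairs are already duplicated in such a table (this query is not from the original post, just standard SQL over the schema above):

  SELECT studentid, groupid, count(*) AS copies
    FROM schedule
   GROUP BY studentid, groupid
  HAVING count(*) > 1;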
----- Original Message -----
From: "Tsirkin Evgeny" <tsurkin@mail.jct.ac.il>
To: "Andrei Bintintan" <klodoma@ar-sd.net>
Cc: <pgsql-admin@postgresql.org>
Sent: Monday, September 06, 2004 10:57 AM
Subject: Re: [ADMIN] duplicates

> Ok, I will try:
>
> CREATE TABLE schedule (
>     studentid decimal(9),
>     groupid decimal(10),
>     maslulsignid decimal(7),
>     tfusot varchar(29)
> );

Use something like this:

CREATE TABLE schedule (
    id serial PRIMARY KEY,
    studentid decimal(9),
    groupid decimal(10),
    maslulsignid decimal(7),
    tfusot varchar(29)
);

Now, ALWAYS use an id for a table. This id will always be unique (because it is the primary key). You don't have to supply this field when you make an insert; the server fills it in automatically. And when you delete, you refer to this key and not to the student id. For example, you have student id 1111 in 3 groups:

id   studentid   groupid   maslulsignid
1    1111        22        some val
2    1111        33        some val
3    1111        44        some val

If you run "DELETE FROM schedule WHERE studentid=1111" it deletes all of these rows from the table. If you want to delete only row 2, then you run: DELETE FROM schedule WHERE id=2.

Normally in this kind of table you should never have duplicates. I mean, you should not have rows where all the fields (in your case studentid, groupid, maslulsignid and tfusot) hold the same values; that indicates a design error.

Hope this helps.
Best regards.
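To illustrate the suggestion above (the values and the generated ids 1 and 2 are assumptions, just for the example): the id column is filled in automatically by the serial sequence and can then be used to delete a single row.

  INSERT INTO schedule (studentid, groupid, maslulsignid, tfusot) VALUES (1111, 22, 0, '');
  INSERT INTO schedule (studentid, groupid, maslulsignid, tfusot) VALUES (1111, 33, 0, '');
  -- remove only the second row, by its generated id (assumed to be 2 here)
  DELETE FROM schedule WHERE id = 2;
  -- remove every row for the student
  DELETE FROM schedule WHERE studentid = 1111;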
Yes, I understand that I can create a primary key/unique constraint etc. However, my question is that I have to understand whether and why I got duplicate rows inserted. So here is the picture: an application is deleting rows and right after that inserting new ones, some of which are the same as the old ones, and I am getting duplicates! Shouldn't the transaction prevent this by not letting the insert do anything in case the delete did not succeed? Another option is that I have two clients deleting and then inserting the same thing into the table, but again, shouldn't the transaction prevent duplicates in that case?

> Use something like this:
>
> CREATE TABLE schedule (
>     id serial PRIMARY KEY,
>     studentid decimal(9),
>     groupid decimal(10),
>     maslulsignid decimal(7),
>     tfusot varchar(29)
> );

--
Evgeny.
----- Original Message -----
From: "Tsirkin Evgeny" <tsurkin@mail.jct.ac.il>
To: "Andrei Bintintan" <klodoma@ar-sd.net>
Cc: <pgsql-admin@postgresql.org>
Sent: Monday, September 06, 2004 12:19 PM
Subject: Re: [ADMIN] duplicates

> So here is the picture: an application is deleting rows and right after
> that inserting new ones, some of which are the same as the old ones, and
> I am getting duplicates!

Maybe you are inserting that duplicate pair more than once. Nothing in your table definition stops this.

> Shouldn't the transaction prevent this by not letting the insert do
> anything in case the delete did not succeed?

Only if you check your queries' results and issue a ROLLBACK on the transaction in case of an error.

> Another option is that I have two clients deleting and then inserting the
> same thing into the table, but again, shouldn't the transaction prevent
> duplicates in that case?

No, perhaps you are misunderstanding transactions. Transactions don't prevent anything unless you make them. For example: BEGIN, then the delete query; if there is an error here you should run a ROLLBACK and not run any inserts. If the delete query succeeds then you run the inserts, and so on. At the end of the transaction you should COMMIT ONLY if everything worked as you wanted!

Unique indexes help you prevent double inserts into the database. For example:

CREATE UNIQUE INDEX schedule_student_group_uniqueidx ON schedule(studentid, groupid);

will assure you that you have only one ROW with this pair of values. By assure I mean that if you already have the pair (1111, 22) in the table and you try to insert another (1111, 22) pair, the insert query will fail.

In the transactions you will have to check the result of every delete/insert/update query so that you know when there was a failure. You can also look in the database log file and see exactly what happened there.

Hope this helps.
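A minimal sketch of the flow described above, with the error handling spelled out as comments (the actual checking has to happen in the application code; the values are made up):

  BEGIN;
  DELETE FROM schedule WHERE studentid = 1111;
  -- if the DELETE reported an error, issue ROLLBACK here and stop
  INSERT INTO schedule (studentid, groupid, maslulsignid, tfusot) VALUES (1111, 22, 0, '');
  INSERT INTO schedule (studentid, groupid, maslulsignid, tfusot) VALUES (1111, 33, 0, '');
  -- if any INSERT failed, issue ROLLBACK instead of the COMMIT below
  COMMIT;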
Tsirkin Evgeny wrote:
> On Mon, 6 Sep 2004, Michael Paesold wrote:
> Doesn't the Serializable isolation level ensure that? What I thought is
> that while using this level the BEGIN and COMMIT behave the same as the
> code you wrote, since the second concurrent transaction (the one that
> started a little later) should wait for the first one and only then
> DELETE. Is that not true?

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES (... );
INSERT INTO schedule (studentid, ...) VALUES (... );

COMMIT;

If you do it like in the above SQL code, there is still a problem. The serializable check here only works if DELETE FROM schedule ... really finds at least one row. If it does not, it will not recognize the serialization problem. So it is still possible that your program creates duplicates.

I have tested this here. I don't really know if this is just the case with PostgreSQL serializable transactions (an MVCC limitation) or a general problem.

> PS: Doesn't perl::DBI use the serializable level?

I don't really know about DBI, you should really check that.
* check that autocommit is off
* check that the transaction isolation level is serializable

Still, the serializable transaction level is not enough. An approach that allows more concurrency is possible, but you have to make sure that all relevant code does the locking in the same way (assuming you have a student table with studentid as a primary/unique key):

BEGIN;
SELECT * FROM student WHERE studentid = ... FOR UPDATE; -- lock the student record
DELETE FROM schedule WHERE studentid = ... ;
INSERT INTO schedule (studentid, ...) VALUES (... );
INSERT INTO schedule (studentid, ...) VALUES (... );
COMMIT;

This will not lock the whole schedule table, but only one row of the student table. I hope that helps.

Best Regards,
Michael Paesold
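If the client library makes it hard to tell what mode a session is in, one way to check and set the isolation level directly from SQL is sketched below (syntax as documented for PostgreSQL of that era; whether your driver passes these through unchanged is an assumption):

  SHOW TRANSACTION ISOLATION LEVEL;                     -- typically 'read committed' by default
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;         -- current transaction only, run right after BEGIN
  SET default_transaction_isolation = 'serializable';   -- later transactions in this session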
I wrote:
> If you do it like in the above SQL code, there is still a problem. The
> serializable check here only works if DELETE FROM schedule ... really
> finds at least one row. If it does not, it will not recognize the
> serialization problem. So it is still possible that your program creates
> duplicates.
>
> I have tested this here. I don't really know if this is just the case
> with PostgreSQL serializable transactions (an MVCC limitation) or a
> general problem.

There is a detailed description of this problem in Section 12.2.2.1 of the PostgreSQL 8 docs, here:
http://developer.postgresql.org/docs/postgres/transaction-iso.html
(Serializable Isolation versus True Serializability)

Best Regards,
Michael Paesold
You are great, Michael! Thanks.

--
Evgeny.
One more question: how did you test it?

> I have tested this here. I don't really know if this is just the case
> with PostgreSQL serializable transactions (an MVCC limitation) or a
> general problem.

--
Evgeny.
Tsirkin Evgeny wrote:
> One more question: how did you test it?

First I created your schedule table. Then I opened two psql sessions and ran the statements in this order:

-- Session A:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DELETE FROM schedule WHERE studentid = 1;
INSERT INTO schedule VALUES (1, 1, 0, 0);

-- Session B:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DELETE FROM schedule WHERE studentid = 1;
-- if records existed for this delete, session B will now wait for
-- session A to commit/rollback and error out on commit;
-- otherwise there is no error

-- Session A:
COMMIT;

-- Session B:
INSERT INTO schedule VALUES (1, 1, 0, 0);
COMMIT;

You can also try rolling back the first transaction, etc.

Again, what really helps is:

ALTER TABLE schedule ADD CONSTRAINT uniq_schedule UNIQUE (studentid, groupid);

Best Regards,
Michael Paesold
If the record you are inserting does not depend on data in the record you want to delete, why not simply use a trigger? Before the insert, delete the record with the same key!

Michael Paesold wrote:
> Again, what really helps is:
> ALTER TABLE schedule ADD CONSTRAINT uniq_schedule UNIQUE (studentid, groupid);
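A rough sketch of such a trigger on the schedule table (the function and trigger names are invented here, and plpgsql must already be installed in the database). Note that without a unique constraint this still cannot stop two concurrent sessions from inserting the same pair at the same time:

  CREATE OR REPLACE FUNCTION schedule_delete_dup() RETURNS trigger AS '
  BEGIN
      -- drop any existing row with the same key before the new one goes in
      DELETE FROM schedule
       WHERE studentid = NEW.studentid
         AND groupid   = NEW.groupid;
      RETURN NEW;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER schedule_delete_dup_trg
      BEFORE INSERT ON schedule
      FOR EACH ROW EXECUTE PROCEDURE schedule_delete_dup();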
Tsirkin Evgeny wrote:
> Another option is that I have two clients deleting and then inserting the
> same thing into the table, but again, shouldn't the transaction prevent
> duplicates in that case?

No, you should lock the table.
> No, you should lock the table.

I mean that, without the lock, another session can still select from or insert into that table even while your transaction is running.
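A sketch of what locking the table could look like here (values are made up); EXCLUSIVE mode blocks other writers, including a second session running the same delete-and-insert sequence, while plain SELECTs still get through, and the lock is held until the transaction ends:

  BEGIN;
  LOCK TABLE schedule IN EXCLUSIVE MODE;
  DELETE FROM schedule WHERE studentid = 1111;
  INSERT INTO schedule (studentid, groupid, maslulsignid, tfusot) VALUES (1111, 22, 0, '');
  COMMIT;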
Thread: Unable to connect to a specific database

I had to kill a vacuum in the middle with kill -9. I shut down and restarted the postgres server several times after that, but I am unable to connect to the database that I was initially running the vacuum on. I'm doing "psql dbname" and it hangs for a while. I'm still waiting. Any ideas?
Thanks
> I had to kill a vacuum in the middle with kill -9. I shut down and
> restarted the postgres server several times after that, but I am unable
> to connect to the database that I was initially running the vacuum on.
> I'm doing "psql dbname" and it hangs for a while. I'm still waiting.
> Any ideas?

"Kill -9" is distinctly not recommended. It is possible that you have corrupted the database by issuing "kill -9." You might want to look into what backups you have...
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com').
http://linuxdatabases.info/info/nonrdbms.html
"Very funny, Scotty. Now beam down my clothes."
On Fri, Jan 13, 2006 at 10:20:07PM +0000, Sally Sally wrote:
> I had to kill a vacuum in the middle with kill -9. I shut down and
> restarted the postgres server several times after that, but I am unable
> to connect to the database that I was initially running the vacuum on.
> I'm doing "psql dbname" and it hangs for a while. I'm still waiting.
> Any ideas?

What do your database logs say? I expect that you're in recovery mode, but it's impossible to tell from this note.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the
marketplace. --Philip Greenspun
On Fri, Jan 13, 2006 at 09:19:30PM -0500, Christopher Browne wrote:
> It is possible that you have corrupted the database by issuing "kill
> -9."

If that's true, then WAL doesn't work. kill -9 shouldn't be any more harmful than a machine crash. That is to say, it's very bad news, but it shouldn't actually corrupt the database.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now. --J.D. Baldwin
Well, after the kill -9 it was in recovery mode, but I restarted the server again. It took a while (~45 min) but it connected again. I have noticed that after a vacuum it takes a while to connect for the first time. You mentioned kill -9 is very bad news. Everything seems to be working fine now; should I be worried? What are the consequences?
Thanks

> From: Andrew Sullivan <ajs@crankycanuck.ca>
> Subject: Re: [ADMIN] Unable to connect to a specific database
> Date: Mon, 16 Jan 2006 12:38:09 -0500
>
> If that's true, then WAL doesn't work. kill -9 shouldn't be any more
> harmful than a machine crash. That is to say, it's very bad news,
> but it shouldn't actually corrupt the database.
"Sally Sally" <dedeb17@hotmail.com> writes: > Well after kill -9, it was in recovery mode but I restarted the server > again. It took a while (~ 45min) but it connected again. I have noticed that > after a vacuum it takes a while to connect for the first time. You mentioned > kill -9 is very bad news. Everything seems to be working fine now, should I > be worried? What are the consequences? The problem with kill -9 is mainly if you use it to kill the postmaster; then the postmaster is unable to release its shared memory segment before quitting. Depending on how your kernel handles shared memory, the segment might stick around indefinitely (unless manually cleaned up or you reboot the whole machine). This will eventually run the system out of memory and/or prevent launching a new postmaster. kill -9 on an individual backend isn't so dangerous, but it's still using a steam hammer to swat a fly. Try less invasive methods first, like kill -INT (ie, query cancel). BTW, the unreasonably long restart time suggests that you've set unreasonably long inter-checkpoint intervals. Might want to think about backing off the checkpoint parameters a bit. regards, tom lane