Thread: implicit abort harmful?
Hi,

I have been using PostgreSQL for about 3 months now (after about 15 years of using various other DBMSs). I love it :) but - (and you just knew that "but" was coming :) there is one thing that really causes me grief: the tendency for Postgres to roll back the whole transaction automatically on some errors. What this leads to is the need for an application to commit much more frequently than is really desirable.

Example 1. During an import of 120 thousand records from an ISAM file system, say 3 records fail integrity checks (files in non-database systems tend not to have referential integrity implemented on them except at an application level - meaning not at all :). The desired result is to drop the records failing the integrity checks. Importing into DB2 or Oracle, say, I have the option to ignore the referential integrity errors (I get the error code on the insert anyway) and continue with the insert of the 120 thousand minus 3 records. In Postgres, I either have to commit after every record written, or guarantee the "cleanness" of the data before I begin the import - which is sometimes difficult given the data sources I may be importing from (often I neither own nor can modify them).

Worse, if the intention behind the automatic rollback is to guarantee data purity - from the PostgreSQL manual, section 10.4.1:

  "Turn off autocommit and just do one commit at the end. (In plain SQL, this means issuing BEGIN at the start and COMMIT at the end. Some client libraries may do this behind your back, in which case you need to make sure the library does it when you want it done.) If you allow each insertion to be committed separately, PostgreSQL is doing a lot of work for each record added. An additional benefit of doing all insertions in one transaction is that if the insertion of one record were to fail then the insertion of all records inserted up to that point would be rolled back, so you won't be stuck with partially loaded data."

- it actually makes more of a mess than leaving the decision to roll back under application control. If in this example the three "bad" records are scattered throughout the dataset, I could end up (if I ignore the error as I would for most other DBMSs) with a random number of records in the table, etc. etc. Of course, to do robust imports of the insert-on-error-update style available in most other DBMSs I have worked with, it is also necessary to issue a commit before each insert/update attempt.

Example 2 - ODBC driver rollbacks. Here is a snippet from a Postgres log:

  2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
  2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
  2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
  2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
  2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
  2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec

The rollback here is generated by the ODBC driver to clear an error created by the driver itself incorrectly parsing the "select" of a subselect statement as a table name. The application is totally unaware that an error has occurred and a rollback has been issued. This is most likely to lead to data loss unless, basically, an application using ODBC is in autocommit mode or commits after every SQL statement issued.
This concerns me to the point where I would really recommend not using the parse statement option in the ODBC driver (even though that reduces the ODBC driver's ability to mimic a prepare, which is also problematic :) unless you are using autocommit. For any application complex enough to be working with autocommit turned off in the ODBC driver, the auto rollback is a real problem.

Please consider making the automatic abort/rollback feature an option?

Regards,
Wayne Armstrong
Bacchus Management Systems
http://www.bacchus.com.au
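[For concreteness, a minimal sketch of the commit-after-every-record workaround described above. The table and column names are hypothetical; each row runs in its own small transaction so that a failing row cannot roll back the rows already loaded.]

  -- hypothetical target table, for illustration only
  CREATE TABLE import_target (
      id      integer PRIMARY KEY,
      payload text
  );

  -- one transaction per record, so an integrity failure on row N
  -- cannot roll back rows 1 .. N-1
  BEGIN;
  INSERT INTO import_target (id, payload) VALUES (1, 'first record');
  COMMIT;

  BEGIN;
  INSERT INTO import_target (id, payload) VALUES (1, 'duplicate key - this INSERT fails');
  COMMIT;   -- the failed transaction is rolled back; only this row is lost

  BEGIN;
  INSERT INTO import_target (id, payload) VALUES (2, 'second record');
  COMMIT;

The insert-or-update ("if error then update") pattern works the same way: each attempt runs in its own small transaction, so a failure only discards that one statement.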
On Sun, 25 May 2003, Wayne Armstrong wrote:

> Hi,
> I have been using PostgreSQL for about 3 months now (after about 15 years
> of using various other DBMSs).
>
> I love it :) but - (and you just knew that "but" was coming :) there is one
> thing that really causes me grief.
>
> It is the tendency for Postgres to roll back automatically on some errors.
>
> ...
>
> It actually makes more of a mess than leaving the decision to roll back
> under application control. If in this example the three "bad" records are
> scattered throughout the dataset, I could end up (if I ignore the error as
> I would for most other DBMSs) with a random number of records in the
> table, etc. etc.

I'm confused by this. You're complaining that PostgreSQL makes you either insert and commit a known number of records or roll back and insert none - and yet, in support of not applying that level of integrity checking, you seem to be saying that it is messy to skip that checking and end up with a random number of inserted tuples.

> Of course also, to do robust imports of the insert-on-error-update style
> available in most other DBMSs I have worked with, it is also necessary to
> issue a commit before each insert/update attempt.

Ok. So you're faced with a client wanting to do a bulk import of data, and 3 out of a million records fail the import. Do you suggest the client determine the business reasons why those records fail and hold up the entire import? Or do you import the rest, and then spend the next six months alternately trying to a) get the client to determine the business reasons for the failure of those records, because their absence isn't causing an obvious problem, and b) explain to the client exactly why what they are trying to do won't work because of the lack of those business records? Ok, so after a couple of months you get used to it and it becomes water off a duck's back, but it still gets seen as a failing on your part by the client. BTW, I do believe in giving a client what they want, in case you're wondering - just that sometimes you have to try to educate them in what they _really_ want.

> Example 2 - ODBC driver rollbacks:
>
> Here is a snippet from a Postgres log:
>   2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
>   2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n,
>     pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
>   2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
>   2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace n,
>     pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
>   2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
>   2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec
>
> The rollback here is generated by the ODBC driver to clear an error created
> by the driver itself incorrectly parsing the "select" of a subselect
> statement as a table name. The application is totally unaware that an error
> has occurred and a rollback has been issued. This is most likely to lead to
> data loss unless, basically, an application using ODBC is in autocommit
> mode or commits after every SQL statement issued.

Ok. I can see that would be a pain, but that is an ODBC issue. If the ODBC layer is misparsing something and issuing a rollback, then that has nothing to do with the server; you'll hit that whether or not a single error server side forces a transaction abort.
> This concerns me to the point where I would really recommend not using the
> parse statement option in the odbc driver (even though that reduces the
> odbc driver's ability to mimic a prepare, which is also problematic :)
> unless you are using autocommit.
>
> For any application complex enough to be working with autocommit turned off
> in the odbc driver, the auto rollback is a real problem.
>
> Please consider making the automatic abort/rollback feature an option ?

I've got to disagree with this strongly. Like pain, errors are there to tell you something is wrong, not to be ignored. Although, maybe you're only on about the ODBC thing. Even in that case I would suggest that the error is in ODBC not flagging the offending query as being 'in error' when it, rather than the backend, generated the error.

...Actually, rereading the log above, I see what you mean. ODBC generates a query which generates an error, thus aborting the current transaction and making ODBC issue a rollback because it noticed the error raised... However, my comment still stands: ODBC should show that error to the client application. Indeed, the standard behaviour of PostgreSQL is correct in this situation, and ODBC trying to be clever has broken it. Until the client app acknowledges that an error occurred, by causing the issuance of the rollback, all queries should raise additional errors, stopping the partial data commit you are seeing. If ODBC were not trying to be clever and hiding that fact from the application, then any partial data commit would be the application's fault.

Of course, I could have completely misunderstood your post, considering the hour and everything.

--
Nigel J. Andrews
On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
> During an import of 120 thousand records from an ISAM file system, say 3
> records fail integrity checks (files in non-database systems tend not to
> have referential integrity implemented on them except at an application
> level (meaning not at all :). The desired result is to drop the records
> failing integrity checks.

Yes, I have that problem too. I actually wrote a script that took an input file and automatically reissued the queries that succeeded but got rolled back. I called it quickloader.pl :)

Probably the biggest problem is that you can't use COPY to load the data. I've thought about loading into another table and transferring later, but I haven't sorted out the details.

> Example 2 - ODBC driver rollbacks:
>
> Here is a snippet from a Postgres log:
>   2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
>   2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n,
>     pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
>   2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
>   2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace n,
>     pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
>   2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
>   2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec
>
> The rollback here is generated by the ODBC driver to clear an error created
> by the driver itself incorrectly parsing the "select" of a subselect
> statement as a table name.

Sorry, that's an ODBC problem. For internal stuff it should use queries that can't fail like that. In this case it should use c.relname = 'select'; that won't abort if the table doesn't exist.

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
> - Samuel P. Huntington
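[For illustration, the safer catalog lookup Martijn suggests - matching on relname instead of casting an arbitrary string to regclass. A sketch only; the exact query the ODBC driver should issue is up to its maintainers.]

  -- the original form: the ::regclass cast itself raises an error
  -- (and aborts the current transaction) when no such relation exists
  select nspname
    from pg_namespace n, pg_class c
   where c.relnamespace = n.oid
     and c.oid = 'select'::regclass;

  -- the safer form: an ordinary join condition that simply returns
  -- zero rows when the relation does not exist, with no error raised
  select nspname
    from pg_namespace n, pg_class c
   where c.relnamespace = n.oid
     and c.relname = 'select';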
Martijn van Oosterhout wrote:
> On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
>
>> During an import of 120 thousand records from an ISAM file system, say 3
>> records fail integrity checks (files in non-database systems tend not to
>> have referential integrity implemented on them except at an application
>> level (meaning not at all :). The desired result is to drop the records
>> failing integrity checks.
>
> Yes, I have that problem too. I actually wrote a script that took an input
> file and automatically reissued the queries that succeeded but got rolled
> back. I called it quickloader.pl :)
>
> Probably the biggest problem is that you can't use COPY to load the data.
> I've thought about loading into another table and transferring later but I
> haven't sorted out the details.

The general idea would be to set up a table that has exactly the same structure as the final target table, but with no constraints at all. As long as your data passes all input functions you can even COPY it in.

Now you run check queries that show you all tuples in that staging table that would fail constraints on the final table. Fix those and you can do

    INSERT INTO final SELECT * FROM staging;

If step one fails because of data that doesn't pass the input functions of our data types, you have to go through another level of staging with a table that has text fields only and move it by explicit casting after cleaning up those problems.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
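[A minimal sketch of the staging-table workflow Jan describes. The table and column names, the referenced customers table, and the file path are all hypothetical, and a foreign key stands in for the integrity checks.]

  -- hypothetical referenced table
  CREATE TABLE customers (id integer PRIMARY KEY);

  -- final table with the real constraints
  CREATE TABLE final (
      id          integer PRIMARY KEY,
      customer_id integer REFERENCES customers(id),
      amount      numeric
  );

  -- staging table: same columns, no constraints, so the load itself
  -- cannot fail an integrity check
  CREATE TABLE staging (
      id          integer,
      customer_id integer,
      amount      numeric
  );

  COPY staging FROM '/tmp/import.dat';

  -- check query: rows that would violate the foreign key on the final table
  SELECT s.*
    FROM staging s
   WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = s.customer_id);

  -- after fixing or deleting the offenders, move everything across
  INSERT INTO final SELECT * FROM staging;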
** Reply to message from Jan Wieck <JanWieck@Yahoo.com> on Thu, 29 May 2003 20:25:56 -0400

Hi Jan,

Yup, staging tables are probably going to be the answer in this case. However, my point is, you shouldn't need workarounds for this (or for the more general insert-on-failure-update, on-failure-ignore type of logic used in a lot of import scenarios). Taking the decision whether or not to roll back a transaction (SQL error or no) away from the application, and enforcing it in the DBM, results in kludges, workarounds, and a generally less robust DBM.

Regards,
Wayne

> The general idea would be to set up a table that has exactly the same
> structure as the final target table, but with no constraints at all. As
> long as your data passes all input functions you can even COPY it in.
>
> Now you run check queries that show you all tuples in that staging table
> that would fail constraints on the final table. Fix those and you can do
>
>     INSERT INTO final SELECT * FROM staging;
>
> If step one fails because of data that doesn't pass the input functions
> of our data types, you have to go through another level of staging with
> a table that has text fields only and move it by explicit casting after
> cleaning up those problems.
>
> Jan
Wayne Armstrong wrote:
> Yup, staging tables are probably going to be the answer in this case.
> However, my point is, you shouldn't need workarounds for this (or for the
> more general insert-on-failure-update, on-failure-ignore type of logic used
> in a lot of import scenarios). Taking the decision whether or not to roll
> back a transaction (SQL error or no) away from the application, and
> enforcing it in the DBM, results in kludges, workarounds, and a generally
> less robust DBM.

You obviously didn't search the mail archives too much, did you? If INSERT ... SELECT ... fails halfway through due to a duplicate key error - how do you get rid of the tuples inserted so far?

This problem is well known, has often been discussed, and is not yet solvable because we do not have subtransactions. They are coming, but I don't know if the proposed implementation will cope well with 120 million single INSERTs each running in its own subtransaction.

And I disagree with your last statement. It doesn't result in a less robust DBM; it results in more complex applications that (if written by more primitive coders) will instruct a perfectly robust DBM to do the wrong thing to the data - from a business point of view.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
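[For illustration, a sketch of what application-controlled subtransactions look like using SQL-standard SAVEPOINT syntax, which PostgreSQL did not yet support at the time of this thread; the automatic per-statement variant Jan mentions would do the equivalent behind the scenes. The table is the hypothetical import_target from the earlier sketch.]

  CREATE TABLE import_target (id integer PRIMARY KEY, payload text);

  BEGIN;

  -- wrap each risky statement in its own savepoint; on failure, roll back
  -- to the savepoint instead of losing the whole transaction
  SAVEPOINT one_row;
  INSERT INTO import_target (id, payload) VALUES (1, 'good row');
  RELEASE SAVEPOINT one_row;

  SAVEPOINT one_row;
  INSERT INTO import_target (id, payload) VALUES (1, 'duplicate key - fails');
  ROLLBACK TO SAVEPOINT one_row;   -- discard only the failed statement

  SAVEPOINT one_row;
  INSERT INTO import_target (id, payload) VALUES (2, 'good row');
  RELEASE SAVEPOINT one_row;

  COMMIT;   -- the two good rows are kept, the bad one is discarded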
** Reply to message from Jan Wieck <JanWieck@Yahoo.com> on Fri, 30 May 2003 15:06:01 -0400

Hi,

Subtransactions here are likely as onerous as the solution that I bet gets used most of the time in this scenario, which is to commit after every insert. And it's not a matter of good or bad coding here. There are cases where the response to an SQL or insert error should be a rollback. There are as many cases where (given that the application is informed there was a problem) the problem can safely be ignored. Again, the decision should be up to the application, not the database manager. It is the DBMS's prerogative to reject the statement, not the transaction.

Regards,
Wayne

> You obviously didn't search the mail archives too much, did you? If
> INSERT ... SELECT ... fails halfway through due to a duplicate key
> error - how do you get rid of the tuples inserted so far?
>
> This problem is well known, has often been discussed, and is not yet
> solvable because we do not have subtransactions. They are coming, but I
> don't know if the proposed implementation will cope well with 120
> million single INSERTs each running in its own subtransaction.
>
> And I disagree with your last statement. It doesn't result in a less
> robust DBM; it results in more complex applications that (if written by
> more primitive coders) will instruct a perfectly robust DBM to do the
> wrong thing to the data - from a business point of view.
>
> Jan
Wayne,

Yes, I totally agree. Will you implement UNDO for that, and if so, how do you propose to get rid of the to-be-removed index tuples without the risk of deadlocking during UNDO? Or is there some other possibility we have overlooked so far?

With subtransactions I didn't mean application-controlled ones, but automatic ones that allow the system to return to the state at the beginning of the failing command. It's not just your little INSERT problem that needs to be covered. If you want the system to continue after an ERROR without rollback, you need to undo every triggered action done during this statement before the ERROR occurred: rule-generated additional queries run before the failing one, triggers, whatnot.

So please, do you have a context diff, do you have a detailed implementation proposal, or are you just waving hands telling us what you think the user-visible behaviour should be?

Jan

Wayne Armstrong wrote:
> Subtransactions here are likely as onerous as the solution that I bet gets
> used most of the time in this scenario, which is to commit after every
> insert. And it's not a matter of good or bad coding here. There are cases
> where the response to an SQL or insert error should be a rollback. There
> are as many cases where (given that the application is informed there was a
> problem) the problem can safely be ignored. Again, the decision should be
> up to the application, not the database manager. It is the DBMS's
> prerogative to reject the statement, not the transaction.

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
** Reply to message from Jan Wieck <JanWieck@Yahoo.com> on Fri, 30 May 2003 15:40:43 -0400

Hi Jan,

I'm grabbing source right now :) That ought to be enough to shut me up for a couple of months :)

Regards,
Wayne

> Wayne,
>
> yes, I totally agree. Will you implement UNDO for that, and if so, how do
> you propose to get rid of the to-be-removed index tuples without the risk
> of deadlocking during UNDO? Or is there some other possibility we have
> overlooked so far?
>
> With subtransactions I didn't mean application-controlled ones, but
> automatic ones that allow the system to return to the state at the
> beginning of the failing command. It's not just your little INSERT problem
> that needs to be covered. If you want the system to continue after an ERROR
> without rollback, you need to undo every triggered action done during this
> statement before the ERROR occurred: rule-generated additional queries run
> before the failing one, triggers, whatnot.
>
> So please, do you have a context diff, do you have a detailed
> implementation proposal, or are you just waving hands telling us what you
> think the user-visible behaviour should be?
>
> Jan
Try Microsoft SQL Server (tm). It will not roll back a transaction unless explicitly told to do so. It will also not inform you of any error, unless you specifically include code to check the status of the error code after every single DML statement. If you do so in a subtransaction, you also have to keep track of how far nested you are and whether a prior error has been detected. Stored procedures have to be written to return error status as well, and nested stored procedure calls likewise must keep track of the religiously checked error code status for earlier procedures and act (or not act) accordingly.

I really don't think you are going to talk anyone around here into that scenario....

Ian Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
iharding@tpchd.org
Phone: (253) 798-3549
Pager: (253) 754-0002

>>> "Wayne Armstrong" <wdarmst@bacchus.com.au> 05/30/03 12:18PM >>>
** Reply to message from Jan Wieck <JanWieck@Yahoo.com> on Fri, 30 May 2003 15:06:01 -0400

Hi,

Subtransactions here are likely as onerous as the solution that I bet gets used most of the time in this scenario, which is to commit after every insert. And it's not a matter of good or bad coding here. There are cases where the response to an SQL or insert error should be a rollback. There are as many cases where (given that the application is informed there was a problem) the problem can safely be ignored. Again, the decision should be up to the application, not the database manager. It is the DBMS's prerogative to reject the statement, not the transaction.

Regards,
Wayne
Hi all,

This is going the flame-war way. Too bad, because it's an important feature. I do miss the possibility to continue a transaction after a partial failure. There's a performance penalty in always checking data integrity before insert when you only get 3 records wrong out of a million. On the other hand, if the implementation of the statement undo (sub-transactions, undo, etc.) would also cause a performance penalty, then it's all the same, except the code would be cleaner...

It's kind of amusing to see all postgres developers getting irritated when this feature is requested over and over again. Understandable, given the complexity of the problem. Maybe you should put in a specific FAQ item: "continuing a transaction after error: don't ask for this feature unless you can contribute it!!!"

Comparing with MS SQL is also wrong; it's kind of "look, there are worse implementations than ours, so we're OK". Why don't you compare with the working implementations? I won't give any examples (they do exist), because in my experience postgres developers are allergic to this too.

Please understand that I do appreciate the effort all postgres developers have invested in this product, which I am using and am satisfied with. It's just not constructive to convince yourself and everybody that a workaround is better than the requested feature every time the feature is complex and hard to implement, or you simply don't use it...

Cheers,
Csaba.

On Fri, 2003-05-30 at 22:07, Ian Harding wrote:
> Try Microsoft SQL Server (tm). It will not roll back a transaction unless
> explicitly told to do so. It will also not inform you of any error, unless
> you specifically include code to check the status of the error code after
> every single DML statement. If you do so in a subtransaction, you also have
> to keep track of how far nested you are and whether a prior error has been
> detected. Stored procedures have to be written to return error status as
> well, and nested stored procedure calls likewise must keep track of the
> religiously checked error code status for earlier procedures and act (or
> not act) accordingly.
>
> I really don't think you are going to talk anyone around here into that
> scenario....
>
> Ian Harding
> Programmer/Analyst II
> Tacoma-Pierce County Health Department
Hello All,

I want to run an archival process which will delete data from one table and insert it into another based on date criteria. Can anybody please guide me on the best way of doing it?

Regards,
Shoaib
Hi,

On Mon, 2 Jun 2003, shoaib wrote:
> I want to run an archival process which will delete data from one
> table and insert it into another based on date criteria.

If you've already created the new table, then the following might be useful for you:

    INSERT INTO new_table SELECT * FROM old_table WHERE date_condition;

If you haven't created the table, then use:

    CREATE TABLE new_table AS SELECT * FROM old_table WHERE date_condition;

Regards,
--
Devrim GUNDUZ
devrim@gunduz.org
devrim.gunduz@linux.org.tr
http://www.tdmsoft.com
http://www.gunduz.org
INSERT INTO the target table with a SELECT ... FROM the source table using the filtered date criteria, then DELETE FROM the source table using the same filtered date criteria.

Sreedhar

----- Original Message -----
From: "shoaib" <shoaibm@vmoksha.com>
To: "'Postgres general mailing list'" <pgsql-general@postgresql.org>
Sent: Monday, June 02, 2003 4:01 PM
Subject: [GENERAL] Bulk copying of data from one table to another

> Hello All,
>
> I want to run an archival process which will delete data from one
> table and insert it into another based on date criteria.
>
> Can anybody please guide me on the best way of doing it?
>
> Regards
>
> Shoaib
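[Putting the two answers together, a minimal sketch of the archival step wrapped in a single transaction, so rows cannot be deleted unless they were copied. Table and column names, and the cutoff date, are hypothetical.]

  BEGIN;

  -- copy the old rows into the archive table
  INSERT INTO archive_table
  SELECT * FROM live_table
   WHERE created < '2003-01-01';

  -- then remove them from the live table, using the same criteria
  DELETE FROM live_table
   WHERE created < '2003-01-01';

  COMMIT;   -- both statements succeed, or neither takes effect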
I think it's always appropriate to compare PostgreSQL to MS SQL Server, since it is a direct competitor with a huge market/mindshare. The issue being discussed is one of the main joys of my transition to PG. I don't have to worry about silent errors corrupting my data over a series of months like I did before.

Just because a feature is 'cool' doesn't mean it has to be embraced by PostgreSQL. There are many examples of 'neat' features in MySQL that have been requested here and rejected as well.

I was serious about suggesting MS SQL Server, just as I would be about suggesting MySQL to someone who thought one of their 'features' was so neat it was indispensable. I would never imply that the MS SQL Server implementation of this feature is wrong; it is just different, and if you want to spend your time coding around its lackadaisical error handling, you can, just like MySQL users can spend their time reinventing referential integrity in PHP.

I prefer 'spirited discussion' to 'flame war' ;^)

Ian Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
iharding@tpchd.org
Phone: (253) 798-3549
Pager: (253) 754-0002

>>> Csaba Nagy <nagy@ecircle-ag.com> 06/02/03 02:16AM >>>
Hi all,

This is going the flame-war way. Too bad, because it's an important feature. I do miss the possibility to continue a transaction after a partial failure. There's a performance penalty in always checking data integrity before insert when you only get 3 records wrong out of a million. On the other hand, if the implementation of the statement undo (sub-transactions, undo, etc.) would also cause a performance penalty, then it's all the same, except the code would be cleaner...

It's kind of amusing to see all postgres developers getting irritated when this feature is requested over and over again. Understandable, given the complexity of the problem. Maybe you should put in a specific FAQ item: "continuing a transaction after error: don't ask for this feature unless you can contribute it!!!"

Comparing with MS SQL is also wrong; it's kind of "look, there are worse implementations than ours, so we're OK". Why don't you compare with the working implementations? I won't give any examples (they do exist), because in my experience postgres developers are allergic to this too.

Please understand that I do appreciate the effort all postgres developers have invested in this product, which I am using and am satisfied with. It's just not constructive to convince yourself and everybody that a workaround is better than the requested feature every time the feature is complex and hard to implement, or you simply don't use it...

Cheers,
Csaba.