Automatic error rollback can be harmful - Mailing list pgsql-general
From | Wayne Armstrong |
---|---|
Subject | Automatic error rollback can be harmful |
Date | |
Msg-id | 200305241449.h4OEnmbv002338@mail.bacchus.com.au |
In response to | Re-Create Table make Faster. ("Yudha Setiawan" <inas_husband@bonbon.net>) |
List | pgsql-general |
Hi, I have been using PostgreSQL for about 3 months now (after about 15 years of using various commercial databases). I love it :) but (and you just knew that "but" was coming :) there is one thing that really causes me grief. It is the tendency for Postgres to roll back automatically on some errors. What this leads to is the need for an application to commit more frequently than is really desirable.

Example 1. During an import of 120 thousand records from an ISAM file system, say 3 records fail integrity checks (files in non-database systems tend not to have referential integrity implemented on them except at an application level). The desired result is to drop the records failing the integrity checks. Importing into DB2 or Oracle, say, I have the option to ignore the referential integrity errors (I get the error code on the insert anyway) and continue with the insert of the 120 thousand - 3 records. In Postgres, I either have to commit after every record write, or guarantee the "cleanness" of the data before I begin the import - which is sometimes difficult given the data sources I may be importing from.

Worse, if the intention behind the automatic rollback is to guarantee data purity (from the PostgreSQL manual, 10.4.1):

"Turn off autocommit and just do one commit at the end. (In plain SQL, this means issuing BEGIN at the start and COMMIT at the end. Some client libraries may do this behind your back, in which case you need to make sure the library does it when you want it done.) If you allow each insertion to be committed separately, PostgreSQL is doing a lot of work for each record added. An additional benefit of doing all insertions in one transaction is that if the insertion of one record were to fail then the insertion of all records inserted up to that point would be rolled back, so you won't be stuck with partially loaded data."

then it actually makes more of a mess than leaving the decision whether or not to roll back under application control.
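The commit-per-record workaround described above can be sketched as follows. This is a hypothetical illustration using Python's stdlib sqlite3 rather than PostgreSQL and its client libraries (the table names and records are invented), but the pattern is the one the post laments: commit after every insert so that one record failing an integrity check does not take the rest of the batch with it.

```python
import sqlite3

# Hypothetical schema: child rows must reference an existing parent.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this per connection
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id))""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.commit()

# Incoming records: parent_id 99 has no matching parent row, so that
# insert fails its referential-integrity check.
records = [(1, 1), (2, 99), (3, 1)]
loaded, dropped = 0, 0
for rec in records:
    try:
        conn.execute("INSERT INTO child VALUES (?, ?)", rec)
        conn.commit()  # commit per record: only way to keep the good rows
        loaded += 1
    except sqlite3.IntegrityError:
        conn.rollback()  # only this one record is lost
        dropped += 1

count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

The cost the post is complaining about is exactly that `conn.commit()` inside the loop: one transaction per record instead of one transaction for the whole import.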
If in this example the three "bad" records are scattered throughout the dataset, I could end up (if I ignore the error, as I would do for most other DBMSs) with a random number of records in the table, etc. etc. Of course, to do robust imports in the insert-ifError-update style available in most other (commercial) DBMSs, it is also necessary to issue a commit before each insert/update attempt.

Example 2. ODBC driver rollbacks. Here is a snippet from a Postgres log:

2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec

The rollback here is generated by the ODBC driver to clear an error created by the ODBC driver itself incorrectly parsing the select from a subselect statement as a table name. The application is totally unaware that an error has occurred and a rollback has been issued. This is most likely to lead to data loss unless, basically, an application using ODBC is in autocommit mode or commits after every SQL statement issued. This concerns me to the point where I would really recommend not using the parse statement option in the ODBC driver (even though that reduces the ODBC driver's ability to mimic a prepare, which is also problematic :) unless you are using autocommit.

Thoughts and comments?

Regards,
Wayne Armstrong
Bacchus Management Systems
http://www.bacchus.com.au
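The insert-ifError-update idiom mentioned above can be sketched like this. Again this is a hypothetical illustration with Python's stdlib sqlite3 (the `stock` table and `upsert` helper are invented names); the point it demonstrates is why, under the automatic-rollback behaviour the post describes, a commit is needed before each attempt - otherwise the failed INSERT would discard all earlier uncommitted work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (part_id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES (1, 10)")
conn.commit()

def upsert(conn, part_id, qty):
    """Insert; if the row already exists, update it instead.

    Committing before each attempt mirrors the workaround the post
    describes: a failed INSERT must not take earlier work with it.
    """
    conn.commit()  # make sure prior work survives a failed INSERT
    try:
        conn.execute("INSERT INTO stock VALUES (?, ?)", (part_id, qty))
    except sqlite3.IntegrityError:
        # Duplicate primary key: fall back to an UPDATE of the same row.
        conn.execute("UPDATE stock SET qty = ? WHERE part_id = ?",
                     (qty, part_id))
    conn.commit()

upsert(conn, 1, 25)  # existing row: INSERT fails, falls through to UPDATE
upsert(conn, 2, 5)   # new row: plain INSERT succeeds
rows = sorted(conn.execute("SELECT part_id, qty FROM stock").fetchall())
```

With per-statement commits like this, a hidden ROLLBACK issued behind the application's back (as in the ODBC log above) can only ever lose the single statement in flight, not a batch of committed work.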