Thread: Problem with aborting entire transactions on error
Hello,

As I read while googling the web, many people complained about this before. Couldn't it be made optional (can be even with "default ON")? I understand, that there are situations, when it is a must - for example, when the rest of queries rely on the result of first ones - but there are numerous situations, when just skipping a faulty query is all we need.

A simple - but very common - example: I wanted to perform a really large number of inserts - using a transaction, to make it faster - while being sure the duplicate entries will be skipped. Of course, this job will be done best by the server itself, which is keeping an eye on the "primary key" of the table. Unfortunately: not during a transaction! Any "dupe" will trash thousands of other (proper) entries immediately.

Why is this? My guess is, there is some kind of logic in the code, like this:

if { no error during query } {
    do it
} else {
    withdraw this one
    rollback entire transaction
}

Therefore my request - and, as I saw, of many others - would be just to introduce a little change:

if { no error during query } {
    do it
} else {
    withdraw this one
    if { ROLLBACK_ON_ERROR } {
        rollback entire transaction
    }
}

(if there's no ROLLBACK_ON_ERROR - it should carry on with the remaining queries)

Is it really so problematic to introduce such a code change, allowing the users to control this behaviour? Yes, I read about using "savepoints" - but I think we agree, it's just a cumbersome workaround - and not a real solution, like my proposal. All we need is either a variable to set, or a command, that will allow to modify the present functionality in the way described above.

--
regards,
Zbigniew
On Dec 9, 2012, at 22:20, Zbigniew <zbigniew2011@gmail.com> wrote:

> Hello,
>
> As I read while googling the web, many people complained about this before. Couldn't it be made optional (can be even with "default ON")? I understand, that there are situations, when it is a must - for example, when the rest of queries rely on the result of first ones - but there are numerous situations, when just skipping a faulty query is all we need.
>
> A simple - but very common - example: I wanted to perform really large number of inserts - using transaction, to make it faster - while being sure the duplicate entries will be skipped.

And what happens if one of those inserts errors out for reasons other than a duplicate?

> Of course, this job will be done best by server itself, which is keeping an eye on "primary key" of the table. Unfortunately: not during a transaction! Any "dupe" will trash thousands other (proper) entries immediately.
>
> Why is this? My guess is, there is kind of logic in the code, like this:
>
> if { no error during query } {
> do it
> } else {
> withdraw this one
> rollback entire transaction
> }
>
> Therefore my request - and, as I saw, of many others - would be just to introduce a little change:
>
> if { no error during query } {
> do it
> } else {
> withdraw this one
> if { ROLLBACK_ON_ERROR } {
> rollback entire transaction
> }
> }
>
> (if there's no ROLLBACK_ON_ERROR - it should carry on with the remaining queries)
>
> Is it really so problematic to introduce such code change, allowing the users to control this behaviour?

Since the current belief is that such behavior is unwise, no one is willing to give their time to doing so. I'm not sure whether that means that if you supplied such a patch it would be rejected. Since there is a correct way to solve the duplicates scenario (see below) without savepoints, you may wish to supply another example if you want to try and convince people.

> Yes, I read about using "savepoints" - but I think we agree, it's just cumbersome workaround - and not real solution, like my proposal. All we need is either a variable to set, or a command, that will allow to modify the present functionality in the way described above.

The true solution is to insert into a staging table that allows duplicates (but ideally contains other constraints that you do care about) and then de-dupe and insert into your final table.

> --
> regards,
> Zbigniew

This may be an undesirable instance of the database not allowing you to shoot yourself in the foot, but as ignoring errors is bad practice, motivation to allow it is small. You should always be able to import the data without errors into an explicitly defined table and then write queries to convert between the input format and the final format - explicitly making and coding the necessary translation decisions and procedures. In that way you always know that your import routine is working as expected, and are not guessing whether it was the known error condition or an unknown condition that caused a record to go missing.

David J.
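A minimal sketch of the staging-table approach described above; the items and items_staging tables, their columns, and the COPY source path are all hypothetical, not anything from the thread:

-- Final table, whose primary key must not be violated.
CREATE TABLE items (item_id integer PRIMARY KEY, payload text);

-- Staging table with no uniqueness constraint, so the bulk load itself never errors out.
CREATE TABLE items_staging (item_id integer, payload text);

-- Load everything into the staging table first (COPY, multi-row INSERT, etc.).
COPY items_staging FROM '/tmp/items.csv' WITH (FORMAT csv);

-- De-dupe and move into the final table in one statement: skip keys that
-- already exist, and collapse duplicates within the load itself.
INSERT INTO items (item_id, payload)
SELECT DISTINCT ON (item_id) item_id, payload
FROM items_staging s
WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.item_id = s.item_id);

Without an ORDER BY, DISTINCT ON keeps an arbitrary row from each group of duplicates in the load; add one if it matters which copy wins.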
Please reply-to the list, not just myself.

> -----Original Message-----
> From: Zbigniew [mailto:zbigniew2011@gmail.com]
> Sent: Monday, December 10, 2012 6:26 AM
> To: David Johnston
> Subject: Re: [GENERAL] Problem with aborting entire transactions on error
>
> 2012/12/10, David Johnston <polobo@yahoo.com>:
>
> >> A simple - but very common - example: I wanted to perform really large number of inserts - using transaction, to make it faster - while being sure the duplicate entries will be skipped.
> >
> > And what happens if one of those inserts errors out for reasons other than a duplicate?
>
> Then maybe there could be something like "DON'T_ROLLBACK_ON_DUPLICATE_ERROR" instead of "more general" directive?
>
> > Since current belief is that such behavior is unwise
>
> No idea, why common belief is "allowing the user to decide is unwise". Is the "common assumption", that the users of Postgres are children 4-5 years old?

If it requires coding something to provide the user the desired flexibility then whether or not such flexibility is wise or unwise is going to go into said decision. Also, since you are begging others to solve your own problems you are in many ways behaving as a child. Especially with an Open Source project like PostgreSQL adults are welcomed and encouraged to solve their own problems by altering the source code and, in the spirit of community, contributing it back to the project (contribution also means you do not have to maintain your own custom version of the software). Please do not take this as a personal affront; I just simply wanted to extend on the analogy that you put forward. I fully recognize the need to ask the community for help myself (I am also not a coder) but in doing so I know I am imposing on the charity of others and as such need to make best efforts to understand and respect their point-of-view and readily acknowledge my own ignorance in the matter.

> > no one is willing to give their time to doing so.
>
> Pay attention, that actually my (and not just mine) request isn't to write some new, sophisticated code - but just a little change in existing code instead.

If this is your belief then do it yourself. Your response to my question above already indicates that what you initially thought was very simple had at least one complication you did not understand. There are likely others.

> > I'm not sure whether that means that if you supplied such a patch it would be rejected. Since there is a correct way to solve the duplicates scenario (see below) without savepoints you may wish to supply another example if you want to try and convince people.
>
> I'm afraid, I don't understand. How many examples can I supply for the problem, like:
> 1. I need only to avoid duplicated rows (the ones with duplicated primary key).
> 2. I don't need any "stronger" reaction for such error - since I know, there won't be any troubles.
> 3. Since server controls primary key, all I need is to make it a bit less "nervous" about such attempt to insert dupe, only dropping this one (and not everything else together with single faulty entry)?
>
> Whatever will I supply, the scenario basically will be the same - I just would like to avoid such "throwing the baby out with the bathwater".
>
> > The true solution is to insert into a staging table that allows duplicates (but ideally contains other constraints that you do care about) and then de-dupe and insert into your final table.
>
> ...which means twice as much work for the server (AT LEAST twice). When all I needed was just to skip dupes, not aborting the transaction. And it could be perfectly done.
>
> > This may be an undesirable instance of the database not allowing you to shoot yourself in the foot but as ignoring errors is bad practice motivation to allow it is small.
>
> I'm NOT proposing to just ignore errors - read carefully, please - I'm writing about letting the user decide what impact an error should have:
> - only dropping the "faulty" query, or...
> - ...aborting the entire transaction (which is undesirable in many situations)

Except for ETL, aborting is almost universally the desired behavior. I'll grant you your arguments for why, during ETL, this feature could be useful (as opposed to performing duplicate work), but I still think that a blind "ignore duplicates and leave the first instance in place" ETL policy is a bad one in almost all cases.

> > You should always be able to import the data without errors into an explicitly defined table and then write queries to convert between the input format and the final format - explicitly making and coding the necessary translation decisions and procedures. In that way you always know that your import routine is working as expected, and are not guessing whether it was the known error condition or an unknown condition that caused a record to go missing.
>
> Well, looking for solution yesterday I've found several messages of other people, who complained about the same: that simple duplicate record is aborting whole transaction. This makes me think, that indeed this is quite common case, when limiting the "server response" (and NOT "ignoring the error" completely, which could mean insertion of the duplicated row), could be very useful. There are situations, when all we need is just avoid dupes - that's why we have primary keys - and "too much protection" makes the work unnecessarily difficult.
> --
> regards,
> Zbigniew

It is easy to complain but apparently no one feels strongly enough to either code a solution themselves or sponsor someone else to do so. As I have not seen any core coders respond I cannot be certain whether there are underlying technical issues preventing this, but there is at the least a resource allocation concern since neither code donors nor those sponsored by clients have made the time to implement this "simple feature". It may be more productive, not being a core coder yourself, to simply ask why such a feature has not been implemented given the apparent demand instead of asserting (from ignorance) that such an implementation should be very simple to accomplish. The latter approach (as well as your response to me - personally) is much more confrontational and contrary (the direct reply at least) to the posted etiquette for this community.

David J.
2012/12/10, David Johnston <polobo@yahoo.com>: > Please reply-to the list, not just myself. Sorry. > If it requires coding something to provide the user the desired flexibility > then whether or not such flexibility is wise or unwise is going to go into > said decision. So? > Also, since you are begging others to solve your own > problems Now, not just a problem of mine - but of many others as well, as I wrote. And I was "begging" already: "please, read carefully" (I meant: with understanding). > you are in many ways behaving as a child. When someone doesn't design his/her clothes on his own, doesn't sew boots, bake bread etc. - "behaves like a child"? > Especially with an Open > Source project like PostgreSQL adults are welcomed and encouraged to solve > their own problems by altering the source code and, in the spirit of > community, contributing it back to the project (contribution also means you > do not have to maintain your own custom version of the software). If I knew the "innards" of Postgres as good, as its present developers - maybe I could alter the code. But since I don't know them - I'm perfectly sure, that the ones, who created the code, are able to introduce such improvement during about 1/20 of the time, I had to spend trying to do it by myself. > Please do not take this as a personal affront; Yes, I took this as personal affront, because IT IS a personal affront. Do you really think, that your "please, do not take" is changing this? > It is easy to complain but apparently no one feels strongly enough to > either code a solution themselves or sponsor someone else to do so. From what I see, the development is going on - then my conclusion is, that there are people "feeling strong enough" - and therefore I wanted to let them know, where they got wrong. > As I have not > seen any core coders respond I cannot be certain whether there are > underlying technical issues preventing this but there is at the least a > resource allocation concern since neither code donors nor those sponsored > by clients have made the time to implement this "simple feature". It may be > more productive, not being a core coder yourself, to simply ask why such a > feature has not been implemented given the apparent demand instead of > asserting (from ignorance) that such an implementation should be very > simple > to accomplish. The later approach (as well as your response to me - > personally) is much more confrontational and contrary (the direct reply at > least) to the posted etiquette for this community. No idea, what made you so upset with this "direct response" - just clicked "Reply", and forgot to check the recipient's address. Actually, the sender address should be pgsql-general@postgresql.org already. I've got a feeling, that all you have to say, is: "if this is the way it is, it means, that this is good, and shouldn't be changed". You are unable to explain, why - just the "common belief" etc. is your rationale (while it's not that "common" at all, as I wrote). You have no arguments against, but: "code it by yourself", "it must be perfect, when it works so", and so on. According to you the development can be stopped at this time, since everything is perfect. -- regards, Zbigniew
Zbigniew, 10.12.2012 04:20: > Yes, I read about using "savepoints" - but I think we agree, >it's just cumbersome workaround - and not real solution, I disagree. It might be a bit cumbersome, but it *is* a proper solution to the problem - not a workaround. Thomas
On Tue, Dec 11, 2012 at 1:15 AM, David Johnston <polobo@yahoo.com> wrote: >> -----Original Message----- >> From: Zbigniew [mailto:zbigniew2011@gmail.com] >> Sent: Monday, December 10, 2012 6:26 AM >> To: David Johnston >> Subject: Re: [GENERAL] Problem with aborting entire transactions on error >> >> No idea, why common belief is "allowing the user to decide is unwise". >> Is the "common assumption", that the users of Postgres are children >> 4-5 years old? > > If it requires coding something to provide the user the desired flexibility > then whether or not such flexibility is wise or unwise is going to go into > said decision. There's a fundamental difference here between PostgreSQL and, for example, MySQL. With PG, you get a well-built database and your application is permitted to access it in accordance with the database admin's rules. With MySQL, your application stores data. There's a vastly different focus there, and that's why you feel that PostgreSQL is treating you like a child. The fact is that your application is NOT in command here; the database is. You cannot violate the database's rules (eg constraints, permissions lists, etc) in the application. That means that you can be confident that those constraints will always be followed! The other part of the problem is that you're using transactions for something other than transactional integrity. You're batching up transactions in order to improve performance. That's certainly possible, but you're using the database "oddly", so you can't expect it to behave according to your modified set of rules. That's why the transaction functions like a transaction, not like a commit-point. I second the recommendation to look into savepoints. Chances are that's all you need to do to make this work, but I wouldn't gamble my job/reputation on it being perfect; so... have backups. Make sure you can start the whole job over if need be. ChrisA
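For reference, a minimal sketch of the savepoint pattern being recommended here, using a hypothetical items table with a primary key on item_id; each insert is wrapped in a subtransaction so a duplicate-key failure discards only that one row:

BEGIN;

SAVEPOINT one_row;
INSERT INTO items (item_id, payload) VALUES (1, 'first');
RELEASE SAVEPOINT one_row;

SAVEPOINT one_row;
INSERT INTO items (item_id, payload) VALUES (1, 'dupe');  -- unique violation
ROLLBACK TO SAVEPOINT one_row;                            -- undo only this row

SAVEPOINT one_row;
INSERT INTO items (item_id, payload) VALUES (2, 'second');
RELEASE SAVEPOINT one_row;

COMMIT;  -- rows 1 and 2 are committed; the duplicate was skipped

In application code the ROLLBACK TO SAVEPOINT is issued from the error handler; the per-row savepoints do add measurable overhead, which is part of the objection raised later in the thread.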
On 12/10/2012 06:52 AM, Zbigniew wrote: > 2012/12/10, David Johnston <polobo@yahoo.com>: > > > I've got a feeling, that all you have to say, is: "if this is the way > it is, it means, that this is good, and shouldn't be changed". You are > unable to explain, why - just the "common belief" etc. is your > rationale (while it's not that "common" at all, as I wrote). You have > no arguments against, but: "code it by yourself", "it must be perfect, > when it works so", and so on. > > According to you the development can be stopped at this time, since > everything is perfect. Nothing is ever perfect when it comes to software. An error is an error, sooner or later you will have to deal with it. What you are facing is a project decision to have the end user deal with the problem sooner rather than later, one that I agree with by the way. As to why that is so, experience has shown delaying the solution increases the difficulty of the fix. I have been reminded of that lately while working on some data stored in another Open Source database that allows one to run with the safeties off the foot gun. The original coder went the expedient route and now that data is a mess. Some of it can be cleaned up, but an unknown amount is lost or defective. At this point in time there is insufficient information left to decide on the validity of portions of the data. The 'common belief' then is that it is better to make the end user aware of issues from the start and force a solution, for the sake of data integrity. If you follow the list regularly you will find that to be the case. -- Adrian Klaver adrian.klaver@gmail.com
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Zbigniew
> Sent: Monday, December 10, 2012 9:53 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Problem with aborting entire transactions on error
>
> 2012/12/10, David Johnston <polobo@yahoo.com>:
>
> > Please reply-to the list, not just myself.
>
> Sorry.
>
> > If it requires coding something to provide the user the desired flexibility then whether or not such flexibility is wise or unwise is going to go into said decision.
>
> So?
>
> > Also, since you are begging others to solve your own problems
>
> Now, not just a problem of mine - but of many others as well, as I wrote. And I was "begging" already: "please, read carefully" (I meant: with understanding).
>
> > you are in many ways behaving as a child.
>
> When someone doesn't design his/her clothes on his own, doesn't sew boots, bake bread etc. - "behaves like a child"?

You are taking the analogy to an absurd extreme. That said, you are not generally asking Nike or the local baker to customize their recipes and then give you the end result for free.

> > Especially with an Open Source project like PostgreSQL adults are welcomed and encouraged to solve their own problems by altering the source code and, in the spirit of community, contributing it back to the project (contribution also means you do not have to maintain your own custom version of the software).
>
> If I knew the "innards" of Postgres as good, as its present developers - maybe I could alter the code. But since I don't know them - I'm perfectly sure, that the ones, who created the code, are able to introduce such improvement during about 1/20 of the time, I had to spend trying to do it by myself.
>
> > Please do not take this as a personal affront;
>
> Yes, I took this as personal affront, because IT IS a personal affront. Do you really think, that your "please, do not take" is changing this?

I can hope. Given that e-mail is a text-only medium it makes sense to at least attempt to put some comments into a proper context explicitly, since body language and tone-of-voice cannot be used to convey such.

> > It is easy to complain but apparently no one feels strongly enough to either code a solution themselves or sponsor someone else to do so.
>
> From what I see, the development is going on - then my conclusion is, that there are people "feeling strong enough" - and therefore I wanted to let them know, where they got wrong.

You generalize where I am specifically referring to this explicit feature.

> > As I have not seen any core coders respond I cannot be certain whether there are underlying technical issues preventing this but there is at the least a resource allocation concern since neither code donors nor those sponsored by clients have made the time to implement this "simple feature". It may be more productive, not being a core coder yourself, to simply ask why such a feature has not been implemented given the apparent demand instead of asserting (from ignorance) that such an implementation should be very simple to accomplish. The latter approach (as well as your response to me - personally) is much more confrontational and contrary (the direct reply at least) to the posted etiquette for this community.
> > No idea, what made you so upset with this "direct response" - just clicked > "Reply", and forgot to check the recipient's address. > Actually, the sender address should be pgsql-general@postgresql.org > already. > > I've got a feeling, that all you have to say, is: "if this is the way it is, it means, > that this is good, and shouldn't be changed". You are unable to explain, why - > just the "common belief" etc. is your rationale (while it's not that "common" > at all, as I wrote). You have no arguments against, but: "code it by yourself", > "it must be perfect, when it works so", and so on. > > According to you the development can be stopped at this time, since > everything is perfect. No. But everything is the way it is for some reason. "This would be a useful feature" and "why do not things work this way" both are excellent approaches to instigating change and obtaining the understanding as to the "why" of how things are. "This is a bug and the solution is simple - though I have never actually coded the PostgreSQL" is a less effective way of doing so. I've already agreed that your idea has merit (maybe not much, but some). However, since it is not that way currently (and the problem is not a new one) there likely exists some underlying reason said feature has not been implemented. I offer, via induction and some experience, that said feature is not in high demand, not as simple as you claim, and in many ways is undesirable - thus no one has taken the time to implement it. You either need a more convincing (and novel) argument or you need to contact someone like PGExperts, 2ndQuadrant, EnterpriseDB (for example and none of whom I am affiliated with) and pay them to implement the feature for you (likely much less expensive than learning C and the codebase and doing it yourself). Given that we are both ignorant on the technical aspects of said feature, and our conversation is getting too pointed, I suggest that you simply wait for a response from someone more qualified and in the meantime (since even if someone coded this feature today it would not be in production release until 9.3) code your ETL according to the best-practice I suggested earlier. If the staging table is too large a burden AND no one volunteers to implement your feature your options have been put before you. David J.
Chris Angelico, 10.12.2012 16:47: > The other part of the problem is that you're using transactions for > something other than transactional integrity. You're batching up > transactions in order to improve performance. That's certainly > possible, but you're using the database "oddly", so you can't expect > it to behave according to your modified set of rules. That's why the > transaction functions like a transaction, not like a commit-point. Just as a side-note: Oracle also allows you to commit a "transaction" even if some of the statements failed e.g. the commit of 20 inserts would only "commit" the ones that were successful. Regards Thomas
> if { no error during query } {
> do it
> } else {
> withdraw this one
> if { ROLLBACK_ON_ERROR } {
> rollback entire transaction
> }
> }
I fear that this is nowhere near as simple as it's been made out to be, and without looking at the code, I would imagine that this would touch so many different areas that a simple implementation is just not possible.
Ask yourself: how do other DBMSs handle this?
MySQL has "INSERT IGNORE"
MSSQL has PRIMARY KEY WITH IGNORE_DUP_KEY
Oracle has a couple of ways of doing this, one of which logs the data (using a DBMS package) the other is with MERGE.
So, maybe you need to have a look at a Postgres-specific way of doing this, seeing as other DBMSs can't seem to agree? Maybe shorter savepoints, so you lose only partial data?
Also, you might want to read some of the discussions here, as in my opinion, what you're looking for is best solved with MERGE.
http://wiki.postgresql.org/wiki/SQL_MERGE
Cheers
On Sun, Dec 9, 2012 at 8:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:

> [...]
> Is it really so problematic to introduce such code change, allowing the users to control this behaviour? Yes, I read about using "savepoints" - but I think we agree, it's just cumbersome workaround - and not real solution, like my proposal. All we need is either a variable to set, or a command, that will allow to modify the present functionality in the way described above.

Databases aren't as simple as you imagine. What you're basically asking for from the pg engine is for it to enclose every insert into a subtransaction (i.e. set a savepoint) to check for an error. The overhead cost of doing this in pgsql is not cheap. Now what I would use for this would be some tool written to interface with pgsql and do the job for you, like pgloader. It gives you the simple interface to do what you're asking.

The idea that everything like this belongs in the database, particularly coded by the core developers, isn't how things generally get done in pgsql. We've got a small sharp team of developers working on BIG things, like covering indexes, HOT updates, query planner optimizations and so on. Fixing this problem in core probably doesn't tickle anyone's itch, and there are so many ways to work around it in pg that would not work as well in other dbs, like loading to a temp table and updating in one big query. That method might run an oracle db out of rollback space etc, but on pg it's the way it's done.

The thing that often happens is that people learn to do things a certain way on another db and then think that that experience should translate to postgresql straight across. It often does not, because postgresql is so different in many ways.

Keep in mind with pg transactions you can put almost anything into a transaction, and the only real limit to the size of a transaction you can run in it is the size of your hard drives. This alone lets you approach problems from a very different perspective than on most other dbs.

Lastly it's a free (as in beer, as in freedom) database.
People work on it as much out of love as for money, and if you want to get traction for someone else to volunteer their time to make a change, YOU have to make the case. I'm not a pg hacker. But I've made a case several times and gotten someone to make a change a few times for me. If you think you have a case here you'll have to convince a lot of people who don't think you do.
2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>:

> Databases aren't as simple as you imagine. What you're basically asking for from the pg engine is for it to enclose every insert into a subtransaction (i.e. set a savepoint) to check for an error.

No, I'm not.

It's able (I mean Postgres) to detect an error, and it's able to react to an error. "What I'm basically asking for" is an option to change its reaction; that's all. Just to change a bit the thing already existing - not to add a new one.

> The overhead cost of doing this in pgsql is not cheap. Now what I would use for this would be some tool written to interface with pgsql and do the job for you, like pgloader. It gives you the simple interface to do what you're asking.

You see? The people had to create sophisticated loaders, just to work around the problem.

> The idea that everything like this belongs in the database,

But why "everything"? My question was about one specific thing, and I quite clearly (I think so...) explained, why. Not to "ignore errors", not to "add savepoints" - just to proceed with this, which it's already doing (skipping the faulty query), but NOT aborting the whole transaction - as an optional behaviour.

> particularly coded by the core developers isn't how things generally get done in pgsql. We've got a small sharp team of developers working on BIG things, like covering indexes, HOT updates, query planner optimizations and so on. Fixing this problem in core probably doesn't tickle anyone's itch, and there are so many ways to work around it in pg that would not work as well in other dbs, like loading to a temp table and updating in one big query. That method might run an oracle db out of rollback space etc, but on pg it's the way it's done.

Just try to make a Google search on "postgresql abort transaction" - you'll see how much trouble this solution makes for the users. Pay attention, that I didn't even add "...on duplicate". How much time the people are spending:
1. To find out, what actually is going on ("such thing was working on X and Y"),
2. To look for a solution (non-existing), and then...
3. ...to work around the problem with "savepoints" or another way.

All this can be spared to users just by adding an OPTION. This option can be "off" by default. There can be even a description in the manual: "We discourage...", "...use it only, when you know, what are you doing...", "...you have been warned". But I bet, many users would appreciate this. Besides: as you noticed (and the others too), several other servers allow this. Then maybe really it's not that evil thing, as you think?

> The thing that often happens is that people learn to do things a certain way on another db and then think that that experience should translate to postgresql straight across. It often does not, because postgresql is so different in many ways.
>
> Keep in mind with pg transactions you can put almost anything into a transaction, and the only real limit to the size of a transaction you can run in it is the size of your hard drives. This alone lets you approach problems from a very different perspective than on most other dbs.

Yes, and exactly because of this I'm pretty sure, that you understand, why I don't want to trash such a long transaction. Why I would want to commit it - just skipping the dupes.

> Lastly it's a free (as in beer, as in freedom) database. People work on it as much out of love as for money, and if you want to get traction for someone else to volunteer their time to make a change, YOU have to make the case. I'm not a pg hacker. But I've made a case several times and gotten someone to make a change a few times for me. If you think you have a case here you'll have to convince a lot of people who don't think you do.

Yes, I agree: it costs implementation time. But pay attention: a few afternoons (maybe I'm wrong, but I doubt this could take more for experienced PG developers) of your team - and on the other hand, the time wasted by thousands of people, who are looking for how to do the things that were so simple while using some other server (even Oracle, as has been mentioned). And in the future IT WILL STILL BE THE SAME. And it can be spared to all these people just by adding an option.

No idea, is it simple or very complicated. But if it is simple - why not? Maybe someone of the devs team could answer this doubt: is the proposed change simple to implement?

--
regards,
Zbigniew
On Mon, Dec 10, 2012 at 11:29 AM, Zbigniew <zbigniew2011@gmail.com> wrote: > 2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>: > >> Databases aren't as simple as you imagine. What you're basically >> asking for from the pg engine is for it to enclose every insert into a >> subtransaction (i.e. set a savepoint) to check for an error. > > No, I'm not. Whether you know it or not, that IS what you are asking for. It's how pg works. I'm done.
2012/12/10 Thomas Kellerer <spam_eater@gmx.net>

> Zbigniew, 10.12.2012 04:20:
>> Yes, I read about using "savepoints" - but I think we agree,
>> it's just cumbersome workaround - and not real solution,
>
> It might be a bit cumbersome, but it *is* a proper solution to the problem - not a workaround.

Writing scripts for ORACLE's sqlplus, I have the option to write "whenever sqlerror exit sql.sqlcode;" and get behavior similar to setting psql's ON_ERROR_STOP variable. On the other hand, it is possible to write "whenever sqlerror continue;" and this will make ORACLE process all the statements inside the script, ignoring all errors. This is a general feature, available not only for sqlplus scripts — as mentioned, a series of 100 INSERTs can have 5 failing ones and the commit statement will result in 95 new records.

So, in order to get similar behavior using current PostgreSQL features, one will have to:
1) use savepoints;
2) use single-statement transactions;
3) (crazy stuff) use triggers on a fake relation to “test” inserts into the real one and ignore failing ones;
4) do pre-processing in some external tool, like a script or pgloader;
5) more options?

All these options will give a significant performance and/or scripting penalty, while, say ORACLE, does it “for free” — I don't know the mechanics behind this feature, so it looks like “free as in free speech” to me.

The OP's original post is a feature request, IMHO. I'm not sure whether it is a good thing to have in general, but I really had to use it in my scripts several times.

--
Victor Y. Yegorov
Zbigniew wrote:
> No idea, is it simple or very complicated. But if it is simple - why not?

When I first started using PostgreSQL I was used to a database product which would roll back an entire statement on failure, but leave the transaction open for another try. This is handy for interactive work, which is why psql (an interactive tool) gives you an easy way to do it, using subtransactions behind the scenes. Doing it in the general case is risky partly because almost all data definition language (DDL) statements are allowed in transactions; so you could have a transaction which inserted all data from one table into somewhere else and then dropped the source table. If your insert fails, you probably don't want to drop the table.

PostgreSQL behavior was a bit of a surprise to me at first, but it wasn't a hard adjustment for me, and I never thought to suggest a change because it just seemed safer this way. I've generally found an intermediate table to be a good idea anyway.

> Maybe someone of the devs team could answer this doubt: is the proposed change simple to implement?

I now work on PostgreSQL internals as part of my job. To get a handle on how much it would take to implement, you might want to read these two README files which document the sub-systems most directly involved.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/README;hb=HEAD
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/executor/README;hb=HEAD

For sample queries and diagrams showing how it works a little more graphically, you might want to look at:

http://momjian.us/main/writings/pgsql/mvcc.pdf

The problem is that you might have affected a million rows in the transaction before the failing statement, and then that statement might affect another million before it fails. What would you do with the xmin/xmax values in the affected tuples to get visibility correct if you haven't executed that failing statement within a subtransaction? I don't see how you could do it without something very like a subtransaction for each statement executed within the transaction, and that carries significant overhead. Nobody wants to slow down all their work to cater to this one special case.

Could an option like this be added to the server? Probably. Has anyone worked on such a feature? Not that I know of. Would such a feature be accepted if written? I think a better case would need to be made for its value compared to the added code complexity and maintenance burden than has been made so far. Since this is primarily of interest for bulk loads and interactive work, and there are already solutions for both of those, it would probably be seen as not worth its weight.

It's only by careful gatekeeping that the source code has been kept manageable. A quick count shows that there are 1,167,732 lines of code, counting only *.c and *.h files. Every added line has a maintenance cost.

I hope this helps.

-Kevin
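The "easy way" in psql that Kevin alludes to is presumably the ON_ERROR_ROLLBACK variable, which makes the client set an implicit savepoint before each statement inside a transaction. A small sketch, again with a hypothetical items table:

\set ON_ERROR_ROLLBACK interactive

BEGIN;
INSERT INTO items (item_id, payload) VALUES (1, 'first');
INSERT INTO items (item_id, payload) VALUES (1, 'dupe');    -- fails, quietly rolled back to the hidden savepoint
INSERT INTO items (item_id, payload) VALUES (2, 'second');
COMMIT;                                                      -- rows 1 and 2 are kept

Setting it to on rather than interactive applies the same behaviour to scripts; either way psql is doing the per-statement savepoint work described above, so the overhead argument still applies.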
Caveat: I am not a PostgreSQL hacker, and have not looked into its internals at all, though I've read a number of excellent articles and blog posts on some of its features (TOAST, HOT updates, MVCC, etc). I'm a programmer who has made use of PG from a number of languages, and formed a strong opinion on the high quality and usability of Pike's bindings. On Tue, Dec 11, 2012 at 5:29 AM, Zbigniew <zbigniew2011@gmail.com> wrote: > 2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>: > >> Databases aren't as simple as you imagine. What you're basically >> asking for from the pg engine is for it to enclose every insert into a >> subtransaction (i.e. set a savepoint) to check for an error. > > No, I'm not. > > It's able (I mean Postgres) to detect an error, and it's able to react > on error. "What I'm basically asking for" is an option to change its > reaction; that's all. Just to change a bit the thing already exisiting > - not to add new one. You're thinking that it's easy to just "back out" something that failed. Perhaps this is true in the simple case where one statement does one insert and that's all, but if this is to be a PostgreSQL feature, it needs to handle the more complicated cases. In each of these examples, what should be kept and what should be undone? (Let's suppose that your language has a query() function that executes an SQL query while handwaving questions of which database connection to use.) Row ID 3 already exists, in each case. query("insert into table (id) values (1),(2),(3)"); query("insert into table (id) select id from other_table where id in (1,2,3)"); query("insert into table (id) values (1); insert into table (id) values (2); insert into table (id) values (3)"); query("create trigger after update on other_table for each row insert into table values (new.id); update other_table set dummy=dummy+1 where id in (1,2,3)"); I'm pretty sure the example trigger would need to be rewritten as a procedure, but you get the idea. There's a LOT more complexity than just "if Postgres detects an error, it should (optionally) just not do that bit". >> The overhead cost of doing this in pgsql is not cheap. Now what I would >> use for this would be some tool written to interface with pgsql and do >> the job for you, like pgloader. It gives you the simple interface to >> do what you're asking. > > You see? The people had to create sophisticated loaders, just to work > around the problem. Any good project will work in layers. The core PostgreSQL engine doesn't have support for the special backslash commands that function at the command-line interface; they're implemented in the psql client. Nor does PG need any code to handle the peculiarities of network routing, because that's all managed by the TCP/IP sockets layer. Often the best thing to do is to bolt something onto the outside, because that adds zero complexity for everyone who isn't using this feature. > All this can be spared to users just by adding an OPTION. This option > can be "off" by default. There can be even a description in manual: > "We discourage...", "...use it only, when you know, what are you > doing...", "...you have been warned". But I bet, many users would > appreciate this. Besides: as you noticed (and the others too), several > other servers allows this. Then maybe really it's not that evil thing, > as you think? Are you a programmer? Are you aware how much complexity each option adds? Every single combination must be tested and debugged. 
In this instance, that means testing every part of Postgres before and after several types of failure, to make sure everything works correctly in both cases. That is not cheap. And then there's the user-facing complexity (documenting the option, explaining when it's useful, etc), and now everyone has to decide whether or not to use it. Also not cheap. > Yes, and exactly because of this I'm pretty sure, that you understand, > why I don't want to trash such long transaction. Why I would to commit > it - just skipping the dupes. The normal way to use transactions is to guarantee atomicity (the first letter from ACID). The classic example is a money transfer - you deduct $100 from one account and add $100 to another. Naturally you want this to be all-or-nothing - it's better to leave the money where it is than to leave the account-holder (or the bank, if you do them in the other order) short a hundred bucks. What you're doing is fiddling with transactions as a means of improving performance, and then discovering that they aren't exactly what you want. Have you considered looking into some other methods of improving performance? You may find that there's a completely different way to achieve your goal. > Maybe someone of the devs team could answer this doubt: is the > proposed change simple to implement? I'm not a PG dev, but I've fought the battle against complexity in enough other situations that I know that it's much more usual to underestimate than overestimate the cost. ChrisA
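A minimal illustration of the atomicity Chris describes, with a hypothetical accounts table; both updates become visible together at COMMIT, or neither does:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- deduct from the sender
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- credit the recipient
COMMIT;  -- if anything fails before this point, a ROLLBACK leaves both balances untouched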
On Mon, Dec 10, 2012 at 06:29:49PM +0000, Zbigniew wrote: > > No, I'm not. > > It's able (I mean Postgres) to detect an error, and it's able to react > on error. "What I'm basically asking for" is an option to change its > reaction; that's all. Just to change a bit the thing already exisiting > - not to add new one. Yes, you are. By the time Postgres has detected that there is this thing to which it needs to react, you're already in the error state. The only way around that is a subtransaction. Moreover, what you're _really_ asking for is that Postgres do this only in the right circumstances, and not when there's some other error that isn't the error you were intending be automatically fixed. This is a special case of the Do What I Mean protocol, and lovely as it would be if our programs all had artificial intelligence and could read minds, I don't think such software is going to ship any time soon. Postgres is general-purpose software, so it has to be programmed for the general purpose. This does mean that some special cases that look easy if you don't know anything about the underlying details require more work on the part of the person using the general-purpose software. That's what optimizing for general cases buys you. Best, A -- Andrew Sullivan ajs@crankycanuck.ca
On Mon, Dec 10, 2012 at 05:02:37PM +0100, Thomas Kellerer wrote: > Just as a side-note: Oracle also allows you to commit a "transaction" even if some of the statements failed True. I always thought that was a bizarre bug. Certainly, it would be the day you did BEGIN; INSERT INTO sale $bunch of stuff; INSERT INTO xact_log $stuff-with-money-deducted; UPDATE account SET balance = balance-?COST WHERE customer_id = ?ID; ---ERROR here for no permission COMMIT; Or anyway, that's how I look at it. A -- Andrew Sullivan ajs@crankycanuck.ca
"Kevin Grittner" <kgrittn@mail.com> writes: > Zbigniew wrote: >> No idea, is it simple or very complicated. But if it is simple - >> why not? > When I first started using PostgreSQL I was used to a database > product which would roll back an entire statement on failure, but > leave the transaction open for another try. This is handy for > interactive work, which is why psql (an interactive tool) gives you > an easy way to do it, using subtransactions behind the scenes. FWIW, we've generally concluded that this sort of thing should be done on the client side, ie, exactly like psql does it. Fooling with transactional behavior on the server side poses far too much risk of breaking clients that aren't expecting the new behavior. We learned an object lesson on that point back in 7.3, when we attempted to implement autocommit on/off behavior on server side, using a GUC setting to control it. (Autocommit isn't exactly what the OP is asking for, but it's close enough that the lesson holds.) Basically, turning the setting to the non-default state broke just about every nontrivial client program. The push-back was fierce enough that the setting went away again in 7.4, and it won't be coming back as long as anybody who remembers that fiasco is still around the project. So now we have AUTOCOMMIT (and ON_ERROR_STOP too) as psql behaviors, where they're available for interactive use but won't break non-interactive program logic. The complexity and performance costs that people have mentioned are other good reasons not to change it; but even if the change were free on the server side, history says it's not something we ought to mess with. regards, tom lane
Hi, On 11 December 2012 06:25, Виктор Егоров <vyegorov@gmail.com> wrote: > On the other hand, it is possible to write "whenever sqlerror > continue;" and this will make ORACLE to process all the statements > inide the script, ignoring all errors. This is a general feature, > available not only for sqlplus scripts — as mentioned, a series of 100 > INSERTs can have 5 failing ones and commit statement will result in 95 > new records.. For example, nCluster (and Greenplum) has similar feature for COPY: page 13 "Error Logging in COPY" http://www.pgcon.org/2009/schedule/attachments/135_PGCon%202009%20-%20Aster%20v6.pdf. Around the same time AsterData tried to push some COPY enhancements (error logging and partitioning): http://archives.postgresql.org/pgsql-hackers/2009-10/msg00247.php -- Ondrej Ivanic (http://www.linkedin.com/in/ondrejivanic)
2012/12/10, Kevin Grittner <kgrittn@mail.com>: > Could an option like this be added to the server? Probably. Has > anyone worked on such a feature? Not that I know of. Would such a > feature be accepted if written? I think a better case would need to > be made for its value compared to the added code complexity and > maintenance burden than has been made so far. Since this is > primarily of interest for bulk loads and interactive work, and > there are already solutions for both of those, it would probably be > seen as not worth its weight. Thanks for detailed explanation. From what I see, it's rather not going to happen. > It's only by careful gatekeeping that the source code has been kept > manageable. A quick count shows that there are 1,167,732 lines of > code, counting only *.c and *.h files. Every added line has a > maintenance cost. I agree - but, on the other hand, every added line of code in the software of the users (programmers using PG) has its maintenance cost too. And it has to be multiplied by the number of the users, then again: by the number of the users of their software... -- regards, Zbigniew
2012/12/11, Tom Lane <tgl@sss.pgh.pa.us>: > The complexity and performance costs that people have mentioned are other > good reasons not to change it; but even if the change were free on the > server side, history says it's not something we ought to mess with. Again: my suggestion was to add an option... with default value "keep present functionality". -- regards, Zbigniew
2012/12/10, Chris Angelico <rosuav@gmail.com>: > Are you a programmer? Are you aware how much complexity each option > adds? Every single combination must be tested and debugged. In this > instance, that means testing every part of Postgres before and after > several types of failure, to make sure everything works correctly in > both cases. That is not cheap. And then there's the user-facing > complexity (documenting the option, explaining when it's useful, etc), > and now everyone has to decide whether or not to use it. Also not > cheap. It's not that bad, that really "each option adds much complexity", and has to be tested then in every single combination of parameters/settings etc. Just one example: introducing an option "save preferences" doesn't require this. > I'm not a PG dev, but I've fought the battle against complexity in > enough other situations that I know that it's much more usual to > underestimate than overestimate the cost. There are always TWO sides (at least two): creators/designers - and the users. Considering how much complexity some kind of modification adds to your - programmer's - code, and how it'll make your life more difficult, at the same time try to consider, how much relief could it mean to many of the users of your software. -- regards, Zbigniew
I generally do:
DO $$
BEGIN
INSERT INTO ...;
EXCEPTION
WHEN UNIQUE_VIOLATION or EXCLUSION_VIOLATION THEN
NULL; /* ignore this record */
END;
$$;
Obviously, it's not fast...
On Tue, Dec 11, 2012 at 10:36 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
> There are always TWO sides (at least two): creators/designers - and
> the users. Considering how much complexity some kind of modification
> adds to your - programmer's - code, and how it'll make your life more
> difficult, at the same time try to consider, how much relief could it
> mean to many of the users of your software.

I have considered. It will be beneficial to a small group of users,
including yourself; I predict it will have significant costs
(performance, documentation complexity, etc) for all users.

(Of course, my prediction is decidedly non-authoritative. But I had to
deal with the docs complexity of issues like the autocommit that Tom
Lane mentioned; it's definitely a cost.)

ChrisA
2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:

> Obviously, it's not fast...

Well, the problem was exactly speed-related. If not for this, I could
simply dispose of the transaction...

--
regards,
Zbigniew
It's faster than restarting the big transaction, maybe...
2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:

> It's faster than restarting the big transaction, maybe...

Of course, but it's still _much_ slower than running such a transaction
without any abort. Actually, I had to implement a savepoint-based
solution anyway - but at the cost of making the relevant part of the
code much more complex. Note that practically every developer in this
thread wrote that adding such complexity isn't a particularly good
idea.

--
regards,
Zbigniew
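For reference, a minimal sketch of the savepoint-based workaround being discussed; the table and values are hypothetical, and the client is assumed to issue ROLLBACK TO SAVEPOINT whenever an insert fails:

BEGIN;

SAVEPOINT row_1;
INSERT INTO items (id, val) VALUES (1, 'first');      -- hypothetical table
RELEASE SAVEPOINT row_1;

SAVEPOINT row_2;
INSERT INTO items (id, val) VALUES (1, 'duplicate');  -- fails on the primary key
-- the client detects the error and issues:
ROLLBACK TO SAVEPOINT row_2;
-- the transaction is still usable, so later inserts are not lost

SAVEPOINT row_3;
INSERT INTO items (id, val) VALUES (2, 'second');
RELEASE SAVEPOINT row_3;

COMMIT;

psql can automate this pattern with \set ON_ERROR_ROLLBACK on, which makes it issue an implicit savepoint before each statement; it does not remove the per-row savepoint overhead, though.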
2012/12/11, Chris Angelico <rosuav@gmail.com>:

> I have considered. It will be beneficial to a small group of users,
> including yourself

No, not that small at all; it's easy to find out using Google how many
people were looking for help because of this. You can easily multiply
this group by at least a factor of 100 to add those who didn't post
their questions anywhere (or whose postings have since been deleted).

The fact that many posts are several years old suggests to me that
there will still be numerous questions about this in the future.

--
regards,
Zbigniew
On 12/11/2012 08:05 PM, Zbigniew wrote:
> 2012/12/11, Chris Angelico <rosuav@gmail.com>:
>
>> I have considered. It will be beneficial to a small group of users,
>> including yourself
>
> No, not that small at all; it's easy to find out using Google how many
> people were looking for help because of this. You can easily multiply
> this group by at least a factor of 100 to add those who didn't post
> their questions anywhere (or whose postings have since been deleted).
>
> The fact that many posts are several years old suggests to me that
> there will still be numerous questions about this in the future.

Yes, but the needs of huge groups of these users can be satisfied with
changes to a relatively small number of middleware tools (Rails
ActiveRecord, etc) to teach them about subtransactions and how to use
them.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Dec 11, 2012, at 6:28, Zbigniew <zbigniew2011@gmail.com> wrote:
> 2012/12/11, Tom Lane <tgl@sss.pgh.pa.us>:
>
>> The complexity and performance costs that people have mentioned are other
>> good reasons not to change it; but even if the change were free on the
>> server side, history says it's not something we ought to mess with.
>
> Again: my suggestion was to add an option... with default value "keep
> present functionality".

We heard you the first three times... All of the responses given so far,
including the paragraph you are quoting, were given with that in mind.
You have made your point clearly and have been given direct responses.

There is some merit, but not enough to convince anyone to change their
general belief, so now it comes down to either providing a specific
implementation that can be approved or rejected as belonging in core,
or going with the various alternatives that have been presented.

David J.
On Dec 11, 2012, at 6:26, Zbigniew <zbigniew2011@gmail.com> wrote:
> 2012/12/10, Kevin Grittner <kgrittn@mail.com>:
>
>> Could an option like this be added to the server? Probably. Has
>> anyone worked on such a feature? Not that I know of. Would such a
>> feature be accepted if written? I think a better case would need to
>> be made for its value compared to the added code complexity and
>> maintenance burden than has been made so far. Since this is
>> primarily of interest for bulk loads and interactive work, and
>> there are already solutions for both of those, it would probably be
>> seen as not worth its weight.
>
> Thanks for the detailed explanation. From what I see, it's rather not
> going to happen.
>
>> It's only by careful gatekeeping that the source code has been kept
>> manageable. A quick count shows that there are 1,167,732 lines of
>> code, counting only *.c and *.h files. Every added line has a
>> maintenance cost.
>
> I agree - but, on the other hand, every added line of code in the
> software of the users (programmers using PG) has its maintenance cost
> too. And it has to be multiplied by the number of the users, then
> again: by the number of the users of their software...

PostgreSQL itself has a very large installed user base, and within that
group the need for your feature is likely not that large. The subgroup
of those that could use such a generic feature instead of coding
something custom anyway is likely the larger portion as well.

While it can be rewarding to solve a problem for the masses, the
ambiguity and apparent difficulty mean that likely only someone paid,
or experiencing the pain directly, is going to make the effort.
Ideally a large number of people would band together and fund such an
initiative (even if just for research), which would both cover
development costs and give the community a better idea of the scope of
the problem.

David J.
On Dec 11, 2012, at 6:48, Zbigniew <zbigniew2011@gmail.com> wrote:
> 2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:
>
>> Obviously, it's not fast...
>
> Well, the problem was exactly speed-related. If not for this, I could
> simply dispose of the transaction...

It seems like if you switched to the COPY API and a staging table, your
initial load performance would be maximized, and with the time gained
you can run the second, transformation query. Your desired feature
probably wouldn't work with COPY anyway, and high-performance bulk
loading should use that API.

David J.
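A rough sketch of that staging-table approach, with hypothetical table and file names (items, items_stage, and items.csv are not from this thread): the bulk load cannot abort on duplicates because the staging table has no unique constraint, and de-duplication happens in one follow-up statement:

-- Staging table with no unique constraints, so the bulk load cannot fail on duplicates.
CREATE TEMP TABLE items_stage (id integer, val text);

-- Bulk load; in psql this could be:
--   \copy items_stage FROM 'items.csv' WITH (FORMAT csv)

-- Move rows into the real table, skipping keys that already exist there
-- and collapsing duplicates within the staged data itself.
INSERT INTO items (id, val)
SELECT DISTINCT ON (s.id) s.id, s.val
FROM items_stage s
WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)
ORDER BY s.id;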
On Sun, Dec 9, 2012 at 9:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
> Is it really so problematic to introduce such code change, allowing
> the users to control this behaviour? Yes, I read about using
> "savepoints" - but I think we agree, it's just cumbersome workaround -
> and not real solution, like my proposal. All we need is either a
> variable to set, or a command, that will allow to modify the present
> functionality in the way described above.

No. The point of transactions is to define a 'unit of work' -- all
aspects of the transaction must pass or all must fail. There are
limited mechanisms to work around this (SAVEPOINTs) that allow finer
control but still maintain that basic principle. Consolidating bulk
inserts into a single transaction is widely done for performance
reasons. I understand your pain point, but you're much better off in
the long run using well established methods, such as staging data
before inserting it, instead of hacking the transaction rules.

merlin