Re: Problem with aborting entire transactions on error - Mailing list pgsql-general

From David Johnston
Subject Re: Problem with aborting entire transactions on error
Date
Msg-id 02f601cdd6e0$d53178e0$7f946aa0$@yahoo.com
Whole thread Raw
In response to Problem with aborting entire transactions on error  (Zbigniew <zbigniew2011@gmail.com>)
Responses Re: Problem with aborting entire transactions on error  (Zbigniew <zbigniew2011@gmail.com>)
Re: Problem with aborting entire transactions on error  (Chris Angelico <rosuav@gmail.com>)
List pgsql-general
Please reply-to the list, not just myself.

> -----Original Message-----
> From: Zbigniew [mailto:zbigniew2011@gmail.com]
> Sent: Monday, December 10, 2012 6:26 AM
> To: David Johnston
> Subject: Re: [GENERAL] Problem with aborting entire transactions on error
>
> 2012/12/10, David Johnston <polobo@yahoo.com>:
>
> >> A simple - but very common - example: I wanted to perform really
> >> large number of inserts - using transaction, to make it faster -
> >> while being sure the duplicate entries will be skipped.
> >
> > And what happens if one of those inserts errors out for reasons other
> > than a duplicate?
>
> Then maybe there could be something like
> "DON'T_ROLLBACK_ON_DUPLICATE_ERROR"
> instead of "more general" directive?
>
> > Since current belief is that such behavior is unwise
>
> No idea, why common belief is "allowing the user to decide is unwise".
> Is the "common assumption", that the users of Postgres are children
> 4-5 years old?

If it requires coding something to provide the user the desired flexibility
then whether or not such flexibility is wise or unwise is going to go into
said decision.  Also, since you are begging others to solve your own
problems you are in many ways behaving as a child.  Especially with an Open
Source project like PostgreSQL adults are welcomed and encouraged to solve
their own problems by altering the source code and, in the spirit of
community, contributing it back to the project (contribution also means you
do not have to maintain your own custom version of the software).  Please do
not take this as a personal affront; I just simply wanted to extend on the
analogy that you put forward.  I fully recognize the need to ask the
community for help myself (I am also not a coder) but in doing so I know I
am imposing on the charity of others and as such need to make best efforts
to understand and respect their point-of-view and readily acknowledge my own
ignorance in the matter.

>
> > no one is willing to give their time to doing so.
>
> Pay attention, that actually my (and not just mine) request isn't to write
> some new, sophisticated code - but just a little change in existing code
> instead.

If this is your belief then do it yourself.  Your response to my question
above already indicates that what you initially thought was very simple had
at least one complication you did not understand.  There are likely others.

>
> > I'm not sure whether that means that if you supplied such a patch it
> > would be rejected.  Since their is a correct way to solve the
> > duplicates scenario (see below) without savepoints you may wish to
> > supply another example if you want to try and convince people.
>
> I'm afraid, I don't understand. How many examples can I supply for the
> problem, like:
> 1. I need only to avoid duplicated rows (the ones with duplicated primary
> key).
> 2. I don't need any "stronger" reaction for such error - since I know,
there
> won't be any troubles.
> 3. Since server controls primary key, all I need is to make it a bit less
> "nervous" about such attempt to insert dupe, only dropping this one (and
> not everything else together with single faulty entry)?
>
> Whatever will I supply, the scenario basically will be the same - I just
would to
> avoid such "throwing the baby out with the bathwater".
>
> > The true solution is to insert into a staging table that allows
> > duplicates (but ideally contains other constraints that you do care
> > about) and then de-dupe and insert into your final table.
>
> ...which means twice as work for the server (AT LEAST twice). When all I
> needed was just to skip dupes, not aborting the transaction. And it could
be
> perfectly done.
>
> > This may be an undesirable instance of the database not allowing you
> > to shoot yourself in the foot but as ignoring errors is bad practice
> > motivation to allow it is small.
>
> I'm NOT proposing to just ignore errors - read carefully. please - I'm
writing
> about letting user decide, what impact should an error have:
> - only dropping "faulty" query, or...
> - ...aborting the entire transaction (which is undesirable in many
situations)

Except for ETL aborting is almost universally the desired behavior.  I'll
grant you your arguments for why, during ETL, this feature could be useful
(as opposed to performing duplicate work), but I still think that a blind
"ignore duplicates and leave the first instance in place" ETL policy is a
bad one in almost all cases.

>
> > You should always be able to import the data without errors into an
> > explicitly defined table and then write queries to convert between the
> > input format and the final format - explicitly making no coding the
> > necessary translation decisions and procedures.  In that way you
> > always know that your import routine is always working as expected and
> > not guessing whether it was the known error condition or an unknown
> > condition the caused a record to go missing.
>
> Well, looking for solution yesterday I've found several messages of other
> people,
> who complained about the same: that simple duplicate record is
> aborting whole transaction. This makes me think, that indeed this is quite
> common case, when limiting the "server response" (and NOT "ignoring the
> error" completely, which could mean insertion of the duplicated row),
could
> be very useful. There are situations, when all we need is just avoid dupes
-
> that's why we have primary keys - and "too much protection" makes the
> work unnecessarily difficult.
> --
> regards,
> Zbigniew

It is easy to complain but apparently no one feels strongly enough to either
code a solution themselves or sponsor someone else to do so.  As I have not
seen any core coders respond I cannot be certain whether there are
underlying technical issues preventing this but there is at the least a
resource allocation concern since neither code donors nor those sponsored by
clients have made the time to implement this "simple feature".  It may be
more productive, not being a core coder yourself, to simply ask why such a
feature has not been implemented given the apparent demand instead of
asserting (from ignorance) that such an implementation should be very simple
to accomplish.  The later approach (as well as your response to me -
personally) is much more confrontational and contrary (the direct reply at
least) to the posted etiquette for this community.

David J.





pgsql-general by date:

Previous
From: Christian Hammers
Date:
Subject: logcheck compatible daily logfile rotation?
Next
From: Zbigniew
Date:
Subject: Re: Problem with aborting entire transactions on error