Re: Problem with aborting entire transactions on error - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Problem with aborting entire transactions on error
Date
Msg-id CAOR=d=0_FwZH_1hizXiU64LceBgVzyjd9yU7JYROxT6QJ6609w@mail.gmail.com
Whole thread Raw
In response to Problem with aborting entire transactions on error  (Zbigniew <zbigniew2011@gmail.com>)
Responses Re: Problem with aborting entire transactions on error  (Zbigniew <zbigniew2011@gmail.com>)
List pgsql-general
On Sun, Dec 9, 2012 at 8:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
> Hello,
>
> As I read while googling the web, many people complained about this
> before. Couldn't it be made optional (can be even with "default ON")?
> I understand, that there are situations, when it is a must - for
> example, when the rest of queries rely on the result of first ones -
> but there are numerous situations, when just skipping a faulty query
> is all we need.
>
> A simple - but very common - example: I wanted to perform really large
> number of inserts - using transaction, to make it faster - while being
> sure the duplicate entries will be skipped. Of course, this job will
> be done best by server itself, which is keeping an eye on "primary
> key" of the table. Unfortunately: not during a transaction! Any "dupe"
> will trash thousands other (proper) entries immediately.
>
> Why is this? My guess is, there is kind of logic in the code, like this:
>
> if { no error during query } {
>   do it
> } else {
>  withdraw this one
>  rollback entire transaction
> }
>
> Therefore my request - and, as I saw, of many others - would be just
> to introduce a little change:
>
> if { no error during query } {
>   do it
> } else {
>  withdraw this one
>  if { ROLLBACK_ON_ERROR } {
>    rollback entire transaction
>   }
> }
>
> (if there's no ROLLBACK_ON_ERROR - it should carry on with the
> remaining queries)
>
> Is it really so problematic to introduce such code change, allowing
> the users to control this behaviour? Yes, I read about using
> "savepoints" - but I think we agree, it's just cumbersome workaround -
> and not real solution, like my proposal. All we need is either a
> variable to set, or a command, that will allow to modify the present
> functionality in the way described above.

Databases aren't as simple as you imagine.  What you're basically
asking for from the pg engine is for it to enclose every insert into a
subtransaction (i.e. set a savepoint) to check for an error.  The
overhead cost of doing this in pgsql is not cheap.  Now what I would
use for this would be some tool written to interface with pgsql and do
the job for you, like pgloader.  It gives you the simple interface to
do what you're asking.

The idea that everything like this belongs in the database,
particularly coded by the core developers isn't how things generally
get done in pgsql.  We've got a small sharp team of developers working
on BIG things, like covering indexes, HOT updates, query planner
optimizations and so on.  Fixing this problem is core probably doesn't
tickle anyone's itch, and there are so many ways to work around it in
pg that would not work as well in other dbs, like loading to a temp
table and updating in one big query.  That method might run an oracle
db out of rollback space etc, but on pg it's the way it's done.

The thing that often happens is that people learn to do things a
certain way on another db and then think that that experience should
translate to postgresql straight across.  It often does not, because
postgresql is so different in many ways.

Keep in mind with pg transactions you can put almost anything into a
transaction, and the only real limit to the size of a transaction you
can run in it is the size of your hard drives.  This alone lets you
approach problems from a very different perspective than on most other
dbs.

Lastly it's a free (as in beer, as in freedom) database.  People work
on it as much out of love as for money, and if you want to get
traction for a someone else to volunteer their time to make a change,
YOU have to make the case. I'm not a pg hacker.  But I've made a case
several times and gotten someone to make a change a few times for me.
If you think you have a case here you'll have to convince a lot of
people who don't think you do.


pgsql-general by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Corrupt indexes on slave when using pg_bulkload on master
Next
From: François Beausoleil
Date:
Subject: Re: When is archive_cleanup called?