Re: Problem with aborting entire transactions on error - Mailing list pgsql-general

From Chris Angelico
Subject Re: Problem with aborting entire transactions on error
Date
Msg-id CAPTjJmpsFAqjba9Z-jdayhmN+L1+em8Xo2AAYsCz5BBtPoz3dw@mail.gmail.com
In response to Re: Problem with aborting entire transactions on error  (Zbigniew <zbigniew2011@gmail.com>)
Responses Re: Problem with aborting entire transactions on error  (Zbigniew <zbigniew2011@gmail.com>)
List pgsql-general
Caveat: I am not a PostgreSQL hacker, and have not looked into its
internals at all, though I've read a number of excellent articles and
blog posts on some of its features (TOAST, HOT updates, MVCC, etc).
I'm a programmer who has made use of PG from a number of languages,
and formed a strong opinion on the high quality and usability of
Pike's bindings.

On Tue, Dec 11, 2012 at 5:29 AM, Zbigniew <zbigniew2011@gmail.com> wrote:
> 2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>:
>
>> Databases aren't as simple as you imagine.  What you're basically
>> asking for from the pg engine is for it to enclose every insert into a
>> subtransaction (i.e. set a savepoint) to check for an error.
>
> No, I'm not.
>
> It's able (I mean Postgres) to detect an error, and it's able to react
> on error. "What I'm basically asking for" is an option to change its
> reaction; that's all. Just to change a bit the thing already existing
> - not to add a new one.

You're thinking that it's easy to just "back out" something that
failed. Perhaps this is true in the simple case where one statement
does one insert and that's all, but if this is to be a PostgreSQL
feature, it needs to handle the more complicated cases. In each of
the examples below, row ID 3 already exists; what should be kept and
what should be undone? (Let's suppose that your language has a query()
function that executes an SQL query, handwaving the question of which
database connection to use.)

query("insert into table (id) values (1),(2),(3)");
query("insert into table (id) select id from other_table where id in (1,2,3)");
query("insert into table (id) values (1); insert into table (id)
values (2); insert into table (id) values (3)");
query("create trigger after update on other_table for each row insert
into table values (new.id); update other_table set dummy=dummy+1 where
id in (1,2,3)");

I'm pretty sure the example trigger would need to be rewritten as a
procedure, but you get the idea. There's a LOT more complexity than
just "if Postgres detects an error, it should (optionally) just not do
that bit".

>> The overhead cost of doing this in pgsql is not cheap.  Now what I would
>> use for this would be some tool written to interface with pgsql and do
>> the job for you, like pgloader.  It gives you the simple interface to
>> do what you're asking.
>
> You see? People had to create sophisticated loaders just to work
> around the problem.

Any good project will work in layers. The core PostgreSQL engine
doesn't have support for the special backslash commands that function
at the command-line interface; they're implemented in the psql client.
Nor does PG need any code to handle the peculiarities of network
routing, because that's all managed by the TCP/IP sockets layer. Often
the best thing to do is to bolt something onto the outside, because
that adds zero complexity for everyone who isn't using this feature.

> All this can be spared to users just by adding an OPTION. This option
> can be "off" by default. There can even be a description in the manual:
> "We discourage...", "...use it only when you know what you are
> doing...", "...you have been warned". But I bet many users would
> appreciate this. Besides, as you noticed (and others did too), several
> other servers allow this. Then maybe it really isn't as evil a thing
> as you think?

Are you a programmer? Are you aware of how much complexity each option
adds? Every single combination must be tested and debugged. In this
instance, that means testing every part of Postgres before and after
several types of failure, with the option both on and off, to make sure
everything works correctly in both modes. That is not cheap. And then
there's the user-facing complexity (documenting the option, explaining
when it's useful, etc), and now everyone has to decide whether or not
to use it. Also not cheap.

> Yes, and exactly because of this I'm pretty sure that you understand
> why I don't want to trash such a long transaction, and why I would
> like to commit it - just skipping the dupes.

The normal way to use transactions is to guarantee atomicity (the
first letter from ACID). The classic example is a money transfer - you
deduct $100 from one account and add $100 to another. Naturally you
want this to be all-or-nothing - it's better to leave the money where
it is than to leave the account-holder (or the bank, if you do them in
the other order) short a hundred bucks.
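
In SQL, with an invented accounts table, the transfer looks like this:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit the sender
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit the recipient
COMMIT;  -- both changes become visible together, or neither does

If anything goes wrong between BEGIN and COMMIT, the transaction rolls
back and the money stays where it was.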

What you're doing is fiddling with transactions as a means of
improving performance, and then discovering that they aren't exactly
what you want. Have you considered looking into some other methods of
improving performance? You may find that there's a completely
different way to achieve your goal.
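
For instance, one common way to skip duplicates without per-row error
handling is to load the raw data into a staging table and then insert
only the rows that aren't already present. A rough sketch, with
invented table names and file path:

BEGIN;
CREATE TEMP TABLE staging (id integer);
COPY staging FROM '/tmp/load.csv' CSV;   -- or \copy in psql for a client-side file
INSERT INTO items (id)
    SELECT DISTINCT s.id
    FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id);
COMMIT;

Nothing fails (barring someone else inserting the same ids
concurrently), so nothing needs to be skipped, and the load still
commits or rolls back as a single unit.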

> Maybe someone of the devs team could answer this doubt: is the
> proposed change simple to implement?

I'm not a PG dev, but I've fought the battle against complexity in
enough other situations to know that it's far more common to
underestimate the cost than to overestimate it.

ChrisA

