Thread: Re: Failed Statements within Transactions
> ... if a query fails inside a transaction block,
> PostgreSQL "RollBack"s the whole transaction ...

In my experience, it's unusual for SELECT statements (queries) to cause errors. If they do, the statements themselves are likely broken. (A SELECT that returns zero rows is not an error.) Likewise, many other SQL statements (UPDATE, DELETE) rarely cause errors; they usually "fail" by modifying zero rows, which does not affect the transaction state.

However, I do agree that a failed INSERT should not automatically rollback the entire transaction. INSERTs often fail due to integrity constraints. The common suggestion of doing a test SELECT first to determine whether to INSERT or UPDATE is simple nonsense; that just creates a race condition. If you lose the race, you have to replay the entire transaction, which is a waste of CPU cycles. (Worse, you're more likely to lose the race on a more heavily loaded system, which is exactly when you most want to avoid replaying complex transactions.)

The INSERT/UPDATE problem is an inherent part of the SQL language. There is no particularly direct way in standard SQL to ensure that a particular (unique) piece of data is in a table. The most consistently reliable way to handle this is to have an integrity constraint, try the INSERT, and then possibly do an UPDATE if the INSERT fails. I'm told this is the preferred strategy on other DB systems, and it's the only one that avoids any kind of race condition. Furthermore, it is usually much faster than doing a pre-SELECT. (If the data isn't already there, the INSERT is clearly fastest; if it is, the INSERT will typically fail and return an error more quickly than a SELECT would complete. Other strategies are more efficient if the data is usually already there.)

PostgreSQL's behavior essentially forces you to do BOTH of the following:
  * SELECT first to see whether to INSERT or UPDATE
  * AND be prepared to replay the _entire_ transaction when you lose the race.
You will sometimes lose the race, so the second step is not optional. I suspect many complex PostgreSQL applications have infrequent failures precisely because they aren't prepared to replay entire transactions (with the correct INSERTs converted to UPDATEs). For a complex transaction which must ensure certain data exists in several tables, this gets pretty ugly.

Basically, the PostgreSQL developers have decided that any integrity violation is a serious error; therefore, PostgreSQL does not really permit tentative INSERTs within transactions. This violates SQL conventions that are pretty well-established in some circles, needlessly complicates applications that use complex transactions and introduces a fairly minor performance issue.

                - Tim
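For illustration, the try-the-INSERT-then-UPDATE strategy described above would look roughly like this (table and column names are invented; as written, the UPDATE fallback only works where a failed INSERT does not abort the transaction, which is exactly the PostgreSQL behavior being complained about):

    CREATE TABLE pagehits (url text PRIMARY KEY, hits int4);

    BEGIN;
    INSERT INTO pagehits VALUES ('/index.html', 1);
    -- if the INSERT fails with a duplicate-key error, the client
    -- falls back to an UPDATE of the existing row instead:
    UPDATE pagehits SET hits = hits + 1 WHERE url = '/index.html';
    COMMIT;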
Tim Kientzle wrote:
> ...
> However, I do agree that a failed INSERT should
> not automatically rollback the entire transaction.
> INSERTs often fail due to integrity constraints.
> The common suggestion of doing a test SELECT
> first to determine whether to INSERT or UPDATE
> is simple nonsense; that just creates a race
> condition...
> ...
> PostgreSQL's behavior essentially forces you to
> do BOTH of the following:
>   * SELECT first to see whether to INSERT or UPDATE
>   * AND be prepared to replay the _entire_ transaction
>     when you lose the race.
> You will sometimes lose the race, so the second
> step is not optional.
> ...

How about:

    CREATE TABLE foo (key int4, value text);

Then, in the client application,

    INSERT INTO foo SELECT 1, 'Hello'
        WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);

The INSERT will then insert either (1) a single row, if the row did not already exist, or (2) no rows, if the row already existed. In the latter case, you may then update the row with an ordinary UPDATE. An UPDATE of zero rows would suggest that the row was deleted between the time of the INSERT and the time of the UPDATE, so one would either have to loop or take the appropriate action for when another user wishes to delete the row. And even that possibility depends upon the selected TRANSACTION ISOLATION LEVEL. But it would *not* cause an error in a transaction block, forcing a rollback of the entire transaction.

Mike Mascari
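Spelled out, the full client-side sequence Mike describes would be roughly this (illustrative only; how the reported row counts are inspected depends on the client interface):

    INSERT INTO foo SELECT 1, 'Hello'
        WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);
    -- if the INSERT reports 0 rows, the row already existed, so:
    UPDATE foo SET value = 'Hello' WHERE key = 1;
    -- if the UPDATE also reports 0 rows, the row was deleted in
    -- between; loop back and retry, or handle the deletion.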
Mike Mascari <mascarm@mascari.com> writes:
> How about:
> INSERT INTO foo SELECT 1, 'Hello' WHERE NOT EXISTS (SELECT
> key FROM foo WHERE key = 1);

Cute idea, but it doesn't eliminate the race-condition problem. If two backends execute this code concurrently, both will find that the subselect returns no rows, and both will try to do the INSERT. If you have a unique index on the table, then one will succeed and the other will get an error (after waiting to see if the first commits or not).

There's no doubt that savepoints within transactions would be a good improvement to Postgres, and I believe Vadim has hopes of making that happen in 7.2 or 7.3. But in the meantime, transactions without savepoints are a lot better than "transactions" that fail to notice errors at all. Any MySQL partisans who contend otherwise just plain Don't Get It.

			regards, tom lane
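Concretely, the race Tom describes plays out roughly like this under the default isolation level (the session labels are only for illustration):

    Session A: BEGIN;
    Session B: BEGIN;
    Session A: INSERT INTO foo SELECT 1, 'Hello'
                   WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);
               -- A's subselect sees no row, so A inserts one
    Session B: INSERT INTO foo SELECT 1, 'Hello'
                   WHERE NOT EXISTS (SELECT key FROM foo WHERE key = 1);
               -- B cannot see A's uncommitted row; with a unique index
               -- on key, B's INSERT blocks waiting on A; without one,
               -- B inserts a duplicate
    Session A: COMMIT;
               -- with a unique index, B's INSERT now fails with a
               -- duplicate-key error; without one, both rows commit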
At 11:14 PM 12/29/00 -0500, Mike Mascari wrote:
>INSERT INTO foo SELECT 1, 'Hello' WHERE NOT EXISTS (SELECT
>key FROM foo WHERE key = 1);
>
>The insert will then insert either (1) a single row if the
>row did not already exist, or (2) no rows if the row already
>existed. If the latter, you may then update the row using an

Will that really work? I've tried something like that before and it didn't. This is because the stuff inserted may not be seen by SELECTs of other transactions until the transaction has been committed (this depends on your transaction isolation level, but the default behaves that way, and in this case the default is recommended!).

Try starting up two psqls, do a BEGIN in each, then do your query in both transactions, then do a COMMIT in both. I strongly suspect you'll see duplicate records. Anyone relying on this method should change it "Real Soon Now".

The options I see are:

1) Use a UNIQUE constraint. I don't like this as it means rolling back stuff, and in some cases you may have to redo almost everything just to do an update instead, so you may have to store the data elsewhere first instead of stuffing it into the database as you go. Also, I prefer database errors to be exceptions, not something that occurs normally. Maybe I'm strange, but I dislike the idea of _using_ errors. It seems rather kludgy, and hints that there is a design issue somewhere. Dealing with errors is of course OK.

2) Lock the table, do the SELECT, then INSERT, then unlock the table (in PostgreSQL that means rollback/commit; you can't unlock earlier). A sketch follows at the end of this message.

3) Use a different locking mechanism. I've suggested adding a lock-on-an-arbitrary-string feature to PostgreSQL, much like the one in MySQL (GET_LOCK). So what you would be able to do is:

    begin transaction
    do lots of stuff
    lock string "table=foo,key=1"
    select count(*) from foo where key=1
    if count=0
        insert new row (key=1)
        link row to other tables
        update the other tables
    else
        update row where key=1
        update the other tables
    fi
    commit (and release lock)

This method will not block everybody, unlike locking the entire table. It just blocks those about to do the same thing. Other people can insert foo, key=2 and it doesn't matter, full speed ahead ;). When you port your application to less capable databases, you lock the entire table instead; no need for a huge rewrite.

Trouble is, I seem to be the only one interested in this feature. I'm not confident that my coding skills are up to scratch for adding this feature myself. I'd personally try to copy and adapt the lock table mechanism, but I haven't even looked at it yet to see whether that is crazy or viable :). It involves touching some core stuff. Still, it may be easier to do this than adding transaction savepoints. What do the PostgreSQL developers think? Don't you think it'll be cool to be able to:

    lock string "Working deep magic type 2 on e-commerce tables"

;). Hmm, hierarchical locks would be interesting but rather academic. I'll be happy with just lock string :).

Cheerio,
Link.

So much procrastination left to do ;).
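For comparison, option 2 above would look something like this (a sketch only; the default LOCK TABLE mode is stronger than strictly needed, and the lock is released only at commit or rollback):

    BEGIN;
    LOCK TABLE foo;
    SELECT count(*) FROM foo WHERE key = 1;
    -- if the count is 0: INSERT INTO foo VALUES (1, 'Hello');
    -- otherwise:         UPDATE foo SET value = 'Hello' WHERE key = 1;
    COMMIT;   -- the table lock is released here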
Lincoln Yeoh wrote:
> Also I prefer database errors to be exceptions, not
> something that occurs normally. Maybe I'm strange
> but I dislike the idea of _using_ errors. Seems
> rather kludgy, and hints that there is a design
> issue somewhere.

Unfortunately, in this case, the design issue is in the SQL language itself, which does not provide good support for this common operation (atomically ensuring that a unique data item is in a table). Unfortunately, it's a little late to change that... ;-)

An "error" is just a return code that indicates what happened. If return codes are well designed, then they are useful and should be used. Don't be fooled by the word "error." It's only an error if the program ends up doing the wrong thing. ;-)

                - Tim
Tim Kientzle wrote:
> [...]
>
> Basically, the PostgreSQL developers have decided
> that any integrity violation is a serious error;
> therefore, PostgreSQL does not really permit
> tentative INSERTs within transactions. This violates
> SQL conventions that are pretty well-established
> in some circles, needlessly complicates
> applications that use complex transactions
> and introduces a fairly minor performance issue.

We haven't; it's just that our code inherited this behavior from the way ROLLBACK is implemented, and if you knew even a little about the storage management in PostgreSQL, you would never have put it the way you did. With the current implementation of tuple visibility there is only all-or-nothing. Savepoints will get us out of there, but not before 7.2 or 7.3.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #