Thread: transaction processing after error in statement

transaction processing after error in statement

From
holger@jakobs.com
Date:
Dear PostgreSQL Gurus,

I have been using PostgreSQL for quite a while and always relied on its
handling of transaction and concurrency. But recently I discovered a
behaviour that deviates quite a lot from how I understand transactions
and how things are handled by other databases: HP Allbase, Oracle 8 and
.mdb-files (Access).

Here's the description:

Whenever an error occurs within the transaction, PostgreSQL puts the
whole transaction in an *ABORT* state, so that there is no difference at
all between COMMITing or ROLLBACKing it. Even commands successfully
carried out before the error ocurred are rolled back, even if I COMMIT
the transaction, where no error message whatsoever is shown.

Example:
 begin; insert into table1 values (1, 'hello'); --> success! select no from table1; ERROR: Attribute 'no' not found
commit;--> success!
 

Why should the insert statement fail, just because there was a typo in
the following select statement? I was already carried out successfully,
albeit only visible to the current transaction.

I found this behaviour to be the same across all 7.x versions of
PostgreSQL.

Unfortunately, I haven't been able to find an explanation why PostgreSQL
behaves like this and why all other RDBMS I tried behave differently. In
this case the others make more sense to me.

Additionally, I have discovered that phantom reads occur in PostgreSQL
even if isolation mode serializable is used. Also not so nice!

Sincerely,

Holger



-- 
Holger Jakobs * D-51469 Bergisch Gladbach
Telefon +49-2202-59991 * Mobilfon +49-177-792-2466


Re: transaction processing after error in statement

From
Stephan Szabo
Date:
On Fri, 7 Nov 2003 holger@jakobs.com wrote:

> Whenever an error occurs within the transaction, PostgreSQL puts the
> whole transaction in an *ABORT* state, so that there is no difference at
> all between COMMITing or ROLLBACKing it. Even commands successfully
> carried out before the error ocurred are rolled back, even if I COMMIT
> the transaction, where no error message whatsoever is shown.

In PostgreSQL all errors are currently considered unrecoverable, and all
statements in a transaction must commit or rollback together as a single
unit.
In the future an implementation of nested transactions or savepoints would
presumably relax this limitation to only the successfully committed
subtransactions or statements that were not separately rolled back to a
previous savepoint.

> Additionally, I have discovered that phantom reads occur in PostgreSQL
> even if isolation mode serializable is used. Also not so nice!

You're going to have to give more information for anyone to even begin to
understand the case you're seeing.


Re: transaction processing after error in statement

From
Stephan Szabo
Date:
On Mon, 10 Nov 2003, Holger Jakobs wrote:

> Hi Stephan,
>
> On  9 Nov, Stephan Szabo wrote:
> > On Fri, 7 Nov 2003 holger@jakobs.com wrote:
> >
> >> Whenever an error occurs within the transaction, PostgreSQL puts the
> >> whole transaction in an *ABORT* state, so that there is no difference
> >> at all between COMMITing or ROLLBACKing it. Even commands
> >> successfully carried out before the error ocurred are rolled back,
> >> even if I COMMIT the transaction, where no error message whatsoever
> >> is shown.
> >
> > In PostgreSQL all errors are currently considered unrecoverable, and
> > all statements in a transaction must commit or rollback together as a
> > single unit. In the future an implementation of nested transactions or
> > savepoints would presumably relax this limitation to only the
> > successfully committed subtransactions or statements that were not
> > separately rolled back to a previous savepoint.
>
> What I meant was not subtransactions or savepoints, but the funny
> behaviour that operations already successfully carried out never will be
> committed, just because some other operation later within the same
> transaction fails. This is far different from the behaviour of all other
> DMBS I know. Why not:
>
>    begin work;
>    insert into x values (1, 'hi');
>    --> success
>    insert into x values (1, 'there');
>    --> failure due to primary key violation
>    insert into x values (2, 'foo');
>    --> success
>    commit work;
>
> and have two new tuples in the table? Why do _all_ of these operations
> have to be rolled back? I just don't get it that this has anything to do

Right now there's a technical reason (the system won't handle partially
completed statements reasonably - but nested transactions will give a way
for that to work presumably hence their mention) and a logical reason
(many of the developers seem believe that partial commit isn't a good
behavior and that transactions should be all commit or all fail).
Personally, I think it'd be nice to have some way to deal with errors
other than retrying the whole sequence, but at the mean time we have a
system which basically meets the words if not the intent of the spec.

> Hopefully this can be cleared and perhaps improved within PostgreSQL.
> Otherwise, PostgreSQL always claims to be close to standards.

This is actually fairly standard complient. Note my message about errors
being unrecoverable. The SQL spec allows an entire transaction to be
rolled back upon unrecoverable errors. Our failed state behavior may not
be (if we sent an error on commit, I believe it would be), but forcing the
entire transaction to roll back is.
That may not be what the spec intended, but it does seem to be allowed.


Re: transaction processing after error in statement

From
Holger Jakobs
Date:
Hi Stephan,

On  9 Nov, Stephan Szabo wrote:
> On Fri, 7 Nov 2003 holger@jakobs.com wrote:
> 
>> Whenever an error occurs within the transaction, PostgreSQL puts the
>> whole transaction in an *ABORT* state, so that there is no difference
>> at all between COMMITing or ROLLBACKing it. Even commands
>> successfully carried out before the error ocurred are rolled back,
>> even if I COMMIT the transaction, where no error message whatsoever
>> is shown.
> 
> In PostgreSQL all errors are currently considered unrecoverable, and
> all statements in a transaction must commit or rollback together as a
> single unit. In the future an implementation of nested transactions or
> savepoints would presumably relax this limitation to only the
> successfully committed subtransactions or statements that were not
> separately rolled back to a previous savepoint.
What I meant was not subtransactions or savepoints, but the funny
behaviour that operations already successfully carried out never will be
committed, just because some other operation later within the same
transaction fails. This is far different from the behaviour of all other
DMBS I know. Why not:
  begin work;  insert into x values (1, 'hi');  --> success  insert into x values (1, 'there');  --> failure due to
primarykey violation  insert into x values (2, 'foo');  --> success  commit work;
 

and have two new tuples in the table? Why do _all_ of these operations
have to be rolled back? I just don't get it that this has anything to do
with savepoints or so. I don't see any problem with an error being
recoverable, because the second insert failed and does not have to be
recovered while the first and the third worked fine and does not have to
be recovered either. When committing a transaction the effects of all
operations that did not fail will be made permanent. This is how
transaction processing is described in the literature.

If a programmer wants the whole transaction to fail because one part
failed, (s)he can always program a rollback in case of at least one
error. But there should always be a difference between a rollback and a
commit, after at least one statement changing data has reported a
success.

Hopefully this can be cleared and perhaps improved within PostgreSQL.
Otherwise, PostgreSQL always claims to be close to standards.

Sincerely,

Holger
-- 
Holger@Jakobs.com, Bergisch Gladbach, Germany
Telefon (0 22 02) 5 99 91 oder (01 77) 7 92 24 66



Re: transaction processing after error in statement

From
Jan Wieck
Date:
Holger Jakobs wrote:

> Hi Stephan,
> 
> On  9 Nov, Stephan Szabo wrote:
>> On Fri, 7 Nov 2003 holger@jakobs.com wrote:
>> 
>>> Whenever an error occurs within the transaction, PostgreSQL puts the
>>> whole transaction in an *ABORT* state, so that there is no difference
>>> at all between COMMITing or ROLLBACKing it. Even commands
>>> successfully carried out before the error ocurred are rolled back,
>>> even if I COMMIT the transaction, where no error message whatsoever
>>> is shown.
>> 
>> In PostgreSQL all errors are currently considered unrecoverable, and
>> all statements in a transaction must commit or rollback together as a
>> single unit. In the future an implementation of nested transactions or
>> savepoints would presumably relax this limitation to only the
>> successfully committed subtransactions or statements that were not
>> separately rolled back to a previous savepoint.
>  
> What I meant was not subtransactions or savepoints, but the funny
> behaviour that operations already successfully carried out never will be
> committed, just because some other operation later within the same
> transaction fails. This is far different from the behaviour of all other
> DMBS I know. Why not:

Why is that "funny behaviour" for you? By putting the statements into a 
transaction block you told the data management system "I want this group 
of statements to be atomic". Atomic means all or nothing. It might not 
be exactly what you intended to say, and you have a point if you 
conclude that PostgreSQL is limited because it doesn't let you say 
anything in between all or nothing. But after all, thus far it is only 
doing what you asked for.


Jan

> 
>    begin work;
>    insert into x values (1, 'hi');
>    --> success
>    insert into x values (1, 'there');
>    --> failure due to primary key violation
>    insert into x values (2, 'foo');
>    --> success
>    commit work;
> 
> and have two new tuples in the table? Why do _all_ of these operations
> have to be rolled back? I just don't get it that this has anything to do
> with savepoints or so. I don't see any problem with an error being
> recoverable, because the second insert failed and does not have to be
> recovered while the first and the third worked fine and does not have to
> be recovered either. When committing a transaction the effects of all
> operations that did not fail will be made permanent. This is how
> transaction processing is described in the literature.
> 
> If a programmer wants the whole transaction to fail because one part
> failed, (s)he can always program a rollback in case of at least one
> error. But there should always be a difference between a rollback and a
> commit, after at least one statement changing data has reported a
> success.
> 
> Hopefully this can be cleared and perhaps improved within PostgreSQL.
> Otherwise, PostgreSQL always claims to be close to standards.
> 
> Sincerely,
> 
> Holger


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: transaction processing after error in statement

From
Rod Taylor
Date:
> be recovered either. When committing a transaction the effects of all
> operations that did not fail will be made permanent. This is how
> transaction processing is described in the literature.

I would be interested in reading that (URLs please) as I didn't see
anything in the spec that was interesting on this topic.

4.8.5 from Framework (part 01)       An SQL-transaction (transaction) is a sequence of executions of
SQL-statementsthat is atomic with respect to recovery. That is       to say: either the execution result is completely
successful,or       it has no effect on any SQL-schemas or SQL-data.
 

The "execution result is completely successful" could certainly be used
to back up PostgreSQLs choice to force a rollback. However, it doesn't
differentiate between execution of what the user requested, and
execution of recovery procedures on the successful user elements.

Irregardless, I wish a commit on a failed transaction would throw an
error -- END is good enough for Rollback or Commit.

For PostgreSQL to implement this we need Savepoints or nested
transactions internally since in many cases data is physically written
in order to perform things like Foreign Key constraint checks.



Re: transaction processing after error in statement

From
Rajesh Kumar Mallah
Date:
Rod Taylor wrote:<br /><blockquote cite="mid1068490585.25089.7.camel@jester" type="cite"><blockquote type="cite"><pre
wrap="">berecovered either. When committing a transaction the effects of all
 
operations that did not fail will be made permanent. This is how
transaction processing is described in the literature.   </pre></blockquote><pre wrap="">
I would be interested in reading that (URLs please) as I didn't see
anything in the spec that was interesting on this topic.

4.8.5 from Framework (part 01)       An SQL-transaction (transaction) is a sequence of executions of
SQL-statementsthat is atomic with respect to recovery. That is       to say: either the execution result is completely
successful,or       it has no effect on any SQL-schemas or SQL-data.</pre></blockquote> Although i am not aware of the
rootsof this discussion but would like to<br /> comment at this point .<br /><br /> When we work with sequences an
abortedtransaction does have<br /> a permanent effect on the last value  of sequence. Is this behaviour <br /> not a
violationof above defination of transaction ?<br /><br /><br /> Regds<br /> Mallah.<br /><br /><blockquote
cite="mid1068490585.25089.7.camel@jester"type="cite"><pre wrap="">
 

The "execution result is completely successful" could certainly be used
to back up PostgreSQLs choice to force a rollback. However, it doesn't
differentiate between execution of what the user requested, and
execution of recovery procedures on the successful user elements.

Irregardless, I wish a commit on a failed transaction would throw an
error -- END is good enough for Rollback or Commit.

For PostgreSQL to implement this we need Savepoints or nested
transactions internally since in many cases data is physically written
in order to perform things like Foreign Key constraint checks.


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
              <a class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/faqs/FAQ.html">http://www.postgresql.org/docs/faqs/FAQ.html</a>
</pre></blockquote><br/> 

Re: transaction processing after error in statement

From
Rod Taylor
Date:
> Although i am not aware of the roots of this discussion but would like
> to
> comment at this point .
> 
> When we work with sequences an aborted transaction does have
> a permanent effect on the last value  of sequence. Is this behaviour 
> not a violation of above defination of transaction ?

I believe you are correct, which is probably why Oracle offers
serialized sequences for those who want them to be. Sequences have been
explicitly documented as functioning the way they do as a concession for
performance.



Re: transaction processing after error in statement

From
Jan Wieck
Date:
Holger Jakobs wrote:

>> 
>> Why is that "funny behaviour" for you? By putting the statements into
>> a transaction block you told the data management system "I want this
>> group of statements to be atomic". Atomic means all or nothing. It
>> might not be exactly what you intended to say, and you have a point
>> if you conclude that PostgreSQL is limited because it doesn't let you
>> say anything in between all or nothing. But after all, thus far it is
>> only doing what you asked for.
>> 
> 
> It is "funny behaviour", because I expect those operations of the
> transaction, which executed successfully, to be performed in an atomic
> way. It is obvious that I cannot expect an operation which reported an
> error to have any effect.
> 
> "Atomic" means that all operations (whether successful or not) will be
> carried out all together or none of them - but only the successful ones
> will have had an effect.

As long as we talk in an SQL context, can you please stick to SQL terms? 
I don't know exactly what you mean with "operation". If for example the 
statement
    DELETE FROM order_line WHERE ol_ordernum = 4711;

has 12 matching rows in order_line, is an operation the removal of one 
single order line or do the actions performed by the triggers fired due 
to their removal count as separate operations for you? And if there is 
one that cannot be deleted because a row in another table with a foreign 
key references it, do you delete none of them or the remaining 11? And 
if you decide to delete none, how do you magically undo the work of the 
BEFORE triggers if you hit the foreign key after successfully processing 
5 rows? Is there an SQL return code for "partial success"?

The question about "partial success" is the important part here. Imagine 
a stored procedure in PL/pgSQL consisting of two INSERTs. One fails with 
a duplicate key error, the other one succeeds. The language lacks for 
technical reasons an exception handling mechanism, so you have to define 
if the other statement or nothing of the procedure succeeds, because you 
have no chance to report "partial success", there is no return code 
defined for that.

> 
> Again: Why not make a difference between "commit" and "rollback" in a
> transaction in this case? Why not let the user decide which parts should
> be commited? The practical reason is that programming would become a lot
> more convenient. (if there is a practical reason it does not necessarily
> need a technical reason, I believe.)

Nobody said that it should not be possible. But you have to dig a little 
deeper and make a bit more of a complete proposal for this, covering the 
different possible failure reasons, definitions how exactly to react in 
case of statements affecting multiple rows, related triggers and so on 
and so forth. "Make a difference between commit and rollback" is way too 
fuzzy here.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: transaction processing after error in statement

From
Holger Jakobs
Date:
> 
> Why is that "funny behaviour" for you? By putting the statements into
> a transaction block you told the data management system "I want this
> group of statements to be atomic". Atomic means all or nothing. It
> might not be exactly what you intended to say, and you have a point
> if you conclude that PostgreSQL is limited because it doesn't let you
> say anything in between all or nothing. But after all, thus far it is
> only doing what you asked for.
> 

It is "funny behaviour", because I expect those operations of the
transaction, which executed successfully, to be performed in an atomic
way. It is obvious that I cannot expect an operation which reported an
error to have any effect.

"Atomic" means that all operations (whether successful or not) will be
carried out all together or none of them - but only the successful ones
will have had an effect.

Again: Why not make a difference between "commit" and "rollback" in a
transaction in this case? Why not let the user decide which parts should
be commited? The practical reason is that programming would become a lot
more convenient. (if there is a practical reason it does not necessarily
need a technical reason, I believe.)

Sincerely,

Holger

-- 
Holger@Jakobs.com, Bergisch Gladbach, Germany
Telefon (0 22 02) 5 99 91 oder (01 77) 7 92 24 66



Re: transaction processing after error in statement

From
Holger Jakobs
Date:

On 11 Nov, Jan Wieck wrote:
> As long as we talk in an SQL context, can you please stick to SQL
> terms? I don't know exactly what you mean with "operation". If for
> example the statement
> 
>      DELETE FROM order_line WHERE ol_ordernum = 4711;
> 
> has 12 matching rows in order_line, is an operation the removal of one 
> single order line or do the actions performed by the triggers fired
> due to their removal count as separate operations for you? And if
> there is one that cannot be deleted because a row in another table
> with a foreign key references it, do you delete none of them or the
> remaining 11? And if you decide to delete none, how do you magically
> undo the work of the BEFORE triggers if you hit the foreign key after
> successfully processing 5 rows? Is there an SQL return code for
> "partial success"?
OK, let's say "statement" instead of "operation". 

No, there is no partial success. Either a statement delivers an "OK" or
it doesn't. Actually, you will have to undo anything the statement did
before the first error occurs. This may mean that you need some kind of
savepoint. If so, the necessity to implent this shows and should be
given a high priority. I don't know how the other RDMBS do it, but they
do. I am talking from the perspective of an RDBMS user, not as an
implementor.

Calling a procedure is a statement as well, and it includes all other
procedures called from this procedure. So the statement level is always
the statements that were carried out directly in the transaction. If
anything within one statement fails, the statement was not carried out
and must not have any effect. It is not important whether the procedure
was fired by a trigger or called by another procedure.


Are there any Open Source RDBMS which behave like Oracle, Access and
Allbase? If so, one might look into their code to find out how they have
implented it.


Coming back to Standards, here is a quote from the ANSI document:
  4.28 SQL-transactions
  The execution of a <rollback statement> may be initiated implicitly  by an implementation when it detects
unrecoverableerrors. When  such an error occurs, an exception condition is raised: transaction  rollback with an
implementation-definedsubclass code.
 

This means that a rollback does not have to be initiated if an 
unrecoverable error occurs, it only _may_ happen. Since it is 
impractical, it should not.

AN EXAMPLE:

Let's have two tables, employees and doctors in a hospital.

create table emp ( empno  integer primary key, name varchar(40)
);

create table doctor ( empno integer primary key references emp, beepernumber integer unique
);

Now let a user enter the data of a doctor. First the data of the
employee part are sent to the database: insert into emp values (1, 'Fred'); --> success 
Second the doctor-special data are sent to the database: insert into doctor values (1, 34); -->error, beepernumber
alreadypresent, unique key violation
 

Since there was an error, we let the user key in a different 
beeper number for the doctor and send the data to the
database: insert into doctor (1, 45); -->should be successful (if 45 is not already there) and it _is_ in     Allbase,
Oracle,Access -->Postgres tells you something about and *ABORT* state
 

We commit the transaction:  commit work; 
Effect in all other databases: a successfully entered doctor
Effect in PostgreSQL: nothing!

To get the same effect in PostgreSQL, we would have to rollback
(or commit, no difference) the transaction after the attempt of
inserting the non-unique beeper number and then re-insert all
data into all tables. WHY? It only makes programming more
complex, thus error-prone.


I would appreciate if most of the difficult tasks could be done within
the database system - that's what it's there fore. The first reason is
that the implementors usually are far more advanced programmers than the
programmers of applications. The second reason is that this isolates the
tricky programming from the already complex logic of the application.

Additionally, whether one might personally think that this behaviour is
important or not, it is what most commercial RDBMS do. So it would make
porting of applications from e. g. Oracle to PostgreSQL dramatically
easier. Until now, this has been one of the pitfalls. That's why I
brought this up in the first place.


Sincerely,

Holger

-- 
Holger@Jakobs.com, Bergisch Gladbach, Germany
Telefon (0 22 02) 5 99 91 oder (01 77) 7 92 24 66



Re: transaction processing after error in statement

From
Jan Wieck
Date:
Holger Jakobs wrote:

> Calling a procedure is a statement as well, and it includes all other
> procedures called from this procedure. So the statement level is always
> the statements that were carried out directly in the transaction. If
> anything within one statement fails, the statement was not carried out
> and must not have any effect. It is not important whether the procedure
> was fired by a trigger or called by another procedure.

So you define the smalles unit being one single statement as issued by 
the client application and receiving one single returncode over the 
frontend/backend protocol.

That's almost what people think of as subtransactions. I think if we 
ever implement them, we will have some session setting that lets the 
backend behave like that and your needs will be satisfied.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #