Thread: COMMIT IN STORED PROCEDURE WHILE IN A LOOP
AWS Aurora based on PG 13.

I am writing a sproc to copy a schema into another. Here is the relevant portion of the code. Basically I want to commit after every table is created. In big schemas with hundreds of tables I do not want to run the entire operation in one transaction.

I am getting an error at COMMIT -> cannot commit while a subtransaction is active.

Is commit not possible in a loop?

    BEGIN
    FOR object IN
        SELECT table_name::text
          FROM information_schema.TABLES
         WHERE table_schema = source_schema
           AND table_type = 'BASE TABLE'
    LOOP
        buffer := dest_schema || '.' || object;
        BEGIN
            sql_stmt := 'CREATE TABLE ' || buffer || ' (LIKE ' || source_schema || '.' || object;
            sql_stmt := sql_stmt || ' INCLUDING COMMENTS INCLUDING IDENTITY INCLUDING CONSTRAINTS'
                                 || ' INCLUDING DEFAULTS INCLUDING GENERATED INCLUDING INDEXES)';
            -- RAISE NOTICE '%', sql_stmt;
            EXECUTE sql_stmt;
            COMMIT;

            sql_stmt := 'INSERT INTO ' || buffer || ' OVERRIDING SYSTEM VALUE SELECT * FROM '
                                       || source_schema || '.' || object;
            -- RAISE NOTICE '%', sql_stmt;
            EXECUTE sql_stmt;
            COMMIT;
        EXCEPTION
            WHEN OTHERS THEN
                GET STACKED DIAGNOSTICS
                    text_var1 = MESSAGE_TEXT,
                    text_var2 = PG_EXCEPTION_DETAIL,
                    text_var3 = PG_EXCEPTION_HINT;
                print_msg := 'ERROR:->' || text_var1 || '|' || text_var2 || '|' || text_var3;
                RAISE NOTICE '%', print_msg;
        END;
    END LOOP;
    END;
> On Oct 18, 2022, at 13:14, Ravi Krishna <s_ravikrishna@aol.com> wrote:
>
> I am getting error at COMMIT -> cannot commit while a subtransaction is active.
> Is commit not possible in a loop

You can commit in a loop, but not in BEGIN / END block that has an exception handler: that creates a subtransaction for the duration of the BEGIN / END.
> xof@thebuild.com wrote:
>
>> s_ravikrishna@aol.com wrote:
>>
>> I am getting error at COMMIT -> cannot commit while a subtransaction is active...
>
> You can commit in a loop, but not in BEGIN / END block that has an exception handler: that creates a subtransaction for the duration of the BEGIN / END.

This surprised me when I first started to use PG (after all those years with ORCL). There are some use cases where this limitation is a nuisance. For example, if you want to implement the famous retry loop (with exponential backoff) for an error that is provoked at commit time... well, you have to do it in client-side code.

Could the limitation be lifted by making tractable internal implementation changes? Or is it rooted in profoundly deep features of the architecture—meaning that it could never be lifted?
> You can commit in a loop, but not in BEGIN / END block that has an exception handler:
> that creates a subtransaction for the duration of the BEGIN / END.
The reason I have to deal with the error exception is that I want to ignore a failure on one table and move on to the next table.
I thought I could trick it by moving the part of the code which executes the dynamic SQL, and has the exception handler in it,
to a separate procedure. That way I could keep the main procedure without an exception handler. It failed too, with
the same error. So that means this restriction is not at the lexical level, but at the execution level.
This is so disappointing. It seems there is no way I can copy each table and commit it individually and also capture errors.
> On Oct 18, 2022, at 14:29, Ravi Krishna <s_ravikrishna@aol.com> wrote:
>
>> You can commit in a loop, but not in BEGIN / END block that has an exception handler:
>> that creates a subtransaction for the duration of the BEGIN / END.
>
> The reason I have to deal with error exception is that I want to ignore failure on a table and move on to next table.

Rather than have a loop inside the BEGIN / END, you could put the BEGIN / EXCEPTION / END inside the loop, catch the error, store the important parts of the exception in a variable, and then do the COMMIT after the END statement but before the next iteration of the loop. A bit messier, but it gets the job done.
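Applied to the procedure from the original post, that shape looks roughly like this. A sketch only, not tested on Aurora: the procedure name, the failed flag, and the use of format() with %I for identifier quoting are my choices; the INCLUDING list is from the original code.

    CREATE OR REPLACE PROCEDURE clone_schema_tables(source_schema text, dest_schema text)
    LANGUAGE plpgsql
    AS $$
    DECLARE
        object     text;
        err_msg    text;   -- these survive past the END, so we can report after the block
        err_detail text;
        err_hint   text;
        failed     boolean;
    BEGIN
        FOR object IN
            SELECT table_name::text
              FROM information_schema.tables
             WHERE table_schema = source_schema
               AND table_type = 'BASE TABLE'
        LOOP
            failed := false;

            BEGIN  -- exception handler => this block is a subtransaction: no COMMIT in here
                EXECUTE format(
                    'CREATE TABLE %I.%I (LIKE %I.%I INCLUDING COMMENTS INCLUDING IDENTITY'
                    ' INCLUDING CONSTRAINTS INCLUDING DEFAULTS INCLUDING GENERATED INCLUDING INDEXES)',
                    dest_schema, object, source_schema, object);
                EXECUTE format(
                    'INSERT INTO %I.%I OVERRIDING SYSTEM VALUE SELECT * FROM %I.%I',
                    dest_schema, object, source_schema, object);
            EXCEPTION
                WHEN OTHERS THEN
                    GET STACKED DIAGNOSTICS
                        err_msg    = MESSAGE_TEXT,
                        err_detail = PG_EXCEPTION_DETAIL,
                        err_hint   = PG_EXCEPTION_HINT;
                    failed := true;
            END;   -- subtransaction ends here

            IF failed THEN
                RAISE NOTICE 'ERROR:-> % | % | %', err_msg, err_detail, err_hint;
            END IF;

            COMMIT;  -- legal here: the exception block above is closed
        END LOOP;
    END;
    $$;

A failed table's work is rolled back by its subtransaction, so the COMMIT after the block only persists the tables that succeeded so far.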
Bryn Llewellyn <bryn@yugabyte.com> writes:
> xof@thebuild.com wrote:
>> You can commit in a loop, but not in BEGIN / END block that has an exception handler: that creates a subtransaction for the duration of the BEGIN / END.

> This surprised me when I first started to use PG (after all those years
> with ORCL).

Really? BEGIN with an exception block is a subtransaction because it's defined to roll back to the database state as of the start of the block if an exception occurs. COMMIT in the middle fundamentally conflicts with that, I should think. Does Oracle interpret that differently?

			regards, tom lane
> On Oct 18, 2022, at 14:15, Bryn Llewellyn <bryn@yugabyte.com> wrote:
>
> Could the limitation be lifted by making tractable internal implementation changes? Or is it rooted in profoundly deep features of the architecture—meaning that it could never be lifted?

That is a very good question. One of the issues (as I understand it) is to be able to continue to use the same connection in the event of an error. Here's the scenario:

1. SELECT * FROM my_func();
2. my_func gets an error. The connection is now in an error state.
3. So, catching the exception doesn't do you much good, because future operations will get an error.

This problem is solved by wrapping it in a savepoint, since rolling back to the savepoint will undo the error state of the connection.

There might be a way forward, but it's hard to get the proper semantics of an exception handler without doing something like that.
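That mechanism is easy to see in plain SQL, which is roughly what the implicit subtransaction buys you (sp_demo is a throwaway table invented for illustration):

    BEGIN;
    CREATE TABLE sp_demo (id int PRIMARY KEY);
    INSERT INTO sp_demo VALUES (1);

    SAVEPOINT before_risky_insert;
    INSERT INTO sp_demo VALUES (1);   -- fails: duplicate key
    -- From here on, every statement fails with
    -- "current transaction is aborted" ...
    ROLLBACK TO SAVEPOINT before_risky_insert;
    -- ... until the rollback to the savepoint clears the error state.

    INSERT INTO sp_demo VALUES (2);   -- works again
    COMMIT;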
On Tue, 2022-10-18 at 17:33 -0400, Tom Lane wrote:
> Bryn Llewellyn <bryn@yugabyte.com> writes:
> > xof@thebuild.com wrote:
> > > You can commit in a loop, but not in BEGIN / END block that has an exception handler: that creates a subtransaction for the duration of the BEGIN / END.
>
> > This surprised me when I first started to use PG (after all those years
> > with ORCL).
>
> Really? BEGIN with an exception block is a subtransaction because it's
> defined to roll back to the database state as of the start of the block
> if an exception occurs. COMMIT in the middle fundamentally conflicts
> with that, I should think. Does Oracle interpret that differently?

Looks like Oracle doesn't care much about that:

    SQL> CREATE TABLE mytab (id integer CHECK (id > 0));

    Table created.

    SQL> CREATE PROCEDURE committest IS
      2  BEGIN
      3     INSERT INTO mytab VALUES (42);
      4     COMMIT;
      5     INSERT INTO mytab VALUES (-42);
      6  EXCEPTION
      7     WHEN OTHERS THEN
      8        NULL;
      9  END;
     10  /

    Procedure created.

    SQL> CALL committest();

    Call completed.

    SQL> SELECT * FROM mytab;

            ID
    ----------
            42

It looks like Oracle allows you to randomly interfere with its transaction handling. If you run COMMIT and then enter an exception handler, it simply doesn't roll back.

Yours,
Laurenz Albe
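For contrast, a direct PL/pgSQL translation of that procedure fails at the COMMIT with exactly the error from the original post (a sketch, assuming Aurora matches stock PG 13 here):

    CREATE TABLE mytab (id integer CHECK (id > 0));

    CREATE PROCEDURE committest()
    LANGUAGE plpgsql
    AS $$
    BEGIN
        INSERT INTO mytab VALUES (42);
        COMMIT;   -- the body block has an EXCEPTION clause, so we are inside
                  -- a subtransaction here; this raises
                  -- "cannot commit while a subtransaction is active"
        INSERT INTO mytab VALUES (-42);
    EXCEPTION
        WHEN OTHERS THEN
            NULL;
    END;
    $$;

    CALL committest();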
On Tue, 2022-10-18 at 17:33 -0400, Tom Lane wrote:
> Bryn Llewellyn <bryn@yugabyte.com> writes:
> > xof@thebuild.com wrote:
> > > You can commit in a loop, but not in BEGIN / END block that has an exception handler: that creates a subtransaction for the duration of the BEGIN / END.
> > This surprised me when I first started to use PG (after all those years
> > with ORCL).
>
> Really? BEGIN with an exception block is a subtransaction because it's
> defined to roll back to the database state as of the start of the block
> if an exception occurs. COMMIT in the middle fundamentally conflicts
> with that, I should think. Does Oracle interpret that differently?
>
> regards, tom lane
Hi Tom,
Yes, Oracle does interpret that differently. Bryn may correct me if I'm wrong, but Oracle creates an implicit savepoint when it encounters BEGIN. An exception handler doesn't necessarily roll things back. Oracle's behavior is not standard, and PgSQL adheres to the SQL standard better than Oracle does. However, being as pervasive as it is, Oracle is the de facto standard.
Also, Oracle has something called an "autonomous transaction" which, in effect, means that a session can have two concurrent transactions open, which is also non-standard.
Amazon, led by Kevin Closson, the guy who famously designed Oracle Exadata among other things, even came up with a recipe for how to migrate it to Postgres:

https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/
I am a bit skeptical of that recipe; I usually prefer programmatic solutions that open another thread and send a message. BTW, speaking of Kevin, he has also written pgio, which is a PostgreSQL version of his SLOB package. Kevin is the only retired Oracle ACE in existence. BTW, Bryn also used to be an Oracle ACE.
Regards
On Tue, 2022-10-18 at 14:31 -0700, Christophe Pettus wrote:
> Rather than have a loop inside the BEGIN / END, you could put the BEGIN EXCEPTION END inside the loop, catch the error, store the important parts of the exception in a variable, and then do the COMMIT after the END statement but before the next iteration of the loop. A bit messier, but it gets the job done.
Commit within a loop is an extremely bad idea. Commit is an expensive operation which includes incrementing the XID, which is global and not local to the process. There is also a WAL write which has to be waited on. Every commit implies at least one write operation. If that were not the case, Postgres wouldn't be ACID compliant; there would be a problem with the "D". Commit within a loop will have many adverse effects on performance. Here is what Tom Kyte, who used to be a celebrity in the world of Oracle, said on this topic:
Surprisingly enough, the argument in the article is mostly portable; it doesn't depend on the database type at all. Basically, a transaction is a logical unit of work. If 1000 rows need to be updated, it's better to update them in a single transaction than in 1000 transactions. That is particularly true for Postgres, which doesn't have problems with the undo tablespace and the ORA-1555 "snapshot too old" error, because of its different architecture. Also, Oracle has a secret optimization: it doesn't wait for the commit if the commit is issued within a PL/SQL loop. Your idea solves the syntactic problem with commits within PL/pgSQL loops, but it doesn't solve the other problems that such programming causes. Commit within a loop is a very bad idea. The best way to resolve the problems with commit within the loop is to remove the programmer trying to do that from the project. In my humble opinion, programmers who do stuff like that should suffer cruel and unusual punishment.
PS:
----
I am sure that if Tom Kyte were to make a suggestion on this list, there would be someone who would try to explain "the Postgres way" to him in a condescending manner. I've seen that before.
> On Oct 18, 2022, at 19:18, gogala.mladen@gmail.com wrote:
>
> Commit within a loop is an extremely bad idea.

This is an over-generalization. There are many use-cases for this (if there were not, procedures wouldn't have been nearly as important a feature).

For example, if you are processing a large update (in the hundreds of thousands or more of rows), you often want to commit regularly so that other processes don't have to wait for the whole thing to finish due to row-level locks, and to give vacuum a chance to deal with the dead tuples. Similarly, while inserting one row at a time and committing is usually not a great idea, it can make sense to do large inserts in batches.

Applications do this kind of thing all the time, very successfully; it was just that the loop was in the application rather than in the procedure.

High commit rates happen all the time, and they don't break PostgreSQL. For example, an IoT application collecting sensor data and doing many inserts per second is also doing many commits per second, since each bare INSERT is in its own transaction. PostgreSQL handles it just fine.
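A sketch of that batching pattern in a procedure (big_table, its archived flag, and the batch size are invented for the example; note the CALL must run outside an explicit transaction block for the COMMIT to be allowed):

    CREATE OR REPLACE PROCEDURE archive_in_batches(batch_size int DEFAULT 10000)
    LANGUAGE plpgsql
    AS $$
    DECLARE
        rows_done bigint;
    BEGIN
        LOOP
            UPDATE big_table
               SET archived = true
             WHERE ctid IN (SELECT ctid
                              FROM big_table
                             WHERE NOT archived
                             LIMIT batch_size);
            GET DIAGNOSTICS rows_done = ROW_COUNT;
            EXIT WHEN rows_done = 0;
            COMMIT;  -- releases row locks and lets vacuum reclaim the dead tuples
        END LOOP;
    END;
    $$;

    CALL archive_in_batches();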
Comments in-line.
On Tue, 2022-10-18 at 21:02 -0700, Christophe Pettus wrote:
> > On Oct 18, 2022, at 19:18, gogala.mladen@gmail.com wrote:
> > Commit within a loop is an extremely bad idea.
>
> This is an over-generalization. There are many use-cases for this (if there were not, procedures wouldn't have been nearly as important a feature).
>
> For example, if you are processing a large update (in the hundreds of thousands or more of rows), you often want to commit regularly so that other processes don't have to wait for the whole thing to finish due to row-level locks, and to give vacuum a chance to deal with the dead tuples. Similarly, while inserting one row at a time and committing is usually not a great idea, it can make sense to do large inserts in batches.
That depends. Multiple commits will slow down the processing. If the goal of the exercise is to let the update complete as quickly as possible, then it will be a single commit. If the goal is to enable normal processing and let the enormous update complete in its due time, then you'll do what you describe.
> Applications do this kind of thing all the time, very successfully; it was just that the loop was in the application rather than in the procedure.
>
> High commit rates happen all the time, and they don't break PostgreSQL. For example, an IoT application collecting sensor data and doing many inserts per second is also doing many commits per second, since each bare INSERT is in its own transaction. PostgreSQL handles it just fine.
The point of my post is that the business logic, in your case the IoT sensors, determines what a transaction is and when to commit. Advice like "commit often and commit early", to paraphrase the famous Chicago mayor, is easy to find, but I would take it with a grain of salt.
Regards
--
Mladen Gogala
Database Consultant
On 10/19/22 08:06, Mladen Gogala wrote:
[snip]
In the normal course of operation (i.e., not when bulk loading), you should commit at the end of every "business transaction". We've committed after every X business transactions when running stovepipe "batch" jobs processing input files. In those cases, though, we had to track progress through the file; in the case of a rollback, the application had to go back to the last input-file "save point" and start over.
[snip]
--
Angular momentum makes the world go 'round.
It seems that I made a thread-discipline error when I asked a question that had nothing to do with the frequency, or the cost, of committing when I saw this reply (paraphrased for brevity here) from Christophe:

> You [cannot] commit in [a] BEGIN / END [block statement] that has an exception handler [because] that creates a subtransaction for the duration of the [block statement].

I asked this in response (again, paraphrased for brevity):

> Could the limitation be lifted...? [Or is the limitation] rooted in profoundly deep features of the architecture?

Sorry that I caused some distraction. Anyway, Tom replied immediately. He said:

> BEGIN with an exception block is a subtransaction because it's defined to roll back to the database state as of the start of the block if an exception occurs. COMMIT in the middle fundamentally conflicts with that, I should think.

Thanks, Tom. It's clear to me now that the present PG paradigm will never, ever change.

So my conclusion stands for this use case: I'm using "serializable" isolation (which luxury ORCL doesn't afford me); I know that I can get a "cannot serialize" error at "commit" time. Here, I cannot take appropriate action within my PL/pgSQL code and hide the whole story of what this is about from client code. Rather, I must explain the business to the authors of the next tier, and teach them when, and how, retry is appropriate.

Tom asked, too, if ORCL has a different paradigm... Briefly, yes—and radically so. But (all of you) do please feel free to skip over my sketch here if it doesn't interest you.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

« The real account of which this is a sketch is of no interest at all if you work only with PG and have never used ORCL. But if you need to switch, intellectually, from one to the other—and maybe need to do this all the time because your job duties span both systems—then it *is* interesting.

ORCL's PL/SQL is compiled (and optimized) at "create procedure" time. The new source code replaces the old in the catalog. And if it failed to compile, then you're left with an invalid unit that you cannot execute. Compilation errors are recorded in the catalog too. Further, static dependencies (proc upon proc, proc upon table, etc.) are also recorded in the catalog. This is intimately connected with the hard distinction between static and dynamic SQL. The latter simply passes the text on "as is" into the byte code to be dealt with at run-time. Only statements like "select", "insert", "update", "delete" and a few others can be static SQL. Table creation and the like must be dynamic SQL. This probably shocks those who move from PG to ORCL because you cannot, for example, create a table and then operate on it with static SQL in the same procedure.

In particular, for the present discussion, the PL/SQL block statement is a pure lexical device. (This is the case in PL/SQL's progenitor, ADA. And that's where all that stuff about DIANA, that the PL/SQL programmer eventually comes to hear about, comes from.) All memory that you had a block statement in the source is lost in the compiled so-called byte code that gets interpreted at run time. On the other hand, every call from PL/SQL to SQL is done in its own subtransaction—and if it fails, then that single statement is atomically rolled back. The effect of all the SQLs to date, at this moment, remains intact—but uncommitted. (Of course, you might have issued "commit"(s) programmatically. So I'm talking about SQLs that were done since the most recent "commit".)
Significantly, the failure of a call from PL/SQL to SQL raises an exception—so (as well as the single-statement rollback) you now have an in-flight exception that flies up through successive scopes in search of a matching handler. If it remains unhandled at the last moment before the top-level PL/SQL "call" is due to finish, then a "rollback" is automatically issued. But if a handler *is* found, well... the exception is dead and you can carry on. Like everything else in programming, the code author must work out what "safe" is. (It could be to turn an insert that fails 'cos a unique key is violated into an update.) In ORCL, just as in PG, writing "when others then null" is held to be stupid. And code examples that do this are deemed to be not worthy of discussion.

Though the paradigms are different, each allows you properly to implement mission-critical applications. It's rather like English and Chinese. Astonishingly different. But each supports all that you need to let people communicate about mundane daily business, science, philosophy, epistemology, and so on. »
gogala.mladen@gmail.com wrote on 19.10.2022 at 01:46:
> Amazon, lead by Kevin Closson, the guy who has famously designed
> Oracle Exadata among other things, even came up with the recipe how
> to migrate it to Postgres:
>
> https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/

The workaround to use dblink to simulate autonomous transactions is nothing new, and has been floating around for quite a while.

Here is a blog post from 2012 (6 years before the Amazon recipe was published):

https://raghavt.blog/autonomous-transaction-in-postgresql-9-1/

and another one from 2016:

https://blog.dalibo.com/2016/09/08/Pragma_Autonoumous_transactions.html
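For reference, the shape of that workaround is roughly this (a sketch: the audit_log table and the connection string are placeholders, and dblink_connect may need host/user/password depending on your pg_hba setup). The inner INSERT runs in a second session's own transaction, so it survives even if the calling transaction rolls back:

    CREATE EXTENSION IF NOT EXISTS dblink;

    CREATE OR REPLACE FUNCTION log_autonomous(msg text) RETURNS void
    LANGUAGE plpgsql
    AS $$
    BEGIN
        PERFORM dblink_connect('autonomous', 'dbname=' || current_database());
        PERFORM dblink_exec('autonomous',
            format('INSERT INTO audit_log (logged_at, message) VALUES (now(), %L)', msg));
        PERFORM dblink_disconnect('autonomous');
    END;
    $$;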
This seems like a bad use of a stored procedure. Why wouldn't you spin up 40 clients, each with a table name, and run it across 40 connections? But also, I don't like loops in stored procedures. Working with loops in a set-based system hurts me, but that's personal preference.
Like, I could write this in Python in less than 4 hours with an extremely simple sproc. How often are you running table copies like this to require a sproc?
Thanks,
Ben
On Wed, Oct 19, 2022, 5:39 PM Thomas Kellerer <shammat@gmx.net> wrote:
> gogala.mladen@gmail.com wrote on 19.10.2022 at 01:46:
>> Amazon, lead by Kevin Closson, the guy who has famously designed
>> Oracle Exadata among other things, even came up with the recipe how
>> to migrate it to Postgres:
>>
>> https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/
>
> The workaround to use dblink to simulate autonomous transactions is nothing new,
> and has been floating around for quite a while.
>
> Here is a blog post from 2012 (6 years before the Amazon recipe was published)
>
> https://raghavt.blog/autonomous-transaction-in-postgresql-9-1/
>
> and another one from 2016
>
> https://blog.dalibo.com/2016/09/08/Pragma_Autonoumous_transactions.html