Thread: Bit by "commands ignored until end of transaction block" again
Postgres kills a transaction when an error happens. This is a pain; it assumes that all SQL errors are unexpected and fatal to the transaction. There's a very simple case where it's not: UNIQUE checks. I'm generating a cache, with a simple flow:- Search for the cache key; if it exists, return its value.- If it didn't exist, createthe data based on the key, insert it into the table, and return it. This has an obvious race: another thread looks up the same key and creates it between the search and the insert. Both threads will create the cached data, thread A will insert it into the table, and thread B will get an integrity error when it tries to insert it, since it duplicates the unique key. Here, by far the simplest fix is simply to ignore the integrity error.Both threads generated the same data; the failed insertis expected and harmless. Postgres is turning this into a fatal error. There's so much that could make this trivially easy: - SQLite has the handy ON CONFLICT IGNORE, but Postgres has nothing like that. (ON CONFLICT REPLACE is great, too.)- Let me use SAVEPOINT outside of a transaction, with the effect of starting a transaction with the savepoint and ending it when it's committed. Then, I could use savepoints without needing to know whether I'm already in a transaction or not; one would simply be started and committed for me if necessary. (That's by far my biggest issue with savepoints: they force me to either specify "a transaction must be open when this function is called", or need to be able to query whether one is running to decide whether to start a transaction or a savepoint. My function's use of transactions should be invisible to the caller.)- Let me disable this error. I don't want it. (We're grownups; we can decide for ourselves which errors are fatal.) The first two are cleaner, since ignoring the error means I might ignore some other integrity error from the same statement, but I can live with that. Lacking anything better, I'll probably end up dropping out of the ORM and using some uglier SQL to work around this, but this is so trivial that it's silly to have to do that. I can't do it within the ORM; it doesn't have the vocabulary. Any tricks I'm missing? It feels like Postgres is fighting me at every turn with this one, and this isn't the first time I've had this problem. -- Glenn Maynard
Glenn Maynard wrote: > Postgres kills a transaction when an error happens. This is a pain; > it assumes that all SQL errors are unexpected and fatal to the > transaction. > > There's a very simple case where it's not: UNIQUE checks. Ah, it's usually "syntax errors" wrt interactive sessions. > I'm > generating a cache, with a simple flow: > - Search for the cache key; if it exists, return its value. > - If it didn't exist, create the data based on the key, insert it > into the table, and return it. > > This has an obvious race: another thread looks up the same key and > creates it between the search and the insert. Both threads will > create the cached data, thread A will insert it into the table, and > thread B will get an integrity error when it tries to insert it, since > it duplicates the unique key. Yep. > Here, by far the simplest fix is simply to ignore the integrity error. > Both threads generated the same data; the failed insert is expected > and harmless. Postgres is turning this into a fatal error. Well, all errors are considered fatal. But same difference. > There's so much that could make this trivially easy: > > - SQLite has the handy ON CONFLICT IGNORE, but Postgres has nothing > like that. (ON CONFLICT REPLACE is great, too.) True. Been proposed. Fiddly to implement for all use-cases if I remember correctly. > - Let me use SAVEPOINT outside of a transaction, You are never outside a transaction. All queries are executed within a transaction. > with the effect of > starting a transaction with the savepoint and ending it when it's > committed. Then, I could use savepoints without needing to know > whether I'm already in a transaction or not; one would simply be > started and committed for me if necessary. (That's by far my biggest > issue with savepoints: they force me to either specify "a transaction > must be open when this function is called", or need to be able to > query whether one is running to decide whether to start a transaction > or a savepoint. My function's use of transactions should be invisible > to the caller.) I think this is the root of your problem - all queries are within a transaction so either: 1. You have a transaction that wraps a single statement. If you get an error then only that statement was affected. 2. You have an explicit BEGIN...COMMIT transaction which could use a savepoint. Nothing to stop you setting savepoints in #1 (although they're not much use). Typically, if you're in a plpgsql function you would just catch "unique" exception codes from your insert. Or, update, see if any rows were affected, if not try an insert and if that gives a duplicate go back and try the update. You might want the second approach if 99% of the time the cache is already populated. > - Let me disable this error. I don't want it. (We're grownups; we > can decide for ourselves which errors are fatal.) You could always try submitting a patch. However, that's really what savepoints do - let you decide whether an error can be worked around. > The first two are cleaner, since ignoring the error means I might > ignore some other integrity error from the same statement, but I can > live with that. > > Lacking anything better, I'll probably end up dropping out of the ORM > and using some uglier SQL to work around this, but this is so trivial > that it's silly to have to do that. I can't do it within the ORM; it > doesn't have the vocabulary. The ORM can't control transactions, can't call functions or can't set savepoints? -- Richard Huxton Archonet Ltd
On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote: >> - Let me use SAVEPOINT outside of a transaction, > > You are never outside a transaction. All queries are executed within a > transaction. "Transaction block", then, if you insist. > I think this is the root of your problem - all queries are within a > transaction so either: > 1. You have a transaction that wraps a single statement. If you get an error > then only that statement was affected. > 2. You have an explicit BEGIN...COMMIT transaction which could use a > savepoint. Savepoints can only be used inside transaction blocks. My function has no idea whether it's being called inside a transaction block. From inside a transaction block, my function would need to call SAVEPOINT/RELEASE SAVEPOINT. If it's not in a transaction block, it needs to call BEGIN/COMMIT instead. SAVEPOINT will fail with "SAVEPOINT can only be used in transaction blocks". This would be very simple and clean if the SAVEPOINT command transparently issued BEGIN if executed outside of a transaction block, marking the savepoint so it knows that when the savepoint is released or rolled back, the associated transaction block needs to be committed or rolled back, too. At that point, you could stop using BEGIN/COMMIT/ROLLBACK entirely, and just let savepoints do it, if you wanted--with this, the transaction commands are essentially redundant. I can't count the number of times I've wished for this. > Typically, if you're in a plpgsql function you would just catch "unique" > exception codes from your insert. Or, update, see if any rows were affected, > if not try an insert and if that gives a duplicate go back and try the > update. You might want the second approach if 99% of the time the cache is > already populated. It's just a simple INSERT, generated from a Model.objects.create() in Django. >> Lacking anything better, I'll probably end up dropping out of the ORM >> and using some uglier SQL to work around this, but this is so trivial >> that it's silly to have to do that. I can't do it within the ORM; it >> doesn't have the vocabulary. > > The ORM can't control transactions, can't call functions or can't set > savepoints? It can't write the necessary SQL to say "insert this unless it already exists", namely: INSERT INTO cache (key, data) ( SELECT i.key, "data", FROM (VALUES ("key")) AS i(key) LEFT JOIN cacheprior_entry ON (prior_entry.key = "key") WHERE prior_entry.key IS NULL) It--Django--also doesn't have a mature transaction/savepoint system; in fact, its transaction handling is an absolute mess. I've written helpers for my main codebase that simply says "wrap this in a transaction block if one isn't already started, otherwise wrap it in a savepoint". I don't want to use that code here, because it's nitty code: it needs to poke at Django internals to figure out whether it's in a transaction block or not, and dealing with other API compatibility issues. -- Glenn Maynard
Glenn Maynard wrote: > On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote: >>> - Let me use SAVEPOINT outside of a transaction, >> You are never outside a transaction. All queries are executed within a >> transaction. > > "Transaction block", then, if you insist. > >> I think this is the root of your problem - all queries are within a >> transaction so either: >> 1. You have a transaction that wraps a single statement. If you get an error >> then only that statement was affected. >> 2. You have an explicit BEGIN...COMMIT transaction which could use a >> savepoint. > > Savepoints can only be used inside transaction blocks. Says who? Wouldn't work in plpgsql as exception handling if that was the case. > My function > has no idea whether it's being called inside a transaction block. > > From inside a transaction block, my function would need to call > SAVEPOINT/RELEASE SAVEPOINT. > > If it's not in a transaction block, it needs to call BEGIN/COMMIT > instead. SAVEPOINT will fail with "SAVEPOINT can only be used in > transaction blocks". Ah [cue light-bulb effect], I think I understand. Your function isn't in the database is it? Surely your application knows if it's issuing BEGIN..COMMIT? > This would be very simple and clean if the SAVEPOINT command > transparently issued BEGIN if executed outside of a transaction block, > marking the savepoint so it knows that when the savepoint is released > or rolled back, the associated transaction block needs to be committed > or rolled back, too. At that point, you could stop using > BEGIN/COMMIT/ROLLBACK entirely, and just let savepoints do it, if you > wanted--with this, the transaction commands are essentially redundant. > > I can't count the number of times I've wished for this. You'll have people with torches and pitchforks after you if you change RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork. >> Typically, if you're in a plpgsql function you would just catch "unique" >> exception codes from your insert. Or, update, see if any rows were affected, >> if not try an insert and if that gives a duplicate go back and try the >> update. You might want the second approach if 99% of the time the cache is >> already populated. > > It's just a simple INSERT, generated from a Model.objects.create() in Django. OK, it sounds like create() isn't the method to call then, since you don't want to generate a simple INSERT. >>> Lacking anything better, I'll probably end up dropping out of the ORM >>> and using some uglier SQL to work around this, but this is so trivial >>> that it's silly to have to do that. I can't do it within the ORM; it >>> doesn't have the vocabulary. >> The ORM can't control transactions, can't call functions or can't set >> savepoints? > > It can't write the necessary SQL to say "insert this unless it already > exists", namely: > > INSERT INTO cache (key, data) ( > SELECT i.key, "data", > FROM > (VALUES ("key")) AS i(key) > LEFT JOIN cache prior_entry ON (prior_entry.key = "key") > WHERE > prior_entry.key IS NULL > ) Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines of plpgsql will neatly encapsulate the problem. That plpgsql can be automatically generated easily enough too. I know nothing of Django, but perhaps it's possible to subclass Model and add an "ensure" method that will call your plpgsql function? > It--Django--also doesn't have a mature transaction/savepoint system; > in fact, its transaction handling is an absolute mess. I've written > helpers for my main codebase that simply says "wrap this in a > transaction block if one isn't already started, otherwise wrap it in a > savepoint". I don't want to use that code here, because it's nitty > code: it needs to poke at Django internals to figure out whether it's > in a transaction block or not, and dealing with other API > compatibility issues. Ah, the joys of badly designed ORMs. The nice thing is that there seem to be plenty of bad ones to choose from too. If your ORM doesn't handle transactions well, the more you use it the more difficult your life will become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM as cleanly as you can. That's assuming they're not interested in patches. -- Richard Huxton Archonet Ltd
On Thu, Jul 23, 2009 at 02:04:53AM -0400, Glenn Maynard wrote: > On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote: > >> - Let me use SAVEPOINT outside of a transaction, > > > > You are never outside a transaction. All queries are executed within a > > transaction. > > "Transaction block", then, if you insist. > > > I think this is the root of your problem - all queries are within a > > transaction so either: > > 1. You have a transaction that wraps a single statement. If you get an error > > then only that statement was affected. > > 2. You have an explicit BEGIN...COMMIT transaction which could use a > > savepoint. > > Savepoints can only be used inside transaction blocks. My function > has no idea whether it's being called inside a transaction block. > > From inside a transaction block, my function would need to call > SAVEPOINT/RELEASE SAVEPOINT. > > If it's not in a transaction block, it needs to call BEGIN/COMMIT > instead. SAVEPOINT will fail with "SAVEPOINT can only be used in > transaction blocks". Have you tried this? I expect if you give it a shot, you'll find you don't actually have this problem. Really, everything is always in a transaction. If you haven't explicitly opened one, PostgreSQL opens one for you before each statement, and issues a COMMIT afterwards (or a ROLLBACK, if your statement ran into an error). Statements within functions are always executed within the same transaction, so you can issue SAVEPOINT commands anywhere in PL/pgSQL functions without problems, because you're always in a transaction. -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com
Joshua Tolley wrote: > On Thu, Jul 23, 2009 at 02:04:53AM -0400, Glenn Maynard wrote: >> On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote: >>>> - Let me use SAVEPOINT outside of a transaction, >>> You are never outside a transaction. All queries are executed within a >>> transaction. >> "Transaction block", then, if you insist. >> >>> I think this is the root of your problem - all queries are within a >>> transaction so either: >>> 1. You have a transaction that wraps a single statement. If you get an error >>> then only that statement was affected. >>> 2. You have an explicit BEGIN...COMMIT transaction which could use a >>> savepoint. >> Savepoints can only be used inside transaction blocks. My function >> has no idea whether it's being called inside a transaction block. >> >> From inside a transaction block, my function would need to call >> SAVEPOINT/RELEASE SAVEPOINT. >> >> If it's not in a transaction block, it needs to call BEGIN/COMMIT >> instead. SAVEPOINT will fail with "SAVEPOINT can only be used in >> transaction blocks". > > Have you tried this? I expect if you give it a shot, you'll find you don't > actually have this problem. Really, everything is always in a transaction. Each statement is in it's own transaction, but the problem (as I understand it) is that you're in this sort of situation: psql -d dbname .. # select now(); now ------------------------------- 2009-07-23 17:04:21.406424+10 (1 row) Time: 2.434 ms (csmith@[local]:5432) 17:04:21 [test] # savepoint xyz; ERROR: SAVEPOINT can only be used in transaction blocks (csmith@[local]:5432) 17:04:25 [test] You haven't explicitly started a transaction, therefore savepoints won't work. Django (it seems) just issues queries with no knowledge of (and no way to support) them. -- Postgresql & php tutorials http://www.designmagick.com/
Chris, 23.07.2009 09:06: > psql -d dbname > .. > # select now(); > now > ------------------------------- > 2009-07-23 17:04:21.406424+10 > (1 row) > > Time: 2.434 ms > (csmith@[local]:5432) 17:04:21 [test] > # savepoint xyz; > ERROR: SAVEPOINT can only be used in transaction blocks > (csmith@[local]:5432) 17:04:25 [test] > > You haven't explicitly started a transaction, therefore savepoints won't > work. > > Django (it seems) just issues queries with no knowledge of (and no way > to support) them. The above situation only arises if you run in autocommit mode which is the default for psql (which I have *never* understood). If you do a "\set AUTOCOMMIT off", then you can set a savepoint without using BEGIN. I have this in my psqlrc.conf and yourexample looks like this on my computer: c:\Temp>psql training thomas psql (8.4.0) Type "help" for help. training=> select now(); now ----------------------------2009-07-23 09:30:55.791+02 (1 row) training=> savepoint abc; SAVEPOINT training=> release abc; RELEASE training=> I don't believe any serious ORM would run in autocommit mode, so that shouldn't be a problem. Thomas
On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton<dev@archonet.com> wrote: > Ah [cue light-bulb effect], I think I understand. Your function isn't in the > database is it? Surely your application knows if it's issuing BEGIN..COMMIT? I'm writing a Python library call. It has no idea whether the caller happens to be inside a transaction already, and I don't want to specify something like "always run this inside a transaction". (Callers are equally likely to want to do either, and it's bad API to force them to start a transaction--the fact that I'm using the database at al should be transparent.) > You'll have people with torches and pitchforks after you if you change > RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork. RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint that it's releasing started it. Every currently-valid case requires that a transaction is already started, so no existing code would be affected by this. SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a" issued the BEGIN, not the user BEGIN; SAVEPOINT a; RELEASE SAVEPOINT a; -- will not commit, because savepoint "a" didn't start the transaction Of course, there are other details--it probably shouldn't allow ROLLBACK or COMMIT on an implicit transaction block, for example. > Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines > of plpgsql will neatly encapsulate the problem. That plpgsql can be > automatically generated easily enough too. I don't think so, at least not without digging into internals. Django is built around knowing all data types, so it'd need to be givne types explicitly--for example, to know whether a timestamp should be formatted as a timestamp, date or time. (I do have a couple other columns here--timestamps for cache expiration, etc.) I'll have to ask Django-side if there's a public API to do this, but I don't think there is. > Ah, the joys of badly designed ORMs. The nice thing is that there seem to be > plenty of bad ones to choose from too. If your ORM doesn't handle > transactions well, the more you use it the more difficult your life will > become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM > as cleanly as you can. That's assuming they're not interested in patches. The ORM on a whole is decent, but there are isolated areas where it's very braindamaged--this is one of them. They have a stable-release API-compatibility policy, which I think just gets them stuck with some really bad decisions for a long time. -- Glenn Maynard
Glenn Maynard wrote: > On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton<dev@archonet.com> wrote: >> Ah [cue light-bulb effect], I think I understand. Your function isn't in the >> database is it? Surely your application knows if it's issuing BEGIN..COMMIT? > > I'm writing a Python library call. It has no idea whether the caller > happens to be inside a transaction already, and I don't want to > specify something like "always run this inside a transaction". > (Callers are equally likely to want to do either, and it's bad API to > force them to start a transaction--the fact that I'm using the > database at al should be transparent.) That last bit is never going to work. There always needs to be some basic level of understanding between systems and transactions really have to be part of that for talking to a RDBMS. There will have to be a piece of code responsible for managing transactions somewhere in the middleware/application layers. >> You'll have people with torches and pitchforks after you if you change >> RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork. > > RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint > that it's releasing started it. Every currently-valid case requires > that a transaction is already started, so no existing code would be > affected by this. > > SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started > RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a" > issued the BEGIN, not the user [snip] > Of course, there are other details--it probably shouldn't allow > ROLLBACK or COMMIT on an implicit transaction block, for example. All you're doing here is moving the point of confusion around, surely? At some point you still need to know whether you can issue BEGIN/ROLLBACK/COMMIT etc. >> Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines >> of plpgsql will neatly encapsulate the problem. That plpgsql can be >> automatically generated easily enough too. > > I don't think so, at least not without digging into internals. Django > is built around knowing all data types, so it'd need to be givne types > explicitly--for example, to know whether a timestamp should be > formatted as a timestamp, date or time. (I do have a couple other > columns here--timestamps for cache expiration, etc.) I'll have to ask > Django-side if there's a public API to do this, but I don't think > there is. Well, the types would be exactly the same as for your existing insert. All it's really doing is changing the template those values get substituted into. It presumably does mean patching the ORM (or subclassing from it anyway). >> Ah, the joys of badly designed ORMs. The nice thing is that there seem to be >> plenty of bad ones to choose from too. If your ORM doesn't handle >> transactions well, the more you use it the more difficult your life will >> become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM >> as cleanly as you can. That's assuming they're not interested in patches. > > The ORM on a whole is decent, but there are isolated areas where it's > very braindamaged--this is one of them. They have a stable-release > API-compatibility policy, which I think just gets them stuck with some > really bad decisions for a long time. Presumably they targetted MySQL first, where there's a lot less use in multi-statement transactions with their different behaviour of their various storage-engines. -- Richard Huxton Archonet Ltd
* Glenn Maynard (glenn@zewt.org) wrote: > > The ORM can't control transactions, can't call functions or can't set > > savepoints? > > It can't write the necessary SQL to say "insert this unless it already > exists", namely: If it can't cleanly handle failure cases like this one, then I think your issue is with your ORM and not with PG. An INSERT failing on a uniqueness violation is actually a rather big deal in a relational database and not erroring on it goes quite against data integrity considerations. If your ORM could call a function instead, you could handle the insert and error-check in the function, to make up for the lack of intelligence in the ORM. Another option would be to have a 'fake' table, which has no rows in it and just has an 'ON INSERT' trigger that calls a function to handle this. That could also be a view with a do-instead rule, if the ORM has to query the same table. I would think the time would better be spent fixing the ORM though. Thanks, Stephen
On Thursday 23 July 2009 12:39:23 am Glenn Maynard wrote: > > The ORM on a whole is decent, but there are isolated areas where it's > very braindamaged--this is one of them. They have a stable-release > API-compatibility policy, which I think just gets them stuck with some > really bad decisions for a long time. > > -- > Glenn Maynard None of the options listed in the URL below work?: http://docs.djangoproject.com/en/dev/topics/db/transactions/#topics-db-transactions This is the development version of the docs so may contain some new options. In particular look at Savepoint rollback and Database-level autocommit. -- Adrian Klaver aklaver@comcast.net
On Thu, 2009-07-23 at 17:06 +1000, Chris wrote: > Joshua Tolley wrote: > > On Thu, Jul 23, 2009 at 02:04:53AM -0400, Glenn Maynard wrote: > >> On Thu, Jul 23, 2009 at 1:31 AM, Richard Huxton<dev@archonet.com> wrote: > >>>> - Let me use SAVEPOINT outside of a transaction, > >>> You are never outside a transaction. All queries are executed within a > >>> transaction. > >> "Transaction block", then, if you insist. > >> > >>> I think this is the root of your problem - all queries are within a > >>> transaction so either: > >>> 1. You have a transaction that wraps a single statement. If you get an error > >>> then only that statement was affected. > >>> 2. You have an explicit BEGIN...COMMIT transaction which could use a > >>> savepoint. > >> Savepoints can only be used inside transaction blocks. My function > >> has no idea whether it's being called inside a transaction block. > >> > >> From inside a transaction block, my function would need to call > >> SAVEPOINT/RELEASE SAVEPOINT. > >> > >> If it's not in a transaction block, it needs to call BEGIN/COMMIT > >> instead. SAVEPOINT will fail with "SAVEPOINT can only be used in > >> transaction blocks". > > > > Have you tried this? I expect if you give it a shot, you'll find you don't > > actually have this problem. Really, everything is always in a transaction. [snip] > You haven't explicitly started a transaction, therefore savepoints won't > work. True. However, he's talking about code within a PL/PgSQL function. To a PL/PgSQL function there is NO difference between: begin; select my_function(); commit; and a standalone: select my_function(); in both cases the statement executes in a transaction, and in both cases individual statements within the function are within the same transaction. That's why any function can EXCEPTION blocks, etc, which rely on savepoints. -- Craig Ringer
On Thu, 2009-07-23 at 03:39 -0400, Glenn Maynard wrote: > I'm writing a Python library call. It has no idea whether the caller > happens to be inside a transaction already, and I don't want to > specify something like "always run this inside a transaction". > (Callers are equally likely to want to do either, and it's bad API to > force them to start a transaction--the fact that I'm using the > database at al should be transparent.) Personally, I'd think about moving the function into the database, using PL/PgSQL or even PL/PythonU if you have to. Why should DB use be transparent when you're modifying the DB? In one case you immediately make a change. In another case, you schedule a change to be applied if/when the current transaction commits, so the change may or may not occur at some point in the future. That is, IMO, a big difference. Most applications with this sort of thing will have app-level transaction APIs that contain and manage the DB-level ones anyway. > RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint > that it's releasing started it. So, what you're really asking for boils down to nestable transactions? -- Craig Ringer
Joshua Tolley escribió: > Have you tried this? I expect if you give it a shot, you'll find you don't > actually have this problem. Really, everything is always in a transaction. If > you haven't explicitly opened one, PostgreSQL opens one for you before each > statement, and issues a COMMIT afterwards (or a ROLLBACK, if your statement > ran into an error). Statements within functions are always executed within the > same transaction, so you can issue SAVEPOINT commands anywhere in PL/pgSQL > functions without problems, because you're always in a transaction. No, actually you can't call SAVEPOINT inside a PL/pgSQL function (or any SPI user for that matter -- have you tried savepoints in LOLCODE?) Inside PL/pgSQL the only way to use savepoints is with EXCEPTION blocks. You are correct that you're always in a transaction, but in this context not all transactions are equal :-( (The problem, as we found out, is that the function must always have control at the same level of transaction nestedness in SPI; you can't just let the user define and release savepoints arbitrarily.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Fri, Jul 24, 2009 at 12:54:31PM -0400, Alvaro Herrera wrote: > Joshua Tolley escribió: > > > Have you tried this? I expect if you give it a shot, you'll find you don't > > actually have this problem. Really, everything is always in a transaction. If > > you haven't explicitly opened one, PostgreSQL opens one for you before each > > statement, and issues a COMMIT afterwards (or a ROLLBACK, if your statement > > ran into an error). Statements within functions are always executed within the > > same transaction, so you can issue SAVEPOINT commands anywhere in PL/pgSQL > > functions without problems, because you're always in a transaction. > > No, actually you can't call SAVEPOINT inside a PL/pgSQL function (or any > SPI user for that matter -- have you tried savepoints in LOLCODE?) > Inside PL/pgSQL the only way to use savepoints is with EXCEPTION blocks. > > You are correct that you're always in a transaction, but in this context > not all transactions are equal :-( > > (The problem, as we found out, is that the function must always have > control at the same level of transaction nestedness in SPI; you can't > just let the user define and release savepoints arbitrarily.) That makes sense -- and although I did try this before sending the email, apparently I didn't try it well enough. :) - Josh / eggyknap
> The above situation only arises if you run in autocommit mode which is the default for psql (which I have *never* understood). This is the short answer, in practice--assume that either a transaction is started or will be started by the SAVEPOINT command, and that if a COMMIT is needed (as a result of the SAVEPOINT or which was already needed), that the caller will do it. (I hate non-autocommit. It defies basic code design instincts, which tell me that whoever starts a transaction should finish it. I shouldn't be issuing a non-autocommit SAVEPOINT/RELEASE, and then assuming the caller will COMMIT the transaction that was started automatically. I'm stuck with it in Django. Yuck, but oh well; battling the framework's idioms isn't going to help anything.) On Thu, Jul 23, 2009 at 4:06 AM, Richard Huxton<dev@archonet.com> wrote: >> I'm writing a Python library call. It has no idea whether the caller >> happens to be inside a transaction already, and I don't want to >> specify something like "always run this inside a transaction". >> (Callers are equally likely to want to do either, and it's bad API to >> force them to start a transaction--the fact that I'm using the >> database at al should be transparent.) > > That last bit is never going to work. There always needs to be some basic > level of understanding between systems and transactions really have to be > part of that for talking to a RDBMS. There will have to be a piece of code > responsible for managing transactions somewhere in the > middleware/application layers. It's never 100% transparent--the case of making calls during a transaction and then rolling the whole thing back still needs to be documented. The point, though, is that this isn't a database-centric operation, so it shouldn't have usage restrictions like "must always" or "must never be inside a transaction". > All you're doing here is moving the point of confusion around, surely? At > some point you still need to know whether you can issue > BEGIN/ROLLBACK/COMMIT etc. Not at all--I don't need to use any of these commands. I just do this: SAVEPOINT s; INSERT INTO table ...; RELEASE SAVEPOINT s; to guarantee that my code's effect on the database is atomic. someone else wrote: > So, what you're really asking for boils down to nestable transactions? That's how I've thought of savepoints from day one. When I use them in Python code, I use a with_transaction wrapper, which transparently uses a transaction or a savepoint. -- Glenn Maynard
> Date: Thu, 23 Jul 2009 09:06:50 +0100 > From: Richard Huxton <dev@archonet.com> > To: Glenn Maynard <glenn@zewt.org> > Cc: pgsql-sql@postgresql.org > Subject: Re: Bit by "commands ignored until end of transaction block" > again > Message-ID: <4A681A1A.1090902@archonet.com> > > Glenn Maynard wrote: >> On Thu, Jul 23, 2009 at 2:41 AM, Richard Huxton<dev@archonet.com> wrote: >>> Ah [cue light-bulb effect], I think I understand. Your function isn't in the >>> database is it? Surely your application knows if it's issuing BEGIN..COMMIT? >> I'm writing a Python library call. It has no idea whether the caller >> happens to be inside a transaction already, and I don't want to >> specify something like "always run this inside a transaction". >> (Callers are equally likely to want to do either, and it's bad API to >> force them to start a transaction--the fact that I'm using the >> database at al should be transparent.) > > That last bit is never going to work. There always needs to be some > basic level of understanding between systems and transactions really > have to be part of that for talking to a RDBMS. There will have to be a > piece of code responsible for managing transactions somewhere in the > middleware/application layers. > >>> You'll have people with torches and pitchforks after you if you change >>> RELEASE SAVEPOINT to mean COMMIT. I might even lend them my pitchfork. >> RELEASE SAVEPOINT would only COMMIT the transaction *if* the savepoint >> that it's releasing started it. Every currently-valid case requires >> that a transaction is already started, so no existing code would be >> affected by this. >> >> SAVEPOINT a; -- implicitly issues BEGIN because one wasn't started >> RELEASE SAVEPOINT a; -- implicitly issues COMMIT because savepoint "a" >> issued the BEGIN, not the user > [snip] >> Of course, there are other details--it probably shouldn't allow >> ROLLBACK or COMMIT on an implicit transaction block, for example. > > All you're doing here is moving the point of confusion around, surely? > At some point you still need to know whether you can issue > BEGIN/ROLLBACK/COMMIT etc. > >>> Could it generate: "SELECT ensure_cache_contains(key,data)"? Then ten lines >>> of plpgsql will neatly encapsulate the problem. That plpgsql can be >>> automatically generated easily enough too. >> I don't think so, at least not without digging into internals. Django >> is built around knowing all data types, so it'd need to be givne types >> explicitly--for example, to know whether a timestamp should be >> formatted as a timestamp, date or time. (I do have a couple other >> columns here--timestamps for cache expiration, etc.) I'll have to ask >> Django-side if there's a public API to do this, but I don't think >> there is. > > Well, the types would be exactly the same as for your existing insert. > All it's really doing is changing the template those values get > substituted into. It presumably does mean patching the ORM (or > subclassing from it anyway). > >>> Ah, the joys of badly designed ORMs. The nice thing is that there seem to be >>> plenty of bad ones to choose from too. If your ORM doesn't handle >>> transactions well, the more you use it the more difficult your life will >>> become. I'd be tempted to tidy up your existing fixes and wrap Django's ORM >>> as cleanly as you can. That's assuming they're not interested in patches. >> The ORM on a whole is decent, but there are isolated areas where it's >> very braindamaged--this is one of them. They have a stable-release >> API-compatibility policy, which I think just gets them stuck with some >> really bad decisions for a long time. > > Presumably they targetted MySQL first, where there's a lot less use in > multi-statement transactions with their different behaviour of their > various storage-engines. > FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) handles this is by allowing you to open any number of transaction blocks, but only the outer transaction block commits (in Pg): Property.transaction { # SQL => 'BEGIN' User.transaction { Foo.transaction { Foo.connection.execute('--some sqlcode') # SQL => '--some sql code' } } } # SQL => 'COMMIT' This is pretty kludgy but lets me start any arbitrary transaction without worrying if there's already another one running "on top" of me (which I gather is your worry in this case). Dunno if the approach could work using a class wrapper and the Django ORM, but I would have thought that the implementation shouldn't be too hard.. And apologies to the list for going a little off-track from Pg. Steve
On Sun, 2009-07-26 at 19:15 -0400, Science wrote: > FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) > handles this is by allowing you to open any number of transaction > blocks, but only the outer transaction block commits (in Pg): > > Property.transaction { # SQL => 'BEGIN' > User.transaction { > Foo.transaction { > Foo.connection.execute('--some sql code') # SQL => '--some sql code' > } > } > } # SQL => 'COMMIT' What happens if, Foo.transaction does something that causes an error, though, or issues a rollback? It's not creating savepoints, so if Foo.transaction rolls back it throws out the work of User.transaction and Property.transaction too. Ugh. That design would be quite good _IF_ it used savepoints: Property.transaction { # SQL => 'BEGIN' User.transaction { # SQL => SAVEPOINT User Foo.transaction { # SQL => SAVEPOINTFoo Foo.connection.execute('--some sql code') # SQL => '--some sql code' } # SQL => RELEASESAVEPOINT Foo } # SQL => RELEASE SAVEPOINT User } # SQL => 'COMMIT' ... so that inner transactions could ROLLBACK TO SAVEPOINT on error , and so that asking for a rollback would give you a ROLLBACK TO SAVEPOINT if the transaction is a subtransaction. -- Craig Ringer
Craig Ringer wrote: > On Sun, 2009-07-26 at 19:15 -0400, Science wrote: > >> FWIW, the way the Rails ORM ActiveRecord (another fairly damaged ORM) >> handles this is by allowing you to open any number of transaction >> blocks, but only the outer transaction block commits (in Pg): >> >> Property.transaction { # SQL => 'BEGIN' >> User.transaction { >> Foo.transaction { >> Foo.connection.execute('--some sql code') # SQL => '--some sql code' >> } >> } >> } # SQL => 'COMMIT' > > What happens if, Foo.transaction does something that causes an error, > though, or issues a rollback? It's not creating savepoints, so if > Foo.transaction rolls back it throws out the work of User.transaction > and Property.transaction too. > > Ugh. > > That design would be quite good _IF_ it used savepoints: > > > Property.transaction { # SQL => 'BEGIN' > User.transaction { # SQL => SAVEPOINT User > Foo.transaction { # SQL => SAVEPOINT Foo > Foo.connection.execute('--some sql code') # SQL => '--some sql code' > } # SQL => RELEASE SAVEPOINT Foo > } # SQL => RELEASE SAVEPOINT User > } # SQL => 'COMMIT' > > ... so that inner transactions could ROLLBACK TO SAVEPOINT on error , > and so that asking for a rollback would give you a ROLLBACK TO SAVEPOINT > if the transaction is a subtransaction. > For all I know that's how it works these days. I haven't looked at the code underneath this in a couple of years. It should be trivial to implement in the way you describe based on how the Ruby codebase is set up -- hopefully all this will help OP with the Django/Python version of the same problem. Steve