Thread: TODO note
Hi,

I note that the implementation of tab completion for SET TRANSACTION in PSQL could benefit from the implementation of autonomous transactions (also TODO).

Regards,

Colin
On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinthart@gmail.com> wrote:
> I note that the implementation of tab completion for SET TRANSACTION in PSQL
> could benefit from the implementation of autonomous transactions (also
> TODO).

I think it's safe to say that if we ever manage to get autonomous transactions working, there are a GREAT MANY things which will benefit from that. There's probably an easier way to get at that Todo item, though, if someone feels like beating on it.

One problem with autonomous transactions is that you have to figure out where to store all the state associated with the autonomous transaction and its subtransactions. Another is that you have to avoid an unacceptable slowdown in the tuple-visibility checks in the process.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:
> On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinthart@gmail.com> wrote:
>> I note that the implementation of tab completion for SET TRANSACTION in PSQL
>> could benefit from the implementation of autonomous transactions (also
>> TODO).
>
> I think it's safe to say that if we ever manage to get autonomous
> transactions working, there are a GREAT MANY things which will benefit
> from that. There's probably an easier way to get at that Todo item,
> though, if someone feels like beating on it.
>
> One problem with autonomous transactions is that you have to figure
> out where to store all the state associated with the autonomous
> transaction and its subtransactions. Another is that you have to
> avoid an unacceptable slowdown in the tuple-visibility checks in the
> process.

As I understand it, autonomous transactions are in many ways like distinct database client sessions, except that the client in this case is another database session, especially if the autonomous transaction can make a commit that persists even when the initiating session afterwards does a rollback.

Similarly, using autonomous transactions is akin to multi-processing. Normal distinct database client sessions are like distinct processes that are started externally to the DBMS, while autonomous transactions are like processes started from within the DBMS.

Also, under the assumption that everything in a DBMS session should be subject to transactions, so that both data-manipulation and data-definition can be rolled back, autonomous transactions are a generalization of sequence generators that remember their incremented state even when the action that incremented them is rolled back; the sequence generator update is effectively an autonomous transaction in that case.

The point being, the answer to how to implement autonomous transactions could be as simple as: do the same thing as how you manage multiple concurrent client sessions, more or less.
If each client gets its own Postgres OS process, then an autonomous transaction just farms out to another one of those, which does the work. Or maybe there could be a lighter-weight version of this.

Does this design principle seem reasonable?

If autonomous transactions could be used a lot, then maybe the other process could be kept connected and be fed subsequent autonomous actions, such as when it is being used to implement an activity log, so some kind of IPC would be going on.

--
Darren Duncan
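The semantics Darren describes — an autonomous transaction behaving like a second session whose commit survives a rollback in the caller — can be sketched in miniature. This is a conceptual Python sketch, not PostgreSQL code; every class and name here is illustrative.

```python
# Conceptual sketch (not PostgreSQL code): an autonomous transaction modeled
# as a separate "session" against the same database, so its commit persists
# even when the initiating session rolls back. All names are illustrative.

class FakeDatabase:
    """A dict standing in for durable, committed database state."""
    def __init__(self):
        self.committed = {}

class Session:
    """Each session buffers its own uncommitted writes."""
    def __init__(self, db):
        self.db = db
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.db.committed.update(self.pending)
        self.pending = {}

    def rollback(self):
        self.pending = {}

    def autonomous(self):
        # The autonomous transaction is just a brand-new session against
        # the same database -- the "separate process" view in the thread.
        return Session(self.db)

db = FakeDatabase()
main = Session(db)
main.write("order", "new row")        # not yet committed

log = main.autonomous()               # e.g. an activity-log entry
log.write("log", "order attempted")
log.commit()                          # persists regardless of the caller

main.rollback()                       # the caller aborts...
print(db.committed)                   # -> {'log': 'order attempted'}
```

The activity-log case Darren mentions is exactly the scenario at the end: the caller's work vanishes, the log entry survives.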
On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan <darren@darrenduncan.net> wrote:
> The point being, the answer to how to implement autonomous transactions
> could be as simple as, do the same thing as how you manage multiple
> concurrent client sessions, more or less. If each client gets its own
> Postgres OS process, then an autonomous transaction just farms out to
> another one of those which does the work. Or maybe there could be a lighter
> weight version of this.
>
> Does this design principle seem reasonable?

I guess so, but the devil is in the details. I suspect that we don't actually want to fork a new backend for every autonomous transaction. That would be pretty expensive, and we already have an expensive way of emulating this functionality using dblink. Finding all of the bits that think there's only one top-level transaction per backend and generalizing them to support multiple top-level transactions per backend doesn't sound easy, though, especially since you must do it without losing performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
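The dblink emulation Robert mentions works because each dblink connection is a separate backend with its own transaction, so statements executed through it commit independently of the caller. A minimal sketch, assuming the dblink extension is installed; the connection string and table names are illustrative:

```sql
-- Emulating an autonomous transaction with dblink: dblink_exec() runs in a
-- second backend and commits there, so the log row survives our ROLLBACK.
-- Assumes the dblink extension; connection string and tables are examples.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

SELECT dblink_connect('aux', 'dbname=mydb');
SELECT dblink_exec('aux',
    'INSERT INTO audit_log(event) VALUES (''withdrawal attempted'')');
SELECT dblink_disconnect('aux');

ROLLBACK;  -- the UPDATE is undone, but the audit_log row remains
```

The expense Robert notes is visible here: every such call pays for a full extra connection (or must keep one open), plus a round trip through the FE/BE protocol.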
Robert Haas wrote:
> On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan <darren@darrenduncan.net> wrote:
>> The point being, the answer to how to implement autonomous transactions
>> could be as simple as, do the same thing as how you manage multiple
>> concurrent client sessions, more or less. If each client gets its own
>> Postgres OS process, then an autonomous transaction just farms out to
>> another one of those which does the work. Or maybe there could be a lighter
>> weight version of this.
>>
>> Does this design principle seem reasonable?
>
> I guess so, but the devil is in the details. I suspect that we don't
> actually want to fork a new backend for every autonomous transaction.
> That would be pretty expensive, and we already have an expensive way
> of emulating this functionality using dblink. Finding all of the bits
> that think there's only one top-level transaction per backend and
> generalizing them to support multiple top-level transactions per
> backend doesn't sound easy, though, especially since you must do it
> without losing performance.

As you say, the devil is in the details, but I see this as mainly an implementation issue, where essentially the same task could be abstracted over different possible implementations, some more lightweight and some more heavyweight.

This is loosely how I look at the issue conceptually, meaning the illusion that the DBMS presents to the user: the DBMS is a multi-process virtual machine, the database being worked on is the file system or disk, and uncommitted transactions are data structures in memory that may have multiple versions. Each autonomous transaction is associated with a single process. A process can be started either by the user (a client connection) or by another process (an autonomous transaction). Regardless of how a process is started, the way to manage multiple autonomous tasks is that each has its own process. Tasks that are not mutually autonomous would be within the same process.
Child transactions or savepoints share the process of their parent when the parent can roll back their commits.

Whether the DBMS uses multiple OS threads, multiple OS processes, coroutines, or whatever is an implementation detail. A point here being that over time Postgres can evolve to use multiple OS processes, multiple threads, or a coroutine system within a single thread/process to provide the illusion of each autonomous transaction being an independent process, and the data structures and algorithms for managing autonomous transactions can be similar to or the same as those for multiple client connections, since conceptually they are alike.

--
Darren Duncan
Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:
> I guess so, but the devil is in the details. I suspect that we don't
> actually want to fork a new backend for every autonomous transaction.
> That would be pretty expensive, and we already have an expensive way
> of emulating this functionality using dblink. Finding all of the bits
> that think there's only one top-level transaction per backend and
> generalizing them to support multiple top-level transactions per
> backend doesn't sound easy, though,

Yeah, and the transaction handling code is already pretty complex.

> especially since you must do it without losing performance.

Presumably we'd have fast paths for the main transaction, and any autonomous transactions beside that one would incur some slowdown.

I think the complex parts are, first, figuring out what to do with the global variables that currently represent a transaction (they are sprinkled all over the place); and second, how to represent the autonomous transactions in shared memory without requiring the PGPROC array to be arbitrarily resizable.

The other alternative would be to bolt the autonomous transaction somehow onto the current subtransaction stack and mark it in some different way so that we can reuse the games we play with "push/pop" there. That still leaves us with the PGPROC problem.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Sep 15, 2010 at 6:21 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:
>
>> I guess so, but the devil is in the details. I suspect that we don't
>> actually want to fork a new backend for every autonomous transaction.
>> That would be pretty expensive, and we already have an expensive way
>> of emulating this functionality using dblink. Finding all of the bits
>> that think there's only one top-level transaction per backend and
>> generalizing them to support multiple top-level transactions per
>> backend doesn't sound easy, though,
>
> Yeah, and the transaction handling code is already pretty complex.

Yep.

>> especially since you must do it without losing performance.
>
> Presumably we'd have fast paths for the main transaction, and
> any autonomous transactions beside that one would incur some
> slowdown.
>
> I think the complex parts are, first, figuring out what to do with
> the global variables that currently represent a transaction (they are
> sprinkled all over the place); and second, how to represent the
> autonomous transactions in shared memory without requiring the PGPROC
> array to be arbitrarily resizable.
>
> The other alternative would be to bolt the autonomous transaction
> somehow onto the current subtransaction stack and mark it in some
> different way so that we can reuse the games we play with "push/pop"
> there. That still leaves us with the PGPROC problem.

I wonder if we could use/generalize pg_subtrans in some way to handle the PGPROC problem. I haven't thought about it much, though.

One thing that strikes me (maybe this is obvious) is that the execution of the main transaction and the autonomous transaction are not interleaved: it's a stack.
So in terms of globals and stuff, assuming you knew which things needed to be updated, you could push all that stuff off to the side, do whatever with the new transaction, and then restore all the context afterwards. That doesn't help in terms of PGPROC, of course, but for backend-local state it seems workable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
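Because the two transactions nest like a stack rather than interleave, the push-aside-and-restore scheme is just a save/restore around the inner work. A conceptual Python sketch, not backend code; the state fields are illustrative stand-ins for the scattered globals:

```python
# Sketch of "push state aside, run the autonomous xact, restore context".
# The dict stands in for the backend-local globals that describe the
# current transaction; field names are illustrative, not real Postgres ones.

import copy

current_xact = {"xid": 100, "nesting_level": 0, "command_counter": 7}
saved_stack = []   # works as a stack, so autonomous xacts can themselves nest

def begin_autonomous():
    # Push the caller's transaction state aside and start fresh.
    saved_stack.append(copy.deepcopy(current_xact))
    current_xact.update({"xid": current_xact["xid"] + 1,
                         "nesting_level": 0,
                         "command_counter": 0})

def end_autonomous():
    # Restore the caller's state exactly as it was before.
    current_xact.clear()
    current_xact.update(saved_stack.pop())

begin_autonomous()
current_xact["command_counter"] = 42   # work done inside the autonomous xact
end_autonomous()
print(current_xact)  # -> {'xid': 100, 'nesting_level': 0, 'command_counter': 7}
```

As Robert says, this handles backend-local state only; nothing here touches the shared-memory (PGPROC) side of the problem.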
Hi,

On 09/15/2010 07:30 PM, Robert Haas wrote:
> One problem with autonomous transactions is that you have to figure
> out where to store all the state associated with the autonomous
> transaction and its subtransactions. Another is that you have to
> avoid an unacceptable slowdown in the tuple-visibility checks in the
> process.

It just occurs to me that this is the other potential use case for bgworkers: autonomous transactions. Simply store any kind of state in the bgworker and use one per autonomous transaction.

What's left to be done: implement communication between the controlling backend (with the client connection) and the bgworker (imessages), drop the bgworker's session to user privileges (and re-raise to superuser after the job), and implement better error handling, as errors would have to be propagated back to the controlling backend.

Regards

Markus Wanner
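The bgworker-per-autonomous-transaction shape can be sketched in miniature: the controlling session hands work to a long-lived worker over a message queue and waits for an acknowledgment or error. In this Python sketch a thread stands in for the background worker and queues stand in for the imessages channel; none of this is the real bgworker API, and all names are illustrative.

```python
# Sketch of the bgworker idea: a long-lived worker receives autonomous
# actions over an "imessages" channel, commits them independently, and
# propagates errors back to the controlling backend. A thread and two
# queues stand in for the real machinery; all names are illustrative.

import queue
import threading

committed_log = []        # state the worker "commits" on its own
inbox = queue.Queue()     # controlling backend -> worker
outbox = queue.Queue()    # worker -> controlling backend (acks/errors)

def bgworker():
    while True:
        msg = inbox.get()
        if msg is None:                    # shutdown request
            break
        try:
            committed_log.append(msg)      # the "autonomous commit"
            outbox.put(("ok", msg))
        except Exception as exc:           # errors go back to the caller
            outbox.put(("error", str(exc)))

worker = threading.Thread(target=bgworker)
worker.start()

inbox.put("activity: user logged in")      # send one autonomous action
status, payload = outbox.get()             # block until the ack arrives
inbox.put(None)
worker.join()

print(status, committed_log)   # -> ok ['activity: user logged in']
```

Keeping the worker alive between requests is what makes this cheaper than forking a fresh backend per autonomous transaction, at the cost of the IPC Markus describes.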
Robert Haas <robertmhaas@gmail.com> writes:
> One thing that strikes me (maybe this is obvious) is that the
> execution of the main transaction and the autonomous transaction are
> not interleaved: it's a stack. So in terms of globals and stuff,
> assuming you knew which things needed to be updated, you could push
> all that stuff off to the side, do whatever with the new transaction,
> and then restore all the context afterwards.

I think they call that dynamic scope, in advanced programming languages. I guess that's calling for a quote of Greenspun's Tenth Rule:

  Any sufficiently complicated C or Fortran program contains an ad hoc,
  informally-specified, bug-ridden, slow implementation of half of Common Lisp.

So the name of the game could be to find out a way to implement (a limited form of) dynamic scoping in PostgreSQL, in C, then find out all and any backend-local variables that need that to support autonomous transactions, then make it happen… Right?

Regards,

--
dim
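Dimitri's "dynamic scope" observation has a direct analogue in Python's standard library: `contextvars.ContextVar.set()` returns a token that restores the previous binding, which is exactly the save/rebind/restore discipline being discussed. A small illustration; the variable name is an illustrative stand-in for one backend-local global:

```python
# Dynamic scoping with the standard-library contextvars module: set()
# returns a token, and reset(token) restores the outer binding -- the
# same push-aside-and-restore pattern, handled by the runtime.

from contextvars import ContextVar

# Stand-in for one backend-local variable tied to the current transaction.
current_xid = ContextVar("current_xid", default=None)

def run_autonomous(body):
    token = current_xid.set(999)   # dynamically rebind for the inner xact
    try:
        return body()
    finally:
        current_xid.reset(token)   # outer binding comes back automatically

current_xid.set(100)
inner = run_autonomous(lambda: current_xid.get())
outer = current_xid.get()
print(inner, outer)                # -> 999 100
```

In C there is no such runtime support, of course, so each variable would need an explicit save/restore, which is the work Dimitri is pointing at.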
On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> One thing that strikes me (maybe this is obvious) is that the
>> execution of the main transaction and the autonomous transaction are
>> not interleaved: it's a stack. So in terms of globals and stuff,
>> assuming you knew which things needed to be updated, you could push
>> all that stuff off to the side, do whatever with the new transaction,
>> and then restore all the context afterwards.
>
> I think they call that dynamic scope, in advanced programming
> languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
>
>   Any sufficiently complicated C or Fortran program contains an ad hoc,
>   informally-specified, bug-ridden, slow implementation of half of Common
>   Lisp.
>
> So the name of the game could be to find out a way to implement (a
> limited form of) dynamic scoping in PostgreSQL, in C, then find out all
> and any backend local variable that needs that to support autonomous
> transactions, then make it happen… Right?

Interestingly, PostgreSQL was originally written in LISP, and there are remnants of that in the code today; for example, our heavy use of List nodes. But I don't think that has much to do with this project. I plan to reserve judgment on the best way of managing the relevant state until such time as someone has gone to the trouble of identifying what state that is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> I plan to reserve judgment on the best way of managing the relevant
> state until such time as someone has gone to the trouble of
> identifying what state that is.

The really fundamental problem here is that you never will be able to identify all such state. Even assuming that you successfully completed the herculean task of fixing the core backend, what of add-on code?

(This is also why I'm quite unimpressed with the idea of trying to get backends to switch to a different database after startup.)

			regards, tom lane
Robert Haas wrote:
> On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
>> I think they call that dynamic scope, in advanced programming
>> languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
>>
>>   Any sufficiently complicated C or Fortran program contains an ad hoc,
>>   informally-specified, bug-ridden, slow implementation of half of Common
>>   Lisp.
>>
>> So the name of the game could be to find out a way to implement (a
>> limited form of) dynamic scoping in PostgreSQL, in C, then find out all
>> and any backend local variable that needs that to support autonomous
>> transactions, then make it happen… Right?
>
> Interestingly, PostgreSQL was originally written in LISP, and there
> are remnants of that in the code today; for example, our heavy use of
> List nodes. But I don't think that has much to do with this project.
> I plan to reserve judgment on the best way of managing the relevant
> state until such time as someone has gone to the trouble of
> identifying what state that is.

It would probably do Pg some good to try to recapture its functional-language roots where reasonably possible. I believe that, design-wise, functional languages really are the best way to do object-relational databases, given that pure functions and immutable data structures are typically the best way to express anything one would do with them.

--
Darren Duncan