Thread: TODO note

TODO note

From
"Colin 't Hart"
Date:
Hi,

I note that the implementation of tab completion for SET TRANSACTION in PSQL
could benefit from the implementation of autonomous transactions (also
TODO).

Regards,

Colin

Re: TODO note

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinthart@gmail.com> wrote:
> I note that the implementation of tab completion for SET TRANSACTION in PSQL
> could benefit from the implementation of autonomous transactions (also
> TODO).

I think it's safe to say that if we ever manage to get autonomous
transactions working, there are a GREAT MANY things which will benefit
from that.  There's probably an easier way to get at that Todo item,
though, if someone feels like beating on it.

One problem with autonomous transactions is that you have to figure
out where to store all the state associated with the autonomous
transaction and its subtransactions.  Another is that you have to
avoid an unacceptable slowdown in the tuple-visibility checks in the
process.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


autonomous transactions (was Re: TODO note)

From
Darren Duncan
Date:
Robert Haas wrote:
> On Wed, Sep 15, 2010 at 3:37 AM, Colin 't Hart <colinthart@gmail.com> wrote:
>> I note that the implementation of tab completion for SET TRANSACTION in PSQL
>> could benefit from the implementation of autonomous transactions (also
>> TODO).
> 
> I think it's safe to say that if we ever manage to get autonomous
> transactions working, there are a GREAT MANY things which will benefit
> from that.  There's probably an easier way to get at that Todo item,
> though, if someone feels like beating on it.
> 
> One problem with autonomous transactions is that you have to figure
> out where to store all the state associated with the autonomous
> transaction and its subtransactions.  Another is that you have to
> avoid an unacceptable slowdown in the tuple-visibility checks in the
> process.

As I understand it, in many ways autonomous transactions are like distinct 
database client sessions, except that the client in this case is another 
database session, especially if the autonomous transaction can make a commit 
that persists even if the initial session afterwards does a rollback.

Similarly, using autonomous transactions is akin to multi-processing.  Normal 
distinct database client sessions are like distinct processes that are usually 
started externally to the DBMS, whereas autonomous transactions are like 
processes started within the DBMS.

Also, under the assumption that everything in a DBMS session should be subject 
to transactions, so that both data-manipulation and data-definition can be 
rolled back, autonomous transactions are like a generalization of supporting 
sequence generators that remember their incremented state even when the action 
that incremented it is rolled back; the sequence generator update is effectively 
an autonomous transaction, in that case.

The point being, the answer to how to implement autonomous transactions could be 
as simple as, do the same thing as how you manage multiple concurrent client 
sessions, more or less.  If each client gets its own Postgres OS process, then 
an autonomous transaction just farms out to another one of those which does the 
work.  Or maybe there could be a lighter weight version of this.

Does this design principle seem reasonable?

If autonomous transactions could be used a lot, then maybe the other process 
could be kept connected and be fed other subsequent autonomous actions, such as 
if it is being used to implement an activity log, so some kind of IPC would be 
going on.

-- Darren Duncan


Re: autonomous transactions (was Re: TODO note)

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan <darren@darrenduncan.net> wrote:
> The point being, the answer to how to implement autonomous transactions
> could be as simple as, do the same thing as how you manage multiple
> concurrent client sessions, more or less.  If each client gets its own
> Postgres OS process, then an autonomous transaction just farms out to
> another one of those which does the work.  Or maybe there could be a lighter
> weight version of this.
>
> Does this design principle seem reasonable?

I guess so, but the devil is in the details.  I suspect that we don't
actually want to fork a new backend for every autonomous transaction.
That would be pretty expensive, and we already have an expensive way
of emulating this functionality using dblink.  Finding all of the bits
that think there's only one top-level transaction per backend and
generalizing them to support multiple top-level transactions per
backend doesn't sound easy, though, especially since you must do it
without losing performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: autonomous transactions (was Re: TODO note)

From
Darren Duncan
Date:
Robert Haas wrote:
> On Wed, Sep 15, 2010 at 2:32 PM, Darren Duncan <darren@darrenduncan.net> wrote:
>> The point being, the answer to how to implement autonomous transactions
>> could be as simple as, do the same thing as how you manage multiple
>> concurrent client sessions, more or less.  If each client gets its own
>> Postgres OS process, then an autonomous transaction just farms out to
>> another one of those which does the work.  Or maybe there could be a lighter
>> weight version of this.
>>
>> Does this design principle seem reasonable?
> 
> I guess so, but the devil is in the details.  I suspect that we don't
> actually want to fork a new backend for every autonomous transactions.
>  That would be pretty expensive, and we already have an expensive way
> of emulating this functionality using dblink.  Finding all of the bits
> that think there's only one top-level transaction per backend and
> generalizing them to support multiple top-level transactions per
> backend doesn't sound easy, though, especially since you must do it
> without losing performance.

As you say, the devil is in the details, but I see this as mainly being an 
implementation issue, where essentially the same task could be abstracted over 
different possible implementations, some lighter and some heavier.

This is loosely how I look at the issue conceptually, meaning like the illusion 
that the DBMS presents to the user:

The DBMS is a multi-process virtual machine, the database being worked on is the 
file system or disk, and uncommitted transactions are data structures in memory 
that may have multiple versions.  Each autonomous transaction is associated with 
a single process.  A process can either be started by the user (client 
connection) or by another process (autonomous transaction).  Regardless of how a 
process is started, the way to manage multiple autonomous tasks is that each has 
its own process.  Tasks that are not mutually autonomous would be within the 
same process.  Child transactions or savepoints have the same process as their 
parent when the parent can rollback their commits.

Whether the DBMS uses multiple OS threads or multiple OS processes or uses 
coroutines or whatever is an implementation detail.

A point here being that over time Postgres can evolve to use either multiple OS 
processes or multiple threads or a coroutine system within a single 
thread/process, to provide the illusion of each autonomous transaction being an 
independent process, and the data structures and algorithms for managing 
autonomous transactions can be similar to or the same as multiple client 
connections, since conceptually they are alike.

-- Darren Duncan


Re: autonomous transactions (was Re: TODO note)

From
Alvaro Herrera
Date:
Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:

> I guess so, but the devil is in the details.  I suspect that we don't
> actually want to fork a new backend for every autonomous transactions.
>  That would be pretty expensive, and we already have an expensive way
> of emulating this functionality using dblink.  Finding all of the bits
> that think there's only one top-level transaction per backend and
> generalizing them to support multiple top-level transactions per
> backend doesn't sound easy, though,

Yeah, and the transaction handling code is already pretty complex.

> especially since you must do it without losing performance.

Presumably we'd have fast paths for the main transaction, and
any autonomous transactions beside that one would incur some
slowdown.

I think the complex parts are, first, figuring out what to do with
global variables that currently represent a transaction (they are
sprinkled all over the place); and second, how to represent the
autonomous transactions in shared memory without requiring the PGPROC
array to be arbitrarily resizable.

The other alternative would be to bolt the autonomous transaction
somehow onto the current subtransaction stack and mark it in some
different way so that we can reuse the games we play with "push/pop"
there.  That still leaves us with the PGPROC problem.
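
[Editor's sketch, not part of the original message.  One way to picture
the "mark it differently on the subtransaction stack" idea is a toy model
in which every frame records the frame that enclosed it plus an
autonomous flag.  All names here (XactFrame, xact_push, xact_pop,
xact_is_durable) are hypothetical and do not correspond to PostgreSQL
internals; the sketch covers backend-local bookkeeping only and says
nothing about the PGPROC problem.]

```c
#include <assert.h>
#include <stddef.h>

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

/* Hypothetical frame: one entry per (sub)transaction ever started. */
typedef struct XactFrame {
    int        enclosing;   /* frame that was current at push time, -1 if none */
    int        autonomous;  /* nonzero: outcome does not depend on enclosing frames */
    XactStatus status;
} XactFrame;

#define MAX_XACTS 16
XactFrame xacts[MAX_XACTS];
int       n_xacts = 0;
int       cur = -1;         /* innermost in-progress frame, -1 if none */

/* Push a new frame on top of the current one; returns its index or -1. */
int xact_push(int autonomous)
{
    if (n_xacts == MAX_XACTS)
        return -1;
    xacts[n_xacts] = (XactFrame){ .enclosing = cur,
                                  .autonomous = autonomous,
                                  .status = XACT_IN_PROGRESS };
    cur = n_xacts++;
    return cur;
}

/* Finish the innermost frame with the given outcome and pop it. */
int xact_pop(XactStatus outcome)
{
    if (cur < 0)
        return -1;
    xacts[cur].status = outcome;
    cur = xacts[cur].enclosing;
    return 0;
}

/*
 * A frame's effects persist if it committed and every enclosing frame
 * up to (and including) the nearest autonomous one also committed.
 * An ordinary subtransaction therefore dies with its parent, while an
 * autonomous frame stops the walk and survives a parent abort.
 */
int xact_is_durable(int i)
{
    while (i >= 0) {
        if (xacts[i].status != XACT_COMMITTED)
            return 0;
        if (xacts[i].autonomous)
            return 1;
        i = xacts[i].enclosing;
    }
    return 1;
}
```

[In this model, pushing a main transaction, committing one ordinary
subtransaction and one autonomous frame inside it, and then aborting the
main transaction leaves only the autonomous frame durable.]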

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: autonomous transactions (was Re: TODO note)

From
Robert Haas
Date:
On Wed, Sep 15, 2010 at 6:21 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of mié sep 15 14:57:29 -0400 2010:
>
>> I guess so, but the devil is in the details.  I suspect that we don't
>> actually want to fork a new backend for every autonomous transactions.
>>  That would be pretty expensive, and we already have an expensive way
>> of emulating this functionality using dblink.  Finding all of the bits
>> that think there's only one top-level transaction per backend and
>> generalizing them to support multiple top-level transactions per
>> backend doesn't sound easy, though,
>
> Yeah, and the transaction handling code is already pretty complex.

Yep.

>> especially since you must do it without losing performance.
>
> Presumably we'd have fast paths for the main transaction, and
> any autonomous transactions beside that one would incur in some
> slowdown.
>
> I think the complex parts are, first, figuring out what to do with
> global variables that currently represent a transaction (they are
> sprinkled all over the place); and second, how to represent the
> autonomous transactions in shared memory without requiring the PGPROC
> array to be arbitrarily resizable.
>
> The other alternative would be to bolt the autonomous transaction
> somehow in the current subtransaction stack thing and marking it in some
> different way so that we can reuse the games we play with "push/pop"
> there.  That still leaves us with the PGPROC problem.

I wonder if we could use/generalize pg_subtrans in some way to handle
the PGPROC problem.  I haven't thought about it much, though.

One thing that strikes me (maybe this is obvious) is that the
execution of the main transaction and the autonomous transaction are
not interleaved: it's a stack.  So in terms of globals and stuff,
assuming you knew which things needed to be updated, you could push
all that stuff off to the side, do whatever with the new transaction,
and then restore all the context afterwards.  That doesn't help in
terms of PGPROC, of course, but for backend-local state it seems
workable.
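
[Editor's sketch, not part of the original message.  The push-aside and
restore idea for backend-local state could look roughly like the
following, assuming (unrealistically) that all of the relevant globals
have already been gathered into one struct.  XactState, its fields,
begin_autonomous and end_autonomous are hypothetical names chosen for
illustration, not PostgreSQL APIs.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend-local globals that describe
 * "the current transaction"; the real state is far more scattered. */
typedef struct XactState {
    unsigned int xid;        /* illustrative transaction id */
    int          nest_level; /* illustrative subtransaction depth */
    int          read_only;  /* illustrative transaction property */
} XactState;

XactState current_xact;      /* plays the role of the globals */

#define MAX_SUSPENDED 8
XactState suspended[MAX_SUSPENDED];  /* states pushed off to the side */
size_t    n_suspended = 0;

/* Save the running transaction's state and start from a clean slate.
 * Returns 0 on success, -1 if the save stack is full. */
int begin_autonomous(unsigned int new_xid)
{
    if (n_suspended == MAX_SUSPENDED)
        return -1;
    suspended[n_suspended++] = current_xact;
    current_xact = (XactState){ .xid = new_xid, .nest_level = 1, .read_only = 0 };
    return 0;
}

/* Restore whatever was running before the autonomous transaction.
 * Returns 0 on success, -1 if nothing was suspended. */
int end_autonomous(void)
{
    if (n_suspended == 0)
        return -1;
    current_xact = suspended[--n_suspended];
    return 0;
}
```

[Because execution is strictly nested, a plain LIFO array is enough:
the most recently suspended state is always the next one to resume.]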

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: TODO note

From
Markus Wanner
Date:
Hi,

On 09/15/2010 07:30 PM, Robert Haas wrote:
> One problem with autonomous transactions is that you have to figure
> out where to store all the state associated with the autonomous
> transaction and its subtransactions.  Another is that you have to
> avoid an unacceptable slowdown in the tuple-visibility checks in the
> process.

It just occurs to me that this is the other potential use case for 
bgworkers: autonomous transactions. Simply store any kind of state in 
the bgworker and use one per autonomous transaction.

What's left to be done: implement communication between the controlling 
backend (with the client connection) and the bgworker (imessages), drop 
the bgworker's session to user privileges (and re-raise to superuser 
after the job) and implement better error handling, as those would have 
to be propagated back to the controlling backend.

Regards

Markus Wanner


Re: autonomous transactions

From
Dimitri Fontaine
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> One thing that strikes me (maybe this is obvious) is that the
> execution of the main transaction and the autonomous transaction are
> not interleaved: it's a stack.  So in terms of globals and stuff,
> assuming you knew which things needed to be updated, you could push
> all that stuff off to the side, do whatever with the new transaction,
> and then restore all the context afterwards.

I think they call that dynamic scope, in advanced programming
languages. I guess that's calling for a quote of Greenspun's Tenth Rule:
 Any sufficiently complicated C or Fortran program contains an ad hoc
 informally-specified bug-ridden slow implementation of half of Common
 Lisp.

So the name of the game could be to find out a way to implement (a
limited form of) dynamic scoping in PostgreSQL, in C, then find out all
and any backend local variable that needs that to support autonomous
transactions, then make it happen… Right?
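
[Editor's sketch, not part of the original message.  A limited form of
dynamic scoping in C can be approximated with a binding stack that
records each rebound variable's address and previous value, then unwinds
to a saved mark.  The helper names (dyn_bind, dyn_unwind) and the two
example variables are hypothetical; the sketch handles only int
variables and assumes every variable needing this treatment is known.]

```c
#include <assert.h>
#include <stddef.h>

/* One saved binding: where the variable lives and its previous value. */
typedef struct Binding {
    int *var;
    int  saved;
} Binding;

#define MAX_BINDINGS 32
Binding bind_stack[MAX_BINDINGS];
size_t  bind_top = 0;

/* Rebind *var to value until the matching unwind.
 * Returns 0 on success, -1 if the binding stack is full. */
int dyn_bind(int *var, int value)
{
    if (bind_top == MAX_BINDINGS)
        return -1;
    bind_stack[bind_top].var = var;
    bind_stack[bind_top].saved = *var;
    bind_top++;
    *var = value;
    return 0;
}

/* Undo every binding made since mark, newest first. */
void dyn_unwind(size_t mark)
{
    while (bind_top > mark) {
        bind_top--;
        *bind_stack[bind_top].var = bind_stack[bind_top].saved;
    }
}

/* Hypothetical backend-local variables that describe the transaction. */
int isolation_level = 1;
int xact_read_only = 0;

/* Run an "autonomous" section with its own bindings, then restore.
 * Returns what the section observed, or -1 if binding failed. */
int run_autonomous(void)
{
    size_t mark = bind_top;
    int    seen;

    if (dyn_bind(&isolation_level, 3) != 0 ||
        dyn_bind(&xact_read_only, 1) != 0) {
        dyn_unwind(mark);
        return -1;
    }
    /* Code called from here sees the rebound values. */
    seen = isolation_level * 10 + xact_read_only;
    dyn_unwind(mark);
    return seen;
}
```

[Unlike saving one big struct, this approach lets each module register
its own variables, but it still depends on finding all of them.]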

Regards,
--
dim


Re: autonomous transactions

From
Robert Haas
Date:
On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> One thing that strikes me (maybe this is obvious) is that the
>> execution of the main transaction and the autonomous transaction are
>> not interleaved: it's a stack.  So in terms of globals and stuff,
>> assuming you knew which things needed to be updated, you could push
>> all that stuff off to the side, do whatever with the new transaction,
>> and then restore all the context afterwards.
>
> I think they call that dynamic scope, in advanced programming
> language. I guess that's calling for a quote of Greenspun's Tenth Rule:
>
>  Any sufficiently complicated C or Fortran program contains an ad hoc
>  informally-specified bug-ridden slow implementation of half of Common
>  Lisp.
>
> So the name of the game could be to find out a way to implement (a
> limited form of) dynamic scoping in PostgreSQL, in C, then find out all
> and any backend local variable that needs that to support autonomous
> transactions, then make it happen… Right?

Interestingly, PostgreSQL was originally written in LISP, and there
are remnants of that in the code today; for example, our heavy use of
List nodes.  But I don't think that has much to do with this project.
I plan to reserve judgment on the best way of managing the relevant
state until such time as someone has gone to the trouble of
identifying what state that is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: autonomous transactions

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I plan to reserve judgment on the best way of managing the relevant
> state until such time as someone has gone to the trouble of
> identifying what state that is.

The really fundamental problem here is that you never will be able to
identify all such state.  Even assuming that you successfully completed
the herculean task of fixing the core backend, what of add-on code?

(This is also why I'm quite unimpressed with the idea of trying to
get backends to switch to a different database after startup.)
        regards, tom lane


Re: autonomous transactions

From
Darren Duncan
Date:
Robert Haas wrote:
> On Thu, Sep 16, 2010 at 5:19 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
>> I think they call that dynamic scope, in advanced programming
>> language. I guess that's calling for a quote of Greenspun's Tenth Rule:
>>
>>  Any sufficiently complicated C or Fortran program contains an ad hoc
>>  informally-specified bug-ridden slow implementation of half of Common
>>  Lisp.
>>
>> So the name of the game could be to find out a way to implement (a
>> limited form of) dynamic scoping in PostgreSQL, in C, then find out all
>> and any backend local variable that needs that to support autonomous
>> transactions, then make it happen… Right?
> 
> Interestingly, PostgreSQL was originally written in LISP, and there
> are remnants of that in the code today; for example, our heavy use of
> List nodes.  But I don't think that has much to do with this project.
> I plan to reserve judgment on the best way of managing the relevant
> state until such time as someone has gone to the trouble of
> identifying what state that is.

It would probably do Pg some good to try and recapture its functional language 
roots where reasonably possible.  I believe that, design-wise, functional 
languages really are the best way to do object-relational databases, given that 
pure functions and immutable data structures are typically the best way to 
express anything one would do with them.

-- Darren Duncan