Thread: Autonomous Transaction is back
After a few failed attempts to propose autonomous transactions earlier, I, along with Simon Riggs, would like to propose again, with a completely different approach.

We also discussed this feature on the PGCon 2015 Unconference Day; those who missed that discussion, please refer to

https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015

Before jumping into the design and code proposal for this feature, Simon Riggs and I wanted to propose its behavior and usage, to keep everyone in the same boat. So we have summarized the behavior and usage of the Autonomous Transaction based on the discussion with community members on the PGCon 2015 Unconference Day:

Behavior of Autonomous Transaction:
1. The autonomous transaction is treated as a completely different transaction from the master transaction.
2. It should be allowed to deadlock with the master transaction. We need to work out a solution to avoid deadlock.
3. It can support multiple levels of nesting, based on configuration (perhaps a maximum of 70).
4. The outer (i.e. main or upper autonomous) transaction is suspended while the inner autonomous transaction is running.
5. The outer transaction should not see data of the inner transaction until the inner is committed (a serializable upper transaction should not see it even after the inner transaction commits).

How to Use Autonomous Transaction:
1. We can issue an explicit command to start an autonomous transaction, as below:
       BEGIN AUTONOMOUS TRANSACTION (don't worry about keywords at this point)
       Do your work.
       COMMIT/ROLLBACK (will commit/rollback the autonomous transaction and
       return to the main transaction or upper autonomous transaction)
2. The above commands can be issued either inside a procedure, to put a few statements of the procedure inside an autonomous transaction, or even in stand-alone query execution.
3. We can make the whole procedure itself autonomous, which is similar to starting an autonomous transaction at the beginning of the procedure and committing/rolling back at the end of the procedure.

There was another discussion on the Unconference Day about whether to implement COMMIT/ROLLBACK inside procedures, or autonomous transactions. Our opinion is that COMMIT/ROLLBACK inside a procedure is somewhat different from an autonomous transaction: in the first case, once we commit inside the procedure, it commits everything done before the call of the procedure. This is the behavior of Oracle. So in that case the user has to be very careful not to do any operation before the call of the procedure that is not yet intended to be committed inside the procedure.

So we prefer to implement Autonomous Transactions, which will not only be compatible with Oracle but also give a really strong, much-requested feature.

I have not put the use-cases here, as we already agree about its strong use-cases.

Requesting everyone's opinion regarding this, based on which we can proceed to enhance/tune/re-write our design.

Thanks and Regards,
Kumar Rajeev Rastogi
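As a concrete illustration of the proposed usage (the keywords, the `audit_log` and `accounts` tables, and the whole session are hypothetical; nothing below is implemented syntax):

```sql
-- Hypothetical sketch: an audit record that survives even if the
-- outer transaction rolls back, using the proposed (not implemented)
-- BEGIN AUTONOMOUS TRANSACTION syntax.
BEGIN;                         -- master transaction
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

BEGIN AUTONOMOUS TRANSACTION;  -- proposed keyword; suspends the master
INSERT INTO audit_log(event, at)
VALUES ('balance change attempted', now());
COMMIT;                        -- commits ONLY the autonomous transaction

ROLLBACK;                      -- master rolls back; the audit row remains
```

Per behavior point 5 above, the master transaction would not see the audit row until the autonomous transaction commits.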
On Thu, Jul 23, 2015 at 1:31 AM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
> 2. It should be allowed to deadlock with master transaction. We
> need to work-out a solution to avoid deadlock.

This sentence seems to contradict itself. I thought the consensus was
that the transaction should NOT conflict with the master transaction.

> 3. It can support multiple level of nesting based on the
> configuration (may be max as 70).

Why have any fixed maximum?

> 2. The above commands can be issued either inside the procedure to make few
> statements of procedure inside autonomous transaction or even in stand-alone
> query execution.

I think inside a procedure the autonomous transaction will need to be
lexically scoped. You won't be able to do this, for example:

    BEGIN AUTONOMOUS TRANSACTION;
    FOR x IN SELECT ... LOOP
        COMMIT;
        BEGIN AUTONOMOUS TRANSACTION;
    END LOOP;

Rather you'd have to do something like this:

    FOR x IN SELECT .. LOOP
        BEGIN WITH AUTONOMOUS TRANSACTION
            do stuff
        END;
    END LOOP;

In that way it would work like an EXCEPTION block, which can be
attached to a begin block to establish a subtransaction. There's no way
in PL/pgsql to persistently enter a subtransaction; but you can have
one that applies to a particular lexical scope (the BEGIN block to
which EXCEPTION is attached).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
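For reference, the existing PL/pgSQL EXCEPTION mechanism alluded to here, which already establishes a subtransaction for exactly one lexical scope (this is real, working syntax today, unlike the hypothetical AUTONOMOUS keywords; the table `t` and the function name are invented for illustration):

```sql
-- A BEGIN ... EXCEPTION ... END block runs its body in a
-- subtransaction: if the INSERT fails, only that block's work is
-- rolled back and control passes to the handler, while any work done
-- before the block survives.
CREATE OR REPLACE FUNCTION insert_with_fallback(v int) RETURNS text AS $$
BEGIN
    BEGIN
        INSERT INTO t(id) VALUES (v);   -- runs inside a subtransaction
        RETURN 'inserted';
    EXCEPTION
        WHEN unique_violation THEN
            -- only the inner block's subtransaction is rolled back
            RETURN 'duplicate, skipped';
    END;
END;
$$ LANGUAGE plpgsql;
```

The proposal upthread is essentially to let a similarly scoped block commit independently of the outer transaction, rather than merely roll back independently.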
> Requesting for everyone's opinion regarding this based on which we can
> proceed to enhance/tune/re-write our design.

So, one of the things which came up during the discussion was advancing
XMIN, which is not important to the audit logging use case, but is very
important for the batch job use case. What have you concluded regarding
this item?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, Jul 23, 2015 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Requesting for everyone's opinion regarding this based on which we can
>> proceed to enhance/tune/re-write our design.
>
> So, one of the things which came up during the discussion was advancing
> XMIN, which is not important to the audit logging use case, but is very
> important for the batch job use case. What have you concluded regarding
> this item?

Could you explain more specifically what you are talking about here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 07/23/2015 11:39 AM, Robert Haas wrote:
> On Thu, Jul 23, 2015 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Requesting for everyone's opinion regarding this based on which we can
>>> proceed to enhance/tune/re-write our design.
>>
>> So, one of the things which came up during the discussion was advancing
>> XMIN, which is not important to the audit logging use case, but is very
>> important for the batch job use case. What have you concluded regarding
>> this item?
>
> Could you explain more specifically what you are talking about here?

Yeah, my notes are kinda incoherent, no?

There's two core use-cases for Autonomous Transactions (hereafter ATX):

* audit logging
* batch jobs

Audit Logging: triggers or other statements which should leave a record
even when a transaction aborts. While audit logging is the main example
of this use case, any kind of logging of failures is applicable.

Batch Jobs: large data-manipulation tasks which need to be broken up
into segments, with each segment committing separately. Example:
updating 1 million records in batches of 1000.

Unlike the Audit Logging case, Batch Jobs generally want XMIN to advance
so that updated/imported/deleted rows can be vacuumed or HOT updated.
Thus the need to allow XMIN to advance.

One of the things we kind of concluded from our discussion was that the
two core use-cases are probably different features:

Audit Logging:
* requires 2-way data interaction with outer transaction
* no parallelism
* XMIN does not need to advance
* master transaction should still commit/fail
* needs to support nesting

Batch Jobs:
* 1-way data interaction sufficient (master-->child)
* parallelism desired
* XMIN should advance
* master process could be transactionless
* does not need to support nesting

Of these two, the Audit Logging case is the more important one to
implement because there is no real workaround for it. Batch Jobs can,
and are, handled by external scripting, and having ATX for them is more
of a convenience than anything else.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
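The batch-job case can be made concrete (table and column names invented for illustration). Today the chunking has to be driven by an external script issuing one statement per transaction, because a single server-side function holds one snapshot, and thus pins xmin, for its entire run:

```sql
-- One chunk of the hypothetical batch job. An external driver repeats
-- this statement, committing between iterations, until it reports
-- "UPDATE 0"; each committed chunk lets vacuum reclaim the old row
-- versions. Run inside one long plpgsql function instead, every chunk
-- would share a single transaction and snapshot, and no dead rows
-- could be cleaned up until the whole job finished.
UPDATE big_table
SET    processed = true
WHERE  id IN (SELECT id
              FROM   big_table
              WHERE  NOT processed
              LIMIT  1000);
```

This is why "XMIN should advance" appears in the Batch Jobs list above but not in the Audit Logging list.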
On Thu, Jul 23, 2015 at 2:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 07/23/2015 11:39 AM, Robert Haas wrote:
>> On Thu, Jul 23, 2015 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> Requesting for everyone's opinion regarding this based on which we can
>>>> proceed to enhance/tune/re-write our design.
>>>
>>> So, one of the things which came up during the discussion was advancing
>>> XMIN, which is not important to the audit logging use case, but is very
>>> important for the batch job use case. What have you concluded regarding
>>> this item?
>>
>> Could you explain more specifically what you are talking about here?
>
> Yeah, my notes are kinda incoherent, no?
>
> There's two core use-cases for Autonomous Transactions (hereafter ATX):
>
> * audit logging
> * batch jobs
>
> Audit Logging: triggers or other statements which should leave a record
> even when a transaction aborts. While audit logging is the main example
> of this use case, any kind of logging of failures is applicable.
>
> Batch Jobs: large data-manipulation tasks which need to be broken up
> into segments, with each segment committing separately. Example:
> updating 1 million records in batches of 1000.
>
> Unlike the Audit Logging case, Batch Jobs generally want XMIN to advance
> so that updated/imported/deleted rows can be vacuumed or HOT updated.
> Thus the need to allow XMIN to advance.
>
> One of the things we kind of concluded from our discussion was that the
> two core use-cases are probably different features:
>
> Audit Logging:
> * requires 2-way data interaction with outer transaction
> * no parallelism
> * XMIN does not need to advance
> * master transaction should still commit/fail
> * needs to support nesting
>
> Batch Jobs:
> * 1-way data interaction sufficient (master-->child)
> * parallelism desired
> * XMIN should advance
> * master process could be transactionless
> * does not need to support nesting
>
> Of these two, the Audit Logging case is the more important one to
> implement because there is no real workaround for it. Batch Jobs can,
> and are, handled by external scripting, and having ATX for them is more
> of a convenience than anything else.

You're still not really explaining what you mean by "xmin should
advance". If the parent transaction holds a snapshot, or for as long as
it does, xmin can't be advanced safely. If it doesn't, you'll be fine.

I suppose the situation you're worried about is where we execute a
stored procedure that repeatedly spawns autonomous transactions. Since
the parent transaction will always have a snapshot, you won't advance
xmin until the entire stored procedure finishes. That's a problem, but
I think it is rather unfair to say that it has anything to do with
autonomous transactions. "Run a procedure without needing to hold a
snapshot" is a completely separate feature request from "allow
autonomous transactions", and it's probably neither easy nor
uncontroversial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 07/23/2015 12:40 PM, Robert Haas wrote:
> That's a problem, but I think it is rather unfair to say that it has
> anything to do with autonomous transactions. "Run a procedure without
> needing to hold a snapshot" is a completely separate feature request
> from "allow autonomous transactions", and it's probably neither easy
> nor uncontroversial.

That's more-or-less a summary of what I just posted.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
> After few failed attempt to propose Autonomous transaction earlier. I along with Simon Riggs would like to propose again but completely different in approach.
>
> We also had discussion about this feature in last PGCon2015 Unconference Day, those who missed this discussion, please refer
>
> https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015
>
> Before jumping into the design and code proposal for this feature, me along with Simon Riggs wanted to propose its behavior and usage to keep everyone in the same boat.
> So we have summarized the behavior and usage of the Autonomous Transaction based on the discussion with community members in last PGCon2015 Unconference Day:
>
> Behavior of Autonomous Transaction:
> 1. The autonomous transaction treated as a completely different transaction from the master transaction.
> 2. It should be allowed to deadlock with master transaction. We need to work-out a solution to avoid deadlock.

For an autonomous transaction not to conflict with the master, either it
shouldn't take conflicting resources, or when it does, those resources
should be shared with the master transaction. The first needs to be
implemented in the application; the latter will need careful design of
how to share such resources (hierarchical resource sharing?) and how to
release them when the ATX ends. Given that the transactions are
autonomous as per point 1, they should be allowed to conflict, and
deadlock detection would take care of such deadlocks.

> 3. It can support multiple level of nesting based on the configuration (may be max as 70).
> 4. Outer (i.e. main or upper autonomous) transaction to be suspended while the inner autonomous transaction is running.
> 5. Outer transaction should not see data of inner till inner is committed (serializable upper transaction should not see even after inner transaction commit).
>
> How to Use Autonomous Transaction:
> 1. We can issue explicit command to start an Autonomous transaction as below:
>     BEGIN AUTONOMOUS TRANSACTION (Don't worry about keywords at this point.)
>     Do your work.
>     COMMIT/ROLLBACK (Will commit/rollback the autonomous transaction and will return to main transaction or upper autonomous transaction).
> 2. The above commands can be issued either inside the procedure to make few statements of procedure inside autonomous transaction or even in stand-alone query execution.
> 3. We can make whole procedure itself as autonomous, which will be similar to start autonomous transaction in the beginning of the procedure and commit/rollback at the end of the procedure.
>
> There was another discussion in Unconference Day to decide whether to implement COMMIT/ROLLBACK inside the procedure or autonomous transaction. So our opinion about this is that
> COMMIT/ROLLBACK inside procedure will be somewhat different from Autonomous Transaction as in case of first, once we commit inside the procedure,
> it commits everything done before call of procedure. This is the behavior of Oracle.
> So in this case user required to be very careful to not do any operation before call of procedure, which is not yet intended to be committed inside procedure.
>
> So we can prefer to implement Autonomous Transaction, which will not only be compatible with Oracle but also gives really strong required features.
>
> I have not put the use-cases here as already we agree about its strong use-cases.
>
> Requesting for everyone's opinion regarding this based on which we can proceed to enhance/tune/re-write our design.
>
> Thanks and Regards,
> Kumar Rajeev Rastogi

-- 
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Batch Jobs: large data-manipulation tasks which need to be broken up
> into segments, with each segment committing separately. Example:
> updating 1 million records in batches of 1000.

Autonomous transactions are not a good fit for this case; stored
procedures are a better way to go for any scenario where you don't
want to be in a snapshot (for example, suppose you want to change
isolation level on the fly).

merlin
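The isolation-level example is a real limitation of functions today: a function always runs inside an already-started transaction, so it cannot change the isolation level mid-flight (function name invented for illustration):

```sql
-- This fails in today's PostgreSQL: by the time the function body
-- runs, the surrounding transaction has already executed a query
-- (the SELECT that called the function), and changing the isolation
-- level is only allowed before the transaction's first query.
CREATE OR REPLACE FUNCTION try_serializable() RETURNS void AS $$
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- expected failure along the lines of:
    -- ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query
END;
$$ LANGUAGE plpgsql;
```

A true stored procedure that owns its transaction boundaries would not have this restriction, which is Merlin's point.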
On 23 July 2015 21:04, Robert Haas Wrote:
> On Thu, Jul 23, 2015 at 1:31 AM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
>> 2. It should be allowed to deadlock with master transaction. We
>> need to work-out a solution to avoid deadlock.
>
> This sentence seems to contradict itself. I thought the consensus was
> that the transaction should NOT conflict with the master transaction.

Since we are saying the transaction is autonomous to the parent
transaction, we cannot guarantee that it does not take any conflicting
lock, unless designed so by the application. But yes, we should have a
mechanism to deal with the possible deadlock.

>> 3. It can support multiple level of nesting based on the
>> configuration (may be max as 70).
>
> Why have any fixed maximum?

Since we are planning to have nested autonomous transactions, a limit on
this is required so that resources can be controlled.

>> 2. The above commands can be issued either inside the procedure to
>> make few statements of procedure inside autonomous transaction or even
>> in stand-alone query execution.
>
> I think inside a procedure the autonomous transaction will need to be
> lexically scoped. You won't be able to do this, for example:
>
> BEGIN AUTONOMOUS TRANSACTION;
> FOR x IN SELECT ... LOOP
>     COMMIT;
>     BEGIN AUTONOMOUS TRANSACTION;
> END LOOP;

I am not sure how we will be able to control this. IMHO the user should
be able to control this, especially since it does not have any meaning
from the user's perspective. Please let me know if I am missing
something here.

> Rather you'd have to do something like this:
>
> FOR x IN SELECT .. LOOP
>     BEGIN WITH AUTONOMOUS TRANSACTION
>         do stuff
>     END;
> END LOOP;

Thanks and Regards,
Kumar Rajeev Rastogi
On 07/27/2015 02:41 PM, Joel Jacobson wrote:
> However, we should also add a way for the caller to protect against an
> Autonomous Transaction in a function called by the caller. Imagine if
> you're the author of function X() and within X() make use of some other
> function Y() which has been written by some other author, and within
> your function X(), it's very important either all of your work or none
> at all gets committed, then you need to make sure none of the changes
> you made before calling Y() gets committed, and thus we need a way to
> prevent Y() from starting and committing an Autonomous Transaction,
> otherwise we would increase the risk and complexity of working with
> functions and plpgsql in PostgreSQL as you would then need to be sure
> none of the functions you are using within a function will start and
> commit an ATX.

Ah, you're missing how commits in ATX are expected to work. Let me
illustrate:

X (
    Data write A1
    call Y(
        Start ATX
        Data write B1
        Commit ATX
    )
    Data write A2
    Exception
)

In this workflow, B1 would be committed and persistent. Neither A1 nor
A2 would be committed, or visible to other users. Depending on what
implementation we end up with, A1 might not even be visible to Y().

So that solves your use case without any need to "block" ATXs in called
functions. However, it leads to some interesting cases involving
self-deadlocks; see the original post on this thread.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
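The X()/Y() workflow above can be rendered as a plpgsql sketch (the AUTONOMOUS block syntax is hypothetical, borrowed from the lexically scoped form proposed upthread; the tables and function names are invented):

```sql
-- Hypothetical sketch: B1 commits independently; A1 and A2 roll back
-- when X() raises an exception.
CREATE FUNCTION y() RETURNS void AS $$
BEGIN
    BEGIN WITH AUTONOMOUS TRANSACTION        -- proposed, not implemented
        INSERT INTO audit_log VALUES ('B1'); -- persists even if X() aborts
    END;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION x() RETURNS void AS $$
BEGIN
    INSERT INTO t VALUES ('A1');
    PERFORM y();
    INSERT INTO t VALUES ('A2');
    RAISE EXCEPTION 'abort';  -- A1 and A2 are rolled back; B1 remains
END;
$$ LANGUAGE plpgsql;
```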
On 07/27/2015 02:47 AM, Rajeev rastogi wrote:
>> Why have any fixed maximum?
> Since we are planning to have nested autonomous transaction, so it is
> required to have limit on this so that resources can be controlled.

Is there a particular reason why this limit wouldn't just be
max_stack_depth?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson <joel@trustly.com> wrote:
> On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Batch Jobs: large data-manipulation tasks which need to be broken up
>>> into segments, with each segment committing separately. Example:
>>> updating 1 million records in batches of 1000.
>>
>> Autonomous transactions are not a good fit for this case; stored
>> procedures are a better way to go for any scenario where you don't
>> want to be in a snapshot (for example, suppose you want to change
>> isolation level on the fly).
>
> Hm, you mean we need real "stored procedures" in PostgreSQL and not just
> "functions"?

Yes, exactly.

Autonomous transactions aren't really set up for cases where the
function runs for a very long time or indefinitely. This is the
'advancing xmin' problem as Josh puts it but I think the problem is
much bigger than that. Anyways, this is mostly irrelevant to
autonomous transactions as long as the design isn't extended to try
and cover that case.

Is the Autonomous Transaction feature only going to be exposed through
pl/pgsql?

merlin
On 07/27/2015 03:12 PM, Joel Jacobson wrote:
> On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Ah, you're missing how commits in ATX are expected to work. Let me
>> illustrate:
>>
>> X (
>>     Data write A1
>>     call Y(
>>         Start ATX
>>         Data write B1
>>         Commit ATX
>>     )
>>     Data write A2
>>     Exception
>> )
>>
>> In this workflow, B1 would be committed and persistent. Neither A1 nor
>> A2 would be committed, or visible to other users. Depending on what
>> implementation we end up with, A1 might not even be visible to Y().
>>
>> So that solves your use case without any need to "block" ATXs in called
>> functions. However, it leads to some interesting cases involving
>> self-deadlocks; see the original post on this thread.
>
> I don't follow. In your example above, if I'm X(), how do I ensure Y()
> won't have committed anything at all when I later at "Exception" decide
> to rollback everything from "Data write A1" to "Data write A2" including
> any writes made by Y() (in the example "Data write B1")?

Ah, ok. The goal of the project is that the writer of X() *cannot*
prevent Y() from writing its data (B1) and committing it.

One of the primary use cases for ATX is audit triggers. If a function
writer could override ATX and prevent the audit triggers from
committing, then that use case would be violated.

Can you explain what use case you have where simply telling the staff
"if you use ATX without clearing it, you'll be fired" is not sufficient?
Possibly there's something we failed to account for in the unconference
discussion.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 7/27/15 5:12 PM, Joel Jacobson wrote:
> Right now, when writing a function, if you raise an exception, you can
> be sure all writes you have made will be rollbacked, but your caller
> function might caught the exception and decide to carry on and commit
> work made before your function was called, but at least you can be
> confident your writes won't be committed as long as you don't caught the
> exception you raised in your own function. If I understand it correctly,
> that would change with the addition of Autonomous Transaction, unless
> given a way to prevent a function you call from starting and commiting
> a Autonomous Transaction. Wrong? If so, then please show how to prevent
> Y() from commiting the "Data write B1" in your example, I don't get it.

What's being described here doesn't make sense in either use case ([1] &
[2]), but I do understand the concern about what 3rd party software is
doing. It would be nice to have the ability to disallow and/or disable
autonomous transactions, but I don't see a practical way of doing that
other than introducing a new GUC. I'm not sure if it's worth that effort.

[1] the "batch process" use case: batches that still hold their own
transaction open don't gain anything.

[2] the "audit logging" case. If you didn't care about auditing
surviving regardless of a rollback then you wouldn't go to the extra
work of an autonomous transaction to begin with.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 7/27/15 5:56 PM, Josh Berkus wrote:
> Can you explain what use case you have where simply telling the staff
> "if you use ATX without clearing it, you'll be fired" is not sufficient?
> Possibly there's something we failed to account for in the unconference
> discussion.

That there's no way to enforce that, short of hand-auditing code?
There's already enough things that are difficult/impossible to enforce,
I'd rather not add another one.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 7/27/15 6:40 PM, Jim Nasby wrote:
> On 7/27/15 5:12 PM, Joel Jacobson wrote:
>> Right now, when writing a function, if you raise an exception, you can
>> be sure all writes you have made will be rollbacked, but your caller
>> function might caught the exception and decide to carry on and commit
>> work made before your function was called, but at least you can be
>> confident your writes won't be committed as long as you don't caught the
>> exception you raised in your own function. If I understand it correctly,
>> that would change with the addition of Autonomous Transaction, unless
>> given a way to prevent a function you call from starting and commiting
>> a Autonomous Transaction. Wrong? If so, then please show how to prevent
>> Y() from commiting the "Data write B1" in your example, I don't get it.
>
> What's being described here doesn't make sense in either use case ([1] &
> [2]), but I do understand the concern about what 3rd party software is
> doing. It would be nice to have the ability to disallow and/or disable
> autonomous transactions, but I don't see a practical way of doing that
> other than introducing a new GUC. I'm not sure if it's worth that effort.

It just occurred to me that another option would be to have an event
trigger for beginning an autonomous transaction.

> [1] the "batch process" use case: batches that still hold their own
> transaction open don't gain anything.
>
> [2] the "audit logging" case. If you didn't care about auditing
> surviving regardless of a rollback then you wouldn't go to the extra
> work of an autonomous transaction to begin with.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
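If a blocking knob were ever added, it would presumably look like any other GUC. This fragment is entirely hypothetical, just sketching the shape of the idea; neither the setting name nor the feature exists:

```
# postgresql.conf (hypothetical, not an implemented setting)
allow_autonomous_transactions = off   # could plausibly also be settable
                                      # per role or per database via
                                      # ALTER ROLE ... SET / ALTER DATABASE ... SET
```

As noted above, though, a GUC that function authors can toggle would conflict with the audit-logging goal that callers must not be able to suppress an ATX.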
> Ah, you're missing how commits in ATX are expected to work. Let me
> illustrate:
>
> X (
>     Data write A1
>     call Y(
>         Start ATX
>         Data write B1
>         Commit ATX
>     )
>     Data write A2
>     Exception
> )
>
> In this workflow, B1 would be committed and persistent. Neither A1 nor
> A2 would be committed, or visible to other users. Depending on what
> implementation we end up with, A1 might not even be visible to Y().

A1 should never be visible to Y(), else we will end up with
inconsistencies. E.g.

A1 is a primary key and B1 writes a foreign key referencing A1. Commit
ATX will not complain, as it sees A1, but in case X rolls back, we may
have B1 referencing nothing.

Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 23 July 2015 at 13:31, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
> 1. The autonomous transaction treated as a completely different
> transaction from the master transaction.

Personally I think that's a lot more useful than having the inner tx
able to see the outer tx's uncommitted changes.

> 2. It should be allowed to deadlock with master transaction. We
> need to work-out a solution to avoid deadlock.

The deadlock case in autonomous tx's is a bit different.

Assuming you don't intend to allow interleaving, where you can switch
between transactions at will rather than just at begin/commit, the only
way a deadlock can happen is when the outer tx holds a lock that the
inner tx tries to acquire.

That should be practical to special-case by maintaining a list of
parent transaction (virtual?) transaction IDs. Attempts to wait on a
lock held by any of those should fail immediately. There's no point
waiting for the deadlock detector since the outer tx can never progress
and commit/rollback to release locks, and it might not be able to see
the parent/child relationship from outside the backend doing the nested
tx anyway.

There's no need to check the parent list until we actually try to wait
on a lock, though I don't know whether it's practical to delay until
then.

-- 
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
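The self-deadlock being described, in concrete terms (the ATX syntax is the hypothetical proposal from upthread; the table name is invented):

```sql
-- The outer transaction takes a row lock...
BEGIN;
UPDATE jobs SET status = 'running' WHERE id = 42;

-- ...then the autonomous transaction tries to take the same row lock.
BEGIN AUTONOMOUS TRANSACTION;   -- hypothetical syntax
UPDATE jobs SET status = 'failed' WHERE id = 42;
-- The outer transaction is suspended until the ATX finishes, so it can
-- never release its lock; this wait could never be satisfied. Under the
-- scheme sketched above, the backend would check the lock holder
-- against its list of parent (virtual) transaction IDs and fail this
-- request immediately instead of waiting.
```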
On 28 July 2015 15:31, Craig Ringer Wrote:
>> 2. It should be allowed to deadlock with master transaction. We
>> need to work-out a solution to avoid deadlock.
>
> The deadlock case in autonomous tx's is a bit different.
>
> Assuming you don't intend to allow interleaving, where you can switch
> between transactions at will rather than just at begin/commit, the only
> way a deadlock can happen is when the outer tx holds a lock that the
> inner tx tries to acquire.
>
> That should be practical to special-case by maintaining a list of parent
> transaction (virtual?) transaction IDs. Attempts to wait on a lock held
> by any of those should fail immediately. There's no point waiting for
> the deadlock detector since the outer tx can never progress and
> commit/rollback to release locks, and it might not be able to see the
> parent/child relationship from outside the backend doing the nested tx
> anyway.

Thanks, sounds to be a good idea. I shall evaluate the same.

Thanks and Regards,
Kumar Rajeev Rastogi
On 28 July 2015 03:21, Josh Berkus Wrote: On 07/27/2015 02:47 AM, Rajeev rastogi wrote: >>> Why have any fixed maximum? >> Since we are planning to have nested autonomous transactions, it is required to have a limit on this so that resources can be controlled. >Is there a particular reason why this limit wouldn't just be max_stack_depth? We will need to allocate some initial resources in order to handle all nested autonomous transactions, so I think it is better to have a separate configuration parameter. Thanks and Regards, Kumar Rajeev Rastogi
On Tue, Jul 28, 2015 at 6:01 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > That should be practical to special-case by maintaining a list of > parent transaction (virtual?) transaction IDs. Attempts to wait on a > lock held by any of those should fail immediately. There's no point > waiting for the deadlock detector since the outer tx can never > progress and commit/rollback to release locks, and it might not be > able to see the parent/child relationship from outside the backend > doing the nested tx anyway. I think we're going entirely down the wrong path here. Why is it ever useful for a backend's lock requests to conflict with themselves, even with autonomous transactions? That seems like an artifact of somebody else's implementation that we should be happy we don't need to copy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 31 July 2015 23:10, Robert Haas Wrote: On Tue, Jul 28, 2015 at 6:01 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> That should be practical to special-case by maintaining a list of >> parent transaction (virtual?) transaction IDs. Attempts to wait on a >> lock held by any of those should fail immediately. There's no point >> waiting for the deadlock detector since the outer tx can never >> progress and commit/rollback to release locks, and it might not be >> able to see the parent/child relationship from outside the backend >> doing the nested tx anyway. >I think we're going entirely down the wrong path here. Why is it ever useful for a backend's lock requests to conflict with themselves, even with autonomous transactions? That seems like an artifact of somebody else's implementation that we should be happy we don't need to copy. IMHO, since most locking is managed at the transaction level, not the backend level, and we consider the main and autonomous transactions to be independent transactions, they may in practice conflict. It is also true, as you said, that there is no really useful use-case where an autonomous transaction conflicts with the main (parent) transaction. But we cannot take that for granted, as a user might make a mistake. So we should at least have some mechanism to handle this rare case; as of now I think throwing an error from the autonomous transaction is one solution. Once the error is thrown from the autonomous transaction, the main transaction may continue as it is (or should the main transaction also abort?). Any other suggestions on handling this would be really helpful. Thanks and Regards, Kumar Rajeev Rastogi
On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote: > On 31 July 2015 23:10, Robert Haas Wrote: >>I think we're going entirely down the wrong path here. Why is it ever useful for a backend's lock requests to conflict with themselves, even with autonomous transactions? That seems like an artifact of somebody else's implementation that we should be happy we don't need to copy. > > IMHO, since most of the locking are managed at transaction level not backend level and we consider main & autonomous transaction to be independent transaction, then practically they may conflict right. > It is also right as you said that there is no as such useful use-cases where autonomous transaction conflicts with main (parent) transaction. But we cannot take it for granted as user might make a mistake. So at-least we should have some mechanism to handle this rare case, for which as of now I think throwing error from autonomous transaction as one of the solution. Once error thrown from autonomous transaction, main transaction may continue as it is (or abort main transaction also??). hm. OK, what's the behavior of: BEGIN UPDATE foo SET x = x + 1 WHERE foo_id = 1; BEGIN WITH AUTONOMOUS TRANSACTION UPDATE foo SET x = x + 1 WHERE foo_id = 1; END; RAISE EXCEPTION ...; EXCEPTION ... END; Also, *) What do the other candidate implementations do? IMO, compatibility should be the underlying design principle. *) What will the "SQL only" feature look like? *) Is the SPI interface going to be extended to expose AT? merlin
On 03 August 2015 18:40, Merlin Moncure [mailto:mmoncure@gmail.com] Wrote: >On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi ><rajeev.rastogi@huawei.com> wrote: >> On 31 July 2015 23:10, Robert Haas Wrote: >>>I think we're going entirely down the wrong path here. Why is it ever >useful for a backend's lock requests to conflict with themselves, even >with autonomous transactions? That seems like an artifact of somebody >else's implementation that we should be happy we don't need to copy. >> >> IMHO, since most of the locking are managed at transaction level not >backend level and we consider main & autonomous transaction to be >independent transaction, then practically they may conflict right. >> It is also right as you said that there is no as such useful use-cases >where autonomous transaction conflicts with main (parent) transaction. >But we cannot take it for granted as user might make a mistake. So at- >least we should have some mechanism to handle this rare case, for which >as of now I think throwing error from autonomous transaction as one of >the solution. Once error thrown from autonomous transaction, main >transaction may continue as it is (or abort main transaction also??). > >hm. OK, what's the behavior of: > >BEGIN > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > > BEGIN WITH AUTONOMOUS TRANSACTION > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > END; > > RAISE EXCEPTION ...; >EXCEPTION ... > >END; It should throw an error (or something equivalent), as the second update will wait for the record lock to be released, which in this case will never happen: the outer transaction holds the lock and cannot complete until the autonomous transaction finishes. So, catch-22. >Also, >*) What do the other candidate implementations do? IMO, compatibility >should be the underlying design principle. Oracle throws an error in such a case. But we can decide on what behavior we want to keep. >*) What will the "SQL only" feature look like? Similar to PL as mentioned in your example, we can provide the "SQL only" feature also. 
>*) Is the SPI interface going to be extended to expose AT? I don’t think at this point that there is any need to expose an SPI interface for this. Thanks and Regards, Kumar Rajeev Rastogi
On Tue, Aug 4, 2015 at 4:12 AM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote: > On 03 August 2015 18:40, Merlin Moncure [mailto:mmoncure@gmail.com] Wrote: >>On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi >><rajeev.rastogi@huawei.com> wrote: >>> On 31 July 2015 23:10, Robert Haas Wrote: >>>>I think we're going entirely down the wrong path here. Why is it ever >>useful for a backend's lock requests to conflict with themselves, even >>with autonomous transactions? That seems like an artifact of somebody >>else's implementation that we should be happy we don't need to copy. >>> >>> IMHO, since most of the locking are managed at transaction level not >>backend level and we consider main & autonomous transaction to be >>independent transaction, then practically they may conflict right. >>> It is also right as you said that there is no as such useful use-cases >>where autonomous transaction conflicts with main (parent) transaction. >>But we cannot take it for granted as user might make a mistake. So at- >>least we should have some mechanism to handle this rare case, for which >>as of now I think throwing error from autonomous transaction as one of >>the solution. Once error thrown from autonomous transaction, main >>transaction may continue as it is (or abort main transaction also??). >> >>hm. OK, what's the behavior of: >> >>BEGIN >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> >> BEGIN WITH AUTONOMOUS TRANSACTION >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> END; >> >> RAISE EXCEPTION ...; >>EXCEPTION ... >> >>END; > > It should throw an error (or something equivalent) as the second update will wait for record lock to get released, which in this case will not happen till second update finishes. So catch 22. Yeah. Point being, from my point of view autonomous transactions have to conflict with the master transaction (or any transaction really). I agree the right course of action is to error out immediately...what else could you do? 
There isn't even a deadlock in the classic sense and allowing control to continue would result in indeterminate behavior FWICT. >>Also, >>*) What do the other candidate implementations do? IMO, compatibility >>should be the underlying design principle. > > Oracle throws error in such case. But we can decide on what behavior we want to keep. gotcha. makes sense. >>*) What will the "SQL only" feature look like? > > Similar to PL as mentioned in your example, we can provide the "SQL only" feature also. > >>*) Is the SPI interface going to be extended to expose AT? > > I don’t think at this point that there is any need of exposing SPI interface for this. Ok, how do AT work in a non-plpgsql ("SQL only") scenario? Are you going to similarly extend BEGIN TRANSACTION? merlin
On Mon, Aug 3, 2015 at 9:09 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > hm. OK, what's the behavior of: > > BEGIN > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > > BEGIN WITH AUTONOMOUS TRANSACTION > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > END; > > RAISE EXCEPTION ...; > EXCEPTION ... > > END; Sure, so that case might need a little bit of special handling. That doesn't mean it's a good idea for heavyweight locks to conflict in general. I think you're going to find that implementing the latter is an extremely unrewarding task, and that the benefits are seriously negative. For example, consider: BEGIN UPDATE foo SET x = x + 1 WHERE foo_id = 1; BEGIN WITH AUTONOMOUS TRANSACTION UPDATE foo SET x = x + 1 WHERE foo_id = 2; END; END; Now, suppose that a concurrent session does LOCK TABLE foo after the first UPDATE and before the second one. That's now a soft deadlock. But the only way the deadlock detector can see that is if the main transaction and the autonomous transaction have separate PGPROC entries, which is a design we explicitly rejected because it puts a tight limit on the number of ATs that can be in progress and the level to which those ATs can be nested. But let's say you don't care, so we go back to that design. The deadlock detector will have to be taught that the outer transaction can't help but wait for the inner transaction, so we teach it that. Now it can see that the only way to resolve the deadlock without aborting any transactions is to reorder the lock request from the autonomous transaction ahead of the concurrent session that is seeking a full table lock. So the autonomous transaction acquires the lock without blocking after all. You have exactly the same result that you would have had anyway but with a phenomenal amount of additional code and complexity. And for what? 
In the original example, the way the deadlock is going to be reported is like this: ERROR: deadlock detected DETAIL: Process 12345 waits for ShareLock on transaction 1000; blocked by process 12345. Process 12345 waits for ShareLock on transaction 1001; blocked by process 12345. That is not a model of clarity. On the other hand, if you just make a rule that attempting to update or delete a tuple that an outer transaction has already updated throws a bespoke error, you can do something like this: ERROR: tuple to be updated was already modified by a suspended outer transaction ...which has precedent in an existing message in trigger.c. Similarly, if you try to drop a table that the outer transaction has locked, the natural thing is for CheckTableNotInUse() to catch that and report it this way: ERROR: cannot DROP TABLE "foo" because it is being used by active queries in this session If you work hard enough, you can instead make that generate a deadlock error message, but you're going to have to work pretty hard, and the result is worse. I'd really like to hear some more *specific* scenarios where it's valuable for locks to conflict between the outer transaction and the AT. I grant that tuple updates are a case where the conflict has to be detected somehow, but I don't accept that the lock manager is the best way to do that, and I don't accept that there are a large number of other cases that will need similar handling. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 6, 2015 at 4:15 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Aug 3, 2015 at 9:09 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> hm. OK, what's the behavior of: >> >> BEGIN >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> >> BEGIN WITH AUTONOMOUS TRANSACTION >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> END; >> >> RAISE EXCEPTION ...; >> EXCEPTION ... >> >> END; > > Sure, so that case might need a little bit of special handling. That > doesn't mean it's a good idea for heavyweight locks to conflict in > general. I think you're going to find that implementing the latter is > an extremely unrewarding task, and that the benefits are seriously > negative. For example, consider: > > BEGIN > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > BEGIN WITH AUTONOMOUS TRANSACTION > UPDATE foo SET x = x + 1 WHERE foo_id = 2; > END; > END; > > Now, suppose that a concurrent session does LOCK TABLE foo after the > first UPDATE and before the second one. That's now a soft deadlock. > But the only way the deadlock detector can see that is if the main > transaction and the autonomous transaction have separate PGPROC > entries, which is a design we explicitly rejected because it puts a > tight limit on the number of ATs that can be in progress and the level > to which those ATs can be nested. But let's say you don't care, so we > go back to that design. The deadlock detector will have to be taught > that the outer transaction can't help but wait for the inner > transaction, so we teach it that. Now it can see that the only way to > resolve the deadlock without aborting any transactions is to reorder > the lock request from the autonomous transaction ahead of the > concurrent session that is seeking a full table lock. So the > autonomous transaction acquires the lock without blocking after all. > You have exactly the same result that you would have had anyway but > with a phenomenal amount of additional code and complexity. > > And for what? 
In the original example, the way the deadlock is going > to be reported is like this: > > ERROR: deadlock detected > DETAIL: Process 12345 waits for ShareLock on transaction 1000; blocked > by process 12345. > Process 12345 waits for ShareLock on transaction 1001; blocked by process 12345. > > That is not a model of clarity. On the other hand, if you just make a > rule that attempting to update or delete a tuple that an outer > transaction has already updated throws a bespoke error, you can do > something like this: > > ERROR: tuple to be updated was already modified by a suspended outer transaction > > ...which has precedent in an existing message in trigger.c. > Similarly, if you try to drop a table that the outer transaction has > locked, the natural thing is for CheckTableNotInUse() to catch that > and report it this way: > > ERROR: cannot DROP TABLE "foo" because it is being used by active > queries in this session > > If you work hard enough, you can instead make that generate a deadlock > error message, but you're going to have to work pretty hard, and the > result is worse. > > I'd really like to hear some more *specific* scenarios where it's > valuable for locks to conflict between the outer transaction and the > AT. I grant that tuple updates are a case where the conflict has to > be detected somehow, but I don't accept that the lock manager is the > best way to do that, and I don't accept that there are a large number > of other cases that will need similar handling. I don't necessarily disagree with what you're saying, but it's not clear to me what the proposed behavior is. Since the AT can commit before the outer, ISTM *any* ungranted lock requested by the AT but held by the outer leads to either A: functional deadlock (regardless of implementation details) or B: special behavior. 
Deadlocks would certainly require some acrobatics to detect and resolve due to the fact that one party to the lock is not in fact blocked on a lock but on the outer's execution state. So maybe the right thing to do is to simply ignore the problem and hang both transactions until timeout or cancel; this isn't really much different vs. ghetto dblink style AT that is done today in my experience. merlin
On Thu, Aug 6, 2015 at 11:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > I don't necessarily disagree with what you're saying, but it's not > clear to me what the proposed behavior is. Since the AT can commit > before the outer, ISTM *any* ungranted lock requested by the AT but > held by the outer leads to either A: functional deadlock (regardless > of implementation details) or B: special behavior. I don't accept that. We've already GOT cases where a query can be suspended and other queries can be running in the same backend. You can do that via cursors. Those cases work fine, and the deadlock detector doesn't know anything about them. How is this any different? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Aug 07, 2015 at 11:26:08AM -0400, Robert Haas wrote: > On Thu, Aug 6, 2015 at 11:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > > I don't necessarily disagree with what you're saying, but it's not > > clear to me what the proposed behavior is. Since the AT can commit > > before the outer, ISTM *any* ungranted lock requested by the AT but > > held by the outer leads to either A: functional deadlock (regardless > > of implementation details) or B: special behavior. > > I don't accept that. We've already GOT cases where a query can be > suspended and other queries can be running in the same backend. You > can do that via cursors. Those cases work fine, and the deadlock > detector doesn't know anything about them. How is this any different? In today's scenarios, the later query cannot commit unless the suspended query also commits. (Changing that is the raison d'être of autonomous transactions.) If the autonomous transaction can interact with uncommitted work in a way that other backends could not, crazy things happen when the autonomous transaction commits and the suspended transaction aborts: CREATE TABLE t (c) AS SELECT 1; BEGIN; UPDATE t SET c = 2 WHERE c = 1; BEGIN_AUTONOMOUS; UPDATE t SET c = 3 WHERE c = 1; UPDATE t SET c = 4 WHERE c = 2; COMMIT_AUTONOMOUS; ROLLBACK; If you replace the autonomous transaction with a savepoint, the c=3 update finds no rows, and the c=4 update changes one row. When the outer transaction aborts, only the original c=1 row remains live. If you replace the autonomous transaction with a dblink/pg_background call, the c=3 update waits indefinitely for c=2 to commit or abort, an undetected deadlock. Suppose you make the autonomous transaction see tuples like in the savepoint case. The c=3 update finds no rows to update, and the c=4 update changes one row. When the outer transaction aborts, you have two live rows (c=1 and c=4). 
Suppose you instead make the autonomous transaction see tuples like in the dblink case, yet let c=3 ignore the lock and change a row. If both the autonomous transaction and the outer transaction were to commit, then you get two live rows (c=2 and c=3). Neither of those outcomes is acceptable, of course. In today's tuple update rules, c=3 must deadlock[1]. Other credible tuple update rules may not have this problem, but nothing jumps to mind. [1] That's not to say it must use the shmem lock structures and deadlock detector.
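The two candidate visibility rules above, and the unacceptable row sets they produce, can be reproduced with a toy MVCC simulation. This is an illustrative Python model of the scenario (invented structure, not PostgreSQL code): under the "savepoint" rule the autonomous transaction sees the outer transaction's uncommitted rows; under the "ignore-lock" rule it does not see them but is allowed to update a row the outer transaction has already updated.

```python
# Toy MVCC simulation of the example above. Row versions carry the
# creating transaction (xmin) and, once updated, the updating one
# (xmax). Both candidate rules end with row sets that no serial
# history could produce. Illustrative only; not PostgreSQL code.

COMMITTED, ABORTED, ACTIVE = "committed", "aborted", "active"

def live_rows(versions, status):
    # Versions visible once every transaction has ended.
    return sorted(v["c"] for v in versions
                  if status[v["xmin"]] == COMMITTED
                  and (v["xmax"] is None or status[v["xmax"]] != COMMITTED))

def run(rule):
    status = {"init": COMMITTED, "outer": ACTIVE, "auto": ACTIVE}
    versions = [{"c": 1, "xmin": "init", "xmax": None}]

    def update(xid, old, new, see):
        # UPDATE t SET c = new WHERE c = old, run by `xid`, treating
        # the in-progress transactions in `see` as if committed.
        for v in list(versions):
            born = status[v["xmin"]] == COMMITTED or v["xmin"] in see
            killed = v["xmax"] is not None and (
                status[v["xmax"]] == COMMITTED or v["xmax"] in see)
            if born and not killed and v["c"] == old:
                if v["xmax"] is None:
                    v["xmax"] = xid      # mark the old version dead
                versions.append({"c": new, "xmin": xid, "xmax": None})
                return

    update("outer", 1, 2, see={"outer"})
    if rule == "savepoint":
        see = {"auto", "outer"}          # AT sees outer's uncommitted work
        update("auto", 1, 3, see)
        update("auto", 2, 4, see)
        status["auto"] = COMMITTED       # COMMIT_AUTONOMOUS
        status["outer"] = ABORTED        # ROLLBACK
    else:
        see = {"auto"}                   # AT ignores outer's row lock
        update("auto", 1, 3, see)
        update("auto", 2, 4, see)
        status["auto"] = COMMITTED
        status["outer"] = COMMITTED      # suppose both commit
    return live_rows(versions, status)
```

Running it: `run("savepoint")` yields the two live rows c=1 and c=4, and `run("ignore-lock")` yields c=2 and c=3, matching the outcomes described above.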
Noah Misch wrote: > In today's scenarios, the later query cannot commit unless the suspended query > also commits. (Changing that is the raison d'être of autonomous > transactions.) If the autonomous transaction can interact with uncommitted > work in a way that other backends could not, crazy things happen when the > autonomous transaction commits and the suspended transaction aborts: > > CREATE TABLE t (c) AS SELECT 1; > BEGIN; > UPDATE t SET c = 2 WHERE c = 1; > BEGIN_AUTONOMOUS; > UPDATE t SET c = 3 WHERE c = 1; > UPDATE t SET c = 4 WHERE c = 2; > COMMIT_AUTONOMOUS; > ROLLBACK; > > If you replace the autonomous transaction with a savepoint, the c=3 update > finds no rows, and the c=4 update changes one row. When the outer transaction > aborts, only the original c=1 row remains live. If you replace the autonomous > transaction with a dblink/pg_background call, the c=3 update waits > indefinitely for c=2 to commit or abort, an undetected deadlock. Maybe what we need to solve this is to restrict what the autonomous transaction can do; for instance, make it so that the autonomous transaction can see all rows of the outer transaction as if the outer transaction were committed, but trying to update any such row raises an error. As far as I can see, this closes this particular problem. (We likely need additional rules to close all holes, but hopefully you get the idea.) Perhaps there exists a set of rules strong enough to eliminate all problematic visibility scenarios, but which still enables behavior useful enough to cover the proposed use cases. The audit scenario is covered because the audit trail doesn't need to modify the audited tuples themselves, only read them. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 15, 2015 at 10:20:55PM -0300, Alvaro Herrera wrote: > Noah Misch wrote: > > > In today's scenarios, the later query cannot commit unless the suspended query > > also commits. (Changing that is the raison d'être of autonomous > > transactions.) If the autonomous transaction can interact with uncommitted > > work in a way that other backends could not, crazy things happen when the > > autonomous transaction commits and the suspended transaction aborts: > > > > CREATE TABLE t (c) AS SELECT 1; > > BEGIN; > > UPDATE t SET c = 2 WHERE c = 1; > > BEGIN_AUTONOMOUS; > > UPDATE t SET c = 3 WHERE c = 1; > > UPDATE t SET c = 4 WHERE c = 2; > > COMMIT_AUTONOMOUS; > > ROLLBACK; > > > > If you replace the autonomous transaction with a savepoint, the c=3 update > > finds no rows, and the c=4 update changes one row. When the outer transaction > > aborts, only the original c=1 row remains live. If you replace the autonomous > > transaction with a dblink/pg_background call, the c=3 update waits > > indefinitely for c=2 to commit or abort, an undetected deadlock. > > Maybe what we need to solve this is to restrict what the autonomous > transaction can do; for instance, make it so that the autonomous > transaction can see all rows of the outer transaction as if the outer > transaction were committed, but trying to update any such row raises an > error. As far as I can see, this closes this particular problem. (We > likely need additional rules to close all holes, but hopefully you get > the idea.) > > Perhaps there exists a set of rules strong enough to eliminate all > problematic visibility scenarios, but which still enables behavior > useful enough to cover the proposed use cases. The audit scenario is > covered because the audit trail doesn't need to modify the audited > tuples themselves, only read them. My starting expectation is that the semantics of an autonomous transaction will be exactly those of dblink/pg_background. (I said that during the unconference session.) 
The application would need to read data from tables before switching to the autonomous section. Autonomous transactions are then a performance and syntactic help, not a source of new semantics. Does any database have autonomous transactions that do otherwise?
Noah Misch wrote: > > > If the autonomous transaction can interact with uncommitted > > > work in a way that other backends could not, crazy things happen when the > > > autonomous transaction commits and the suspended transaction aborts: > > > > > > CREATE TABLE t (c) AS SELECT 1; > > > BEGIN; > > > UPDATE t SET c = 2 WHERE c = 1; > > > BEGIN_AUTONOMOUS; > > > UPDATE t SET c = 3 WHERE c = 1; > > > UPDATE t SET c = 4 WHERE c = 2; > > > COMMIT_AUTONOMOUS; > > > ROLLBACK; > > > > > > If you replace the autonomous transaction with a savepoint, the c=3 update > > > finds no rows, and the c=4 update changes one row. When the outer transaction > > > aborts, only the original c=1 row remains live. If you replace the autonomous > > > transaction with a dblink/pg_background call, the c=3 update waits > > > indefinitely for c=2 to commit or abort, an undetected deadlock. > My starting expectation is that the semantics of an autonomous transaction > will be exactly those of dblink/pg_background. (I said that during the > unconference session.) The application would need to read data from tables > before switching to the autonomous section. Autonomous transactions are then > a performance and syntactic help, not a source of new semantics. Does any > database have autonomous transactions that do otherwise? Oracle behaves like that, i.e. it deadlocks with your example:

SQL> SELECT * FROM t;

         C
----------
         1

SQL> CREATE PROCEDURE proc2 IS
  2  PRAGMA AUTONOMOUS_TRANSACTION;
  3  BEGIN
  4     UPDATE t SET c = 3 WHERE c = 1;
  5     UPDATE t SET c = 4 WHERE c = 2;
  6     COMMIT;
  7  END;
  8  /

Procedure created.

SQL> CREATE PROCEDURE proc1 IS
  2  BEGIN
  3     UPDATE t SET c = 2 WHERE c = 1;
  4     proc2;
  5     ROLLBACK;
  6  END;
  7  /

Procedure created.

SQL> CALL proc1();
CALL proc1()
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "LAURENZ.PROC2", line 4
ORA-06512: at "LAURENZ.PROC1", line 4

Yours, Laurenz Albe
On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote: > CREATE TABLE t (c) AS SELECT 1; > BEGIN; > UPDATE t SET c = 2 WHERE c = 1; > BEGIN_AUTONOMOUS; > UPDATE t SET c = 3 WHERE c = 1; > UPDATE t SET c = 4 WHERE c = 2; > COMMIT_AUTONOMOUS; > ROLLBACK; > > If you replace the autonomous transaction with a savepoint, the c=3 update > finds no rows, and the c=4 update changes one row. When the outer transaction > aborts, only the original c=1 row remains live. If you replace the autonomous > transaction with a dblink/pg_background call, the c=3 update waits > indefinitely for c=2 to commit or abort, an undetected deadlock. > > Suppose you make the autonomous transaction see tuples like in the savepoint > case. The c=3 update finds no rows to update, and the c=4 update changes one > row. When the outer transaction aborts, you have two live rows (c=1 and c=4). > Suppose you instead make the autonomous transaction see tuples like in the > dblink case, yet let c=3 ignore the lock and change a row. If both the > autonomous transaction and the outer transaction were to commit, then you get > two live rows (c=2 and c=3). Neither of those outcomes is acceptable, of > course. In today's tuple update rules, c=3 must deadlock[1]. Other credible > tuple update rules may not have this problem, but nothing jumps to mind. > > [1] That's not to say it must use the shmem lock structures and deadlock > detector. This footnote goes to my point. It seems clear to me that having the autonomous transaction "see" the effects of the outer uncommitted transaction is a recipe for trouble. If the autonomous transaction updates a row and commits, and the outer transaction later aborts, the resulting state is inconsistent with any serial history. I'm fairly certain that's going to leave us in an unhappy place. Even more obviously, ending up with two committed row versions that are both updates of a single ancestor version is no good. 
So, I agree that this scenario should be an error. What I don't agree with is the idea that it should be the deadlock detector's job to throw that error. Rather, I think that when we examine the xmax of the tuple we can see - which is the original one, not the one updated by the outer transaction - we should check whether that XID belongs to an outer transaction. If it does, we should throw an error instead of trying to lock it. That way (1) the error message will be clear and specific to the situation and (2) we don't need a separate PGPROC for each autonomous transaction. The first of those benefits is agreeable; the second one is, in my opinion, a key design goal for this feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
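The xmax check described above might look, very schematically, like this. It is a hypothetical Python sketch: the function name, the `suspended_outer_xids` set, and the exact error wording are assumptions for illustration, not PostgreSQL internals (the message echoes the trigger.c precedent quoted above).

```python
# Schematic version of the proposed check: when an update finds the
# tuple's xmax set, test whether that XID belongs to one of this
# backend's suspended outer transactions, and raise a specific error
# instead of trying to lock the tuple. Invented names throughout.

class OuterTransactionModifiedTuple(Exception):
    pass

def check_tuple_lockable(tuple_xmax, suspended_outer_xids):
    """Called before waiting on a tuple lock inside an autonomous tx."""
    if tuple_xmax is not None and tuple_xmax in suspended_outer_xids:
        raise OuterTransactionModifiedTuple(
            "tuple to be updated was already modified by a "
            "suspended outer transaction")
    # Otherwise fall through to the normal tuple-locking path.
```

The point of the design is that this check needs no shared-memory state and no extra PGPROC: the list of suspended outer XIDs is purely backend-local.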
On Tue, Aug 18, 2015 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote: >> CREATE TABLE t (c) AS SELECT 1; >> BEGIN; >> UPDATE t SET c = 2 WHERE c = 1; >> BEGIN_AUTONOMOUS; >> UPDATE t SET c = 3 WHERE c = 1; >> UPDATE t SET c = 4 WHERE c = 2; >> COMMIT_AUTONOMOUS; >> ROLLBACK; >> >> If you replace the autonomous transaction with a savepoint, the c=3 update >> finds no rows, and the c=4 update changes one row. When the outer transaction >> aborts, only the original c=1 row remains live. If you replace the autonomous >> transaction with a dblink/pg_background call, the c=3 update waits >> indefinitely for c=2 to commit or abort, an undetected deadlock. >> >> Suppose you make the autonomous transaction see tuples like in the savepoint >> case. The c=3 update finds no rows to update, and the c=4 update changes one >> row. When the outer transaction aborts, you have two live rows (c=1 and c=4). >> Suppose you instead make the autonomous transaction see tuples like in the >> dblink case, yet let c=3 ignore the lock and change a row. If both the >> autonomous transaction and the outer transaction were to commit, then you get >> two live rows (c=2 and c=3). Neither of those outcomes is acceptable, of >> course. In today's tuple update rules, c=3 must deadlock[1]. Other credible >> tuple update rules may not have this problem, but nothing jumps to mind. >> >> [1] That's not to say it must use the shmem lock structures and deadlock >> detector. > > This footnote goes to my point. > > It seems clear to me that having the autonomous transaction "see" the > effects of the outer uncommitted transaction is a recipe for trouble. > If the autonomous transaction updates a row and commits, and the outer > transaction later aborts, the resulting state is inconsistent with any > serial history. I'm fairly certain that's going to leave us in an > unhappy place. 
> > Even more obviously, ending up with two committed row versions that > are both updates of a single ancestor version is no good. > > So, I agree that this scenario should be an error. What I don't agree > with is the idea that it should be the deadlock detector's job to > throw that error. Rather, I think that when we examine the xmax of > the tuple we can see - which is the original one, not the one updated > by the outer transaction - we should check whether that XID belongs to > an outer transaction. Hm: do you mean 'an' outer transaction (meaning, basically, any in progress transaction) or the outer transaction of the AT. I think you mean outer transaction of the AT, which makes a lot of sense and should be easy and fast to test. It's like an implied NOWAIT if the locker is the AT and the lockee is the parent. Can you get away with only looking at tuples though? For example, what about advisory locks? Table locks? merlin
On 18 August 2015 21:18, Robert Haas Wrote: >This footnote goes to my point. > >It seems clear to me that having the autonomous transaction "see" the >effects of the outer uncommitted transaction is a recipe for trouble. >If the autonomous transaction updates a row and commits, and the outer >transaction later aborts, the resulting state is inconsistent with any >serial history. I'm fairly certain that's going to leave us in an >unhappy place. > >Even more obviously, ending up with two committed row versions that are >both updates of a single ancestor version is no good. > >So, I agree that this scenario should be an error. What I don't agree >with is the idea that it should be the deadlock detector's job to throw >that error. Rather, I think that when we examine the xmax of the tuple >we can see - which is the original one, not the one updated by the outer >transaction - we should check whether that XID belongs to an outer >transaction. If it does, we should throw an error instead of trying to >lock it. That way (1) the error message will be clear and specific to >the situation and (2) we don't need a separate PGPROC for each >autonomous transaction. The first of those benefits is agreeable; the >second one is, in my opinion, a key design goal for this feature. Yes I agree with this. I was in favor of error all the time without involving deadlock detector. Thanks and Regards, Kumar Rajeev Rastogi
On Tue, Aug 18, 2015 at 3:06 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote:
>>> CREATE TABLE t (c) AS SELECT 1;
>>> BEGIN;
>>> UPDATE t SET c = 2 WHERE c = 1;
>>> BEGIN_AUTONOMOUS;
>>> UPDATE t SET c = 3 WHERE c = 1;
>>> UPDATE t SET c = 4 WHERE c = 2;
>>> COMMIT_AUTONOMOUS;
>>> ROLLBACK;
>>>
>>> If you replace the autonomous transaction with a savepoint, the c=3 update
>>> finds no rows, and the c=4 update changes one row.  When the outer
>>> transaction aborts, only the original c=1 row remains live.  If you replace
>>> the autonomous transaction with a dblink/pg_background call, the c=3 update
>>> waits indefinitely for c=2 to commit or abort, an undetected deadlock.
>>>
>>> Suppose you make the autonomous transaction see tuples like in the savepoint
>>> case.  The c=3 update finds no rows to update, and the c=4 update changes one
>>> row.  When the outer transaction aborts, you have two live rows (c=1 and
>>> c=4).  Suppose you instead make the autonomous transaction see tuples like in
>>> the dblink case, yet let c=3 ignore the lock and change a row.  If both the
>>> autonomous transaction and the outer transaction were to commit, then you get
>>> two live rows (c=2 and c=3).  Neither of those outcomes is acceptable, of
>>> course.  In today's tuple update rules, c=3 must deadlock[1].  Other credible
>>> tuple update rules may not have this problem, but nothing jumps to mind.
>>>
>>> [1] That's not to say it must use the shmem lock structures and deadlock
>>> detector.
>>
>> This footnote goes to my point.
>>
>> It seems clear to me that having the autonomous transaction "see" the
>> effects of the outer uncommitted transaction is a recipe for trouble.
>> If the autonomous transaction updates a row and commits, and the outer
>> transaction later aborts, the resulting state is inconsistent with any
>> serial history.  I'm fairly certain that's going to leave us in an
>> unhappy place.
>>
>> Even more obviously, ending up with two committed row versions that
>> are both updates of a single ancestor version is no good.
>>
>> So, I agree that this scenario should be an error.  What I don't agree
>> with is the idea that it should be the deadlock detector's job to
>> throw that error.  Rather, I think that when we examine the xmax of
>> the tuple we can see - which is the original one, not the one updated
>> by the outer transaction - we should check whether that XID belongs to
>> an outer transaction.
>
> Hm: do you mean 'an' outer transaction (meaning, basically, any in
> progress transaction) or the outer transaction of the AT.  I think you
> mean outer transaction of the AT, which makes a lot of sense and
> should be easy and fast to test.  It's like an implied NOWAIT if the
> locker is the AT and the lockee is the parent.
>
> Can you get away with only looking at tuples though?  For example,
> what about advisory locks?  Table locks?

Well, that's an interesting question.  Can we get away with regarding
those things as non-conflicting, as between the parent and child
transactions?

If we want to insist that they conflict, the easy case is easy: when
acquiring a lock, we can consult the local lock table hash and see if an
outer transaction already holds such a lock.  If so, we error out.  The
problem is when somebody else is waiting for the lock we hold.  That
gets complicated, because the deadlock detector can't work out that
we're deadlocking with ourselves unless there are two separate PGPROCs,
one for the inner transaction and one for the outer transaction, and
even then it might think it can rearrange the lock queue so that the
outer transaction goes first, which in fact won't work here.  I'm not
exactly sure how to make this work out without involving the deadlock
detector, but I think we should try hard to get there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
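[Editor's note: Noah's savepoint comparison quoted above is runnable on stock PostgreSQL today; a minimal sketch, with his proposed BEGIN_AUTONOMOUS/COMMIT_AUTONOMOUS replaced by a savepoint per his first comparison:]

```sql
-- Savepoint analogue of Noah's example; comments give his described outcomes.
CREATE TABLE t (c) AS SELECT 1;
BEGIN;
UPDATE t SET c = 2 WHERE c = 1;   -- outer transaction's update
SAVEPOINT s;
UPDATE t SET c = 3 WHERE c = 1;   -- finds no rows: the row is already c=2 here
UPDATE t SET c = 4 WHERE c = 2;   -- changes one row
RELEASE SAVEPOINT s;
ROLLBACK;
SELECT * FROM t;                  -- only the original c=1 row remains live
```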
On Fri, Aug 21, 2015 at 10:06:44AM -0400, Robert Haas wrote:
> On Tue, Aug 18, 2015 at 3:06 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> > On Tue, Aug 18, 2015 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote:
> >>> [1] That's not to say it must use the shmem lock structures and deadlock
> >>> detector.
> >>
> >> This footnote goes to my point.
> >> So, I agree that this scenario should be an error.  What I don't agree
> >> with is the idea that it should be the deadlock detector's job to
> >> throw that error.

I couldn't gather from your earlier messages that this scenario should
get an error, so I'm glad to have that clarified.

> > Can you get away with only looking at tuples though?  For example,
> > what about advisory locks?  Table locks?
>
> Well, that's an interesting question.  Can we get away with regarding
> those things as non-conflicting, as between the parent and child
> transactions?

For system lock types, no.  While one could define advisory locks to
work differently, we should assume that today's advisory lockers have
expectations like those of system lockers.  An autonomous transaction
should not bypass any lock that a transaction of another backend could
not bypass.
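[Editor's note: a sketch of the advisory-lock conflict Noah describes. BEGIN AUTONOMOUS TRANSACTION is the proposed, unimplemented syntax from earlier in this thread, so this cannot run today; the comments state Noah's rule, not current behavior:]

```sql
BEGIN;
SELECT pg_advisory_xact_lock(42);   -- parent takes an exclusive advisory lock
BEGIN AUTONOMOUS TRANSACTION;       -- proposed syntax, not implemented
SELECT pg_advisory_xact_lock(42);   -- under Noah's rule this must conflict
                                    -- (error or block), exactly as a dblink
                                    -- call from another backend would block
COMMIT;
```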
On Sat, Aug 22, 2015 at 2:23 AM, Noah Misch <noah@leadboat.com> wrote:
>> > Can you get away with only looking at tuples though?  For example,
>> > what about advisory locks?  Table locks?
>>
>> Well, that's an interesting question.  Can we get away with regarding
>> those things as non-conflicting, as between the parent and child
>> transactions?
>
> For system lock types, no.  While one could define advisory locks to work
> differently, we should assume that today's advisory lockers have expectations
> like those of system lockers.  An autonomous transaction should not bypass any
> lock that a transaction of another backend could not bypass.

Why?

Suppose you do this:

BEGIN;
DECLARE foo CURSOR FOR SELECT * FROM foo;
BEGIN AUTONOMOUS TRANSACTION;
ALTER TABLE foo ALTER bar TYPE int;

This has got to fail for safety reasons, but CheckTableNotInUse() is on
it.  Suppose you do this:

BEGIN;
LOCK foo;
BEGIN AUTONOMOUS TRANSACTION;
INSERT INTO foo VALUES ('spelunk');

How will making this fail improve anything?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 03, 2015 at 04:21:55PM -0400, Robert Haas wrote:
> On Sat, Aug 22, 2015 at 2:23 AM, Noah Misch <noah@leadboat.com> wrote:
> >> > Can you get away with only looking at tuples though?  For example,
> >> > what about advisory locks?  Table locks?
> >>
> >> Well, that's an interesting question.  Can we get away with regarding
> >> those things as non-conflicting, as between the parent and child
> >> transactions?
> >
> > For system lock types, no.  While one could define advisory locks to work
> > differently, we should assume that today's advisory lockers have
> > expectations like those of system lockers.  An autonomous transaction
> > should not bypass any lock that a transaction of another backend could not
> > bypass.
>
> Why?
>
> Suppose you do this:
>
> BEGIN;
> DECLARE foo CURSOR FOR SELECT * FROM foo;
> BEGIN AUTONOMOUS TRANSACTION;
> ALTER TABLE foo ALTER bar TYPE int;
>
> This has got to fail for safety reasons, but CheckTableNotInUse() is
> on it.  Suppose you do this:
>
> BEGIN;
> LOCK foo;
> BEGIN AUTONOMOUS TRANSACTION;
> INSERT INTO foo VALUES ('spelunk');
>
> How will making this fail improve anything?

Core PostgreSQL doesn't care.  This is a user interface design decision
to be made in light of current SQL expectations and future SQL author
wishes.  The LOCK reference page, our contract with users, says nothing
to constrain the choice.  LOCK is exceptional in that we never get much
insight into the caller's expectations.  I think LOCK should follow
closely the built-in commands that take the same locks.

This variation of your examples must fail in order to avoid a paradox if
the first transaction aborts:

BEGIN;
ALTER TABLE foo ALTER bar TYPE frob;
BEGIN AUTONOMOUS TRANSACTION;
INSERT INTO foo VALUES ('spelunk');

If we made that fail and made your second example succeed, that would
imply "LOCK foo" acquires a special kind of AccessExclusiveLock
differing from what ALTER TABLE acquires.  That's incompatible with my
sense of the LOCK command's role in the system.  An ability to procure
an option to acquire, without waiting, a lock and delegate that option
to another transaction would have applications.  It's a different
feature calling for distinct syntax.

My comments have flowed out of a principle that autonomous transactions
shall have precisely the same semantics as using another backend via
dblink.  They should have less overhead.  They may give different error
messages.  They shall not permit sequences of commands that fail in a
dblink implementation of the same multi-transaction sequence.  I chose
this principle because it fits my intuitive notion of transaction
"autonomy" and because everything I've heard about other implementations
suggests that they work in that way.  If some RDBMS implementation does
otherwise, I would find that persuasive.

What design principle(s) have you been using to decide how autonomous
transactions should behave?
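[Editor's note: Noah's dblink-equivalence principle can be checked against today's behavior. A sketch of the dblink rendering of Robert's LOCK example, assuming a loopback connection string and the dblink extension installed; the hang is the expected behavior, not a demonstrated output:]

```sql
-- dblink rendering of the "LOCK foo" example: the inner INSERT runs in a
-- second backend, which waits on the outer AccessExclusiveLock, an
-- undetected deadlock when both are driven from the same client.
CREATE EXTENSION IF NOT EXISTS dblink;
CREATE TABLE foo (t text);
BEGIN;
LOCK foo;
SELECT dblink('dbname=postgres',              -- assumed loopback connection
              $$INSERT INTO foo VALUES ('spelunk')$$);
-- hangs: the other backend waits for the table lock this session holds
```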
On Sun, Sep 6, 2015 at 12:56 AM, Noah Misch <noah@leadboat.com> wrote:
> My comments have flowed out of a principle that autonomous transactions shall
> have precisely the same semantics as using another backend via dblink.

That's what I thought, too: AT is syntax sugar for the dblink approach.
Syntax and performance aside, I think the only way dblink-style AT could
be improved would be to have:

1) better cancel handling, especially cancelling the child when the
parent cancels (this is a major headache with dblink)
2) automatic rejection of acquire attempts by the child on exclusive
assets held by the parent

#2 is what we've been discussing, but I'm pretty confused.  Most
especially, I'm not clear on whether parent and children share a pid or
have unique pids.  If pids are unique, as in dblink, #2 could be done
via a query, assuming you have a way of identifying child transactions
in pg_stat_activity.  The discussion has suggested that the pid is
shared, which I found odd but took on faith.

I'm also wondering if we need syntax to handle synchronous vs.
asynchronous execution of the AT (the latter of which is only possible
with a separate PID, I think).

merlin
On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
> What design principle(s) have you been using to decide how autonomous
> transactions should behave?

I have to admit to a complete lack of principle.  I'm quite frightened
of what this is going to need from the lock manager, and I'm trying to
wriggle out of having to do things there that are going to be nastily
hard.  My wriggling isn't going very well, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 9, 2015 at 9:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
>> What design principle(s) have you been using to decide how autonomous
>> transactions should behave?
>
> I have to admit to a complete lack of principle.  I'm quite frightened
> of what this is going to need from the lock manager, and I'm trying to
> wriggle out of having to do things there that are going to be nastily
> hard.  My wriggling isn't going very well, though.

Hm.  Here is current dblink behavior:

postgres=# create table l (id int);
CREATE TABLE
postgres=# insert into l values(1);
INSERT 0 1
postgres=# update l set id = 2 where id = 1;
UPDATE 1
Time: 0.595 ms
postgres=# select dblink('', 'update l set id = 3 where id = 1');
<hangs forever due to deadlock of client lock and parent execution point>

Does the lock manager really need to be extended to address this case?
pg_locks pretty clearly explains what's happening, via:

postgres=# select locktype, transactionid, pid, granted from pg_locks where not granted;
   locktype    │ transactionid │  pid  │ granted
───────────────┼───────────────┼───────┼─────────
 transactionid │         88380 │ 20543 │ f

and

postgres=# select locktype, transactionid, pid, granted from pg_locks where transactionid = 88380;
   locktype    │ transactionid │  pid  │ granted
───────────────┼───────────────┼───────┼─────────
 transactionid │         88380 │ 20543 │ f
 transactionid │         88380 │ 19022 │ t

If pg_locks and/or pg_stat_activity were extended with a 'parent_pid'
column, a userland query could terminate affected backends with a join
against that column wherever a lock is ungranted.  Naturally there's a
lot more to it than that; you'd want to issue an appropriate
cancellation message and various other things.

I suppose I'm probably missing something, but I'm not clear on why the
lock manager needs to be overhauled to deal with this case when we can
just scan the current lock structures, assuming we can a) manage the
child pid independently of the parent pid and b) easily figure out who
is the parent of whom.

merlin
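[Editor's note: Merlin's userland check might look like the following sketch. The parent_pid column is his hypothetical proposal and does not exist in any released pg_stat_activity; the join against pg_locks uses only columns that exist today:]

```sql
-- Sketch only: assumes a hypothetical parent_pid column in pg_stat_activity.
-- Find autonomous-transaction backends waiting on a transactionid lock that
-- their own parent backend holds, so they can be cancelled rather than
-- left in an undetected deadlock.
SELECT child.pid AS blocked_child, parent.pid AS blocking_parent
FROM pg_locks waiting
JOIN pg_stat_activity child  ON child.pid = waiting.pid
JOIN pg_stat_activity parent ON parent.pid = child.parent_pid   -- hypothetical
JOIN pg_locks held ON held.locktype = waiting.locktype
                  AND held.transactionid = waiting.transactionid
                  AND held.pid = parent.pid
                  AND held.granted
WHERE NOT waiting.granted;
```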
On Wed, Sep 09, 2015 at 10:04:01AM -0400, Robert Haas wrote:
> On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
> > What design principle(s) have you been using to decide how autonomous
> > transactions should behave?
>
> I have to admit to a complete lack of principle.  I'm quite frightened
> of what this is going to need from the lock manager, and I'm trying to
> wriggle out of having to do things there that are going to be nastily
> hard.  My wriggling isn't going very well, though.

It's an exceptionally challenging development project, agreed.  So much
code assumes the 1:1 relationship between backends and top-level
transactions.
On Thu, Sep 10, 2015 at 8:39 PM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Sep 09, 2015 at 10:04:01AM -0400, Robert Haas wrote:
>> On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
>> > What design principle(s) have you been using to decide how autonomous
>> > transactions should behave?
>>
>> I have to admit to a complete lack of principle.  I'm quite frightened
>> of what this is going to need from the lock manager, and I'm trying to
>> wriggle out of having to do things there that are going to be nastily
>> hard.  My wriggling isn't going very well, though.
>
> It's an exceptionally challenging development project, agreed.  So much
> code assumes the 1:1 relationship between backends and top-level
> transactions.

I guess I'm being obtuse, but can you explain why that assumption must
be revisited?  I don't see why it has to be... I must be missing
something.

merlin
On Fri, Sep 11, 2015 at 02:30:53PM -0500, Merlin Moncure wrote:
> On Thu, Sep 10, 2015 at 8:39 PM, Noah Misch <noah@leadboat.com> wrote:
> > It's an exceptionally challenging development project, agreed.  So much
> > code assumes the 1:1 relationship between backends and top-level
> > transactions.
>
> I guess I'm being obtuse, but can you explain why that assumption must
> be revisited?

It's not imperative, but the proposal on the table does so.  Robert
described doing so as "a key design goal for this feature"[1].  I have a
general picture of why the proposers chose that, but I will defer to
them for any elaboration.

[1] http://www.postgresql.org/message-id/CA+TgmoZE__KOHfUyVZpxvTJSB85xm34KC_t41-OXhTB_111SHQ@mail.gmail.com