Thread: Autonomous Transaction is back
After a few failed attempts to propose autonomous transactions earlier, I, along with Simon Riggs, would like to propose again, with a completely different approach.

We also discussed this feature on the PGCon 2015 Unconference Day; those who missed that discussion, please refer to

https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015

Before jumping into the design and code proposal for this feature, Simon Riggs and I wanted to propose its behavior and usage, to keep everyone in the same boat. So we have summarized the behavior and usage of the Autonomous Transaction based on the discussion with community members on the PGCon 2015 Unconference Day:

Behavior of Autonomous Transaction:
1. The autonomous transaction is treated as a completely different transaction from the master transaction.
2. It should be allowed to deadlock with the master transaction. We need to work out a solution to avoid deadlock.
3. It can support multiple levels of nesting, based on configuration (perhaps a maximum of 70).
4. The outer (i.e. main or upper autonomous) transaction is suspended while the inner autonomous transaction is running.
5. The outer transaction should not see data of the inner transaction until the inner is committed (a serializable upper transaction should not see it even after the inner transaction commits).

How to Use Autonomous Transaction:
1. We can issue an explicit command to start an autonomous transaction, as below:
       BEGIN AUTONOMOUS TRANSACTION (don't worry about keywords at this point)
       Do your work.
       COMMIT/ROLLBACK (will commit/rollback the autonomous transaction and
       return to the main transaction or upper autonomous transaction)
2. The above commands can be issued either inside a procedure, to put a few statements of the procedure inside an autonomous transaction, or even in stand-alone query execution.
3. We can make the whole procedure itself autonomous, which is similar to starting an autonomous transaction at the beginning of the procedure and committing/rolling back at the end of the procedure.

There was another discussion on the Unconference Day about whether to implement COMMIT/ROLLBACK inside procedures, or autonomous transactions. Our opinion is that COMMIT/ROLLBACK inside a procedure is somewhat different from an autonomous transaction: in the first case, once we commit inside the procedure, it commits everything done before the call of the procedure. This is the behavior of Oracle. So in that case the user has to be very careful not to do any operation before the call of the procedure that is not yet intended to be committed inside the procedure.

So we prefer to implement Autonomous Transactions, which will not only be compatible with Oracle but also give a really strong, much-requested feature.

I have not put the use-cases here, as we already agree about its strong use-cases.

Requesting everyone's opinion regarding this, based on which we can proceed to enhance/tune/re-write our design.

Thanks and Regards,
Kumar Rajeev Rastogi
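As a concrete illustration of the proposed usage (the keywords, the `audit_log` and `accounts` tables, and the whole session are hypothetical; nothing below is implemented syntax):

```sql
-- Hypothetical sketch: an audit record that survives even if the
-- outer transaction rolls back, using the proposed (not implemented)
-- BEGIN AUTONOMOUS TRANSACTION syntax.
BEGIN;                         -- master transaction
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

BEGIN AUTONOMOUS TRANSACTION;  -- proposed keyword; suspends the master
INSERT INTO audit_log(event, at)
VALUES ('balance change attempted', now());
COMMIT;                        -- commits ONLY the autonomous transaction

ROLLBACK;                      -- master rolls back; the audit row remains
```

Per behavior point 5 above, the master transaction would not see the audit row until the autonomous transaction commits.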
On Thu, Jul 23, 2015 at 1:31 AM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
> 2. It should be allowed to deadlock with master transaction. We
> need to work-out a solution to avoid deadlock.

This sentence seems to contradict itself. I thought the consensus was
that the transaction should NOT conflict with the master transaction.

> 3. It can support multiple level of nesting based on the
> configuration (may be max as 70).

Why have any fixed maximum?

> 2. The above commands can be issued either inside the procedure to make few
> statements of procedure inside autonomous transaction or even in stand-alone
> query execution.

I think inside a procedure the autonomous transaction will need to be
lexically scoped. You won't be able to do this, for example:

    BEGIN AUTONOMOUS TRANSACTION;
    FOR x IN SELECT ... LOOP
        COMMIT;
        BEGIN AUTONOMOUS TRANSACTION;
    END LOOP;

Rather you'd have to do something like this:

    FOR x IN SELECT .. LOOP
        BEGIN WITH AUTONOMOUS TRANSACTION
            do stuff
        END;
    END LOOP;

In that way it would work like an EXCEPTION block, which can be
attached to a begin block to establish a subtransaction. There's no way
in PL/pgsql to persistently enter a subtransaction; but you can have
one that applies to a particular lexical scope (the BEGIN block to
which EXCEPTION is attached).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
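For reference, the existing PL/pgSQL EXCEPTION mechanism alluded to here, which already establishes a subtransaction for exactly one lexical scope (this is real, working syntax today, unlike the hypothetical AUTONOMOUS keywords; the table `t` and the function name are invented for illustration):

```sql
-- A BEGIN ... EXCEPTION ... END block runs its body in a
-- subtransaction: if the INSERT fails, only that block's work is
-- rolled back and control passes to the handler, while any work done
-- before the block survives.
CREATE OR REPLACE FUNCTION insert_with_fallback(v int) RETURNS text AS $$
BEGIN
    BEGIN
        INSERT INTO t(id) VALUES (v);   -- runs inside a subtransaction
        RETURN 'inserted';
    EXCEPTION
        WHEN unique_violation THEN
            -- only the inner block's subtransaction is rolled back
            RETURN 'duplicate, skipped';
    END;
END;
$$ LANGUAGE plpgsql;
```

The proposal upthread is essentially to let a similarly scoped block commit independently of the outer transaction, rather than merely roll back independently.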
> Requesting for everyone's opinion regarding this based on which we can
> proceed to enhance/tune/re-write our design.

So, one of the things which came up during the discussion was advancing
XMIN, which is not important to the audit logging use case, but is very
important for the batch job use case. What have you concluded regarding
this item?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, Jul 23, 2015 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Requesting for everyone's opinion regarding this based on which we can
>> proceed to enhance/tune/re-write our design.
>
> So, one of the things which came up during the discussion was advancing
> XMIN, which is not important to the audit logging use case, but is very
> important for the batch job use case. What have you concluded regarding
> this item?

Could you explain more specifically what you are talking about here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 07/23/2015 11:39 AM, Robert Haas wrote:
> On Thu, Jul 23, 2015 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Requesting for everyone's opinion regarding this based on which we can
>>> proceed to enhance/tune/re-write our design.
>>
>> So, one of the things which came up during the discussion was advancing
>> XMIN, which is not important to the audit logging use case, but is very
>> important for the batch job use case. What have you concluded regarding
>> this item?
>
> Could you explain more specifically what you are talking about here?

Yeah, my notes are kinda incoherent, no?

There's two core use-cases for Autonomous Transactions (hereafter ATX):

* audit logging
* batch jobs

Audit Logging: triggers or other statements which should leave a record
even when a transaction aborts. While audit logging is the main example
of this use case, any kind of logging of failures is applicable.

Batch Jobs: large data-manipulation tasks which need to be broken up
into segments, with each segment committing separately. Example:
updating 1 million records in batches of 1000.

Unlike the Audit Logging case, Batch Jobs generally want XMIN to advance
so that updated/imported/deleted rows can be vacuumed or HOT updated.
Thus the need to allow XMIN to advance.

One of the things we kind of concluded from our discussion was that the
two core use-cases are probably different features:

Audit Logging:
* requires 2-way data interaction with outer transaction
* no parallelism
* XMIN does not need to advance
* master transaction should still commit/fail
* needs to support nesting

Batch Jobs:
* 1-way data interaction sufficient (master-->child)
* parallelism desired
* XMIN should advance
* master process could be transactionless
* does not need to support nesting

Of these two, the Audit Logging case is the more important one to
implement because there is no real workaround for it. Batch Jobs can,
and are, handled by external scripting, and having ATX for them is more
of a convenience than anything else.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
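The batch-job case can be made concrete (table and column names invented for illustration). Today the chunking has to be driven by an external script issuing one statement per transaction, because a single server-side function holds one snapshot, and thus pins xmin, for its entire run:

```sql
-- One chunk of the hypothetical batch job. An external driver repeats
-- this statement, committing between iterations, until it reports
-- "UPDATE 0"; each committed chunk lets vacuum reclaim the old row
-- versions. Run inside one long plpgsql function instead, every chunk
-- would share a single transaction and snapshot, and no dead rows
-- could be cleaned up until the whole job finished.
UPDATE big_table
SET    processed = true
WHERE  id IN (SELECT id
              FROM   big_table
              WHERE  NOT processed
              LIMIT  1000);
```

This is why "XMIN should advance" appears in the Batch Jobs list above but not in the Audit Logging list.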
On Thu, Jul 23, 2015 at 2:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 07/23/2015 11:39 AM, Robert Haas wrote:
>> On Thu, Jul 23, 2015 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> Requesting for everyone's opinion regarding this based on which we can
>>>> proceed to enhance/tune/re-write our design.
>>>
>>> So, one of the things which came up during the discussion was advancing
>>> XMIN, which is not important to the audit logging use case, but is very
>>> important for the batch job use case. What have you concluded regarding
>>> this item?
>>
>> Could you explain more specifically what you are talking about here?
>
> Yeah, my notes are kinda incoherent, no?
>
> There's two core use-cases for Autonomous Transactions (hereafter ATX):
>
> * audit logging
> * batch jobs
>
> Audit Logging: triggers or other statements which should leave a record
> even when a transaction aborts. While audit logging is the main example
> of this use case, any kind of logging of failures is applicable.
>
> Batch Jobs: large data-manipulation tasks which need to be broken up
> into segments, with each segment committing separately. Example:
> updating 1 million records in batches of 1000.
>
> Unlike the Audit Logging case, Batch Jobs generally want XMIN to advance
> so that updated/imported/deleted rows can be vacuumed or HOT updated.
> Thus the need to allow XMIN to advance.
>
> One of the things we kind of concluded from our discussion was that the
> two core use-cases are probably different features:
>
> Audit Logging:
> * requires 2-way data interaction with outer transaction
> * no parallelism
> * XMIN does not need to advance
> * master transaction should still commit/fail
> * needs to support nesting
>
> Batch Jobs:
> * 1-way data interaction sufficient (master-->child)
> * parallelism desired
> * XMIN should advance
> * master process could be transactionless
> * does not need to support nesting
>
> Of these two, the Audit Logging case is the more important one to
> implement because there is no real workaround for it. Batch Jobs can,
> and are, handled by external scripting, and having ATX for them is more
> of a convenience than anything else.

You're still not really explaining what you mean by "xmin should
advance". If the parent transaction holds a snapshot, or for as long as
it does, xmin can't be advanced safely. If it doesn't, you'll be fine.

I suppose the situation you're worried about is where we execute a
stored procedure that repeatedly spawns autonomous transactions. Since
the parent transaction will always have a snapshot, you won't advance
xmin until the entire stored procedure finishes. That's a problem, but
I think it is rather unfair to say that it has anything to do with
autonomous transactions. "Run a procedure without needing to hold a
snapshot" is a completely separate feature request from "allow
autonomous transactions", and it's probably neither easy nor
uncontroversial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 07/23/2015 12:40 PM, Robert Haas wrote:
> That's a problem, but I think it is rather unfair to say that it has
> anything to do with autonomous transactions. "Run a procedure without
> needing to hold a snapshot" is a completely separate feature request
> from "allow autonomous transactions", and it's probably neither easy
> nor uncontroversial.

That's more-or-less a summary of what I just posted.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
> After few failed attempt to propose Autonomous transaction earlier. I along with Simon Riggs would like to propose again but completely different in approach.
>
> We also had discussion about this feature in last PGCon2015 Unconference Day, those who missed this discussion, please refer
>
> https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015
>
> Before jumping into the design and code proposal for this feature, me along with Simon Riggs wanted to propose its behavior and usage to keep everyone in the same boat.
> So we have summarized the behavior and usage of the Autonomous Transaction based on the discussion with community members in last PGCon2015 Unconference Day:
>
> Behavior of Autonomous Transaction:
> 1. The autonomous transaction treated as a completely different transaction from the master transaction.
> 2. It should be allowed to deadlock with master transaction. We need to work-out a solution to avoid deadlock.

For an autonomous transaction not to conflict with the master, either it
shouldn't take conflicting resources, or when it does, those resources
should be shared with the master transaction. The first needs to be
implemented in the application; the latter will need careful design of
how to share such resources (hierarchical resource sharing?) and how to
release them when the ATX ends. Given that the transactions are
autonomous as per point 1, they should be allowed to conflict, and
deadlock detection would take care of such deadlocks.

> 3. It can support multiple level of nesting based on the configuration (may be max as 70).
> 4. Outer (i.e. main or upper autonomous) transaction to be suspended while the inner autonomous transaction is running.
> 5. Outer transaction should not see data of inner till inner is committed (serializable upper transaction should not see even after inner transaction commit).
>
> How to Use Autonomous Transaction:
> 1. We can issue explicit command to start an Autonomous transaction as below:
>     BEGIN AUTONOMOUS TRANSACTION (Don't worry about keywords at this point.)
>     Do your work.
>     COMMIT/ROLLBACK (Will commit/rollback the autonomous transaction and will return to main transaction or upper autonomous transaction).
> 2. The above commands can be issued either inside the procedure to make few statements of procedure inside autonomous transaction or even in stand-alone query execution.
> 3. We can make whole procedure itself as autonomous, which will be similar to start autonomous transaction in the beginning of the procedure and commit/rollback at the end of the procedure.
>
> There was another discussion in Unconference Day to decide whether to implement COMMIT/ROLLBACK inside the procedure or autonomous transaction. So our opinion about this is that
> COMMIT/ROLLBACK inside procedure will be somewhat different from Autonomous Transaction as in case of first, once we commit inside the procedure,
> it commits everything done before call of procedure. This is the behavior of Oracle.
> So in this case user required to be very careful to not do any operation before call of procedure, which is not yet intended to be committed inside procedure.
>
> So we can prefer to implement Autonomous Transaction, which will not only be compatible with Oracle but also gives really strong required features.
>
> I have not put the use-cases here as already we agree about its strong use-cases.
>
> Requesting for everyone's opinion regarding this based on which we can proceed to enhance/tune/re-write our design.
>
> Thanks and Regards,
> Kumar Rajeev Rastogi

-- 
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Batch Jobs: large data-manipulation tasks which need to be broken up
> into segments, with each segment committing separately. Example:
> updating 1 million records in batches of 1000.

Autonomous transactions are not a good fit for this case; stored
procedures are a better way to go for any scenario where you don't
want to be in a snapshot (for example, suppose you want to change
isolation level on the fly).

merlin
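The isolation-level example is a real limitation of functions today: a function always runs inside an already-started transaction, so it cannot change the isolation level mid-flight (function name invented for illustration):

```sql
-- This fails in today's PostgreSQL: by the time the function body
-- runs, the surrounding transaction has already executed a query
-- (the SELECT that called the function), and changing the isolation
-- level is only allowed before the transaction's first query.
CREATE OR REPLACE FUNCTION try_serializable() RETURNS void AS $$
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    -- expected failure along the lines of:
    -- ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query
END;
$$ LANGUAGE plpgsql;
```

A true stored procedure that owns its transaction boundaries would not have this restriction, which is Merlin's point.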
On 23 July 2015 21:04, Robert Haas Wrote:
> On Thu, Jul 23, 2015 at 1:31 AM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
>> 2. It should be allowed to deadlock with master transaction. We
>> need to work-out a solution to avoid deadlock.
>
> This sentence seems to contradict itself. I thought the consensus was
> that the transaction should NOT conflict with the master transaction.

Since we are saying the transaction is autonomous to the parent
transaction, we cannot guarantee that it does not take any conflicting
lock, unless designed so by the application. But yes, we should have a
mechanism to deal with the possible deadlock.

>> 3. It can support multiple level of nesting based on the
>> configuration (may be max as 70).
>
> Why have any fixed maximum?

Since we are planning to have nested autonomous transactions, a limit on
this is required so that resources can be controlled.

>> 2. The above commands can be issued either inside the procedure to
>> make few statements of procedure inside autonomous transaction or even
>> in stand-alone query execution.
>
> I think inside a procedure the autonomous transaction will need to be
> lexically scoped. You won't be able to do this, for example:
>
> BEGIN AUTONOMOUS TRANSACTION;
> FOR x IN SELECT ... LOOP
>     COMMIT;
>     BEGIN AUTONOMOUS TRANSACTION;
> END LOOP;

I am not sure how we will be able to control this. IMHO the user should
be able to control this, especially since it does not have any meaning
from the user's perspective. Please let me know if I am missing
something here.

> Rather you'd have to do something like this:
>
> FOR x IN SELECT .. LOOP
>     BEGIN WITH AUTONOMOUS TRANSACTION
>         do stuff
>     END;
> END LOOP;

Thanks and Regards,
Kumar Rajeev Rastogi
On 07/27/2015 02:41 PM, Joel Jacobson wrote:
> However, we should also add a way for the caller to protect against an
> Autonomous Transaction in a function called by the caller. Imagine if
> you're the author of function X() and within X() make use of some other
> function Y() which has been written by some other author, and within
> your function X(), it's very important either all of your work or none
> at all gets committed, then you need to make sure none of the changes
> you made before calling Y() gets committed, and thus we need a way to
> prevent Y() from starting and committing an Autonomous Transaction,
> otherwise we would increase the risk and complexity of working with
> functions and plpgsql in PostgreSQL as you would then need to be sure
> none of the functions you are using within a function will start and
> commit an ATX.

Ah, you're missing how commits in ATX are expected to work. Let me
illustrate:

X (
    Data write A1
    call Y(
        Start ATX
        Data write B1
        Commit ATX
    )
    Data write A2
    Exception
)

In this workflow, B1 would be committed and persistent. Neither A1 nor
A2 would be committed, or visible to other users. Depending on what
implementation we end up with, A1 might not even be visible to Y().

So that solves your use case without any need to "block" ATXs in called
functions. However, it leads to some interesting cases involving
self-deadlocks; see the original post on this thread.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
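The X()/Y() workflow above can be rendered as a plpgsql sketch (the AUTONOMOUS block syntax is hypothetical, borrowed from the lexically scoped form proposed upthread; the tables and function names are invented):

```sql
-- Hypothetical sketch: B1 commits independently; A1 and A2 roll back
-- when X() raises an exception.
CREATE FUNCTION y() RETURNS void AS $$
BEGIN
    BEGIN WITH AUTONOMOUS TRANSACTION        -- proposed, not implemented
        INSERT INTO audit_log VALUES ('B1'); -- persists even if X() aborts
    END;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION x() RETURNS void AS $$
BEGIN
    INSERT INTO t VALUES ('A1');
    PERFORM y();
    INSERT INTO t VALUES ('A2');
    RAISE EXCEPTION 'abort';  -- A1 and A2 are rolled back; B1 remains
END;
$$ LANGUAGE plpgsql;
```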
On 07/27/2015 02:47 AM, Rajeev rastogi wrote:
>> Why have any fixed maximum?
> Since we are planning to have nested autonomous transaction, so it is
> required to have limit on this so that resources can be controlled.

Is there a particular reason why this limit wouldn't just be
max_stack_depth?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson <joel@trustly.com> wrote:
> On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Batch Jobs: large data-manipulation tasks which need to be broken up
>>> into segments, with each segment committing separately. Example:
>>> updating 1 million records in batches of 1000.
>>
>> Autonomous transactions are not a good fit for this case; stored
>> procedures are a better way to go for any scenario where you don't
>> want to be in a snapshot (for example, suppose you want to change
>> isolation level on the fly).
>
> Hm, you mean we need real "stored procedures" in PostgreSQL and not just
> "functions"?

Yes, exactly.

Autonomous transactions aren't really set up for cases where the
function runs for a very long time or indefinitely. This is the
'advancing xmin' problem as Josh puts it but I think the problem is
much bigger than that. Anyways, this is mostly irrelevant to
autonomous transactions as long as the design isn't extended to try
and cover that case.

Is the Autonomous Transaction feature only going to be exposed through
pl/pgsql?

merlin
On 07/27/2015 03:12 PM, Joel Jacobson wrote:
> On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Ah, you're missing how commits in ATX are expected to work. Let me
>> illustrate:
>>
>> X (
>>     Data write A1
>>     call Y(
>>         Start ATX
>>         Data write B1
>>         Commit ATX
>>     )
>>     Data write A2
>>     Exception
>> )
>>
>> In this workflow, B1 would be committed and persistent. Neither A1 nor
>> A2 would be committed, or visible to other users. Depending on what
>> implementation we end up with, A1 might not even be visible to Y().
>>
>> So that solves your use case without any need to "block" ATXs in called
>> functions. However, it leads to some interesting cases involving
>> self-deadlocks; see the original post on this thread.
>
> I don't follow. In your example above, if I'm X(), how do I ensure Y()
> won't have committed anything at all when I later at "Exception" decide
> to rollback everything from "Data write A1" to "Data write A2" including
> any writes made by Y() (in the example "Data write B1")?

Ah, ok. The goal of the project is that the writer of X() *cannot*
prevent Y() from writing its data (B1) and committing it.

One of the primary use cases for ATX is audit triggers. If a function
writer could override ATX and prevent the audit triggers from
committing, then that use case would be violated.

Can you explain what use case you have where simply telling the staff
"if you use ATX without clearing it, you'll be fired" is not sufficient?
Possibly there's something we failed to account for in the unconference
discussion.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 7/27/15 5:12 PM, Joel Jacobson wrote:
> Right now, when writing a function, if you raise an exception, you can
> be sure all writes you have made will be rollbacked, but your caller
> function might caught the exception and decide to carry on and commit
> work made before your function was called, but at least you can be
> confident your writes won't be committed as long as you don't caught the
> exception you raised in your own function. If I understand it correctly,
> that would change with the addition of Autonomous Transaction, unless
> given a way to prevent a function you call from starting and commiting
> a Autonomous Transaction. Wrong? If so, then please show how to prevent
> Y() from commiting the "Data write B1" in your example, I don't get it.

What's being described here doesn't make sense in either use case ([1] &
[2]), but I do understand the concern about what 3rd party software is
doing. It would be nice to have the ability to disallow and/or disable
autonomous transactions, but I don't see a practical way of doing that
other than introducing a new GUC. I'm not sure if it's worth that effort.

[1] the "batch process" use case: batches that still hold their own
transaction open don't gain anything.

[2] the "audit logging" case. If you didn't care about auditing
surviving regardless of a rollback then you wouldn't go to the extra
work of an autonomous transaction to begin with.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 7/27/15 5:56 PM, Josh Berkus wrote:
> Can you explain what use case you have where simply telling the staff
> "if you use ATX without clearing it, you'll be fired" is not sufficient?
> Possibly there's something we failed to account for in the unconference
> discussion.

That there's no way to enforce that, short of hand-auditing code?
There's already enough things that are difficult/impossible to enforce,
I'd rather not add another one.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 7/27/15 6:40 PM, Jim Nasby wrote:
> On 7/27/15 5:12 PM, Joel Jacobson wrote:
>> Right now, when writing a function, if you raise an exception, you can
>> be sure all writes you have made will be rollbacked, but your caller
>> function might caught the exception and decide to carry on and commit
>> work made before your function was called, but at least you can be
>> confident your writes won't be committed as long as you don't caught the
>> exception you raised in your own function. If I understand it correctly,
>> that would change with the addition of Autonomous Transaction, unless
>> given a way to prevent a function you call from starting and commiting
>> a Autonomous Transaction. Wrong? If so, then please show how to prevent
>> Y() from commiting the "Data write B1" in your example, I don't get it.
>
> What's being described here doesn't make sense in either use case ([1] &
> [2]), but I do understand the concern about what 3rd party software is
> doing. It would be nice to have the ability to disallow and/or disable
> autonomous transactions, but I don't see a practical way of doing that
> other than introducing a new GUC. I'm not sure if it's worth that effort.

It just occurred to me that another option would be to have an event
trigger for beginning an autonomous transaction.

> [1] the "batch process" use case: batches that still hold their own
> transaction open don't gain anything.
>
> [2] the "audit logging" case. If you didn't care about auditing
> surviving regardless of a rollback then you wouldn't go to the extra
> work of an autonomous transaction to begin with.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
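If a blocking knob were ever added, it would presumably look like any other GUC. This fragment is entirely hypothetical, just sketching the shape of the idea; neither the setting name nor the feature exists:

```
# postgresql.conf (hypothetical, not an implemented setting)
allow_autonomous_transactions = off   # could plausibly also be settable
                                      # per role or per database via
                                      # ALTER ROLE ... SET / ALTER DATABASE ... SET
```

As noted above, though, a GUC that function authors can toggle would conflict with the audit-logging goal that callers must not be able to suppress an ATX.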
> Ah, you're missing how commits in ATX are expected to work. Let me
> illustrate:
>
> X (
>     Data write A1
>     call Y(
>         Start ATX
>         Data write B1
>         Commit ATX
>     )
>     Data write A2
>     Exception
> )
>
> In this workflow, B1 would be committed and persistent. Neither A1 nor
> A2 would be committed, or visible to other users. Depending on what
> implementation we end up with, A1 might not even be visible to Y().

A1 should never be visible to Y(), else we will end up with
inconsistencies. E.g.

A1 is a primary key and B1 writes a foreign key referencing A1. Commit
ATX will not complain, as it sees A1, but in case X rolls back, we may
have B1 referencing nothing.

Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 23 July 2015 at 13:31, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote:
> 1. The autonomous transaction treated as a completely different
> transaction from the master transaction.

Personally I think that's a lot more useful than having the inner tx
able to see the outer tx's uncommitted changes.

> 2. It should be allowed to deadlock with master transaction. We
> need to work-out a solution to avoid deadlock.

The deadlock case in autonomous tx's is a bit different.

Assuming you don't intend to allow interleaving, where you can switch
between transactions at will rather than just at begin/commit, the only
way a deadlock can happen is when the outer tx holds a lock that the
inner tx tries to acquire.

That should be practical to special-case by maintaining a list of
parent transaction (virtual?) transaction IDs. Attempts to wait on a
lock held by any of those should fail immediately. There's no point
waiting for the deadlock detector since the outer tx can never progress
and commit/rollback to release locks, and it might not be able to see
the parent/child relationship from outside the backend doing the nested
tx anyway.

There's no need to check the parent list until we actually try to wait
on a lock, though I don't know whether it's practical to delay until
then.

-- 
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
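The self-deadlock being described, in concrete terms (the ATX syntax is the hypothetical proposal from upthread; the table name is invented):

```sql
-- The outer transaction takes a row lock...
BEGIN;
UPDATE jobs SET status = 'running' WHERE id = 42;

-- ...then the autonomous transaction tries to take the same row lock.
BEGIN AUTONOMOUS TRANSACTION;   -- hypothetical syntax
UPDATE jobs SET status = 'failed' WHERE id = 42;
-- The outer transaction is suspended until the ATX finishes, so it can
-- never release its lock; this wait could never be satisfied. Under the
-- scheme sketched above, the backend would check the lock holder
-- against its list of parent (virtual) transaction IDs and fail this
-- request immediately instead of waiting.
```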
On 28 July 2015 15:31, Craig Ringer Wrote:
>> 2. It should be allowed to deadlock with master transaction. We
>> need to work-out a solution to avoid deadlock.
>
> The deadlock case in autonomous tx's is a bit different.
>
> Assuming you don't intend to allow interleaving, where you can switch
> between transactions at will rather than just at begin/commit, the only
> way a deadlock can happen is when the outer tx holds a lock that the
> inner tx tries to acquire.
>
> That should be practical to special-case by maintaining a list of parent
> transaction (virtual?) transaction IDs. Attempts to wait on a lock held
> by any of those should fail immediately. There's no point waiting for
> the deadlock detector since the outer tx can never progress and
> commit/rollback to release locks, and it might not be able to see the
> parent/child relationship from outside the backend doing the nested tx
> anyway.

Thanks, sounds to be a good idea. I shall evaluate the same.

Thanks and Regards,
Kumar Rajeev Rastogi
On 28 July 2015 03:21, Josh Berkus Wrote: On 07/27/2015 02:47 AM, Rajeev rastogi wrote: >>> Why have any fixed maximum? >> Since we are planning to have nested autonomous transactions, it is required to have a limit on this so that resources can be controlled. >Is there a particular reason why this limit wouldn't just be max_stack_depth? We will need to allocate some initial resources in order to handle all nested autonomous transactions, so I think it is better to have a separate configuration parameter. Thanks and Regards, Kumar Rajeev Rastogi
On Tue, Jul 28, 2015 at 6:01 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > That should be practical to special-case by maintaining a list of > parent transaction (virtual?) transaction IDs. Attempts to wait on a > lock held by any of those should fail immediately. There's no point > waiting for the deadlock detector since the outer tx can never > progress and commit/rollback to release locks, and it might not be > able to see the parent/child relationship from outside the backend > doing the nested tx anyway. I think we're going entirely down the wrong path here. Why is it ever useful for a backend's lock requests to conflict with themselves, even with autonomous transactions? That seems like an artifact of somebody else's implementation that we should be happy we don't need to copy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 31 July 2015 23:10, Robert Haas Wrote: On Tue, Jul 28, 2015 at 6:01 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> That should be practical to special-case by maintaining a list of >> parent transaction (virtual?) transaction IDs. Attempts to wait on a >> lock held by any of those should fail immediately. There's no point >> waiting for the deadlock detector since the outer tx can never >> progress and commit/rollback to release locks, and it might not be >> able to see the parent/child relationship from outside the backend >> doing the nested tx anyway. >I think we're going entirely down the wrong path here. Why is it ever useful for a backend's lock requests to conflict with themselves, even with autonomous transactions? That seems like an artifact of somebody else's implementation that we should be happy we don't need to copy. IMHO, since most locking is managed at the transaction level, not the backend level, and we consider the main and autonomous transactions to be independent transactions, they may in practice conflict. It is also true, as you said, that there is no really useful use-case where an autonomous transaction conflicts with the main (parent) transaction. But we cannot take that for granted, as a user might make a mistake. So we should at least have some mechanism to handle this rare case; as of now I think throwing an error from the autonomous transaction is one solution. Once the error is thrown from the autonomous transaction, the main transaction may continue as it is (or should the main transaction also abort?). Any other suggestions on handling this would be really helpful. Thanks and Regards, Kumar Rajeev Rastogi
On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote: > On 31 July 2015 23:10, Robert Haas Wrote: >>I think we're going entirely down the wrong path here. Why is it ever useful for a backend's lock requests to conflict with themselves, even with autonomous transactions? That seems like an artifact of somebody else's implementation that we should be happy we don't need to copy. > > IMHO, since most of the locking are managed at transaction level not backend level and we consider main & autonomous transaction to be independent transaction, then practically they may conflict right. > It is also right as you said that there is no as such useful use-cases where autonomous transaction conflicts with main (parent) transaction. But we cannot take it for granted as user might make a mistake. So at-least we should have some mechanism to handle this rare case, for which as of now I think throwing error from autonomous transaction as one of the solution. Once error thrown from autonomous transaction, main transaction may continue as it is (or abort main transaction also??). hm. OK, what's the behavior of: BEGIN UPDATE foo SET x = x + 1 WHERE foo_id = 1; BEGIN WITH AUTONOMOUS TRANSACTION UPDATE foo SET x = x + 1 WHERE foo_id = 1; END; RAISE EXCEPTION ...; EXCEPTION ... END; Also, *) What do the other candidate implementations do? IMO, compatibility should be the underlying design principle. *) What will the "SQL only" feature look like? *) Is the SPI interface going to be extended to expose AT? merlin
On 03 August 2015 18:40, Merlin Moncure [mailto:mmoncure@gmail.com] Wrote: >On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi ><rajeev.rastogi@huawei.com> wrote: >> On 31 July 2015 23:10, Robert Haas Wrote: >>>I think we're going entirely down the wrong path here. Why is it ever >useful for a backend's lock requests to conflict with themselves, even >with autonomous transactions? That seems like an artifact of somebody >else's implementation that we should be happy we don't need to copy. >> >> IMHO, since most of the locking are managed at transaction level not >backend level and we consider main & autonomous transaction to be >independent transaction, then practically they may conflict right. >> It is also right as you said that there is no as such useful use-cases >where autonomous transaction conflicts with main (parent) transaction. >But we cannot take it for granted as user might make a mistake. So at- >least we should have some mechanism to handle this rare case, for which >as of now I think throwing error from autonomous transaction as one of >the solution. Once error thrown from autonomous transaction, main >transaction may continue as it is (or abort main transaction also??). > >hm. OK, what's the behavior of: > >BEGIN > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > > BEGIN WITH AUTONOMOUS TRANSACTION > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > END; > > RAISE EXCEPTION ...; >EXCEPTION ... > >END; It should throw an error (or something equivalent), as the second update will wait for the record lock to be released, which in this case will never happen: the outer transaction holds the lock and cannot complete until the autonomous transaction finishes. So, catch-22. >Also, >*) What do the other candidate implementations do? IMO, compatibility >should be the underlying design principle. Oracle throws an error in such a case. But we can decide on what behavior we want to keep. >*) What will the "SQL only" feature look like? Similar to PL as mentioned in your example, we can provide the "SQL only" feature also. 
>*) Is the SPI interface going to be extended to expose AT? I don’t think at this point that there is any need to expose an SPI interface for this. Thanks and Regards, Kumar Rajeev Rastogi
On Tue, Aug 4, 2015 at 4:12 AM, Rajeev rastogi <rajeev.rastogi@huawei.com> wrote: > On 03 August 2015 18:40, Merlin Moncure [mailto:mmoncure@gmail.com] Wrote: >>On Sun, Aug 2, 2015 at 11:37 PM, Rajeev rastogi >><rajeev.rastogi@huawei.com> wrote: >>> On 31 July 2015 23:10, Robert Haas Wrote: >>>>I think we're going entirely down the wrong path here. Why is it ever >>useful for a backend's lock requests to conflict with themselves, even >>with autonomous transactions? That seems like an artifact of somebody >>else's implementation that we should be happy we don't need to copy. >>> >>> IMHO, since most of the locking are managed at transaction level not >>backend level and we consider main & autonomous transaction to be >>independent transaction, then practically they may conflict right. >>> It is also right as you said that there is no as such useful use-cases >>where autonomous transaction conflicts with main (parent) transaction. >>But we cannot take it for granted as user might make a mistake. So at- >>least we should have some mechanism to handle this rare case, for which >>as of now I think throwing error from autonomous transaction as one of >>the solution. Once error thrown from autonomous transaction, main >>transaction may continue as it is (or abort main transaction also??). >> >>hm. OK, what's the behavior of: >> >>BEGIN >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> >> BEGIN WITH AUTONOMOUS TRANSACTION >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> END; >> >> RAISE EXCEPTION ...; >>EXCEPTION ... >> >>END; > > It should throw an error (or something equivalent) as the second update will wait for record lock to get released, which in this case will not happen till second update finishes. So catch 22. Yeah. Point being, from my point of view autonomous transactions have to conflict with the master transaction (or any transaction really). I agree the right course of action is to error out immediately...what else could you do? 
There isn't even a deadlock in the classic sense and allowing control to continue would result in indeterminate behavior FWICT. >>Also, >>*) What do the other candidate implementations do? IMO, compatibility >>should be the underlying design principle. > > Oracle throws error in such case. But we can decide on what behavior we want to keep. gotcha. makes sense. >>*) What will the "SQL only" feature look like? > > Similar to PL as mentioned in your example, we can provide the "SQL only" feature also. > >>*) Is the SPI interface going to be extended to expose AT? > > I don’t think at this point that there is any need of exposing SPI interface for this. Ok, how do AT work in a non-plpgsql ("SQL only") scenario? Are you going to similarly extend BEGIN TRANSACTION? merlin
On Mon, Aug 3, 2015 at 9:09 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > hm. OK, what's the behavior of: > > BEGIN > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > > BEGIN WITH AUTONOMOUS TRANSACTION > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > END; > > RAISE EXCEPTION ...; > EXCEPTION ... > > END; Sure, so that case might need a little bit of special handling. That doesn't mean it's a good idea for heavyweight locks to conflict in general. I think you're going to find that implementing the latter is an extremely unrewarding task, and that the benefits are seriously negative. For example, consider: BEGIN UPDATE foo SET x = x + 1 WHERE foo_id = 1; BEGIN WITH AUTONOMOUS TRANSACTION UPDATE foo SET x = x + 1 WHERE foo_id = 2; END; END; Now, suppose that a concurrent session does LOCK TABLE foo after the first UPDATE and before the second one. That's now a soft deadlock. But the only way the deadlock detector can see that is if the main transaction and the autonomous transaction have separate PGPROC entries, which is a design we explicitly rejected because it puts a tight limit on the number of ATs that can be in progress and the level to which those ATs can be nested. But let's say you don't care, so we go back to that design. The deadlock detector will have to be taught that the outer transaction can't help but wait for the inner transaction, so we teach it that. Now it can see that the only way to resolve the deadlock without aborting any transactions is to reorder the lock request from the autonomous transaction ahead of the concurrent session that is seeking a full table lock. So the autonomous transaction acquires the lock without blocking after all. You have exactly the same result that you would have had anyway but with a phenomenal amount of additional code and complexity. And for what? 
In the original example, the way the deadlock is going to be reported is like this: ERROR: deadlock detected DETAIL: Process 12345 waits for ShareLock on transaction 1000; blocked by process 12345. Process 12345 waits for ShareLock on transaction 1001; blocked by process 12345. That is not a model of clarity. On the other hand, if you just make a rule that attempting to update or delete a tuple that an outer transaction has already updated throws a bespoke error, you can do something like this: ERROR: tuple to be updated was already modified by a suspended outer transaction ...which has precedent in an existing message in trigger.c. Similarly, if you try to drop a table that the outer transaction has locked, the natural thing is for CheckTableNotInUse() to catch that and report it this way: ERROR: cannot DROP TABLE "foo" because it is being used by active queries in this session If you work hard enough, you can instead make that generate a deadlock error message, but you're going to have to work pretty hard, and the result is worse. I'd really like to hear some more *specific* scenarios where it's valuable for locks to conflict between the outer transaction and the AT. I grant that tuple updates are a case where the conflict has to be detected somehow, but I don't accept that the lock manager is the best way to do that, and I don't accept that there are a large number of other cases that will need similar handling. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 6, 2015 at 4:15 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Aug 3, 2015 at 9:09 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> hm. OK, what's the behavior of: >> >> BEGIN >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> >> BEGIN WITH AUTONOMOUS TRANSACTION >> UPDATE foo SET x = x + 1 WHERE foo_id = 1; >> END; >> >> RAISE EXCEPTION ...; >> EXCEPTION ... >> >> END; > > Sure, so that case might need a little bit of special handling. That > doesn't mean it's a good idea for heavyweight locks to conflict in > general. I think you're going to find that implementing the latter is > an extremely unrewarding task, and that the benefits are seriously > negative. For example, consider: > > BEGIN > UPDATE foo SET x = x + 1 WHERE foo_id = 1; > BEGIN WITH AUTONOMOUS TRANSACTION > UPDATE foo SET x = x + 1 WHERE foo_id = 2; > END; > END; > > Now, suppose that a concurrent session does LOCK TABLE foo after the > first UPDATE and before the second one. That's now a soft deadlock. > But the only way the deadlock detector can see that is if the main > transaction and the autonomous transaction have separate PGPROC > entries, which is a design we explicitly rejected because it puts a > tight limit on the number of ATs that can be in progress and the level > to which those ATs can be nested. But let's say you don't care, so we > go back to that design. The deadlock detector will have to be taught > that the outer transaction can't help but wait for the inner > transaction, so we teach it that. Now it can see that the only way to > resolve the deadlock without aborting any transactions is to reorder > the lock request from the autonomous transaction ahead of the > concurrent session that is seeking a full table lock. So the > autonomous transaction acquires the lock without blocking after all. > You have exactly the same result that you would have had anyway but > with a phenomenal amount of additional code and complexity. > > And for what? 
In the original example, the way the deadlock is going > to be reported is like this: > > ERROR: deadlock detected > DETAIL: Process 12345 waits for ShareLock on transaction 1000; blocked > by process 12345. > Process 12345 waits for ShareLock on transaction 1001; blocked by process 12345. > > That is not a model of clarity. On the other hand, if you just make a > rule that attempting to update or delete a tuple that an outer > transaction has already updated throws a bespoke error, you can do > something like this: > > ERROR: tuple to be updated was already modified by a suspended outer transaction > > ...which has precedent in an existing message in trigger.c. > Similarly, if you try to drop a table that the outer transaction has > locked, the natural thing is for CheckTableNotInUse() to catch that > and report it this way: > > ERROR: cannot DROP TABLE "foo" because it is being used by active > queries in this session > > If you work hard enough, you can instead make that generate a deadlock > error message, but you're going to have to work pretty hard, and the > result is worse. > > I'd really like to hear some more *specific* scenarios where it's > valuable for locks to conflict between the outer transaction and the > AT. I grant that tuple updates are a case where the conflict has to > be detected somehow, but I don't accept that the lock manager is the > best way to do that, and I don't accept that there are a large number > of other cases that will need similar handling. I don't necessarily disagree with what you're saying, but it's not clear to me what the proposed behavior is. Since the AT can commit before the outer, ISTM *any* ungranted lock requested by the AT but held by the outer leads to either A: functional deadlock (regardless of implementation details) or B: special behavior. 
Deadlocks would certainly require some acrobatics to detect and resolve due to the fact that one party to the lock is not in fact blocked on a lock but on the outer's execution state. So maybe the right thing to do is to simply ignore the problem and hang both transactions until timeout or cancel; this isn't really much different vs. ghetto dblink style AT that is done today in my experience. merlin
On Thu, Aug 6, 2015 at 11:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > I don't necessarily disagree with what you're saying, but it's not > clear to me what the proposed behavior is. Since the AT can commit > before the outer, ISTM *any* ungranted lock requested by the AT but > held by the outer leads to either A: functional deadlock (regardless > of implementation details) or B: special behavior. I don't accept that. We've already GOT cases where a query can be suspended and other queries can be running in the same backend. You can do that via cursors. Those cases work fine, and the deadlock detector doesn't know anything about them. How is this any different? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Aug 07, 2015 at 11:26:08AM -0400, Robert Haas wrote: > On Thu, Aug 6, 2015 at 11:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > > I don't necessarily disagree with what you're saying, but it's not > > clear to me what the proposed behavior is. Since the AT can commit > > before the outer, ISTM *any* ungranted lock requested by the AT but > > held by the outer leads to either A: functional deadlock (regardless > > of implementation details) or B: special behavior. > > I don't accept that. We've already GOT cases where a query can be > suspended and other queries can be running in the same backend. You > can do that via cursors. Those cases work fine, and the deadlock > detector doesn't know anything about them. How is this any different? In today's scenarios, the later query cannot commit unless the suspended query also commits. (Changing that is the raison d'être of autonomous transactions.) If the autonomous transaction can interact with uncommitted work in a way that other backends could not, crazy things happen when the autonomous transaction commits and the suspended transaction aborts: CREATE TABLE t (c) AS SELECT 1; BEGIN; UPDATE t SET c = 2 WHERE c = 1; BEGIN_AUTONOMOUS; UPDATE t SET c = 3 WHERE c = 1; UPDATE t SET c = 4 WHERE c = 2; COMMIT_AUTONOMOUS; ROLLBACK; If you replace the autonomous transaction with a savepoint, the c=3 update finds no rows, and the c=4 update changes one row. When the outer transaction aborts, only the original c=1 row remains live. If you replace the autonomous transaction with a dblink/pg_background call, the c=3 update waits indefinitely for c=2 to commit or abort, an undetected deadlock. Suppose you make the autonomous transaction see tuples like in the savepoint case. The c=3 update finds no rows to update, and the c=4 update changes one row. When the outer transaction aborts, you have two live rows (c=1 and c=4). 
Suppose you instead make the autonomous transaction see tuples like in the dblink case, yet let c=3 ignore the lock and change a row. If both the autonomous transaction and the outer transaction were to commit, then you get two live rows (c=2 and c=3). Neither of those outcomes is acceptable, of course. In today's tuple update rules, c=3 must deadlock[1]. Other credible tuple update rules may not have this problem, but nothing jumps to mind. [1] That's not to say it must use the shmem lock structures and deadlock detector.
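The two candidate visibility rules above, and the unacceptable row sets they produce, can be reproduced with a toy MVCC simulation. This is an illustrative Python model of the scenario (invented structure, not PostgreSQL code): under the "savepoint" rule the autonomous transaction sees the outer transaction's uncommitted rows; under the "ignore-lock" rule it does not see them but is allowed to update a row the outer transaction has already updated.

```python
# Toy MVCC simulation of the example above. Row versions carry the
# creating transaction (xmin) and, once updated, the updating one
# (xmax). Both candidate rules end with row sets that no serial
# history could produce. Illustrative only; not PostgreSQL code.

COMMITTED, ABORTED, ACTIVE = "committed", "aborted", "active"

def live_rows(versions, status):
    # Versions visible once every transaction has ended.
    return sorted(v["c"] for v in versions
                  if status[v["xmin"]] == COMMITTED
                  and (v["xmax"] is None or status[v["xmax"]] != COMMITTED))

def run(rule):
    status = {"init": COMMITTED, "outer": ACTIVE, "auto": ACTIVE}
    versions = [{"c": 1, "xmin": "init", "xmax": None}]

    def update(xid, old, new, see):
        # UPDATE t SET c = new WHERE c = old, run by `xid`, treating
        # the in-progress transactions in `see` as if committed.
        for v in list(versions):
            born = status[v["xmin"]] == COMMITTED or v["xmin"] in see
            killed = v["xmax"] is not None and (
                status[v["xmax"]] == COMMITTED or v["xmax"] in see)
            if born and not killed and v["c"] == old:
                if v["xmax"] is None:
                    v["xmax"] = xid      # mark the old version dead
                versions.append({"c": new, "xmin": xid, "xmax": None})
                return

    update("outer", 1, 2, see={"outer"})
    if rule == "savepoint":
        see = {"auto", "outer"}          # AT sees outer's uncommitted work
        update("auto", 1, 3, see)
        update("auto", 2, 4, see)
        status["auto"] = COMMITTED       # COMMIT_AUTONOMOUS
        status["outer"] = ABORTED        # ROLLBACK
    else:
        see = {"auto"}                   # AT ignores outer's row lock
        update("auto", 1, 3, see)
        update("auto", 2, 4, see)
        status["auto"] = COMMITTED
        status["outer"] = COMMITTED      # suppose both commit
    return live_rows(versions, status)
```

Running it: `run("savepoint")` yields the two live rows c=1 and c=4, and `run("ignore-lock")` yields c=2 and c=3, matching the outcomes described above.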
Noah Misch wrote: > In today's scenarios, the later query cannot commit unless the suspended query > also commits. (Changing that is the raison d'être of autonomous > transactions.) If the autonomous transaction can interact with uncommitted > work in a way that other backends could not, crazy things happen when the > autonomous transaction commits and the suspended transaction aborts: > > CREATE TABLE t (c) AS SELECT 1; > BEGIN; > UPDATE t SET c = 2 WHERE c = 1; > BEGIN_AUTONOMOUS; > UPDATE t SET c = 3 WHERE c = 1; > UPDATE t SET c = 4 WHERE c = 2; > COMMIT_AUTONOMOUS; > ROLLBACK; > > If you replace the autonomous transaction with a savepoint, the c=3 update > finds no rows, and the c=4 update changes one row. When the outer transaction > aborts, only the original c=1 row remains live. If you replace the autonomous > transaction with a dblink/pg_background call, the c=3 update waits > indefinitely for c=2 to commit or abort, an undetected deadlock. Maybe what we need to solve this is to restrict what the autonomous transaction can do; for instance, make it so that the autonomous transaction can see all rows of the outer transaction as if the outer transaction were committed, but trying to update any such row raises an error. As far as I can see, this closes this particular problem. (We likely need additional rules to close all holes, but hopefully you get the idea.) Perhaps there exists a set of rules strong enough to eliminate all problematic visibility scenarios, but which still enables behavior useful enough to cover the proposed use cases. The audit scenario is covered because the audit trail doesn't need to modify the audited tuples themselves, only read them. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 15, 2015 at 10:20:55PM -0300, Alvaro Herrera wrote: > Noah Misch wrote: > > > In today's scenarios, the later query cannot commit unless the suspended query > > also commits. (Changing that is the raison d'être of autonomous > > transactions.) If the autonomous transaction can interact with uncommitted > > work in a way that other backends could not, crazy things happen when the > > autonomous transaction commits and the suspended transaction aborts: > > > > CREATE TABLE t (c) AS SELECT 1; > > BEGIN; > > UPDATE t SET c = 2 WHERE c = 1; > > BEGIN_AUTONOMOUS; > > UPDATE t SET c = 3 WHERE c = 1; > > UPDATE t SET c = 4 WHERE c = 2; > > COMMIT_AUTONOMOUS; > > ROLLBACK; > > > > If you replace the autonomous transaction with a savepoint, the c=3 update > > finds no rows, and the c=4 update changes one row. When the outer transaction > > aborts, only the original c=1 row remains live. If you replace the autonomous > > transaction with a dblink/pg_background call, the c=3 update waits > > indefinitely for c=2 to commit or abort, an undetected deadlock. > > Maybe what we need to solve this is to restrict what the autonomous > transaction can do; for instance, make it so that the autonomous > transaction can see all rows of the outer transaction as if the outer > transaction were committed, but trying to update any such row raises an > error. As far as I can see, this closes this particular problem. (We > likely need additional rules to close all holes, but hopefully you get > the idea.) > > Perhaps there exists a set of rules strong enough to eliminate all > problematic visibility scenarios, but which still enables behavior > useful enough to cover the proposed use cases. The audit scenario is > covered because the audit trail doesn't need to modify the audited > tuples themselves, only read them. My starting expectation is that the semantics of an autonomous transaction will be exactly those of dblink/pg_background. (I said that during the unconference session.) 
The application would need to read data from tables before switching to the autonomous section. Autonomous transactions are then a performance and syntactic help, not a source of new semantics. Does any database have autonomous transactions that do otherwise?
Noah Misch wrote: > > > If the autonomous transaction can interact with uncommitted > > > work in a way that other backends could not, crazy things happen when the > > > autonomous transaction commits and the suspended transaction aborts: > > > > > > CREATE TABLE t (c) AS SELECT 1; > > > BEGIN; > > > UPDATE t SET c = 2 WHERE c = 1; > > > BEGIN_AUTONOMOUS; > > > UPDATE t SET c = 3 WHERE c = 1; > > > UPDATE t SET c = 4 WHERE c = 2; > > > COMMIT_AUTONOMOUS; > > > ROLLBACK; > > > > > > If you replace the autonomous transaction with a savepoint, the c=3 update > > > finds no rows, and the c=4 update changes one row. When the outer transaction > > > aborts, only the original c=1 row remains live. If you replace the autonomous > > > transaction with a dblink/pg_background call, the c=3 update waits > > > indefinitely for c=2 to commit or abort, an undetected deadlock. > My starting expectation is that the semantics of an autonomous transaction > will be exactly those of dblink/pg_background. (I said that during the > unconference session.) The application would need to read data from tables > before switching to the autonomous section. Autonomous transactions are then > a performance and syntactic help, not a source of new semantics. Does any > database have autonomous transactions that do otherwise? Oracle behaves like that, i.e. it deadlocks with your example:

SQL> SELECT * FROM t;

         C
----------
         1

SQL> CREATE PROCEDURE proc2 IS
  2  PRAGMA AUTONOMOUS_TRANSACTION;
  3  BEGIN
  4     UPDATE t SET c = 3 WHERE c = 1;
  5     UPDATE t SET c = 4 WHERE c = 2;
  6     COMMIT;
  7  END;
  8  /

Procedure created.

SQL> CREATE PROCEDURE proc1 IS
  2  BEGIN
  3     UPDATE t SET c = 2 WHERE c = 1;
  4     proc2;
  5     ROLLBACK;
  6  END;
  7  /

Procedure created.

SQL> CALL proc1();
CALL proc1()
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "LAURENZ.PROC2", line 4
ORA-06512: at "LAURENZ.PROC1", line 4

Yours, Laurenz Albe
On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote: > CREATE TABLE t (c) AS SELECT 1; > BEGIN; > UPDATE t SET c = 2 WHERE c = 1; > BEGIN_AUTONOMOUS; > UPDATE t SET c = 3 WHERE c = 1; > UPDATE t SET c = 4 WHERE c = 2; > COMMIT_AUTONOMOUS; > ROLLBACK; > > If you replace the autonomous transaction with a savepoint, the c=3 update > finds no rows, and the c=4 update changes one row. When the outer transaction > aborts, only the original c=1 row remains live. If you replace the autonomous > transaction with a dblink/pg_background call, the c=3 update waits > indefinitely for c=2 to commit or abort, an undetected deadlock. > > Suppose you make the autonomous transaction see tuples like in the savepoint > case. The c=3 update finds no rows to update, and the c=4 update changes one > row. When the outer transaction aborts, you have two live rows (c=1 and c=4). > Suppose you instead make the autonomous transaction see tuples like in the > dblink case, yet let c=3 ignore the lock and change a row. If both the > autonomous transaction and the outer transaction were to commit, then you get > two live rows (c=2 and c=3). Neither of those outcomes is acceptable, of > course. In today's tuple update rules, c=3 must deadlock[1]. Other credible > tuple update rules may not have this problem, but nothing jumps to mind. > > [1] That's not to say it must use the shmem lock structures and deadlock > detector. This footnote goes to my point. It seems clear to me that having the autonomous transaction "see" the effects of the outer uncommitted transaction is a recipe for trouble. If the autonomous transaction updates a row and commits, and the outer transaction later aborts, the resulting state is inconsistent with any serial history. I'm fairly certain that's going to leave us in an unhappy place. Even more obviously, ending up with two committed row versions that are both updates of a single ancestor version is no good. 
So, I agree that this scenario should be an error. What I don't agree with is the idea that it should be the deadlock detector's job to throw that error. Rather, I think that when we examine the xmax of the tuple we can see - which is the original one, not the one updated by the outer transaction - we should check whether that XID belongs to an outer transaction. If it does, we should throw an error instead of trying to lock it. That way (1) the error message will be clear and specific to the situation and (2) we don't need a separate PGPROC for each autonomous transaction. The first of those benefits is agreeable; the second one is, in my opinion, a key design goal for this feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
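The xmax check described above might look, very schematically, like this. It is a hypothetical Python sketch: the function name, the `suspended_outer_xids` set, and the exact error wording are assumptions for illustration, not PostgreSQL internals (the message echoes the trigger.c precedent quoted above).

```python
# Schematic version of the proposed check: when an update finds the
# tuple's xmax set, test whether that XID belongs to one of this
# backend's suspended outer transactions, and raise a specific error
# instead of trying to lock the tuple. Invented names throughout.

class OuterTransactionModifiedTuple(Exception):
    pass

def check_tuple_lockable(tuple_xmax, suspended_outer_xids):
    """Called before waiting on a tuple lock inside an autonomous tx."""
    if tuple_xmax is not None and tuple_xmax in suspended_outer_xids:
        raise OuterTransactionModifiedTuple(
            "tuple to be updated was already modified by a "
            "suspended outer transaction")
    # Otherwise fall through to the normal tuple-locking path.
```

The point of the design is that this check needs no shared-memory state and no extra PGPROC: the list of suspended outer XIDs is purely backend-local.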
On Tue, Aug 18, 2015 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote: >> CREATE TABLE t (c) AS SELECT 1; >> BEGIN; >> UPDATE t SET c = 2 WHERE c = 1; >> BEGIN_AUTONOMOUS; >> UPDATE t SET c = 3 WHERE c = 1; >> UPDATE t SET c = 4 WHERE c = 2; >> COMMIT_AUTONOMOUS; >> ROLLBACK; >> >> If you replace the autonomous transaction with a savepoint, the c=3 update >> finds no rows, and the c=4 update changes one row. When the outer transaction >> aborts, only the original c=1 row remains live. If you replace the autonomous >> transaction with a dblink/pg_background call, the c=3 update waits >> indefinitely for c=2 to commit or abort, an undetected deadlock. >> >> Suppose you make the autonomous transaction see tuples like in the savepoint >> case. The c=3 update finds no rows to update, and the c=4 update changes one >> row. When the outer transaction aborts, you have two live rows (c=1 and c=4). >> Suppose you instead make the autonomous transaction see tuples like in the >> dblink case, yet let c=3 ignore the lock and change a row. If both the >> autonomous transaction and the outer transaction were to commit, then you get >> two live rows (c=2 and c=3). Neither of those outcomes is acceptable, of >> course. In today's tuple update rules, c=3 must deadlock[1]. Other credible >> tuple update rules may not have this problem, but nothing jumps to mind. >> >> [1] That's not to say it must use the shmem lock structures and deadlock >> detector. > > This footnote goes to my point. > > It seems clear to me that having the autonomous transaction "see" the > effects of the outer uncommitted transaction is a recipe for trouble. > If the autonomous transaction updates a row and commits, and the outer > transaction later aborts, the resulting state is inconsistent with any > serial history. I'm fairly certain that's going to leave us in an > unhappy place. 
> > Even more obviously, ending up with two committed row versions that > are both updates of a single ancestor version is no good. > > So, I agree that this scenario should be an error. What I don't agree > with is the idea that it should be the deadlock detector's job to > throw that error. Rather, I think that when we examine the xmax of > the tuple we can see - which is the original one, not the one updated > by the outer transaction - we should check whether that XID belongs to > an outer transaction. Hm: do you mean 'an' outer transaction (meaning, basically, any in progress transaction) or the outer transaction of the AT. I think you mean outer transaction of the AT, which makes a lot of sense and should be easy and fast to test. It's like an implied NOWAIT if the locker is the AT and the lockee is the parent. Can you get away with only looking at tuples though? For example, what about advisory locks? Table locks? merlin
On 18 August 2015 21:18, Robert Haas Wrote: >This footnote goes to my point. > >It seems clear to me that having the autonomous transaction "see" the >effects of the outer uncommitted transaction is a recipe for trouble. >If the autonomous transaction updates a row and commits, and the outer >transaction later aborts, the resulting state is inconsistent with any >serial history. I'm fairly certain that's going to leave us in an >unhappy place. > >Even more obviously, ending up with two committed row versions that are >both updates of a single ancestor version is no good. > >So, I agree that this scenario should be an error. What I don't agree >with is the idea that it should be the deadlock detector's job to throw >that error. Rather, I think that when we examine the xmax of the tuple >we can see - which is the original one, not the one updated by the outer >transaction - we should check whether that XID belongs to an outer >transaction. If it does, we should throw an error instead of trying to >lock it. That way (1) the error message will be clear and specific to >the situation and (2) we don't need a separate PGPROC for each >autonomous transaction. The first of those benefits is agreeable; the >second one is, in my opinion, a key design goal for this feature. Yes I agree with this. I was in favor of error all the time without involving deadlock detector. Thanks and Regards, Kumar Rajeev Rastogi
On Tue, Aug 18, 2015 at 3:06 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote:
>>> CREATE TABLE t (c) AS SELECT 1;
>>> BEGIN;
>>> UPDATE t SET c = 2 WHERE c = 1;
>>> BEGIN_AUTONOMOUS;
>>> UPDATE t SET c = 3 WHERE c = 1;
>>> UPDATE t SET c = 4 WHERE c = 2;
>>> COMMIT_AUTONOMOUS;
>>> ROLLBACK;
>>>
>>> If you replace the autonomous transaction with a savepoint, the c=3 update
>>> finds no rows, and the c=4 update changes one row.  When the outer
>>> transaction aborts, only the original c=1 row remains live.  If you replace
>>> the autonomous transaction with a dblink/pg_background call, the c=3 update
>>> waits indefinitely for c=2 to commit or abort, an undetected deadlock.
>>>
>>> Suppose you make the autonomous transaction see tuples like in the savepoint
>>> case.  The c=3 update finds no rows to update, and the c=4 update changes one
>>> row.  When the outer transaction aborts, you have two live rows (c=1 and
>>> c=4).  Suppose you instead make the autonomous transaction see tuples like in
>>> the dblink case, yet let c=3 ignore the lock and change a row.  If both the
>>> autonomous transaction and the outer transaction were to commit, then you get
>>> two live rows (c=2 and c=3).  Neither of those outcomes is acceptable, of
>>> course.  In today's tuple update rules, c=3 must deadlock[1].  Other credible
>>> tuple update rules may not have this problem, but nothing jumps to mind.
>>>
>>> [1] That's not to say it must use the shmem lock structures and deadlock
>>> detector.
>>
>> This footnote goes to my point.
>>
>> It seems clear to me that having the autonomous transaction "see" the
>> effects of the outer uncommitted transaction is a recipe for trouble.
>> If the autonomous transaction updates a row and commits, and the outer
>> transaction later aborts, the resulting state is inconsistent with any
>> serial history.  I'm fairly certain that's going to leave us in an
>> unhappy place.
>>
>> Even more obviously, ending up with two committed row versions that
>> are both updates of a single ancestor version is no good.
>>
>> So, I agree that this scenario should be an error.  What I don't agree
>> with is the idea that it should be the deadlock detector's job to
>> throw that error.  Rather, I think that when we examine the xmax of
>> the tuple we can see - which is the original one, not the one updated
>> by the outer transaction - we should check whether that XID belongs to
>> an outer transaction.
>
> Hm: do you mean 'an' outer transaction (meaning, basically, any in
> progress transaction) or the outer transaction of the AT.  I think you
> mean outer transaction of the AT, which makes a lot of sense and
> should be easy and fast to test.  It's like an implied NOWAIT if the
> locker is the AT and the lockee is the parent.
>
> Can you get away with only looking at tuples though?  For example,
> what about advisory locks?  Table locks?

Well, that's an interesting question.  Can we get away with regarding
those things as non-conflicting, as between the parent and child
transactions?

If we want to insist that they conflict, the easy case is easy: when
acquiring a lock, we can consult the local lock table hash and see if an
outer transaction already holds such a lock.  If so, we error out.  The
problem is when somebody else is waiting for the lock we hold.  That
gets complicated, because the deadlock detector can't work out that
we're deadlocking with ourselves unless there are two separate PGPROCs,
one for the inner transaction and one for the outer transaction, and
even then it might think it can rearrange the lock queue so that the
outer transaction goes first, which in fact won't work here.  I'm not
exactly sure how to make this work out without involving the deadlock
detector, but I think we should try hard to get there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
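[Editor's note: Noah's savepoint comparison quoted above is runnable on stock PostgreSQL today; a minimal sketch, with his proposed BEGIN_AUTONOMOUS/COMMIT_AUTONOMOUS replaced by a savepoint per his first comparison:]

```sql
-- Savepoint analogue of Noah's example; comments give his described outcomes.
CREATE TABLE t (c) AS SELECT 1;
BEGIN;
UPDATE t SET c = 2 WHERE c = 1;   -- outer transaction's update
SAVEPOINT s;
UPDATE t SET c = 3 WHERE c = 1;   -- finds no rows: the row is already c=2 here
UPDATE t SET c = 4 WHERE c = 2;   -- changes one row
RELEASE SAVEPOINT s;
ROLLBACK;
SELECT * FROM t;                  -- only the original c=1 row remains live
```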
On Fri, Aug 21, 2015 at 10:06:44AM -0400, Robert Haas wrote:
> On Tue, Aug 18, 2015 at 3:06 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> > On Tue, Aug 18, 2015 at 8:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Sat, Aug 15, 2015 at 6:47 PM, Noah Misch <noah@leadboat.com> wrote:
> >>> [1] That's not to say it must use the shmem lock structures and deadlock
> >>> detector.
> >>
> >> This footnote goes to my point.
> >> So, I agree that this scenario should be an error.  What I don't agree
> >> with is the idea that it should be the deadlock detector's job to
> >> throw that error.

I couldn't gather from your earlier messages that this scenario should
get an error, so I'm glad to have that clarified.

> > Can you get away with only looking at tuples though?  For example,
> > what about advisory locks?  Table locks?
>
> Well, that's an interesting question.  Can we get away with regarding
> those things as non-conflicting, as between the parent and child
> transactions?

For system lock types, no.  While one could define advisory locks to
work differently, we should assume that today's advisory lockers have
expectations like those of system lockers.  An autonomous transaction
should not bypass any lock that a transaction of another backend could
not bypass.
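[Editor's note: a sketch of the advisory-lock conflict Noah describes. BEGIN AUTONOMOUS TRANSACTION is the proposed, unimplemented syntax from earlier in this thread, so this cannot run today; the comments state Noah's rule, not current behavior:]

```sql
BEGIN;
SELECT pg_advisory_xact_lock(42);   -- parent takes an exclusive advisory lock
BEGIN AUTONOMOUS TRANSACTION;       -- proposed syntax, not implemented
SELECT pg_advisory_xact_lock(42);   -- under Noah's rule this must conflict
                                    -- (error or block), exactly as a dblink
                                    -- call from another backend would block
COMMIT;
```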
On Sat, Aug 22, 2015 at 2:23 AM, Noah Misch <noah@leadboat.com> wrote:
>> > Can you get away with only looking at tuples though?  For example,
>> > what about advisory locks?  Table locks?
>>
>> Well, that's an interesting question.  Can we get away with regarding
>> those things as non-conflicting, as between the parent and child
>> transactions?
>
> For system lock types, no.  While one could define advisory locks to work
> differently, we should assume that today's advisory lockers have expectations
> like those of system lockers.  An autonomous transaction should not bypass any
> lock that a transaction of another backend could not bypass.

Why?

Suppose you do this:

BEGIN;
DECLARE foo CURSOR FOR SELECT * FROM foo;
BEGIN AUTONOMOUS TRANSACTION;
ALTER TABLE foo ALTER bar TYPE int;

This has got to fail for safety reasons, but CheckTableNotInUse() is on
it.  Suppose you do this:

BEGIN;
LOCK foo;
BEGIN AUTONOMOUS TRANSACTION;
INSERT INTO foo VALUES ('spelunk');

How will making this fail improve anything?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 03, 2015 at 04:21:55PM -0400, Robert Haas wrote:
> On Sat, Aug 22, 2015 at 2:23 AM, Noah Misch <noah@leadboat.com> wrote:
> >> > Can you get away with only looking at tuples though?  For example,
> >> > what about advisory locks?  Table locks?
> >>
> >> Well, that's an interesting question.  Can we get away with regarding
> >> those things as non-conflicting, as between the parent and child
> >> transactions?
> >
> > For system lock types, no.  While one could define advisory locks to work
> > differently, we should assume that today's advisory lockers have
> > expectations like those of system lockers.  An autonomous transaction
> > should not bypass any lock that a transaction of another backend could not
> > bypass.
>
> Why?
>
> Suppose you do this:
>
> BEGIN;
> DECLARE foo CURSOR FOR SELECT * FROM foo;
> BEGIN AUTONOMOUS TRANSACTION;
> ALTER TABLE foo ALTER bar TYPE int;
>
> This has got to fail for safety reasons, but CheckTableNotInUse() is
> on it.  Suppose you do this:
>
> BEGIN;
> LOCK foo;
> BEGIN AUTONOMOUS TRANSACTION;
> INSERT INTO foo VALUES ('spelunk');
>
> How will making this fail improve anything?

Core PostgreSQL doesn't care.  This is a user interface design decision
to be made in light of current SQL expectations and future SQL author
wishes.  The LOCK reference page, our contract with users, says nothing
to constrain the choice.  LOCK is exceptional in that we never get much
insight into the caller's expectations.  I think LOCK should follow
closely the built-in commands that take the same locks.

This variation of your examples must fail in order to avoid a paradox if
the first transaction aborts:

BEGIN;
ALTER TABLE foo ALTER bar TYPE frob;
BEGIN AUTONOMOUS TRANSACTION;
INSERT INTO foo VALUES ('spelunk');

If we made that fail and made your second example succeed, that would
imply "LOCK foo" acquires a special kind of AccessExclusiveLock
differing from what ALTER TABLE acquires.  That's incompatible with my
sense of the LOCK command's role in the system.  An ability to procure
an option to acquire, without waiting, a lock and delegate that option
to another transaction would have applications.  It's a different
feature calling for distinct syntax.

My comments have flowed out of a principle that autonomous transactions
shall have precisely the same semantics as using another backend via
dblink.  They should have less overhead.  They may give different error
messages.  They shall not permit sequences of commands that fail in a
dblink implementation of the same multi-transaction sequence.  I chose
this principle because it fits my intuitive notion of transaction
"autonomy" and because everything I've heard about other implementations
suggests that they work in that way.  If some RDBMS implementation does
otherwise, I would find that persuasive.

What design principle(s) have you been using to decide how autonomous
transactions should behave?
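[Editor's note: Noah's dblink-equivalence principle can be checked against today's behavior. A sketch of the dblink rendering of Robert's LOCK example, assuming a loopback connection string and the dblink extension installed; the hang is the expected behavior, not a demonstrated output:]

```sql
-- dblink rendering of the "LOCK foo" example: the inner INSERT runs in a
-- second backend, which waits on the outer AccessExclusiveLock, an
-- undetected deadlock when both are driven from the same client.
CREATE EXTENSION IF NOT EXISTS dblink;
CREATE TABLE foo (t text);
BEGIN;
LOCK foo;
SELECT dblink('dbname=postgres',              -- assumed loopback connection
              $$INSERT INTO foo VALUES ('spelunk')$$);
-- hangs: the other backend waits for the table lock this session holds
```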
On Sun, Sep 6, 2015 at 12:56 AM, Noah Misch <noah@leadboat.com> wrote:
> My comments have flowed out of a principle that autonomous transactions shall
> have precisely the same semantics as using another backend via dblink.

That's what I thought, too: AT is syntax sugar for the dblink approach.
Syntax and performance aside, I think the only way dblink-style AT could
be improved would be to have:

1) better cancel handling, especially cancelling the child when the
parent cancels (this is a major headache with dblink)
2) automatic rejection of acquire attempts by the child on exclusive
assets held by the parent

#2 is what we've been discussing, but I'm pretty confused.  Most
especially, I'm not clear on whether parent and children share a pid or
have unique pids.  If pids are unique, as in dblink, #2 could be done
via a query, assuming you have a way of identifying child transactions
in pg_stat_activity.  The discussion has suggested that the pid is
shared, which I found odd but took on faith.

I'm also wondering if we need syntax to handle synchronous vs.
asynchronous execution of the AT (the latter of which is only possible
with a separate PID, I think).

merlin
On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
> What design principle(s) have you been using to decide how autonomous
> transactions should behave?

I have to admit to a complete lack of principle.  I'm quite frightened
of what this is going to need from the lock manager, and I'm trying to
wriggle out of having to do things there that are going to be nastily
hard.  My wriggling isn't going very well, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 9, 2015 at 9:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
>> What design principle(s) have you been using to decide how autonomous
>> transactions should behave?
>
> I have to admit to a complete lack of principle.  I'm quite frightened
> of what this is going to need from the lock manager, and I'm trying to
> wriggle out of having to do things there that are going to be nastily
> hard.  My wriggling isn't going very well, though.

Hm.  Here is current dblink behavior:

postgres=# create table l (id int);
CREATE TABLE
postgres=# insert into l values(1);
INSERT 0 1
postgres=# update l set id = 2 where id = 1;
UPDATE 1
Time: 0.595 ms
postgres=# select dblink('', 'update l set id = 3 where id = 1');
<hangs forever due to deadlock of client lock and parent execution point>

Does the lock manager really need to be extended to address this case?
pg_locks pretty clearly explains what's happening, via:

postgres=# select locktype, transactionid, pid, granted from pg_locks where not granted;
   locktype    │ transactionid │  pid  │ granted
───────────────┼───────────────┼───────┼─────────
 transactionid │         88380 │ 20543 │ f

and

postgres=# select locktype, transactionid, pid, granted from pg_locks where transactionid = 88380;
   locktype    │ transactionid │  pid  │ granted
───────────────┼───────────────┼───────┼─────────
 transactionid │         88380 │ 20543 │ f
 transactionid │         88380 │ 19022 │ t

If pg_locks and/or pg_stat_activity were extended with a 'parent_pid'
column, a userland query could terminate affected backends with a join
against that column wherever a lock is ungranted.  Naturally there's a
lot more to it than that; you'd want to issue an appropriate
cancellation message and various other things.

I suppose I'm probably missing something, but I'm not clear on why the
lock manager needs to be overhauled to deal with this case when we can
just scan the current lock structures, assuming we can a) manage the
child pid independently of the parent pid and b) easily figure out who
is the parent of whom.

merlin
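[Editor's note: Merlin's userland check might look like the following sketch. The parent_pid column is his hypothetical proposal and does not exist in any released pg_stat_activity; the join against pg_locks uses only columns that exist today:]

```sql
-- Sketch only: assumes a hypothetical parent_pid column in pg_stat_activity.
-- Find autonomous-transaction backends waiting on a transactionid lock that
-- their own parent backend holds, so they can be cancelled rather than
-- left in an undetected deadlock.
SELECT child.pid AS blocked_child, parent.pid AS blocking_parent
FROM pg_locks waiting
JOIN pg_stat_activity child  ON child.pid = waiting.pid
JOIN pg_stat_activity parent ON parent.pid = child.parent_pid   -- hypothetical
JOIN pg_locks held ON held.locktype = waiting.locktype
                  AND held.transactionid = waiting.transactionid
                  AND held.pid = parent.pid
                  AND held.granted
WHERE NOT waiting.granted;
```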
On Wed, Sep 09, 2015 at 10:04:01AM -0400, Robert Haas wrote:
> On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
> > What design principle(s) have you been using to decide how autonomous
> > transactions should behave?
>
> I have to admit to a complete lack of principle.  I'm quite frightened
> of what this is going to need from the lock manager, and I'm trying to
> wriggle out of having to do things there that are going to be nastily
> hard.  My wriggling isn't going very well, though.

It's an exceptionally challenging development project, agreed.  So much
code assumes the 1:1 relationship between backends and top-level
transactions.
On Thu, Sep 10, 2015 at 8:39 PM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Sep 09, 2015 at 10:04:01AM -0400, Robert Haas wrote:
>> On Sun, Sep 6, 2015 at 1:56 AM, Noah Misch <noah@leadboat.com> wrote:
>> > What design principle(s) have you been using to decide how autonomous
>> > transactions should behave?
>>
>> I have to admit to a complete lack of principle.  I'm quite frightened
>> of what this is going to need from the lock manager, and I'm trying to
>> wriggle out of having to do things there that are going to be nastily
>> hard.  My wriggling isn't going very well, though.
>
> It's an exceptionally challenging development project, agreed.  So much
> code assumes the 1:1 relationship between backends and top-level
> transactions.

I guess I'm being obtuse, but can you explain why that assumption must
be revisited?  I don't see why it has to be... I must be missing
something.

merlin
On Fri, Sep 11, 2015 at 02:30:53PM -0500, Merlin Moncure wrote:
> On Thu, Sep 10, 2015 at 8:39 PM, Noah Misch <noah@leadboat.com> wrote:
> > It's an exceptionally challenging development project, agreed.  So much
> > code assumes the 1:1 relationship between backends and top-level
> > transactions.
>
> I guess I'm being obtuse, but can you explain why that assumption must
> be revisited?

It's not imperative, but the proposal on the table does so.  Robert
described doing so as "a key design goal for this feature"[1].  I have a
general picture of why the proposers chose that, but I will defer to
them for any elaboration.

[1] http://www.postgresql.org/message-id/CA+TgmoZE__KOHfUyVZpxvTJSB85xm34KC_t41-OXhTB_111SHQ@mail.gmail.com