Thread: autonomous transactions
I would like to propose the attached patch implementing autonomous transactions for discussion and review.

This work was mostly inspired by the discussion about pg_background a while back [0]. It seemed that most people liked the idea of having something like that, but couldn't quite agree on the final interface. Most if not all of the preliminary patches in that thread were committed, but the user interface portions were then abandoned in favor of other work. (I'm aware that rebased versions of pg_background exist. I have one, too.)

The main use case, in a nutshell, is to be able to commit certain things independently, without them being affected by what happens later to the current transaction, for example for audit logging.

My patch consists of three major pieces. (I didn't make them three separate patches because it will be clear where the boundaries are.)

- An API interface to open a "connection" to a background worker, run queries, and get results: AutonomousSessionStart(), AutonomousSessionEnd(), AutonomousSessionExecute(), etc. The communication happens using the client/server protocol.

- Patches to PL/pgSQL to implement Oracle-style autonomous transaction blocks:

    AS $$
    DECLARE
      PRAGMA AUTONOMOUS_TRANSACTION;
    BEGIN
      FOR i IN 0..9 LOOP
        START TRANSACTION;
        INSERT INTO test1 VALUES (i);
        IF i % 2 = 0 THEN
            COMMIT;
        ELSE
            ROLLBACK;
        END IF;
      END LOOP;

      RETURN 42;
    END;
    $$;

  This is very incomplete and has some open technical issues that I will discuss below. But those are all issues of PL/pgSQL, not really issues of how autonomous sessions work. Basically, a block that is declared with that pragma uses the autonomous C API instead of SPI to do its things.
- Patches to PL/Python to implement a context manager for autonomous sessions (similar to how subtransactions work there):

    with plpy.autonomous() as a:
        for i in range(0, 10):
            a.execute("BEGIN")
            a.execute("INSERT INTO test1 (a) VALUES (%d)" % i)
            if i % 2 == 0:
                a.execute("COMMIT")
            else:
                a.execute("ROLLBACK")

  This works quite well, except perhaps for some tuning with memory management, some caching, and some refactoring.

While the PL/pgSQL work is more of a top-level goal, I added the PL/Python implementation because it is easier to map the C API straight out to something more accessible, so testing it out is much easier.

The main technical problem I had with PL/pgSQL is how to parse named parameters. If you're in PL/Python, say, you do

    plan = a.prepare("INSERT INTO test1 (a, b) VALUES ($1, $2)", ["int4", "text"])

and that works fine, because it maps straight to the client/server protocol. But in PL/pgSQL, you will want something like

    DECLARE
      x, y ...
    BEGIN
      INSERT INTO test1 (a, b) VALUES (x, y)

When running in-process (SPI), we install parser hooks that allow the parser to check back into PL/pgSQL about whether x and y are variables and what they mean. When we run in an autonomous session, we don't have that available.

So my idea was to extend the protocol Parse message to allow sending a symbol table instead of parameter types. So instead of saying, "there are two parameters and here are their types", I would send a list of symbols and types, and the server would respond to the Parse message with some kind of information about which symbols it found. I think that would work, but I got lost in the weeds and didn't get very far. But you can see some of that in the code. If anyone has other ideas, I'd be very interested.

Other than that, I think there are also other bits and pieces that are worth looking at, and that perhaps have some overlap with other efforts, such as:

- Refining the internal APIs for running queries, with more flexibility than SPI.
  There have recently been discussions about that. I just used whatever was in tcop/postgres.c directly, like pg_background does, and that seems mostly fine, but if there are other ideas, they would be useful for this, too.

- An exception to the "mostly fine" is that the desirable handling of log_statement, log_duration, and log_min_duration_statement for non-top-level execution is unclear.

- The autonomous session API could also be useful for other things, such as perhaps implementing a variant of pg_background on top of it, or doing other asynchronous or background execution schemes. So input on that is welcome.

- There is some overlap with the protocol handling for parallel query, including things like error propagation, notify handling, and encoding handling. I suspect that other background workers will need similar facilities, so we could simplify some of that.

- Client encoding in particular was recently discussed for parallel query. The problem with the existing solution is that it makes assign_client_encoding() require hardcoded knowledge of all relevant background worker types. So I tried a more general solution, with a hook.

- I added new test files in the plpgsql directory. The main test for plpgsql runs as part of the main test suite. Maybe we want to move that to the plpgsql directory as well.

- More guidance for using some of the background worker and shared memory queue facilities would be useful. For example, I don't know what a good queue size would be.

- Both PL/pgSQL and PL/Python expose some details of SPI in ways that make it difficult to run some things not through SPI. For example, return codes are exposed directly by PL/Python. PL/pgSQL is heavily tied to the API flow of SPI. It's fixable, but it will be some work. I had originally wanted to hide the autonomous session API inside SPI or make it fully compatible with SPI, but that was quickly thrown out. PL/Python now contains some ugly code to make certain things match up so that existing code can be used.
  It's not always pretty.

- The patch "Set log_line_prefix and application name in test drivers" (https://commitfest.postgresql.org/10/717/) is helpful in testing and debugging this.

[0]: https://www.postgresql.org/message-id/flat/CA+Tgmoam66dTzCP8N2cRcS6S6dBMFX+JMba+mDf68H=KAkNjPQ@mail.gmail.com

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
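To make the symbol-table idea above a bit more concrete, here is a rough sketch in plain Python of what an extended Parse message carrying (symbol name, type OID) pairs instead of bare parameter types might look like on the wire. Only the outer framing (one type byte plus an int32 length that counts itself) matches the real frontend/backend protocol; the body layout is entirely hypothetical.

```python
import struct

def pack_cstr(s: str) -> bytes:
    # Protocol strings are null-terminated.
    return s.encode("utf-8") + b"\x00"

def build_extended_parse(stmt_name: str, query: str, symbols) -> bytes:
    # Hypothetical body: statement name, query string, then a count of
    # (symbol name, type OID) pairs instead of the usual bare OID list.
    body = pack_cstr(stmt_name) + pack_cstr(query)
    body += struct.pack("!h", len(symbols))
    for name, oid in symbols:
        body += pack_cstr(name) + struct.pack("!i", oid)
    # Real frontend messages are framed as: 1-byte message type, then an
    # int32 length that includes itself but not the type byte.
    return b"P" + struct.pack("!i", len(body) + 4) + body

msg = build_extended_parse(
    "s1",
    "INSERT INTO test1 (a, b) VALUES (x, y)",
    [("x", 23), ("y", 25)],  # 23/25 are the int4/text type OIDs
)
```

The server's reply would then have to report which of x and y it actually resolved as variables, which is the part Peter says he got lost in.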
I would love to see autonomous transactions in core. I just have one major concern, but thankfully it's easily addressed.

There should be a way to, within the session and/or txn, permanently block autonomous transactions.

This is important if you as a caller function want to be sure none of the work made by anything called down the stack gets committed. That is, if you as a caller decide to roll back, e.g. by raising an exception, you want to be sure *everything* gets rolled back, including all work by functions you've called.

If the caller can't control this, then the author of the caller function would need to inspect the source code of all functions being called, to be sure there is no code using autonomous transactions.

Coding conventions, rules, and discipline are all good and will help against misuse of the feature, but some day someone will make a mistake and wrongly use an autonomous transaction, causing unwanted, unknown side effects I as a caller function didn't expect or know about.

Once you have blocked autonomous transactions in a session or txn, then any function called must not be able to unblock them (in the session or txn), otherwise it defeats the purpose.
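The scenario being described here can be spelled out with a toy model in plain Python. This is not the patch's C code; the class is an illustrative stand-in for the AutonomousSessionStart()/Execute()/End() flow, showing how work committed through an autonomous session survives the caller's own rollback:

```python
class AutonomousSession:
    """Stand-in for the proposed autonomous session API."""
    def __init__(self):
        self.committed = []  # statements whose effects survive
        self.pending = []    # statements inside an open transaction

    def execute(self, sql):
        # The real API ships SQL to a background worker over the
        # client/server protocol; here we only track txn boundaries.
        if sql == "BEGIN":
            self.pending = []
        elif sql == "COMMIT":
            self.committed.extend(self.pending)
            self.pending = []
        elif sql == "ROLLBACK":
            self.pending = []
        else:
            self.pending.append(sql)

def caller_transaction_rolls_back(session):
    # Down the stack, someone logs through the autonomous session...
    session.execute("BEGIN")
    session.execute("INSERT INTO audit_log VALUES ('attempted update')")
    session.execute("COMMIT")
    # ...then the caller's own transaction aborts.
    raise RuntimeError("caller aborts")

s = AutonomousSession()
try:
    caller_transaction_rolls_back(s)
except RuntimeError:
    pass
# The audit row survives even though the caller rolled back,
# which is exactly what a "block autonomous" switch would prevent.
```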
On 30 August 2016 at 23:10, Joel Jacobson <joel@trustly.com> wrote:
>
> There should be a way to within the session and/or txn permanently
> block autonomous transactions.
>

This will defeat one of the use cases of autonomous transactions: auditing.

>
> Coding conventions, rules and discipline are all good and will help
> against misuse of the feature, but some day someone will make a
> mistake and wrongly use the autonomous transaction and cause unwanted
> unknown side-effect I as a caller function didn't expect or know
> about.
>

Well, if the feature is not guilty, why do you want to put it in jail?

--
Jaime Casanova                                www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 30 August 2016 at 20:50, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> - Patches to PL/pgSQL to implement Oracle-style autonomous transaction
> blocks:
>
>     AS $$
>     DECLARE
>       PRAGMA AUTONOMOUS_TRANSACTION;
>     BEGIN
>       FOR i IN 0..9 LOOP
>         START TRANSACTION;
>         INSERT INTO test1 VALUES (i);
>         IF i % 2 = 0 THEN
>             COMMIT;
>         ELSE
>             ROLLBACK;
>         END IF;
>       END LOOP;
>
>       RETURN 42;
>     END;
>     $$;
>

Is this the syntax it will use?

I just compiled this on head and created a function based on this one. The main difference is that the column in test1 is a pk, so I used INSERT ON CONFLICT DO NOTHING, and I'm getting this error:

postgres=# select foo();
LOG:  namespace item variable itemno 1, name val
CONTEXT:  PL/pgSQL function foo() line 7 at SQL statement
STATEMENT:  select foo();
ERROR:  null value in column "i" violates not-null constraint
DETAIL:  Failing row contains (null).
STATEMENT:  INSERT INTO test1 VALUES (val) ON CONFLICT DO NOTHING
ERROR:  null value in column "i" violates not-null constraint
DETAIL:  Failing row contains (null).
CONTEXT:  PL/pgSQL function foo() line 7 at SQL statement
STATEMENT:  select foo();
ERROR:  null value in column "i" violates not-null constraint
DETAIL:  Failing row contains (null).
CONTEXT:  PL/pgSQL function foo() line 7 at SQL statement

This happens every time I use the PRAGMA, even if no START TRANSACTION, COMMIT, or ROLLBACK is used.

--
Jaime Casanova                                www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova
<jaime.casanova@2ndquadrant.com> wrote:
>
> On 30 August 2016 at 23:10, Joel Jacobson <joel@trustly.com> wrote:
> >
> > There should be a way to within the session and/or txn permanently
> > block autonomous transactions.
> >
>
> This will defeat one of the use cases of autonomous transactions: auditing

My idea on how to deal with this would be to mark the function "AUTONOMOUS", similar to how a function is marked "PARALLEL SAFE", and to throw an error if a caller that has blocked autonomous transactions tries to call a function that is marked autonomous.

That way none of the code that needs to be audited would ever get executed.
2016-08-31 15:09 GMT+02:00 Joel Jacobson <joel@trustly.com>:
> On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
> >
> > On 30 August 2016 at 23:10, Joel Jacobson <joel@trustly.com> wrote:
> > >
> > > There should be a way to within the session and/or txn permanently
> > > block autonomous transactions.
> > >
> >
> > This will defeat one of the use cases of autonomous transactions: auditing
>
> My idea on how to deal with this would be to mark the function to be
> "AUTONOMOUS" similar to how a function is marked to be "PARALLEL
> SAFE",
> and to throw an error if a caller that has blocked autonomous
> transactions tries to call a function that is marked to be autonomous.
>
> That way none of the code that needs to be audited would ever get executed.
I like this idea - it allows better (cleaner) snapshot isolation.
Regards
Pavel
On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> - A API interface to open a "connection" to a background worker, run
> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
> AutonomousSessionExecute(), etc. The communication happens using the
> client/server protocol.

I'm surprised by the background worker. I had envisioned autonomous transactions being implemented by allowing a process to reserve a second entry in PGPROC with the same pid, or perhaps saving its existing information in a separate PGPROC slot (or stack of slots) and restoring it after the autonomous transaction commits.

Using a background worker means that the autonomous transaction can't access any state from the process memory. Parameters in plpgsql are a symptom of this, but I suspect there will be others. What happens if a statement timeout occurs during an autonomous transaction? What happens if you use a PL language in the autonomous transaction and it tries to use non-transactional information such as prepared statements?

--
greg
On 31 August 2016 at 21:46, Greg Stark <stark@mit.edu> wrote:
> On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> - A API interface to open a "connection" to a background worker, run
>> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
>> AutonomousSessionExecute(), etc. The communication happens using the
>> client/server protocol.

Peter, you mention "Oracle-style autonomous transaction blocks".

What are the semantics to be expected of those with regards to:

- Accessing objects exclusively locked by the outer xact, or where the
  requested lockmode conflicts with a lock held by the outer xact

- Visibility of data written by the outer xact

Also, is it intended (outside the plpgsql interface) that the autonomous xact can proceed concurrently/interleaved with a local backend xact? That is, the local backend xact isn't suspended, and you're allowed to do things on the local backend as well? If so, what handling do you have in mind for deadlocks between the local backend xact and the bgworker with the autonomous xact? I'd expect the local backend to always win, killing the autonomous xact every time.

> I'm surprised by the background worker. I had envisioned autonomous
> transactions being implemented by allowing a process to reserve a
> second entry in PGPROC with the same pid. Or perhaps save its existing
> information in a separate pgproc slot (or stack of slots) and
> restoring it after the autonomous transaction commits.

I suspect that there'll be way too much code that relies on stashing xact-scoped stuff in globals for that to be viable. Caches alone. Peter will be able to explain more, I'm sure.

We'd probably need a new transaction data object that everything xact-scoped hangs off, so we can pass it everywhere or swap it out of some global. The mechanical refactoring alone would be pretty scary, not to mention the complexity of actually identifying all the less obvious places that need changing.

Consider invalidation callbacks. They're always "fun", and so simple to get right....

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 31/08/16 16:11, Craig Ringer wrote:
> On 31 August 2016 at 21:46, Greg Stark <stark@mit.edu> wrote:
>> On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> - A API interface to open a "connection" to a background worker, run
>>> queries, get results: AutonomousSessionStart(), AutonomousSessionEnd(),
>>> AutonomousSessionExecute(), etc. The communication happens using the
>>> client/server protocol.
>
> Peter, you mention "Oracle-style autonomous transaction blocks".
>
> What are the semantics to be expected of those with regards to:
>
> - Accessing objects exclusively locked by the outer xact or where the
>   requested lockmode conflicts with a lock held by the outer xact
>
> - Visibility of data written by the outer xact

That would be my question as well.

> Also, is it intended (outside the plpgsql interface) that the
> autonomous xact can proceed concurrently/interleaved with a local
> backend xact? i.e. the local backend xact isn't suspended and you're
> allowed to do things on the local backend as well? If so, what
> handling do you have in mind for deadlocks between the local backend
> xact and the bgworker with the autonomous xact? I'd expect the local
> backend to always win, killing the autonomous xact every time.

I would expect that in the PLs this is handled by them; if you misuse it at the C level, that's your problem.

>> I'm surprised by the background worker. I had envisioned autonomous
>> transactions being implemented by allowing a process to reserve a
>> second entry in PGPROC with the same pid. Or perhaps save its existing
>> information in a separate pgproc slot (or stack of slots) and
>> restoring it after the autonomous transaction commits.
>
> I suspect that there'll be way too much code that relies on stashing
> xact-scoped stuff in globals for that to be viable. Caches alone.
> Peter will be able to explain more, I'm sure.

I can also see some advantages in the bgworker approach. For example, it could be used for a "fire and forget" type of interface in the future, where you return as soon as you send the exec message and don't wait for the result.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 31 August 2016 at 14:09, Joel Jacobson <joel@trustly.com> wrote:
> On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
>>
>> On 30 August 2016 at 23:10, Joel Jacobson <joel@trustly.com> wrote:
>> >
>> > There should be a way to within the session and/or txn permanently
>> > block autonomous transactions.
>> >
>>
>> This will defeat one of the use cases of autonomous transactions: auditing
>
> My idea on how to deal with this would be to mark the function to be
> "AUTONOMOUS" similar to how a function is marked to be "PARALLEL
> SAFE",
> and to throw an error if a caller that has blocked autonomous
> transactions tries to call a function that is marked to be autonomous.
>
> That way none of the code that needs to be audited would ever get executed.

Not sure I see why you would want to turn off execution for only some functions.

What happens if your function calls some other function with side effects? How would you roll that back? How would you mark functions for the general case?

Functions with side effects can't be tested with simple unit tests; that has nothing to do with autonomous transactions.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Aug 31, 2016 at 3:11 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> I suspect that there'll be way too much code that relies on stashing
> xact-scoped stuff in globals for that to be viable. Caches alone.
> Peter will be able to explain more, I'm sure.
>
> We'd probably need a new transaction data object that everything
> xact-scoped hangs off, so we can pass it everywhere or swap it out of
> some global. The mechanical refactoring alone would be pretty scary,
> not to mention the complexity of actually identifying all the less
> obvious places that need changing.

Well, this is the converse of the same problem. Today, process state and transaction state are tied together. One way or another you're trying to split that -- either by having two processes share state or by having one process manage two transactions.

I suppose we already have the infrastructure for parallel query, so there's at least some shared problem space there.

--
greg
> On Aug 31, 2016, at 6:46 AM, Greg Stark <stark@mit.edu> wrote:
>
> Using a background worker mean that the autonomous transaction can't
> access any state from the process memory. Parameters in plpgsql are a
> symptom of this but I suspect there will be others. What happens if a
> statement timeout occurs during an autonomous transaction? What
> happens if you use a pl language in the autonomous transaction and if
> it tries to use non-transactional information such as prepared
> statements?

+1 on this.

The proposed solution loosely matches what was done in DB2 9.7, and it runs into the same complexity: passing local variables or session-level variables back and forth became a source of grief.

At SFDC PG we have taken a different tack:

1. Gather up all the transaction state that is scattered across global
   variables into a struct.

2. Backup/restore that transaction state when an autonomous transaction
   is invoked.

This allows full access to all non-transactional state. The downside is that full access also includes uncommitted DDL (shared relcache), so we had to restrict DDL in the parent transaction prior to the spawning of the child.

If there is interest in exploring this kind of solution as an alternative, I can elaborate.

Cheers
Serge
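The two steps above can be sketched as a toy model in plain Python. The field names and values are invented for illustration; in a real backend the "struct" would be PostgreSQL's transaction-scoped globals:

```python
import copy

# Step 1: transaction state scattered across globals, gathered into one
# struct (here, a dict standing in for the C struct).
current_xact = {"xid": 100, "snapshot": "snapA", "locks": ["t1"]}
saved_stack = []

def begin_autonomous():
    # Step 2a: back up the parent's transaction state, then start fresh.
    saved_stack.append(copy.deepcopy(current_xact))
    current_xact.update({"xid": current_xact["xid"] + 1,
                         "snapshot": "fresh", "locks": []})

def end_autonomous():
    # Step 2b: after the autonomous xact commits or aborts, restore the
    # parent's state exactly as it was.
    parent = saved_stack.pop()
    current_xact.clear()
    current_xact.update(parent)

begin_autonomous()
assert current_xact["locks"] == []   # no inherited locks inside the AT
end_autonomous()
assert current_xact["xid"] == 100    # parent state restored
```

Because the saved states form a stack, nesting falls out naturally; the hard part in a real backend is deciding what belongs in the struct (caches, invalidation state, and so on), which is where the DDL restriction above comes from.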
On 08/31/2016 03:09 PM, Joel Jacobson wrote:
> On Wed, Aug 31, 2016 at 6:41 AM, Jaime Casanova
> <jaime.casanova@2ndquadrant.com> wrote:
>>
>> On 30 August 2016 at 23:10, Joel Jacobson <joel@trustly.com> wrote:
>>>
>>> There should be a way to within the session and/or txn permanently
>>> block autonomous transactions.
>>>
>>
>> This will defeat one of the use cases of autonomous transactions: auditing
>
> My idea on how to deal with this would be to mark the function to be
> "AUTONOMOUS" similar to how a function is marked to be "PARALLEL
> SAFE",
> and to throw an error if a caller that has blocked autonomous
> transactions tries to call a function that is marked to be autonomous.
>
> That way none of the code that needs to be audited would ever get executed.

Part of what people want this for is to audit what people *try* to do. We can already audit what they've actually done.

With your solution, we still wouldn't know when an unauthorized attempt to do something happened.

--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On Wed, 31 Aug 2016 14:46:30 +0100
Greg Stark <stark@mit.edu> wrote:

> On Wed, Aug 31, 2016 at 2:50 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > - A API interface to open a "connection" to a background worker, run
> > queries, get results: AutonomousSessionStart(),
> > AutonomousSessionEnd(), AutonomousSessionExecute(), etc. The
> > communication happens using the client/server protocol.
>
> I'm surprised by the background worker. I had envisioned autonomous
> transactions being implemented by allowing a process to reserve a
> second entry in PGPROC with the same pid. Or perhaps save its existing
> information in a separate pgproc slot (or stack of slots) and
> restoring it after the autonomous transaction commits.
>
> Using a background worker mean that the autonomous transaction can't
> access any state from the process memory. Parameters in plpgsql are a
> symptom of this but I suspect there will be others. What happens if a
> statement timeout occurs during an autonomous transaction? What
> happens if you use a pl language in the autonomous transaction and if
> it tries to use non-transactional information such as prepared
> statements?

I am trying to implement autonomous transactions that way. I have already implemented suspending and restoring the parent transaction state, at least some of it. The next thing on the plan is the procarray/snapshot stuff.

I think we should reuse the same PGPROC for the autonomous transaction, and allocate a stack of PGXACTs for the case of nested autonomous transactions.

Solving the more general problem, running multiple concurrent transactions in a single backend, may also be interesting for some users. Autonomous transactions would then be just one use case for that feature.

Regards,

Constantin Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Sep 1, 2016 at 12:12 AM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> Part of what people want this for is to audit what people *try* to do.
> We can already audit what they've actually done.
>
> With your solution, we still wouldn't know when an unauthorized attempt
> to do something happened.

The unauthorized attempt to execute the function would still be logged to the PostgreSQL log file, since it would throw an error, just like trying to connect with e.g. an invalid username is logged to the log files. I think that's enough for that use case; it's arguably not an application-layer problem, since no part of the code was ever executed.

But if someone tries to execute a function where one of the input parameters is a password, and the function raises an exception if the password is incorrect and wants to log the unauthorized attempt, then that would be a good example of when you could, and would need to, use autonomous transactions to log the invalid password attempt.
On Wed, Aug 31, 2016 at 6:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 31 August 2016 at 14:09, Joel Jacobson <joel@trustly.com> wrote:
>> My idea on how to deal with this would be to mark the function to be
>> "AUTONOMOUS" similar to how a function is marked to be "PARALLEL
>> SAFE",
>> and to throw an error if a caller that has blocked autonomous
>> transactions tries to call a function that is marked to be autonomous.
>>
>> That way none of the code that needs to be audited would ever get executed.
>
> Not sure I see why you would want to turn off execution for only some functions.
>
> What happens if your function calls some other function with
> side-effects?

I'm not sure I understand your questions. All volatile functions modifying data have side effects. What I meant was whether they are allowed to commit them even if the caller doesn't want them to.

However, I'll try to clarify the two scenarios I envision:

1. If a function declared AUTONOMOUS gets called, then nothing in the txn has blocked autonomous yet, and that function and any other function will be able to do autonomous txns from here on; if some function then tried to block autonomous, that would throw an error.

2. If a function has blocked autonomous and something later on tries to call a function declared AUTONOMOUS, then that would throw an error.

Basically, we start with a NULL state where autonomous is neither blocked nor explicitly allowed. Whatever happens first decides whether autonomous transactions are explicitly blocked or allowed during the txn. So we can go from NULL -> AUTONOMOUS ALLOWED or NULL -> AUTONOMOUS BLOCKED, but those are the only two state transitions possible. Once set, the state cannot be changed.

If nothing in an application cares about autonomous transactions, they don't have to do anything special; they don't need to modify any of their code.

But if for some reason it is important to block autonomous transactions, because the application is written in a way where it is expected that RAISE EXCEPTION always rolls back everything, then the author of such an application (e.g. me) can just block autonomous transactions and continue to live happily ever after, without nightmares about developers misusing the feature, and only use it when appropriate.
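The one-way state machine described above can be sketched in a few lines of plain Python. This is purely illustrative; the state would really live in session- or transaction-local backend state:

```python
class AutonomousPolicy:
    """NULL -> ALLOWED or NULL -> BLOCKED; no other transitions."""

    def __init__(self):
        self.state = None  # undecided at transaction start

    def request_autonomous(self):
        # Called when a function marked AUTONOMOUS is invoked.
        if self.state == "blocked":
            raise PermissionError("autonomous transactions are blocked in this txn")
        self.state = "allowed"

    def block_autonomous(self):
        # Called by a caller that wants rollback to undo everything.
        if self.state == "allowed":
            raise PermissionError("autonomous transactions already in use")
        self.state = "blocked"

# Scenario 2: blocking first makes any AUTONOMOUS function call an error.
p = AutonomousPolicy()
p.block_autonomous()
try:
    p.request_autonomous()
except PermissionError:
    pass  # the audited code is never executed

# Scenario 1: an AUTONOMOUS call first makes a later block an error.
q = AutonomousPolicy()
q.request_autonomous()
try:
    q.block_autonomous()
except PermissionError:
    pass
```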
On 2016-08-31 06:10:31 +0200, Joel Jacobson wrote:
> This is important if you as a caller function want to be sure none of
> the work made by anything called down the stack gets committed.
> That is, if you as a caller decide to rollback, e.g. by raising an
> exception, and you want to be sure *everything* gets rollbacked,
> including all work by functions you've called.
> If the caller can't control this, then the author of the caller
> function would need to inspect the source code of all function being
> called, to be sure there are no code using autonomous transactions.

I'm not convinced this makes much sense. All FDWs, dblink, etc. already allow you to do stuff outside of the transaction.

Andres
On Thu, Sep 1, 2016 at 8:38 PM, Andres Freund <andres@anarazel.de> wrote:
>
> I'm not convinced this makes much sense. All FDWs, dblink etc. already
> allow you do stuff outside of a transaction.

You as a DBA can prevent FDWs from being used, and dblink is an extension that you don't have to install. So if preventing side effects is necessary in your application, that can be achieved by simply not installing dblink and preventing FDWs.
On Wed, Aug 31, 2016 at 7:20 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I would like to propose the attached patch implementing autonomous
> transactions for discussion and review.

I'm pretty skeptical of this approach. Like Greg Stark, Serge Rielau, and Constantin Pan, I had expected that autonomous transactions would be implemented inside a single backend, without relying on workers. That approach would be much less likely to run afoul of limits on the number of background workers, and it will probably perform considerably better too, especially when the autonomous transaction does only a small amount of work, like inserting a log message someplace.

That is not to say that providing an interface to some pg_background-like functionality is a bad idea; there's been enough interest in that from various quarters to suggest that it's actually quite useful, and I don't even think that it's a bad plan to integrate that with the PLs in some way. However, I think that it's a different feature than autonomous transactions. As others have also noted, it can be used to fire-and-forget a command, or to run a command while foreground processing continues, both of which would be out of scope for an autonomous transaction facility per se. So my suggestion is that you pursue the work but change the name.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 9/2/16 3:45 AM, Robert Haas wrote:
> So my suggestion is
> that you pursue the work but change the name.

That might make the plpgsql issues significantly easier to deal with as well, by making it very explicit that you're doing something with a completely separate connection. That would make requiring special handling for passing plpgsql variables to a query much less confusing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
On 2 September 2016 at 09:45, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Aug 31, 2016 at 7:20 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> I would like to propose the attached patch implementing autonomous
>> transactions for discussion and review.
>
> I'm pretty skeptical of this approach. Like Greg Stark, Serge Rielau,
> and Constantin Pan, I had expected that autonomous transactions should
> be implemented inside of a single backend, without relying on workers.

The problem is that we have a limit on the number of concurrent transactions, which is currently tied to the number of procs. We could expand that, but given the current way we do snapshots there will be realistically low limits however we do this.

So doing autonomous transactions inside a single backend doesn't gain you very much, yet it is an enormously invasive patch to do it that way, not least because it requires you to rewrite locking and deadlock detection to make them work correctly when proc is not 1:1 with xid. And as Serge points out, it introduces other restrictions that we know about now, and perhaps more as well.

Given that I've already looked into the "single backend" approach in detail, it looks to me that Peter's approach is viable and robust.

> That approach would be much less likely to run afoul of limits on the
> number of background workers

That would also be an argument against using them for parallelism, yet we used background workers there.

> , and it will probably perform
> considerably better too, especially when the autonomous transaction
> does only a small amount of work, like inserting a log message
> someplace. That is not to say that providing an interface to some
> pg_background-like functionality is a bad idea; there's been enough
> interest in that from various quarters to suggest that it's actually
> quite useful, and I don't even think that it's a bad plan to integrate
> that with the PLs in some way. However, I think that it's a different
> feature than autonomous transactions. As others have also noted, it
> can be used to fire-and-forget a command, or to run a command while
> foreground processing continues, both of which would be out of scope
> for an autonomous transaction facility per se. So my suggestion is
> that you pursue the work but change the name.

We agree that the infrastructure Peter has provided is useful, so that part can go ahead. If that infrastructure can be used in a simple syntactic way to provide autonomous transactions, then I say let that happen. Peter, please comment on how much infrastructure is required to implement ATs this way.

If somebody believes there is a better way to do ATs, that is just an optimization of the limits and can occur later, if the people who believe there is a better way can prove that is the case and come up with a patch. It's OK for features to go in now and get better infrastructure later, e.g. LISTEN/NOTIFY.

I would very much like to see ATs in v10, and this is a viable approach with working code.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 3, 2016 at 12:09 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > So doing autonomous transactions inside a single backend doesn't gain > you very much, yet it is an enormously invasive patch to do it that > way, not least because it requires you to rewrite locking and > deadlocks to make them work correctly when proc is not 1:1 with xid. > And as Serge points out it introduces other restrictions that we know > about now, perhaps more as well. Well using a separate process also requires rewriting locking and deadlock detection since a reasonable user might expect that second process to have access to data locked in their current transaction. The plus side is that we're already facing that issue with parallel query so at least it's something that only has to be solved once instead of a new problem. Parallel query is currently restricted to read-only queries however. Autonomous transactions will certainly need to be read-write so the question then is what problems led to the restriction in parallel query and are they any more tractable with autonomous transactions? -- greg
> On Sep 3, 2016, at 5:25 AM, Greg Stark <stark@mit.edu> wrote: > > On Sat, Sep 3, 2016 at 12:09 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> So doing autonomous transactions inside a single backend doesn't gain >> you very much, yet it is an enormously invasive patch to do it that >> way, not least because it requires you to rewrite locking and >> deadlocks to make them work correctly when proc is not 1:1 with xid. >> And as Serge points out it introduces other restrictions that we know >> about now, perhaps more as well. > > Well using a separate process also requires rewriting locking and > deadlock detection since a reasonable user might expect that second > process to have access to data locked in their current transaction. The > plus side is that we're already facing that issue with parallel query > so at least it's something that only has to be solved once instead of > a new problem. I can’t speak for reasonable users (or persons, in fact ;-). But… previous implementations of ATs do fully expect them to deadlock on their parents and not see uncommitted changes. So if one’s goal is to merely match competitors’ behavior then that part is a non-issue. I do not recall the single backend approach having been particularly invasive. We managed to do the 9.4 -> 9.5 merge with little problem despite it. IMHO, solving the problem of passing variables to and from an AT is required for viability of the feature. How else would the AT know what it’s supposed to do? Starting an AT within a DDL transaction seems a much more narrow use case to me. Interestingly, despite being supported in PL/SQL on nested BEGIN END blocks, we nearly exclusively see AT’s covering the entire function or trigger. This usage property can be used to narrow the scope of variable passing to function parameters. Cheers Serge Rielau salesforce.com
On 3 Sep. 2016 20:27, "Greg Stark" <stark@mit.edu> wrote:
> Well using a separate process also requires rewriting locking and
> deadlock detection since a reasonable user might expect that second
> process to have access to data locked in their current transaction.

The user is going to hit some confusing issues.

Xact1: alter table... add column...
Xact2: inserts data with new format
Xact2: autonomous commits leaving xact1 running
Xact2: rollback

There are similar issues with FKs and sorts. It isn't limited to schema changes.

I don't see how autonomous xacts that inherit the parent xact's uncommitted data, locks, etc can ever really be safe. Useful, but use at your own (considerable) risk and unlikely to be worth the issues they introduce.

If they inherit locks but don't see uncommitted data it'd be even worse.

See the autonomous commit subxact thread for more on these issues.

> Parallel query is currently restricted to read-only queries however.
> Autonomous transactions will certainly need to be read-write so the
> question then is what problems led to the restriction in parallel
> query and are they any more tractable with autonomous transactions?

Parallel query is fairly different because it doesn't have its own xact. This should mean no need to be able to wait on a separate virtualxid or xid, no need to have their own TransactionId allocated, no complex considerations of visibility, and most importantly it can't commit separately.
On Sat, Sep 3, 2016 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 2 September 2016 at 09:45, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Aug 31, 2016 at 7:20 AM, Peter Eisentraut >> <peter.eisentraut@2ndquadrant.com> wrote: >>> I would like to propose the attached patch implementing autonomous >>> transactions for discussion and review. >> >> I'm pretty skeptical of this approach. Like Greg Stark, Serge Rielau, >> and Constantin Pan, I had expected that autonomous transactions should >> be implemented inside of a single backend, without relying on workers. > > The problem is that we have limits on the number of concurrent > transactions, which is currently limited by the number of procs. True. I believe this issue has been discussed before, and I think it's a solvable issue. I believe that an autonomous transaction could be made to appear to the rest of the system (outside the affected backend) as if it were a subtransaction, but then get committed before the containing transaction gets committed. This would avoid needing more PGPROCs (but getting deadlock detection to work is hard). > So doing autonomous transactions inside a single backend doesn't gain > you very much, yet it is an enormously invasive patch to do it that > way, not least because it requires you to rewrite locking and > deadlocks to make them work correctly when proc is not 1:1 with xid. > And as Serge points out it introduces other restrictions that we know > about now, perhaps more as well. Yes, but I think that if we want to really meet the needs of Oracle users who are used to being able to slap an autonomous transaction pragma on any function that they like, we're going to need a solution that is far lighter-weight than starting up a new backend. 
Any approach based on background workers is going to make the cost of a function call something like 4ms, which I suspect is going to make it useless for a pretty significant percentage of the cases where users want to use autonomous transactions. For example, if you want to log every attempt to access an object, this is a phenomenally expensive way to get there. Calling a per-row trigger is already pretty expensive; calling a per-row trigger that has to *launch a process for every row* is going to be insanely bad. >> That approach would be much less likely to run afoul of limits on the >> number of background workers > > That would also be an argument against using them for parallelism, yet > we used background workers there. I guess that's sort of true, but parallel query intrinsically requires multiple processes or threads, whereas autonomous transactions only require that if you pick an implementation that requires that. Also, the parallel query facility is designed to only apply to operations that are already pretty expensive, namely long-running queries, but there are lots of use cases where an autonomous transaction gets spawned to do a very small amount of work, and not infrequently in a loop. So the startup cost is a significantly more serious problem for this use case. Of course, if we could decrease the startup cost of a bgworker, that'd be good for parallel query, too, because then we could deploy it on shorter queries. But the point remains that for parallel query the planner can always decide to use a non-parallel plan if the query is cheap enough that the startup cost of a worker will be a problem. That isn't the case here; if the worker startup cost is a problem, then you'll just end up with really slow SQL code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
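To make the overhead argument concrete, here is a back-of-envelope model in Python. The ~4 ms worker-startup figure is the estimate from the message above; the baseline per-row trigger cost is an illustrative assumption, not a measured number:

```python
# Rough cost model for the per-row audit-trigger scenario discussed above.
# WORKER_STARTUP_S (~4 ms) is the figure from the thread; BASE_TRIGGER_S
# is an assumed, illustrative cost for an ordinary per-row trigger.

WORKER_STARTUP_S = 0.004      # ~4 ms to launch a background worker
BASE_TRIGGER_S = 0.00001      # assumed 10 us for a plain per-row trigger
ROWS = 1_000_000

plain = ROWS * BASE_TRIGGER_S
spawn_per_row = ROWS * (BASE_TRIGGER_S + WORKER_STARTUP_S)

print(f"plain triggers:  {plain:8.1f} s")
print(f"worker per row:  {spawn_per_row:8.1f} s")
print(f"slowdown:        {spawn_per_row / plain:8.0f}x")
```

Under these assumptions a bulk load of a million rows goes from seconds to over an hour, which is the "insanely bad" case being described.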
On 8 Sep. 2016 3:47 am, "Robert Haas" <robertmhaas@gmail.com> wrote:
>
> Of course, if we could decrease the startup cost of a bgworker

For this use in autonomous tx's we could probably pool workers. Or at least lazily terminate them so that the loop cases work better by re-using an existing bgworker.

I'm pretty sure we're going to need a bgworker pool sooner or later anyway.
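A minimal sketch of that pooling-with-lazy-termination idea, in plain Python with invented names (nothing like this exists in the patch): an idle worker is reused when available, and idle workers are only reaped after a timeout, so a tight loop keeps hitting a warm worker instead of paying the startup cost every iteration.

```python
from collections import deque

class WorkerPool:
    """Toy model of a bgworker pool with lazy termination (hypothetical)."""
    def __init__(self, idle_timeout=5.0):
        self.idle = deque()          # (worker_id, became_idle_at)
        self.idle_timeout = idle_timeout
        self.next_id = 0
        self.started = 0             # counts expensive worker launches

    def acquire(self, now):
        self._reap(now)
        if self.idle:
            wid, _ = self.idle.popleft()   # reuse a warm worker
            return wid
        self.started += 1                  # stands in for a ~4 ms fork/startup
        self.next_id += 1
        return self.next_id

    def release(self, wid, now):
        self.idle.append((wid, now))       # keep it around for a while

    def _reap(self, now):
        # Lazily terminate workers that have been idle too long.
        while self.idle and now - self.idle[0][1] > self.idle_timeout:
            self.idle.popleft()

pool = WorkerPool()
for i in range(10):                        # the "loop case" from the thread
    w = pool.acquire(now=i * 0.001)
    pool.release(w, now=i * 0.001)
print(pool.started)  # 1: a single worker serves all ten iterations
```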
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Craig Ringer
> Of course, if we could decrease the startup cost of a bgworker
>
> For this use in autonomous tx's we could probably pool workers. Or at least
> lazily terminate them so that the loop cases work better by re-using an
> existing bgworker.

Though I may say something odd, isn't the bgworker approach going to increase context switches? I thought PostgreSQL has made efforts to decrease context switches, e.g.

* Each backend itself writes WAL to disk, unlike Oracle, which requests the LGWR process to write REDO to disk.

* Releasing and re-acquiring a lwlock appears to try to avoid context switches:

    /*
     * Loop here to try to acquire lock after each time we are signaled by
     * LWLockRelease.
     *
     * NOTE: it might seem better to have LWLockRelease actually grant us the
     * lock, rather than retrying and possibly having to go back to sleep. But
     * in practice that is no good because it means a process swap for every
     * lock acquisition when two or more processes are contending for the same
     * lock. Since LWLocks are normally used to protect not-very-long
     * sections of computation, a process needs to be able to acquire and
     * release the same lock many times during a single CPU time slice, even
     * in the presence of contention. The efficiency of being able to do that
     * outweighs the inefficiency of sometimes wasting a process dispatch
     * cycle because the lock is not free when a released waiter finally gets
     * to run. See pgsql-hackers archives for 29-Dec-01.
     */

I'm not sure whether to be nervous about the context switch cost in the use cases of autonomous transactions.

Regards
Takayuki Tsunakawa
On 8 September 2016 at 08:18, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Craig Ringer >> Of course, if we could decrease the startup cost of a bgworker > >> For this use in autonomous tx's we could probably pool workers. Or at least >> lazily terminate them so that the loop cases work better by re-using an >> existing bgworker. > > I’m not sure whether to be nervous about the context switch cost in the use > cases of autonomous transactions. That's probably going to be one of the smaller costs. Doing this with bgworkers won't be cheap, but you need to consider the alternative. Factoring out all transaction-specific data currently stored in or pointed to by globals into a transaction state struct that can be swapped out. Making PgXact and PGPROC capable of representing multiple/suspended transactions. Lots more. Much of which would have a performance impact on all day to day operations whether or not autononomous xacts are actually in use. I haven't looked into it in detail. Peter can probably explain more and better. I'm just pointing out that I doubt there's any way to do this without a cost somewhere, and having that cost limited to actual uses of autonomous xacts would be nice. (BTW, could you please set your reply style so that your mail client quotes text and it's possible to see what you wrote vs what is quoted?) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2016-08-30 21:50:05 -0400, Peter Eisentraut wrote: > I would like to propose the attached patch implementing autonomous > transactions for discussion and review. > > This work was mostly inspired by the discussion about pg_background a > while back [0]. It seemed that most people liked the idea of having > something like that, but couldn't perhaps agree on the final interface. > Most if not all of the preliminary patches in that thread were > committed, but the user interface portions were then abandoned in favor > of other work. (I'm aware that rebased versions of pg_background > existing. I have one, too.) I kind of dislike this approach for a different reason than already mentioned in this thread: namely, it's not re-usable for implementing streaming logical replication of large transactions, i.e. allowing un-committed changes to be decoded & applied. The issue there is that one really needs to have multiple transactions active in the same connection, which this approach does not allow. That's not necessarily a fatal objection, but I did want to throw that out there as a design goal. - Andres
On 8 Sep. 2016 1:38 pm, "Andres Freund" <andres@anarazel.de> wrote:
>
> I kind of dislike this approach for a different reason than already
> mentioned in this thread: namely, it's not re-usable for implementing
> streaming logical replication of large transactions, i.e. allowing
> un-committed changes to be decoded & applied. The issue there is that
> one really needs to have multiple transactions active in the same
> connection, which this approach does not allow.

I've been thinking about this recently with an eye to handling the majority of transactions on a typical system that neither perform DDL nor are concurrent with it.

The following might be fairly nonsensical if I've misunderstood the problem, as I've not had a chance to more than glance at it, but:

I wonder if some kind of speculative concurrent decoding+output is possible without being able to have multiple xacts on a backend. We could use a shared xact and snapshot for all concurrent xacts. If any of them do anything that requires invalidations etc. we dump the speculatively processed ones and fall back to reorder buffering and serial output at commit time, until we pass the xact that caused an invalidation and don't have more pending. Add a callback to the output plugin to tell it that speculative decoding of an xact has been aborted.

Or maybe even just put that speculative decoding on hold and pick up where we left off partway through the reorder buffer when we get around to processing it serially. That way we don't have to discard work already done. More usefully we could avoid having to enqueue stuff into the reorder buffer just in case we have to switch to fallback serial decoding.

Failing that:

Since logical decoding only needs read-only xacts that roll back, it only needs multiple backend-private virtualxacts. They don't need SERIALIZABLE/SSI, which I think (?) means other backends don't need to be able to wait on them. That seems simpler than what autonomous xacts would need, since there we need them exposed in PgXact, waitable-on for txid locks, etc. Right? In the same way that historical snapshots are cut down, maybe logical decoding xacts can be too.

I suspect that waiting for full in-process multiple-xact support to do interleaved decoding/replay will mean a long wait indeed.
On Wed, Sep 7, 2016 at 9:14 PM, Craig Ringer <craig.ringer@2ndquadrant.com> wrote: > That's probably going to be one of the smaller costs. Doing this with > bgworkers won't be cheap, but you need to consider the alternative. > Factoring out all transaction-specific data currently stored in or > pointed to by globals into a transaction state struct that can be > swapped out. Making PgXact and PGPROC capable of representing > multiple/suspended transactions. Lots more. Much of which would have a > performance impact on all day to day operations whether or not > autononomous xacts are actually in use. > > I haven't looked into it in detail. Peter can probably explain more > and better. I'm just pointing out that I doubt there's any way to do > this without a cost somewhere, and having that cost limited to actual > uses of autonomous xacts would be nice. I don't really believe this line of argument. I mean, sure, it's nice to limit the cost of features to the people who are using those features. Totally agreed. But if the cost really wouldn't be that high anyway, which I suspect is the case here, then that argument loses its force. And if that separation increases the performance cost of the feature by two or three orders of magnitude in realistic use cases, then you really have to wonder if you've picked the right approach. Again, I'm not saying that having ways to run commands in the background in a worker isn't useful. If I thought that wasn't useful, I would not have written pg_background and developed it as far as I did. But I still don't think that's the right way to implement an autonomous transaction facility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 13/09/16 22:24, Robert Haas wrote: > On Wed, Sep 7, 2016 at 9:14 PM, Craig Ringer > <craig.ringer@2ndquadrant.com> wrote: >> That's probably going to be one of the smaller costs. Doing this with >> bgworkers won't be cheap, but you need to consider the alternative. >> Factoring out all transaction-specific data currently stored in or >> pointed to by globals into a transaction state struct that can be >> swapped out. Making PgXact and PGPROC capable of representing >> multiple/suspended transactions. Lots more. Much of which would have a >> performance impact on all day to day operations whether or not >> autononomous xacts are actually in use. >> >> I haven't looked into it in detail. Peter can probably explain more >> and better. I'm just pointing out that I doubt there's any way to do >> this without a cost somewhere, and having that cost limited to actual >> uses of autonomous xacts would be nice. > > I don't really believe this line of argument. I mean, sure, it's nice > to limit the cost of features to the people who are using those > features. Totally agreed. But if the cost really wouldn't be that > high anyway, which I suspect is the case here, then that argument > loses its force. And if that separation increases the performance > cost of the feature by two or three orders of magnitude in realistic > use cases, then you really have to wonder if you've picked the right > approach. > > Again, I'm not saying that having ways to run commands in the > background in a worker isn't useful. If I thought that wasn't useful, > I would not have written pg_background and developed it as far as I > did. But I still don't think that's the right way to implement an > autonomous transaction facility. > I mostly agree. I think if this was called something like background transactions it might be better. It's definitely useful functionality but the naming is clearly contentious. 
It won't stop people using it for same use-cases as autonomous transactions though (which is fine). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 13, 2016 at 5:06 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > I mostly agree. I think if this was called something like background > transactions it might be better. It's definitely useful functionality but > the naming is clearly contentious. It won't stop people using it for same > use-cases as autonomous transactions though (which is fine). Quite right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8/31/16 12:38 AM, Jaime Casanova wrote: > On 30 August 2016 at 20:50, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> >> - Patches to PL/pgSQL to implement Oracle-style autonomous transaction >> blocks: >> >> AS $$ >> DECLARE >> PRAGMA AUTONOMOUS_TRANSACTION; >> BEGIN >> FOR i IN 0..9 LOOP >> START TRANSACTION; >> INSERT INTO test1 VALUES (i); >> IF i % 2 = 0 THEN >> COMMIT; >> ELSE >> ROLLBACK; >> END IF; >> END LOOP; >> >> RETURN 42; >> END; >> $$; >> > > this is the syntax it will use? That is the syntax that Oracle uses. We could make up our own. > i just compiled this in head and created a function based on this one. > The main difference is that the column in test1 it's a pk so i used > INSERT ON CONFLICT DO NOTHING > > and i'm getting this error > > postgres=# select foo(); > LOG: namespace item variable itemno 1, name val > CONTEXT: PL/pgSQL function foo() line 7 at SQL statement > STATEMENT: select foo(); > ERROR: null value in column "i" violates not-null constraint > DETAIL: Failing row contains (null). > STATEMENT: INSERT INTO test1 VALUES (val) ON CONFLICT DO NOTHING > ERROR: null value in column "i" violates not-null constraint > DETAIL: Failing row contains (null). > CONTEXT: PL/pgSQL function foo() line 7 at SQL statement > STATEMENT: select foo(); > ERROR: null value in column "i" violates not-null constraint > DETAIL: Failing row contains (null). > CONTEXT: PL/pgSQL function foo() line 7 at SQL statement > > this happens even everytime i use the PRAGMA even if no START > TRANSACTION, COMMIT or ROLLBACK are used The PL/pgSQL part doesn't work well yet. If you want to play around, use the PL/Python integration. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
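For anyone experimenting with the PL/Python integration instead: the intended semantics of the loop in the example above are that even iterations commit and odd ones roll back, so only the even values survive. A plain-Python mock of that commit/rollback bookkeeping (the MockSession class is invented purely for illustration; the real interface is plpy.autonomous() with a.execute()):

```python
class MockSession:
    """Toy stand-in for an autonomous session: tracks committed rows only."""
    def __init__(self):
        self.committed = []   # rows made permanent by COMMIT
        self.pending = []     # rows inserted in the current transaction

    def insert(self, value):
        self.pending.append(value)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []

# Mirrors the loop from the proposal: even i commits, odd i rolls back.
a = MockSession()
for i in range(0, 10):
    a.insert(i)
    if i % 2 == 0:
        a.commit()
    else:
        a.rollback()

print(a.committed)  # [0, 2, 4, 6, 8]
```

So after the function returns, SELECT on test1 should show only the even values, regardless of what later happens to the calling transaction.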
On 8/31/16 8:46 AM, Greg Stark wrote: > I'm surprised by the background worker. I had envisioned autonomous > transactions being implemented by allowing a process to reserve a > second entry in PGPROC with the same pid. Or perhaps save its existing > information in a separate pgproc slot (or stack of slots) and > restoring it after the autonomous transaction commits. There is quite likely room for a feature like that too, but it's not this one. > Using a background worker means that the autonomous transaction can't > access any state from the process memory. That is intentional. "Autonomous transactions" is actually a misnomer. It's autonomous sessions. But Oracle names it wrong. Autonomous sessions (or whatever you want to call them) have their own session state, configuration parameters, potentially different plugins loaded, different authorization state, and so on. > What happens if a > statement timeout occurs during an autonomous transaction? Right now not much. ;-) But I think the behavior should be that the autonomous session is aborted. > What > happens if you use a pl language in the autonomous transaction and > it tries to use non-transactional information such as prepared > statements? The same thing that happens if you open a new unrelated database connection and try to do that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 8/31/16 9:11 AM, Craig Ringer wrote: > Peter, you mention "Oracle-style autonomous transaction blocks". > > What are the semantics to be expected of those with regards to: > > - Accessing objects exclusively locked by the outer xact or where the > requested lockmode conflicts with a lock held by the outer xact > > - Visibility of data written by the outer xact The semantics are that the autonomous session is completely isolated from the parent session. It has no special knowledge about transactions happening on the parent. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
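One consequence of that isolation is worth spelling out: if the parent session holds a lock and then synchronously waits on an autonomous session that tries to take the same lock, neither side can proceed, and because each backend's wait looks ordinary in isolation, nothing obviously cycles. A toy model of that situation using Python threads (threading.Lock stands in for a row lock, the acquire timeout stands in for something like lock_timeout; all names are invented for illustration):

```python
import threading

# The parent "session" holds a row lock; the autonomous "session" is a
# fully separate actor that knows nothing about its parent.
row_lock = threading.Lock()

def autonomous_session(result):
    # Completely isolated: it cannot see that its own parent holds the lock,
    # so it simply blocks until the (stand-in) lock timeout expires.
    acquired = row_lock.acquire(timeout=0.1)
    result.append(acquired)
    if acquired:
        row_lock.release()

row_lock.acquire()        # parent transaction locks the row...
result = []
t = threading.Thread(target=autonomous_session, args=(result,))
t.start()
t.join()                  # ...and synchronously waits for the child
row_lock.release()

print(result)  # [False]: the child waited on its own parent and timed out
```

This matches the earlier observations in the thread that prior AT implementations fully expect autonomous transactions to deadlock against their parents; the isolation is a feature, but callers have to account for it.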
On 9/3/16 9:08 AM, Serge Rielau wrote: > Interestingly, despite being supported in PL/SQL on nested BEGIN END blocks, > we nearly exclusively see AT’s covering the entire function or trigger. Oracle only supports an entire PL/SQL function to be declared autonomous. But it was pretty easy in my implementation to allow that for arbitrary blocks. In fact, it would have been harder not to do that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 7 September 2016 at 20:46, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Sep 3, 2016 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 2 September 2016 at 09:45, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Wed, Aug 31, 2016 at 7:20 AM, Peter Eisentraut >>> <peter.eisentraut@2ndquadrant.com> wrote: >>>> I would like to propose the attached patch implementing autonomous >>>> transactions for discussion and review. >>> >>> I'm pretty skeptical of this approach. Like Greg Stark, Serge Rielau, >>> and Constantin Pan, I had expected that autonomous transactions should >>> be implemented inside of a single backend, without relying on workers. >> >> The problem is that we have limits on the number of concurrent >> transactions, which is currently limited by the number of procs. > > True. I believe this issue has been discussed before, and I think > it's a solvable issue. I believe that an autonomous transaction could > be made to appear to the rest of the system (outside the affected > backend) as if it were a subtransaction, but then get committed before > the containing transaction gets committed. This would avoid needing > more PGPROCs (but getting deadlock detection to work is hard). Just to point out that I've actually written this approach already. The patch is available, Autonomous Subtransactions. We discussed it in Ottawa and it was rejected. (I thought Robert was there, but Serge and Tom definitely were). See other posts in this thread by Serge and Craig to explain why. I take your point that startup time may be not as good as it can be. But what Peter has works and is useful, whatever we call it. We have various approaches... summarised in chronological order of their suggestion 1. Use additional PGXACTs - rejected because it wouldn't provide enough room 2. Use Autonomous SubTransactions - rejected because the semantics are different to what we might expect from ATs 3. Use background transactions (this thread) 4. 
Use pause and resume so we don't use up too many PGXACTs What I see is that there are some valid and useful features here and we should be developing them further because they have other benefits. * Autonomous Subtransactions might not work for Autonomous Transactions, but they are useful for incremental loading of large data, which then allows easier logical decoding. So that sounds like we should do that anyway, even if they are not ATs. They can also be used within parallel tasks. * Background Transactions are useful and we should have them. * The labelling "Autonomous Transaction" is a simple coat of paint, which can easily be transferred to a better implementation if one comes. If one doesn't then its better to have something than nothing. So I suggest we commit Background Transactions first and then in a fairly thin commit, implement Autonomous Transactions on top of it for now and if we get a better one, switch it over. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/10/16 11:56, Simon Riggs wrote: > > * The labelling "Autonomous Transaction" is a simple coat of paint, > which can easily be transferred to a better implementation if one > comes. If one doesn't then its better to have something than nothing. > So I suggest we commit Background Transactions first and then in a > fairly thin commit, implement Autonomous Transactions on top of it for > now and if we get a better one, switch it over. > +1 to this -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 6, 2016 at 5:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Just to point out that I've actually written this approach already.
> The patch is available, Autonomous Subtransactions.
> We discussed it in Ottawa and it was rejected. (I thought Robert was
> there, but Serge and Tom definitely were).

Where is the patch?

> See other posts in this thread by Serge and Craig to explain why.

I don't think the posts by Craig and Serge explain why that approach was rejected or would be a bad idea.

> We have various approaches... summarised in chronological order of
> their suggestion
>
> 1. Use additional PGXACTs - rejected because it wouldn't provide enough room

Of course, a background worker uses a PGXACT too and a lot more, so if you think extra PGXACTs are bad, you should *really* think background workers are bad.

> 2. Use Autonomous SubTransactions - rejected because the semantics are
> different to what we might expect from ATs

In what way? I think the questions of how you implement it and what the semantics are are largely orthogonal. To which proposal is this referring?

> 3. Use background transactions (this thread)

Sure.

> 4. Use pause and resume so we don't use up too many PGXACTs

I don't know what "pause and resume" means.

> * The labelling "Autonomous Transaction" is a simple coat of paint,
> which can easily be transferred to a better implementation if one
> comes. If one doesn't then its better to have something than nothing.
> So I suggest we commit Background Transactions first and then in a
> fairly thin commit, implement Autonomous Transactions on top of it for
> now and if we get a better one, switch it over.

I think we should implement background transactions and call them background transactions. That allows us to expose additional functionality which is useful, like the ability to kick something off and check back later for the results. There's no reason to call it background transactions and also call it autonomous transactions: one feature doesn't need two names.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote: >> * The labelling "Autonomous Transaction" is a simple coat of paint, >> which can easily be transferred to a better implementation if one >> comes. If one doesn't then its better to have something than nothing. >> So I suggest we commit Background Transactions first and then in a >> fairly thin commit, implement Autonomous Transactions on top of it for >> now and if we get a better one, switch it over. > > I think we should implement background transactions and call them > background transactions. That allows us to expose additional > functionality which is useful, like the ability to kick something off > and check back later for the results. There's no reason to call it > background transactions and also call it autonomous transactions: one > feature doesn't need two names. For myself, I don't care what you call it. I just want to be able to invoke it by saying PRAGMA AUTONOMOUS_TRANSACTION; and I know many others do also. If a better implementation emerges I would happily replace this one with another. I'm happy to also invoke it via an alternate mechanism or API, so that it can continue to be used even if the above mechanism changes. We have no need to wait for the perfect solution, even assuming we would ever agree that just one exists. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Oct 6, 2016 at 4:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > For myself, I don't care what you call it. > > I just want to be able to invoke it by saying PRAGMA AUTONOMOUS_TRANSACTION; > and I know many others do also. To me, those statements are rather contradictory, and I don't support the latter proposal. However, I don't have the only vote here, either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-10-06 10:56:34 +0100, Simon Riggs wrote:
> On 7 September 2016 at 20:46, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sat, Sep 3, 2016 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > True. I believe this issue has been discussed before, and I think
> > it's a solvable issue. I believe that an autonomous transaction could
> > be made to appear to the rest of the system (outside the affected
> > backend) as if it were a subtransaction, but then get committed before
> > the containing transaction gets committed. This would avoid needing
> > more PGPROCs (but getting deadlock detection to work is hard).
>
> Just to point out that I've actually written this approach already.
> The patch is available, Autonomous Subtransactions.
> We discussed it in Ottawa and it was rejected. (I thought Robert was
> there, but Serge and Tom definitely were).

Hm, maybe I'm missing something, but don't pretty much all of the objections to that approach also apply to this one, barring some additional changes required for lock conflict detection?

> We have various approaches... summarised in chronological order of
> their suggestion
>
> 1. Use additional PGXACTs - rejected because it wouldn't provide enough room

Doesn't this approach precisely require additional PGXACTs, in the form of additional background workers? Plus a full process overhead?

> 2. Use Autonomous SubTransactions - rejected because the semantics are
> different to what we might expect from ATs

Which semantics are fundamentally different? Snapshot visibility seems fairly trivial to adapt, and so does locking. It even seems slightly easier to give a good error message about deadlocking against the main transaction in this approach.

Greetings,

Andres Freund
On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think we should implement background transactions and call them
>> background transactions. That allows us to expose additional
>> functionality which is useful, like the ability to kick something off
>> and check back later for the results. There's no reason to call it
>> background transactions and also call it autonomous transactions: one
>> feature doesn't need two names.
>
> I'm happy to also invoke it via an alternate mechanism or API, so that
> it can continue to be used even if the above mechanism changes.
>
> We have no need to wait for the perfect solution, even assuming we
> would ever agree that just one exists.

-1 on implementing both autonomous and background transactions. This will confuse everyone.

The lingo here is not so important, I think. What *is* important is defining how the locking and execution rules should work; the implementation should flow from that. Those rules should be evaluated against competing implementations and how well they work. +1 for any solution that makes migration from other solutions to postgres easier.

bgworkers should be considered if you want things to run in parallel. Reading the proposal (note, I may have missed it), it isn't clear to me if you can have the parent and the AT run a query at the same time. Should this (parallel execution) be a design goal, then that's the end of the story. However, I don't think it is, TBH. ISTM the expectation is single-threaded behavior with finer-grained control of commits. If we're not 100% clear on this point one way or the other, then things are a bit premature. Maybe we are clear and I missed something?

One major advantage of the non-bgworker, serialized-execution approach is that certain classes of deadlock are easier to detect or do not exist, since there is only one execution state; AIUI it's impossible for two transaction states to be simultaneously waiting (assuming the pl/pgsql instructions are not run in parallel), with one exception: the AT trying to acquire a lock exclusively held by the master. If the AT blocks on the parent, it ought to be O(1) and instant to detect that and roll it back, with the right supporting infrastructure in the lock manager. It also makes sharing execution state much easier, especially the parts that look like "I'm waiting here until the other guy finishes", since there's only one "guy".

How will advisory locks work? I think they'd block with bgworkers and not block with non-bgworkers. What about other session-based stuff like prepared statements? Expectations around those cases ought to clarify the implementation.

merlin
On 10/10/16 16:44, Merlin Moncure wrote: > On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote: >>> I think we should implement background transactions and call them >>> background transactions. That allows us to expose additional >>> functionality which is useful, like the ability to kick something off >>> and check back later for the results. There's no reason to call it >>> background transactions and also call it autonomous transactions: one >>> feature doesn't need two names. >> >> I'm happy to also invoke it via an alternate mechanism or API, so that >> it can continue to be used even if the above mechanism changes. >> >> We have no need to wait for the perfect solution, even assuming we >> would ever agree that just one exists. > > -1 on implementing both autonomous and background transactions. This > will confuse everyone. > I personally care much more about having background transactions than autonomous ones (as I only ever had use-cases for the background ones) so don't agree there. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
2016-10-11 17:06 GMT+02:00 Petr Jelinek <petr@2ndquadrant.com>:
> On 10/10/16 16:44, Merlin Moncure wrote:
>> On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> I think we should implement background transactions and call them
>>>> background transactions. That allows us to expose additional
>>>> functionality which is useful, like the ability to kick something off
>>>> and check back later for the results. There's no reason to call it
>>>> background transactions and also call it autonomous transactions: one
>>>> feature doesn't need two names.
>>>
>>> I'm happy to also invoke it via an alternate mechanism or API, so that
>>> it can continue to be used even if the above mechanism changes.
>>>
>>> We have no need to wait for the perfect solution, even assuming we
>>> would ever agree that just one exists.
>>
>> -1 on implementing both autonomous and background transactions. This
>> will confuse everyone.
>
> I personally care much more about having background transactions than
> autonomous ones (as I only ever had use-cases for the background ones)
> so don't agree there.

We can, and we should, have both - background can be used for parallelism, autonomous for logging. They are not 100% replaceable.

Regards

Pavel

> --
> Petr Jelinek http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 11, 2016 at 10:06 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> On 10/10/16 16:44, Merlin Moncure wrote:
>> On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> I think we should implement background transactions and call them
>>>> background transactions. That allows us to expose additional
>>>> functionality which is useful, like the ability to kick something off
>>>> and check back later for the results. There's no reason to call it
>>>> background transactions and also call it autonomous transactions: one
>>>> feature doesn't need two names.
>>>
>>> I'm happy to also invoke it via an alternate mechanism or API, so that
>>> it can continue to be used even if the above mechanism changes.
>>>
>>> We have no need to wait for the perfect solution, even assuming we
>>> would ever agree that just one exists.
>>
>> -1 on implementing both autonomous and background transactions. This
>> will confuse everyone.
>
> I personally care much more about having background transactions than
> autonomous ones (as I only ever had use-cases for the background ones)
> so don't agree there.

All right. But would you agree then that AT should at least emulate competing implementations? A major advantage of bgworkers is possibly supporting concurrent activity, and maybe the syntax could be directed toward moving that way rather than copying Oracle style (PRAGMA etc.), particularly if the locking rules are substantially different.

merlin
2016-10-11 21:54 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
> On Tue, Oct 11, 2016 at 10:06 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>> On 10/10/16 16:44, Merlin Moncure wrote:
>>> On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> I think we should implement background transactions and call them
>>>>> background transactions. That allows us to expose additional
>>>>> functionality which is useful, like the ability to kick something off
>>>>> and check back later for the results. There's no reason to call it
>>>>> background transactions and also call it autonomous transactions: one
>>>>> feature doesn't need two names.
>>>>
>>>> I'm happy to also invoke it via an alternate mechanism or API, so that
>>>> it can continue to be used even if the above mechanism changes.
>>>>
>>>> We have no need to wait for the perfect solution, even assuming we
>>>> would ever agree that just one exists.
>>>
>>> -1 on implementing both autonomous and background transactions. This
>>> will confuse everyone.
>>
>> I personally care much more about having background transactions than
>> autonomous ones (as I only ever had use-cases for the background ones)
>> so don't agree there.
>
> All right. But would you agree then that AT should at least emulate
> competing implementations? A major advantage of bgworkers is possibly
> supporting concurrent activity and maybe the syntax could be more
> directed to possibly moving in that direction other than copying
> oracle style (PRAGMA etc), particularly if the locking rules are
> substantially different.

There is a big trap - AT is usually used for writing to log tables. When a BT fails because the maximum number of active workers has been reached, you cannot do the expected operation.

Regards

Pavel

> merlin
On 11/10/16 21:54, Merlin Moncure wrote: > On Tue, Oct 11, 2016 at 10:06 AM, Petr Jelinek <petr@2ndquadrant.com> wrote: >> On 10/10/16 16:44, Merlin Moncure wrote: >>> On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> On 6 October 2016 at 21:27, Robert Haas <robertmhaas@gmail.com> wrote: >>>>> I think we should implement background transactions and call them >>>>> background transactions. That allows us to expose additional >>>>> functionality which is useful, like the ability to kick something off >>>>> and check back later for the results. There's no reason to call it >>>>> background transactions and also call it autonomous transactions: one >>>>> feature doesn't need two names. >>>> >>>> I'm happy to also invoke it via an alternate mechanism or API, so that >>>> it can continue to be used even if the above mechanism changes. >>>> >>>> We have no need to wait for the perfect solution, even assuming we >>>> would ever agree that just one exists. >>> >>> -1 on implementing both autonomous and background transactions. This >>> will confuse everyone. >> >> I personally care much more about having background transactions than >> autonomous ones (as I only ever had use-cases for the background ones) >> so don't agree there. > > All right. But would you agree then that AT should at least emulate > competing implementations? A major advantage of bgworkers is possibly > supporting concurrent activity and maybe the syntax could be more > directed to possibly moving in that direction other than copying > oracle style (PRAGMA etc), particularly if the locking rules are > substantially different. > Yes, I am just saying we should have both. I don't feel like I can judge if background transactions solve autonomous transactions use-cases so I am not expressing opinion there. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Here is a patch for the background sessions C API and PL/Python support. This was previously submitted as "autonomous transactions", which proved controversial, and there have been several suggestions for a new name. I have renamed everything, removed all the incomplete PL/pgSQL stuff, did some refinement on the PL/Python interfaces, added resource owner management so that you can preserve session handles across transactions. That allows a pg_background-like behavior implemented in a PL function. I have also added documentation, so reviewers could start there. Probably not quite all done yet, but I think it contains a lot of useful pieces that we could make into something nice. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
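To give a feel for what the renamed interface enables, here is a plain-Python mock of the session lifecycle Peter describes: start a session, run statements that commit or roll back independently of the caller, keep the handle alive, and end it. This is not the patch's actual code; `BackgroundSession`, `execute`, and `close` are stand-ins for whatever the C API and its PL/Python wrapper finally expose (the real thing speaks the wire protocol to a bgworker).

```python
# Toy model of a background session: statements run "elsewhere" and
# commit/roll back independently of the caller's transaction.
class BackgroundSession:
    def __init__(self):
        self.committed = []      # rows visible after COMMIT
        self._pending = []       # rows written in the open transaction

    def execute(self, stmt, arg=None):
        # Dispatcher standing in for sending statements over the
        # client/server protocol to the worker.
        if stmt == "BEGIN":
            self._pending = []
        elif stmt == "INSERT":
            self._pending.append(arg)
        elif stmt == "COMMIT":
            self.committed.extend(self._pending)
            self._pending = []
        elif stmt == "ROLLBACK":
            self._pending = []

    def close(self):
        # The patch ties the handle to a resource owner, so it can
        # survive across the caller's transactions until closed.
        self._pending = []

session = BackgroundSession()
for i in range(10):
    session.execute("BEGIN")
    session.execute("INSERT", i)
    # Commit evens, roll back odds -- regardless of what later happens
    # to the caller's own transaction.
    session.execute("COMMIT" if i % 2 == 0 else "ROLLBACK")
session.close()
print(session.committed)  # -> [0, 2, 4, 6, 8]
```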
On Tue, Nov 1, 2016 at 2:25 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> Here is a patch for the background sessions C API and PL/Python support.
>
> This was previously submitted as "autonomous transactions", which
> proved controversial, and there have been several suggestions for a new
> name.
>
> I have renamed everything, removed all the incomplete PL/pgSQL stuff,
> did some refinement on the PL/Python interfaces, added resource owner
> management so that you can preserve session handles across transactions.
> That allows a pg_background-like behavior implemented in a PL function.
>
> I have also added documentation, so reviewers could start there.
>
> Probably not quite all done yet, but I think it contains a lot of useful
> pieces that we could make into something nice.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia
Hi! I'm planning to review this patch. I have some questions; maybe the answers will help me understand the patch better.

1. As far as I can see, we cannot use COPY FROM STDIN in a bg session? Since one of the purposes is to orchestrate transactions, maybe that would be valuable.

2. From my point of view, the interface of the feature is the strong point here (compared to pg_background). But it does not help the programmer follow good practice: a bgworker is a valuable and limited resource, so maybe we could start the session with something like TryBeginSession()? Maybe this violates some policies or idioms which I'm not aware of.

Thank you for your work.

Best regards, Andrey Borodin.
On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin <borodin@octonica.com> wrote: > 1. As far as I can see, we connot use COPY FROM STDIN in bg session? > Since one of purposes is to orchestrate transactions, may be that > would be valuable. A background worker has no client connection, so what would COPY FROM STDIN do? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12 Dec. 2016 21:55, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin <borodin@octonica.com> wrote:
>> 1. As far as I can see, we cannot use COPY FROM STDIN in bg session?
>> Since one of purposes is to orchestrate transactions, may be that
>> would be valuable.
>
> A background worker has no client connection, so what would COPY FROM STDIN do?

It doesn't make sense. But a bgworker may well want to supply input to COPY. A COPY FROM CALLBACK or COPY FROM FILEDESC or whatever.

I have the feeling something like this came up on the logical replication thread. Logical rep needs to efficiently load data via bgworker.
On Mon, Dec 12, 2016 at 10:02 AM, Craig Ringer <craig.ringer@2ndquadrant.com> wrote:
> On 12 Dec. 2016 21:55, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Sun, Dec 11, 2016 at 5:38 AM, Andrew Borodin <borodin@octonica.com> wrote:
>> 1. As far as I can see, we connot use COPY FROM STDIN in bg session?
>> Since one of purposes is to orchestrate transactions, may be that
>> would be valuable.
>
> A background worker has no client connection, so what would COPY FROM STDIN
> do?
>
> It doesn't make sense. But a bgworker may well want to supply input to COPY.
> A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.

That's kinda weird, though. I mean, you don't need to go through all of the COPY code just to call heap_multi_insert() or whatever, do you? You can hand-roll whatever you need there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12 December 2016 at 23:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 12, 2016 at 10:02 AM, Craig Ringer
> <craig.ringer@2ndquadrant.com> wrote:
>> It doesn't make sense. But a bgworker may well want to supply input to COPY.
>> A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.
>
> That's kinda weird, though. I mean, you don't need to go through all
> of the COPY code just to call heap_multi_insert() or whatever, do you?
> You can hand-roll whatever you need there.

And fire triggers and constraint checks if necessary, update indexes, etc. But yeah.

The original idea with logical rep was to get COPY-format data from the upstream when initializing a table, and apply it via COPY in a bgworker. I think that's changed in favour of another approach in logical rep now, but thought it was worth mentioning as something it might make sense for someone to want to do.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/12/16 16:29, Robert Haas wrote:
> On Mon, Dec 12, 2016 at 10:02 AM, Craig Ringer
> <craig.ringer@2ndquadrant.com> wrote:
>> It doesn't make sense. But a bgworker may well want to supply input to COPY.
>> A COPY FROM CALLBACK of COPY FROM FILEDESC or whatever.
>
> That's kinda weird, though. I mean, you don't need to go through all
> of the COPY code just to call heap_multi_insert() or whatever, do you?
> You can hand-roll whatever you need there.

You do if the source of your data is already in COPY (or CSV) format. I have a similar use case in a logical replication follow-up patch that I plan to submit to the January CF, so maybe that will be interesting for this as well.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
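To make the "COPY FROM CALLBACK" idea above concrete, here is a minimal plain-Python sketch, not anything from the patch: the bgworker supplies chunks of COPY-format (here CSV) data through a callback instead of a client connection, and the server side parses the assembled stream and applies each row through the normal COPY path. `copy_from_callback` and its end-of-data convention are invented for illustration.

```python
import csv
import io

# Hypothetical callback-driven COPY: read_chunk() is the bgworker-side
# data source (e.g. table data streamed from an upstream node during
# logical-replication table sync); apply_row() stands in for the COPY
# machinery that fires triggers, checks constraints, updates indexes.
def copy_from_callback(apply_row, read_chunk):
    buf = ""
    while True:
        chunk = read_chunk()
        if chunk is None:          # callback signals end of data
            break
        buf += chunk
    for row in csv.reader(io.StringIO(buf)):
        apply_row(row)

# Chunks need not align with row boundaries, just like protocol messages.
chunks = iter(["1,one\n2,", "two\n3,three\n", None])
rows = []
copy_from_callback(rows.append, lambda: next(chunks))
print(rows)  # -> [['1', 'one'], ['2', 'two'], ['3', 'three']]
```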
On 12/11/16 5:38 AM, Andrew Borodin wrote: > 2. From my point of view the interface of the feature is the strong > point here (compared to pg_background). But it does not help > programmer to follow good practice: bgworker is a valuable and limited > resource, may be we could start session with something like > TryBeginSession()? What exactly would that do? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2016-12-14 20:45 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 12/11/16 5:38 AM, Andrew Borodin wrote:
>> 2. From my point of view the interface of the feature is the strong
>> point here (compared to pg_background). But it does not help
>> programmer to follow good practice: bgworker is a valuable and limited
>> resource, may be we could start session with something like
>> TryBeginSession()?
>
> What exactly would that do?

Return a status (success/failure) and a session object, if the function succeeded. If max_connections is exceeded, then (false, null). I'm not sure whether this idiom is common for Python.

Best regards, Andrey Borodin.
On 12/14/16 11:33 AM, Andrew Borodin wrote: > 2016-12-14 20:45 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>: >> On 12/11/16 5:38 AM, Andrew Borodin wrote: >>> 2. From my point of view the interface of the feature is the strong >>> point here (compared to pg_background). But it does not help >>> programmer to follow good practice: bgworker is a valuable and limited >>> resource, may be we could start session with something like >>> TryBeginSession()? >> >> What exactly would that do? > Return status (success\failure) and session object, if a function succeeded. > > If there is max_connections exceeded, then (false,null). > > I'm not sure whether this idiom is common for Python. You can catch PostgreSQL exceptions in PL/Python, so this can be handled in user code. Some better connection management or pooling can probably be built on top of the primitives later, I'd say. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
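The user-code approach Peter suggests can be sketched in plain Python. This is a mock, not the patch's API: `WorkerLimitError` and `start_background_session` are stand-ins for whatever error the real PL/Python wrapper raises when no bgworker slot is free, and the wrapper turns that exception into the (ok, session) pair Andrey asked TryBeginSession() to return.

```python
# Hypothetical stand-in for the real session-start call, which would
# fail when the background-worker limit is reached.
class WorkerLimitError(Exception):
    pass

MAX_WORKERS = 2
_active = []

def start_background_session():
    if len(_active) >= MAX_WORKERS:
        raise WorkerLimitError("too many background workers")
    handle = object()
    _active.append(handle)
    return handle

# TryBeginSession() implemented in user code via exception handling.
def try_begin_session():
    try:
        return True, start_background_session()
    except WorkerLimitError:
        return False, None

results = [try_begin_session()[0] for _ in range(3)]
print(results)  # -> [True, True, False]
```

The point is that no dedicated Try* primitive is needed: the caller catches the error and decides how to degrade.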
2016-12-15 0:30 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
>>>> TryBeginSession()?
>>>
>>> What exactly would that do?
>> Return status (success\failure) and session object, if a function succeeded.
>>
>> If there is max_connections exceeded, then (false,null).
>>
>> I'm not sure whether this idiom is common for Python.
>
> You can catch PostgreSQL exceptions in PL/Python, so this can be handled
> in user code.
>
> Some better connection management or pooling can probably be built on
> top of the primitives later, I'd say.

Agree, doing this in Python is the better option.

And one more thing... Can we have BackgroundSessionExecute() split into two parts: start the query and wait for results? It would allow pg_background to reuse bgsession's code.

Best regards, Andrey Borodin.
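The submit/wait split being requested can be modeled in a few lines of plain Python. This is only a sketch of the shape of the API: `session_submit`/`session_wait` are invented names, and a thread plus a queue stands in for the bgworker and its shared-memory result queue.

```python
import queue
import threading

# Hypothetical split of a blocking Execute() into two halves, so a
# caller like pg_background can launch a query and collect the result
# later instead of blocking for the whole round trip.
def session_submit(sql):
    result_q = queue.Queue(maxsize=1)

    def run():
        # Stand-in for the bgworker executing the query and writing
        # its result to the shared queue.
        result_q.put("result of: " + sql)

    threading.Thread(target=run, daemon=True).start()
    return result_q              # opaque handle returned to the caller

def session_wait(handle):
    return handle.get()          # blocks until the worker responds

h = session_submit("SELECT 1")
# ... the caller is free to do other work here ...
print(session_wait(h))  # -> result of: SELECT 1
```

A blocking BackgroundSessionExecute() then becomes just submit followed immediately by wait, which is what lets pg_background share the code.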
On Thu, Dec 15, 2016 at 12:24 PM, Andrew Borodin <borodin@octonica.com> wrote: > 2016-12-15 0:30 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>: >>>>> TryBeginSession()? >>>> >>>> What exactly would that do? >>> Return status (success\failure) and session object, if a function succeeded. >>> >>> If there is max_connections exceeded, then (false,null). >>> >>> I'm not sure whether this idiom is common for Python. >> >> You can catch PostgreSQL exceptions in PL/Python, so this can be handled >> in user code. >> >> Some better connection management or pooling can probably be built on >> top of the primitives later, I'd say. > > Agree, doing this in Python is the better option. > > And one more thing... Can we have BackgroundSessionExecute() splitted > into two parts: start query and wait for results? > It would allow pg_background to reuse bgsession's code. > +1 Regards, Amul
On 12/15/16 1:54 AM, Andrew Borodin wrote: > And one more thing... Can we have BackgroundSessionExecute() splitted > into two parts: start query and wait for results? > It would allow pg_background to reuse bgsession's code. Yes, I will look into that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2016-12-16 20:17 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>: >> And one more thing... Can we have BackgroundSessionExecute() splitted >> into two parts: start query and wait for results? >> It would allow pg_background to reuse bgsession's code. > > Yes, I will look into that. Thank you. I'm marking both patches as "Waiting for author", keeping in mind that pg_background is waiting for bgsessions. After updates I'll review these patches. Best regards, Andrey Borodin.
On 12/16/16 10:38 AM, Andrew Borodin wrote: > 2016-12-16 20:17 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>: >>> And one more thing... Can we have BackgroundSessionExecute() splitted >>> into two parts: start query and wait for results? >>> It would allow pg_background to reuse bgsession's code. >> >> Yes, I will look into that. > > Thank you. I'm marking both patches as "Waiting for author", keeping > in mind that pg_background is waiting for bgsessions. > After updates I'll review these patches. New patch, mainly with the function split as you requested above, not much else changed. For additional entertainment, I include patches that integrate background sessions into dblink. So dblink can open a connection to a background session, and then you can use the existing dblink functions to send queries, read results, etc. People use dblink to make self-connections to get autonomous subsessions, so this would directly address that use case. The 0001 patch is some prerequisite refactoring to remove an ugly macro mess, which is useful independent of this. 0002 is the actual patch. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Dec 30, 2016 at 3:48 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 12/16/16 10:38 AM, Andrew Borodin wrote:
>> Thank you. I'm marking both patches as "Waiting for author", keeping
>> in mind that pg_background is waiting for bgsessions.
>> After updates I'll review these patches.
>
> New patch, mainly with the function split as you requested above, not
> much else changed.

Thanks for your v2 patch, this is really helpful.

One more requirement for pg_background is that session, command_qh, response_qh and worker_handle should last longer than the current memory context; for that we might need to allocate these in TopMemoryContext. Please find attached a patch that makes this change in BackgroundSessionStart().

Do let me know if you have any other thoughts/suggestions, thank you.

Regards,
Amul
Attachment
On 1/3/17 1:26 AM, amul sul wrote: > One more requirement for pg_background is session, command_qh, > response_qh and worker_handle should be last longer than current > memory context, for that we might need to allocate these in > TopMemoryContext. Please find attach patch does the same change in > BackgroundSessionStart(). I had pondered this issue extensively. The standard coding convention in postgres is that the caller sets the memory context. See the dblink and plpython patches that make this happen in their own way. I agree it would make sense that you either pass in a memory context or always use TopMemoryContext. I'm open to these ideas, but they did not seem to match any existing usage. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-03 19:39 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 1/3/17 1:26 AM, amul sul wrote:
>> One more requirement for pg_background is session, command_qh,
>> response_qh and worker_handle should last longer than current
>> memory context, for that we might need to allocate these in
>> TopMemoryContext. Please find attached patch doing the same change in
>> BackgroundSessionStart().
> I had pondered this issue extensively. The standard coding convention
> in postgres is that the caller sets the memory context. See the dblink
> and plpython patches that make this happen in their own way.
> I agree it would make sense that you either pass in a memory context or
> always use TopMemoryContext. I'm open to these ideas, but they did not
> seem to match any existing usage.
+1
Please excuse me if I'm not getting something obvious, but it seems like the BackgroundSessionNew() caller in pg_background_launch() can pick up TopMemoryContext. I think that context setup should be done by the extension, not by the bg_session API.
Best regards, Andrey Borodin.
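The convention being debated above, where the caller rather than the API decides the allocation lifetime, can be illustrated with a loose analogy. This is not the patch's C code: plain Python lists stand in for PostgreSQL memory contexts, and `session_start` is a hypothetical stand-in for BackgroundSessionStart():

```python
# Analogy only: lists stand in for memory contexts; the API allocates
# into whatever context the caller supplies instead of hard-coding one.
top_memory_context = []  # lives for the whole "backend"


def session_start(context):
    """Allocate the session state into the caller-chosen context."""
    session = {"command_qh": object(), "response_qh": object()}
    context.append(session)
    return session


# A caller like pg_background, which needs the session to outlive the
# current (shorter-lived) context, passes the long-lived one:
current_context = []
long_lived = session_start(top_memory_context)
short_lived = session_start(current_context)

# When the current context is "reset", only short_lived goes away.
current_context.clear()
```

The point of the convention is that the same API serves both callers; nothing in `session_start` needs to know which lifetime was wanted.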
On Tue, Jan 3, 2017 at 11:36 PM, Andrew Borodin <borodin@octonica.com> wrote:
> +1
> Please excuse me if I'm not getting something obvious, but seems like
> BackgroundSessionNew() caller from pg_background_launch() can pick up
> TopMemoryCtx. I think that context setup should be done by extension, not by
> bg_session API.

Agreed, I will make these changes for pg_background.

One more question: can we modify BackgroundSessionStart()/the BackgroundSession struct to expose the background worker PID as well? I am asking because BackgroundSessionNew() only returns a session pointer, but pg_background_launch() needs this PID to pass to the user, where it serves as the session identifier when fetching results and/or executing further queries, as well as for sending a query cancellation signal to the background worker.

I understand this requirement may sound useless for now, because it only benefits the pg_background contrib module. Thoughts?

Thanks & Regards,
Amul
2017-01-04 10:23 GMT+05:00 amul sul <sulamul@gmail.com>:
> One more query, can we modify
> BackgroundSessionStart()/BackgroundSession struct to get background
> worker PID as well?

I think since a session always has a PID it's absolutely reasonable to return the PID.

> I can understand this requirement could sound useless for now,
> because it is only for the benefit of the pg_background contrib module.

As far as I can understand, BackgroundSession is not just a feature itself, it's the API. So the PID would benefit pg_background and all the API use cases we haven't implemented yet. I do not think that one PID in the structure will waste a huge amount of memory, cycles, dev time, readability of docs, clearness of API, etc. AFAIK the only reason against it would be if the PID is not always there.

Best regards, Andrey Borodin.
On Wed, Jan 4, 2017 at 2:57 PM, Andrew Borodin <borodin@octonica.com> wrote:
> I think since a session always has a PID it's absolutely reasonable to return the PID.
>
> As far as I can understand, BackgroundSession is not just a feature
> itself, it's the API. So the PID would benefit pg_background and all
> the API use cases we haven't implemented yet. I do not think that one PID in
> the structure will waste a huge amount of memory, cycles, dev time,
> readability of docs, clearness of API, etc. AFAIK the only reason against it
> would be if the PID is not always there.

+1, but to make BackgroundSession members accessible outside of bgsession.c, we might need to move the BackgroundSession definition to bgsession.h.

Regards,
Amul Sul
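The shape being asked for, a session handle that carries the worker's PID, can be sketched outside Postgres. This is a hedged model, not the patch's C API: a trivial subprocess stands in for the background worker, and `BackgroundSession` here is an illustrative Python class, not the real struct:

```python
import subprocess
import sys


class BackgroundSession:
    """Stand-in for the C-level BackgroundSession handle."""

    def __init__(self):
        # A trivial child process stands in for the background worker.
        self._proc = subprocess.Popen([sys.executable, "-c", "pass"])

    @property
    def pid(self):
        # Exposing the worker PID is what lets a caller like
        # pg_background hand the user an identifier for fetching
        # results later or sending a cancellation signal.
        return self._proc.pid

    def close(self):
        self._proc.wait()


session = BackgroundSession()
worker_pid = session.pid
session.close()
```

Whether this is done by exposing a struct member (moving the definition to bgsession.h) or by an accessor function is an API-style choice; the accessor keeps the rest of the struct private.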
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Here's a review of the Background Sessions v2 patch.

===Purpose===
The patch provides an API for controlling another backend. It also contains Python support for calling the C API.

===Overall status===
The patch applies cleanly to current HEAD. It contains a new test and passes the existing tests. It contains sufficient documentation. It contains 2 TODO items; I'm not sure it's OK to leave it so. Another patch from this commitfest (pg_background) is based on this patch.

===Suggestions===
I haven't found a way to safely acquire the status of a session (without the possibility of ereport(ERROR)).
I do not see how to pass massive data, except as one long char * of SQL. All values have to be formatted as text.
The BackgroundSessionStart() result does not contain a PID. This functionality is expected by pg_background (though it can be added separately by pg_background). I suppose this is done to prevent API users from accessing internals of the BackgroundSession structure, but some information has to be public anyway.
The bgsession.c code contains very few comments. I do not think that a switch inside a switch (see bgsession_worker_main()) is easy to follow.

===Conclusion===
There's always something to improve, but I think that this patch is ready for committer.

PS. I've read the dblink patches, but this review does not apply to them. I suppose the dblink refactoring would be useful and the added functionality is worthwhile, so I think those patches deserve a separate commitfest entry.

Best regards, Andrey Borodin.

The new status of this patch is: Ready for Committer
On Thu, Dec 29, 2016 at 5:18 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> For additional entertainment, I include patches that integrate
> background sessions into dblink. So dblink can open a connection to a
> background session, and then you can use the existing dblink functions
> to send queries, read results, etc. People use dblink to make
> self-connections to get autonomous subsessions, so this would directly
> address that use case. The 0001 patch is some prerequisite refactoring
> to remove an ugly macro mess, which is useful independent of this. 0002
> is the actual patch.

Would that constitute a complete replacement for pg_background?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/10/17 10:58 AM, Robert Haas wrote:
> Would that constitute a complete replacement for pg_background?

I think so.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-12 9:01 GMT+05:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 1/10/17 10:58 AM, Robert Haas wrote:
>> Would that constitute a complete replacement for pg_background?
>
> I think so.

That constitutes a replacement for the existing set of functionality. It's not certain whether new features for pg_background would be coherent with dblink's ideology. E.g., if one day we implement data exchange between two running queries for pg_background, it would be in conflict with dblink's ideology. I have no opinion on whether dblink or pg_background is the appropriate place for this functionality; just mentioning some thoughts.

BTW, can we have an automatic FDW for bgsessions? Sounds crazy to me, but technically it makes sense.

Best regards, Andrey Borodin.
On Sat, Jan 7, 2017 at 5:30 PM, Andrey Borodin <amborodin@acm.org> wrote:
> The new status of this patch is: Ready for Committer

I don't think that this thread has reached a conclusion yet. From what I can see the last patch does not apply, so I have moved the patch to the next CF as "waiting on author".

--
Michael
> For additional entertainment, I include patches that integrate
> background sessions into dblink. So dblink can open a connection to a
> background session, and then you can use the existing dblink functions
> to send queries, read results, etc. People use dblink to make
> self-connections to get autonomous subsessions, so this would directly
> address that use case. The 0001 patch is some prerequisite refactoring
> to remove an ugly macro mess, which is useful independent of this. 0002
> is the actual patch.

Updated patch, mainly with improved error handling and some tidying up.

Related to this is also the patch in <https://www.postgresql.org/message-id/d100f62a-0606-accc-693b-cdc6d16b9296@2ndquadrant.com> as a resource control mechanism.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2017-03-01 3:35 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> For additional entertainment, I include patches that integrate
> background sessions into dblink. So dblink can open a connection to a
> background session, and then you can use the existing dblink functions
> to send queries, read results, etc. People use dblink to make
> self-connections to get autonomous subsessions, so this would directly
> address that use case. The 0001 patch is some prerequisite refactoring
> to remove an ugly macro mess, which is useful independent of this. 0002
> is the actual patch.
Updated patch, mainly with improved error handling and some tidying up.
I am checking this patch. I have a few questions (I didn't find answers in the docs):

1. Will the background session process be closed automatically when the parent process is closed?

2. What timeouts apply to this process: statement timeout, idle-in-transaction timeout?

I see a significant risk of leaking sessions.

There could be more docs and examples in the plpython documentation. It will be the main interface for this feature, mainly regarding session processing.
Regards
Pavel
P.S. It is great functionality and the patch looks very good.
Related to this is also the patch in
<https://www.postgresql.org/message-id/d100f62a-0606-accc-693b-cdc6d16b9296@2ndquadrant.com>
as a resource control mechanism.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
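The session-leak concern raised above is exactly what a context-manager interface is meant to address: `__exit__` runs on both the normal and the error path, so the session is always closed. A minimal Python sketch of that shape (`FakeSession` and `background_session` are illustrative names modeled on the thread, not the real plpy API):

```python
class FakeSession:
    """Toy stand-in for a background session; records what happens."""

    def __init__(self):
        self.closed = False
        self.log = []

    def execute(self, sql):
        self.log.append(sql)

    def close(self):
        self.closed = True


class background_session:
    """Context manager guaranteeing close() even when the body raises."""

    def __enter__(self):
        self._s = FakeSession()
        return self._s

    def __exit__(self, exc_type, exc, tb):
        self._s.close()   # runs on normal exit AND on exceptions
        return False      # do not swallow the exception


# Normal path: session is closed after the block.
with background_session() as a:
    a.execute("BEGIN")
    a.execute("COMMIT")

# Error path: the session is still closed before the error propagates.
try:
    with background_session() as b:
        b.execute("BEGIN")
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

This covers leaks from errors in the controlling function; it does not, by itself, cover the parent backend being killed outright, which is the separate lifetime question discussed below.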
On 3/8/17 14:22, Pavel Stehule wrote: > 1. will be background session process closed automatically when parent > process is closed? If the communications queue goes away the process will eventually die. This is similar to how a backend process will eventually die if the client goes away. Some more testing would be good here. > 2. what timeouts are valid for this process - statement timeout, idle in > transaction timeout Those should work the same way. It's the same code that runs the queries, starts/stops transactions, etc. > I see significant risk on leaking sessions. Yeah, that's a valid concern. But I think it works ok. > There can be more doc and examples in plpython doc. It will be main > interface for this feature. Mainly about session processing. OK, I'll look into that again. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-09 14:52 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
> On 3/8/17 14:22, Pavel Stehule wrote:
>> 1. will be background session process closed automatically when parent
>> process is closed?
> If the communications queue goes away the process will eventually die.
> This is similar to how a backend process will eventually die if the
> client goes away. Some more testing would be good here.
What does "eventually die" mean?

I called pg_sleep() in the called subprocess.

Cancelling or terminating the parent process had no effect. It is maybe an artificial test.

A little more realistic test: waiting on a table lock in the background worker. When the parent was cancelled, the worker process was destroyed too.

But when the parent was terminated, the background worker process continued.

What is worse, the background worker was at 100% CPU and I had to restart my notebook.
CREATE OR REPLACE FUNCTION public.foo()
RETURNS void
LANGUAGE plpythonu
AS $function$
with plpy.BackgroundSession() as a:
    a.execute('update foo2 set a = 30')
    a.execute('insert into foo2 values(10)')
$function$
I blocked foo2 in another session.
Regards
Pavel
>> 2. what timeouts are valid for this process - statement timeout, idle in
>> transaction timeout
> Those should work the same way. It's the same code that runs the
> queries, starts/stops transactions, etc.
>> I see significant risk on leaking sessions.
> Yeah, that's a valid concern. But I think it works ok.
>> There can be more doc and examples in plpython doc. It will be main
>> interface for this feature. Mainly about session processing.
> OK, I'll look into that again.
On Sat, Mar 11, 2017 at 10:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> what means "eventually die"?
>
> I called pg_sleep() in called subprocess.
>
> Cancel, terminating parent process has not any effect. It is maybe
> artificial test.
>
> Little bit more realistic - waiting on table lock in background worker was
> successful - and when parent was cancelled, then worker process was
> destroyed too.
>
> But when parent was terminated, then background worker process continued.
>
> What is worse - the background worker had 100% CPU and I had to restart
> notebook.
>
> CREATE OR REPLACE FUNCTION public.foo()
> RETURNS void
> LANGUAGE plpythonu
> AS $function$
> with plpy.BackgroundSession() as a:
>     a.execute('update foo2 set a = 30')
>     a.execute('insert into foo2 values(10)')
> $function$
>
> I blocked foo2 in another session.

I'm not sure what's going on with this patch set, but in general a background process can't just go away when the foreground process goes away. We could arrange to kill it, a la pg_terminate_backend(), or we can let it keep running, and either of those things might be what somebody wants, depending on the situation. But it can't just vanish into thin air.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2017-03-13 21:22 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Sat, Mar 11, 2017 at 10:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2017-03-09 14:52 GMT+01:00 Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com>:
>>
>> On 3/8/17 14:22, Pavel Stehule wrote:
>> > 1. will be background session process closed automatically when parent
>> > process is closed?
>>
>> If the communications queue goes away the process will eventually die.
>> This is similar to how a backend process will eventually die if the
>> client goes away. Some more testing would be good here.
>
>
> what means "eventually die"?
>
> I called pg_sleep() in called subprocess.
>
> Cancel, terminating parent process has not any effect. It is maybe
> artificial test.
>
> Little bit more realistic - waiting on table lock in background worker was
> successful - and when parent was cancelled, then worker process was
> destroyed too.
>
> But when parent was terminated, then background worker process continued.
>
> What is worse - the background worker had 100% CPU and I had to restart
> notebook.
>
> CREATE OR REPLACE FUNCTION public.foo()
> RETURNS void
> LANGUAGE plpythonu
> AS $function$
> with plpy.BackgroundSession() as a:
> a.execute('update foo2 set a = 30')
> a.execute('insert into foo2 values(10)')
> $function$
> postgres=#
>
>
> I blocked foo2 in another session.
> I'm not sure what's going on with this patch set, but in general a
> background process can't just go away when the foreground process goes
> away. We could arrange to kill it, a la pg_terminate_backend(), or we
> can let it keep running, and either of those things might be what
> somebody wants, depending on the situation. But it can't just vanish
> into thin air.
I understand; so there is not only one solution, and process maintenance is hard.

A very common strategy is to recheck the parent process in waiting cycles. It should not impact performance.

I'm afraid that long waits in the background process are quite likely with this patch, and then it is likely that somebody will use pg_terminate_backend. That situation should not end in a server restart.
Regards
Pavel
On Tue, Mar 14, 2017 at 3:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> A very common strategy is to recheck the parent process in waiting
> cycles. It should not impact performance.

I think that's going to be hard to arrange, and I think it isn't necessary. If the leader wants to arrange for the worker to die when it exits, it can use TerminateBackgroundWorker() from a PG_ENSURE_ERROR_CLEANUP block or an on_shmem_exit hook.

> I'm afraid that long waits in the background process are quite likely with
> this patch, and then it is likely that somebody will use pg_terminate_backend.
> That situation should not end in a server restart.

I don't understand. The only way you'd need a server restart is if a background process wasn't responding to SIGTERM, and that's a bug independent of anything this patch does. It would be caused by the background process not doing CHECK_FOR_INTERRUPTS() or the moral equivalent regularly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2017-03-14 19:08 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Mar 14, 2017 at 3:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Very often strategy can be recheck of parent process in some waiting
>> cycles. It should not to impact performance.
> I think that's going to be hard to arrange, and I think it isn't
> necessary. If the leader wants to arrange for the worker to die when
> it exits, it can use TerminateBackgroundWorker() from a
> PG_ENSURE_ERROR_CLEANUP block or on_shmem_exit hook.
>> I afraid so some waiting times in bg process can be high probable with this
>> patch - and then is probable so somebody use pg_terminate_backend. This
>> situation should not to finish by server restart.
> I don't understand. The only way you'd need a server restart is if a
> background process wasn't responding to SIGTERM, and that's a bug
> independent of anything this patch does. It would be caused by the
> background process not doing CHECK_FOR_INTERRUPTS() or the moral
> equivalent regularly.
It is a bug, and I don't know whether it is this extension's bug or a general bug.

There is no adequate cleanup after killing.

How can pg_cancel_backend be implemented for a background process if there is no CHECK_FOR_INTERRUPTS?
Regards
Pavel
On Tue, Mar 14, 2017 at 4:54 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> I don't understand. The only way you'd need a server restart is if a >> background process wasn't responding to SIGTERM, and that's a bug >> independent of anything this patch does. It would be cause by the >> background process not doing CHECK_FOR_INTERRUPTS() or the moral >> equivalent regularly. > > It is bug, and I don't know if it s this extension bug or general bug. > > There is not adequate cleaning after killing. > > How can be implemented pg_cancel_backend on background process if there are > not CHECK_FOR_INTERRUPTS? You can't. But what does that have to do with this patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2017-03-15 0:44 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Mar 14, 2017 at 4:54 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> I don't understand. The only way you'd need a server restart is if a
>>> background process wasn't responding to SIGTERM, and that's a bug
>>> independent of anything this patch does. It would be caused by the
>>> background process not doing CHECK_FOR_INTERRUPTS() or the moral
>>> equivalent regularly.
>>
>> It is bug, and I don't know if it s this extension bug or general bug.
>>
>> There is not adequate cleaning after killing.
>>
>> How can be implemented pg_cancel_backend on background process if there are
>> not CHECK_FOR_INTERRUPTS?
> You can't. But what does that have to do with this patch?
I don't understand; CHECK_FOR_INTERRUPTS is called from the executor implicitly.
Pavel
On Wed, Mar 15, 2017 at 6:43 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I don't understand - CHECK_FOR_INTERRUPTS called from executor implicitly.

True. So there shouldn't be any problem here. I'm confused as can be about what you want changed.

Some review of the patch itself:

+ pq_redirect_to_shm_mq(session->seg, session->command_qh);
+ pq_beginmessage(&msg, 'X');
+ pq_endmessage(&msg);
+ pq_stop_redirect_to_shm_mq();

shm_redirect_to_shm_mq() wasn't really designed to be used this way; it's designed for use by the worker, not the process that launched it. If an error occurs while output is redirected, bad things will happen. I think it would be better to find a way of sending that message to the queue without doing this.

Also, I suspect this is deadlock-prone. If we get stuck trying to send a message to the background session while the queue is full, and at the same time the session is stuck trying to send us a long error message, we will have an undetected deadlock. That's why pg_background() puts the string being passed to the worker into the DSM segment in its entirety, rather than sending it through a shm_mq.

+ elog(ERROR, "no T before D");

That's not much of an error message, even for an elog.

+ elog(ERROR, "already received a T message");

Nor that.

None of these functions have function header comments. Or much in the way of internal comments. For example:

+ case '1':
+     break;
+ case 'E':
+     rethrow_errornotice(&msg);
+     break;

That's really not clear without more commentary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
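For context, the two elog() checks quoted above guard message ordering in the frontend/backend protocol: a DataRow ('D') message is only meaningful after a RowDescription ('T') has been seen. A small Python model of that client-side state machine (illustrative only, not the patch's code):

```python
# Model of the result-accumulation ordering rule behind the elog()
# checks: reject 'D' before 'T', and reject a second 'T'.
class ResultAccumulator:
    def __init__(self):
        self.rowdesc = None  # set by the RowDescription ('T') message
        self.rows = []       # filled by DataRow ('D') messages

    def handle(self, msgtype, payload):
        if msgtype == 'T':
            if self.rowdesc is not None:
                raise ValueError("already received a T message")
            self.rowdesc = payload
        elif msgtype == 'D':
            if self.rowdesc is None:
                raise ValueError("no T before D")
            self.rows.append(payload)


acc = ResultAccumulator()
acc.handle('T', ['a', 'b'])   # column names from RowDescription
acc.handle('D', (1, 'x'))     # a row, valid because 'T' came first

bad = ResultAccumulator()
try:
    bad.handle('D', (1,))     # protocol violation: 'D' before 'T'
    failed = False
except ValueError:
    failed = True
```

The review's complaint is not about the checks themselves but that the error strings ("no T before D") are too terse to help anyone reading the server log.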
On 15/03/17 17:58, Robert Haas wrote:
> shm_redirect_to_shm_mq() wasn't really designed to be used this way;
> it's designed for use by the worker, not the process that launched it.
> If an error occurs while output is redirected, bad things will happen.
> I think it would be better to find a way of sending that message to
> the queue without doing this.

Couldn't we just create a special version of pq_endmessage that sends to a shm_mq?

> Also, I suspect this is deadlock-prone. If we get stuck trying to
> send a message to the background session while the queue is full, and
> at the same time the session is stuck trying to send us a long error
> message, we will have an undetected deadlock. That's why
> pg_background() puts the string being passed to the worker into the
> DSM segment in its entirety, rather than sending it through a shm_mq.

Yeah, I think this will need to use nowait = true when sending to the shm_mq and chunk the message if necessary...

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 18, 2017 at 10:59 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> Couldn't we just create special version of pq_endmessage that sends to
> shm_mq?

Yes, I think that sounds good.

> Yeah I think this will need to use the nowait = true when sending to
> shm_mq and chunk the message if necessary...

Hmm, yeah. If you take breaks while sending to check for data that you need to receive, then you should be fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
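The hazard and the fix can be modeled outside Postgres: two peers connected by two bounded queues each try to send a message longer than the queue holds. Blocking sends from both sides would deadlock; non-blocking sends that fall back to draining the incoming queue always make progress. A Python simulation (queue.Queue standing in for shm_mq; this is an analogy, not the shm_mq API):

```python
# Each peer sends chunk-by-chunk with put_nowait. When its outgoing
# queue is Full, it drains its incoming queue instead of blocking --
# the "take breaks while sending to receive" pattern from the thread.
from queue import Queue, Full, Empty
import threading

CAP = 4  # deliberately much smaller than the message
msg = list("a long error message that does not fit in the queue")


def drain(in_q, received):
    # Pull everything currently readable, without blocking.
    try:
        while True:
            received.append(in_q.get_nowait())
    except Empty:
        pass


def peer(out_q, in_q, received):
    for ch in msg:
        while True:
            try:
                out_q.put_nowait(ch)   # never block on a full queue...
                break
            except Full:
                drain(in_q, received)  # ...make progress on reads instead
    while len(received) < len(msg):    # collect the peer's whole message
        drain(in_q, received)


a_to_b, b_to_a = Queue(CAP), Queue(CAP)
got_a, got_b = [], []

# Peer B runs in a thread; peer A in the main thread. With blocking
# puts, both sides would stall as soon as both queues filled up.
t = threading.Thread(target=peer, args=(b_to_a, a_to_b, got_b))
t.start()
peer(a_to_b, b_to_a, got_a)
t.join()
```

Both messages arrive intact because neither side ever waits on a full queue while ignoring its reader role, which is exactly the invariant the nowait-plus-chunking proposal preserves.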
On Sat, Mar 18, 2017 at 1:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm, yeah. If you take breaks while sending to check for data that
> you need to receive, then you should be fine.

So is this patch going anywhere?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/4/17 10:28, Robert Haas wrote:
> So is this patch going anywhere?

Not right now. It will take some time to sort out your feedback and do some refactoring. I will close the patch for now.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services