Thread: [RFC] Transaction management overhaul is necessary?

[RFC] Transaction management overhaul is necessary?

From
"Tsunakawa, Takayuki"
Date:
Hello,

From our experience in handling customers' problems, I feel it's necessary to evolve PostgreSQL's transaction
management. The concrete problems are:
 

1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java stored functions.
This is often the reason people could not migrate to PostgreSQL.


2. PostgreSQL does not support statement-level rollback.
When some customer ran a batch app using psqlODBC, one postgres process used dozens of GBs of memory and crashed the
OS. The batch app prepares some SQL statements with parameters, execute it five millions of times with different
parametervalues in a single transaction.  They didn't experience a problem with Oracle.
 

This was because psqlODBC starts and ends a subtransaction for each SQL statement by default to implement
statement-levelrollback.  And PostgreSQL creates one CurTransactionContext memory context, which is 8KB, for each
subtransactionand retain them until the top transaction ends.  The total memory used becomes 40GB (8KB * 5 million
subtransactions.) This was avoided by setting the Protocol parameter to 7.4-1, which means transaction-level rollback.
 

The savepoint approach for supporting statement-level rollback is inefficient, because it adds two roundtrips
(SAVEPOINTand RELEASE) for each statement.
 



I know autonomous transaction is also discussed, which seems to be difficult, so I hope some kind of transaction
managementoverhaul can be discussed to cover all these transaction-related features.  How should I start?  I found the
followingitem in the TODO list (but I haven't read it yet.)  What other discussions should I look at?
 

--------------------------------------------------
Implement stored procedures 
This might involve the control of transaction state and the return of multiple result sets 
PL/pgSQL stored procedure returning multiple result sets (SELECTs)? 
Proposal: real procedures again (8.4) 
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php 
Gathering specs and discussion on feature (post 9.1) 
--------------------------------------------------


Regards
Takayuki Tsunakawa




Re: [RFC] Transaction management overhaul is necessary?

From
Pavel Stehule
Date:


2016-10-21 10:24 GMT+02:00 Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com>:
Hello,

>From our experience in handling customers' problems, I feel it's necessary to evolve PostgreSQL's transaction management.  The concrete problems are:

1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java stored functions.
This is often the reason people could not migrate to PostgreSQL. 


2. PostgreSQL does not support statement-level rollback.
When some customer ran a batch app using psqlODBC, one postgres process used dozens of GBs of memory and crashed the OS.  The batch app prepares some SQL statements with parameters, execute it five millions of times with different parameter values in a single transaction.  They didn't experience a problem with Oracle.

This was because psqlODBC starts and ends a subtransaction for each SQL statement by default to implement statement-level rollback.  And PostgreSQL creates one CurTransactionContext memory context, which is 8KB, for each subtransaction and retain them until the top transaction ends.  The total memory used becomes 40GB (8KB * 5 million subtransactions.)  This was avoided by setting the Protocol parameter to 7.4-1, which means transaction-level rollback.

The savepoint approach for supporting statement-level rollback is inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for each statement.



I know autonomous transaction is also discussed, which seems to be difficult, so I hope some kind of transaction management overhaul can be discussed to cover all these transaction-related features.  How should I start?  I found the following item in the TODO list (but I haven't read it yet.)  What other discussions should I look at?

You should to implement a CALL statement - that can be independent on outer transaction. The behave inside procedure called by CALL statement should be same like client side - and there you can controll transactions explicitly without nesting.

Regards

Pavel
 

--------------------------------------------------
Implement stored procedures
This might involve the control of transaction state and the return of multiple result sets
PL/pgSQL stored procedure returning multiple result sets (SELECTs)?
Proposal: real procedures again (8.4)
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00542.php
Gathering specs and discussion on feature (post 9.1)
--------------------------------------------------


Regards
Takayuki Tsunakawa



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [RFC] Transaction management overhaul is necessary?

From
Craig Ringer
Date:
On 21 October 2016 at 18:57, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-10-21 10:24 GMT+02:00 Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com>:
>>
>> Hello,
>>
>> From our experience in handling customers' problems, I feel it's necessary
>> to evolve PostgreSQL's transaction management.  The concrete problems are:
>>
>> 1. PostgreSQL cannot end and begin transactions in PL/pgSQL and PL/Java
>> stored functions.
>> This is often the reason people could not migrate to PostgreSQL.

I've run into relatively few for whom this has landed up being a
showstopper, but I agree it's a pain.

There'll probably be more as bigger outfits seek to move from That
Other Database where it's routine to do this.

>> This was because psqlODBC starts and ends a subtransaction for each SQL
>> statement by default to implement statement-level rollback.  And PostgreSQL
>> creates one CurTransactionContext memory context, which is 8KB, for each
>> subtransaction and retain them until the top transaction ends.

Surely that's where to start then. Find a way to pool and re-use,
fully release, or otherwise be done with transaction contexts for
released savepoints.

>>  The total
>> memory used becomes 40GB (8KB * 5 million subtransactions.)  This was
>> avoided by setting the Protocol parameter to 7.4-1, which means
>> transaction-level rollback.

You can control transaction level rollback in psqlODBC directly. You
do not need to fall back to the old protocol. Check the driver
options.

>> The savepoint approach for supporting statement-level rollback is
>> inefficient, because it adds two roundtrips (SAVEPOINT and RELEASE) for each
>> statement.

Right. We can't just fire off each statement wrapped in SAVEPOINT and
RELEASE SAVEPOINT because we need to get the result of the statement
and decide whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It
only requires two round trips if you shove the SAVEPOINT in with the
intended statement, but it's still messy.

I'd like to see an alternative statement with semantics more akin to
COMMIT - which automatically into ROLLBACK if the tx is aborted.
COMMIT SAVEPOINT would be too confusing since it's not truly
committed. I don't know what to call it. But basically something that
does RELEASE SAVEPOINT [named savepoint] unless the subxact is in
aborted state, in which case it does ROLLBACK TO [named savepoint].
Bonus points for letting it remember the last savepoint created and
use that.

Furthermore, we should really add it on the protocol level so drivers
can send subtransaction control messages more compactly, without
needing to go through the parser etc, and without massively spamming
the logs. For this purpose savepoint names would be internally
generated so the driver wouldn't have to send them. We'd log savepoint
boundaries when transaction logging was enabled. Since the client
would send the first such protocol request we could do it on the sly
without a protocol version bump; clients could just check server
version and not use the new messages for older servers. If they send
it to an older server they get a protocol error, which is fine.

> You should to implement a CALL statement - that can be independent on outer
> transaction. The behave inside procedure called by CALL statement should be
> same like client side - and there you can controll transactions explicitly
> without nesting.

I agree that'd be desirable. Top level "procedures" are necessary for
this, really.

This would also enable us to return multiple result sets.

We'd probably have to start at least one small read-only tx for the
initial cache access to look up the proc and set everything up, but if
we don't allocate xids local transactions are super cheap.

However, I think trying to tackle the memory context bloat reported
upthread would be a more effective starting point since it immediately
targets the problem actually experienced.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [RFC] Transaction management overhaul is necessary?

From
"Tsunakawa, Takayuki"
Date:
From: Craig Ringer [mailto:craig@2ndquadrant.com]
> >> This was because psqlODBC starts and ends a subtransaction for each
> >> SQL statement by default to implement statement-level rollback.  And
> >> PostgreSQL creates one CurTransactionContext memory context, which is
> >> 8KB, for each subtransaction and retain them until the top transaction
> ends.
> 
> Surely that's where to start then. Find a way to pool and re-use, fully
> release, or otherwise be done with transaction contexts for released
> savepoints.

Yes, I'll investigate this.  Any reference information would be appreciated on why the CurTransactionContexts had to be
retained,and whether it's difficult to circumvent.
 


> You can control transaction level rollback in psqlODBC directly. You do
> not need to fall back to the old protocol. Check the driver options.

That driver option is Protocol=7.4-1.  The name is misleading, as the driver now ignores version part (7.4), and
interprets1 as transaction-rollback.
 


> Right. We can't just fire off each statement wrapped in SAVEPOINT and RELEASE
> SAVEPOINT because we need to get the result of the statement and decide
> whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It only requires
> two round trips if you shove the SAVEPOINT in with the intended statement,
> but it's still messy.
> 
> I'd like to see an alternative statement with semantics more akin to COMMIT
> - which automatically into ROLLBACK if the tx is aborted.
> COMMIT SAVEPOINT would be too confusing since it's not truly committed.
> I don't know what to call it. But basically something that does RELEASE
> SAVEPOINT [named savepoint] unless the subxact is in aborted state, in which
> case it does ROLLBACK TO [named savepoint].
> Bonus points for letting it remember the last savepoint created and use
> that.
> 
> Furthermore, we should really add it on the protocol level so drivers can
> send subtransaction control messages more compactly, without needing to
> go through the parser etc, and without massively spamming the logs. For
> this purpose savepoint names would be internally generated so the driver
> wouldn't have to send them. We'd log savepoint boundaries when transaction
> logging was enabled. Since the client would send the first such protocol
> request we could do it on the sly without a protocol version bump; clients
> could just check server version and not use the new messages for older
> servers. If they send it to an older server they get a protocol error, which
> is fine.

I'm simply thinking of proposing a new GUC, something like "SET auto_rollback = {none | statement | transaction}",
wherenone is the default and traditional behavior.
 


> > You should to implement a CALL statement - that can be independent on
> > outer transaction. The behave inside procedure called by CALL
> > statement should be same like client side - and there you can controll
> > transactions explicitly without nesting.
> 
> I agree that'd be desirable. Top level "procedures" are necessary for this,
> really.
> 
> This would also enable us to return multiple result sets.
> 
> We'd probably have to start at least one small read-only tx for the initial
> cache access to look up the proc and set everything up, but if we don't
> allocate xids local transactions are super cheap.

OK, that would be a very big challenge... I can't imagine how difficult it will be now.  But supporting the stored
procedurewith CALL statement would be a wall to overcome.
 


> However, I think trying to tackle the memory context bloat reported upthread
> would be a more effective starting point since it immediately targets the

Yes, I think I'll address this.  Maybe I'll start different threads for each topic:

1. Memory context bloat
2. Statement-level rollback
3. Stored procedures where transactions can be ended and started

Regards
Takayuki Tsunakawa