Thread: Mark a transaction uncommittable
This is a proposal for a new transaction characteristic. I haven't written any code, yet, and am interested in hearing if others may find this feature useful. Many a times we start a transaction that we never intend to commit; for example, for testing, or for EXPLAIN ANALYZE, or after detecting unexpected results but still interested in executing more commands without risking commit, etc. A user would like to declare their intent to eventually abort the transaction as soon as possible, so that the transaction does not accidentally get committed. This feature would allow the user to mark a transaction such that it can never be committed. We must allow such marker to be placed when the transaction is being started, or while it's in progress. Once marked uncommittable, do not allow the marker to be removed. Hence, once deemed uncommittable, the transaction cannot be committed, even intentionally. This protects against cases where one script includes another (e.g. psql's \i command), and the included script may have statements that turn this marker back on. Any command that ends a transaction (END, COMMIT, ROLLBACK) must result in a rollback. All of these properties seem useful for savepoints, too. But I want to focus on just the top-level transactions, first. I feel like the BEGIN and SET TRANSACTION commands would be the right places to introduce this feature. BEGIN [ work | transaction ] [ [ NOT ] COMMITTABLE ]; SET TRANSACTION [ [ NOT ] COMMITTABLE ]; I'm not yet sure if the COMMIT AND CHAIN command should carry this characteristic to the next transaction. Thoughts? Best regards, Gurjeet https://Gurje.et http://aws.amazon.com
On Sat, 22 Apr 2023 at 11:01, Gurjeet Singh <gurjeet@singh.im> wrote:
This is a proposal for a new transaction characteristic. I haven't
written any code, yet, and am interested in hearing if others may find
this feature useful.
Many a times we start a transaction that we never intend to commit;
for example, for testing, or for EXPLAIN ANALYZE, or after detecting
unexpected results but still interested in executing more commands
without risking commit, etc.
A user would like to declare their intent to eventually abort the
transaction as soon as possible, so that the transaction does not
accidentally get committed.
I have an application for this: creating various dev/test versions of data from production.
Start by restoring a copy of production from backup. Then successively create several altered versions of the data and save them to a place where developers can pick them up. For example, you might have one version which has all data old than 1 year deleted, and another where 99% of the students/customers/whatever are deleted. Anonymization could also be applied. This would give you realistic (because it ultimately originates from production) test data.
This could be done by starting a non-committable transaction, making the adjustments, then doing a pg_dump in the same transaction (using --snapshot to allow it to see that transaction). Then rollback, and repeat for the other versions. This saves repeatedly restoring the (probably very large) production data each time.
What I’m not sure about is how long it takes to rollback a transaction. I'm assuming that it’s very quick compared to restoring from backup.
It would be nice if pg_basebackup could also have the --snapshot option.
Hi, On Sat, Apr 22, 2023 at 12:53:23PM -0400, Isaac Morland wrote: > > I have an application for this: creating various dev/test versions of data > from production. > > Start by restoring a copy of production from backup. Then successively > create several altered versions of the data and save them to a place where > developers can pick them up. For example, you might have one version which > has all data old than 1 year deleted, and another where 99% of the > students/customers/whatever are deleted. Anonymization could also be > applied. This would give you realistic (because it ultimately originates > from production) test data. > > This could be done by starting a non-committable transaction, making the > adjustments, then doing a pg_dump in the same transaction (using --snapshot > to allow it to see that transaction). Then rollback, and repeat for the > other versions. This saves repeatedly restoring the (probably very large) > production data each time. There already are tools to handle those use cases. Looks for instance at https://github.com/mla/pg_sample to backup a consistent subset of the data, or https://github.com/rjuju/pg_anonymize to transparently pg_dump (or interactively query) anonymized data. Both tool also works when connected on a physical standby, while trying to update data before dumping them wouldn't.
On Sat, Apr 22, 2023 at 8:01 AM Gurjeet Singh <gurjeet@singh.im> wrote: > > This is a proposal for a new transaction characteristic. I haven't > written any code, yet, and am interested in hearing if others may find > this feature useful. Please see attached the patch that introduces this new feature. The patch includes all the code changes that I could foresee; that is, it includes server changes as well as changes for psql's auto-completion. The patch does not include doc changes, or any tests. Please see a sample session at the end of the email, though. The patch introduces a new keyword, COMMITTABLE, and uses that to introduce the new transaction attribute via BEGIN [TRANSACTION] and SET TRANSACTION commands. The new code follows the semantics of the [NOT] DEFERRABLE attribute of a transaction almost exactly. > Many a times we start a transaction that we never intend to commit; > for example, for testing, or for EXPLAIN ANALYZE, or after detecting > unexpected results but still interested in executing more commands > without risking commit, etc. > > A user would like to declare their intent to eventually abort the > transaction as soon as possible, so that the transaction does not > accidentally get committed. > > This feature would allow the user to mark a transaction such that it > can never be committed. We must allow such marker to be placed when > the transaction is being started, or while it's in progress. The user can mark the transaction as uncommittable either when starting the transaction, or while it is still in progress. > Once marked uncommittable, do not allow the marker to be removed. > Hence, once deemed uncommittable, the transaction cannot be committed, > even intentionally. This protects against cases where one script > includes another (e.g. psql's \i command), and the included script may > have statements that turn this marker back on. Although the patch implements this desired behavior from the initial proposal, I'm no longer convinced of the need to prevent user from re-enabling committability of the transaction. > Any command that ends a transaction (END, COMMIT, ROLLBACK) must > result in a rollback. > > All of these properties seem useful for savepoints, too. But I want to > focus on just the top-level transactions, first. The patch does not change any behaviour related to savepoints. Having made it work for the top-level transaction, and having seen the savepoint/subtransaction code as I came across it as I developed this patch, I feel that it will be very tricky to implement this behavior safely for savepoints. Moreover, having thought more about the possible use cases, I don't think implementing uncommittability of savepoints will be of much use in the real world. > I feel like the BEGIN and SET TRANSACTION commands would be the right > places to introduce this feature. > > BEGIN [ work | transaction ] [ [ NOT ] COMMITTABLE ]; > SET TRANSACTION [ [ NOT ] COMMITTABLE ]; I tried to avoid adding a new keyword (COMMITTABLE) to the grammar, but could not think of a better alternative. E.g. DISALLOW COMMIT sounds like a good alternative, but DISALLOW is currently not a keyword, so this form doesn't buy us anything. > I'm not yet sure if the COMMIT AND CHAIN command should carry this > characteristic to the next transaction. After a little consideration, in the spirit of POLA, I have not done anything special to change the default behaviour of COMMIT/ROLLBACK AND CHAIN. Any feedback is welcome. Please see below an example session demonstrating this feature. postgres=# begin transaction committable; BEGIN postgres=# commit; COMMIT postgres=# begin transaction not committable; BEGIN postgres=# commit; WARNING: transaction is not committable ROLLBACK postgres=# begin transaction not committable; BEGIN postgres=# set transaction_committable = true; -- for clarity, we may want to emit additional "WARNING: cannot make transaction committable", although the patch currently doesn't do so. ERROR: invalid value for parameter "transaction_committable": 1 postgres=# commit; ROLLBACK postgres=# begin transaction not committable; BEGIN postgres=# set transaction committable ; -- for clarity, we may want to emit additional "WARNING: cannot make transaction committable", although the patch currently doesn't do so. ERROR: invalid value for parameter "transaction_committable": 1 postgres=# set transaction committable ; ERROR: current transaction is aborted, commands ignored until end of transaction block postgres=# commit; ROLLBACK Best regards, Gurjeet http://Gurje.et
Attachment
On Mon, 2023-06-05 at 00:22 -0700, Gurjeet Singh wrote: > On Sat, Apr 22, 2023 at 8:01 AM Gurjeet Singh <gurjeet@singh.im> wrote: > > > > This is a proposal for a new transaction characteristic. I haven't > > written any code, yet, and am interested in hearing if others may find > > this feature useful. > > Please see attached the patch that introduces this new feature. Can you explain why *you* would find this feature useful? Yours, Laurenz Albe
On Mon, Jun 5, 2023 at 12:32 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Mon, 2023-06-05 at 00:22 -0700, Gurjeet Singh wrote: > > On Sat, Apr 22, 2023 at 8:01 AM Gurjeet Singh <gurjeet@singh.im> wrote: > > > > > > This is a proposal for a new transaction characteristic. I haven't > > > written any code, yet, and am interested in hearing if others may find > > > this feature useful. > > > > Please see attached the patch that introduces this new feature. > > Can you explain why *you* would find this feature useful? The idea came to me while I was reading a blog post, where the author had to go to great lengths to explain to the reader why the queries would be disastrous, if run on production database, and that they should run those queries inside a transaction, and they _must_ rollback the transaction. Having written my fair share of tutorials, and blogs, I know how helpful it would be to start a transaction that the reader can't accidentally commit. As others have noted in this thread, this feature can be useful in other situations, as well, like when you're trying to export a sanitized copy of a production database. Especially in such a situation you do not want those sanitization operations to ever be committed on the source database. Best regards, Gurjeet http://Gurje.et
On Sat, Apr 22, 2023 at 4:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Sat, Apr 22, 2023 at 12:53:23PM -0400, Isaac Morland wrote: > > > > I have an application for this: creating various dev/test versions of data > > from production. > > > > Start by restoring a copy of production from backup. Then successively > > create several altered versions of the data and save them to a place where > > developers can pick them up. For example, you might have one version which > > has all data old than 1 year deleted, and another where 99% of the > > students/customers/whatever are deleted. Anonymization could also be > > applied. This would give you realistic (because it ultimately originates > > from production) test data. > > > > This could be done by starting a non-committable transaction, making the > > adjustments, then doing a pg_dump in the same transaction (using --snapshot > > to allow it to see that transaction). Then rollback, and repeat for the > > other versions. This saves repeatedly restoring the (probably very large) > > production data each time. > > There already are tools to handle those use cases. Looks for instance at > https://github.com/mla/pg_sample to backup a consistent subset of the data, or > https://github.com/rjuju/pg_anonymize to transparently pg_dump (or > interactively query) anonymized data. > > Both tool also works when connected on a physical standby, while trying to > update data before dumping them wouldn't. Like everything in life, there are pros and cons to every approach. pg_anonymize is an extension that may not be installed on the database you're working with. And pg_sample (and similar utilities) may not have a way to extract or sanitize the exact data you want. With this feature built into Postgres, you'd not need any external utilities or extensions. The benefits of features built into Postgres are that the users can come up with ways of leveraging such a feature in future in a way that we can't envision today. Best regards, Gurjeet http://Gurje.et
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested Hello It is one of those features that a handful of people would find useful in specific user cases. I think it is a nice to havefeature that safeguards your production database against unwanted commits when troubleshooting production problems. Yourpatch applies fine on master and I am able to run a couple tests on it and it seems to do as described. I noticed thatthe patch has a per-session variable "default_transaction_committable" that could make all transaction committable oruncommittable on the session even without specifying "begin transaction not committable;" I am wondering if we should havea configurable default at all as I think it should always defaults to true and unchangable. If an user wants a uncommittabletransaction, he/she will need to explicitly specify that during "begin". Having another option to change defaultbehavior for all transactions may be a little unsafe, it is possible someone could purposely change this default tofalse on a production session that needs transactions to absolutely commit, causing damages there. thank you Cary Huang ------------------ Highgo Software Canada www.highgo.ca
> On 5 Jun 2023, at 09:22, Gurjeet Singh <gurjeet@singh.im> wrote: > Please see attached the patch that introduces this new feature. This patch fails the guc_check test due to something missing in the sample configuration. Please send a rebased version with this test fixed. [08:31:26.680] --- stdout --- [08:31:26.680] # executing test in /tmp/cirrus-ci-build/build/testrun/test_misc/003_check_guc group test_misc test 003_check_guc [08:31:26.680] not ok 1 - no parameters missing from postgresql.conf.sample [08:31:26.680] ok 2 - no parameters missing from guc_tables.c [08:31:26.680] ok 3 - no parameters marked as NOT_IN_SAMPLE in postgresql.conf.sample [08:31:26.680] 1..3 [08:31:26.680] # test failed -- Daniel Gustafsson