Thread: Transaction ISOLATION LEVEL - have I missed something?
Hi all, not sure if this is for novice or general - however... Fiddle here: > https://dbfiddle.uk/?rdbms=postgres_12&fiddle=49de306c75e85ce4ba900719e2545c32 > SHOW TRANSACTION ISOLATION LEVEL; Result: > transaction_isolation > read committed So far, so good, then > CREATE TABLE x (y INT, z VARCHAR (25)); > INSERT INTO x VALUES (1, 'One'), (2, 'Two'), (3, 'Three'); and now for a transaction: > START TRANSACTION; > SELECT CURRENT_TIMESTAMP(6); > SELECT * FROM x; > SELECT CURRENT_TIMESTAMP(6); > COMMIT; Result: > current_timestamp 2021-05-13 15:51:11.194766+01 > y z > 1 One > 2 Two > 3 Three > current_timestamp 2021-05-13 15:51:11.194766+01 Now, my understanding of ISOLATION LEVEL = read-committed is that the current_timestamp should change going through the transaction and that it should not change when the level is serializable. Am I wrong about this? I did a similar experiment with MariaDB and none of the times are the same: > https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=ad2d3b2f360c3dbe59fa00f623a3db6b Have I misunderstood transactions and isolation levels... URLs, references and pointers to same appreciated. TIA and rgs, Pól...
On Thursday, May 13, 2021, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
Am I wrong about this?
Transaction isolation levels are immaterial here. The functions define what time they return:
David J.
Hi David, and thanks for getting back to me. >> Am I wrong about this? > Transaction isolation levels are immaterial here. The functions define what time they return: Ah, yes, it now makes perfect sense... All statements in a transaction which is READ COMMITTED sees the data as of CLOCK_TIMESTAMP() and all statements in a transaction that is SERIALIZABLE see the data as of TRANSACTION_TIMESTAMP(). At least, that's how I understand it now - my error was not double-checking my function definitions. Thanks again and rgs, Pól Ua Laoínecháin... > David J.
On Thursday, May 13, 2021, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
Hi David, and thanks for getting back to me.
>> Am I wrong about this?
> Transaction isolation levels are immaterial here. The functions define what time they return:
Ah, yes, it now makes perfect sense...
All statements in a transaction which is READ COMMITTED sees the data
as of CLOCK_TIMESTAMP() and all statements in a transaction that is
SERIALIZABLE see the data as of TRANSACTION_TIMESTAMP().
At least, that's how I understand it now - my error was not
double-checking my function definitions.
No. The isolation level is immaterial to which times are returned. I welcome being proven wrong but nothing I’ve read in that link suggests otherwise so I’m disinclined to prove it correct.
David J.
Hi again, >> All statements in a transaction which is READ COMMITTED sees the data >> as of CLOCK_TIMESTAMP() and all statements in a transaction that is >> SERIALIZABLE see the data as of TRANSACTION_TIMESTAMP(). > No. The isolation level is immaterial to which times are returned. I welcome being proven wrong but nothing I’ve readin that link suggests otherwise so I’m disinclined to prove it correct. (maybe I expressed myself poorly previously) I didn't say that the times returned would vary depending on the isolation level - I don't see why they would. Or, at least, that's not what I meant to say! Take this scenario. START TRANSACTION ; -- either TA (READ COMMITTED - RC) or TB (SERIALIZABLE - S) SELECT TRANSACTION_TIMESTAMP AS tt_1, CLOCK_TIMESTAMP AS ct_1; -- tt_1 and ct_1 are (almost) identical, give or take a few microseconds - i.e. the transaction start time SELECT * FROM x; -- s1 @t1 -- TA and TB perform some long-running DW query.. the actual length of time isn't important - what's important is what happens in the meantime - could be < 0.1s... -- -- In the meantime, x gets modified by a totally different transaction by a totally different user - user_x... -- SELECT TRANSACTION_TIMESTAMP AS tt_2, CLOCK_TIMESTAMP AS ct_2; -- now, there is a big difference between tt_2 and ct_2 for both TA and TB. However, tt_1 and tt_2 are identical, again for both transactions - fixed at the start of either TA or TB. SELECT * FROM x; -- s2 @t2 COMMIT; For TA (RC), the result of s1 will be the state of x at time = t1 (~ = tt_1 and ~ = ct_1) the result of s2 will be the state of x at time = ct_2 and there will be a difference due to the user_x's modfications. For TB (S), the result of s1 @t1 will be the same as the result of s2 @t2 because of SERIALIZE-ability. I hope that clears up what I meant - and also that what I have written above demonstrates a reasonable understanding of the difference between the potential outcomes of an RC transaction and an S one? Thanks again for your input - I think that putting all of that down on paper (well, screen...) has helped crystallise the concepts for me (any corrections, addenda... appreciated - any references to useful URLs in this respect likewise). Pól... > David J.
On Thu, May 13, 2021 at 1:31 PM Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
I hope that clears up what I meant - and also that what I have written
above demonstrates a reasonable understanding of the difference
between the potential outcomes of an RC transaction and an S one?
It does. I see now you are trying to describe snapshots of data available to statements in terms of database timestamps as opposed to referring to transactional boundaries.
I've never seen isolation levels discussed in those terms before and do not know whether it is accurate or missing some important nuances. With locks, snapshots, and valid interleaving versus invalid interleaving (for serializable) all coming into play the notion of timestamp isn't really in scope. But if it helps for basic comprehension then it does have merit and indeed the links you describe seem correct at first glance.
David J.
On Thu, 13 May 2021 at 22:02, David G. Johnston <david.g.johnston@gmail.com> wrote: > It does. I see now you are trying to describe snapshots of data available to statements in terms of database timestampsas opposed to referring to transactional boundaries. > I've never seen isolation levels discussed in those terms before and do not know whether it is accurate or missing someimportant nuances. With locks, snapshots, and valid interleaving versus invalid interleaving (for serializable) allcoming into play the notion of timestamp isn't really in scope. But if it helps for basic comprehension then it doeshave merit and indeed the links you describe seem correct at first glance. I was explaining to somebody who doesn't have your (or even my basic) grasp of isolation levels by using TRANSACTION_TIMESTAMP and CLOCK_TIMESTAMP as proxies for transaction ids... Below hopefully is the discussion reframed in terms of transaction ids - hopefully it's correct and demonstrates a complete understanding of the topic in terms of PostgreSQL's implementation of the TRANSACTION ISOLATION paradigm and the distinction between READ COMMITTED (RC) and SERIALIZABLE (S). I have put in material below about txid_current() and how RC statements can see the results of transactions where the transaction id is greater than txid_current(), but that when the transaction is S, it can only see the results of committed transactions < txid_current() and any changes made inside txid_current() itself... I found this helpful in understanding what I was seeing in the various pseudo-colum fields https://www.cybertec-postgresql.com/en/is-update-the-same-as-delete-insert-in-postgresql (I've seen this guy on the general list...) I'll have to read that article in-depth to fully grasp all of the subtleties - but it was good. So, for a translation of a time-based explanation of the difference between RC and S to a more PostgreSQL based tables, fields and tx ids explanation - read on! From a session - 1, I do this: =============================================== Session 1 -> START TRANSACTION; START TRANSACTION Session 1 -> SHOW TRANSACTION ISOLATION LEVEL; transaction_isolation ----------------------- read committed (1 row) Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin, xmax, cmin, cmax, ctid, x, y FROM t; txid_current | txid_status | xmin | xmax | cmin | cmax | ctid | x | y --------------+-------------+------+------+------+------+-------+---+------- 578 | in progress | 527 | 0 | 0 | 0 | (0,1) | 1 | One 578 | in progress | 528 | 0 | 0 | 0 | (0,2) | 2 | Two 578 | in progress | 552 | 0 | 0 | 0 | (0,3) | 3 | Three (3 rows) ================================================ So, I've started the READ COMMITTED transaction (txn) no. 578 - and in table t, there are three records (tuples) (1, 'One'), (2, 'Two') & (3, 'Three'). Txn 527 inserted the first tuple, txn 528 inserted the second one and the third txn was inserted by txn 529, but I updated it, so essentially, it was last modified by txn 552 as we learned from the cybertec article referenced above. So, now in session 2, I update the table by changing 'Three' to 'Trí' in tuple 3. =================================================== Session 2 -> UPDATE t SET y = 'Trí' WHERE x = 3; UPDATE 1 ==================================================== So, that's automatically committed - but, just to check, again from session 2, I do: ======================================================== Session 2 -> SELECT txid_current(), txid_status(txid_current()), xmin, xmax, cmin, cmax, ctid, x, y FROM t; txid_current | txid_status | xmin | xmax | cmin | cmax | ctid | x | y --------------+-------------+------+------+------+------+-------+---+----- 580 | in progress | 527 | 0 | 0 | 0 | (0,1) | 1 | One 580 | in progress | 528 | 0 | 0 | 0 | (0,2) | 2 | Two 580 | in progress | 579 | 0 | 0 | 0 | (0,4) | 3 | Trí (3 rows) =========================================================== So, the txid_current has been incremented - to 579 by the UPDATE and then by the SELECT here in session 2 to 580. Still in session 2, I run this query: ====================================================== SELECT pid, age(clock_timestamp(), query_start), usename, substring(query FROM 70 FOR 25), state, backend_xid FROM pg_stat_activity WHERE datname = 'test' AND query NOT ILIKE '%pg_stat_activity%' ORDER BY query_start desc; and we get pid | age | usename | substring | state | backend_xid -------+-----------------+----------+---------------------------+---------------------+------------- 10468 | 00:36:57.592779 | postgres | cmax, ctid, x, y FROM t; | idle in transaction | 578 (1 row) ========================================================= Now, we see that our txn 578 (which is now > 36 mins old) is still sitting there in an idle state. So, everything's rosy in the garden... now, back to session 1... I rerun my SELECT within txn 578 =========================================================== Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin, xmax, cmin, cmax, ctid, x, y FROM t; txid_current | txid_status | xmin | xmax | cmin | cmax | ctid | x | y --------------+-------------+------+------+------+------+-------+---+----- 578 | in progress | 527 | 0 | 0 | 0 | (0,1) | 1 | One 578 | in progress | 528 | 0 | 0 | 0 | (0,2) | 2 | Two 578 | in progress | 579 | 0 | 0 | 0 | (0,4) | 3 | Trí (3 rows) ========================================================== So, now the tuple where x = 3 has a value of 'Trí' for y. This is the crux of the TRANSACTION ISOLATION LEVEL being RC - txn 578 **can see** the result of txn 579 (from session 2)! ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ We change the TRANSACTION ISOLATION LEVEL to S in session 1. I start by issuing the command: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE; and then start a txn and SELECT from t as follows: ======================================================= Session 1 -> START TRANSACTION; START TRANSACTION Session 1 -> SHOW TRANSACTION ISOLATION LEVEL; transaction_isolation ----------------------- serializable (1 row) Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin, xmax, cmin, cmax, ctid, x, y FROM t; txid_current | txid_status | xmin | xmax | cmin | cmax | ctid | x | y --------------+-------------+------+------+------+------+-------+---+----- 582 | in progress | 527 | 0 | 0 | 0 | (0,1) | 1 | One 582 | in progress | 528 | 0 | 0 | 0 | (0,2) | 2 | Two 582 | in progress | 579 | 0 | 0 | 0 | (0,4) | 3 | Trí (3 rows) ====================================================================== Txn 582 (S) has been started and has tuple 3 as (3, 'Trí'). Back to session 2 and I issue the pg_stats_activity command ============================================================= SELECT pid, age(clock_timestamp(), query_start), usename, substring(query FROM 70 FOR 25), state, backend_xid FROM pg_stat_activity WHERE datname = 'test' AND query NOT ILIKE '%pg_stat_activity%' ORDER BY query_start desc; which shows pid | age | usename | substring | state | backend_xid -------+-----------------+----------+---------------------------+---------------------+------------- 10468 | 00:01:09.705916 | postgres | cmax, ctid, x, y FROM t; | idle in transaction | 582 (1 row) ==================================================================================== So, we can see that our session 1 transaction (582, S) is happily sitting there idling away... Again from session 2, I UPDATE the (3, 'Trí') tuple back to (3, 'Three') - no START TRANSACTION - the session 2 txn and the COMMIT; are implicit. ================================================================================ Session 2 -> UPDATE t SET y = 'Three' WHERE x = 3; UPDATE 1 then check my UPDATE from within session 2 Session 2 -> SELECT txid_current(), txid_status(txid_current()), xmin, xmax, cmin, cmax, ctid, x, y FROM t; txid_current | txid_status | xmin | xmax | cmin | cmax | ctid | x | y --------------+-------------+------+------+------+------+-------+---+------- 584 | in progress | 527 | 0 | 0 | 0 | (0,1) | 1 | One 584 | in progress | 528 | 0 | 0 | 0 | (0,2) | 2 | Two 584 | in progress | 583 | 0 | 0 | 0 | (0,5) | 3 | Three (3 rows) ================================================================================= Within session 2, I can now see my changed data - (3, 'Trí') has reverted to (3, 'Three'). I go back to session 1 and reissue the command above ================================================================================ Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin, xmax, cmin, cmax, ctid, x, y FROM t; txid_current | txid_status | xmin | xmax | cmin | cmax | ctid | x | y --------------+-------------+------+------+------+------+-------+---+----- 582 | in progress | 527 | 0 | 0 | 0 | (0,1) | 1 | One 582 | in progress | 528 | 0 | 0 | 0 | (0,2) | 2 | Two 582 | in progress | 579 | 583 | 0 | 0 | (0,4) | 3 | Trí (3 rows) ================================================================================ So, despite the fact that the S txn 582 is > 2 mins old and has been modified by the UPDATE in session 2, the data has remained unchanged from the perspective of the 582 S txn in session 1 Except for xmax which has incremented to 583, the tx_id of the UPDATE txn from session 2 - serves as notice that the tuple has been updated by a later txn! This is the distinction between READ COMMITTED and SERIALIZABLE. Now, I appreciate that there are performance/resource issues involved in escalating from RC to S levels - but that's a different day's work. I think that the first important thing to realise (at least it took me a while) is that transaction isolation levels only apply to multi-statement transactions - a single SQL statement txn only has one view of the database at the instant it is run and it will see all committed data at that point - txn isolation doesn't apply (I'm ignoring the possibility of dirty-reads which are the spawn of Satan...). Again, thanks for having made me think about my explanation and about the topic generally - I've heard it said and seen it written that you can't say that you understand something unless you can explain it clearly - apparently the Einstein quote is misattributed - I like Rutherford's one though: > https://www.linkedin.com/pulse/albert-einsteins-birthday-ode-misattributions-peter-smirniotopoulos I realise that there are potential added complications if session 1 decides to update table t, tuple (3, 'Trí') - not sure what happens then? Watch this space... Hopefully I have explained it clearly... Rgs, Pól... > David J.