Re: Transaction ISOLATION LEVEL - have I missed something? - Mailing list pgsql-novice

From Pól Ua Laoínecháin
Subject Re: Transaction ISOLATION LEVEL - have I missed something?
Date
Msg-id CAF4RT5Q9vBJ+qda7b2DhQFOLyzXPLVzXtc_WnTqjhs6K=F2zOw@mail.gmail.com
Whole thread Raw
In response to Re: Transaction ISOLATION LEVEL - have I missed something?  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-novice
On Thu, 13 May 2021 at 22:02, David G. Johnston
<david.g.johnston@gmail.com> wrote:


> It does.  I see now you are trying to describe snapshots of data available to statements in terms of database
timestampsas opposed to referring to transactional boundaries. 

> I've never seen isolation levels discussed in those terms before and do not know whether it is accurate or missing
someimportant nuances.  With locks, snapshots, and valid interleaving versus invalid interleaving (for serializable)
allcoming into play the notion of timestamp isn't really in scope.  But if it helps for basic comprehension then it
doeshave merit and indeed the links you describe seem correct at first glance. 



I was explaining to somebody who doesn't have your (or even my basic)
grasp of isolation levels by using TRANSACTION_TIMESTAMP and
CLOCK_TIMESTAMP as proxies for transaction ids...

Below hopefully is the discussion reframed in terms of transaction ids
- hopefully it's correct and demonstrates a complete understanding of
the topic in terms of PostgreSQL's implementation of the TRANSACTION
ISOLATION paradigm and the distinction between READ COMMITTED (RC) and
SERIALIZABLE (S).

I have put in material below about txid_current() and how RC
statements can see the results of transactions where the transaction
id is greater than txid_current(), but that when the transaction is S,
it can only see the results of committed transactions < txid_current()
and any changes made inside txid_current() itself...

I found this helpful in understanding what I was seeing in the various
pseudo-colum fields

https://www.cybertec-postgresql.com/en/is-update-the-same-as-delete-insert-in-postgresql
(I've seen this guy on the general list...)

I'll have to read that article in-depth to fully grasp all of the
subtleties - but it was good.

So, for a translation of a time-based explanation of the difference
between RC and S to a more PostgreSQL based tables, fields and tx ids
explanation - read on!


From a session - 1, I do this:

===============================================

Session 1 -> START TRANSACTION;
START TRANSACTION
Session 1 -> SHOW TRANSACTION ISOLATION LEVEL;
 transaction_isolation
-----------------------
 read committed
(1 row)


Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |   y
--------------+-------------+------+------+------+------+-------+---+-------
          578 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          578 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          578 | in progress |  552 |    0 |    0 |    0 | (0,3) | 3 | Three
(3 rows)

================================================

So, I've started the READ COMMITTED transaction (txn) no. 578 - and in
table t, there are three records (tuples) (1, 'One'), (2, 'Two') & (3,
'Three').

Txn 527 inserted the first tuple, txn 528 inserted the second one and
the third txn was inserted by txn 529, but I updated it, so
essentially, it was last modified by txn 552 as we learned from the
cybertec article referenced above.


So, now in session 2, I update the table by changing 'Three' to 'Trí'
in tuple 3.

===================================================

Session 2 -> UPDATE t SET y = 'Trí' WHERE x = 3;
UPDATE 1

====================================================

So, that's automatically committed - but, just to check, again from
session 2, I do:

========================================================

Session 2 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          580 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          580 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          580 | in progress |  579 |    0 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)
===========================================================

So, the txid_current has been incremented - to 579 by the UPDATE and
then by the SELECT here in session 2 to 580.

Still in session 2, I run this query:

======================================================

SELECT
  pid,
  age(clock_timestamp(), query_start),
  usename,
  substring(query FROM 70 FOR 25),
  state,
  backend_xid
FROM pg_stat_activity
WHERE datname = 'test' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

and we get

  pid  |       age       | usename  |         substring         |
  state        | backend_xid
-------+-----------------+----------+---------------------------+---------------------+-------------
 10468 | 00:36:57.592779 | postgres |  cmax, ctid, x, y FROM t; | idle
in transaction |         578
(1 row)

=========================================================

Now, we see that our txn 578 (which is now > 36 mins old) is still
sitting there in an idle state.

So, everything's rosy in the garden... now, back to session 1... I
rerun my SELECT within txn 578

===========================================================

Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;

 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          578 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          578 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          578 | in progress |  579 |    0 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)

==========================================================

So, now the tuple where x = 3 has a value of 'Trí' for y.

This is the crux of the TRANSACTION ISOLATION LEVEL being RC -  txn
578 **can see** the result of txn 579 (from session 2)!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

We change the TRANSACTION ISOLATION LEVEL to S in session 1.

I start by issuing the command:

SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;

and then start a txn and SELECT from t as follows:

=======================================================

Session 1 -> START TRANSACTION;
START TRANSACTION
Session 1 -> SHOW TRANSACTION ISOLATION LEVEL;
 transaction_isolation
-----------------------
 serializable
(1 row)


Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          582 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          582 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          582 | in progress |  579 |    0 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)

======================================================================

Txn 582 (S) has been started and has tuple 3 as (3, 'Trí').

Back to session 2 and I issue the pg_stats_activity command

=============================================================

SELECT
  pid,
  age(clock_timestamp(), query_start),
  usename,
  substring(query FROM 70 FOR 25),
  state,
  backend_xid
FROM pg_stat_activity
WHERE datname = 'test' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;

which shows

  pid  |       age       | usename  |         substring         |
  state        | backend_xid
-------+-----------------+----------+---------------------------+---------------------+-------------
 10468 | 00:01:09.705916 | postgres |  cmax, ctid, x, y FROM t; | idle
in transaction |         582
(1 row)

====================================================================================

So, we can see that our session 1 transaction (582, S) is happily
sitting there idling away...

Again from session 2, I UPDATE the (3, 'Trí') tuple back to (3,
'Three') - no START TRANSACTION - the session 2 txn and the COMMIT;
are implicit.

================================================================================

Session 2 -> UPDATE t SET y = 'Three' WHERE x = 3;
UPDATE 1

then check my UPDATE from within session 2

Session 2 ->  SELECT txid_current(), txid_status(txid_current()),
xmin, xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |   y
--------------+-------------+------+------+------+------+-------+---+-------
          584 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          584 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          584 | in progress |  583 |    0 |    0 |    0 | (0,5) | 3 | Three
(3 rows)

=================================================================================

Within session 2, I can now see my changed data - (3, 'Trí') has
reverted to (3, 'Three').

I go back to session 1 and reissue the command above

================================================================================

Session 1 -> SELECT txid_current(), txid_status(txid_current()), xmin,
xmax, cmin, cmax, ctid, x, y FROM t;
 txid_current | txid_status | xmin | xmax | cmin | cmax | ctid  | x |  y
--------------+-------------+------+------+------+------+-------+---+-----
          582 | in progress |  527 |    0 |    0 |    0 | (0,1) | 1 | One
          582 | in progress |  528 |    0 |    0 |    0 | (0,2) | 2 | Two
          582 | in progress |  579 |  583 |    0 |    0 | (0,4) | 3 | Trí
(3 rows)

================================================================================

So, despite the fact that the S txn 582 is > 2 mins old and has been
modified by the UPDATE in session 2, the data has remained unchanged
from the perspective of the 582 S txn in session 1

Except for xmax which has incremented to 583, the tx_id of the UPDATE
txn from session 2 - serves as notice that the tuple has been updated
by a later txn!

This is the distinction between READ COMMITTED and SERIALIZABLE.

Now, I appreciate that there are performance/resource issues involved
in escalating from RC to S levels - but that's a different day's work.

I think that the first important thing to realise (at least it took me
a while) is that transaction isolation levels only apply to
multi-statement transactions - a single SQL statement txn only has one
view of the database at the instant it is run and it will see all
committed data at that point - txn isolation doesn't apply (I'm
ignoring the possibility of dirty-reads which are the spawn of
Satan...).

Again, thanks for having made me think about my explanation and about
the topic generally - I've heard it said and seen it written that you
can't say that you understand something unless you can explain it
clearly - apparently the Einstein quote is misattributed - I like
Rutherford's one though:

> https://www.linkedin.com/pulse/albert-einsteins-birthday-ode-misattributions-peter-smirniotopoulos

I realise that there are potential added complications if session 1
decides to update table t, tuple (3, 'Trí') - not sure what happens
then? Watch this space...


Hopefully I have explained it clearly...

Rgs,


Pól...


> David J.



pgsql-novice by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Transaction ISOLATION LEVEL - have I missed something?
Next
From: Simon Connah
Date:
Subject: Loading test data into a PostgreSQL database