Re: BUG #7623: Inconsistency on transaction isolation documentation - Mailing list pgsql-bugs

From Louis-Claude Canon
Subject Re: BUG #7623: Inconsistency on transaction isolation documentation
Date
Msg-id 508FD7A8.9010808@femto-st.fr
Whole thread Raw
In response to Re: BUG #7623: Inconsistency on transaction isolation documentation  ("Kevin Grittner" <kgrittn@mail.com>)
List pgsql-bugs
Thanks for your helpful explanations. Reading subsection 13.2.1 makes=20
sense now. I am not sure if and how the documentation could be improved,=20
but I propose below some suggestions that you may find relevant. My=20
confusion may have stem from my lack of knowledge in PostgreSQL locking=20
mechanism.

Le 29/10/2012 22:42, Kevin Grittner a =C3=A9crit :
> louis-claude.canon@univ-fcomte.fr wrote:
>
>> In the first paragraph of Section 13.2.1, it is first stated that
>> "[with] this isolation level, a SELECT query [...] never sees [...]
>> changes committed during query execution by concurrent
>> transactions." It is my understanding that this is untrue (as the
>> rest of the paragraph seems to imply).
> What are you seeing which you feels contradicts that? (I don't see
> anything which appears to contradict it, but maybe there's more than
> one way for someone to read it, and I'm missing something.)

My mistake, I interpreted "during query execution" as "during the=20
current transaction execution". It could be more precise to replace "it=20
never sees either uncommitted data or changes committed during query=20
execution by concurrent transactions" by one of the following :
- "it never sees either uncommitted data or changes committed during=20
_this_ query execution by concurrent transactions"
- "_this query_ never sees either uncommitted data or changes committed=20
during _its_ execution by concurrent transactions"

>> Then, in the same subsection, an example is given in which a DELETE
>> statement is affected by an uncommitted UPDATE.
> No, the DELETE statement is blocked by the UPDATE to see whether it
> commits or rolls back. If the transaction with the UPDATE rolls back,
> the results of the DELETE will not be affected by the UPDATE. If the
> UPDATE commits, the committed transaction will affect the DELETE.
>
>> It is unclear to me if this still corresponds to the standard
>> "Committed Read" isolation level.
> Yes, it's still in section 13.2.1, which is all about READ COMMITTED
> transactions. And I believe that it still complies with the
> requirements of the standard for that level.
>
>> It could be clearer with more precision: when is committed the
>> DELETE statement?
> I guess it could be more clear that the DELETE will block pending the
> completion of the transaction containing the previously run UPDATE.
> The commit of the DELETE will be some time after the commit of the
> transaction containing the UPDATE.
>
>> What is the snapshot that it sees (when does the transaction
>> start)? Or maybe this standard isolation level only specified that
>> SELECT statements must not be affected by uncommitted changes?
> No transaction is affected (other than blocking or possibly rolling
> back with a serialization failure), by an *uncommitted* transaction.
> In this example the DELETE is affected by a *committed* transaction.
> The snapshot for the DELETE is taken after the query is parsed and
> before it starts scanning for rows to delete. When there is a write
> conflict in READ COMMITTED it follows the update chain from the
> visible row to find the latest committed version of the row, blocking
> for a change in progress; so technically an UPDATE or DELETE at the
> READ COMMITTED transaction isolation level sometimes strays from a
> single snapshot for the statement. The alternative would be to
> restart the statement after a write conflict where the other
> transaction commits, which could perform much worse when a statement
> affects a large number of rows.

A writing query may therefore find rows by reading some committed=20
values, wait for a writing lock and then retest if the write should be=20
performed based on values from a subsequent commit. This is not=20
intuitive that data accessed from a single query may correspond to=20
several commits. The second paragraph could be started by something like=20
"Although, any single query will see only committed values, a writing=20
query may see an inconsistent state of the database" for a better=20
introducing this inconsistent idea. Also, as locking is described later=20
(section 13.3), the UPDATE/DELETE example explanation (the sentence=20
"This occurs because [...] the criteria.") may be too synthetic. I=20
propose to extend it by stating which snapshot are considered and how=20
operations are sequentially performed. For example: "The DELETE query=20
selects from the committed database the row in which hits=3D10. As this=20
row is locked by the UPDATE (see ROW EXCLUSIVE lock in Section 13.3),=20
the DELETE is blocked and wait for the COMMIT to occurs, which will free=20
the lock. At this point, the DELETE proceeds by retesting if its=20
selected row must be deleted. This is no longer the case as the DELETE=20
snapshot of the row results from the last COMMIT that incremented hits=20
to 11."

>> As I am not an expert, I cannot guarantee that there is indeed any
>> issue.
> Well, based on what you said there appears to be at lest some room to
> clarify or elaborate, so people better understand it.
>
>> However, I believe it could be useful that an expert proof-reads or
>> reviews this subsection. For example, I am unsure whether any query
>> will see a consistent snapshot with only committed transactions (as
>> it is stated at first) or if uncommitted transactions may have an
>> impact (as the last example suggests).
> READ COMMITTED can only see work of committed transactions, but does
> not necessarily see only data visible according to its snapshot.
>
>> I would also suggest to following change for clarity (or the
>> contrary change):
>> "The point at issue above is whether or not a single command sees
>> an absolutely consistent view of the database." -> "The point at
>> issue above is that any single command sees an absolutely
>> consistent view of the database."
> Based on my clarification here, do you agree that the current
> language is more accurate than your proposed revision?

I would propose instead the opposite change: "The point at issue above=20
is that any single command does not always see an absolutely consistent=20
view of the database." to emphasize the risk of this isolation level=20
(the current "whether or not" does not highlight which of the=20
alternatives is actually true).

Regards,

Louis-Claude Canon

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #7629: Suboptimal query plan when index search is possible and an additional search operator is given.
Next
From: gtsal@intracom.gr
Date:
Subject: BUG #7628: Installation of PostgreSQL 9.2.1 on Windows 7 may take too long