Thread: Definition of REPEATABLE READ

Definition of REPEATABLE READ

From

"Peter J. Holzer"

Date:

21 June 2020, 15:25:13

I've read http://jepsen.io/analyses/postgresql-12.3 which reports a
problem in PostgreSQL's SERIALIZABLE isolation leven (which has since been
fixed) and also shows an example of a violation of what they consider to
be the correct definition of REPEATABLE READ.

Specifically:

    T1: r 190 [1 2] | r 188 [4 5] | a 188 8

    T2: a 190 4 | a 190 5

    T3: r 190 [1 2 4 5] | r 188 [4 5]

This violates G2-item in Alya et al.[1]: There is a dependency cycle
(T1.1 rw-> T2.1, T2.2 wr-> T3.1, T3.2 rw-> T1.3) with at least one rw
anti-dependency.

It is however, fine in PostgreSQLs implementation of REPEATABLE READ:

T1                      T2                      T3
BEGIN ISOLATION LEVEL   BEGIN ISOLATION LEVEL   BEGIN ISOLATION LEVEL
REPEATABLE READ;        REPEATABLE READ;        REPEATABLE READ;

SELECT * FROM t
WHERE id = 190;

id  | v
190 | 1 2

SELECT * FROM t
WHERE id = 188;

id  | v
190 | 4 5

                        UPDATE t SET v = v || ' 4'
                        WHERE id = 190;

                        UPDATE t SET v = v || ' 5'
                        WHERE id = 190;

                        COMMIT;

                                                SELECT * FROM t
                                                WHERE id = 190;

                                                id  | v
                                                190 | 1 2 4 5

                                                SELECT * FROM t
                                                WHERE id = 188;

                                                id  | v
                                                190 | 4 5

                                                COMMIT;
UPDATE t SET v = v || ' 8'
WHERE id = 188;

COMMIT;

It also doesn't appear to violate the wording in SQL/92, which forbids

         1) P1 ("Dirty read"): SQL-transaction T1 modifies a row. SQL-
            transaction T2 then reads that row before T1 performs a COMMIT.
            If T1 then performs a ROLLBACK, T2 will have read a row that was
            never committed and that may thus be considered to have never
            existed.

         2) P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-
            transaction T2 then modifies or deletes that row and performs
            a COMMIT. If T1 then attempts to reread the row, it may receive
            the modified value or discover that the row has been deleted.

for REPEATABLE READ (but allows P3 (Phantom reads)).

As far as I understand it, Alya et al. follow Berenson et.al[2] in that
the ANSI committee didn't mean what they wrote. Since excluding P1, P2
and P3 isn't sufficient to achieve serializabibility and the standard is
explizit that SERIALIZABLE should indeed achieve that, they argue that
REPEATABLE READ was really meant to exclude all phenomena except P3. So
for for Alya et. al., only (G2 - G2-item) should be allowed.

Since those papers are now 25 and 20 years old, respectively, and there
have been several revisions of the SQL standard in the meantime, has the
SQL committee come around to that view (SQL/2003 seems unchanged, I
couldn't find a later revision online)? And if it has, did it follow
those definitions or come up with different ones (it seems to me that
G2-item is much stronger that warranted by the wording in the standard)?

        hp


[1] http://pmg.csail.mit.edu/papers/icde00.pdf
[2] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

signature.asc

Re: Definition of REPEATABLE READ

From

Thomas Munro

Date:

22 June 2020, 07:02:00

On Mon, Jun 22, 2020 at 12:25 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> I've read http://jepsen.io/analyses/postgresql-12.3 which reports a
> problem in PostgreSQL's SERIALIZABLE isolation leven (which has since been
> fixed) and also shows an example of a violation of what they consider to
> be the correct definition of REPEATABLE READ.

In response to that report, we added a paragraph to the manual to
state explicitly that what we offer is snapshot isolation:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1575fbf1ac3c8464b2dade245deff537a3aa2498

https://www.postgresql.org/docs/devel/transaction-iso.html#XACT-REPEATABLE-READ

> Since those papers are now 25 and 20 years old, respectively, and there
> have been several revisions of the SQL standard in the meantime, has the
> SQL committee come around to that view (SQL/2003 seems unchanged, I
> couldn't find a later revision online)? And if it has, did it follow
> those definitions or come up with different ones (it seems to me that
> G2-item is much stronger that warranted by the wording in the standard)?

SQL:2016 is the latest, and I don't have it to hand right now, but
SQL:2011 still defined four degrees of isolation in terms of the three
phenomena (pages 133-135 of part II, "Foundations", at least in the
draft copy you can find on the net).

As for what else PostgreSQL should do about this historical confusion,
I suspect that a patch to pgsql-hackers to accept the name SNAPSHOT
ISOLATION as an alternative would be popular, and we could probably
even spin it as the "true" name of the level (while still accepting
REPEATABLE READ; there is no point in breaking that), and maybe add a
second table to the manual's Transaction Isolation section to cover
the later formalisations in a digested user friendly format (if that
is possible).