Thread: incoherent view of serializable transactions
As I've understood them, the limitations of the PostgreSQL implementation of SERIALIZABLE transactions, at least the only example given in the documentation, revolve around a rather unlikely situation:

Given concurrent transactions T1 and T2 and non-overlapping sets of data A and B, T1 reads data including A and uses the data to modify B, while T2 reads data including B and uses that data to modify A, where the modifications performed by either would affect the modifications made by the other, if they were visible.

For reasons I'll omit here, that scenario didn't worry me for my current uses of PostgreSQL.

I've found another form of deviation from the standard SERIALIZABLE behavior, though, which does worry me. Although the above appears to be the only situation where the end result after everything commits is inconsistent with standard SERIALIZABLE behavior, the PostgreSQL implementation allows transactions to view the data in states which would never be possible during the application of the transactions in series in the order they will appear to have been applied after the commit.

Imagine, as an example, a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Somewhere mid-afternoon that date is updated, all subsequent receipts fall into the new day, and a report is run listing the receipts for the day and giving the deposit total.

Under a standard-compliant implementation of SERIALIZABLE, this is straightforward: a transaction which is inserting a receipt selects the deposit date to use in its transaction, and any SELECT of receipts for a date prior to the current deposit date will see the accurate, final data. Under the PostgreSQL implementation, although data eventually gets to a coherent state, there can be a window of time where a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE.

-- setup
create table ctl (k text not null primary key, deposit_date date not null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;

At this point, SERIALIZABLE transactions appear to have worked, with receipt 3 happening before the update of deposit_date; however, there was a window of time when the update to deposit_date was visible and receipt 3 was not. This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be noted in the documentation. I don't know if there's any way to fix this without killing performance.

-Kevin
On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
> As I've understood limitations of the PostgreSQL implementation of
> SERIALIZABLE transactions, at least the only example given in the
> documentation, revolve around a rather unlikely situation:
>
> Given concurrent transactions T1 and T2 and non-overlapping sets of
> data A and B, T1 reads data including A and uses the data to modify B
> while T2 reads data including B and uses that data to modify A, where
> the modifications performed by either would affect the modifications
> made by the other, if they were visible.

In so far as the "modifications" are just INSERTs (no UPDATEs or DELETEs), yes. This case is covered in the documentation.

> Imagine, as an example, a system which involves recording receipts,
> each of which must go into a daily deposit. There is a control table
> with one row containing the current deposit date for receipts.
> Somewhere mid-afternoon that date is updated, all subsequent receipts
> fall into the new day, and a report is run listing the receipts for the
> day and giving the deposit total.

This is a variation of the above and has the same "proper" solution: predicate locking. However, in this case the records in question are already present, so you can work around it easily. First do a SELECT FOR UPDATE on all the records you want to update; this will serialize all parallel transactions to either before or after you. Then do your update.

> This absolutely can't happen in a standard-compliant implementation.
> At a minimum, this window where visible data lacks coherency should be
> noted in the documentation. I don't know if there's any way to fix
> this without killing performance.

Predicate locking is nasty and we don't try. I'm not sure if anybody else does.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
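For concreteness, here is a sketch (not from the thread) of how the row-locking pattern Martijn describes could be applied to the ctl/receipt tables from the original post, by having the receipt-inserting transaction lock the single control row:

-- receipt-inserting transaction
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for update;
-- a concurrent "update ctl set deposit_date = ..." now blocks until this
-- transaction commits, so the day cannot be closed while a receipt for it
-- is in flight; conversely, if the date-rolling update commits first, the
-- select above fails with a serialization error and can be retried
insert into receipt values (3, date '2008-12-22', 4.00);
-- the date literal is supplied by the client from the select above, since
-- FOR UPDATE cannot appear in the insert's subquery
commit transaction;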
>>> Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
>> As I've understood limitations of the PostgreSQL implementation of
>> SERIALIZABLE transactions, at least the only example given in the
>> documentation, revolve around a rather unlikely situation:
>>
>> Given concurrent transactions T1 and T2 and non-overlapping sets of
>> data A and B, T1 reads data including A and uses the data to modify B
>> while T2 reads data including B and uses that data to modify A, where
>> the modifications performed by either would affect the modifications
>> made by the other, if they were visible.
>
> In so far as the "modifications" are just INSERTs (no UPDATEs or
> DELETEs), yes. This case is covered in the documentation.

Let's not understate the scope of the issue. UPDATEs and DELETEs count, too. For example:

-- connection 1
cir=> create table mytab (class int not null, value int not null);
CREATE TABLE
cir=> copy mytab from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1	10
>> 1	20
>> 2	100
>> 2	200
>> \.
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where class = 2) where class = 1 and value = 10;
UPDATE 1

-- connection 2
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where class = 1) where class = 2 and value = 100;
UPDATE 1
cir=> commit transaction;
COMMIT

-- connection 1
cir=> commit transaction;
COMMIT
cir=> select * from mytab;
 class | value
-------+-------
     1 |    20
     2 |   200
     1 |   300
     2 |    30
(4 rows)

>> Imagine, as an example, a system which involves recording receipts,
>> each of which must go into a daily deposit. There is a control table
>> with one row containing the current deposit date for receipts.
>> Somewhere mid-afternoon that date is updated, all subsequent receipts
>> fall into the new day, and a report is run listing the receipts for the
>> day and giving the deposit total.
>
> This is a variation of the above and has the same "proper" solution:
> predicate locking. However, in this case the records in question are
> already present, so you can work around it easily. First do a SELECT FOR
> UPDATE on all the records you want to update; this will serialize all
> parallel transactions to either before or after you. Then do your
> update.

My point isn't that there aren't workarounds; it is that people might reasonably assume that SERIALIZABLE transactions provide sufficient concurrency control for this, since the only example we give of a problem is a rather contrived update anomaly. The fact that even cases where the data settles into good form at commit leave windows where race conditions could cause occasional bad results without extra explicit locking is not obvious.

>> This absolutely can't happen in a standard-compliant implementation.
>> At a minimum, this window where visible data lacks coherency should be
>> noted in the documentation. I don't know if there's any way to fix
>> this without killing performance.
>
> Predicate locking is nasty and we don't try. I'm not sure if anybody
> else does.

I know for a fact that Sybase ASE does. I've heard from reasonably reliable sources that DB2 does.
I know that Microsoft SQL Server did for some time after the split from the Sybase code base, but I'm not sure they've continued that; in fact, there was a reference to concurrency issues in Wikipedia which implied that they no longer do.

The implementation is not pretty -- they do it by locking accessed pages and insertion points, and in some cases entire tables. It is so ugly that at one point, in discussing similar issues, Tom said that it couldn't really qualify as predicate locking; but in the face of the fact that they have covered all the bases to provide true serializable transactions, and that theory says that only predicate locking can do that, he conceded that it was predicate locking -- but really ugly, and in a form he would never support.

Anyway, I didn't argue that we should provide truly serializable transactions, just that we should provide a less contrived example of where the PostgreSQL implementation can show anomalies, so that people don't get burned through a false sense of security.

-Kevin
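As a cross-check on the mytab example, the arithmetic (computed from the data above, not shown in the thread) confirms that the concurrent result matches neither possible serial order:

-- serial order T1 then T2: T2's sum over class 1 sees T1's new value
--   final state: (1,300) (1,20) (2,320) (2,200)
-- serial order T2 then T1: T1's sum over class 2 sees T2's new value
--   final state: (1,230) (1,20) (2,30) (2,200)
-- actual result under snapshot isolation: (1,20) (2,200) (1,300) (2,30)
-- the values 300 and 30 can never coexist in any serial execution, since
-- each was computed from a snapshot that excluded the other update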
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > At this point, SERIALIZABLE transactions appear to have worked, with > receipt 3 happening before the update of deposit_date; however, there > was a window of time when the update to deposit_date was visible and > receipt 3 was not. > This absolutely can't happen in a standard-compliant implementation. I think you mean "you'd like to believe that can't happen in a standard-compliant implementation". It doesn't include any of the specific behaviors that are forbidden by the spec, though, so I'm less than convinced. An appropriate way to prevent the problem is probably for the transaction that changes the deposit_date to take out a write-excluding lock on the receipts table before it does so. regards, tom lane
Kevin,

If you want to know how to build SERIALIZABLE with a database that provides SI (Snapshot Isolation), read http://portal.acm.org/citation.cfm?doid=1376616.137669

Note that in practice, READ COMMITTED is the most widely used isolation level and its limitations are relatively well understood by the average programmer, who can program his application accordingly. I still don't get why people would use SERIALIZABLE since there is no efficient implementation of it.

My 2 cents,
Emmanuel

Kevin Grittner wrote:
> As I've understood limitations of the PostgreSQL implementation of
> SERIALIZABLE transactions, at least the only example given in the
> documentation, revolve around a rather unlikely situation:
> [remainder of the original post quoted in full; snipped]
-- 
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> At this point, SERIALIZABLE transactions appear to have worked, with
>> receipt 3 happening before the update of deposit_date; however, there
>> was a window of time when the update to deposit_date was visible and
>> receipt 3 was not.
>
>> This absolutely can't happen in a standard-compliant implementation.
>
> I think you mean "you'd like to believe that can't happen in a
> standard-compliant implementation".

"The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins."

If you look at the serializable queries in the original post for this thread, it's not hard to see that this standard is not met. The insert of receipt 3 appears to happen before the update of the control record, since it has the old deposit date. The transaction which selects from both tables sees the update to the control record, so it must come after that. Yet it doesn't see the results of the first transaction. There is no sequence of serial execution which is consistent with the behavior.

Perhaps you are suggesting that it's not possible in a practical sense for a DBMS to meet this standard? See below.

> It doesn't include any of the
> specific behaviors that are forbidden by the spec, though, so I'm less
> than convinced.

Following the diagram of specific behaviors allowed at each level is good evidence that these phenomena don't *define* serializable:

"NOTE 53 * The exclusion of these phenomena for SQL-transactions executing at isolation level SERIALIZABLE is a consequence of the requirement that such transactions be serializable."

> An appropriate way to prevent the problem is probably for the
> transaction that changes the deposit_date to take out a write-excluding
> lock on the receipts table before it does so.

Well, a serializable transaction operating to standard would probably take out a write-excluding lock on the control table row when it is read. This would block the update to the control table until completion of any pending receipts on the old date. The blocked update would then take out a read-excluding lock before updating the row, which would then block receipt transactions looking to check the date. As soon as the update to the control table is committed, you can see the new date, and you can have confidence that the old date's receipts will all show on a SELECT.

Again, I'm not suggesting that we change the behavior of PostgreSQL to match this, but that we document the difference so that those looking to switch to PostgreSQL from databases which provide true serializable transactions know that they have to explicitly lock rows to maintain a coherent view of the data. In the queries shown in the original post, the INSERT of the receipts could read the deposit date FOR SHARE to provide the equivalent functionality, although they would have to be rejiggered a bit, since that isn't allowed in subqueries.

This isn't some hypothetical "maybe some day some product might implement this, but it'll never catch on" sort of thing -- Microsoft and Sybase SQL Server had this from version 1.
I used it from 1990 until the conversion to PostgreSQL over the last couple years. Serializable transactions worked as advertised. Was there more blocking than PostgreSQL? Yes. Were there more deadlocks than PostgreSQL? Yes. Did it impact performance? Well, PostgreSQL is faster by enough that users commented that the applications seemed "snappier" when we switched the database underneath them from Sybase to PostgreSQL.

I'm going on second-hand information here, but I'm told that IBM DB2 has used similar techniques to provide true serializable transactions for even longer.

I'm somewhat mystified at the reaction this topic gets here. :-/

-Kevin
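The FOR SHARE variant Kevin alludes to, rejiggered as he describes since FOR SHARE is not allowed in the insert's subquery, might look like this (a sketch, with the client carrying the selected date into the insert). Unlike an exclusive row lock, the share lock lets any number of receipt inserts run in parallel while still conflicting with the UPDATE that moves the deposit date:

-- receipt-inserting transaction
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for share;
-- suppose it returns 2008-12-22; the client supplies that value below
insert into receipt values (3, date '2008-12-22', 4.00);
commit transaction;
-- the date-rolling "update ctl ..." blocks until all such share locks are
-- released, so it can only commit once every receipt for the old date is in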
Kevin Grittner wrote:
> This isn't some hypothetical "maybe some day some product might
> implement this, but it'll never catch on" sort of thing -- Microsoft
> and Sybase SQL Server had this from version 1. I used it from 1990
> until the conversion to PostgreSQL over the last couple years.

Have you ever used serializable transactions with Sybase? The locking is actually based on memory pages, and you end up with deadlocks if you don't pad your data structures to prevent false sharing. Oracle also provides SI like Postgres, and I don't think they are doing that badly.

> I'm going on second-hand information here, but I'm told that IBM DB2
> has used similar techniques to provide true serializable transactions
> for even longer.
>
> I'm somewhat mystified at the reaction this topic gets here. :-/

I am somewhat mystified by the interest some people still have in serializable transactions. Why don't users program the application to deal with a lower isolation (actually, I think they do)? But I am probably missing the point, which was to fix the doc?

Emmanuel
-- 
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet
>>> Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>
> Kevin Grittner wrote:
>> This isn't some hypothetical "maybe some day some product might
>> implement this, but it'll never catch on" sort of thing -- Microsoft
>> and Sybase SQL Server had this from version 1. I used it from 1990
>> until the conversion to PostgreSQL over the last couple years.
>
> Have you ever used serializable transactions with Sybase?

Every day for over 15 years.

> The locking is
> actually based on memory pages, and you end up with deadlocks if you
> don't pad your data structures to prevent false sharing.

We only padded one table, which we were using to assign sequence numbers. Eventually they did come out with support for row-level locking, which could be chosen on a table-by-table basis, which eliminated the need for the padding columns. We didn't go to row-level locking for all tables because the performance hit was significant -- worse than the blocking.

Blocking was occasionally an issue. Deadlocks were initially a problem, but they can be handled automatically with resubmit, which we eventually covered gracefully, and they weren't an issue for us after that.

> Oracle also
> provides SI like Postgres, and I don't think they are doing that badly.

I don't quite understand. Could you clarify?

>> I'm going on second-hand information here, but I'm told that IBM DB2
>> has used similar techniques to provide true serializable transactions
>> for even longer.
>>
>> I'm somewhat mystified at the reaction this topic gets here. :-/
>
> I am somewhat mystified by the interest some people still have in
> serializable transactions. Why don't users program the application to
> deal with a lower isolation (actually, I think they do)?

There really are good reasons. I'm not up to going through that now, but if there is genuine interest in the topic perhaps I can follow up later.

> But I am probably missing the point, which was to fix the doc?

Thank you!

-Kevin
Kevin Grittner wrote:
>>>> Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>
>> I am somewhat mystified by the interest some people still have in
>> serializable transactions. Why don't users program the application to
>> deal with a lower isolation (actually, I think they do)?
>
> There really are good reasons. I'm not up to going through that now,
> but if there is genuine interest in the topic perhaps I can follow up
> later.

Well, the reason why people rely on isolation provided by the database in general is to make it easier to develop applications. One less thing to worry about. That's why people use RDBMSs, transactions, etc. to begin with.

>> But I am probably missing the point, which was to fix the doc?
>
> Thank you!

If you have a concrete suggestion (= patch) for the documentation, I'm all ears.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
>>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> 12/23/08 9:47 AM >>>
> Kevin Grittner wrote:
>>>>> Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>
>>> Why don't users program the application to
>>> deal with a lower isolation (actually, I think they do)?
>>
>> There really are good reasons. I'm not up to going through that now,
>> but if there is genuine interest in the topic perhaps I can follow up
>> later.
>
> Well, the reason why people rely on isolation provided by the database in
> general is to make it easier to develop applications. One less thing to
> worry about. That's why people use RDBMSs, transactions, etc. to begin with.

This is the heart of it. The argument is getting made that you don't need serializable transactions because you can duplicate the concurrency control with a combination of read committed transactions and explicit lock requests. I could also duplicate the functionality of a DBMS with a bunch of disk files and custom, hard-coded access logic, but I sure wouldn't want to do so.

> If you have a concrete suggestion (= patch) for the documentation, I'm
> all ears.

Well, I figured I should try to get a consensus here before submitting a patch. Last time I tried submitting a simple patch to remove the line about the PostgreSQL community not knowing about any other databases which use predicate locking, I got shot down hard.

Does the phantom receipt example from the original post seem like a reasonable thing to include in the doc patch? Does someone have a more comprehensive summary of where the issues are than I've been able to generate so far?

The patch should also include information on workarounds. I'd be inclined to frame them in terms of how to duplicate the behavior of serializable transactions on other databases from which the reader might be converting, which would start with using SELECT FOR SHARE for any SELECT within a serializable transaction. That doesn't cover everything, because, unlike Sybase et al., it doesn't block conflicting INSERTs. I'm not sure I have a firm conceptual grasp on when and how that renders the SELECT FOR SHARE technique incomplete, but I could try to think it through. If someone already has comprehensive guidelines on that, it would save time and might improve the quality of the docs.

-Kevin
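The gap Kevin mentions is easy to demonstrate (a sketch using the receipt table from the original post): FOR SHARE can only lock rows that already exist, so it does nothing to keep a phantom row from appearing under the same predicate.

-- connection 1: try to pin down "all receipts for 2008-12-22"
start transaction isolation level serializable;
select * from receipt where deposit_date = date '2008-12-22' for share;

-- connection 2: not blocked, because no existing (locked) row is touched
insert into receipt values (4, date '2008-12-22', 8.00);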
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>> Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>> >> Have you ever used serializable transactions with Sybase? > > Every day for over 15 years. Afaict doing a few google searches Sybase doesn't do predicate locking either. It would very much surprise me if they did since they've always had the most primitive locking infrastructure of the three major databases. Locking records or pages isn't going to provide true standards-compliant serializable transactions in the way you're describing. Predicate locking means being able to lock records which don't actually exist yet. Ie, locking all records "WHERE COLUMN=0" even if there are no such records. This has to block any other transaction from inserting such a record or updating another record to set COLUMN to 0. >> Oracle also provides SI like Postgres and I don't think they are doing that >> bad. > > I don't quire understand. Could you clarify? The point is Oracle doesn't provide this kind of true serializable isolation and people still find it useful. In fact Sybase and DB2 also don't provide true serializable transactions -- nobody does. It's a fantasy. > There really are good reasons. I'm not up to going through that now, > but if there is genuine interest in the topic perhaps I can follow up > later. I suppose I'm curious whether you're mistaken and your app isn't safe on Sybase because it's depending on truly serializable transactions and Sybase isn't doing that, or if you have examples of transactions which Sybase provides proper serialized semantics for but Postgres doesn't. >> But I am probably missing the point which was to fix the doc? But missing the point and having pointless arguments is so much more fun than documentation writing :) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
>>> Gregory Stark <stark@enterprisedb.com> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>>>> Emmanuel Cecchet <manu@frogthinker.org> 12/23/08 8:59 AM >>>
>>> Have you ever used serializable transactions with Sybase?
>>
>> Every day for over 15 years.
>
> Afaict from doing a few Google searches, Sybase doesn't do predicate
> locking either. It would very much surprise me if they did, since
> they've always had the most primitive locking infrastructure of the
> three major databases. Locking records or pages isn't going to provide
> true standards-compliant serializable transactions in the way you're
> describing.
>
> Predicate locking means being able to lock records which don't actually
> exist yet. Ie, locking all records "WHERE COLUMN=0" even if there are
> no such records. This has to block any other transaction from inserting
> such a record or updating another record to set COLUMN to 0.

The page locking provides this because every index page or data page the serializable transaction looks at is locked against updates until the end of the transaction. If it can see all the COLUMN=0 rows through an index, the index locks protect the transaction. If a table scan is required, the entire table is locked against all modifications. (That's right, it is not unusual to have entire tables locked against any modification until the end of a database transaction.)

>>> Oracle also provides SI like Postgres, and I don't think they are
>>> doing that badly.
>>
>> I don't quite understand. Could you clarify?
>
> The point is that Oracle doesn't provide this kind of true serializable
> isolation and people still find it useful.

Sure, and I find PostgreSQL useful. I'm not proposing to change it.

> In fact, Sybase and DB2 also don't provide
> true serializable transactions -- nobody does. It's a fantasy.

They do. They have for over 15 years. If people will read it, I'll try to find the web page where they give all the details of the strategy.

>> There really are good reasons. I'm not up to going through that now,
>> but if there is genuine interest in the topic perhaps I can follow up
>> later.
>
> I suppose I'm curious whether you're mistaken and your app isn't safe on
> Sybase because it's depending on truly serializable transactions and
> Sybase isn't doing that, or if you have examples of transactions which
> Sybase provides proper serialized semantics for but Postgres doesn't.

All the examples provided in this thread would be handled by Sybase with proper serializable semantics. When I proposed changing the docs to omit the reference to our lack of knowledge about other database products, there was a full example of code that didn't serialize according to the mathematical definition. I cut and pasted it into Sybase and provided the results -- a deadlock.

Can you provide any example or logical explanation of where the technique I outline above (locking against modification every index and data page read during the transaction until the end of the transaction) would NOT provide true serializable behavior? (Keep in mind that that's the broad-stroke overview -- the full details include various lock escalation techniques, etc.)

>>> But I am probably missing the point, which was to fix the doc?
>
> But missing the point and having pointless arguments is so much more
> fun than documentation writing :)

Frankly, I prefer other sports. :-(

-Kevin
Reading the spec, it does seem our initial statement, "The SQL standard defines four levels of transaction isolation in terms of three phenomena that must be prevented between concurrent transactions", is not accurate. The spec defines the first three modes in terms of P1, P2, P3, but serializable is defined, as Kevin describes, as literally serializable. It is included in the table, but with the note:

NOTE 71 — The exclusion of these phenomena for SQL-transactions executing at isolation level SERIALIZABLE is a consequence of the requirement that such transactions be serializable.

So it's clear that that's not the definition. Excluding P1, P2, P3 is necessary but not sufficient to meet the spec's definition of Serializable.

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>>> Gregory Stark <stark@enterprisedb.com> wrote:
>> Afaict from doing a few Google searches, Sybase doesn't do predicate
>> locking either.
>
> The page locking provides this because every index page or data page
> the serializable transaction looks at is locked against updates until
> the end of the transaction. If it can see all the COLUMN=0 rows
> through an index, the index locks protect the transaction. If a table
> scan is required, the entire table is locked against all
> modifications. (That's right, it is not unusual to have entire tables
> locked against any modification until the end of a database
> transaction.)

Ah, so they don't actually use the term "predicate locking", which is why my google-fu was inadequate. I'm not sure if this is technically "predicate locking" or not. It sounds like it locks a whole lot more than just the predicate.

> All the examples provided in this thread would be handled by Sybase
> with proper serializable semantics. When I proposed changing the docs
> to omit the reference to our lack of knowledge about other database
> products, there was a full example of code that didn't serialize
> according to the mathematical definition. I cut and pasted it into
> Sybase and provided the results -- a deadlock.
>
> Can you provide any example or logical explanation of where the
> technique I outline above (locking against modification every
> index and data page read during the transaction until the end of the
> transaction) would NOT provide true serializable behavior? (Keep in
> mind that that's the broad-stroke overview -- the full details include
> various lock escalation techniques, etc.)

I imagine they preemptively escalate to locking the table if you're going to do a sequential scan? Otherwise an inserter might insert on a page you haven't read yet (and therefore haven't locked yet)?

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
On Tue, 23 Dec 2008, Kevin Grittner wrote:
> The page locking provides this because every index page or data page
> the serializable transaction looks at is locked against updates until
> the end of the transaction. If it can see all the COLUMN=0 rows
> through an index, the index locks protect the transaction. If a table
> scan is required, the entire table is locked against all
> modifications. (That's right, it is not unusual to have entire tables
> locked against any modification until the end of a database
> transaction.)

Well, predicate locking for serializable should also only lock the appropriate conditions. Getting a deadlock between two serializable transactions for conditions that can be serialized would seemingly also be disallowed by the definition of serializable, since there would exist no serial ordering of the transactions that has that effect.
>>> Gregory Stark <stark@enterprisedb.com> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>>>>> Gregory Stark <stark@enterprisedb.com> wrote:
>>> Afaict from doing a few Google searches, Sybase doesn't do predicate
>>> locking either.
>>
>> The page locking provides this because every index page or data page
>> the serializable transaction looks at is locked against updates until
>> the end of the transaction. If it can see all the COLUMN=0 rows
>> through an index, the index locks protect the transaction. If a table
>> scan is required, the entire table is locked against all
>> modifications. (That's right, it is not unusual to have entire tables
>> locked against any modification until the end of a database
>> transaction.)
>
> Ah, so they don't actually use the term "predicate locking", which is
> why my google-fu was inadequate. I'm not sure if this is technically
> "predicate locking" or not. It sounds like it locks a whole lot more
> than just the predicate.

Well, I'm not sure whether it is or not; it's a matter of definition. If predicate locking is required for true serializable transactions, and this provides true serializable transactions, it must be, eh? Also, an argument could be made that if it locks every page which must be viewed for execution based on the search predicates, it is doing predicate locking -- if only indirectly.

> I imagine they preemptively escalate to locking the table if you're
> going to do a sequential scan? Otherwise an inserter might insert on a
> page you haven't read yet (and therefore haven't locked yet)?

I believe they do go straight to the table lock for a table scan, but it isn't necessary for full semantics that the transaction lock all pages in advance. For most purposes the serializable transaction can proceed to lock pages as it gets to them. It will block or deadlock if a conflict arises. The transaction may serialize behind a transaction which started later and read some page it hadn't gotten to yet, but that doesn't violate the spec or cause any anomalies. The key phrase in the spec here is "produces the same effect as *some* serial execution" [emphasis added]. It will escalate from page locks to a table lock if a (configurable) number or percentage of a table's pages get locked.

-Kevin
>>> Stephan Szabo <sszabo@megazone.bigpanda.com> wrote:
> On Tue, 23 Dec 2008, Kevin Grittner wrote:
>
> Well, predicate locking for serializable should also only lock the
> appropriate conditions. Getting a deadlock between two serializable
> transactions for conditions that can be serialized would seemingly also
> be disallowed by the definition of serializable, since there would
> exist no serial ordering of the transactions that has that effect.

Clever. If a DBMS is capable of reporting that it was unable to serialize a transaction in a situation where there is a logical possibility that a different implementation might have succeeded, it hasn't implemented true serializable transactions? That strikes me as being on a level with Codd's assertion that any database which takes two different statements which can be proven to be logically identical and runs one with a different plan than the other should not be called relational. (I.e., possibly true from a technical perspective, but hardly useful in the real world.)

Perhaps it would be best to accept this as proof that there is no current DBMS implementing true serializable transactions and move on to the issue of documenting the real differences between PostgreSQL and other products which come closer to compliance, so that real people converting from those products don't suffer real production problems. As I see it, that's what matters.

-Kevin
On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote:
> I still don't
> get why people would use SERIALIZABLE since there is no efficient
> implementation of it.

If they need true serializable transactions, and don't care about efficiency.

Regards,
Jeff Davis
> The page locking provides this because every index page or data page
> the serializable transaction looks at is locked against updates until
> the end of the transaction. If it can see all the COLUMN=0 rows
> through an index, the index locks protect the transaction. If a table
> scan is required, the entire table is locked against all
> modifications. (That's right, it is not unusual to have entire tables
> locked against any modification until the end of a database
> transaction.)

You've mentioned a couple of times that you're surprised by the reaction of the community to your proposed documentation changes. I have (a more muted version of) the same reaction as several previous posters, and I think in the context of this paragraph I can explain why.

If we were to construct a database that had one giant lock for the entire database, so that only a single query could execute at one time, transactions would be serializable (because they'd in fact be serialized). However, performance would suck. Therefore, database developers and researchers have spent the last twenty (or more?) years trying to come up with ways to allow multiple transactions to execute in parallel while maintaining the appearance of serialization. They've done this by introducing lower-level locks: table-level, page-level, row-level. For most users, the artifacts that have been introduced by these fine-grained locks are outweighed by the performance benefits of greater concurrency -- but, as you point out, not necessarily always.

I don't think I can overstate the extent to which fine-grained locking is viewed as a good thing. It wouldn't surprise me to find out that the locking behavior that you're relying on in Sybase is one which some group of Sybase developers (if there still are any) are busily laboring to eliminate. I think you can see this reflected in Greg Stark's comment as well: "It would very much surprise me if they did since they've always had the most primitive locking infrastructure of the three major databases."

With respect to predicate locking, what you're describing is NOT predicate locking. It's coarse-grained locking that largely or completely obviates the need for predicate locking by greatly reducing concurrency. Now, from the point of view of the application developer who needs very strong serializability guarantees but doesn't care about concurrency, that's six of one, half a dozen of the other, but to me that's the opposite of the typical situation.

Maybe our documentation could say something along the lines of: "PostgreSQL's MVCC framework and row-level locking permit a greater degree of concurrency than some other databases. Even when the transaction isolation level is set to serializable, serialization anomalies can occur in the following situations. When it is important to prevent these anomalies, explicit row-level or table-level locking can be used at the expense of reduced concurrency."

...Robert
On Tue, 2008-12-23 at 10:10 -0600, Kevin Grittner wrote:
> Well, I figured I should try to get a consensus here before submitting
> a patch. Last time I tried submitting a simple patch to remove the
> line about the PostgreSQL community not knowing about any other
> databases which use predicate locking, I got shot down hard.

The docs got changed, though.

I think the current docs make too much of a deal about how hard it is to do predicate locking in databases. Most RDBMSs use predicate locking via indexes, i.e. the locking happens in the index. One might also argue that it is potentially a more efficient design, as TPC-C shows, though such cases of application scalability are rare in the extreme, and the utility of MVCC is by far the best general approach in terms of ease of use and performance.

The example in the docs is not a realistic example, so your new one is useful. I would want you to update it, though, to show how use of row-level locks can be used to enforce correct behaviour when required, so provide a problem and its solution. It will be useful for people moving from systems like Sybase that use locking, who often fall foul of the *lack* of locking in MVCC and write programs that won't work correctly as a result.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
>>> "Robert Haas" <robertmhaas@gmail.com> wrote: > For most users, the artifacts that have been > introduced by these fine-grained locks are outweighed by the > performance benefits of greater concurrency - but, as you point out, > not necessarily always. That's what I don't understand. I never did point that out. I never suggested it. I never requested a change to the software. I just suggested that we document the artifacts. Ah, well; thanks for the feedback. > With respect to predicate locking, what you're describing is NOT > predicate locking. I never would have considered it to be such except for some people insisting that a true serializable implementation is impossible without predicate locking. Either that assertion is false or this is a form of predicate locking, crude as it might be. > Maybe our > documentation could say something along the lines of "PostgreSQL's > MVCC framework and row-level locking permit a greater degree of > concurrency than some other databases. Even when the transaction > isolation level is set to serializable, serialization anomalies can > occur in the following situations. When it is important to prevent > these anomalies, explicit row-level or table-level locking can be used > at the expense of reduced concurrency." That's responsive to my concern and nicely worded. Thanks. -Kevin
>>> Simon Riggs <simon@2ndQuadrant.com> wrote:
> On Tue, 2008-12-23 at 10:10 -0600, Kevin Grittner wrote:
>
> I think the current docs make too much of a deal about how hard it is
> to do predicate locking in databases. Most RDBMSs use predicate locking
> via indexes, i.e. the locking happens in the index. One might also
> argue that it is potentially a more efficient design, as TPC-C shows,
> though such cases of application scalability are rare in the extreme,
> and the utility of MVCC is by far the best general approach in terms of
> ease of use and performance.
>
> The example in the docs is not a realistic example, so your new one is
> useful.

The one currently in the docs is the simplest case I can see of an anomaly which leaves the database in the wrong state after all the commits go through. I think it should probably be kept for that reason. The out-of-sequence appearance of changes before all commits happen seems much more likely to bite people, though. Agreed?

> I would want you to update it, though, to show how use of row-level
> locks can be used to enforce correct behaviour when required, so
> provide a problem and its solution. It will be useful for people moving
> from systems like Sybase that use locking, who often fall foul of the
> *lack* of locking in MVCC and write programs that won't work correctly
> as a result.

Understood and agreed.

-Kevin
Jeff Davis wrote:
> On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote:
>> I still don't
>> get why people would use SERIALIZABLE since there is no efficient
>> implementation of it.
>
> If they need true serializable transactions, and don't care about
> efficiency.

Are there such people, who really understand the thing and don't complain about the performance? ;-)

Happy holidays,
Emmanuel
>>> Emmanuel Cecchet <manu@frogthinker.org> 12/22/08 11:42 PM >>>
> If you want to know how to build SERIALIZABLE with a database that
> provides SI (Snapshot Isolation), read
> http://portal.acm.org/citation.cfm?doid=1376616.137669

The link didn't seem to work for me, but I think I found the article you meant: "Serializable Isolation for Snapshot Databases" by Michael J. Cahill, et al.

An interesting work. If nothing else, it will help flesh out the documentation of anomalies. If the PostgreSQL community ever does want to better approach true serializable behavior, this should provide a good theoretical basis. Thanks very much for the cite.

> I still don't
> get why people would use SERIALIZABLE since there is no efficient
> implementation of it.

Read the last paragraph of section 1 (Introduction) of the article you just cited.

-Kevin
Hi Kevin,

> The link didn't seem to work for me, but I think I found the article
> you meant: "Serializable Isolation for Snapshot Databases"
> by Michael J. Cahill, et al.
>
> An interesting work. If nothing else, it will help flesh out the
> documentation of anomalies. If the PostgreSQL community ever
> does want to better approach true serializable behavior, this
> should provide a good theoretical basis.

Sorry for the broken link. Yes, this is the paper. Note that the paper was not necessarily enthusiastically received by the community when presented at the conference. While this is an interesting academic paper, its practicality left a majority of the audience perplexed.

There was an interesting comment by the Oracle folks: Oracle does not provide serializability but only snapshot isolation, and still users prefer to 'downgrade' to read committed for better performance. The Oracle guys' experience seemed to indicate that there was no need for serializability (well, that's also less work for them ;-)) in their customer base.

Having both a foot in academia and in industry, I understand the intellectual interest for serializability on the academic side, but I still have not seen a single use case in industry where this was a requirement (but my db experience is probably narrow).

Have nice serializable holidays ;-)
manu

-- 
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet
Robert Haas wrote:
>> ... serializable transaction ...
> If we were to construct a database that had one giant lock for the
> entire database so that only a single query could execute at one time,
> transactions would be serializable (because they'd in fact be
> serialized). However, performance would suck.

I wonder if this giant-lock-for-isolation-level-serializable is a mode postgres should support. ISTM it would meet the letter of the spec, and at least some of the people using "transaction isolation level serializable" are doing so precisely because they *want* the database to deal with all possible serialization issues, and are accepting the performance penalties.
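Ron's giant lock could in fact be approximated today without any server changes, using advisory locks (a sketch; key 0 is an arbitrary convention, and the scheme only works if every cooperating session follows it):

-- every transaction in every cooperating session:
select pg_advisory_lock(0);    -- wait for exclusive use of the database
begin;
-- ... entire transaction body ...
commit;
select pg_advisory_unlock(0);  -- release only after the commit completes

Since only one session holds the lock at a time, transactions are literally serialized, at exactly the performance cost Robert describes.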
On Dec 24, 2008, at 6:46 PM, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
> Robert Haas wrote:
>>> ... serializable transaction ...
>> If we were to construct a database that had one giant lock for the
>> entire database so that only a single query could execute at one
>> time, transactions would be serializable (because they'd in fact be
>> serialized). However, performance would suck.
>
> I wonder if this giant-lock-for-isolation-level-serializable
> is a mode postgres should support. ISTM it would meet the
> letter of the spec, and at least some of the people using
> "transaction isolation level serializable" are doing so precisely
> because they *want* the database to deal with all possible
> serialization issues, and are accepting the performance penalties.

No. :-)

Cheers,
...Robert
>>> Emmanuel Cecchet <manu@frogthinker.org> wrote:
> There was an interesting comment by the Oracle folks: Oracle does
> not provide serializability but only snapshot isolation, and still
> users prefer to 'downgrade' to read committed for better performance.
> The Oracle guys' experience seemed to indicate that there was no need
> for serializability (well, that's also less work for them ;-)) in
> their customer base.

Although I note that, while this work was done in 2006 at the University of Sydney, Mr. Cahill is now an employee of Oracle Corporation....

The interesting thing about this technique is that it works without the kind of blocking which is the bane of other implementation techniques. Reads don't block writes or vice versa. In my receipt example, the fact that there is a transitory state where the view of the data would not be consistent would cause no problems unless someone actually tried to view the data within that (probably very small) window of time.

-Kevin
On Tuesday 23 December 2008 16:51:03 Kevin Grittner wrote:
> If you look at the serializable queries in the original post for this
> thread, it's not hard to see that this standard is not met. The
> insert of receipt 3 appears to happen before the update of the control
> record, since it has the old deposit date. The transaction which
> selects from both tables sees the update to the control record, so it
> must come after that. Yet it doesn't see the results of the first
> transaction. There is no sequence of serial execution which is
> consistent with the behavior.

I am not sure yet whether or not your complaint is valid, but your arguments are not very rigorous.

Serializability is not defined in terms of what is visible, but what the state of the database is. If you can order the transactions without overlap so that the state of the database is the same as in your original schedule, the schedule is serializable. It is not of concern what was "visible" in between. You may, however, be able to transform that argument into proving that a phantom read is possible, which is how the SQL standard ultimately defines serializability.

Also note that discussing what is visible necessarily implies the existence of another transaction that does the reading, and that transaction does not appear to be defined in your arguments.
>>> Peter Eisentraut <peter_e@gmx.net> wrote:
> On Tuesday 23 December 2008 16:51:03 Kevin Grittner wrote:
>> If you look at the serializable queries in the original post for this
>> thread, it's not hard to see that this standard is not met. The
>> insert of receipt 3 appears to happen before the update of the control
>> record, since it has the old deposit date. The transaction which
>> selects from both tables sees the update to the control record, so it
>> must come after that. Yet it doesn't see the results of the first
>> transaction. There is no sequence of serial execution which is
>> consistent with the behavior.
>
> I am not sure yet whether or not your complaint is valid, but your
> arguments are not very rigorous.
>
> Serializability is not defined in terms of what is visible, but what
> the state of the database is.

I don't read the standard that way. I'll repeat the relevant part of the standard:

"The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins."

There is no end-to-end sequence in which the transactions listed in my original email can be run that will produce the same effect that you get within PostgreSQL, at least if you consider the results of a SELECT statement to be significant. (I do.)

> If you can order the transactions without overlap so
> that the state of the database is the same as in your original
> schedule, the schedule is serializable. It is not of concern what was
> "visible" in between.

On what do you base that assertion?

> You may, however, be able to transform that argument into proving
> that a phantom read is possible, which is how the SQL standard
> ultimately defines serializability.

No it isn't. The above quoted text is.

> Also note that discussing what is visible necessarily implies the
> existence of another transaction that does the reading, and that
> transaction does not appear to be defined in your arguments.

It was this one, in my original post:

start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-Kevin
>>> Peter Eisentraut <peter_e@gmx.net> wrote:
> Serializability is not defined in terms of what is visible, but what
> the state of the database is.

A belated thought on this: it would be easy enough to add a daily_receipt_total table to the example to capture the result of the query, instead of assuming that it merely prints on the daily bank deposit report. That would certainly meet your test of ending with an invalid database state after all the transactions commit.

-Kevin
Sorry for top posting -- damn phone.

The standard defines all the *other* transaction isolations in terms of phantom reads, dirty reads, and, uh, one other phenomenon. But it defines serializability literally, as the transactions having the same effect as if they were run serially. It explicitly says the fact that phantom reads can't occur in serializable is only a consequence of the definition.

And I don't see why you discard "visibility" as unimportant. All the transaction isolations are defined in terms of the results of the transactions. Those results include both the database state and the data returned by the queries. Otherwise "phantom read" is a meaningless concept.

-- Greg

On 29 Dec 2008, at 07:20, Peter Eisentraut <peter_e@gmx.net> wrote:
> I am not sure yet whether or not your complaint is valid, but your
> arguments are not very rigorous.
>
> Serializability is not defined in terms of what is visible, but what
> the state of the database is. If you can order the transactions
> without overlap so that the state of the database is the same as in
> your original schedule, the schedule is serializable. It is not of
> concern what was "visible" in between.
> [remainder quoted in full; snipped]
Greg Stark wrote:
> And I don't see why you discard "visibility" as unimportant. All the
> transaction isolation levels are defined in terms of the results of the
> transactions. Those results include both the database state and the data
> returned by the queries. Otherwise "phantom read" is a meaningless concept.

Basically, if he wants to make a rigid argument that some scenario violates the serializability promise, then it is necessary to prove:

(1) There is no serial schedule for the set of transactions that achieves the same outcome. (This proof is probably hard to work out, as many "there is no" proofs are.)

- or -

(2) A phantom read situation occurs.

His original argument uses terms like "window" where something is "visible" (to whom?), which can probably be transformed into a proof for (2), but is not convincing (to me) by itself.
>>> Peter Eisentraut <peter_e@gmx.net> wrote:
> Greg Stark wrote:
>> And I don't see why you discard "visibility" as unimportant. All
>> the transaction isolation levels are defined in terms of the results
>> of the transactions. Those results include both the database state
>> and the data returned by the queries. Otherwise "phantom read" is a
>> meaningless concept.
>
> Basically, if he wants to make a rigid argument that some scenario
> violates the serializability promise, then it is necessary to prove:
>
> (1) There is no serial schedule for the set of transactions that
> achieves the same outcome.
>
> - or -
>
> (2) A phantom read situation occurs.

Agreed, except that (2) is a subset of (1), so (1) alone is sufficient. (How can a read not be repeatable if there are no concurrent transactions?)

I feel that I did provide a proof that the transactions couldn't represent any serial execution in my original email. This seems to be disputed with the argument that a SELECT from a database is not required to provide coherent data that represents some point in the serializable stream of transactions. I disagree, but as I pointed out previously, it is trivial to capture the results of any such query into a table and thereby persist the problem within the database.

Here we go. I've labeled the transactions consistently with the new thread I started trying to characterize the full scope of issues and workarounds.

-- setup
drop table if exists ctl, receipt, receipt_totals;
create table ctl (k text not null primary key, deposit_date date not null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00);
create table receipt_totals (deposit_date date not null primary key, next_date date not null, deposit_total numeric(13,2) not null);

-- connection 1 (start of T0)
start transaction isolation level serializable;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);

-- connection 2 (T1)
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;

-- connection 2 (TN version 1)
start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt where deposit_date = date '2008-12-22';
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit transaction;

-- connection 2 (TN version 2)
-- Sum the receipts for the most recent deposit date which precedes the
-- current control date, recording that date, the current control date,
-- and the total.
start transaction isolation level serializable;
insert into receipt_totals
  select r.deposit_date, c.deposit_date, sum(amount)
    from ctl c join receipt r
      on ( r.deposit_date < c.deposit_date
           and not exists
             ( select * from receipt r2
                 where r2.deposit_date < c.deposit_date
                   and r2.deposit_date > r.deposit_date ) )
    group by r.deposit_date, c.deposit_date;
commit transaction;

-- connection 1 (end of T0)
commit transaction;

After all this is done, a select from receipt_totals shows:

 deposit_date | next_date  | deposit_total
--------------+------------+---------------
 2008-12-22   | 2008-12-23 |          3.00
(1 row)

Here goes the proof, although I'm not going to be overly formal in the language.

(1) If these transactions were serialized, T0 must come before T1, because T0 uses the ctl.deposit_date before it is updated by T1.
(2) If these transactions were serialized, T1 must come before TN (either version), because the TN transactions see the ctl.deposit_date set by T1.

(3) If these transactions were serialized, the TN transactions must come before T0, since they don't see the row inserted by T0.

(4) Since serialization requires that T0 < T1 < TN < T0 (comparing time sequence), the transactions cannot be considered serialized.

-Kevin
>>> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> If you have a concrete suggestion (= patch) for the documentation, I'm
> all ears.

I'm still working on the section "Serializable Isolation versus True Serializability", but here are all the changes I can see which precede it. Has the review of the SQL specs convinced everyone that this much is appropriate?

It also seems like the "Data Consistency Checks at the Application Level" section could use a little more detail. Since referential integrity checks are so well understood, and don't work reliably under snapshot isolation without explicit locks, perhaps that could be added?

-Kevin
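A minimal illustration of the sort of referential-integrity anomaly meant here (a sketch only; the parent/child tables and values are hypothetical):

-- setup
create table parent (id int not null primary key);
create table child (id int not null primary key, parent_id int not null);
insert into parent values (1);

-- connection 1: application-level check that the parent exists, then insert
start transaction isolation level serializable;
select 1 from parent where id = 1;        -- check passes
insert into child values (100, 1);

-- connection 2: application-level check that no children exist, then delete
start transaction isolation level serializable;
select 1 from child where parent_id = 1;  -- sees no rows (connection 1 has not committed)
delete from parent where id = 1;          -- check passes
commit;

-- connection 1
commit;
-- Both commits succeed with no serialization failure, since the two
-- transactions touched disjoint rows, leaving a child row which
-- references a deleted parent.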
On Wednesday 31 December 2008 02:33:26 Kevin Grittner wrote:
> I'm still working on the section "Serializable Isolation versus True
> Serializability", but here are all the changes I can see which precede
> it. Has the review of the SQL specs convinced everyone that this much
> is appropriate?

I don't agree with these changes. You make it sound like serializability is an additional condition on the serializable isolation level on top of the no-phantom-reads condition. I think that is not true, both mathematically and from the wording of the SQL standard. It is an equivalent condition or a consequence, depending on how you view it.
On Tuesday 30 December 2008 19:28:01 Kevin Grittner wrote:
> Here we go. I've labeled the transactions consistently with the new
> thread I started trying to characterize the full scope of issues and
> workarounds.

OK, I believe it now. :-)

The missing predicate locking is again at fault here. Because this ...

> -- connection 1 (start of T0)
> start transaction isolation level serializable;
> insert into receipt values (3, (select deposit_date from ctl where k =
> 'receipt'), 4.00);

... would lock ctl where k = 'receipt' ...

> -- connection 2 (T1)
> start transaction isolation level serializable;
> update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
> commit transaction;

... and then this would have to wait ...

> -- connection 2 (TN version 2)
> start transaction isolation level serializable;
> insert into receipt_totals
>   select r.deposit_date, c.deposit_date, sum(amount)
>     from ctl c join receipt r
>       on ( r.deposit_date < c.deposit_date
>            and not exists
>              ( select * from receipt r2
>                  where r2.deposit_date < c.deposit_date
>                    and r2.deposit_date > r.deposit_date ) )
>     group by r.deposit_date, c.deposit_date;
> commit transaction;

... and this could never be executed before T0 commits.
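For this particular schedule, the predicate lock described above can be approximated by hand with a row-level lock on the control row (a sketch only; later messages in this thread report that row-level locks did not suffice in the general case):

-- connection 1 (start of T0): take a shared lock on the control row first
start transaction isolation level serializable;
select deposit_date from ctl where k = 'receipt' for share;
insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);

-- connection 2 (T1): this update now blocks until T0 commits,
-- so T1 can no longer appear to precede T0
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;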
Peter Eisentraut <peter_e@gmx.net> writes:
> On Wednesday 31 December 2008 02:33:26 Kevin Grittner wrote:
>> I'm still working on the section "Serializable Isolation versus True
>> Serializability", but here are all the changes I can see which precede
>> it. Has the review of the SQL specs convinced everyone that this much
>> is appropriate?
>
> I don't agree with these changes. You make it sound like serializability is
> an additional condition on the serializable isolation level on top of the
> no-phantom-reads condition. I think that is not true, both mathematically
> and from the wording of the SQL standard. It is an equivalent condition or a
> consequence, depending on how you view it.

The standard explicitly says that the no-phantom-reads condition is a consequence of the serializability constraint. Did you miss that whole discussion this past week?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
On Sunday 04 January 2009 00:53:51 Gregory Stark wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On Wednesday 31 December 2008 02:33:26 Kevin Grittner wrote:
>>> I'm still working on the section "Serializable Isolation versus True
>>> Serializability", but here are all the changes I can see which precede
>>> it. Has the review of the SQL specs convinced everyone that this much
>>> is appropriate?
>>
>> I don't agree with these changes. You make it sound like serializability
>> is an additional condition on the serializable isolation level on top of
>> the no-phantom-reads condition. I think that is not true, both
>> mathematically and from the wording of the SQL standard. It is an
>> equivalent condition or a consequence, depending on how you view it.
>
> The standard explicitly says that the no-phantom-reads condition is a
> consequence of the serializability constraint. Did you miss that whole
> discussion this past week?

The SQL standard says it in about three different ways, and textbooks might define it in three more ways. That still doesn't negate my concern with the way his patch phrases the issue.

A language lawyer might also point out that the note that contains the "explicitness" isn't actually part of the formal standard. The only things that the standard formally defines are the excluded phenomena.

More to the point, think about how a user might want to think about these issues.

"The standard also requires that serializable transactions behave as though [...]" --- User: The standard requires it, but is it also implemented? (Apparently not, but that is explained somewhere else.)

"is a natural consequence of the fact" --- There is nothing natural about any of this. Why is it a consequence and how?
>>> Peter Eisentraut <peter_e@gmx.net> wrote:
> A language lawyer might also point out that the note that contains
> the "explicitness" isn't actually part of the formal standard. The only
> things that the standard formally defines are the excluded phenomena.

Previously quoted, from the standard:

"The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins."

> More to the point, think about how a user might want to think about these
> issues.
>
> "The standard also requires that serializable transactions behave as though
> [...]" --- User: The standard requires it, but is it also implemented?
> (Apparently not, but that is explained somewhere else.)

That's something I thought about, but failed to find a good way to incorporate at that point, and I thought the discussion in the following sections covered it. Perhaps a reference to those following sections at the point of definition?

> "is a natural consequence of the fact" --- There is nothing natural
> about any of this. Why is it a consequence and how?

How could you possibly get any of those phenomena if there are no concurrent transactions?

-Kevin
Sorry if I'm restating the obvious, but I don't understand the confusion; the standard's definition isn't mysterious. It simply requires that the resulting state from the concurrent execution of transactions (and implicitly any subset of them) designated to run at isolation level SERIALIZABLE be equivalent to SOME LITERALLY SERIAL execution of said transactions.

Transactions literally executed serially comply, of course; but transactions need not be executed literally serially, as long as some mechanism warrants that the state resulting from the completion of every such transaction, in some sequence, is equivalent to the literally serial execution of the transactions in that same sequence.

Even if only a subset of such concurrently executed transactions is complete at some point in time, any remaining transactions may be restarted for whatever reason and remain compliant, since upon the completion of each transaction the resulting state will be equivalent to some literal serial execution of said transactions. (Correspondingly, any implementation which cannot warrant the same is simply non-compliant.)

Although no pre-specified completion order exists per se, the order in which transactions complete implicitly determines whether incomplete transactions need to be restarted, if they depend on a preceding completed transaction's mutated state (or, alternatively, may be allowed to proceed if previously blocked on that potentially mutated state).

And although a collection of such concurrently executed transactions may have mutual interdependencies, each completed transaction in effect breaks dependency deadlocks as might otherwise logically exist, which may affect the optimal logical execution order of the remaining transactions, and warrants that they never deadlock.

I believe, IMHO.
>>> Paul Schlie <schlie@comcast.net> wrote:
> Sorry if I'm restating the obvious, but I don't understand the
> confusion; the standard's definition isn't mysterious. It simply
> requires that the resulting state from the concurrent execution of
> transactions (and implicitly any subset of them) designated to run
> at isolation level SERIALIZABLE be equivalent to SOME LITERALLY
> SERIAL execution of said transactions.

I think that some of the confusion may result from changes in the standard. As far as I can recall, the language requiring that the SERIALIZABLE transaction isolation level be truly serializable was not in early versions of the standard, and it may be that there is some reluctance to concede that a shift in the standard has rendered PostgreSQL out of compliance on this point.

As I see it, the discussion on this thread is around recognition of the requirements of the current standard within the PostgreSQL documentation.

There is a related thread on which I'm attempting to come up with documentation to assist those familiar with true serializable behavior who are attempting to recognize application coding patterns where the differences between that and snapshot isolation are material, with tips on how to handle these differences. There seems to be some question whether the patterns in which anomalies occur are common enough to merit comment.

If you could reference any concise and accessible work on these anomalies and practical workarounds in application code, it would be much appreciated.

-Kevin
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>>>> Paul Schlie <schlie@comcast.net> wrote:
>> Sorry if I'm restating the obvious, but I don't understand the
>> confusion; the standard's definition isn't mysterious. It simply
>> requires that the resulting state from the concurrent execution of
>> transactions (and implicitly any subset of them) designated to run
>> at isolation level SERIALIZABLE be equivalent to SOME LITERALLY
>> SERIAL execution of said transactions.
>
> I think that some of the confusion may result from changes in the
> standard. As far as I can recall, the language requiring that the
> SERIALIZABLE transaction isolation level be truly serializable was not
> in early versions of the standard, and it may be that there is some
> reluctance to concede that a shift in the standard has rendered
> PostgreSQL out of compliance on this point.
>
> As I see it, the discussion on this thread is around recognition of
> the requirements of the current standard within the PostgreSQL
> documentation.
>
> There is a related thread on which I'm attempting to come up with
> documentation to assist those familiar with true serializable behavior
> who are attempting to recognize application coding patterns where the
> differences between that and snapshot isolation are material, with
> tips on how to handle these differences. There seems to be some
> question whether the patterns in which anomalies occur are common
> enough to merit comment.
>
> If you could reference any concise and accessible work on these
> anomalies and practical workarounds in application code, it would be
> much appreciated.

Personally, although compliance may reduce the execution performance of transactions so designated, it will correspondingly warrant correct results, and should be the goal rather than documenting non-conformance; those who wish to embed more direct control over transaction evaluation into their specification, to enable more efficient concurrent execution under more relaxed evaluation semantics, remain free to do so without penalty. (Simple examples of the risk of non-compliance already seem sufficiently identified in your example and the first reference cited.)

Merely documenting that transactions designated to be evaluated at isolation level SERIALIZABLE may not yield expected results, as currently identified, seems sufficient in the short term; and if enough interest develops, so may an effort to warrant compliance, I suspect.

(Something known to be fine most of the time still can't be relied upon in practice.)
>>> Paul Schlie <schlie@comcast.net> wrote:
>> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> There is a related thread on which I'm attempting to come up with
>> documentation to assist those familiar with true serializable
>> behavior who are attempting to recognize application coding
>> patterns where the differences between that and snapshot isolation
>> are material, with tips on how to handle these differences. There
>> seems to be some question whether the patterns in which anomalies
>> occur are common enough to merit comment.
>>
>> If you could reference any concise and accessible work on these
>> anomalies and practical workarounds in application code, it would
>> be much appreciated.
>
> Personally, although compliance may reduce the execution performance
> of transactions so designated, it will correspondingly warrant
> correct results, and should be the goal rather than documenting
> non-conformance; those who wish to embed more direct control over
> transaction evaluation into their specification, to enable more
> efficient concurrent execution under more relaxed evaluation
> semantics, remain free to do so without penalty. (Simple examples of
> the risk of non-compliance already seem sufficiently identified in
> your example and the first reference cited.)
>
> Merely documenting that transactions designated to be evaluated at
> isolation level SERIALIZABLE may not yield expected results, as
> currently identified, seems sufficient in the short term; and if
> enough interest develops, so may an effort to warrant compliance, I
> suspect.
>
> (Something known to be fine most of the time still can't be relied
> upon in practice.)

Thank you for your perspective. I'm not sure that I totally followed you, so let me restate to see if it sounds right to you. You are suggesting that minimal discussion of the problem, the initial example I provided, and more discussion of how to ensure correct semantics would be what is needed? Filling in more detail if interest is expressed by users?

If so, the draft below, a partial replacement for my earlier suggested replacement of the text in "Serializable Isolation versus True Serializability", may be close to what you're suggesting -- if additional guidance on when to use what additional locks is provided. I'll paste it below my signature for comment. It's a little rough yet, but I'm looking to see if I'm on the right track. The first paragraph is a slightly modified form of a suggestion from Robert Haas in:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01732.php

-Kevin

PostgreSQL's MVCC framework, snapshot isolation, and limited automatic row-level locking permit a greater degree of concurrency than some other databases; however, even when the transaction isolation level is set to serializable, serialization anomalies can occur in some situations. When it is important to prevent these anomalies, explicit row-level or table-level locking can be used at the expense of reduced concurrency.

Since PostgreSQL protects a serializable transaction against changes in the view of the data, and uses locks to prevent modification of data which is being modified by a concurrent transaction, the anomalies can only occur when a transaction reads data which is modified by a concurrent transaction, and uses that as the basis of database modifications which are read by a concurrent transaction. Data consistency checks at the application level have a problem with this in general, and are addressed in section 13.4.
Some examples of other types of anomalies follow, with suggestions on how to use explicit locking to prevent the anomalies where needed.

Consider a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Each transaction which is inserting a receipt selects the deposit date from the control table within its transaction, and uses it for the receipt's deposit date. Somewhere mid-afternoon the control table's date is updated, all subsequent receipts should fall into the new day, and a report is run listing the receipts for the day and giving the deposit total.

If all transactions involved were truly serializable, any SELECT of receipts for a date prior to the deposit date of the control table would see the complete, final set of receipts. Under the PostgreSQL implementation, unless explicit locking is used, although data eventually gets to that state, there can be a window of time during which a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE. This window of time runs from the commit of the transaction which updated the control table until the commit of any pending transactions which are inserting receipts and which obtained a snapshot before the update of the control table.

To prevent this anomaly, a lock can be taken out on the receipt table to block all modification to that table. This should be done in the transaction which will update the control table, and should be acquired before the transaction selects or modifies any data.
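A sketch of the approach just described (the lock mode is an assumption; SHARE mode conflicts with the ROW EXCLUSIVE lock taken implicitly by INSERT, so the lock request waits for pending receipt inserts to commit, and blocks new ones until this transaction commits):

-- transaction advancing the deposit date
start transaction isolation level serializable;
lock table receipt in share mode;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;

The revised draft later in this thread tightens this further, so that the receipt-inserting transactions also acquire a common lock as their first step.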
Kevin Grittner wrote:
>> "is a natural consequence of the fact" --- There is nothing natural
>> about any of this. Why is it a consequence and how?
>
> How could you possibly get any of those phenomena if there are no
> concurrent transactions?

I see what you mean now, but you could write out that logic in more detail.
>>> Peter Eisentraut <peter_e@gmx.net> wrote:
> Kevin Grittner wrote:
>>> "is a natural consequence of the fact" --- There is nothing
>>> natural about any of this. Why is it a consequence and how?
>>
>> How could you possibly get any of those phenomena if there are no
>> concurrent transactions?
>
> I see what you mean now, but you could write out that logic in more
> detail.

Those weren't my words; I was quoting the SQL spec. It came about half a page after they had defined serializable transactions by saying that they must produce the same effect as if they had been run (in some order) one at a time. The spec then defined "phenomena that can occur during the execution of concurrent SQL-transactions" and gave a table of which phenomena could occur at which transaction isolation level. Immediately after the table was the note about the "natural consequence".

Since these phenomena are defined in terms of the visibility of the effects of concurrent transactions, and serializable transactions must have the same effect as if they were run one at a time, a natural consequence is that none of the effects can occur.

What gaps in the logic do you see?

-Kevin
>>> I wrote:
>>>> Peter Eisentraut <peter_e@gmx.net> wrote:
>> Kevin Grittner wrote:
>>>> "is a natural consequence of the fact" --- There is nothing
>>>> natural about any of this. Why is it a consequence and how?
>>>
>>> How could you possibly get any of those phenomena if there are no
>>> concurrent transactions?
>>
>> I see what you mean now, but you could write out that logic in more
>> detail.
>
> Those weren't my words; I was quoting the SQL spec.

Last night I was reviewing my proposed patch from this thread, to try to address other expressed concerns, and noticed that I had used this language from the SQL spec in the patch. I see your point now. Without the same context as the spec, and when intended for a different audience, this language probably isn't the best.

It now also occurs to me that the spec is a copyrighted work, and it probably isn't appropriate to copy a chunk that big into PostgreSQL docs. I'll write something in my own words to replace this.

Thanks for the input, and sorry for misunderstanding.

-Kevin
Here's a shot at a more radical revision, to try to address concerns raised over my failure in the previous (very minimal) suggested patch to address PostgreSQL behavior close to where the spec's behavior is described, and my dragging in of language directly from the spec in a confusing context. I'd appreciate any corrections or suggestions before I massage it into sgml.

Also, I don't know if I should leave it with the one example or whether there should be more. I could leave in the old example, although the popular example of reversing updates (one transaction updates all card rows to 'face-up' where they are 'face-down' and vice versa) seems easier to understand. One or both of these? Other suggestions?

Also, I tried using SELECT FOR SHARE and SELECT FOR UPDATE as the complete solution or instead of one of the table locks, but was able to generate anomalies in all such cases. If someone has a less extreme technique for blocking the anomalies in the receipt example (even when the SELECT of the deposit date for a receipt is in a separate statement earlier in the transaction), please let me know, so that I can include it.

If time permits I might take a stab at expanding the section on data consistency checks at the application level; however, that seems less urgent than correcting the obsolescent discussions of the SQL standard and describing some of the anomalies not covered by a discussion of consistency checks.

Thanks,

-Kevin

13.2. Transaction Isolation

The SQL standard defines four levels of transaction isolation in terms of three phenomena that must be prevented between concurrent transactions, with additional constraints on Serializable transactions. These undesirable phenomena are:

dirty read
    A transaction reads data written by a concurrent uncommitted transaction.

nonrepeatable read
    A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).

phantom read
    A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

The four transaction isolation levels and the corresponding behaviors are described in Table 13-1.

Table 13-1. SQL Transaction Isolation Levels

<table here>

The standard also requires that serializable transactions behave as though they were run one at a time, even though their execution may actually overlap. Since the phenomena described above relate to the visibility of the effects of concurrent transactions, and each serializable transaction must behave as though it were run in its entirety either before or after every other transaction, none of the above phenomena can occur within a serializable transaction.

In practice there is another popular transaction isolation level, not mentioned in the standard, generally known as Snapshot isolation level. A transaction executed at this transaction isolation level sees a consistent view of the data; changes made by other transactions are not visible to it. Because of this, none of the phenomena described above are possible. Additionally, when concurrent transactions running at this level attempt to modify the same data, the update conflict causes transaction rollback to prevent many forms of update anomalies.
Still, data may be viewed or stored in a state which is not consistent with any serial execution of transactions run at this level, so although it is more strict than required for Repeatable Read, it does not meet the standard's definition of the Serializable transaction isolation level.

In PostgreSQL you can request any of the four standard transaction isolation levels, but internally there are only two distinct isolation levels, which correspond to the levels Read Committed and Snapshot. When you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read or Serializable you really get Snapshot. Since the standard does not provide for the Snapshot isolation level, PostgreSQL reports it as Serializable. The behavior of the available isolation levels is detailed in the following subsections.

To set the transaction isolation level of a transaction, use the command SET TRANSACTION, or specify the desired transaction isolation level on a BEGIN TRANSACTION or START TRANSACTION statement.

13.2.1. Read Committed Isolation Level

<unchanged>

13.2.2. Snapshot Isolation Level

The Snapshot level (reported as Serializable) provides the strictest transaction isolation available in PostgreSQL. This level approximates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, applications using this level must be prepared to retry transactions due to serialization failures.

When a transaction is at this level, a SELECT query sees only data committed before the transaction began; it never sees either uncommitted data or changes committed during transaction execution by concurrent transactions. (However, the SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed.) This is different from Read Committed in that the SELECT sees a snapshot as of the start of the transaction, not as of the start of the current query within the transaction. Thus, successive SELECT commands within a single transaction always see the same data.

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the transaction will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the transaction can proceed with updating the originally found row. But if the first updater commits (and actually updated or deleted the row, not just locked it) then the transaction will be rolled back with the message

ERROR: could not serialize access due to concurrent update

because a snapshot transaction cannot modify or lock rows changed by other transactions after the transaction began. When the application receives this error message, it should abort the current transaction and then retry the whole transaction from the beginning. The second time through, the transaction sees the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.
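A minimal two-session illustration of the error described above (the table and values are hypothetical):

-- setup
create table t (id int not null primary key, val int not null);
insert into t values (1, 0);

-- connection 1
start transaction isolation level serializable;
update t set val = val + 1 where id = 1;

-- connection 2
start transaction isolation level serializable;
update t set val = val + 2 where id = 1;
-- (blocks, waiting for connection 1)

-- connection 1
commit;
-- connection 2 now fails:
-- ERROR: could not serialize access due to concurrent update
-- The application should roll back and retry its whole transaction.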
Note that only updating transactions might need to be retried; read-only transactions will never have serialization conflicts.

The Snapshot mode provides a rigorous guarantee that each transaction sees and modifies an unchanging view of the database. However, the application has to be prepared to retry transactions when concurrent updates make it impossible to update the original snapshot. Since the cost of redoing complex transactions might be significant, this mode is recommended only when updating transactions contain logic sufficiently complex that they might give wrong answers in Read Committed mode. Most commonly, Snapshot mode is necessary when a transaction executes several successive commands that must see identical views of the database.

13.2.2.1. PostgreSQL Serializable Isolation versus True Serializability

With true serializability, any database transaction which can be shown to be correct and safe if run by itself is automatically safe when run in any mix of serializable transactions. PostgreSQL's MVCC framework, snapshot isolation, and limited automatic row-level locking permit a greater degree of concurrency than some other databases; however, even when the transaction isolation level is set to serializable, serialization anomalies can occur in some situations. When it is important to prevent these anomalies, explicit row-level or table-level locking can be used at the expense of reduced concurrency.

Since PostgreSQL protects a Serializable transaction against changes in the view of the data, and uses locks to prevent modification of data which is being modified by a concurrent transaction, the anomalies can only occur when a transaction reads data which is modified by a concurrent transaction, and uses that as the basis for database modifications which are read by a concurrent transaction. Data consistency checks at the application level have a problem with this in general, and are addressed in section 13.4. Some examples of other types of anomalies follow, with suggestions on how to use explicit locking to prevent the anomalies where needed.

Consider a system which records receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Each transaction which is inserting a receipt selects the deposit date from the control table within its transaction, and uses it for the receipt's deposit date. Somewhere mid-afternoon the control table's date is updated, all subsequent receipts should fall into the new day, and a report is run listing the receipts for the day and giving the deposit total. Serializable transaction isolation mode is requested for every transaction involved.

If all transactions involved were truly serializable, any successful SELECT of receipts for a date prior to the deposit date of the control table (as shown by a SELECT in the same transaction) would see the complete, final set of receipts for the completed deposit date. To preserve this view of the data, one or more transactions might need to block until the commit or rollback of other transactions, and one or more transactions might need to be rolled back and run again from the start. Under the PostgreSQL implementation there is no blocking and no chance of rollbacks; however, there can be a window of time during which a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if Serializable mode is requested for all transactions modifying and viewing data.
This window of time runs from the commit of the transaction which updated the control table until the commit of any pending transactions which are inserting receipts and which obtained a snapshot before the update of the control table. A database transaction, even if declared Serializable, which selects the sum of the receipts for the closed date and saves it into a daily deposit table will persist an inaccurate value if run during this same window of time.

Alarming as this might sound, if the transactions which insert receipts commit very quickly after the snapshot is obtained, this anomaly might never appear in a real-life system. If the risk is acceptable, or can be controlled by external means, such as delaying the run of the daily receipt report for a few seconds after the update of the control table, no further action is required. It is up to the software developer to recognize where the risk is unacceptable and to decide on a technique to control each known conflict.

To prevent the anomaly described above, transactions which insert receipts and transactions which advance the deposit date must both acquire a common lock as their first step. This lock could be acquired on either the control table or the receipt table.
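A minimal sketch of this technique, taking the common lock on the control table (the specific lock modes are illustrative assumptions: ROW EXCLUSIVE locks do not conflict with one another, so receipt inserters do not block each other, but they do conflict with the EXCLUSIVE lock taken when advancing the date):

-- transaction inserting a receipt: take the common lock before any
-- statement freezes the transaction's snapshot
start transaction isolation level serializable;
lock table ctl in row exclusive mode;
insert into receipt values (4, (select deposit_date from ctl where k = 'receipt'), 10.00);
commit transaction;

-- transaction advancing the deposit date: its lock request waits for
-- any pending receipt inserts to commit, and any insert which starts
-- after this point waits in turn
start transaction isolation level serializable;
lock table ctl in exclusive mode;
update ctl set deposit_date = date '2008-12-24' where k = 'receipt';
commit transaction;

Because the LOCK TABLE statement executes before any snapshot is taken, a receipt transaction which blocks behind the deposit-date update does not take its snapshot until that update has committed, so it picks up the new date; and EXCLUSIVE mode still permits concurrent ACCESS SHARE locks, so the report can read the control table throughout.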