Thread: Serializable Isolation without blocking

Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
While discussing potential changes to PostgreSQL documentation of
transaction isolation levels, Emmanuel Cecchet pointed out an
intriguing new paper[1] on a new algorithm to provide true
serializable behavior in an MVCC-based database, with no additional
blocking; although, there being no such thing as a free lunch, there
is an increase in serialization failures under contention.  I have
been hesitant to raise the issue while everyone was busy trying to
wrap up release 8.4; but since that is now in beta testing and PGCon
is fast approaching, I wanted to put the issue out there so that
people have a chance to review it and discuss it.
Michael Cahill has given me permission to quote from the paper.  He
has also provided a URL to his personal version of the work[2], which
people may directly access for their personal use, although
redistribution is prohibited by the ACM copyright.  He has asked to be
copied on the discussion here.
I know that some here have questioned why anyone would want
serializable transactions.  Our environment has 21 programmers, 21
business process analysts, and four DBAs.  A major part of the time
for this staff is enhancement of existing software and development of
new software.  We have many distinct databases, the largest of which
has a schema of over 300 tables.  (That's well normalized and not
partitioned -- the structure of the data really is that complex.)  We
have over 8,700 queries against these various databases, including
OLTP, reporting, statistics, public access, web queries, etc.  If one
were to go through the well-known techniques to identify all possible
interactions between these queries against these tables, it would not
only be a massive undertaking, but the results would immediately be
obsolete.
The nice thing about serializable transactions is that if you can show
that a transaction does the right thing when run by itself, you
automatically know that it will function correctly when run in any
mix, or it will fail with a serializable error and can be safely
retried.  Our framework is designed so that serialization errors are
automatically detected and the transaction is retried without any user
interaction or application programming needed -- a serialization
failure appears to the application code and the user the same as
simple blocking.
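To sketch what the framework does (the table and statement here are
invented for illustration):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE receipt SET amount = amount + 100 WHERE id = 1;
    COMMIT;
    -- If any statement (or the COMMIT itself) fails with SQLSTATE
    -- 40001 (serialization_failure), the framework issues ROLLBACK
    -- and replays the whole transaction from the top; the user just
    -- sees a slightly longer response time.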
Quoting the paper's abstract:
"Many popular database management systems offer snapshot isolation
rather than full serializability. There are well known anomalies
permitted by snapshot isolation that can lead to violations of data
consistency by interleaving transactions that individually maintain
consistency. Until now, the only way to prevent these anomalies was to
modify the applications by introducing artificial locking or update
conflicts, following careful analysis of conflicts between all pairs
of transactions.
"This paper describes a modification to the concurrency control
algorithm of a database management system that automatically detects
and prevents snapshot isolation anomalies at runtime for arbitrary
applications, thus providing serializable isolation. The new algorithm
preserves the properties that make snapshot isolation attractive,
including that readers do not block writers and vice versa. An
implementation and performance study of the algorithm are described,
showing that the throughput approaches that of snapshot isolation in
most cases."
Quoting a portion of the conclusions:
"A prototype of the algorithm has been implemented in Oracle Berkeley
DB and shown to perform significantly better than two-phase locking in
a variety of cases, and often comparably with snapshot isolation.
"One property of Berkeley DB that simplified our implementation was
working with page level locking and versioning. In databases that
version and lock at row-level granularity (or finer), additional
effort would be required to avoid phantoms, analogous to standard two
phase locking approaches such as multigranularity locking."
Quoting a snippet from the implementation section:
"Making these changes to Berkeley DB involved only modest changes to
the source code. In total, only 692 lines of code (LOC) were modified
out of a total of over 200,000 lines of code in Berkeley DB."
Michael J. Cahill has since implemented these techniques in InnoDB as
part of his PhD work.  While Microsoft SQL Server does provide full
serializability in an MVCC implementation, I believe they do it with
blocking rather than this newer and faster technique.
The paper is a very interesting read, and I fear that if we don't
pursue these techniques, InnoDB users will have legitimate bragging
rights over PostgreSQL users in an important area.
Oh, and I know that these issues are well known, and I know that the
solution involves predicate locks; although these won't necessarily be
locks which cause blocking.
-Kevin
[1] Michael J. Cahill, Uwe Röhm, Alan D. Fekete. Serializable
Isolation for Snapshot Databases. In the Proceedings of the 2008 ACM
SIGMOD international conference on management of data, pages 729-738.
http://doi.acm.org/10.1145/1376616.1376690
[2]
http://cahill.net.au/wp-content/uploads/2009/01/real-serializable.pdf



Re: Serializable Isolation without blocking

From
Neil Conway
Date:
On Tue, May 5, 2009 at 8:50 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> While discussing potential changes to PostgreSQL documentation of
> transaction isolation levels, Emmanuel Cecchet pointed out an
> intriguing new paper[1] on a new algorithm to provide true
> serializable behavior in an MVCC-based database

I agree, this is very interesting work. I blogged about it a while ago[1].

> "Making these changes to Berkeley DB involved only modest changes to
> the source code. In total, only 692 lines of code (LOC) were modified
> out of a total of over 200,000 lines of code in Berkeley DB."

Tracking the read sets of each transaction would be very expensive.
Worse still, that information needs to be kept around after
end-of-transaction, which raises questions about where it should be
stored and how it should be cleaned up. Note that the benchmarks in
the paper involve transactions that perform "a small number of simple
read and update operations", which reduces the bookkeeping overhead.

Neil

[1] http://everythingisdata.wordpress.com/2009/02/25/february-25-2009/


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Neil Conway <neil.conway@gmail.com> wrote: 
> Tracking the read sets of each transaction would be very expensive.
> Worse still, that information needs to be kept around after
> end-of-transaction, which raises questions about where it should be
> stored and how it should be cleaned up. Note that the benchmarks in
> the paper involve transactions that perform "a small number of
> simple read and update operations", which reduces the bookkeeping
> overhead.
I know that some of the simplifying assumptions listed in 3.1 do not
currently hold for PostgreSQL.  A prerequisite for using the algorithm
would be to make them hold for PostgreSQL, or find some way to work
around their absence.  I hope people will read the paper and mull it
over, but these assumptions are probably the crux of whether this
enhancement is feasible.  I guess it would be best to throw that much
out to the list for discussion.
To quote:
> 1. For any data item x, we can efficiently get the list of
>    locks held on x.
> 2. For any lock l in the system, we can efficiently get
>    l.owner, the transaction object that requested the lock.
> 3. For any version xt of a data item in the system, we
>    can efficiently get xt.creator, the transaction object
>    that created that version.
> 4. When finding a version of item x valid at some given
>    timestamp, we can efficiently get the list of other
>    versions of x that have later timestamps.
Based on my limited understanding, I don't get the impression 2 or 3
are a problem.
I'm assuming that we would use the structures which back the pg_locks
view for the SIREAD locks implicit in 1, possibly with some scope
escalation as counts for a table rise (row to page to extent to
table).  This may require a larger allocation for this information by
those wanting to use serializable transactions.
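For what it's worth, something like this already enumerates the locks
held on a given table, which is the sort of lookup assumption 1 calls
for ("mytable" is just a placeholder):

    SELECT locktype, mode, pid, granted
      FROM pg_locks
     WHERE relation = 'mytable'::regclass;
    -- SIREAD locks would presumably show up here as an additional,
    -- never-blocking lock mode.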
I'm not sure how 4 could be handled.
I don't know how much work would be required to track the transaction
information listed in section 4.1 (or its functional equivalent).
I'd be happy to hear the opinion of those more familiar with the
internals than I.
-Kevin


Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> While discussing potential changes to PostgreSQL documentation of
> transaction isolation levels, Emmanuel Cecchet pointed out an
> intriguing new paper[1] on a new algorithm to provide true
> serializable behavior in an MVCC-based database, with no additional
> blocking; although, there being no such thing as a free lunch, there
> is an increase in serialization failures under contention.

I have read through the paper and will share my comments.

I hope I got it right:

To put it in a nutshell, the approach to true serializability presented
in the paper involves "intention locks" which do not block writers,
but cause transactions with conflict potential to be aborted.

The main cost incurred is the maintenance of these intention locks, which
need to be kept for a while even after a transaction has committed.
Moreover, there will potentially be many of these locks (think of
SELECT COUNT(*) FROM ...), so some lock escalation mechanism (to
page or table locks) will be necessary.

I don't know how hard that would be to implement, but I'd argue
that it is only worth considering if it guarantees true serializability.

The paper describes only intention locks for rows in the select list
of a statement and deliberately ignores rows which are examined in
the WHERE clause. The authors argue in section 3.4 that this is no
problem in their implementation since "Berkeley DB does all locking
and versioning on page granularity".

I don't buy that; maybe I misunderstood something.

Consider a function
"makehighlander(personid integer) RETURNS void"
defined like this:
   SELECT ishighlander INTO b FROM scots WHERE id=personid;
   IF b THEN
      RETURN; /* no need to do anything */
   END IF;
   UPDATE scots SET ishighlander=TRUE WHERE id=personid;
   SELECT count(*) INTO n FROM scots WHERE ishighlander;
   IF (n > 1) THEN
      RAISE EXCEPTION 'There can be only one';
   END IF;

If we assume that "ishighlander" is false for all records in
the beginning, and there are two calls to the function with
two personid's of records *in different pages*, then there cannot be
any conflicts since all (write and intention) locks taken by each of
these calls should only affect the one page that contains the one
record that is updated and then found in the subsequent SELECT.

Yet if the two execute concurrently and the two first SELECTs are
executed before the two UPDATEs, then both functions have a snapshot
so that the final SELECT statements will return 1 and both functions will
succeed, leaving the table with two highlanders.
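Here is the interleaving I mean, as a sketch (assuming ids 1 and 2 are
in different pages; T0 and T1 run under snapshot isolation):

    T0: BEGIN ISOLATION LEVEL SERIALIZABLE;
    T1: BEGIN ISOLATION LEVEL SERIALIZABLE;
    T0: SELECT ishighlander FROM scots WHERE id = 1;       -- false
    T1: SELECT ishighlander FROM scots WHERE id = 2;       -- false
    T0: UPDATE scots SET ishighlander = TRUE WHERE id = 1;
    T1: UPDATE scots SET ishighlander = TRUE WHERE id = 2;
    T0: SELECT count(*) FROM scots WHERE ishighlander;     -- sees 1
    T1: SELECT count(*) FROM scots WHERE ishighlander;     -- sees 1
    T0: COMMIT;
    T1: COMMIT;    -- both succeed: two highlanders

Each count sees only that transaction's own update, because the other
update is invisible in its snapshot.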


So I think one would have to add intention locks for rows considered
in the WHERE clause to guarantee true serializability.

It would be interesting to know how many lines of code would have
to be added to implement that and how performance would be affected.
I'm afraid that concurrency would suffer because more conflicting
transactions would be aborted.


Another thing I wonder is whether the authors have considered the
possibility that there are serializable and "cursor stability"
transactions at the same time, where the latter would not take
intention locks. Could that affect the considerations about
correctness of the algorithm?

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
Gregory Stark
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:

> So I think one would have to add intention locks for rows considered
> in the WHERE clause to guarantee true serializability.

Does the paper explain how to deal with rows "considered" in the WHERE clause
which don't yet exist? I.e., "SELECT count(*) WHERE foo" needs to take out a
lock which would cause any transaction which inserts a new record where foo is
true to abort.

In MSSQL this requires locking the page of the index where such records would
be inserted (or the entire table if there's no index). In Predicate locking
schemes this requires a separate storage structure for storing such predicates
which can be arbitrarily complex expressions to check any new tuple being
inserted against.

Are these intention locks predicate locks, in that they're not associated with
actual pages or records but with potential records which might be inserted in
the future?

--
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's PostGIS support!


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Tue, 2009-05-05 at 10:50 -0500, Kevin Grittner wrote:

> "This paper describes a modification to the concurrency control
> algorithm of a database management system that automatically detects
> and prevents snapshot isolation anomalies at runtime for arbitrary
> applications, thus providing serializable isolation. The new algorithm
> preserves the properties that make snapshot isolation attractive,
> including that readers do not block writers and vice versa. An
> implementation and performance study of the algorithm are described,
> showing that the throughput approaches that of snapshot isolation in
> most cases."

I think this is important work in database theory and a good future
direction for us. It's the right thing to keep an eye on developments
and to consider implementation.

We would need to decide whether intention locks were indeed necessary
when we have MVCC also. Other DBMS without visibility may need to be
more restrictive than we would have to be to implement this. Not sure.

It wouldn't be 692 lines of code and even if it were the impact of that
code would be such that it would need to be optional, since it would
differ in definition from an existing SQL Standard isolation mode and it
would have additional performance implications.

If the use is optional, I would currently prefer the existing mechanism
for implementing serialization, which is to serialize access directly
using either a LOCK statement or an exclusive advisory lock. It's clear
that any new-theory solution will cost significantly more as the number
of users increases, at least O(N^2), whereas simply waiting is only
O(N), AFAICS. (Michael et al don't discuss the algorithmic complexity).
So it seems its use would require some thought and care and possibly
further research to uncover areas of applicability in real usage.
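A minimal sketch of that existing mechanism, using an advisory lock
(the key value is arbitrary):

    -- funnel all transactions that touch the critical data through
    -- one exclusive advisory lock:
    SELECT pg_advisory_lock(12345);
    BEGIN;
    -- ... the transaction's work ...
    COMMIT;
    SELECT pg_advisory_unlock(12345);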

So for me, I would say we leave this be until the SQL standard changes to
recognise the additional mode. I don't see much advantage for us in
breaking the ground on this feature and it will be costly to implement,
so is a good PhD project.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> It wouldn't be 692 lines of code and even if it were the impact of that
> code would be such that it would need to be optional, since it would
> differ in definition from an existing SQL Standard isolation mode and it
> would have additional performance implications.

I thought it would be equal to the SQL standard Serializable mode, 
whereas what we currently call serializable is in fact not as strong as 
the SQL standard Serializable mode.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Thu, 2009-05-07 at 15:26 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > It wouldn't be 692 lines of code and even if it were the impact of that
> > code would be such that it would need to be optional, since it would
> > differ in definition from an existing SQL Standard isolation mode and it
> > would have additional performance implications.
> 
> I thought it would be equal to the SQL standard Serializable mode, 
> whereas what we currently call serializable is in fact not as strong as 
> the SQL standard Serializable mode.

Our serializable is the same as Oracle's implementation. I think it
would be confusing and non-useful to redefine ours, when it has already
been accepted that the Oracle definition implements the standard
reasonably closely. If that changes, we should change too.

Perhaps the key point is the O(N^2) complexity of the new algorithm.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Greg Stark wrote:
> > So I think one would have to add intention locks for rows considered
> > in the WHERE clause to guarantee true serializability.
>
> Does the paper explain how to deal with rows "considered" in the WHERE clause
> which don't yet exist? I.e., "SELECT count(*) WHERE foo" needs to take out a
> lock which would cause any transaction which inserts a new record where foo is
> true to abort.

Quote:
"To prevent phantoms in a system with row-level locking and versioning,
the algorithm described here would need to be extended to take SIREAD locks
on larger granules analogously to multigranularity intention locks in
traditional two-phase locking systems."

[...]

"We have not pursued the details in this paper because the phantom
issue does not arise in our prototype implementation, since Oracle
Berkeley DB does all locking and versioning at page granularity."

End quote.

> Are these intention locks predicate locks, in that they're not associated with
> actual pages or records but with potential records which might be inserted in
> the future?

No, they are associated with the page that contains the actual record.

I think that's also what is meant by the "larger granules" in the above quote:
Take an intention lock on every page which might affect the condition.

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: 
> maybe I misunderstood something.
> 
> Consider a function
> "makehighlander(personid integer) RETURNS void"
> defined like this:
> 
>    SELECT ishighlander INTO b FROM scots WHERE id=personid;
>    IF b THEN
>       RETURN; /* no need to do anything */
>    END IF;
>    UPDATE scots SET ishighlander=TRUE WHERE id=personid;
>    SELECT count(*) INTO n FROM scots WHERE ishighlander;
>    IF (n > 1) THEN
>       RAISE EXCEPTION 'There can be only one';
>    END IF;
> 
> If we assume that "ishighlander" is false for all records in
> the beginning, and there are two calls to the function with
> two personid's of records *in different pages*, then there cannot be
> any conflicts since all (write and intention) locks taken by each of
> these calls should only affect the one page that contains the one
> record that is updated and then found in the subsequent SELECT.
> 
> Yet if the two execute concurrently and the two first SELECTs are
> executed before the two UPDATEs, then both functions have a snapshot
> so that the final SELECT statements will return 1 and both functions
> will succeed, leaving the table with two highlanders.
I do think you misunderstood.  If there are two concurrent executions
and each reads one row, there will be an SIREAD lock for each of those
rows.  As an example, let's say that one of them (T0) updates its row
and does its count, finds everything looks fine, and commits.  In
reading the row the other transaction (T1) modified it sets the
T0.outConflict flag to true and the T1.inConflict flag to true.  No
blocking occurs.  Now T1 updates its row.  Still no problem, because
if it committed there, there would still be a sequence of transactions
(T0 followed by T1) which would be consistent with the results; but it
selects rows which include the one modified by T0, which causes
T0.inConflict and T1.outConflict to be set to true.  These would both
be pivots in an unsafe pattern of updates.  No mystery which one needs
to be rolled back -- T0 has already committed; so T1 is rolled back
with a serialization failure (probably indicating that it is an unsafe
update versus an update conflict or a deadlock, which are two other
forms of serialization failure).  Assuming that the software
recognizes the serialization failure code and retries, it now finds
that there is already a highlander and fails for real.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Gregory Stark <stark@enterprisedb.com> wrote: 
> "Albe Laurenz" <laurenz.albe@wien.gv.at> writes:
> 
>> So I think one would have to add intention locks for rows
>> considered in the WHERE clause to guarantee true serializability.
> 
> Does the paper explain how to deal with rows "considered" in the
> WHERE clause which don't yet exist? I.e., "SELECT count(*) WHERE foo"
> needs to take out a lock which would cause any transaction which
> inserts a new record where foo is true to abort.
The issue is mentioned, along with the note, quoted in my original
post, of why they were able to dodge the issue in the Berkeley DB
implementation.
> In MSSQL this requires locking the page of the index where such
> records would be inserted (or the entire table if there's no index).
This is the only form of predicate locking I've seen in real-world
production databases which provide true serializable behavior.
> In Predicate locking schemes this requires a separate storage
> structure for storing such predicates which can be arbitrarily
> complex expressions to check any new tuple being inserted against.
I've never seen that done in real-world production databases, although
I'm sure it's pretty in theory.
> Are these intention locks predicate locks, in that they're not
> associated with actual pages or records but with potential records
> which might be inserted in the future?
They are predicate locks in the sense that they detect all conflicts
which could occur based on the actual predicate, though they tend to
indicate conflicts in some situations where a rigorous (and expensive)
analysis of the actual predicates might not; but please note that such
locks would be SIREAD locks, which don't block any data modification,
but are only used to detect dangerous update patterns.
-Kevin


Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> > maybe I misunderstood something.
> >
> > Consider a function
> > "makehighlander(personid integer) RETURNS void"
> > defined like this:
> >
> >    SELECT ishighlander INTO b FROM scots WHERE id=personid;
> >    IF b THEN
> >       RETURN; /* no need to do anything */
> >    END IF;
> >    UPDATE scots SET ishighlander=TRUE WHERE id=personid;
> >    SELECT count(*) INTO n FROM scots WHERE ishighlander;
> >    IF (n > 1) THEN
> >       RAISE EXCEPTION 'There can be only one';
> >    END IF;
> >
> > If we assume that "ishighlander" is false for all records in
> > the beginning, and there are two calls to the function with
> > two personid's of records *in different pages*, then there cannot be
> > any conflicts since all (write and intention) locks taken by each of
> > these calls should only affect the one page that contains the one
> > record that is updated and then found in the subsequent SELECT.
> >
> > Yet if the two execute concurrently and the two first SELECTs are
> > executed before the two UPDATEs, then both functions have a snapshot
> > so that the final SELECT statements will return 1 and both functions
> > will succeed, leaving the table with two highlanders.
>
> I do think you misunderstood.  If there are two concurrent executions
> and each reads one row, there will be an SIREAD lock for each of those
> rows.  As an example, let's say that one of them (T0) updates its row
> and does its count, finds everything looks fine, and commits.  In
> reading the row the other transaction (T1) modified it sets the
> T0.outConflict flag to true and the T1.inConflict flag to true.

Where does T0 read the row that T1 modified?

>                                                                  No
> blocking occurs.  Now T1 updates its row.

Wait a minute, I am confused. I thought T1 had already modified the row
before T0 committed? Or is "modify" not the update?

>                                            Still no problem, because
> if it committed there, there would still be a sequence of transactions
> (T0 followed by T1) which would be consistent with the results; but it
> selects rows which include the one modified by T0, which causes
> T0.inConflict and T1.outConflict to be set to true.

Where does T1 select rows that were modified by T0? It selects only one
row, the one it modified itself, right?

>                                                      These would both
> be pivots in an unsafe pattern of updates.  No mystery which one needs
> to be rolled back -- T0 has already committed; so T1 is rolled back
> with a serialization failure (probably indicating that it is an unsafe
> update versus an update conflict or a deadlock, which are two other
> forms of serialization failure).  Assuming that the software
> recognizes the serialization failure code and retries, it now finds
> that there is already a highlander and fails for real.

You see, there must be something fundamental I am getting wrong.

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: 
> Kevin Grittner wrote:
>> I do think you misunderstood.  If there are two concurrent
>> executions and each reads one row, there will be an SIREAD lock for
>> each of those rows.  As an example, let's say that one of them (T0)
>> updates its row and does its count, finds everything looks fine,
>> and commits.  In reading the row the other transaction (T1)
>> modified it sets the T0.outConflict flag to true and the
>> T1.inConflict flag to true.
> 
> Where does T0 read the row that T1 modified?
As I said in the original post, I think we would need to track SIREAD
locks in the structures which back the pg_locks view.
>> blocking occurs.  Now T1 updates its row.
> 
> Wait a minute, I am confused. I thought T1 had already modified the
> row before T0 committed? Or is "modify" not the update?
There are so many sequences that I didn't think it was worthwhile to
step through them all; I did say "As an example, let's say that one of
them (T0) updates its row and does its count, finds everything looks
fine, and commits."  If you want to work through the case where they
both UPDATE their rows before either commits, OK; it's not that
different.  Things are OK as far as the first select of a modified row
by the other transaction; you record inConflict for one and
outConflict for the other.  At the point where it goes both
directions, it is clear that there is a dangerous interaction and one
or the other is rolled back.
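Sketched step by step (using the scots table from your example; the
flag bookkeeping follows the paper):

    T0: SELECT ishighlander FROM scots WHERE id = 1;  -- SIREAD on row 1
    T1: SELECT ishighlander FROM scots WHERE id = 2;  -- SIREAD on row 2
    T0: UPDATE scots SET ishighlander = TRUE WHERE id = 1;
    T1: UPDATE scots SET ishighlander = TRUE WHERE id = 2;
    T0: SELECT count(*) FROM scots WHERE ishighlander;
        -- T0 reads its snapshot's version of row 2, which concurrent
        -- T1 has rewritten: T0.outConflict and T1.inConflict are set
    T1: SELECT count(*) FROM scots WHERE ishighlander;
        -- T1 reads its snapshot's version of row 1, which concurrent
        -- T0 has rewritten: T1.outConflict and T0.inConflict are set
        -- each transaction now has both flags set, so one of them is
        -- rolled back with a serialization failure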
> Where does T1 select rows that were modified by T0? It selects only
> one row, the one it modified itself, right?
You have to select it to know whether to count it, right?
> You see, there must be something fundamental I am getting wrong.
It is such a radical departure from traditional blocking approaches
that it can be hard to get your head around it.  :)
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: 
> Simon Riggs wrote:
>> It wouldn't be 692 lines of code and even if it were the impact of
>> that code would be such that it would need to be optional, since it
>> would differ in definition from an existing SQL Standard isolation
>> mode and it would have additional performance implications.
> 
> I thought it would be equal to the SQL standard Serializable mode, 
> whereas what we currently call serializable is in fact not as strong
> as the SQL standard Serializable mode.
Exactly.  The standard probably *should* add SNAPSHOT between
REPEATABLE READ and SERIALIZABLE, but so far has not.  As of the 2003
version of the SQL spec, they added explicit language that makes it
clear that what you get when you ask for SERIALIZABLE mode in
PostgreSQL is *not* compliant (although it is more than adequate for
REPEATABLE READ).
By the way, the other modes are all optional, as you're allowed to
escalate to a higher level whenever a lower level is requested;
SERIALIZABLE is required by the standard and is specified as the
default.
-Kevin


Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> > Where does T1 select rows that were modified by T0? It selects only
> > one row, the one it modified itself, right?
>
> You have to select it to know whether to count it, right?

We are getting closer.

So an SIREAD lock is taken for every row that is examined during
the execution of the plan?

Ah.

What if there is an index on the "ishighlander" row?
Then an index scan would find only one candidate to examine,
and the other rows would not even be touched by the execution plan.
Then how would they acquire an SIREAD lock?

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: 
> What if there is an index on the "ishighlander" row?
> Then an index scan would find only one candidate to examine,
> and the other rows would not even be touched by the execution plan.
I assume you're talking about this line of your function:

   SELECT count(*) INTO n FROM scots WHERE ishighlander;
I'm further assuming that you meant an index on the ishighlander
*column*.
I can think of more than one way to handle that.  Off the top of my
head, I think it would work to acquire an update lock on both old and
new index entries for a row when it is updated, and to lock the range
of an index used for a scan with the new SIREAD lock.  Or perhaps,
since the row must be visited to test visibility, the update lock
could be on the old and new rows, and the index scan would find the
conflict in that way.  Or it could keep track of the various tuples
which represent different mutations of a row, and link back from the
"not visible to me" row which has been updated to true, and find that
it is a mutation of a visible row.
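As a sketch of the first idea (assuming we add the obvious index; this
is not a worked-out design):

    CREATE INDEX scots_ishighlander_idx ON scots (ishighlander);
    -- the "SELECT count(*) ... WHERE ishighlander" scan would take an
    -- SIREAD range lock on the ishighlander = true portion of this
    -- index; the UPDATE's insertion of a new entry into that range is
    -- what would flag the rw-conflict.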
These are important implementation details to be worked out (very
carefully!).  I don't claim to have worked through all such details
yet, or even to be familiar enough with the PostgreSQL internals to do
so in a reasonable time.  :-(
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> It wouldn't be 692 lines of code
Agreed.  The original implementation was in an MVCC database which
already supported full serializability using strict two-phase locking
and used page level locks.  Both of these made the implementation
simpler than it would be in PostgreSQL.  (And that's not even
mentioning sub-transactions and distributed transactions!)
> and even if it were the impact of that
> code would be such that it would need to be optional
I was thinking perhaps a GUC to allow "traditional" behavior when
SERIALIZABLE is requested versus using snapshot isolation for
REPEATABLE READ and this new technique for SERIALIZABLE.  Would that
be sane?
> If the use is optional, I would currently prefer the existing
> mechanism for implementing serialization, which is to serialize
> access directly using either a LOCK statement or an exclusive
> advisory lock.
I'm sure many will, particularly where the number of tables is less
than 100 and the number of queries which can be run concurrently is
only a thousand or two.  Picking out the potential conflicts and
hand-coding serialization techniques becomes more feasible on a small
scale like that.
That said, there's a lot less room for mistakes here, once this new
technique is implemented and settled in.  When I was discussing the
receipting and deposit scenario while trying to clarify the
documentation of current behavior, I received several suggestions from
respected members of this community for how that could be handled with
existing techniques which didn't, in fact, correct the problem.  That
just points out to me how tricky it is to solve on an ad hoc basis, as
opposed to a more rigorous technique like the one described in the
paper.
The only suggested fix which *did* work forced actual serialization of
all receipts as well as actual serialization of those with the deposit
report query.  The beauty of this new technique is that there would
not be any blocking in the described scenario, and there would be a
rollback with serialization failure if (and only if) there was an
attempt to run the deposit report query while a transaction for a
receipt on the old date was still pending.  I suspect that the
concurrency improvements of the new technique over existing safe
techniques would allow it to scale well, at least in our environment.
> It's clear that any new-theory solution will cost significantly more
> as the number of users increases, at least O(N^2), whereas simply
> waiting is only O(N), AFAICS.
I'm not following your reasoning on the O(N^2).  Could you explain why
you think it would follow that curve?
> So it seems its use would require some thought and care and possibly
> further research to uncover areas of applicability in real usage.
Care -- of course.  Real usage for serializable transactions -- well
known already.  (Or are you just questioning performance here?)
> So for me, I would say we leave this be until the SQL standard
> changes to recognise the additional mode.
It already recognizes this mode; it doesn't yet recognize snapshot
isolation (more's the pity).
> I don't see much advantage for us in breaking the ground on this
> feature and it will be costly to implement, so is a good PhD
> project.
Apparently it's already been done as a PhD project -- by Michael
Cahill, against InnoDB.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Please keep Michael Cahill copied on this thread, per his request.
I just noticed the omission on a few messages and will forward them to
him.
-Kevin


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Thu, 2009-05-07 at 11:15 -0500, Kevin Grittner wrote:

> Please keep Michael Cahill copied on this thread, per his request.
>  
> I just noticed the omission on a few messages and will forward them to
> him.

Apologies Michael, I see that my mail did remove you. That was an
unconscious error; I was particularly interested in your comments
regarding my assessment of the algorithmic complexity of the new theory
and existing serialization technique.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Thu, 2009-05-07 at 10:56 -0500, Kevin Grittner wrote:
> > It's clear that any new-theory solution will cost significantly more
> > as the number of users increases, at least O(N^2), whereas simply
> > waiting is only O(N), AFAICS.
>  
> I'm not following your reasoning on the O(N^2).  Could you explain why
> you think it would follow that curve?

Each user must compare against work performed by all other users. O(N).

There are N users, so O(N^2).

With reasonable tuning we can make that work with 10 users each checking
the others' data, but with 100 we'll end up spending more time
checking for aborts (and aborting) than we would if we had just queued
up for it.

If you want this, the simplest implementation is to quite literally
allow only a single SERIALIZABLE txn onto the system at any time. All
other SERIALIZABLEs queue. Note that simple serialization requires no
special handling for aborted transactions. Implementing that will be
fast, proving it works is trivial, and it seems it will work better in
the general case.

Yeh, it sucks for medium arrival rate transactions, but it's great for
low or high arrival rate transactions. The new model is good for medium
arrival rates only and will take a lot of work to implement correctly
and optimally enough to keep the applicability window wide enough to
justify the effort.

Optimising it would basically entail implementing the equivalent of
block-level locking.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> Each user must compare against work performed by all other users.
> O(N).
>
> There are N users, so O(N^2).
Why does that apply here and not in the update conflict detection?
-Kevin


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Thu, 2009-05-07 at 12:39 -0500, Kevin Grittner wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>  
> > Each user must compare against work performed by all other users.
> > O(N).
> > 
> > There are N users, so O(N^2).
>  
> Why does that apply here and not in the update conflict detection?

I think the shoe is on the other foot. :-) 

Explain what you think the algorithmic complexity is, and why, if that's
not correct. Can you beat O(N), with Postgres?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> On Thu, 2009-05-07 at 12:39 -0500, Kevin Grittner wrote:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>>  
>> > Each user must compare against work performed by all other users.
>> > O(N).
>> > 
>> > There are N users, so O(N^2).
>>  
>> Why does that apply here and not in the update conflict detection?
> 
> I think the shoe is on the other foot. :-) 
That's a question, and I think a fair one.  As with update conflict
detection, you check whether there are any conflicting locks for what
you are currently accessing.  For most usage patterns you won't find
conflicting access the vast majority of the time.  The assertion that
there is some need for each session to wade through something for
every other session seems baseless to me.  I'm wondering what I might
be missing.
If you throw a formula out there, I do think it's incumbent on you to
explain why you think it fits.  If I threw a formula out there, then
it would be fair of you to ask me to explain how I got to it.  I'm not
at a point where I think I can estimate performance impact.  I guess I
would tend to start from the benchmarks published in the paper, some
of which were confirmed by the ACM SIGMOD repeatability committee. 
Eyeballing that, it looks to me like the worst case they found was
about a 15% performance hit, with large stretches of some of the
graphs hanging within 1% of the performance of straight snapshot
isolation.
I think that given published benchmarks with confirmation from an
independent organization like ACM, it would be incumbent on anyone who
questions the benchmarks to explain why they think they're not
accurate or useful.  The only concern I've seen so far has been that
these benchmarks lack long and complex database transactions, which
seems like a fair concern.  Scalability with additional concurrent
sessions seems good as far as they took it, which was 50 sessions. 
Even on a server with 16 CPUs backing a database with 3 million to 4
million hits per day, with tens of millions of database transactions
per day, we use a connection pool with fewer sessions than that.
-Kevin


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Thu, 2009-05-07 at 15:10 -0500, Kevin Grittner wrote:
> The assertion that
> there is some need for each session to wade through something for
> every other session seems baseless to me.  I'm wondering what I might
> be missing.

That's Greg's point. Do we need full locking of everything we might
touch, or tracking of what we have touched? That question is still
unanswered.

If you need the "might touch" then you either need to implement locking
that will affect everybody (which ain't ever gonna fly round here), or
you implement a scheme that is harder work but avoids locking. That is
clearly O(N^2) for non-locking design.

If you track "have touched" only, then we can do that with a hash table
in shared memory. That would be O(k), if it is possible.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thu, May 7, 2009 at 6:31 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Each user must compare against work performed by all other users. O(N).
>
> There are N users, so O(N^2).

I think this logic only works if you must scan every item for every
other user every time. If you have data structures like binary trees
or whatever to find any matching predicate locks or intent locks or
whatever we're calling them, then you can hopefully find them in faster
than O(N) time.

I'm not sure you can do better than a full linear search though. If I
do something like "SELECT count(*) FROM tab WHERE
complex_function(a,b) = 5"

And then you "INSERT INTO tab (a,b) VALUES (1,2)". How would you store
any record of the fact that there's a serialization failure iff
complex_function(1,2)=5 in any way that lets you look it up in any way
other than evaluating complex_function for every set of values
inserted?



-- 
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> Do we need full locking of everything we might
> touch, or tracking of what we have touched?
> If you need the "might touch" then you either need to implement
> locking that will affect everybody (which ain't ever gonna fly round
> here), or you implement a scheme that is harder work but avoids
> locking. That is clearly O(N^2) for non-locking design.
> 
> If you track "have touched" only then we can do that with a hash
> table in shared memory. That would be O(k), if it is possible.
To quote what I think is a relevant section from the paper:
> One property of Berkeley DB that simplified our implementation was
> working with page level locking and versioning. In databases that
> version and lock at row-level granularity (or finer), additional
> effort would be required to avoid phantoms, analogous to standard
> two phase locking approaches such as multigranularity locking.
Since these techniques are used in quite a few databases, I assume
that implementation is fairly well understood.  The big difference is
that rather than traditional read locks which block updates, it would
be these new non-blocking SIREAD locks.  As I understand it, the point
of this technique is to approximate "might touch" through locking
"have touched" on both rows and index ranges.  I know that is
considered crude by some, but the O(N^2) cost of actual predicate lock
calculation would be insane in most real-world environments.
I do have to concede that the paper is silent on how transactions at
other isolation levels behave in this mix.  On a first think-through,
it doesn't seem like they would need to obtain SILOCKs for their
reads, since there is no guarantee that they see things in a state
which would be consistent with some serial execution of the database
transactions.  I don't think transactions at less stringent
transaction isolation levels need to look for SILOCKs, either.  I
wouldn't consider my first pass thinking it through to be particularly
definitive, though.
That interpretation would mean, however, that while the serializable
transactions would satisfy the new, more stringent requirements of
recent versions of the SQL standard, they would still not provide
quite the same guarantees as traditional blocking serializable
transactions.  In my receipting example, traditional techniques would
cause the attempt to update the control record to block until the
receipts on the old date committed or rolled back, and the attempt to
report the day's receipts couldn't proceed until the control record
update was committed, so as long as the transactions which modify data
were serializable, no select at READ COMMITTED or higher could see a
state inconsistent with some serial application of the serializable
transactions.  With this interpretation, even a SELECT-only
transaction would need to be SERIALIZABLE to ensure that it did not
see the new deposit date when there were still pending receipts
for the old deposit date.  I think I'm OK with that if everyone else
is.
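Concretely, the report transaction would need to look something like
this (a sketch; the schema is invented to match the scenario):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT * FROM receipt WHERE deposit_date = DATE '2009-05-06';
    COMMIT;
    -- under the new technique, this read-only transaction would be
    -- rolled back with a serialization failure if a receipt for the
    -- old deposit date were still pending, rather than silently
    -- missing that receipt.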
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thu, May 7, 2009 at 6:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Apologies Michael, I see that my mail did remove you. That was a
> unconscious error; I was particularly interested in your comments
> regarding my assessment of the algorithmic complexity of the new theory
> and existing serialization technique.

Confusingly, you didn't CC him on this message either?

However on subsequent messages you attempted to re-add him but got his
email address wrong. I assume everyone else got a bounce like I got?

-- 
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote: 
> However on subsequent messages you attempted to re-add him but got
> his email address wrong. I assume everyone else got a bounce like I
> got?
Some of my emails are getting through; some not.  I haven't figured
out why.  I'm calling it "best effort" for now, and will send him a
link to the thread in the archives.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote: 
> If I do something like "SELECT count(*) FROM tab WHERE
> complex_function(a,b) = 5"
> 
> And then you "INSERT INTO tab (a,b) VALUES (1,2)". How would you
> store any record of the fact that there's a serialization failure
> iff complex_function(1,2)=5 in any way that lets you look it up in
> any way other than evaluating complex_function for every set of
> values inserted?
I'd be the last one to shoot down a brighter idea if someone has one,
but I would assume that SELECT shown above would either resolve to a
table scan, in which case you would have to have an SIREAD lock at the
table level, or there would be an index on that function, in which
case you could take out an SIREAD range lock on the appropriate part
of the index.
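As a sketch of the second case (using Greg's hypothetical tab and
complex_function, and assuming the function is IMMUTABLE so it can be
indexed):

    CREATE INDEX tab_cf_idx ON tab (complex_function(a, b));
    -- "SELECT count(*) FROM tab WHERE complex_function(a,b) = 5" can
    -- take an SIREAD range lock on the = 5 portion of this index; the
    -- "INSERT INTO tab (a,b) VALUES (1,2)" must insert an index entry
    -- there whenever complex_function(1,2) = 5, so the conflict is
    -- detected without re-evaluating the predicate for every insert.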
That said, the above would not cause a serialization failure.  It
would not cause any blocking.  Even if both queries were concurrent,
this would be fine in any order of the steps executing, and it would
meet the requirements of the standard because there is *some order of
serial execution* which would generate the same results as the
concurrent execution -- specifically, the SELECT would appear to have
run before the INSERT.
It would create an edge which would be *halfway* to a problem.  If the
transaction doing the SELECT also modified data which was selected by
some other transaction, or the transaction doing the insert also
selected data which was modified by some other transaction, *then*
something would need to roll back.
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thu, May 7, 2009 at 11:08 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> I would assume that SELECT shown above would either resolve to a
> table scan, in which case you would have to have an SIREAD lock at the
> table level

That sounds like we're back to the MSSQL/Sybase way of doing things
where you have to understand the query plan to understand why you're
getting spurious serialization failures. I don't think that's terribly
appealing. Consider, for example, that we might not *want* to do an
index scan just because there's an index. Or there could be multiple
indexes on the function, we definitely wouldn't want to have to check
for range locks on every index.

We have to think outside of the box and get away from the pre-existing
implementations which have solutions which aren't really applicable.

If we were to look at doing predicate locks in any way they would
probably be stored in a whole new data structure, not related to the
indexes on the table. We would need some way to index them so that we
can look for conflicting locks efficiently independently from the
query plan and table access methods.

I've removed the broken email address for now -- please re-add the
correct email address.

-- 
greg


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Thu, 2009-05-07 at 22:47 +0100, Greg Stark wrote:
> On Thu, May 7, 2009 at 6:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Apologies Michael, I see that my mail did remove you. That was a
> > unconscious error; I was particularly interested in your comments
> > regarding my assessment of the algorithmic complexity of the new theory
> > and existing serialization technique.
> 
> confusingly you didn't CC him on this message either?
> 
> However on subsequent messages you attempted to re-add him but got his
> email address wrong. I assume everyone else got a bounce like I got?

Something wrong with the email address causes it to be removed after I
send. Not seen anything like that before; I'm not consciously removing
Michael anyway.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> > What if there is an index on the "ishighlander" row?
> > Then an index scan would find only one candidate to examine,
> > and the other rows would not even be touched by the execution plan.
>
> I assume you're talking about this line of your function:
>
>   SELECT count(*) INTO n FROM scots WHERE ishighlander;

Right.

> I'm further assuming that you meant an index on the ishighlander
> *column*.

Of course. Sorry for the sloppiness.

> I can think of more than one way to handle that.  Off the top of my
> head, I think it would work to acquire an update lock on both old and
> new index entries for a row when it is updated, and to lock the range
> of an index used for a scan with the new SIREAD lock.  Or perhaps,
> since the row must be visited to test visibility,

As far as I know, only the table rows that are found in the index scan
are examined for visibility. Which would be only one in my example.

>                                                   the update lock
> could be on the old and new rows, and the index scan would find the
> conflict in that way.  Or it could keep track of the various tuples
> which represent different mutations of a row, and link back from the
> "not visible to me" row which has been updated to true, and find that
> it is a mutation of a visible row.
>
> These are important implementation details to be worked out (very
> carefully!).  I don't claim to have worked through all such details
> yet, or even to be familiar enough with the PostgreSQL internals to do
> so in a reasonable time.  :-(

Of course, and that is leading us too far. Thanks for your patience.

But in your attempt to sketch how true serializability could
be implemented, you went beyond the scope of the original paper,
which does not claim to tackle "phantoms".

I think the idea is promising, and it would be interesting to see
performance results for an implementation that covers predicates.


As you mentioned in your first post, there will not be a free lunch.
What hurts concurrency in an implementation with blocking read locks
might also hurt concurrency in an implementation where transactions
are frequently aborted and have to be retried.

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote: 
> On Thu, May 7, 2009 at 11:08 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> I would assume that SELECT shown above would either resolve to a
>> table scan, in which case you would have to have an SIREAD lock at
>> the table level
> 
> That sounds like we're back to the MSSQL/Sybase way of doing things
Other than not blocking, I suppose.
> where you have to understand the query plan to understand why you're
> getting spurious serialization failures.
You have to know a lot more than that to solve serialization problems
in PostgreSQL with current techniques.
> I don't think that's terribly appealing. Consider, for example, that
> we might not *want* to do an index scan just because there's an
> index.
Someone more familiar than I with S2PL would be better able to
respond, but I *think* you just need to track this on whatever path
is actually chosen, not on all possible paths.
> Or there could be multiple indexes on the function, we definitely
> wouldn't want to have to check for range locks on every index.
Agreed.  And I don't think we have to.
> We have to think outside of the box and get away from the
> pre-existing implementations which have solutions which aren't
> really applicable.
Well, this paper is well outside the box in many respects, although it
still does use well-worn techniques for some portions of the
processing.  If PostgreSQL developers can expand on that with their
own innovations, fantastic!
> If we were to look at doing predicate locks in any way they would
> probably be stored in a whole new data structure, not related to the
> indexes on the table. We would need some way to index them so that
> we can look for conflicting locks efficiently independently from the
> query plan and table access methods.
That might burden this with much worse performance.  I think that the
reason most products *do* base it on the actual rows, blocks, or
tables referenced is that it gives the needed behaviors with good
performance.  Like I said, if we want to combine the innovations in
the paper with something clever and new in the predicate locking area,
OK -- as long as that isn't the cause for sinking the part that can
already be shown to work.
> I've removed the broken email address for now -- please re-add the
> correct email address.
I'll see what I can find.  When I last corresponded with him, he was
still at the University of Sydney, working on his PhD by applying
these techniques to InnoDB.  He specified that email address for
copying him when I opened the dialog.  I don't think either of us
expected it to take this long for PostgreSQL to get to beta so that I
could do so. He said that when that was complete, he would be working
full-time for Oracle, so he's probably moved on to a new location and
this email address has gone stale.  I'll see what I can track down.
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Fri, May 8, 2009 at 3:00 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>
>> I've removed the broken email address for now -- please re-add the
>> correct email address.
>
> I'll see what I can find.  When I last corresponded with him, he was
> still at the University of Sydney, working on his PhD by applying
> these techniques to InnoDB.  He specified that email address for
> copying him when I opened the dialog.  I don't think either of us
> expected it to take this long for PostgreSQL to get to beta so that I
> could do so. He said that when that was complete, he would be working
> full-time for Oracle, so he's probably moved on to a new location and
> this email address has gone stale.  I'll see what I can track down.

For what it's worth I don't think this address has gone stale, I think
someone just got the email address messed up when adding it manually.
The address that was in the headers before was:
 "Michael Cahill mjc"@it.usyd.edu.au

Whereas I suspect the right address was:
 "Michael Cahill" mjc@it.usyd.edu.au

I'll add that on my followups. If others could do the same when they're
responding to a message where he isn't copied, he should end up back on
all the responses.

--
greg


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Fri, May 8, 2009 at 3:00 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Greg Stark <stark@enterprisedb.com> wrote:
>> On Thu, May 7, 2009 at 11:08 PM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>> I would assume that SELECT shown above would either resolve to a
>>> table scan, in which case you would have to have an SIREAD lock at
>>> the table level
>>
>> That sounds like we're back to the MSSQL/Sybase way of doing things
>
> Other than not blocking, I suppose.
>
>> where you have to understand the query plan to understand why you're
>> getting spurious serialization failures.
>
> You have to know a lot more than that to solve serialization problems
> in PostgreSQL with current techniques.

Well you have to understand how Postgres locks work, but that's an
invariant. You don't have to know how Postgres is going to plan your
specific queries -- which you can't ever be sure of since it could
change as the data changes.

>> I don't think that's terribly appealing. Consider, for example, that
>> we might not *want* to do an index scan just because there's an
>> index.
>
> Someone more familiar than I with S2PL would be better able to
> respond, but I *think* you just need to track this on whatever path
> is actually chosen, not on all possible paths.

Well I don't understand what storing locks in an index can accomplish
if other queries might use other indexes or sequential scans to access
the records and never see those locks.

Or does this method only require that writers discover the locks and
therefore only writers can ever fail due to serialization failures
they cause?

I still haven't actually read the paper so I should probably bow out
from the conversation until I do.  I was apparently already under one
misapprehension as Laurenz just claimed the paper does not show how to
prevent "phantoms" (phantom reads I assume?). Perhaps it's not as
ambitious as achieving true serializability after all?

-- 
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: 
> As far as I know, only the table rows that are found in the index
> scan are examined for visibility. Which would be only one in my
> example.
S2PL locking schemes routinely use index range locks.
> But in your attempt to sketch a way how true serializability could
> be implemented, you went beyond the scope of the original paper,
> which does not claim to tackle "phantoms".
It doesn't put forward techniques to tackle that, apparently
considering the issues to be so well-known and obvious as to not
require more than a brief mention.  They have a section titled
"Phantoms":
>> Throughout the discussion so far, we have followed typi-
>> cal concurrency control theory and assumed that a transac-
>> tion is a sequence of reads and writes on named data items.
>> In general, a relational database engine must also deal with
>> predicate operations (such as SQL *where* clauses). These
>> mean that a concurrency control algorithm must also con-
>> sider phantoms, where an item created or deleted in one
>> transaction would change the result of a predicate operation
>> in a concurrent transaction if the two transactions executed
>> serially. The solution used in traditional two-phase locking
>> is multigranularity locking, where a predicate operation
>> takes intention locks on larger units, such as pages or ta-
>> bles, to prevent insertion of records that might match the
>> predicate.
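To make the phantom hazard concrete, here's a sketch using the scots
table from your earlier example (making no claim about which
granularity an implementation would actually choose to lock):
  -- T1 (SERIALIZABLE) evaluates a predicate:
  SELECT count(*) FROM scots WHERE ishighlander;
  -- T2, concurrently, creates a row matching that predicate:
  INSERT INTO scots (id, ishighlander) VALUES (42, TRUE);
  -- T1 never read the new row, so row-level locks alone can't detect
  -- the conflict; something coarser (an index range, a page, or the
  -- table) must stand in for the predicate itself.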
> As you mentioned in your first post, there will not be a free lunch.
> What hurts concurrency in an implementation with blocking read locks
> might also hurt concurrency in an implementation where transactions
> are frequently aborted and have to be retried.
Possibly; although the benchmarks published in the paper show much
better throughput than traditional blocking techniques with long and
high-conflict transactions.  Eyeballing the graph (based on 20
concurrent sessions), blocking techniques got a 2% deadlock rate in
this benchmark, snapshot isolation got a 2.6% update conflict rate,
and the technique published in the paper got a 0.1% update conflict
rate plus another 7.2% snapshot unsafe rate.  Oddly, in spite of a
serialization failure rate of 7.3% versus snapshot isolation's 2.6%,
the performance of the new technique edged out snapshot isolation when
the number of connections was 35 or higher.  I assume that is because
the rollbacks and restarts somehow reduced thrashing.  The traditional
serializable techniques started to drop below the non-blocking
techniques at about 10 concurrent sessions, and performance kept
dropping from that point while the non-blocking performance continued
to rise all the way to 50.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote: 
> Well I don't understand what storing locks in an index can
> accomplish if other queries might use other indexes or sequential
> scans to access the records and never see those locks.
> 
> Or does this method only require that writers discover the locks and
> therefore only writers can ever fail due to serialization failures
> they cause?
Well, readers don't need to find the SIREAD locks which readers set. 
Conflicts between writers are handled the same as current PostgreSQL
techniques.  Readers need to look for write locks, and writers need to
look for SIREAD locks.  Neither is blocked by the other, but finding a
conflict sets both transactions with a directional "edge" boolean
flag.  (So we would need to track two booleans per transaction in
addition to the new SIREAD locks.)  When a transaction reaches a state
where both "edge" booleans are set, one of the two transactions
involved in setting that must be rolled back.
The prototype implementation in Berkeley DB preferred to roll back
a "pivot" transaction (one with both edges set) where possible, so the
failure would probably usually be on a transaction which modified
data, but not necessarily -- if the writers involved have committed
and the reader transaction might see an invalid database state, the
reader would be rolled back.
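To sketch how the edge flags play out (hypothetical table t(k, v);
the SIREAD locks and flags here are what the paper's algorithm tracks,
not anything PostgreSQL does today):
  -- T1:
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT v FROM t WHERE k = 'x';            -- SIREAD lock on x
  -- T2:
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT v FROM t WHERE k = 'y';            -- SIREAD lock on y
  -- T1:
  UPDATE t SET v = v + 1 WHERE k = 'y';     -- write meets T2's SIREAD
                                            -- lock: edge T2 -> T1
  -- T2:
  UPDATE t SET v = v + 1 WHERE k = 'x';     -- write meets T1's SIREAD
                                            -- lock: edge T1 -> T2
  -- Each transaction now has both its inConflict and outConflict
  -- flags set, so one of the two must be rolled back with a
  -- serialization failure; neither ever blocked the other.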
> I still haven't actually read the paper so I should probably bow out
> from the conversation until I do.  I was apparently already under
> one misapprehension as Laurenz just claimed the paper does not show
> how to prevent "phantoms" (phantom reads I assume?). Perhaps it's
> not as ambitious as achieving true serializability after all?
It does achieve true serializability in terms of the definitions I've
read, although I've discovered at least one way in which its
guarantees aren't as strong as traditional blocking techniques -- it
doesn't guarantee that transactions at a level less strict than
serializable will see a state which could exist between transactions
in some serial execution of the serializable transactions which modify
the data, as the blocking schemes do.
personally.  We should probably document it as a difference, to alert
someone converting, but the standard doesn't seem to require the
behavior that traditional blocking approaches provide on this point.
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Fri, May 8, 2009 at 4:12 PM, Greg Stark <stark@enterprisedb.com> wrote:
> On Fri, May 8, 2009 at 3:49 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Greg Stark <stark@enterprisedb.com> wrote:
>>
>>> Well I don't understand what storing locks in an index can
>>> accomplish if other queries might use other indexes or sequential
>>> scans to access the records and never see those locks.
>>>
>>> Or does this method only require that writers discover the locks and
>>> therefore only writers can ever fail due to serialization failures
>>> they cause?
>>
>> Well, readers don't need to find the SIREAD locks which readers set.
>> Conflicts between writers are handled the same as current PostgreSQL
>> techniques.  Readers need to look for write locks, and writers need to
>> look for SIREAD locks.
>
>
> Well this is where I'm failing to follow. If readers need to be sure
> they'll find write locks then surely they can't be stored in indexes
> without losing any flexibility.

Argh, sorry, as soon as I hit send I realized this is wrong. Writers
already need to insert into every index, so that's not a problem. The
problem is if readers need to see other reader locks. If that's not
true then I guess I'm all wet and I will wait until I read the paper.


--
greg


Re: Serializable Isolation without blocking

From
Tom Lane
Date:
Greg Stark <stark@enterprisedb.com> writes:
> ... Argh, sorry, as soon as I hit send I realized this is wrong. Writers
> already need to insert into every index, so that's not a problem.

What about HOT?
        regards, tom lane


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> Greg Stark <stark@enterprisedb.com> writes:
>> ... Argh, sorry, as soon as I hit send I realized this is wrong.
>> Writers already need to insert into every index, so that's not a
>> problem.
> 
> What about HOT?
I think that a read would need to lock both the row or tuple (not sure
exactly how that would work) and any index used to find the row or
tuple (or detect its absence).  If a table scan is used, the lock
would be at the table level (keeping in mind that this is not a lock
which ever blocks anything).  An insert or an update which created a
new conflicting tuple would hit the table or index lock.  A HOT update
would hit the row lock.
I think....
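A sketch of the lock targets I have in mind (granularities here are
illustrative only):
  -- Index scan: SIREAD lock on the tuple read plus the index range
  -- searched, so insertion of a new matching row is detected:
  SELECT ishighlander FROM scots WHERE id = 42;
  -- Sequential scan: SIREAD lock at the table level, since an insert
  -- anywhere in the table could change the result:
  SELECT count(*) FROM scots WHERE ishighlander;
  -- A HOT update touches no index, but it does create a new version
  -- of the tuple, so it would still collide with the row-level lock.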
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Fri, May 8, 2009 at 3:49 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Greg Stark <stark@enterprisedb.com> wrote:
>
>> Well I don't understand what storing locks in an index can
>> accomplish if other queries might use other indexes or sequential
>> scans to access the records and never see those locks.
>>
>> Or does this method only require that writers discover the locks and
>> therefore only writers can ever fail due to serialization failures
>> they cause?
>
> Well, readers don't need to find the SIREAD locks which readers set.
> Conflicts between writers are handled the same as current PostgreSQL
> techniques.  Readers need to look for write locks, and writers need to
> look for SIREAD locks.


Well this is where I'm failing to follow. If readers need to be sure
they'll find write locks then surely they can't be stored in indexes
without losing any flexibility.

You would need to be sure other readers will look at the same index
you put the lock in -- so you either need to put the lock in every
index, have other readers look in every index, or have a very limited
predictable way of ensuring everyone will use the same index for any
queries where that lock matters.

--
greg


Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> > I still haven't actually read the paper so I should probably bow out
> > from the conversation until I do.  I was apparently already under
> > one misapprehension as Laurenz just claimed the paper does not show
> > how to prevent "phantoms" (phantom reads I assume?). Perhaps it's
> > not as ambitious as achieving true serializability after all?
>
> It does achieve true serializability in terms of the definitions I've
> read, although I've discovered at least one way in which its
> guarantees aren't as strong as traditional blocking techniques -- it
> doesn't guarantee that transactions at a level less strict than
> serializable will see a state which would exist between some serial
> execution of serializable transactions which modify the data, as the
> blocking schemes do.

I still don't buy that this implementation guarantees serializability.

All the authors show with regard to predicate handling is handwaving,
and while you tried to come up with ideas how that could be improved
that is not what the implementation described in the paper does.

So this paper shows a performant implementation of something that is
closer to serializability than "snapshot isolation", but did not go
all the way.

As I said, I think it is promising, and it can only be hoped that
the authors pursue the path they have taken and share their experiences
with an implementation of full serializability with their technique.

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> All the authors show with regard to predicate handling is
> handwaving,
That is because predicate locking is a mature technology with many
known implementations.  The best technique for any database product
will depend on that product, and their technique doesn't depend on
which implementation is used.  Assuming some form of predicate
locking, do you have any other qualms about the algorithm
presented in the paper?
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Mon, May 11, 2009 at 2:49 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
>
>> All the authors show with regard to predicate handling is
>> handwaving,
>
> That is because predicate locking is a mature technology with many
> known implementations.  The best technique for any database product
> will depend on that product, and their technique doesn't depend on
> which implementation is used.  Assuming some form of predicate
> locking, do you have any other qualms about the algorithm
> presented in the paper?

I thought the big problem with providing true serializability was the
predicate locking. If it doesn't address that need then does this get
us any closer?

Is this like saying walls are a well understood technology so these
antilock brakes work great for stopping your car as long as you
combine them with a wall? :)



--
greg


Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
> > All the authors show with regard to predicate handling is
> > handwaving,
>
> That is because predicate locking is a mature technology with many
> known implementations.  The best technique for any database product
> will depend on that product, and their technique doesn't depend on
> which implementation is used.  Assuming some form of predicate
> locking, do you have any other qualms about the algorithm
> presented in the paper?

No - given that the algorithm is correct (which the authors cite from
another paper which I cannot easily access).

In my first reply I wondered if the presence of concurrent "read committed"
transactions would somehow affect the correctness of the algorithm,
as the authors don't mention that.

Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote:
> I thought the big problem with providing true serializability was
> the predicate locking. If it doesn't address that need then does
> this get us any closer?
I thought the big problem was the perception that performance would
suffer and that the level of blocking required would be unacceptable. 
This technique (based on available benchmarks from the prototype
implementation) seems to give performance very close to snapshot
isolation with no additional blocking beyond what snapshot isolation
already has to support "first committer wins" update conflict
detection.  Benchmarks showed much better performance than traditional
blocking techniques for achieving serializability.
Since it can markedly increase serialization failure rollbacks, the
software needs to be able to respond to those gracefully, but since
our framework automatically re-submits transactions which are
terminated with that SQLSTATE, this approach sounds very useful for us.
> Is this like saying walls are a well understood technology so these
> antilock brakes work great for stopping your car as long as you
> combine them with a wall? :)
I see it more like saying that walls are a well understood technology,
and this is a proposal for a way to use them in putting up a
particular useful building.
-Kevin


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Mon, May 11, 2009 at 3:11 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Greg Stark <stark@enterprisedb.com> wrote:
>
>> I thought the big problem with providing true serializability was
>> the predicate locking. If it doesn't address that need then does
>> this get us any closer?
>
> I thought the big problem was the perception that performance would
> suffer and that the level of blocking required would be unacceptable.

This thread has really been one of those cases where everyone thought
they were having a different kind of discussion.

If predicate locking is so well understood and if someone who
understands it and understands what kind of implementation would work
well in Postgres steps forward with an implementation which doesn't
cause major downsides then I suspect we might revisit our prejudices
against it. But as it stands I think the assumption is that having to
maintain locks on hypothetical records which don't exist would be an
expensive cost to impose on every query which would unduly impact
performance.

I, for one, certainly assumed if we did anything like that it would
work like our existing locks in that it wouldn't impose any additional
blocking. If there was any question of that then it sounds like this
paper might be a step forward in that you're on-side at least on that
question now?

-- 
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:
> In my first reply I wondered if the presence of concurrent "read
> committed" transactions would somehow affect the correctness of the
> algorithm, as the authors don't mention that.
Yeah, I was concerned about that, too.  In thinking it through I've
convinced myself that there is a choice in implementation, which seems
to have a pretty obvious winner.
(1)  If READ COMMITTED and SNAPSHOT isolation levels don't change at
all, there would be a behavioral difference between this technique and
strict two phase locking (S2PL) implementations of serializable
transactions. With S2PL, even READ COMMITTED transactions can only
view the database in a state which is consistent with some serial
application of SERIALIZABLE transactions.  Under the algorithm from
this paper, without changes to other isolation levels, if you want to
view the database in a coherent state relative to SERIALIZABLE
transactions, you must use a SERIALIZABLE transaction.
(2)  Promote everything to SERIALIZABLE by having all transactions,
regardless of isolation level, take out SIREAD locks and check for
unsafe access patterns.  This would, strictly speaking, conform to the
SQL standard, because an implementation is free to promote requests
for any level of isolation to a more strict level; however, it hardly
seems useful.
So, I think the only sane thing to do in this regard would be to
document that there is a difference from blocking implementations of
SERIALIZABLE in the guarantees provided for non-serializable
transactions.
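To illustrate the difference under choice (1), a sketch with
hypothetical tables control(batch) and receipt(batch, amount), where
control.batch starts at 1:
  -- T1 (SERIALIZABLE):
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  INSERT INTO receipt SELECT batch, 100.00 FROM control;  -- reads 1
  -- T2 (SERIALIZABLE) starts afterward, closes the batch, and
  -- commits first:
  --   UPDATE control SET batch = batch + 1;  COMMIT;
  COMMIT;  -- T1 commits second
  -- T1 read what T2 overwrote, so T1 must precede T2 in any serial
  -- order even though T2 committed first.  A READ COMMITTED report
  -- run between the two commits sees batch = 2 but no receipt in
  -- batch 1, a state no serial ordering of T1 and T2 could produce;
  -- under S2PL, T2's UPDATE would have blocked until T1 committed.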
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <stark@enterprisedb.com> wrote: 
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> Greg Stark <stark@enterprisedb.com> wrote:
>>
>>> I thought the big problem with providing true serializability was
>>> the predicate locking. If it doesn't address that need then does
>>> this get us any closer?
>>
>> I thought the big problem was the perception that performance would
>> suffer and that the level of blocking required would be
>> unacceptable.
> 
> This thread has really been one of those cases where everyone
> thought they were having a different kind of discussion.
Apparently so.
> If predicate locking is so well understood and if someone who
> understands it and understands what kind of implementation would
> work well in Postgres steps forward with an implementation which
> doesn't cause major downsides then I suspect we might revisit our
> prejudices against it. But as it stands I think the assumption is
> that having to maintain locks on hypothetical records which don't
> exist would be an expensive cost to impose on every query which
> would unduly impact performance.
It would only impact transactions running at the full serializable
isolation level, and I'm guessing that the performance would be
reasonable if an implementation similar to that in DB2, Sybase,
Microsoft SQL Server, etc. is used.  Some here have derided that
approach as crude and implied that only something more aesthetically
pleasing would be considered, and that such implementations would be
prohibitively slow (which, of course, is exactly why they are not used
in these other products).
> I, for one, certainly assumed if we did anything like that it would
> work like our existing locks in that it wouldn't impose any
> additional blocking.
Until this paper, implementation of serializable transactions, even in
an MVCC database, required S2PL techniques which caused a lot of
blocking, including readers blocking on writes and vice versa.  The
advance of this paper isn't any novel implementation of predicate
locking, but the reduction of the locks generated by reads to a new
SIREAD lock level which would not introduce any blocking, but instead
would assist in the detection of unsafe patterns of reads and writes,
allowing rollbacks to prevent serialization anomalies.
> If there was any question of that then it sounds like this
> paper might be a step forward in that you're on-side at least on
> that question now?
I was never on the other side of that.  I know that some apparently
thought that my attempts to document PostgreSQL's deviation from
current standards in this regard, and to provide more real-world
examples of where people might run into trouble, were really sly
attempts to persuade people to implement full support for serializable
transactions.  That really wasn't the case.
We had been slightly burned by the differences in spite of my having
read the current documentation, because the example given is so
far-fetched and bizarre that I rather naively thought "Well, if
that's how far out you have to go to hit a problem, the risk is quite
low."  I was trying to find something which gave people a clearer
picture of the issue, so others didn't make the same mistake.  I
wasn't advocating for full serializable support at that point, and
probably would have been reluctant to use it if available because of
the performance issues (assuming a traditional implementation).
In the course of discussing the issue, this paper, published by ACM
earlier in the year, was brought to my attention.  I see it as the
best of both worlds -- MVCC performance with the protections of
serializable transactions.  Back when I first read the paper, though,
it looked to be a struggle to get 8.4 to beta testing, so I sat on it
until now.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
I wrote: 
> Greg Stark <stark@enterprisedb.com> wrote: 
>> This thread has really been one of those cases where everyone
>> thought they were having a different kind of discussion.
>
> Apparently so.
In light of discussions at PGCon, I'm going to abandon this thread in
favor of a discussion of what this feature would look like from a user
perspective.  Only after there is consensus (if any) on the
desirability of the behavior will I move on to a discussion of
implementation techniques -- probably on a fresh thread.
For the record, it became clear that I did a bad job of communicating
on this thread, so I'll finish it with some clarifications and
observations "just for the record."
First, the paper didn't propose any new techniques for predicate
locking, other than using a lock level which doesn't block anything
else.  It did reference existing papers on locking techniques.
Second, Robert Haas had an interesting insight: if the techniques from
the paper were implemented in the absence of predicate locks, that
would still reduce the frequency of serialization anomalies from
current snapshot isolation techniques.  Since the PostgreSQL community
generally prefers incremental enhancement patches on the way to a
final result, he suggested that this would be a good path -- do
everything except the predicate locks, show the incremental benefits,
then add the predicate locks.  We could probably even add the
predicate locks in stages: first implement table level predicate
locks, since that has to exist and would provide a complete, if
somewhat clumsy, serializable solution; then move on to more
fine-grained locks.  It would probably be workable, and possibly
optimal, to have just table and page locks; although it seems likely
that index range locks and row locks would also be worth it,
eventually.
But all that is just to get it on the record; I'm getting ahead of
myself here.  I'll be posting on the user-facing aspects of this soon.
-Kevin


Re: Serializable Isolation without blocking

From
Simon Riggs
Date:
On Wed, 2009-05-27 at 13:34 -0500, Kevin Grittner wrote:

> For the record, it became clear that I did a bad job of communicating
> on this thread...

You did good work, IMHO. Not everything will reach consensus and that's
not your fault.

> first implement table level predicate
> locks, since that has to exist and would provide a complete, if
> somewhat clumsy, serializable solution; then move on to more
> fine-grained locks.  It would probably be workable, and possibly
> optimal, to have just table and page locks; although it seems likely
> that index range locks and row locks would also be worth it,
> eventually.

Do we need table-level predicate locks at all? What would they give us?
Why not just go straight for fine-grained page-level locks?

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote: 
> Do we need table-level predicate locks at all? What would they give
> us?  Why not just go straight for fine-grained page-level locks?
I don't want to get too far into implementation discussions at this
phase (see Tom's slides ;-)), but suffice it to say that a table scan
can cover more pages than we'd want to track individually....
The coarsest possible resolution allows proof of concept.  Tests can
be written that work at that level which should not break as
finer-grained locks are implemented.  (See how I'm drawing from
another presentation? ;-))
-Kevin


Re: Serializable Isolation without blocking

From
Nicolas Barbier
Date:
[ Reviving this old thread because a recent one referred to it. ]

2009/5/7 Albe Laurenz <laurenz.albe@wien.gv.at>:

> Kevin Grittner wrote:
>
>> > maybe I misunderstood something.
>> >
>> > Consider a function
>> > "makehighlander(personid integer) RETURNS void"
>> > defined like this:
>> >
>> >    SELECT ishighlander INTO b FROM scots WHERE id=personid;
>> >    IF b THEN
>> >       RETURN; /* no need to do anything */
>> >    END IF;
>> >    UPDATE scots SET ishighlander=TRUE WHERE id=personid;
>> >    SELECT count(*) INTO n FROM scots WHERE ishighlander;
>> >    IF (n > 1) THEN
>> >       RAISE EXCEPTION 'There can be only one';
>> >    END IF;
>> >
>> > If we assume that "ishighlander" is false for all records in
>> > the beginning, and there are two calls to the function with
>> > two personid's of records *in different pages*, then there cannot be
>> > any conflicts since all (write and intention) locks taken by each of
>> > these calls should only affect the one page that contains the one
>> > record that is updated and then found in the subsequent SELECT.
>> >
>> > Yet if the two execute concurrently and the two first SELECTs are
>> > executed before the two UPDATEs, then both functions have a snapshot
>> > so that the final SELECT statements will return 1 and both functions
>> > will succeed, leaving the table with two highlanders.
>>
>> I do think you misunderstood.  If there are two concurrent executions
>> and each reads one row, there will be an SIREAD lock for each of those
>> rows.  As an example, let's say that one of them (T0) updates its row
>> and does its count, finds everything looks fine, and commits.  In
>> reading the row the other transaction (T1) modified it sets the
>> T0.outConflict flag to true and the T1.inConflict flag to true.
>
> Where does T0 read the row that T1 modified?

* Typically, concurrency theory doesn't care about the specifics of
relational databases: it works on a (possibly countably infinite)
number of data items (sometimes called "variables").
* If a certain concurrency control technique works for such data items
(i.e., can only result in serializable executions or whatever), then
it must necessarily also work for relational databases which map their
data in "pages", if those pages are treated the same way the data
items are. Indexes and any other structures that can be used to *find
out* which other pages to read/write must then also be treated this
way.
* To answer your specific question: T0 might not read that specific
row, but the COUNT(..) definitely must read *something* that must be
modified by T1 when it updates the ishighlander field: either the row
itself (which I would expect if no index on ishighlander exists), or
some page in an index that it used to find out that it didn't need to
inspect the row itself. Otherwise, the update wasn't effective because
re-executing the COUNT(..) later on would not result in any change in
the result (which leads to a contradiction: changing the ishighlander
field of one row must result in a change in the number of
highlanders).
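
For concreteness, here is the interleaving that yields two
highlanders under plain snapshot isolation, using the function as
defined above:

  -- ishighlander is false everywhere; ids 1 and 2 are in different
  -- pages.
  -- T1: SELECT makehighlander(1);  -- runs through its UPDATE
  -- T2: SELECT makehighlander(2);  -- likewise, concurrently
  -- T1: its COUNT(*) runs against its snapshot: sees only id 1, n = 1
  -- T2: its COUNT(*) runs against its snapshot: sees only id 2, n = 1
  -- Both commit: two highlanders.  The point is that each COUNT(*)
  -- necessarily read something (a row, page, or index entry) that the
  -- other transaction's UPDATE modified, and that is the hook a
  -- conflict-detection algorithm needs.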

Nicolas


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thu, Dec 31, 2009 at 12:45 PM, Nicolas Barbier
<nicolas.barbier@gmail.com> wrote:
> * To answer your specific question: T0 might not read that specific
> row, but the COUNT(..) definitely must read *something* that must be
> modified by T1 when it updates the ishighlander field:

The problem occurs when the update happens.  The updating transaction
currently has no way to know that a SELECT has already looked at that
record, nor that the same transaction has performed an update which
this transaction has already ignored when performing the count(*).

The unsolved problems that have been raised are:

- How and where do we have SELECTs note all the rows they read -- and
all the rows they *would* have read that don't exist already. Note
that something like select count(*) where id=? needs to be able to
detect new rows from being inserted with the specified value, not
merely lock the existing rows.

- Can we do it without requiring major code changes in every index am
and destroying modularity between the transaction management and the
index api.

- How do we do that without causing SELECTS to perform tons of write
i/o they don't have to do now. People already complain about the hint
bit updates the first time you do selects, doing i/o on every select
would be disastrous.

- Can we do that without destroying concurrency with coarse "locks" a
la MySQL ISAM tables.

- Can we do it without introducing unexpected serialization failure
between transactions updating unrelated rows. Ideally, can we do it in
a way that serialization errors are predictable rather than depending
on access paths the planner chooses so they don't just randomly start
happening when plans change.

-- 
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark  wrote:
> The unsolved problems that have been raised are:
> [legion]
Yeah, that's why this is a two to four year project.  And I would
point out that if there *wasn't* a performance hit in serializable
mode, none of the other isolation levels would exist.  These less
rigorous modes exist precisely because people are often willing to
give up some data integrity guarantees, or solve them with more
labor-intensive techniques, to gain performance.  I certainly
wouldn't consider removing any of the existing transaction isolation
levels or attempt to coerce anyone into using them against their will.
;-)
I am keeping track of the lists you're putting out there; they should
be quite useful in the optimization phase.  I do intend to first get
a patch which is "correct" in the sense of never allowing non-
serializable behavior, but which contains some of the problems you
list (although destroying modularity is obviously off the table even
for that point), and then refine the granularity and performance to
try to get within bounds which are acceptable for our use, and
hopefully (eventually) the PostgreSQL community.
One of the things I'm currently working on is what would make a good
set of tests to run during development to track progress.  I welcome
any particular use-cases you want to ensure are covered.  If you
could provide a detailed description or (even better) a self-
contained test case for something you would like to ensure is
covered, that would be most welcome.
Thanks,
-Kevin




Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thu, Dec 31, 2009 at 3:11 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Yeah, that's why this is a two to four year project.  And I would
> point out that if there *wasn't* a performance hit in serializable
> mode, none of the other isolation levels would exist.  These less
> rigorous modes exist precisely because people are often willing to
> give up some data integrity guarantees, or solve them with more
> labor-intensive techniques, to gain performance.  I certainly
> wouldn't consider removing any of the existing transaction isolation
> levels or attempt to coerce anyone into using them against their will.
> ;-)

Hm, this raises the issue that you'll have to figure out what should
happen if two different transactions are using different isolation
modes. Currently our two isolation modes only control behaviour within
your transaction so they co-exist perfectly fine.

ISTM you would need to have transactions in read-committed and
snapshot isolation modes recording what sets of records they read in
order for serializable transactions to be able to make any guarantees
as well.


Separately even if nobody on the system is using true serializable
isolation the on-disk data structures would need space to store the
additional information just in case anyone uses it. If that's a
substantial amount of space it might impact performance even if you
never use it.


> I am keeping track of the lists you're putting out there; they should
> be quite useful in the optimization phase.  I do intend to first get
> a patch which is "correct" in the sense of never allowing non-
> serializable behavior, but which contains some of the problems you
> list (although destroying modularity is obviously off the table even
> for that point), and then refine the granularity and performance to
> try to get within bounds which are acceptable for our use, and
> hopefully (eventually) the PostgreSQL community.

Yeah, I'm mostly concerned about the initial design question of where
to store all this extra information. It seems like once you've made
that decision most of the consequences will be pretty much set.

The two proposals on the table -- neither of which seems acceptable to me -- are:

1) Store information in indexes used in a scans indicating what scans
are in progress or have been done by the transaction. This means you
need some way to store the xid and the scan keys such that you can
guarantee any index maintenance will see it. You also have to be able
to handle arbitrary sets of xids similar to how we handle multi-xids
currently. Also you'll need some way to handle deletions which
currently don't do any index maintenance. This would have to be done
for every index type and would impose design constraints on the index
behaviours since in many cases it's not so easy to arrange things to
provide these guarantees. It also creates a big dependency on the
planner behaviour such that a change in plans will create user-visible
changes in the serialization failures that are possible. I fear it
would have terrible false-positive rates for some types of plans
possibly even being unable to ever complete some queries, notably
sequential table scans which would make people try to arrange for less
efficient plans because they're trying to avoid serialization
failures. It would also impose space costs on every index on disk.

2) Have some kind of in-memory data structure which stores the filter
conditions for different scans that are in progress or have been
executed by live transactions. This would have to be consulted by
updates to find conflicting filter conditions. This has the problem
that there's no efficient way to search for conflicting filter
conditions -- you would have to do a linear search across all filter
conditions on the same table and do fairly complex theorem-proving for
each one to prove there's no conflict. It would probably perform
terribly. It has the advantage of working regardless of the access
method and not touching index am code at all.


Both of these seem unacceptable to me but perhaps there are solutions
I'm not seeing. I wonder if some kind of hybrid approach is possible
but it seems like it would have the worst of both worlds in that case.

--
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark  wrote:
> Hm, this raises the issue that you'll have to figure out what
> should happen if two different transactions are using different
> isolation modes. Currently our two isolation modes only control
> behaviour within your transaction so they co-exist perfectly fine.
>
> ISTM you would need to have transactions in read-committed and
> snapshot isolation modes recording what sets of records they read
> in order for serializable transactions to be able to make any
> guarantees as well.
No, there is no requirement that serializable transactions serialize
with weaker modes.  The Cahill thesis addresses this point directly.
Unless you can point out some flaw in his proofs, this is not an
issue.
> [criticisms of hypothetical implementation techniques]
There are no such proposals on the table, and the hypothetical
techniques you mention seem unlikely to be ones I would use.  The one
and only issue I have on the table at the moment is to create a new
lock method for SIREAD locks.  I'd welcome any comments on that.  If
I get to the point of feeling comfortable with that, I'll move
forward to other issues.
-Kevin



Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Thu, Dec 31, 2009 at 12:10 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Greg Stark  wrote:
>
>> Hm, this raises the issue that you'll have to figure out what
>> should happen if two different transactions are using different
>> isolation modes. Currently our two isolation modes only control
>> behaviour within your transaction so they co-exist perfectly fine.
>>
>> ISTM you would need to have transactions in read-committed and
>> snapshot isolation modes recording what sets of records they read
>> in order for serializable transactions to be able to make any
>> guarantees as well.
>
> No, there is no requirement that serializable transactions serialize
> with weaker modes.  The Cahill thesis addresses this point directly.
> Unless you can point out some flaw in his proofs, this is not an
> issue.
>
>> [criticisms of hypothetical implementation techniques]
>
> There are no such proposals on the table, and the hypothetical
> techniques you mention seem unlikely to be ones I would use.  The one
> and only issue I have on the table at the moment is to create a new
> lock method for SIREAD locks.  I'd welcome any comments on that.  If
> I get to the point of feeling comfortable with that, I'll move
> forward to other issues.

Kevin,

I think I understand why you're trying to break this down into
manageable pieces, but I don't think it's really possible to have this
conversation in isolation.  If your question is "Could it ever be
acceptable to add a new lock mode?" then I answer "yes".  And I am
pretty confident that this will also be the consensus view.  But if
your question is "Does it make sense to handle SIREAD locks as a new
lock mode?" then I answer "I don't know, because I haven't seen the
whole design yet".  You just can't answer a question like this in
isolation.

It seems to me that the hard part of this problem is to describe the
general mechanism by which conflicts will be detected, with specific
references to the types of data structures that will be used to hold
that information.

...Robert


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> It seems to me that the hard part of this problem is to describe
> the general mechanism by which conflicts will be detected, with
> specific references to the types of data structures that will be
> used to hold that information.
Well, the general approach to tracking SIREAD locks I had in mind is
to keep them in the existing lock data structures.  I have in mind to
use multiple granularities, with automatic escalation to coarser
granularities at thresholds, to keep RAM usage reasonable.  There are
clearly some tough problems with the pluggable indexes, types,
operators, and such that will take time to sort out an acceptable
implementation at any fine-grained level, so my intent is to punt
those to very coarse granularity in the first pass, with "XXX SIREAD
optimization opportunity" comments where that's not a production-
quality solution or it just seems likely that we can do better with
some work.
I didn't want to get too detailed before I checked that creating a
new lock method for this seemed sane, since the details of the
implementation depend on that choice.  Lack of detail tends to draw
accusations of hand-waving, so I was trying to stay away from those
details until my intuition on the lock method was confirmed or shot
down, so I could solidify those details before presenting them.
There is a bit of a chicken and egg problem with moving this forward
-- I guess I was overly conservative on what I presented.
I do understand that this does mean that more RAM will need to be
allocated to the lock structures to support serializable mode.  I
don't think that any other option is likely to provide acceptable
performance.  I also realize that this means that in the final form,
optimized to where my shop considers it usable, there will still be
coarser granularity than theoretically possible and resulting false
positives causing serialization failures for which the cause is
obscure.  We don't care, and anyone who does will probably not want
to use this isolation level.  Traditional S2PL doesn't have that
fault, but it blocks so badly that performance is worse; I'll take
the transaction restarts over that any day.  I know there are others
who won't.
Basically, the reasons given for having separate lock methods for
DEFAULT ("normal") locks and USER locks seem to apply with almost as
much force to SIREAD locks (no blocking between them, different
source of setting, different lifespans), so I was pretty sure this
was a sane choice, but I just wanted a quick reality check before
developing the level of detail that would move this past hand-waving.
Other than the SIREAD locks to cover predicate locking for
serializable transactions, there is no change to what locks are
acquired.  There is no change to blocking or deadlock detection and
recovery.  Other transaction isolation levels do not need to change,
except perhaps to fast-path a skip over blocking and deadlock
checking against SIREAD locks (one of those details I'm looking at).
Let me know if you need more information to firm up an opinion on the
sanity of my intuition regarding the new lock method; I'm eager to
move on to the next level of detail.
And thanks for the feedback.  :-)
-Kevin


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Thu, Dec 31, 2009 at 1:43 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>> It seems to me that the hard part of this problem is to describe
>> the general mechanism by which conflicts will be detected, with
>> specific references to the types of data structures that will be
>> used to hold that information.
>
> Well, the general approach to tracking SIREAD locks I had in mind is
> to keep them in the existing lock data structures.  I have in mind to
> use multiple granularities, with automatic escalation to coarser
> granularities at thresholds, to keep RAM usage reasonable.

OK.  I think it will become more clear whether the existing lock data
structures are adequate as you move into detailed design.  It doesn't
seem critical to make a final decision about that right now.  One bad
thing about using the existing lock structures is that they are
entirely in shared memory, which limits how large they can be.  If you
should find out that you're going to need more work space than can be
conveniently accommodated in shared memory, you will have to think
about other options.  But I don't know for sure whether that will be
the case.  The fact that the locks need to be kept around until
transactions other than the owner commit is certainly going to drive
the size up.

> There are
> clearly some tough problems with the pluggable indexes, types,
> operators, and such that will take time to sort out an acceptable
> implementation at any fine-grained level, so my intent is to punt
> those to very coarse granularity in the first pass, with "XXX SIREAD
> optimization opportunity" comments where that's not a production-
> quality solution or it just seems likely that we can do better with
> some work.

It seems to me that if you lock the heap (either individual rows, or
the whole thing) none of that stuff really matters.  It might defeat
future optimizations such as index-only scans in some cases, and it
might create full-table locks in situations where a more intelligent
implementation might use less than a full-table lock, but those may be
(probably are) prices you are willing to pay.

As an overall design comment, I sometimes find that it helps to create
a working implementation of something, even if I know that the
performance will suck or that the result will not be committable for
other reasons.   There is often value to that just in terms of getting
your head around the parts of the code that need to be modified.  I
wonder if you couldn't start with something ridiculously poor, like
maybe an S2PL implementation with only table-level granularity - just
make any operation that reads or writes a table grab an ACCESS
EXCLUSIVE lock until transaction commit.  Convince yourself that it is
CORRECT - forget performance.  Then either change the locks to SIREAD,
or try to weaken the locks to row-level in certain cases.  Then do the
other one.  It'll take you a while before you have something that can
seriously be considered for commit, but that's not the point.  The
point is you'll have working code that you can fool with.
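
For the crude first cut, every reader would effectively behave as if
it had done this explicitly:

BEGIN;
LOCK TABLE scots IN ACCESS EXCLUSIVE MODE;  -- taken before any read
SELECT count(*) FROM scots WHERE ishighlander;
-- ... rest of the transaction ...
COMMIT;  -- the lock is released only here, which is what makes it S2PL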

And use git so you can keep merging up to CVS HEAD easily.

> And thanks for the feedback.  :-)

Sure thing.  :-)

...Robert


Re: Serializable Isolation without blocking

From
Jeff Davis
Date:
On Thu, 2009-12-31 at 09:11 -0600, Kevin Grittner wrote:
> Yeah, that's why this is a two to four year project. 

I started a wiki page here:

http://wiki.postgresql.org/wiki/Serializable

I didn't add much content yet, but can participants in this discussion
please try to organize the various issues as they progress?

If there was an existing wiki page I couldn't find it.

Regards,
Jeff Davis



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> OK. I think it will become more clear whether the existing lock
> data structures are adequate as you move into detailed design.
I've gotten far enough in reviewing it to be pretty convinced that
they'll cover all the granularities I'm likely to want unless I get
to the point of wanting to try to lock individual columns within
individual rows.  It probably won't come to that, so I figure I'll
cross that bridge if and when I come to it.
> As an overall design comment, I sometimes find that it helps to
> create a working implementation of something, even if I know that
> the performance will suck or that the result will not be
> committable for other reasons.  There is often value to that just
> in terms of getting your head around the parts of the code that
> need to be modified.
That's exactly where I've been trying to go at this point.
> I wonder if you couldn't start with something ridiculously poor,
> like maybe an S2PL implementation with only table-level granularity
> - just make any operation that reads or writes a table grab an
> ACCESS EXCLUSIVE lock until transaction commit.
There's an idea I hadn't thought of -- doing S2PL but with ACCESS
EXCLUSIVE locks for the read locks to test the predicate locking.  It
would let me play around with testing that phase before I moved to
the next with minimal wasted effort.
> Convince yourself that it is CORRECT - forget performance. Then
> either change the locks to SIREAD, or try to weaken the locks to
> row-level in certain cases. Then do the other one. It'll take you
> a while before you have something that can seriously be considered
> for commit, but that's not the point. The point is you'll have
> working code that you can fool with.
We're very much on the same page.  My goal was to get predicate
locking that didn't miss anything, even though it was ridiculously
coarse, then implement the simplest possible SSI on top of it, without
worrying about optimizations, then incrementally move toward
production quality.  I clearly didn't communicate that as well as I'd
hoped.  :-(  Anyway, I'll think about the S2PL with ACCESS EXCLUSIVE
locks for reads; if I can't punch any holes in that as a valid
environment to test the predicate locking logic, I'll do that first,
then switch them to SIREAD locks and work on the SSI logic.
> And use git so you can keep merging up to CVS HEAD easily.
I know.  It's on my list to do soon.
Thanks again,
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Jeff Davis  wrote:
> I started a wiki page here:
>
> http://wiki.postgresql.org/wiki/Serializable
I'll try to get that filled in with something useful over the
weekend.  I'm heading to a party soon, and may not be in shape to
work on it tomorrow....  ;-)
Happy New Year, all!
-Kevin




Re: Serializable Isolation without blocking

From
Greg Stark
Date:
aaah... I think I see where we've gone off track in previous
discussions... you think postgres keeps row level locks in a shared
memory data structure.  It doesn't; it stores all row level locks *in*
the tuple itself.  It only stores the lock in memory briefly while
actually acquiring the lock. Once it acquires it, the only record of
the lock is the xid in the tuple itself.

This means there are no memory limits on the number of records locked
by a transaction.

Storing the lock data in the tuples won't work for you at all because
you need to lock rows that don't exist yet.  That's why "where to
store the lock" is a critical blocking issue to figure out to know
whether the plan is feasible at all.

On Thursday, December 31, 2009, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>
>> It seems to me that the hard part of this problem is to describe
>> the general mechanism by which conflicts will be detected, with
>> specific references to the types of data structures that will be
>> used to hold that information.
>
> Well, the general approach to tracking SIREAD locks I had in mind is
> to keep them in the existing lock data structures.  I have in mind to
> use multiple granularities, with automatic escalation to coarser
> granularities at thresholds, to keep RAM usage reasonable.  There are
> clearly some tough problems with the pluggable indexes, types,
> operators, and such that will take time to sort out an acceptable
> implementation at any fine-grained level, so my intent is to punt
> those to very coarse granularity in the first pass, with "XXX SIREAD
> optimization opportunity" comments where that's not a production-
> quality solution or it just seems likely that we can do better with
> some work.
>
> I didn't want to get too detailed before I checked that creating a
> new lock method for this seemed sane, since the details of the
> implementation depend on that choice.  Lack of detail tends to draw
> accusations of hand-waving, so I was trying to stay away from those
> details until my intuition on the lock method was confirmed or shot
> down, so I could solidify those details before presenting them.
> There is a bit of a chicken and egg problem with moving this forward
> -- I guess I was overly conservative on what I presented.
>
> I do understand that this does mean that more RAM will need to be
> allocated to the lock structures to support serializable mode.  I
> don't think that any other option is likely to provide acceptable
> performance.  I also realize that this means that in the final form,
> optimized to where my shop considers it usable, there will still be
> coarser granularity than theoretically possible and resulting false
> positives causing serialization failures for which the cause is
> obscure.  We don't care, and anyone who does will probably not want
> to use this isolation level.  Traditional S2PL doesn't have that
> fault, but it blocks so badly that performance is worse; I'll take
> the transaction restarts over that any day.  I know there are others
> who won't.
>
> Basically, the reasons given for having separate lock methods for
> DEFAULT ("normal") locks and USER locks seem to apply with almost as
> much force to SIREAD locks (no blocking between them, different
> source of setting, different lifespans), so I was pretty sure this
> was a sane choice, but I just wanted a quick reality check before
> developing the level of detail that would move this past hand-waving.
>
> Other than the SIREAD locks to cover predicate locking for
> serializable transactions, there is no change to what locks are
> acquired.  There is no change to blocking or deadlock detection and
> recovery.  Other transaction isolation levels do not need to change,
> except perhaps to fast-path a skip over blocking and deadlock
> checking against SIREAD locks (one of those details I'm looking at).
>
> Let me know if you need more information to firm up an opinion on the
> sanity of my intuition regarding the new lock method; I'm eager to
> move on to the next level of detail.
>
> And thanks for the feedback.  :-)
>
> -Kevin
>

--
greg


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark  wrote:
> aaah... I think I see where we've gone off track in previous
> discussions... you think postgres keeps row level locks in a shared
> memory data structure. It doesn't; it stores all row level locks
> *in* the tuple itself. It only stores the lock in memory briefly
> while actually acquiring the lock. Once it acquires it, the only
> record of the lock is the xid in the tuple itself.
> This means there are no memory limits on the number of records
> locked by a transaction.
> Storing the lock data in the tuples won't work for you at all
> because you need to lock rows that don't exist yet.  That's why
> "where to store the lock" is a critical blocking issue to figure
> out to know whether the plan is feasible at all.
I'm probably not quite as clueless as you think on this; I realize
that keeping SIREAD locks in memory will require many more slots for
locks, escalation from tuple level to page or coarser when there are
many on a table, or (most likely) both.  Please have patience while I
try to work out the details; until I get a bit farther, everyone will
be spinning their wheels if we try to get too far into details -- it
will all be speculation and/or hand-waving, with much mayhem to straw
men.
This much is fairly firm in my head, so I probably should share:
What I do think is that the structures for regular locks seem usable
to track the information I need without having to burden read
operations with disk output; avoiding such output I see as absolutely
necessary for adequate performance.
related to search ranges where data doesn't currently exist, and the
ability to store read locks from many transactions against the same
data.
An open question in my mind is whether I might need to keep write
locks for serializable transactions in the shared memory lock hash
table until commit or rollback; I rather suspect that I will, with
similar granularity escalation policies.  That's likely to raise a
hue and cry, but like I've said before -- I won't try to force
anybody to use this, and the structures involved are of reasonable
size to allow using many of them.  I suppose these more persistent
write locks should be kept out of the DEFAULT lock method, too....
Thanks, and Happy New Year!
-Kevin


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Thu, Dec 31, 2009 at 4:44 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
>> I wonder if you couldn't start with something ridiculously poor,
>> like maybe an S2PL implementation with only table-level granularity
>> - just make any operation that reads or writes a table grab an
>> ACCESS EXCLUSIVE lock until transaction commit.
>
> There's an idea I hadn't thought of -- doing S2PL but with ACCESS
> EXCLUSIVE locks for the read locks to test the predicate locking.  It
> would let me play around with testing that phase before I moved to
> the next with minimal wasted effort.

What predicate locking?  If you take ACCESS EXCLUSIVE locks on every
read, that should serialize all access to every table.  Predicate
locking wouldn't do anything, because the table would be completely
inaccessible to all competing transactions.

...Robert


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> What predicate locking? If you take ACCESS EXCLUSIVE locks on every
> read, that should serialize all access to every table. Predicate
> locking wouldn't do anything, because the table would be completely
> inaccessible to all competing transactions.
Yeah, that's the benefit of starting with the ACCESS EXCLUSIVE locks,
but once I've confirmed that I've found all the places to get the
table level locks, the next step is to turn them into table level
SIREAD locks, and then to implement the SSI.  Locking against
referenced objects is the only practical technique for implementing
predicate locking for production environments that I've seen.
The phase where I'm making each referenced table totally inaccessible
to all competing transactions should be pretty short-lived.  It just
gives me an interim milestone to test that piece in isolation before
going on to use it; which is great, but not a place to stop for long.
Or have I totally misunderstood your suggestion?
-Kevin


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Thu, Dec 31, 2009 at 7:45 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas  wrote:
>
>> What predicate locking? If you take ACCESS EXCLUSIVE locks on every
>> read, that should serialize all access to every table. Predicate
>> locking wouldn't do anything, because the table would be completely
>> inaccessible to all competing transactions.
>
> Yeah, that's the benefit of starting with the ACCESS EXCLUSIVE locks,
> but once I've confirmed that I've found all the places to get the
> table level locks, the next step is to turn them into table level
> SIREAD locks, and then to implement the SSI.  Locking against
> referenced objects is the only practical technique for implementing
> predicate locking for production environments that I've seen.
>
> The phase where I'm making each referenced table totally inaccessible
> to all competing transactions should be pretty short-lived.  It just
> gives me an interim milestone to test that piece in isolation before
> going on to use it; which is great, but not a place to stop for long.
>
> Or have I totally misunderstood your suggestion?

Nope, you're on target.  Although - if I were you - I would post the
ACCESS EXCLUSIVE lock version of the patch for feedback.  I can't
speak for anyone else, but I'll read it.

(Just clearly label it as what it is, of course.)

...Robert


Re: Serializable Isolation without blocking

From
"Albe Laurenz"
Date:
Kevin Grittner wrote:
>> It seems to me that the hard part of this problem is to describe
>> the general mechanism by which conflicts will be detected, with
>> specific references to the types of data structures that will be
>> used to hold that information.
>
> Well, the general approach to tracking SIREAD locks I had in mind is
> to keep them in the existing lock data structures.
> [...]
If I remember right, one necessity for the SIREAD lock technique was
that SIREAD locks taken by a transaction have to be kept after the
transaction has ended.
Won't that require fundamental changes?
Yours,
Laurenz Albe


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Albe Laurenz"  wrote:
> If I remember right, one necessity for the SIREAD lock technique
> was that SIREAD locks taken by a transaction have to be kept after
> the transaction has ended.
Correct.  An SIREAD lock must be held until completion of the last
serializable transaction holding a snapshot earlier than the lock
transaction's commit.

Exceptions to that are:
- If a serializable transaction is rolled back, all of its SIREAD locks can be immediately released.
- If an ACCESS EXCLUSIVE lock is acquired on the exact same object already locked by an SIREAD lock, the SIREAD lock
can be released.  (This is an optimization, not a requirement for correctness.)
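
As a minimal standalone sketch of that lifetime rule -- all names here
are invented for illustration, not existing PostgreSQL code:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* modulo-2^32 comparison, like PostgreSQL's TransactionIdPrecedes() */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * The SIREAD locks of a committed serializable transaction become
 * releasable once every still-active serializable transaction took
 * its snapshot at or after that commit.
 */
static bool
siread_locks_releasable(TransactionId holderCommitXid,
                        const TransactionId *activeSnapshotXids,
                        int nactive)
{
    for (int i = 0; i < nactive; i++)
    {
        /* a snapshot older than the holder's commit may still form a
         * dangerous structure with it, so keep the locks */
        if (xid_precedes(activeSnapshotXids[i], holderCommitXid))
            return false;
    }
    return true;
}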
 
> Won't that require fundamental changes?
We already have two different lock methods, one of which already
keeps locks past the end of a database transaction.  USER locks
persist until the session terminates or the lock is explicitly
released.  Given that, I don't think it would be fair to characterize
a third lock method with a third lifespan as a fundamental change --
just another implementation detail.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Jeff Davis  wrote:
> I started a wiki page here:
>
> http://wiki.postgresql.org/wiki/Serializable
I've filled it in with all relevant information which came to mind.
If you can spot something I missed, please feel free to correct that
or let me know so that I can.
-Kevin




Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> And use git so you can keep merging up to CVS HEAD easily.
Regarding this, I was thinking that it would make sense to use a
repository on git.postgresql.org to coordinate my work with any
other people interested in the project.  I would clone from there
for the repository on my own machine.  Does that make sense?  (I
hope so, since I applied for a "serializable" repository there two
or three days ago....)  I've read through some git tutorials, but
there's a lot to digest and I'm not entirely sure this is a good way
to proceed.
-Kevin


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Wed, Jan 6, 2010 at 4:29 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> And use git so you can keep merging up to CVS HEAD easily.
>
> Regarding this, I was thinking that it would make sense to use a
> repository on git.postgresql.org to coordinate my work with any
> other people interested in the project.  I would clone from there
> for the repository on my own machine.  Does that make sense?  (I
> hope so, since I applied for a "serializable" repository there two
> or three days ago....)  I've read through some git tutorials, but
> there's a lot to digest and I'm not entirely sure this is a good way
> to proceed.

I think you should have users/kgrittner/postgres.git rather than
serializable.git.  serializable sounds more like the branch name.

...Robert


Re: Serializable Isolation without blocking

From
Dimitri Fontaine
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> I've read through some git tutorials, but
> there's a lot to digest and I'm not entirely sure this is a good way
> to proceed.

I found that the following video is really helpful at grasping the
concepts of git, that it exposes pretty directly even though it's meant
to promote a particular GUI for it. If you happen to use Emacs, consider
using magit, it's really good at what it does.
 http://alexvollmer.com/index.php/2009/01/18/meet-magit/

Regards,
-- 
dim


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I think you should have users/kgrittner/postgres.git rather than
> serializable.git.  serializable sounds more like the branch name.
Hmmm....  On a multi-year project it seemed more than remotely
possible that responsibility for the project could shift, either to
an external contractor or some other programmer here, so a name
related to the reason for the repository rather than the current
lead programmer seemed to make sense.  One wouldn't want the
repository location shifting based on assignments, I would have
thought.  But if that's going to throw people, I can easily withdraw
my original request and resubmit along the lines you mention.  I'll
wait a bit for other comments before taking any action.
-Kevin


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Wed, Jan 6, 2010 at 5:04 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think you should have users/kgrittner/postgres.git rather than
>> serializable.git.  serializable sounds more like the branch name.
>
> Hmmm....  On a multi-year project it seemed more than remotely
> possible that responsibility for the project could shift, either to
> an external contractor or some other programmer here, so a name
> related to the reason for the repository rather than the current
> lead programmer seemed to make sense.  One wouldn't want the
> repository location shifting based on assignments, I would have
> thought.  But if that's going to throw people, I can easily withdraw
> my original request and resubmit along the lines you mention.  I'll
> wait a bit for other comments before taking any action.

Well the nice thing about git is that there is no reason why you need
to have a single central repository...

...Robert


Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Greg Stark wrote:
> aaah... I think I see where we've gone off track in previous
> discussions... you think postgres keeps row level locks in a shared
> memory data structure.  It doesn't; it stores all row level locks *in*
> the tuple itself.  It only stores the lock in memory briefly while
> actually acquiring the lock.  Once it acquires it, the only record of
> the lock is the xid in the tuple itself.

This is a very good point. However, I'm not clear if Kevin plans to go
down to tuple level locking with granularity of the SIREAD thing. (I
don't like calling it a lock, because it actually isn't. Better to
call it a hint or a mark).

We certainly cannot store all of the SIREAD hint information within the
tuple header, as there may be multiple transactions having marked the
same tuple SIREAD, but we don't want to spend lots of bits for SSI there
(preferably not even a single bit).

It's easy to see that the SIREAD hint information doesn't fit in shared
memory for large enough tables. The only way I can imagine tuple-level
SIREAD locking halfway working is by using it for transactions
reading only a few tuples and falling back to some coarser grained
locking strategy after a certain limit (per transaction).

> storing the lock data in the tuples won't work for you at all because
> you need to lock rows that don't exist yet at all.

I'm not sure if I understand this correctly, but I don't think you need
to "lock" tuples that don't exist, at least not with SIREAD. The Cahill
paper covers this under "Detecting Phantoms" and proposes to use plain
predicate locking to cover tuple level granularity AFAIUI.

The proposed SIREAD hint seems to be an optimization that only works for
existing tuples. I don't find it hard to believe that it performs better
than a general purpose predicate locking strategy. (Especially for test
cases with short transactions, i.e. only few SIREAD locks per txn).

(It's interesting that with "database page" level granularity, he states
that predicate locking would not be necessary. Instead any page can be
locked at any time. For this to work, according to my reasoning, you'd
have to know in advance on which page potentially accessible (but not
yet visible) new tuples would end up. This is close to impossible for
Postgres, however, it seems to work for Berkeley DB).

> that's why "where to
> store the lock" is a critical blocking issue to figure out to know
> whether the plan is feasible at all.

I absolutely agree with that. As I've come to think of it more as a hint
or mark (and because SIREAD hints have a different lifecycle than
locks), I think basing this on the existing table level locks isn't a good idea.

How about storing the SIREAD info in shared memory and using dynamic
granularity based on the conflict rate and available memory? *duck*

As this seems to be an optimization of predicate locking, don't we need
to implement that first?

Regards

Markus Wanner



Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Kevin Grittner wrote:
> We're very much on the same page.  My goal was to get predicate
> locking that didn't miss anything, even though it was ridiculously
> coarse, then implement the simplest possible SSI on top of it, without
> worrying about optimizations, then incrementally move toward
> production quality.  I clearly didn't communicate that as well as I'd
> hoped.  :-(

I understand it now :-)  And I think it generally is a good plan.

I sort of missed the predicate locking step from the wiki's Development
Path. I now see it has its own section above. Is there any such plan of
development for predicate locking? To me that seems to be the harder
problem (due to being more general).

Regards

Markus Wanner






Re: Serializable Isolation without blocking

From
Nicolas Barbier
Date:
2010/1/7 Markus Wanner <markus@bluegap.ch>:

> (It's interesting that with "database page" level granularity, he states
> that predicate locking would not be necessary. Instead any page can be
> locked at any time. For this to work, according to my reasoning, you'd
> have to know in advance on which page potentially accessible (but not
> yet visible) new tuples would end up. This is close to impossible for
> Postgres, however, it seems to work for Berkeley DB).

The specifics of relational databases can be entirely ignored if
serializability is provided at the "page layer" level. This also
implies that pages that are part of indexes and such must be treated
in exactly the same way as the pages that constitute the heap.
"Predicate locking" is only needed to provide the same semantics in a
context where the page layer doesn't provide serializability, but
other layers (that understand that we are talking about a relational
database) are supposed to provide it.

I am not suggesting that Postgres should start supporting page layer
level concurrency control, but rather indicating that most literature
should be interpreted this way: concurrency control can be viewed as
orthogonal to relational databases, so that is the way it was
originally modeled.

> As this seems to be an optimization of predicate locking, don't we need
> to implement that first?

Whole-table locking is a trivial implementation of predicate locking.

Nicolas


Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Kevin Grittner wrote:
> I'm probably not quite as clueless as you think on this; I realize
> that keeping SIREAD locks in memory will require many more slots for
> locks, escalation from tuple level to page or coarser when there are
> many on a table, or (most likely) both.

..oh, there's the dynamic adjustment idea again. Cool!

> What I do think is that the structures for regular locks seem usable
> to track the information I need without having to burden read
> operations with disk output, which I see as absolutely necessary for
> adequate performance.

I certainly agree that SIREAD locks should never hit the disk. However,
thinking of SIREAD more as marks or hints, I'm not so sure the current
locking structures are appropriate.

If we are talking about table level locks, those are not fine grained
enough and have different semantics (i.e. a user can acquire these
locks, which certainly doesn't make any sense for SIREAD). My gut
feeling is that adjusting them for use with SIREAD is more work than
implementing your own shared memory area for SIREAD marks.

Row level locks are very fine grained, but those are spilled to disk in
its current implementation. So those are an even worse fit for the needs
of SIREAD.

> It also gives me somewhere to store locks
> related to search ranges where data doesn't currently exist, and the
> ability to store read locks from many transactions against the same
> data.

Isn't a hash table in shared memory enough to store the SIREAD locks
(and a lot simpler to manage)?

> An open question in my mind is whether I might need to keep write
> locks for serializable transactions in the shared memory lock hash
> table until commit or rollback

IIUC row level locks are kept in the tuple header until the end of the
transaction (and even beyond). Is this really a problem?

> I suppose these more persistent
> write locks should be kept out of the DEFAULT lock method, too....

I fail to understand that part. What's the DEFAULT lock method?

Regards

Markus Wanner



Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Nicolas Barbier wrote:
> The specifics of relational databases can be entirely ignored if
> serializability is provided at the "page layer" level.

Aha, I now see very vaguely how that could work, yes. Thank you for
elaborating on this. I agree that this isn't the best way forward for
Postgres.

>> As this seems to be an optimization of predicate locking, don't we need
>> to implement that first?
> 
> Whole-table locking is a trivial implementation of predicate locking.

..and whole-database locking is a trivial implementation of true
serializability. In a way, both are optimizations of that trivial
implementation. My point is that due to that dependency, the conceptual
design of a solution for predicate locking (with acceptable performance)
should at least be considered before going into details with true
serializability.

Regards

Markus Wanner



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>  
>> I think you should have users/kgrittner/postgres.git rather than
>> serializable.git.  serializable sounds more like the branch name.
>  
> I'll wait a bit for other comments before taking any action.
Robert's advice being the last (and only) offered on the topic, I'm
taking the silence as agreement and have dropped the request for a
"serializable" repository and added one for /users/kgrittn/postgres
instead.
-Kevin


Re: Serializable Isolation without blocking

From
Magnus Hagander
Date:
On Thu, Jan 7, 2010 at 19:08, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>> I think you should have users/kgrittner/postgres.git rather than
>>> serializable.git.  serializable sounds more like the branch name.
>>
>> I'll wait a bit for other comments before taking any action.
>
> Robert's advice being the last (and only) offered on the topic, I'm
> taking the silence as agreement and have dropped the request for a
> "serializable" repository and added one for /users/kgrittn/postgres
> instead.

... and i've approved it :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thu, Jan 7, 2010 at 11:08 AM, Markus Wanner <markus@bluegap.ch> wrote:
> Row level locks are very fine grained, but those are spilled to disk in
> its current implementation. So those are an even worse fit for the needs
> of SIREAD.
>

I think we're still talking past the issue. Predicate locks are not
row level, nor page level, nor table level. They're locks on
predicates. Ie, you have to lock against values which aren't actually
currently in the table at all. You need to be able to detect a
conflict between a transaction which read rows "WHERE i BETWEEN 1 AND
10" and one which tries to insert a row with i=1 even if the first
transaction didn't see any rows with i=1 (or indeed even if the first
transaction found no rows at all).

That's the hard part. How do you represent such a lock in a way that I
can efficiently find it and check for the conflict when it comes time
for me to do my insert.

And how do you do that without tying the implementation to specific
details of how our planner, table scans, and index access methods
work?
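
Even the simplest possible case shows the shape of the problem.  A
minimal sketch, restricted to single-column integer ranges (all names
invented; real predicates are arbitrarily more general):

#include <stdbool.h>
#include <stdint.h>

/* a "predicate lock" recording that some serializable transaction ran
 * a read like: SELECT ... WHERE i BETWEEN lo AND hi */
typedef struct RangePredicateLock
{
    uint32_t relationOid;   /* table the predicate was read against */
    int32_t  lo;
    int32_t  hi;
    uint32_t readerXid;     /* transaction that performed the read */
} RangePredicateLock;

/* the test an inserter would have to run before adding a row */
static bool
insert_conflicts(const RangePredicateLock *locks, int nlocks,
                 uint32_t relationOid, int32_t newValue)
{
    for (int i = 0; i < nlocks; i++)
    {
        if (locks[i].relationOid == relationOid &&
            newValue >= locks[i].lo && newValue <= locks[i].hi)
            return true;    /* rw conflict with an earlier reader */
    }
    return false;
}

And note that even this toy version scans every reader's predicate on
every write; that per-write check is what makes a general list of
predicate expressions so expensive.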

-- 
greg


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Thu, Jan 7, 2010 at 2:40 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Thu, Jan 7, 2010 at 11:08 AM, Markus Wanner <markus@bluegap.ch> wrote:
>> Row level locks are very fine grained, but those are spilled to disk in
>> its current implementation. So those are an even worse fit for the needs
>> of SIREAD.
>
> I think we're still talking past the issue. Predicate locks are not
> row level, nor page level, nor table level.

They're not?  They're just floating out in space somewhere?  There are
several possible ways to implement predicate locks, but every possible
method I can think of involves attaching them at one of those places.

> And how do you do that without tying the implementation to specific
> details of how our planner, table scans, and index access methods
> work?

You don't.  You add additional methods to the index AMs to support
predicate locks, just like we do when we want them to support
anything else (bitmap index scans, index-only scans, etc.).

...Robert


Re: Serializable Isolation without blocking

From
Jeff Davis
Date:
On Thu, 2010-01-07 at 15:02 -0500, Robert Haas wrote:
> > I think we're still talking past the issue. Predicate locks are not
> > row level, nor page level, nor table level.
> 
> They're not?  They're just floating out in space somewhere?  There are
> several possible ways to implement predicate locks, but every possible
> method I can think of involves attaching them at one of those places.

Consider an exclusion constraint, which is a kind of predicate lock. You
could say that the lock is in the index (now) -- but my first
implementation used a shared memory structure instead, so it's clearly
not required to exist in the index. You could also say that the lock is
really attached to the table, because there are catalog entries
connected to the table that express the constraint -- but I think that's
splitting hairs.

What Greg is saying (as I understand it) is that the lock conflicts with
tuples that don't even exist yet. We can tack them on to any structure
that's convenient, of course. But just because you want to implement
them by locking a few index pages (assuming there is an index) doesn't
mean we should reject Greg's line of thinking.

Regards,
Jeff Davis



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Jeff Davis <pgsql@j-davis.com> wrote:
> Consider an exclusion constraint, which is a kind of predicate
> lock. You could say that the lock is in the index (now) -- but my
> first implementation used a shared memory structure instead, so
> it's clearly not required to exist in the index. You could also
> say that the lock is really attached to the table, because there
> are catalog entries connected to the table that express the
> constraint -- but I think that's splitting hairs.
Well, we're starting by locking entire tables (as a very simple way
to get correctness), then reducing the granularity where we can find
a way to do that.  I'll note exclusion constraints as an area needing
R&D.  Thanks for pointing it out.
> What Greg is saying (as I understand it) is that the lock
> conflicts with tuples that don't even exist yet. We can tack them
> on to any structure that's convenient, of course. But just because
> you want to implement them by locking a few index pages (assuming
> there is an index) doesn't mean we should reject Greg's line of
> thinking.
As I understand it, Greg's line of thinking is that we should use a
technique which has never proven practical on a large scale:
matching database changes against a list of predicate lock
expressions.  It's not that I want to do it any particular way, it's
that I want to get it working in the simplest possible way and then
find things which can be shown to improve overall performance of
meaningful work loads until we have something which has acceptable
performance.  I don't reject "pure" predicate tracking, per se -- I
just won't put any time into it, since I don't expect it to work.  I
would be overjoyed if Greg or anybody else could prove that wrong
with an optimization patch, say six to twelve months from now when
we hit that phase.
-Kevin


Re: Serializable Isolation without blocking

From
Jeff Davis
Date:
On Thu, 2010-01-07 at 15:01 -0600, Kevin Grittner wrote:
> As I understand it, Greg's line of thinking is that we should use a
> technique which has never proven practical on a large scale:
> matching database changes against a list of predicate lock
> expressions.  It's not that I want to do it any particular way, it's
> that I want to get it working in the simplest possible way and then
> find things which can be shown to improve overall performance of
> meaningful work loads until we have something which has acceptable
> performance.  I don't reject "pure" predicate tracking, per se -- I
> just won't put any time into it, since I don't expect it to work.  I
> would be overjoyed if Greg or anybody else could prove that wrong
> with an optimization patch, say six to twelve months from now when
> we hit that phase.

I think we are in agreement. I was responding to Robert Haas's comment,
because it sounded like he didn't understand Greg Stark's point.

Regards,
Jeff Davis



Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Greg Stark wrote:
> I think we're still talking past the issue. Predicate locks are not
> row level, nor page level, nor table level. They're locks on
> predicates. Ie, you have to lock against values which aren't actually
> currently in the table at all. You need to be able to detect a
> conflict between a transaction which read rows "WHERE i BETWEEN 1 AND
> 10" and one which tries to insert a row with i=1 even if the first
> transaction didn't see any rows with i=1 (or indeed even if the first
> transaction found no rows at all).

I absolutely agree with that. That's about predicate locks. I've been
talking about SIREAD, which is a different thing (and which I don't
consider to be a lock). The SIREAD thingie certainly doesn't help
implement predicate locks. And predicate locking isn't necessarily
sufficient to implement truly serializable isolation.

> That's the hard part. How do you represent such a lock in a way that I
> can efficiently find it and check for the conflict when it comes time
> for me to do my insert.

As it's not really a lock, but rather a mark or a tag, SIREAD may or may
not be implemented atop existing or non-existent locking structures, IMO.

I've made my points about implementing SIREAD atop table level or row
level locking structures. With (non-existent) predicate locking, I'm
still unsure. It might help to implement SIREAD atop such a predicate as
well. Predicate tagging?

Regards

Markus Wanner


Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Kevin Grittner wrote:
> As I understand it, Greg's line of thinking is that we should use a
> technique which has never proven practical on a large scale:
> matching database changes against a list of predicate lock
> expressions.

I find that approach to predicate locking pretty interesting. However,
unlike the others, its cost scales with the number of concurrently held
locks. And with the current trend towards many-core platforms, that
might get worse and worse (as concurrency must increase to use these
cores efficiently).

Regards

Markus Wanner



Re: Serializable Isolation without blocking

From
Greg Stark
Date:
On Thursday, January 7, 2010, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 7, 2010 at 2:40 PM, Greg Stark <gsstark@mit.edu> wrote:
>> On Thu, Jan 7, 2010 at 11:08 AM, Markus Wanner <markus@bluegap.ch> wrote:
>>> Row level locks are very fine grained, but those are spilled to disk in
>>> its current implementation. So those are an even worse fit for the needs
>>> of SIREAD.
>>
>> I think we're still talking past the issue. Predicate locks are not
>> row level, nor page level, nor table level.
>
> They're not?  They're just floating out in space somewhere?  There are
> several possible ways to implement predicate locks, but every possible
> method I can think of involves attaching them at one of those places.


Well, the one place you *cannot* attach them is on the tuples, because
you need to be able to lock hypothetical new tuples which don't exist
yet.

Yes, the implementation could work by attaching them to tables or
indexes, but it's representing an abstract concept which is just
hanging out in space.




>
>> And how do you do that without tying the implementation to specific
>> details of how our planner, table scans, and index access methods
>> work?
>
> You don't.  You add additional methods to the index AMs to support
> predicate locks, just like we do when we want them to support
> anything else (bitmap index scans, index-only scans, etc.).
>
> ...Robert
>

--
greg


Re: Serializable Isolation without blocking

From
Robert Haas
Date:
On Fri, Jan 8, 2010 at 7:34 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Thursday, January 7, 2010, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 7, 2010 at 2:40 PM, Greg Stark <gsstark@mit.edu> wrote:
>>> On Thu, Jan 7, 2010 at 11:08 AM, Markus Wanner <markus@bluegap.ch> wrote:
>>>> Row level locks are very fine grained, but those are spilled to disk in
>>>> its current implementation. So those are an even worse fit for the needs
>>>> of SIREAD.
>>>
>>> I think we're still talking past the issue. Predicate locks are not
>>> row level, nor page level, nor table level.
>>
>> They're not?  They're just floating out in space somewhere?  There are
>> several possible ways to implement predicate locks, but every possible
>> method I can think of involves attaching them at one of those places.
>
> Well, the one place you *cannot* attach them is on the tuples, because
> you need to be able to lock hypothetical new tuples which don't exist
> yet.

Well, index tuples maybe, for next-key locking...

> Yes, the implementation could work by attaching them to tables or
> indexes, but it's representing an abstract concept which is just
> hanging out in space.

Yeah, I understand that the concept is abstract, but I'm not sure it's
bad to be talking about other kinds of locks because that's how it
will be implemented.  I think a lot of this is going to end up falling
on the index AMs - we could have locks on GiST pages, next-key
locks on btree tuples.  We'll ask each index AM if it can help with the
predicate that we're trying to lock against and if they all say no
then we lock the whole table... or that's what I'm thinking, anyway.
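
In rough outline -- a hand-wavy sketch only, with invented names rather
than existing index AM entry points:

#include <stdbool.h>
#include <stddef.h>

typedef struct IndexAmPredicateSupport
{
    const char *amname;
    /* returns true if the AM could take a predicate lock covering the
     * scanned key range (e.g. next-key locks on btree tuples, page
     * locks on GiST) */
    bool      (*try_predicate_lock)(const void *scanKeys);
} IndexAmPredicateSupport;

static void
acquire_predicate_lock(const IndexAmPredicateSupport *ams, int nams,
                       const void *scanKeys,
                       void (*lock_whole_table)(void))
{
    for (int i = 0; i < nams; i++)
    {
        if (ams[i].try_predicate_lock != NULL &&
            ams[i].try_predicate_lock(scanKeys))
            return;         /* some index AM covered the predicate */
    }
    lock_whole_table();     /* none could help: table-level lock */
}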

...Robert


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Markus Wanner <markus@bluegap.ch> wrote:
> Greg Stark wrote:
> That's about predicate locks. I've been talking about SIREAD,
> which is a different thing (and which I don't consider to be a
> lock). The SIREAD thingie certainly doesn't help implement
> predicate locks. And predicate locking isn't necessarily
> sufficient to implement truly serializable isolation.
SIREAD locks need to be acquired according to the exact same rules
as "normal" read locks in predicate locking schemes.  We're just
using a lock level where the set of locks which conflict with it is
empty.  ;-)  Predicate locking is clearly necessary and clearly not
sufficient to implement serializable isolation.  Was some part of
the Cahill paper unclear on that?
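To make "empty conflict set" concrete, a minimal sketch in the style of
a per-mode conflict mask (the mode names and numbers are invented for
illustration):

#define LOCKBIT(mode) (1 << (mode))

enum { NoLockMode, SIReadMode, ShareMode, ExclusiveMode, NumModes };

/* which modes each mode conflicts with */
static const int lock_conflicts[NumModes] = {
    [NoLockMode]    = 0,
    [SIReadMode]    = 0,    /* conflicts with nothing: never blocks,
                             * recorded only so later writes can find
                             * it and flag an rw-dependency */
    [ShareMode]     = LOCKBIT(ExclusiveMode),
    [ExclusiveMode] = LOCKBIT(ShareMode) | LOCKBIT(ExclusiveMode),
};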
>> That's the hard part. How do you represent such a lock in a way
>> that I can efficiently find it and check for the conflict when it
>> comes time for me to do my insert.
> 
> As it's not really a lock, but rather a mark or a tag, SIREAD may
> or may not be implemented atop existing or non-existent locking
> structures, IMO.
Well, the development plan calls for it to start there.  Whether a
better way is found during optimization is anybody's guess at this
point.
> I've made my points about implementing SIREAD atop table level or
> row level locking structures. With (non-existent) predicate
> locking, I'm still unsure. It might help to implement SIREAD atop
> such a predicate as well. Predicate tagging?
It seems both respectful and less confusing to adopt the terminology
of those who developed SSI techniques.  I could invent new terms for
"vulnerable edges", "dangerous structures", "pivots" and such which
are used in the relevant literature.  But I won't.  It would just
serve to confuse those who have spent the time to read and
understand the literature.
I'm not quite sure I followed what you were getting at with the
"(non-existent) predicate locking" phrase -- was that just an
acknowledgment that it is exactly what is under development and, as
such, doesn't yet exist; or were you getting at something else?
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Markus Wanner <markus@bluegap.ch> wrote:
> Kevin Grittner wrote:
>> As I understand it, Greg's line of thinking is that we should use
>> a technique which has never proven practical on a large scale:
>> matching database changes against a list of predicate lock
>> expressions.
> 
> I find that approach to predicate locking pretty interesting.
Sure, I find it interesting, too.  Just not practical.
> unlike the others, its cost scales with the number of concurrently
> held locks.
Times the number of checks for conflicting locks.  So, O(N^2).
No thanks.
Now if there is a breakthrough in that technology which allows it to
compete favorably with techniques which model the logical predicates
against database structures, I'm all ears.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Greg Stark <gsstark@mit.edu> wrote:
> Well, the one place you *cannot* attach them is on the tuples.
The predicate locking schemes I've been reading about do attach
locks to tuples, as *part* of a complete strategy.
> you need to be able to lock hypothetical new tuples which don't
> exist yet.
That, too.  Which is where other granularities and objects come in,
as you later noted.
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
I had this flagged as needing a response, but it fell through the
cracks yesterday.  Apologies for the delayed response.
Markus Wanner <markus@bluegap.ch> wrote:
> I'm not clear if Kevin plans to go down to tuple level locking
> with granularity of the SIREAD thing.
Eventually, where possible, subject to escalation.
> (I don't like calling it a lock, because it actually isn't. Better
> to call it a hint or a mark).
You have a point, but as I said in another email, I'm reluctant to
invent my own terminology which conflicts with the literature.  So
I'll remain "conventional" on this.
> We certainly cannot store all of the SIREAD hint information
> within the tuple header, as there may be multiple transactions
> having marked the same tuple SIREAD, but we don't want to spend
> lots of bits for SSI there (preferably not even a single bit).
Absolutely.
> It's easy to see that the SIREAD hint information doesn't fit in
> shared memory for large enough tables. The only way I can imagine
> tuple-level SIREAD locking halfway working is by using it for
> transactions reading only a few tuples and falling back to some
> coarser grained locking strategy after a certain limit (per
> transaction).
Exactly.
> I'm not sure if I understand this correctly, but I don't think you
> need to "lock" tuples that don't exist, at least not with SIREAD.
> The Cahill paper covers this under "Detecting Phantoms" and
> proposes to use plain predicate locking to cover tuple level
> granularity AFAIUI.
> 
> The proposed SIREAD hint seems to be an optimization that only
> works for existing tuples. I don't find it hard to believe that it
> performs better than a general purpose predicate locking strategy.
> (Especially for test cases with short transactions, i.e. only few
> SIREAD locks per txn).
When the Cahill paper talks about predicate locking, it *is* talking
about what to lock with SIREAD locks, at least as I read it.  If you
see something to indicate otherwise, please share.  (If I'm off base
on understanding any of this, I'd sure like to get straightened out
as soon as possible.)
> (It's interesting that with "database page" level granularity, he
> states that predicate locking would not be necessary. Instead any
> page can be locked at any time. For this to work, according to my
> reasoning, you'd have to know in advance on which page potentially
> accessible (but not yet visible) new tuples would end up. This is
> close to impossible for Postgres, however, it seems to work for
> Berkeley DB).
It was the way Sybase worked for ages, too.  If you consider that
you're locking both the heap and index pages read, it becomes clear
that in order to add a row which conflicts with a predicate, you
have to modify a page which would have been read to check the
predicate.  (For example, with an index on i, inserting a row with
i=1 must modify the index page covering i=1 -- a page which any scan
checking "i BETWEEN 1 AND 10" would have read and locked.)
> How about storing the SIREAD info in shared memory and using
> dynamic granularity based on the conflict rate and available
> memory? *duck*
Don't duck.  That's the plan.  I really wish I was better at
communicating these things.  :-(  See lock.c and the associated
README and see if you don't think they would fit the bill.
> As this seems to be an optimization of predicate locking, don't we
> need to implement that first?
Exactly.  That's the plan.  Implement it in the simplest form, then
optimize.
-Kevin


Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Greg Stark wrote:
> Well, the one place you *cannot* attach them is on the tuples, because
> you need to be able to lock hypothetical new tuples which don't exist
> yet.

Well, maybe "attaching" here is meant in a more general or theoretical
sense. I think we all agree that adding them to the tuple header on disk
is not an option.

Regards

Markus Wanner



Re: git help (was: Serializable Isolation without blocking)

From
"Kevin Grittner"
Date:
Magnus Hagander <magnus@hagander.net> wrote:
> On Thu, Jan 7, 2010 at 19:08, Kevin Grittner
>> Robert's advice being the last (and only) offered on the topic,
>> I'm taking the silence as agreement and have dropped the request
>> for a "serializable" repository and added one for
>> /users/kgrittn/postgres instead.
> 
> ... and i've approved it :-)
Being new to git, I'm hoping someone can suggest a reasonable
workflow for using this.  (I've found no shortage of suggestions for
possible workflows on the web, but am not clear what to pick or
synthesize as "best practice" for this particular situation.)  A
little advice may prevent a lot of flailing and false starts, and
would be much appreciated.
I assume I want to create a "serializable" branch in this new
repository and clone that on my desktop.  I expect I'll be granting
write permission on this repository to others, to facilitate
teamwork on this, but perhaps there's a better way to organize that
with git.  What would the normal workflows be to:
rebase the postgresql.git clone and the serializable branch on the
server?
rebase my working copy on my desktop from the serializable branch on
the server?
integrate work from my desktop to the serializable branch on the
server when I've hit a state which seems worth sharing with the
team?  (i.e., it compiles and isn't going to break something someone
else is working on)
Or is the above framed in a way which shows that I'm not properly in
the git mindset yet?  If so, a hint at what I'm missing would be
good.
Thanks for any help getting git usage figured out.
-Kevin


Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Kevin Grittner wrote:
> SIREAD locks need to be acquired according to the exact same rules
> as "normal" read locks in predicate locking schemes.

Understood. I didn't take that into account at first. Thanks for
pointing it out. (Whatever "normal" read locks are...)

> We're just
> using a lock level where the set of locks which conflict with it is
> empty.  ;-)

Which kind of defeats the purpose of a lock. Anyway, that's a bike-shed
discussion, so I might as well agree to calling it a lock.

> Well, the development plan calls for it to start there.  Whether a
> better way is found during optimization is anybody's guess at this
> point.

Okay, so you know my guess ;-)

> It seems both respectful and less confusing to adopt the terminology
> of those who developed SSI techniques.  I could invent new terms for
> "vulnerable edges", "dangerous structures", "pivots" and such which
> are used in the relevant literature.  But I won't.  It would just
> serve to confuse those who have spent the time to read and
> understand the literature.

I agree.

My intention was to "translate" the literature terms to Postgres hacker
slang. At least I had trouble understanding that SIREAD is a lock that
doesn't actually block anything.

> I'm not quite sure I followed what you were getting at with the
> "(non-existent) predicate locking" phrase -- was that just an
> acknowledgment that it is exactly what is under development and, as
> such, doesn't yet exist; or were you getting at something else?

SIREAD atop predicate locking serves to detect vulnerable edges (I hope
I'm using that term correctly here) between newly inserted tuples and
reads, right? I was trying to figure out if it would make sense to use
predicate locking (instead of table or row level locking) as well for
detecting vulnerable edges between updated or deleted tuples and reads.

At least you'd be free to choose its granularity (which cannot be said
for row or table level locking, for obvious reasons). However, it
certainly depends a lot on the implementation of predicate locking,
which we don't have yet, so that's all speculation...

Regards

Markus Wanner



Re: git help (was: Serializable Isolation without blocking)

From
Robert Haas
Date:
On Fri, Jan 8, 2010 at 11:22 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Magnus Hagander <magnus@hagander.net> wrote:
>> On Thu, Jan 7, 2010 at 19:08, Kevin Grittner
>
>>> Robert's advice being the last (and only) offered on the topic,
>>> I'm taking the silence as agreement and have dropped the request
>>> for a "serializable" repository and added one for
>>> /users/kgrittn/postgres instead.
>>
>> ... and i've approved it :-)
>
> Being new to git, I'm hoping someone can suggest a reasonable
> workflow for using this.  (I've found no shortage of suggestions for
> possible workflows on the web, but am not clear what to pick or
> synthesize as "best practice" for this particular situation.)  A
> little advice may prevent a lot of flailing and false starts, and
> would be much appreciated.

Since we've talked about migrating to git as a project, I hope no one
will mind too much if I answer this on-list, even though it's a bit
OT.

> I assume I want to create a "serializable" branch in this new
> repository and clone that on my desktop.

Well, you'll create it on your desktop (probably with git checkout
master; git checkout -b serializable) and then push it to your
git.postgresql.org repository, actually.  You can't really do anything
on the server directly.

> I expect I'll be granting
> write permission on this repository to others, to facilitate
> teamwork on this, but perhaps there's a better way to organize that
> with git.

Another option is that everyone can just push and pull from each other.  But shared write is fine too.

> What would the normal workflows be to:
>
> rebase the postgresql.git clone and the serializable branch on the
> server?
>
> rebase my working copy on my desktop from the serializable branch on
> the server?
>
> integrate work from my desktop to the serializable branch on the
> server when I've hit a state which seems worth sharing with the
> team?  (i.e., it compiles and isn't going to break something someone
> else is working on)

Generally the workflow here is going to be that you will:

1. Pull from the master branch in the official postgres.git into your
master branch.
2. Merge (not rebase, if you want to collaborate!) those changes into
your topic branch (serializable).
3. Hack on your topic branch.
4. Push your master and topic branches to users/kgrittn/postgres.git.

For example in my .git/config, I have:

[remote "origin"]url = git://git.postgresql.org/git/postgresql.gitfetch = +refs/heads/*:refs/remotes/origin/*
[remote "rhaas"]url = ssh://git@git.postgresql.org/users/rhaas/postgres.gitfetch = +refs/heads/*:refs/remotes/rhaas/*
[remote "heikki"]url = git://git.postgresql.org/git/users/heikki/postgres.gitfetch =
+refs/heads/*:refs/remotes/heikki/*

So typically what I do is sort of like this:

git checkout master
git pull (gets changes from upstream)
git checkout -b hackingproject (create new branch)
# hack hack
git commit -a
# hack some more
git commit -a
# ok, time to rebase or merge
git checkout master
git pull
git checkout hackingproject
git merge master (actually, i often rebase, but for this you REALLY
want to merge)
git push rhaas master hackingproject (push to my repo, need a + to
force it if you are rebasing rather than merging)

...other handy stuff...
git log master...hackingproject (see my commits)
git diff master...hackingproject (format as a patch)

...patch gets committed...
git checkout master
git branch -D hackingproject
git push rhaas :hackingproject (delete branch off server)

...Robert


Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Kevin Grittner wrote:
> I had this flagged as needing a response, but it fell through the
> cracks yesterday.  Apologies for the delayed response.

No problem.

> Markus Wanner <markus@bluegap.ch> wrote:
> When the Cahill paper talks about predicate locking, it *is* talking
> about what to lock with SIREAD locks, at least as I read it.  If you
> see something to indicate otherwise, please share.  (If I'm off base
> on understanding any of this, I'd sure like to get straightened out
> as soon as possible.)

I don't remember reading about predicate locking in the paper I read.
Either he didn't cover that in his first implementation (based on page
level locking), or I've simply re-used that part of my in-brain-memory.

> It was the way Sybase worked for ages, too.  If you consider that
> you're locking both the heap and index pages read, it becomes clear
> that in order to add a row which conflicts with a predicate, you
> have to modify a page which would have been read to check the
> predicate.

Hm.. interesting, thanks for this quick explanation.

>> How about storing the SIREAD info in shared memory and using
>> dynamic granularity based on the conflict rate and available
>> memory? *duck*
>  
> Don't duck.  That's the plan.  I really wish I was better at
> communicating these things.  :-(  See lock.c and the associated
> README and see if you don't think they would fit the bill.

Cool!

> Exactly.  That's the plan.  Implement it in the simplest form, then
> optimize.

Perfect!

Regards

Markus Wanner



Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Markus Wanner <markus@bluegap.ch> wrote:
> I don't remember reading about predicate locking in the paper I
> read.  Either he didn't cover that in his first implementation
> (based on page level locking), or I've simply re-used that part of
> my in-brain-memory.
If you read the first paper but not the second, all you would have
seen was the mention that predicate locking was covered because
Berkeley DB uses page level locks for everything.  The second paper
gets into a little more detail on the subject, with references to a
good overview of the topic as well as some authoritative papers.
-Kevin


Re: Serializable Isolation without blocking

From
Nicolas Barbier
Date:
2010/1/8 Markus Wanner <markus@bluegap.ch>:

> SIREAD atop predicate locking serves to detect vulnerable edges (I hope
> I'm using that term correctly here) between newly inserted tuples and
> reads, right? I was trying to figure out if it would make sense to use
> predicate locking (instead of table or row level locking) as well for
> detecting vulnerable edges between updated or deleted tuples and reads.

AFAICS, detecting a "rw" dependency where the read executes after the
write is rather easy: the writer has created a row version that is not
visible to the reader's snapshot. Therefore, any time a reader reads a
non-last version of a row, there is a rw dependency between it and the
creators of any newer versions.
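
A minimal standalone sketch of that check -- the names are invented
and the visibility test is simplified (it ignores the snapshot's
in-progress list):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* modulo-2^32 comparison, like PostgreSQL's TransactionIdPrecedes() */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * A reader sees version V of a row.  If V was superseded by a committed
 * updater whose xid is not visible to the reader's snapshot, the read
 * establishes reader --rw--> updater.
 */
static bool
read_creates_rw_dependency(TransactionId updaterXid,
                           bool updaterCommitted,
                           TransactionId snapshotXmax)
{
    return updaterCommitted && !xid_precedes(updaterXid, snapshotXmax);
}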

Btw, rw dependencies are the only thing that needs to be checked under
SI, as "wr" and "ww" dependencies never lead to problems: one can only
see (or update) a certain row version using a transaction that has
taken its snapshot after the writing transaction already committed.
Therefore, the "must be before" relationship between such transactions
is trivially satisfied. (I assume here that PG's non-SI-compatible
behavior of not always rollbacking any transaction that writes to a
non-last version will be disabled in serializable mode.)

Nicolas


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
> AFAICS, detecting a "rw" dependency where the read executes after
> the write is rather easy: the writer has created a row version
> that is not visible to the reader's snapshot. Therefore, any time
> a reader reads a non-last version of a row, there is a rw
> dependency between it and the creators of any newer versions.
That *seems* safe on the face of it, but with HOT and such I wanted
to defer addressing that until I had the crude version working.  I'm
probably being a bit paranoid, though.  [thinks...]  I should
probably try to satisfy myself that this is indeed safe before I
start populating the new locks.  Basically, I have to confirm that
the read will see *all* new versions of a row without jumping out
early on any code path.  If that pans out, it'll save some work;
it's not something which should wait until the optimization phase if
we can help it.  Thanks.
> Btw, rw dependencies are the only thing that needs to be checked
> under SI, as "wr" and "ww" dependencies never lead to problems
Agreed; those are already covered by the underlying snapshot
isolation.
> I assume here that PG's non-SI-compatible behavior of not always
> rolling back any transaction that writes to a non-last version will
> be disabled in serializable mode.
Can that currently happen in PostgreSQL's snapshot isolation?!?  I
thought that was a READ COMMITTED issue.  If I've missed something
there, I need to understand what.  Anybody?
Thanks for your posts, by the way -- you clearly have a solid grasp
of the issues.  I've been paying attention; I had just never felt
any of your posts needed any response from me before.  :-)
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Again, a somewhat tardy response to a question I found while reviewing
the thread.

Markus Wanner <markus@bluegap.ch> wrote:
>> I suppose these more persistent write locks should
>> be kept out of the DEFAULT lock method, too....
> 
> I fail to understand that part. What's the DEFAULT lock method?
With some adjustment of line wrapping for email...
From src/include/storage/lock.h:
------------------------------------------------------------
/*
 * Lock methods are identified by LOCKMETHODID.  (Despite the
 * declaration as uint16, we are constrained to 256 lock methods by
 * the layout of LOCKTAG.)
 */

typedef uint16 LOCKMETHODID;

/* These identify the known lock methods */
#define DEFAULT_LOCKMETHOD    1
#define USER_LOCKMETHOD        2
------------------------------------------------------------
From the src/backend/storage/lmgr/README:
------------------------------------------------------------
Lock Data Structures
--------------------

Lock methods describe the overall locking behavior.  Currently there
are two lock methods: DEFAULT and USER.

Lock modes describe the type of the lock (read/write or shared/
exclusive).  In principle, each lock method can have its own set of
lock modes with different conflict rules, but currently DEFAULT and
USER methods use identical lock mode sets.  See
src/tools/backend/index.html and src/include/storage/lock.h for more
details.  (Lock modes are also called lock types in some places in
the code and documentation.)
------------------------------------------------------------
At least the initial implementation of the in-memory locks to
support detection of rw-dependencies will use the structures defined
in the lock.c file.
-Kevin


Re: git help

From
Greg Smith
Date:
Kevin Grittner wrote:
> What would the normal workflows be to:
>  
> rebase the postgresql.git clone and the serializable branch on the
> server?
>  
> rebase my working copy on my desktop from the serializable branch on
> the server?
>   

The way you asked this suggests you've chewed on someone's advanced 
workflow and swallowed some wrong bits.  Rebase is an operation you do 
when you're trying to be picky about what and how you publish your 
changes to the world.  Let's say you've updated your local branch:

serialized -> commit A -> commit B

Meanwhile, the upstream master has gotten a bunch of new commits made to it:

master -> commit C -> commit D

If you then merge those changes in, you'll have the following history:

serialized -> commit A -> commit B -> commit C -> commit D

Now, what happens if you want to submit a patch showing your changes?  
This isn't really what you want, is it?  Ideally, you'd want to see your 
changes--A,B--on top of the updated master instead.  Basically, throw 
these new C,D changes into history behind yours.  That's what rebase 
allows you to do.
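
Concretely, with the branch names above, and assuming your local master
already has C and D, the rebase itself is just:

git checkout serialized
git rebase master    (replays A and B on top of C and D)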

On a normal day, you won't be doing that at all.  You'll be merging from 
the various sources and pushing to others.  Rebasing is something you do 
before publishing your changes to the world, or as part of repo history 
cleanup.  It's not something you have to do all the time.

Also:  rebase can allow you to squash A&B into one commit before you 
publish them, too, to make the commit log easier to read.  Did a bunch 
of dumb stuff on your local copy?  Squash them into a single, final 
version before sharing those with the world.  Again, not necessarily a 
routine operation you have to do regularly.

You could easily commit happily and usefully share your work without 
running rebase once.  However, note that once you do push something out, 
and people grab it, you've lost the opportunity to do some rebase 
cleanup without making life difficult for them--you can't wipe out 
commit log history once it's in somebody else's repo or your copy is 
going to diverge from theirs, making a future merge messy for one of 
you.  This is why some people compulsively rebase everything before they 
publish.  You shouldn't assume you have to do that from day one though.  
On this project, if your code gets committed one day, it's not going to 
be with the commit history intact anyway, so it really doesn't matter at 
the end.

-- 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com



Re: Serializable Isolation without blocking

From
Markus Wanner
Date:
Hi,

Kevin Grittner wrote:
> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>> AFAICS, detecting a "rw" dependency where the read executes after
>> the write is rather easy: the writer has created a row version
>> that is not visible to the reader's snapshot. Therefore, any time
>> a reader reads a non-last version of a row, there is a rw
>> dependency between it and the creators of any newer versions.

Sure.

As you have written below, we still need the SIREAD lock on the tuple
read to be able to detect rw dependencies (where the read executes
before the write).

I was trying to point out the difference between rw dependencies on
existing tuples (where the write is an update/delete) vs those on newly
created tuples (inserts) that *would* have been read. The latter clearly
needs predicate locking. For the former, basing it on table- as well as
row-level locking has been discussed so far.

What I'm thinking about is if it would make sense to use the very same
locking infrastructure for both, which needs to be some kind of
predicate locking. Maybe that was intended by Kevin and already clear to
others. It wasn't for me.

As we don't yet have predicate locking or tagging, let's check what SSI
needs from that building block. AFAICT we currently have the following
(long term) requirements:

A) reference rows that don't exist, yet
B) keep the lock (tag) until after the session that created it has
expired (yes, session, not only transaction).
C) fit in (shared) memory, no matter how many rows get tagged (thus
maybe use dynamic adjustment of granularity)

AFAICT using the existing locking structure will get complicated mainly
because of B), since we sometimes lack a PGPROC for the transaction
holding the lock. That requirement is specific to SSI (not something a
general-purpose predicate locking facility would need).

A) and C) might be usable for a general purpose predicate locking
mechanism as well, but those requirements make it hard to map the object
to be locked to a LOCKTAG.

Unlike a general purpose predicate locking implementation, we don't need
to block on SIREAD locks, so we don't need waiting queues nor deadlock
detection for SSI.

> Basically, I have to confirm that
> the read will see *all* new versions of a row without jumping out
> early on any code path.

Well, a heap scan, for example, immediately returns each visible row
version to the caller, and the caller may or may not continue the scan.
So I fear that yes, current code paths jump out early quite often
(that's an optimization the LIMIT case relies on, among others).
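
For instance, something as simple as

  SELECT * FROM t WHERE flag LIMIT 1;

(made-up table) stops pulling rows from the scan as soon as one
qualifying row has been returned; everything behind it in scan order is
never visited at all.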

>> I assume here that PG's non-SI-compatible behavior of not always
>> rollbacking any transaction that writes to a non-last version will
>> be disabled in serializable mode.
>  
> Can that currently happen in PostgreSQL's snapshot isolation?!?  I
> thought that was a READ COMMITTED issue.  If I've missed something
> there, I need to understand what.  Anybody?

A write to a non-last (or non-head) version would lead to having
multiple "last" (or rather multiple head) versions, which is not
something that's ever supposed to happen in Postgres, IIRC.

Regards

Markus Wanner



Re: Serializable Isolation without blocking

From
Nicolas Barbier
Date:
2010/1/8 Kevin Grittner <Kevin.Grittner@wicourts.gov>:

> Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
>
>> I assume here that PG's non-SI-compatible behavior of not always
>> rollbacking any transaction that writes to a non-last version will
>> be disabled in serializable mode.
>
> Can that currently happen in PostgreSQL's snapshot isolation?!?  I
> thought that was a READ COMMITTED issue.  If I've missed something
> there, I need to understand what.  Anybody?

Ah woops, you are right. The manual says in [1]:

---- >8 ----
13.2.2. Serializable Isolation Level

[..]

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the transaction
start time. However, such a target row might have already been updated
(or deleted or locked) by another concurrent transaction by the time
it is found. In this case, the serializable transaction will wait for
the first updating transaction to commit or roll back (if it is still
in progress). If the first updater rolls back, then its effects are
negated and the serializable transaction can proceed with updating the
originally found row. But if the first updater commits (and actually
updated or deleted the row, not just locked it) then the serializable
transaction will be rolled back with the message

ERROR:  could not serialize access due to concurrent update

because a serializable transaction cannot modify or lock rows changed
by other transactions after the serializable transaction began.
---- 8< ----
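
For the archives: two concurrent serializable sessions updating the
same row show exactly that behavior (table and values are made up):

  -- session 1
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;

  -- session 2
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE accounts SET balance = balance + 100 WHERE id = 1;  -- blocks

  -- session 1
  COMMIT;

  -- session 2 now fails with:
  ERROR:  could not serialize access due to concurrent update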

Sorry for the noise,

Nicolas

[1] <URL:http://www.postgresql.org/docs/8.4/static/transaction-iso.html#XACT-SERIALIZABLE>


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:

> Nope, you're on target.  Although - if I were you - I would post
> the ACCESS EXCLUSIVE lock version of the patch for feedback.  I
> can't speak for anyone else, but I'll read it.

Here you go!  :-)

This is the milestone of having full serializable behavior, albeit
with horrible performance, using the simplest implementation
possible.  I didn't use ACCESS EXCLUSIVE locks, because on review it
seemed to me that a SHARE lock would be strong enough.  It compiles
and passes the regression tests, and I've been testing some of the
scenarios previously used to show the snapshot anomalies; I now get
correct behavior through blocking.
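
For example, a test along these lines (made-up table) now blocks
instead of exhibiting the anomaly, because the SHARE lock taken for
session 1's read conflicts with the ROW EXCLUSIVE lock session 2's
UPDATE needs:

  -- session 1
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM receipts WHERE batch_id = 41;

  -- session 2
  UPDATE receipts SET amount = amount + 1 WHERE id = 7;
  -- waits here until session 1 commits or rolls back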

I identified the points to insert predicate locking by looking for
places where ExecStoreTuple was called with a valid heap buffer; if
there is anywhere that obtains tuples from the heap without going
through that method, I have more work to do.  If anyone knows of
such locations, I'd be grateful for a "heads up".

If I've done anything horribly wrong in organizing the code, that'd
be nice to hear about before I go too much farther, too.

I'm definitely not looking for this to be committed, but should I
add it to the CF page just for a "feedback" review?  (I'm OK with
keeping it more ad hoc, especially if it's going to hold up the
beta at all.)

-Kevin

Attachment

Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Here you go!  :-)
Hmmm...   I just got a write skew example to show a snapshot
isolation anomaly, so I've got something wrong still.  :-(
Will continue to work on it.  Sorry.
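
For reference, the test is the classic write skew pattern, which the
patch should be turning into blocking but isn't yet (schema is just an
illustration):

  -- setup: two doctors on call; the rule is at least one on duty
  CREATE TABLE oncall (doctor text PRIMARY KEY, on_duty boolean);
  INSERT INTO oncall VALUES ('alice', true), ('bob', true);

  -- session 1
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM oncall WHERE on_duty;   -- sees 2
  UPDATE oncall SET on_duty = false WHERE doctor = 'alice';

  -- session 2
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM oncall WHERE on_duty;   -- also sees 2
  UPDATE oncall SET on_duty = false WHERE doctor = 'bob';

  -- both commit; nobody is left on duty, although each
  -- transaction alone preserved the rule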
-Kevin


Re: Serializable Isolation without blocking

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> This is the milestone of having full serializable behavior, albeit
> with horrible performance, using the simplest implementation
> possible.
A tad too simple, as it turns out.  It did its main job of
providing a simple milestone to identify code organization and lock
points, but I'd have to jigger some things to make S2PL work with
snapshot isolation that aren't needed for SSI.  So, for those
keeping score, I'm moving on down the checklist to get to the SSI
implementation, rather than futzing with code which would just be
thrown away.
No need for anyone to test for full serializable behavior.
Comments on technique welcome.
-Kevin