Thread: SSI memory mitigation & false positive degradation
To recap, I've had an open question on the Serializable Wiki page[1]
since January about how we should handle long-running transactions.
The algorithm published by Cahill et al requires keeping some
transaction information in memory for all committed transactions
which overlapped a still-running transaction. Since we need to keep
this in shared memory, and the structures must have a finite
allocation, there's an obvious looming limit, even if the allocation
is relatively generous.
The two obvious responses to resource exhaustion are to roll back the
oldest active serializable transaction or to block or cancel new
serializable transactions. Neither is particularly appealing. In
September and October Heikki was kind enough to share some insights
and ideas about alternatives. I've done what I can to pursue this,
but my availability to follow up on it has been limited until now. I
have blocked out a lot of time between now and the middle of January
to try to resolve the issue, and Dan has been able to devote time to
this lately as well.
I've been pursuing the ideas sketched out in a prior post[2] as a way
to work into the approach suggested by Heikki, which amounts to
summarizing old data (at the risk of some false positive
serialization failures) and accessing old data through the SLRU API
(and suffering some performance hit if that data spills to disk).
Pursuing mitigation through more aggressive cleanup seemed important
to me both to reduce the performance impact of the graceful
degradation approach, and to better understand which data are needed
when.
Dan and I have now implemented most of the mitigation techniques
described in [2], and I now feel confident I have a good grasp of how
long each type of data is useful. (By useful I mean that to maintain
data integrity without them it will be necessary to roll back some
transactions which could have been allowed to commit had the data
been available.) I'll be adding this to the Wiki page, but below are
the results, as I understand them. I have yet to define exactly what
we will drop at which point, but that should be coming within a few
days.
Read only transactions have only one real need to track committed
data, but it is the most complex to state:
(1) An active read only transaction needs to be able to recognize
when it is reading a tuple which was written by an overlapping
transaction which has committed, but only if that read write
transaction has a rw-conflict out to a transaction committed before
the read only transaction acquired its snapshot.
A read only transaction which detects such a rw-conflict out must
fail with a serialization conflict. The multiple conditions required
for this, however, have an up-side -- they allow us to detect when a
read only transaction no longer has any possibility of creating such
a conflict, and therefore to remove it entirely from predicate
locking and conflict detection. This is true whether the read only
transaction is committed, active, or the new DEFERRABLE type of
transaction, which will wait for these conditions before it starts.
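To make that rule concrete, here is a rough sketch in C of the check
it implies. The struct, field, and function names are invented for
illustration and are not the names the code will necessarily use; the
point is just that the only per-writer state needed is the commit
sequence number of its earliest rw-conflict out.

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical summary of a committed writer which overlapped the
 * reader.  Zero means the writer had no rw-conflict out.
 */
typedef struct CommittedWriterSummary
{
    uint64_t earliestOutConflictCommitSeqNo;
} CommittedWriterSummary;

/*
 * Rule (1): a read only transaction reading a tuple written by an
 * overlapping, committed transaction must fail only if that writer
 * has a rw-conflict out to a transaction which committed before the
 * reader acquired its snapshot.
 */
static bool
ReadOnlyMustFail(const CommittedWriterSummary *writer,
                 uint64_t readerSnapshotSeqNo)
{
    return writer->earliestOutConflictCommitSeqNo != 0 &&
           writer->earliestOutConflictCommitSeqNo < readerSnapshotSeqNo;
}

When this returns true the reader is rolled back with a serialization
failure; once no overlapping writer can still develop such a
conflict, the reader can be dropped from predicate locking and
conflict detection entirely, as described above.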
I've found four statements about what committed data are useful for
read write transactions:
(2) An active read write transaction needs to be able to recognize
when it is reading a tuple which was written by an overlapping
transaction which has committed, and to know whether that committed
transaction had any rw-conflict(s) out to previously committed
transaction(s). This is rather similar to (1).
(3) An active read write transaction needs to be able to detect when
one of its writes conflicts with a predicate lock from an overlapping
transaction which has committed. There's no need to know which one,
but by the definition of a rw-conflict, it must have overlapped.
(4) An active read write transaction needs to know that it had a
rw-conflict out to a committed transaction. There's no need to know
which one, but by the definition of a rw-conflict, it must have
overlapped.
(5) An active read write transaction needs to know that it had a
rw-conflict in from a committed transaction. There's no need to know
which one, but by the definition of a rw-conflict, it must have
overlapped.
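Points (4) and (5) suggest that, once the identity of the conflicting
committed transaction no longer matters, a pair of per-transaction
flag bits is enough, at the price of some false positives. A minimal
sketch of what I mean, with invented flag and struct names (not
necessarily what the code will end up using):

#include <stdbool.h>
#include <stdint.h>

/* Invented flag bits; not the actual names in the patch. */
#define SKETCH_FLAG_CONFLICT_OUT 0x01   /* (4) */
#define SKETCH_FLAG_CONFLICT_IN  0x02   /* (5) */

typedef struct SketchXact
{
    uint32_t flags;
} SketchXact;

/*
 * Record a rw-conflict with a committed transaction whose detailed
 * conflict information has been (or is about to be) discarded.
 */
static void
RecordSummarizedConflict(SketchXact *active, bool conflictOut)
{
    active->flags |= conflictOut ? SKETCH_FLAG_CONFLICT_OUT
                                 : SKETCH_FLAG_CONFLICT_IN;
}

/*
 * With the details gone, having both a summarized conflict in and a
 * summarized conflict out is treated as a dangerous structure and
 * the active transaction is rolled back.
 */
static bool
SummarizedDangerousStructure(const SketchXact *sxact)
{
    return (sxact->flags & SKETCH_FLAG_CONFLICT_OUT) != 0 &&
           (sxact->flags & SKETCH_FLAG_CONFLICT_IN) != 0;
}

Treating the mere presence of both flags as a dangerous structure is
deliberately conservative; that is where the false positive
serialization failures mentioned above would come from.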
Since I know some people prefer to look at a patch rather than poke
around someone's git repo, I'm attaching a patch reflecting what's in
the repo at the moment. Consider it WIP. While I hope to post a
patch as a serious candidate for the release within a couple of
weeks, I would sure welcome any feedback before then, so that the
candidate patch is in the best shape possible. (To avoid confusion
with Florian's patch, I'm using ssi instead of serializable in file
names and subject lines.)
-Kevin
[1] http://wiki.postgresql.org/wiki/Serializable
[2] http://archives.postgresql.org/pgsql-hackers/2010-10/msg01754.php
I wrote:

> Dan and I have now implemented most of the mitigation techniques
> ..., and I now feel confident I have a good grasp of how long each
> type of data is useful. (By useful I mean that to maintain data
> integrity without them it will be necessary to roll back some
> transactions which could have been allowed to commit had the data
> been available.)

I think that we need to be able to keep something on the order of 10
times the max_connections number of SERIALIZABLEXACT structures in
shared memory for mitigation techniques to have a chance to work
well for most workloads. When that fills, we will start pushing the
oldest committed transactions out to make room, and fall back on the
graceful degradation. Heikki said in a previous post that he didn't
care if it was 10 times or 100 times, so long as it was finite and
there was graceful degradation after that, so it would appear that
unless someone else objects, this should fly. This structure fits
(barely) into 128 bytes when pointers are 64 bits.

> (1) An active read only transaction needs to be able to recognize
> when it is reading a tuple which was written by an overlapping
> transaction which has committed, but only if that read write
> transaction has a rw-conflict out to a transaction committed
> before the read only transaction acquired its snapshot.
>
> (2) An active read write transaction needs to be able to
> recognize when it is reading a tuple which was written by an
> overlapping transaction which has committed, and to know whether
> that committed transaction had any rw-conflict(s) out to
> previously committed transaction(s).

When committed transactions which have written to permanent tables
need to be pushed from the main structures, I think that keeping the
xid and the 64-bit commit seq no of the earliest rw-conflict out is
needed. Zero would mean no conflict. Such a list could address
these two needs. We could keep track of min and max xid values on
the list to avoid searches for values out of range. This seems like
a reasonable fit for the SLRU technique suggested by Heikki. Read
only transactions (declared or de facto) don't need to be included
in this list.

> (3) An active read write transaction needs to be able to detect
> when one of its writes conflicts with a predicate lock from an
> overlapping transaction which has committed. There's no need to
> know which one, but by the definition of a rw-conflict, it must
> have overlapped.

I think the cleanest way to handle this need is to have a "dummy"
SERIALIZABLEXACT structure sitting around to represent displaced
committed transactions and to move predicate locks to that as
transactions are pushed out of the primary structures. We would add
a commit seq no field to the predicate lock structure which would
only be used when locks were moved here. Duplicate locks (locks on
the same target) would collapse to a single lock and would use the
latest commit seq no. This is conceptually very similar to Heikki's
initial suggestion on this topic.
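To illustrate the summarization pieces above: the record kept for
each pushed-out transaction can be tiny, and the min/max xid check
keeps most lookups away from the (possibly on-disk) data. The names
below are invented for the sketch, not taken from the patch, and a
real version would need wraparound-aware xid comparisons. For
scale: 10 entries per connection at 128 bytes each, with the default
max_connections of 100, is about 128 kB of shared memory for the
main structures.

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;    /* stand-in for the real type */

/*
 * Hypothetical summary record, keyed by xid (e.g. in an SLRU page),
 * for a committed read write transaction pushed out of the main
 * shared memory structures.  Zero means "no rw-conflict out".
 */
typedef struct PushedOutXact
{
    TransactionId xid;
    uint64_t      earliestOutConflictCommitSeqNo;
} PushedOutXact;

/*
 * Min/max xids present in the summary, so that most lookups can
 * skip touching the summarized data at all.
 */
typedef struct PushedOutRange
{
    bool          isEmpty;
    TransactionId minXid;
    TransactionId maxXid;
} PushedOutRange;

static bool
MightBePushedOut(const PushedOutRange *range, TransactionId xid)
{
    /* Plain comparisons keep the sketch short; a real version must
     * use wraparound-aware xid comparisons. */
    return !range->isEmpty && xid >= range->minXid && xid <= range->maxXid;
}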
> (4) An active read write transaction needs to know that it had a
> rw-conflict out to a committed transaction. There's no need to
> know which one, but by the definition of a rw-conflict, it must
> have overlapped.
>
> (5) An active read write transaction needs to know that it had a
> rw-conflict in from a committed transaction. There's no need to
> know which one, but by the definition of a rw-conflict, it must
> have overlapped.

These two are easy -- we can define a couple more flag bits for
active transactions to check when determining whether a new
rw-conflict has created a dangerous structure which must be rolled
back.

Any comments welcome. Barring surprises, I start coding on this
tomorrow.

-Kevin
On 26.12.2010 21:40, Kevin Grittner wrote:
> To recap, I've had an open question on the Serializable Wiki page[1]
> since January about how we should handle long-running transactions.
> The algorithm published by Cahill et al requires keeping some
> transaction information in memory for all committed transactions
> which overlapped a still-running transaction. Since we need to keep
> this in shared memory, and the structures must have a finite
> allocation, there's an obvious looming limit, even if the allocation
> is relatively generous.

Looking at the predicate lock splitting, it occurs to me that it's
possible for a non-serializable transaction to be canceled if it
needs to split a predicate lock held by a concurrent serializable
transaction, and you run out of space in the shared memory predicate
lock area. Any chance of upgrading the lock to a relation lock, or
killing the serializable transaction instead?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Looking at the predicate lock splitting, it occurs to me that
> it's possible for a non-serializable transaction to be canceled if
> it needs to split a predicate lock held by a concurrent
> serializable transaction, and you run out of space in the shared
> memory predicate lock area.

Good point. We don't want that, for sure.

> Any chance of upgrading the lock to a relation lock, or killing
> the serializable transaction instead?

Absolutely. Good suggestion. Thanks!

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: >> Any chance of upgrading the lock to a relation lock, or killing >> the serializable transaction instead? > > Absolutely. Good suggestion. Thanks! I pushed a TODO SSI comment at the appropriate point with my ideas on how best to fix this. I want to stick with the SLRU changes for now, rather than risk flushing "brain cache" on the topic just now. If Dan (or anyone else, for that matter) wants to fix this, feel free; just post first, as will I if nobody beats me to it. There are actually two spots in PredicateLockPageSplit and one in PredicateLockPageCombine where this needs to be addressed. I can't think of any other functions where we're vulnerable to having an impact on non-serializable transactions. We sure want to plug those -- I see it as critical to acceptance that we can honor the promise of no impact on any transactions at REPEATABLE READ or less strict isolation levels. -Kevin