Re: Serializable snapshot isolation patch - Mailing list pgsql-hackers

From: Kevin Grittner
Subject: Re: Serializable snapshot isolation patch
Msg-id: 4CBC4AEF0200002500036AB2@gw.wicourts.gov
In response to: Serializable snapshot isolation patch (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Serializable snapshot isolation patch
List: pgsql-hackers
First off, thanks for the review! I know that it's a lot of work, and I do appreciate it.

Jeff Davis <pgsql@j-davis.com> wrote:

> * Trivial stuff:
>
> I get a compiler warning:
>
> indexfsm.c: In function 'RecordFreeIndexPage':
> indexfsm.c:55: warning: implicit declaration of function
> 'PageIsPredicateLocked'

Oops. Fixed on my git repo now.

> * Open issues, as I see it:
>
> 1. 2PC and SSI don't mix (this may be a known issue, because
> there's not really any code in the current implementation to deal
> with 2PC):
>
> Looks like we need to track information about prepared
> transactions in shared memory. I think you'll need to keep the
> information in the 2PC state file as well, so that it can be
> rebuilt after a crash or restart. It all looks solvable at first
> glance, but it looks like it might be some work.

Argh! I didn't anticipate an interaction with prepared transactions, probably because I've never used them or taken a close look at them. I'll take a look.

> 2. I think there's a GiST bug (illustrating with PERIOD type):
>
> Can you clarify your design for GiST and the interaction with
> page-level locks? It looks like you're making some assumption
> about which pages will be visited when searching for conflicting
> values which doesn't hold true. However, that seems odd, because
> even if the value is actually inserted in one transaction, the
> other doesn't seem to find the conflict. Perhaps the bug is
> simpler than that? Or perhaps I have some kind of odd bug in
> PERIOD's gist implementation?

My assumptions for GiST were that:

(1) A search for a matching value could bail out at any level in the tree; there is no requirement for the search to proceed to the leaf level to get a negative index probe.

(2) An index insertion which would cause a row to be found if an earlier search was repeated must modify some page which was read by the earlier search.
(3) Because of the above, it is necessary and sufficient to acquire an SIRead lock on all pages visited in a search of a GiST index, and to flag a conflict on insertion into a locked page at any level of the index.

That logic still seems sound to me, so if someone else sees a flaw in it, please point it out. Assuming that logic is sound, I'll poke around to see where the flaw in the implementation may be. If you have a full self-contained test case to demonstrate the failure here, could you send it to me?

> Also, it appears to be non-deterministic, to a degree at least, so
> you may not observe the problem in the exact way that I do.

Yeah, I have tested this area without seeing the failure, so that self-contained example would be a big help.

> 3. Limited shared memory space to hold information about committed
> transactions that are still "interesting". Relevant thread:
>
> http://archives.postgresql.org/pgsql-hackers/2010-09/msg01735.php
>
> It's a challenging problem, however, and the current solution is
> less than ideal.

I'd go further than that and say that it clearly needs to be fixed. The scale of the issue was somewhat disguised in my testing when I was using the hash table for managing acquisition of shared memory for what was essentially an unordered list. Now that I've made it a proper list with a hard limit on entries, the problem is pretty easy to provoke, and just reserving space for A Very Large Number Of Entries is clearly not an adequate solution. :-(

> Idle transactions can mean that all new serializable transactions
> fail until the idle transactions start to terminate. I don't like
> that very much, because people expect to have to retry
> serializable transactions, but retrying here has no real hope
> (except that some time has elapsed, and maybe the other
> transaction decided to commit).

Agreed.

> Does it make sense to kill the existing transactions that are
> holding everything up, rather than the new transaction?
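To illustrate assumptions (1)-(3) above, here is a toy model of the page-lock bookkeeping. This is illustrative Python, not code from the patch; the two-level "index", the fixed key-routing rule, and all names are invented for the sketch:

```python
class ToyGistIndex:
    """Toy two-level 'index': root page 0 routes keys to leaf pages 1 and 2.

    Models only the SIRead-lock argument: searches lock every page they
    visit, and an insert reports a rw-conflict if it modifies a page that
    some other transaction has SIRead-locked.
    """

    def __init__(self):
        self.pages = {0: [1, 2], 1: [], 2: []}  # leaf pages hold keys
        self.siread_locks = {}  # page number -> set of xids holding SIRead locks

    def _leaf_for(self, key):
        # Invented routing rule standing in for GiST's consistent() descent.
        return 1 if key < 100 else 2

    def search(self, xid, key):
        # Assumption (3): SIRead-lock every page visited, at every level.
        # (Assumption (1): a real search might bail out above the leaf;
        # the pages it did visit would still be locked.)
        visited = [0, self._leaf_for(key)]
        for page in visited:
            self.siread_locks.setdefault(page, set()).add(xid)
        return key in self.pages[visited[-1]]

    def insert(self, xid, key):
        # Assumption (2): an insert that would change an earlier search's
        # result must modify some page that search read -- so checking the
        # modified pages against the SIRead locks is sufficient.
        leaf = self._leaf_for(key)
        modified = {leaf}  # a page split would add internal pages here
        conflicts = {other
                     for page in modified
                     for other in self.siread_locks.get(page, set())
                     if other != xid}
        self.pages[leaf].append(key)
        return conflicts
```

In this model, a search by T1 that misses still leaves SIRead locks on the pages it visited, so T2's later insert of the same key is forced to touch one of those locked pages and the rw-conflict is detected, which is exactly the "necessary and sufficient" claim in (3).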
> Or would that just confuse matters more? This does not necessarily
> guarantee that progress can be made, either, but intuitively it
> seems more likely.

Canceling an old transaction to allow new transactions to begin *might* be better than the current situation, but I think we can and should do better. The best I have been able to come up with is in this post:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg01724.php

There's a fair amount of work there, so I was hoping for some confirmation that this combination of steps was both sane and had some chance of being considered sufficient and acceptable before diving in to code it. I also *really* hope to add the "SERIALIZABLE READ ONLY DEFERRABLE" mode so that pg_dump and other read-only transactions don't push things into a state where the rollback rate spikes:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg01771.php

> 4. A few final details:
>
> a. We should probably have a compatibility GUC that makes
> SERIALIZABLE equal to REPEATABLE READ. My opinion is that this
> should be only for compatibility, and should default to "off"
> (i.e. SSI code enabled) either in 9.1 or soon after.

I'm inclined to agree with you, with a strong preference for a 9.1 setting of "off". This was previously discussed, and there were people who felt that we should avoid a "behavior-changing" GUC like that, so I didn't add it. It wouldn't be hard to put it in, if that's the community consensus.

> b. Docs.
>
> * Questions:
>
> 1. For TargetTagIsCoveredBy(), why is it impossible for the
> covering tag to have an offset?

Because a "covering" lock is a lock at a coarser granularity than the "covered" lock, which includes what the finer-grained lock covers. Since the tuple level is as fine as we take it, and we only use the offset for tuple locks, a covering lock would not have that set. (We also check for duplicate locks.) I'll expand that comment a bit, since it obviously wasn't clear enough.

> 2.
> The explanation for SerializablePredicateLockListLock is a
> little confusing. It makes it sound like other processes can't
> walk the list, but they can modify it?

That's a bit unconventional, I admit, but it was a big part of bringing the benchmarks for a fully cached, read-only transaction load down to running just 1.8% longer than REPEATABLE READ (which is the same as the current SERIALIZABLE). I'm open to other locking strategies, or to an explanation of where this one doesn't work, but I'd want to benchmark carefully. Since you took it to mean the right thing, but found it surprising enough to doubt that it was accurate, do you have any particular suggestions about how to clarify it? (Maybe just, "Yeah, that's unconventional, but it works and is faster than the alternatives we've tried."?)

> * Summary
>
> Great patch!

Thanks!

> I didn't make it through the patch in as much detail as I would
> have liked, because the theory behind it is quite complex and it
> will take longer for me to absorb.

Yeah, there are some mind-bending ideas in there. If I recall correctly, Dan said he spent one week reading the academic papers on which it was based and another week reading the patch before he felt comfortable starting to code the parts he wrote. I spent a lot more time than that reading papers, and I still didn't quite have my head around some of the concepts until I went back to some of Fekete's work which led up to this innovation. While many of the concepts go back to the '80s and '90s, I found that Fekete's work "clicked" with me better than some of the earlier, more theoretical work, because he was looking at real-world examples of how these problems manifest in actual, working production software, and at what techniques could be used to identify and mitigate them. Sometimes the theory is hard for me to properly digest without such concrete details.

> But the implementation looks good and the use case is very
> important.
It certainly is for our shop, and I've heard from others who are very interested in it. I think it's generally of more benefit to "big shops" than to situations where you have just a few programmers coding against a schema with just a few dozen tables, although it's also somewhat a matter of style: I would rather deal with *one* issue (serialization failures) in one place than scatter defensive code all over my application codebase, even on a moderate scale.

Anyway, given the issues raised and the date, it seems reasonable to mark this as "Returned with Feedback" for CF management purposes. I'll get fixes out as soon as I can.

-Kevin