Re: SSI patch version 14 - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: SSI patch version 14 |
Date | |
Msg-id | 1295946459.11513.338.camel@jdavis Whole thread Raw |
In response to | SSI patch version 14 ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses |
Re: SSI patch version 14
Re: SSI patch version 14 |
List | pgsql-hackers |
On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote: > Attached. > > Dan and I have spent many, many hours desk-check and testing, > including pounding for many hours in DBT-2 at high concurrency > factors on a 16 processor box. In doing so, we found and fixed a few > issues. Neither of us is aware of any bugs or deficiencies > remaining, even after a solid day of pounding in DBT-2, other than > the failure to extend any new functionality to hot standby. > > At Heikki's suggestion I have included a test to throw an error on an > attempt to switch to serializable mode during recovery. Anything > more to address that issue can be a small follow-up patch -- probably > mainly a few notes in the docs. Here is my first crack at a review: First of all, I am very impressed with the patch. I was pleased to see that both READ ONLY DEFERRABLE and 2PC work! Also, I found the patch very usable and did not encounter any bugs or big surprises. Second, I do not understand this patch entirely, so the following statements can be considered questions as much as answers. At a high level, there is a nice conceptual simplicity. Let me try to summarize it as I understand it: * RW dependencies are detected using predicate locking. * RW dependencies are tracked fromthe reading transaction (as an "out") conflict; and from the writing transaction (as an "in" conflict). * Beforecommitting a transaction, then by looking only at the RW dependencies (and predicate locks) for current and past transactions, you can determine if committing this transaction will result in a cycle (and therefore a serializationanomaly); and if so, abort it. That's where the simplicity ends, however ;) For starters, the above structures require unlimited memory, while we have fixed amounts of memory. The predicate locks are referenced by a global shared hash table as well as per-process SHMQueues. It can adapt memory usage downward in three ways: * increase lock granularity -- e.g. change N page locks into a table lock * "summarization"-- fold multiple locks on the same object across many old committed transactions into a single lock * usethe SLRU Tracking of RW conflicts of current and past transactions is more complex. Obviously, it doesn't keep _all_ past transactions, but only ones that overlap with a currently-running transaction. It does all of this management using SHMQueue. There isn't much of an attempt to gracefully handle OOM here as far as I can tell, it just throws an error if there's not enough room to track a new transaction (which is reasonable, considering that it should be quite rare and can be mitigated by increasing max_connections). The heavy use of SHMQueue is quite reasonable, but for some reason I find the API very difficult to read. I think it would help (at least me) quite a lot to annotate (i.e. with a comment in the struct) the various lists and links with the types of data being held. The actual checking of conflicts isn't quite so simple, either, because we want to detect problems before the victim transaction has done too much work. So, checking is done along the way in several places, and it's a little difficult to follow exactly how those smaller checks add up to the overall serialization-anomaly check (the third point in my simple model). There are also optimizations around transactions declared READ ONLY. Some of these are a little difficult to follow as well, and I haven't followed them all. There is READ ONLY DEFERRABLE, which is a nice feature that waits for a "safe" snapshot, so that the transaction will never be aborted. Now, on to my code comments (non-exhaustive): * I see that you use a union as well as using "outLinks" to also be a free list. I suppose you did this to conserve shared memory space, right? * In RegisterSerializableTransactionInt(), for a RO xact, it considers any concurrent RW xact a possible conflict. It seems like it would be possible to know whether a RW transaction may have overlapped with any committed RW transaction (in finishedLink queue), and only add those as potential conflicts. Would that work? If so, that would make more snapshots safe. * When a transaction finishes, then PID should probably be set to zero. You only use it for waking up a deferrable RO xact waiting for a snapshot, right? * Still some compiler warnings: twophase.c: In function ‘FinishPreparedTransaction’: twophase.c:1360: warning: implicit declaration of function ‘PredicateLockTwoPhaseFinish’ * You have a function called CreatePredTran. We are already using "Transaction" and "Xact" -- we probably don't need "Tran" as well. * HS error has a typo: "ERROR: cannot set transaction isolationn level to serializable during recovery" * I'm a little unclear on summarization and writing to the SLRU. I don't see where it's determining that the predicate locks can be safely released. Couldn't the oldest transaction still have relevant predicate locks? * In RegisterSerializableTransactionInt, if it doesn't get an sxact, it goes into summarization. But summarization assumes that it has at least one finished xact. Is that always true? If you have enough memory to hold a transaction for each connection, plus max_prepared_xacts, plus one, I think that's true. But maybe that could be made more clear? I'll keep working on this patch. I hope I can be of some help getting this committed, because I'm looking forward to this feature. And I certainly think that you and Dan have applied the amount of planning, documentation, and careful implementation necessary for a feature like this. Regards,Jeff Davis
pgsql-hackers by date: