Thread: SSI patch(es)
I just finished implementing the SLRU techniques to limit shared memory usage and provide graceful degradation under pessimal loads (as suggested by Heikki), Dan seems to be wrapping up work on preventing non-serializable transactions from being rolled back with a serialization failure if they split a predicate-locked page at the point were we're running out of space to allocate predicate locks (as suggested by Heikki), and John's working on documentation. We've recently committed documentation for new GUCs, modified statements, and the new switch on pg_dump. The main things I see that we still need in documentation are a README.SSI file and some serious work in mvcc.sgml. I'm going through the old emails to see what issues people may have raised that might need to be addressed; besides making the AMs for GIN, GiST, and hash SSI aware (so that they have fewer false positive rollbacks than with the default handling), are there any issues people want to be sure I look at before posting a patch? Then there's the question of whether to submit it in pieces. There are going to be big chunks no matter how I slice it, but here are the ideas I have. (All numbers are for context diff format.) If I cut a patch right now for everything, it would be 7742 lines. Right now a patch of the doc/ changes would be 413 lines. If I split out the src/test/regress/ part it would be 1340 lines, mostly python code for dtester tests. If I split out just the src/bin/pg_dump/ changes it would be 98 lines. Splitting out those three would leave src/backend/ and src/include/ which comes in at a svelte 5891 lines. With a little more work I could split the three new files (predicate.c, predicate.h, and predicate_internals.h) out from the changes scattered around the rest of the code. That's 4346 lines and 1545 lines, respectively. Now, these numbers are likely to change a little in the next few days, but not much as a percentage outside the documentation. Thoughts? -Kevin
On Sat, Jan 8, 2011 at 4:10 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Splitting out those three would leave src/backend/ and src/include/ > which comes in at a svelte 5891 lines. > > With a little more work I could split the three new files > (predicate.c, predicate.h, and predicate_internals.h) out from the > changes scattered around the rest of the code. That's 4346 lines and > 1545 lines, respectively. > > Now, these numbers are likely to change a little in the next few > days, but not much as a percentage outside the documentation. > > Thoughts? Well, my first thought is - I'm not sure it's realistic to think we're going to get this committed to 9.1. But that's not a very helpful thought. I just don't know who is going to review 7700 lines of code in the next month, and it doesn't sound like it can be decomposed into independent subfeatures that can be committed independently. Splitting it up by directory isn't really all that helpful. I hope someone will step up to the plate; I'm pretty sure I can't do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > Well, my first thought is - I'm not sure it's realistic to think > we're going to get this committed to 9.1. > > But that's not a very helpful thought. I just don't know who is > going to review 7700 lines of code in the next month, and it > doesn't sound like it can be decomposed into independent > subfeatures that can be committed independently. Splitting it up by > directory isn't really all that helpful. I hope someone will step > up to the plate; I'm pretty sure I can't do it. I hope so, too. FWIW, I submitted this patch with almost 2000 fewer lines in what I hoped was a form suitable for initial commit in the 2010-09 CF, knowing full well there were a number of optimizations and improvements I would like to get in before release. But Heikki felt that it wasn't acceptable without those changes -- and for reasons which I find totally understandable. There's sort of a Catch-22 here for large features like this -- if you submit them in skeletal form they aren't accepted because we don't want code in the official repository which isn't production quality yet. But if you flesh it out to where it is production quality, then it's large enough to be hard to review. I know this isn't the first time this issue has been brought up, but I'm feeling it keenly at the moment. There are three contributors who have already been through the code for this patch in sufficient detail to help advance it -- and I'm most grateful for what they've already done. Hopefully those who have already done that won't find it too hard to digest the patch with its latest improvements, and will have the time and inclination to give it a go. One thing that would help a lot besides code review is performance testing. I did some months ago and I know Dan booked time on MIT benchmarking systems and got good numbers, but with the refactoring it would be good to redo that, and benchmarking properly can be very time consuming. Existing benchmark software might need to be tweaked to retry transactions which fail with SQLSTATE 40001, or at least continue on with out counting those in TPS figures, since applications using this feature will generally have frameworks which automatically do retries for that SQLSTATE. -Kevin
On 09.01.2011 05:06, Robert Haas wrote: > On Sat, Jan 8, 2011 at 4:10 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> Splitting out those three would leave src/backend/ and src/include/ >> which comes in at a svelte 5891 lines. >> >> With a little more work I could split the three new files >> (predicate.c, predicate.h, and predicate_internals.h) out from the >> changes scattered around the rest of the code. That's 4346 lines and >> 1545 lines, respectively. >> >> Now, these numbers are likely to change a little in the next few >> days, but not much as a percentage outside the documentation. >> >> Thoughts? > > Well, my first thought is - I'm not sure it's realistic to think we're > going to get this committed to 9.1. > > But that's not a very helpful thought. I just don't know who is going > to review 7700 lines of code in the next month, and it doesn't sound > like it can be decomposed into independent subfeatures that can be > committed independently. I'm tempted to raise my hand and volunteer, but I don't want to make promises I might not be able to keep. But I'll definitely at least take another look at it. I don't think pushing this to 9.2 helps at all. This patch has gone through several rounds of reviews already, and unless someone sees a major issue with it, it's not going to get any more ready by postponing it. And it wouldn't be fair to Kevin and others who've worked hard on it. We'll just have to slurp it in somehow. > Splitting it up by directory isn't really > all that helpful. Agreed. If it can't easily be split into increments by functionality, then we'll just have to deal with it as on big patch. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, Jan 08, 2011 at 10:20:22PM -0600, Kevin Grittner wrote: > One thing that would help a lot besides code review is performance > testing. I did some months ago and I know Dan booked time on MIT > benchmarking systems and got good numbers, but with the refactoring > it would be good to redo that, and benchmarking properly can be very > time consuming. Existing benchmark software might need to be tweaked > to retry transactions which fail with SQLSTATE 40001, or at least > continue on with out counting those in TPS figures, since > applications using this feature will generally have frameworks which > automatically do retries for that SQLSTATE. I can certainly try to get a more complete set of DBT-2 results -- and possibly even do that in a timely manner :-) -- but I doubt I'll have time in the near future to do anything more comprehensive. It would be great to have some more results beyond DBT-2/TPC-C. Although it's certainly an interesting benchmark, it's known not to exhibit any serialization anomalies under snapshot isolation. (And, of course, it's seek-bound, so results may not be representative of workloads that aren't.) Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/