Thread: SSI patch(es)

SSI patch(es)

From
"Kevin Grittner"
Date:
I just finished implementing the SLRU techniques to limit shared
memory usage and provide graceful degradation under pessimal loads
(as suggested by Heikki), Dan seems to be wrapping up work on
preventing non-serializable transactions from being rolled back with
a serialization failure if they split a predicate-locked page at the
point were we're running out of space to allocate predicate locks (as
suggested by Heikki), and John's working on documentation.
We've recently committed documentation for new GUCs, modified
statements, and the new switch on pg_dump.  The main things I see
that we still need in documentation are a README.SSI file and some
serious work in mvcc.sgml.  I'm going through the old emails to see
what issues people may have raised that might need to be addressed;
besides making the AMs for GIN, GiST, and hash SSI aware (so that
they have fewer false positive rollbacks than with the default
handling), are there any issues people want to be sure I look at
before posting a patch?
Then there's the question of whether to submit it in pieces.  There
are going to be big chunks no matter how I slice it, but here are the
ideas I have.  (All numbers are for context diff format.)
If I cut a patch right now for everything, it would be 7742 lines.
Right now a patch of the doc/ changes would be 413 lines.
If I split out the src/test/regress/ part it would be 1340 lines,
mostly python code for dtester tests.
If I split out just the src/bin/pg_dump/ changes it would be 98
lines.
Splitting out those three would leave src/backend/ and src/include/
which comes in at a svelte 5891 lines.
With a little more work I could split the three new files
(predicate.c, predicate.h, and predicate_internals.h) out from the
changes scattered around the rest of the code.  That's 4346 lines and
1545 lines, respectively.

Now, these numbers are likely to change a little in the next few
days, but not much as a percentage outside the documentation.
Thoughts?
-Kevin



Re: SSI patch(es)

From
Robert Haas
Date:
On Sat, Jan 8, 2011 at 4:10 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Splitting out those three would leave src/backend/ and src/include/
> which comes in at a svelte 5891 lines.
>
> With a little more work I could split the three new files
> (predicate.c, predicate.h, and predicate_internals.h) out from the
> changes scattered around the rest of the code.  That's 4346 lines and
> 1545 lines, respectively.
>
> Now, these numbers are likely to change a little in the next few
> days, but not much as a percentage outside the documentation.
>
> Thoughts?

Well, my first thought is - I'm not sure it's realistic to think we're
going to get this committed to 9.1.

But that's not a very helpful thought.  I just don't know who is going
to review 7700 lines of code in the next month, and it doesn't sound
like it can be decomposed into independent subfeatures that can be
committed independently.  Splitting it up by directory isn't really
all that helpful.  I hope someone will step up to the plate; I'm
pretty sure I can't do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: SSI patch(es)

From
"Kevin Grittner"
Date:
Robert Haas  wrote:
> Well, my first thought is - I'm not sure it's realistic to think
> we're going to get this committed to 9.1.
>
> But that's not a very helpful thought. I just don't know who is
> going to review 7700 lines of code in the next month, and it
> doesn't sound like it can be decomposed into independent
> subfeatures that can be committed independently. Splitting it up by
> directory isn't really all that helpful. I hope someone will step
> up to the plate; I'm pretty sure I can't do it.
I hope so, too.
FWIW, I submitted this patch with almost 2000 fewer lines in what I
hoped was a form suitable for initial commit in the 2010-09 CF,
knowing full well there were a number of optimizations and
improvements I would like to get in before release.  But Heikki felt
that it wasn't acceptable without those changes -- and for reasons
which I find totally understandable.  There's sort of a Catch-22 here
for large features like this -- if you submit them in skeletal form
they aren't accepted because we don't want code in the official
repository which isn't production quality yet.  But if you flesh it
out to where it is production quality, then it's large enough to be
hard to review.  I know this isn't the first time this issue has been
brought up, but I'm feeling it keenly at the moment.
There are three contributors who have already been through the code
for this patch in sufficient detail to help advance it -- and I'm
most grateful for what they've already done.  Hopefully those who
have already done that won't find it too hard to digest the patch
with its latest improvements, and will have the time and inclination
to give it a go.
One thing that would help a lot besides code review is performance
testing.  I did some months ago and I know Dan booked time on MIT
benchmarking systems and got good numbers, but with the refactoring
it would be good to redo that, and benchmarking properly can be very
time consuming.  Existing benchmark software might need to be tweaked
to retry transactions which fail with SQLSTATE 40001, or at least
continue on with out counting those in TPS figures, since
applications using this feature will generally have frameworks which
automatically do retries for that SQLSTATE.
-Kevin


Re: SSI patch(es)

From
Heikki Linnakangas
Date:
On 09.01.2011 05:06, Robert Haas wrote:
> On Sat, Jan 8, 2011 at 4:10 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov>  wrote:
>> Splitting out those three would leave src/backend/ and src/include/
>> which comes in at a svelte 5891 lines.
>>
>> With a little more work I could split the three new files
>> (predicate.c, predicate.h, and predicate_internals.h) out from the
>> changes scattered around the rest of the code.  That's 4346 lines and
>> 1545 lines, respectively.
>>
>> Now, these numbers are likely to change a little in the next few
>> days, but not much as a percentage outside the documentation.
>>
>> Thoughts?
>
> Well, my first thought is - I'm not sure it's realistic to think we're
> going to get this committed to 9.1.
>
> But that's not a very helpful thought.  I just don't know who is going
> to review 7700 lines of code in the next month, and it doesn't sound
> like it can be decomposed into independent subfeatures that can be
> committed independently.

I'm tempted to raise my hand and volunteer, but I don't want to make 
promises I might not be able to keep. But I'll definitely at least take 
another look at it.

I don't think pushing this to 9.2 helps at all. This patch has gone 
through several rounds of reviews already, and unless someone sees a 
major issue with it, it's not going to get any more ready by postponing 
it. And it wouldn't be fair to Kevin and others who've worked hard on 
it. We'll just have to slurp it in somehow.

>  Splitting it up by directory isn't really
> all that helpful.

Agreed. If it can't easily be split into increments by functionality, 
then we'll just have to deal with it as on big patch.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: SSI patch(es)

From
Dan Ports
Date:
On Sat, Jan 08, 2011 at 10:20:22PM -0600, Kevin Grittner wrote:
> One thing that would help a lot besides code review is performance
> testing.  I did some months ago and I know Dan booked time on MIT
> benchmarking systems and got good numbers, but with the refactoring
> it would be good to redo that, and benchmarking properly can be very
> time consuming.  Existing benchmark software might need to be tweaked
> to retry transactions which fail with SQLSTATE 40001, or at least
> continue on with out counting those in TPS figures, since
> applications using this feature will generally have frameworks which
> automatically do retries for that SQLSTATE.

I can certainly try to get a more complete set of DBT-2 results -- and
possibly even do that in a timely manner :-) -- but I doubt I'll have
time in the near future to do anything more comprehensive.

It would be great to have some more results beyond DBT-2/TPC-C.
Although it's certainly an interesting benchmark, it's known not to
exhibit any serialization anomalies under snapshot isolation. (And, of
course, it's seek-bound, so results may not be representative of
workloads that aren't.)

Dan

-- 
Dan R. K. Ports              MIT CSAIL                http://drkp.net/