All,
This is my first patch review for PostgreSQL. I did submit a patch last
commit fest (Boyer-Moore) so I feel I should review one this commit fest.
I'm quite new to PostgreSQL so please don't rely on me totally. I'll do my
best. Heikki is also reviewing this patch which makes me feel better.
My aim is to get the author has much feed back as quickly as possible. For
this reason I'm going to be breaking down my reviews into the following
topics.
1. Does patch apply cleanly?
2. Any compiler warnings?
3. Do the results follow the SQL standard?
4. Performance Comparison, does it perform better than alternate ways of
doing things. Self joins, sub queries etc.
5. Performance, no comparison. How does it perform with larger tables?
Things I probably won't attempt to review:
Source code; best practises, making use of existing APIs etc. I'd rather
leave that for Heikki and possibly others that join in reviewing this patch.
It's not that I'm too lazy, just that I don't feel that I know the source
well enough. Plus it's a complex patch.
Really I should follow my list in order but I'm going to do number 4 first
in order to get some quick feedback to the author.
I've created some "real world" tests where windowing functions will be
useful. I created some tables then populated with data. I then wrote 2
queries; 1 to make use of windowing functions, the other that uses a method
without windowing functions.
Test Results:
Test Normal Windowing UOM Increase %
Test 1 498.00 578.00 Trans/Sec 16.06%
Test 2 336.00 481.00 Trans/Sec 43.15%
Test 3 1.30 8.45 Trans/Sec 550.00%
Test 4 424.00 629.00 Trans/Sec 48.35%
Test 5 8.89 31052.69 Trans/Hour 349114.85%
Test 6 253.00 305.00 Trans/Sec 20.55%
(Please see attached document for the actual tests)
Note: The above results will much depend on the set of data. Most of my
tests use a very small volume of data. Test 3 and 5 use more data that the
other tests. It's quite obvious that the more data there is in my tests the
bigger the margin between the two methods becomes. I originally ran test 3
with 40000 rows to simulate a large marathon but the "normal query" was
going to take hours... I reduced the rows to 10000.
Obervations:
Test 3 and 5 did not seem to make use of an index to get a sorted list of
results. I disabled enable_seqscan but the planner still failed to choose
index_scan. Is there any reason for this? Perhaps I'm missing something.
Hitoshi, can you take a look at this?
Tests:
Please see attached file. Perhaps there were more efficient ways for certain
queries, I just couldn't think of them...
Please let me know if you feel I should be conducting the review in another
way.
David.