Hi,
I've updated the patch to include the optimization described in the
previous post, i.e. if the number of relations is below a certain
threshold, use a simple for loop, for large numbers of relations use
bsearch calls.
This is done by a new constant BSEARCH_LIMIT, which is set to 10 in the
patch. Then I've modified the 'drop-test' script to take yet another
argument - number of indexes for each table. So for example this
./drop-test.py 10000 100 3 'dbname=test'
means "create 10000 tables, 3 indexes for each of them, and then drop
them in batches of 100 tables."
Then I've run the test with 0, 1, 2, ... 11 indexes for each table for
(a) unpatched HEAD
(b) patch v3.1 (without the optimization)
(c) patch v3.3 (with BSEARCH_LIMIT=10)
and I got these results:
1) dropping one-by-one
----------------------
This is the really interesting part - the issue with the v3.1 is that
for a single table, it's ~2x slower than unpatched PostgreSQL.
0 1 2 3 4 5 6 7 8 9 10 11
--------------------------------------------------------------
unpatched 16 28 40 52 63 75 87 99 110 121 135 147
v3.1 33 43 46 56 58 60 63 72 75 76 79 80
v3.3 16 20 23 25 29 33 36 40 44 47 79 82
The values are durations in seconds, rounded to integer values. I've run
the test repeatedly and there's very small variance in the numbers.
The v3.3 improves that and it's actually even faster than unpatched
PostgreSQL. How can this happen? I believe it's because the code is
rewritten from
for each relation (r) in the drop list
DropRelFileNodeAllBuffers (r)
for each shared buffer
check and invalidate
to
copy the relations from drop list to array (a)
DropRelFileNodeAllBuffers(a)
for each shared buffer
for each relation (r) in the array
check and invalidate
At least that's the only explanation I was able to come up with.
Yet another interesting observation is that even the v3.1 is about as
fast as the unpatched code once there are 3 or more indexes (or TOAST
tables).
So my opinion is that the optimizated patch works as expected, and that
even without the optimization the performance would be acceptable for
most real-world cases.
2) dropping in transaction
--------------------------
This is mostly to verify that the code did not break anything, because
the optimization should not kick-in in this case at all. And that seems
to be the case:
0 1 2 3 4 5 6 7 8 9 10 11
--------------------------------------------------------------
unpatched 13 24 35 46 58 69 81 92 103 115 127 139
v3.1 3 5 7 8 10 12 14 15 16 18 20 21
v3.3 3 4 6 7 8 10 11 13 14 15 18 20
The differences between v3.1 and v3.3 are mostly due to rounding etc.
Attached is the v3.3 patch and the testing script I've been using for
the tests above. Feel free to run the tests on your hardware, with your
hardware, shared buffers size etc. I've run that on a 4-core i5-2500 CPU
with 2GB shared buffers.
Tomas