Kris Jurka wrote:
> I'm not super impressed with these timing results. They are certainly
> showing some effects due to GC, consider the rise in time here at 10.5MB.
The method isn't necessarily much faster, especially when there are
only a few megabytes involved. This is very difficult to benchmark in
the presence of a garbage collector.
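
For anyone who wants to repeat the measurements, a harness along the
following lines helps a little: the warm-up rounds let the JIT compile
the code path first, and the System.gc() hint reduces, but never
removes, collector noise. The escape function plugged in below is only
a placeholder, not the driver code:

    import java.util.Random;
    import java.util.function.UnaryOperator;

    public class EscapeBench {

        static void bench(String label, UnaryOperator<byte[]> escape, byte[] data) {
            for (int i = 0; i < 5; i++) {
                escape.apply(data);       // warm-up: let the JIT compile the path
            }
            for (int round = 0; round < 5; round++) {
                System.gc();              // hint only; a pause can still hit the timed region
                long t0 = System.nanoTime();
                escape.apply(data);
                long t1 = System.nanoTime();
                System.out.printf("%s, round %d: %.1f ms%n", label, round, (t1 - t0) / 1e6);
            }
        }

        public static void main(String[] args) {
            byte[] data = new byte[10 << 20];   // 10 MB of pseudo-random input
            new Random(42).nextBytes(data);
            // Plug the old and new escape implementations in here to compare them.
            bench("clone (placeholder)", byte[]::clone, data);
        }
    }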
> I've committed this to CVS HEAD with a rather arbitrarily set
> MAX_3_BUFF_SIZE value of 2MB. Note that this is also the escaped size, so
> we may actually be dealing with output data a quarter of that size. If
> anyone could do some more testing of what a good crossover point would be
> that would be a good thing.
AFAIK the MAX_3_BUFF_SIZE entry was a debugging artifact; it is not
needed any more. The new method is always faster than, or at least as
fast as, the old method, because it requires fewer memory accesses.
3 buffers (the old method; see the sketch below):
Buffer1 zeroing (VM-internal)
Buffer1 filling
Buffer2 zeroing (VM-internal)
Buffer1 reading
Buffer2 writing
Buffer3 zeroing (VM-internal)
Buffer2 reading
Buffer3 writing
Total: 8 memory accesses.
Buffer3 is eventually read as well, but that happens outside the driver.
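
For illustration, a minimal sketch of that three-buffer flow. It
assumes a simplified escaping rule (printable ASCII passes through,
everything else becomes a backslash plus three octal digits); the names
are made up for this mail and are not the actual driver code:

    public class ByteaSketch {

        static byte[] escapeOld(byte[] input) {      // input == Buffer1
            // Buffer2 must be sized for the worst case (every byte escaped),
            // because the real escaped length is not known yet.
            byte[] tmp = new byte[input.length * 4]; // Buffer2 zeroing
            int len = 0;
            for (byte b : input) {                   // Buffer1 reading
                len = writeEscaped(tmp, len, b);     // Buffer2 writing
            }
            // Buffer3 trims Buffer2 to the actual length: one more full
            // read-and-write pass over the data.
            byte[] out = new byte[len];              // Buffer3 zeroing
            System.arraycopy(tmp, 0, out, 0, len);   // Buffer2 reading, Buffer3 writing
            return out;
        }

        // Writes one byte either as-is or as a backslash plus three octal digits.
        static int writeEscaped(byte[] dst, int pos, byte b) {
            int v = b & 0xFF;
            if (v >= 0x20 && v <= 0x7E && v != '\\' && v != '\'') {
                dst[pos++] = b;
            } else {
                dst[pos++] = '\\';
                dst[pos++] = (byte) ('0' + ((v >> 6) & 3));
                dst[pos++] = (byte) ('0' + ((v >> 3) & 7));
                dst[pos++] = (byte) ('0' + (v & 7));
            }
            return pos;
        }
    }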
2 buffers (the new method; see the sketch below):
Buffer1 zeroing (VM-internal)
Buffer1 filling
Buffer1 reading (the new pass, which only counts the exact escaped size)
Buffer2 zeroing (VM-internal)
Buffer1 reading
Buffer2 writing
Total: 6 memory accesses.
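
And the two-buffer version, reusing the writeEscaped helper from the
sketch above (it would live in the same ByteaSketch class). The new
pass over Buffer1 only reads and counts, so Buffer2 can be allocated at
its exact final size and Buffer3 disappears entirely:

    // Sketch of the new two-buffer flow (same simplified escaping rule).
    static byte[] escapeNew(byte[] input) {          // input == Buffer1
        // The new pass: read Buffer1 once, only counting output bytes.
        int len = 0;
        for (byte b : input) {
            int v = b & 0xFF;
            len += (v >= 0x20 && v <= 0x7E && v != '\\' && v != '\'') ? 1 : 4;
        }
        // Buffer2 is allocated at exactly its final size: no Buffer3 needed.
        byte[] out = new byte[len];                  // Buffer2 zeroing
        int pos = 0;
        for (byte b : input) {                       // Buffer1 reading
            pos = writeEscaped(out, pos, b);         // Buffer2 writing
        }
        return out;
    }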
Conclusion: the new method uses less memory and touches it less often.
It must be faster as well, since everything else is cheap compared to
memory access. Additionally, it needs only 2 allocations instead of 3;
memory allocations have some overhead of their own, and they mean more
work for the garbage collector in the end. Even if the VM can do some
magic to avoid zeroing the buffers, the newer method still has one
fewer memory access (4 instead of 5). It is always the winner.