Mark Mielke <mark@mark.mielke.cc> writes:
> Jeff Davis wrote:
>> Also, there is probably a lot of memory copying going on, and that
>> probably destroys a lot of the effectiveness of L2 caching. When L2
>> caching is ineffective, the CPU spends a lot of time just waiting on
>> memory. In that case, it's better to have P threads of execution all
>> waiting on memory operations in parallel.
>>
> I didn't consider the high throughput / high latency effect. This could
> be true if the CPU prefetch isn't effective enough.

Note that if this is the argument, then there's a ceiling on the speedup
you can expect to get: it's just the extent of mismatch between the CPU
and memory speeds. I can believe that suitable test cases would show
2X improvement for 2 threads, but it doesn't follow that you will get
10X improvement with 10 threads, or even 4X with 4.
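
To put rough numbers on that ceiling, here is a toy model, not a measurement. It assumes a single core, that each unit of work costs some CPU time plus some time stalled on memory, and that stalls from different threads overlap perfectly. The function name, its parameters, and the 1.0 / 1.5 figures are all made up for illustration. Under those assumptions the speedup is capped at (cpu_time + mem_wait) / cpu_time regardless of thread count:

```python
def speedup_ceiling(threads, cpu_time, mem_wait):
    """Toy model of the speedup from overlapping memory stalls.

    Assumptions (hypothetical, for illustration only):
      - one core; only one thread computes at a time
      - memory stalls of different threads overlap perfectly
      - cpu_time and mem_wait are per-unit-of-work costs in the same units
    """
    if threads < 1 or cpu_time <= 0 or mem_wait < 0:
        raise ValueError("need threads >= 1, cpu_time > 0, mem_wait >= 0")
    # Once the core is never idle, extra threads cannot help:
    # the cap is the ratio of total time to pure compute time.
    ceiling = (cpu_time + mem_wait) / cpu_time
    return min(float(threads), ceiling)


# Invented figures: 1.0 units of compute per 1.5 units of memory stall.
for p in (1, 2, 4, 10):
    print(p, speedup_ceiling(p, cpu_time=1.0, mem_wait=1.5))
```

With those invented figures, 2 threads give 2X, but 4 and 10 threads both flatten out at 2.5X. A real system would do worse than this model, since memory bandwidth and cache contention are ignored here.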

regards, tom lane