On Mon, Mar 12, 2012 at 10:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 12, 2012 at 12:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Nate Boley's machine, the difference was ~100% increase rather than
>> ~10%.
>
> Oh, right. I had forgotten how dramatic the changes were in those
> test runs. I guess I should be happy that the absolute numbers on
> this machine were as high as they were. This machine seems to be
> beating that one on every metric.
>
>> Do you think the difference is in the CPU architecture, or the
>> IO subsystem?
>
> That is an excellent question. I tried looking at vmstat output, but
> a funny thing kept happening: periodically, the iowait column would
> show a gigantic negative number instead of a number between 0 and 100.

On which machine was that happening?

> This makes me a little chary of believing any of it. Even if I did,
> I'm not sure that would fully answer the question. So I guess the
> short answer is that I don't know, and I'm not even sure how I might
> go about figuring it out. Any ideas?

Rerunning all 4 benchmarks (both 16MB and 32MB wal_buffers on both
machines) with fsync=off (and still with synchronous_commit=off)
might help clarify things.

If that increases the TPS of Nate@16MB but doesn't change the other 3
situations much, it suggests the IO system is driving the difference.
Basically, moving up to 32MB would be partially inoculating against
slow fsyncs upon log switch on that machine.
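
To make the reading of those four results concrete, here is a rough sketch of the decision rule. The function names, the run labels, and the 20% threshold are all made up for illustration; nothing here comes from the actual benchmark scripts.

```python
# Hypothetical helper for interpreting the four reruns.
# Labels and the 20% threshold are illustrative assumptions only.

def runs_that_moved(tps_fsync_on, tps_fsync_off, threshold=0.20):
    """Return labels of runs whose TPS rose by more than `threshold`
    (as a fraction) once fsync was turned off."""
    moved = []
    for label, before in tps_fsync_on.items():
        after = tps_fsync_off[label]
        if (after - before) / before > threshold:
            moved.append(label)
    return sorted(moved)


def suggests_io_bound(moved, slow_case="nate@16MB"):
    """True if only the slow case sped up, which would point at the
    IO system (slow fsyncs) rather than the CPU architecture."""
    return moved == [slow_case]
```

If only the Nate@16MB run moves, the IO explanation looks likely; if all four move, or none do, the test doesn't separate the two machines and something else is going on.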

Does the POWER7 have a nonvolatile cache? What happened with
synchronous_commit=on?

Also, since all data fits in shared_buffers, setting
checkpoint_segments and checkpoint_timeout high enough that no
checkpoint occurs during the benchmark period should remove the only
other source of writing from the system. With no checkpoints, no
evictions, and no fsyncs, the remaining IO is unlikely to be the
bottleneck.
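
As a back-of-the-envelope sketch of "larger than the benchmark period": the helper below picks values from the run length and an estimate of the WAL volume. The function name, the 60-second margin, and the WAL estimate are assumptions for illustration; the 16MB figure is the default WAL segment size. Check the result against the allowed range of checkpoint_timeout on the release being tested.

```python
# Hypothetical sizing helper; est_wal_mb must come from a prior run
# or a guess, and margin_s is an arbitrary safety margin.

WAL_SEGMENT_MB = 16  # default WAL segment size


def checkpoint_settings(run_seconds, est_wal_mb, margin_s=60):
    """Pick settings so neither the timeout nor the segment count
    should trigger a checkpoint during a run of `run_seconds`."""
    # Strictly more segments than the run is expected to fill.
    segments = int(est_wal_mb // WAL_SEGMENT_MB) + 1
    return {
        "checkpoint_segments": segments,
        "checkpoint_timeout": f"{run_seconds + margin_s}s",
    }
```

For example, a 300-second run expected to write about 1600MB of WAL would want checkpoint_segments above 100 and checkpoint_timeout above 5 minutes.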
Cheers,
Jeff