On Tue, Jun 26, 2012 at 3:58 PM, Nils Goroll <slink@schokola.de> wrote:
>> It's
>> still unproven whether it'd be an improvement, but you could expect to
>> prove it one way or the other with a well-defined amount of testing.
>
> I've hacked the code to use adaptive pthread mutexes instead of spinlocks;
> see the attached patch. The patch is for git head, but it can easily be
> applied to 9.1.3, which is what I did for my tests.
>
> This had disastrous effects on Solaris, because Solaris does not use anything
> similar to futexes for PTHREAD_PROCESS_SHARED mutexes (only the _PRIVATE
> mutexes avoid syscalls in the uncontended case).
>
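
For anyone reading along without the attachment: on Linux I'd expect the mutex
setup to look roughly like the sketch below. This is only my guess at its
shape, not the actual patch.

    /* Hypothetical sketch, not the attached patch: a process-shared
       adaptive mutex set up in shared memory in place of a spinlock. */
    #define _GNU_SOURCE
    #include <pthread.h>

    static void
    shared_lock_init(pthread_mutex_t *lock)   /* lock lives in shared memory */
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        /* Backends are separate processes, so the mutex must be process-shared. */
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        /* glibc extension: spin briefly in userspace before sleeping on the futex. */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }
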
> But I was surprised to see that it works relatively well on Linux. Here's a
> glimpse of my results:
>
> hacked code 9.1.3:
...
> tps = 485.964355 (excluding connections establishing)
> original code (vanilla build on amd64) 9.1.3:
...
> tps = 510.410883 (excluding connections establishing)
It looks like the hacked code is slower than the original. That
doesn't seem so good to me. Am I misreading this?
Also, 20 transactions per connection is not a long enough run to base any
evaluation on.
How many cores are you testing on?
> Regarding the actual production issue, I did not manage to provoke the
> saturation we are seeing in production synthetically with pgbench - I could
> not even get anywhere near the production load.
What metrics/tools are you using to compare the two loads? What is
the production load like?
Each transaction has to update one of ten pgbench_branch rows, so you
can't have more than ten transactions productively active at any given
time, even though you have 768 connections. So you need to jack up
the pgbench scale, or switch to using -N mode.
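
For instance (the database name and the numbers here are only placeholders,
not a recommendation for your setup):

    pgbench -i -s 100 bench          # one pgbench_branches row per scale unit
    pgbench -c 768 -j 8 -T 300 bench

or keep the current scale and use -N, which skips the pgbench_branches and
pgbench_tellers updates.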
Also, you should use -M prepared, otherwise you spend more time
parsing and planning the statements than executing them.
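Putting that together with the hypothetical numbers above, something like:

    pgbench -N -M prepared -c 768 -j 8 -T 300 bench
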
Cheers,
Jeff