From: Robert Haas
Subject: Re: mosbench revisited
Msg-id: CA+TgmoYeS+RgQvnQEYNpA7JCjjd_0SkjSwF29Lrsy+vkGxcvrQ@mail.gmail.com
In response to: Re: mosbench revisited (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Wed, Aug 3, 2011 at 5:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That still seems utterly astonishing to me.  We're touching each of
> those files once per query cycle; a cycle that contains two message
> sends, who knows how many internal spinlock/lwlock/heavyweightlock
> acquisitions inside Postgres (some of which *do* contend with each
> other), and a not insignificant amount of plain old computing.
> Meanwhile, this particular spinlock inside the kernel is protecting
> what, a single doubleword fetch?  How is that the bottleneck?

Spinlocks seem to have a very ugly "tipping point".  When I tested
pgbench -S on a 64-core system with the lazy vxid patch applied and a
patch to use random_r() in lieu of random(), the amount of system time
used per SELECT-only transaction at 48 clients was 3.59 times what it
was at 4 clients.  The amount used per transaction at 52 clients was
3.63 times the amount at 48 clients, and the amount at 56 clients was
3.25 times the amount at 52 clients.  You can see the throughput graph
starting to flatten out in the 32-44 client range, but it's not
particularly alarming.  Once you pass that point, though, things get
out of control in a real hurry.  A few more clients and the machine is
basically doing nothing but spin.
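
To make the failure mode concrete, here's a minimal sketch of a
test-and-set spinlock (illustrative only - neither the kernel's
implementation nor our s_lock; the thread and iteration counts are
made up).  Every waiter burns a full CPU while it spins, so once the
lock saturates, each additional client converts directly into wasted
system time rather than throughput:

/* Compile with: gcc -std=c11 -pthread spin.c */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter;

static void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 100000; i++)
    {
        /* Spin until we grab the flag; each extra waiter adds pure
         * CPU burn here, which is the "tipping point" effect. */
        while (atomic_flag_test_and_set_explicit(&lock,
                                                 memory_order_acquire))
            ;
        counter++;              /* tiny critical section, akin to a
                                 * single doubleword fetch */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

int main(void)
{
    enum { NTHREADS = 8 };      /* raise this to watch system time grow */
    pthread_t threads[NTHREADS];

    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    printf("counter = %ld\n", counter);
    return 0;
}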

> I am wondering whether kernel spinlocks are broken.

I don't think so.  Stefan Kaltenbrunner had one profile where he
showed something like sixty or eighty percent of the usermode CPU time
in s_lock.  I didn't have access to that particular hardware, but the
testing I've done strongly suggests that most of that was the
SInvalReadLock spinlock.  And before I patched pgbench to avoid
calling random(), that was doing the same thing - literally flattening
a 64-core box fighting over a single futex that normally costs almost
nothing.  (That one wasn't quite as bad, because the futex actually
deschedules the waiters, but it was still bad.)  I'm not really sure
why it shakes out this way (birthday paradox?), but having seen the
effect several times now, I'm disinclined to believe it's an
artifact.
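
For reference, the shape of the pgbench fix was just to give each
client its own PRNG state.  Here's a sketch of the pattern (not the
actual patch; thread count, state size, and loop count are
illustrative): glibc's random() serializes all callers on one hidden
lock around shared state, whereas random_r() keeps the state private
to each thread, so there's nothing left to contend on:

/* Compile with: gcc -std=c11 -pthread rand.c */
#define _GNU_SOURCE             /* for random_r()/initstate_r() */
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

static void *worker(void *arg)
{
    /* Per-thread PRNG state: no shared lock or futex to fight over. */
    struct random_data rd;
    char statebuf[64];
    int32_t value;

    memset(&rd, 0, sizeof(rd));     /* glibc requires zeroing before
                                     * initstate_r() */
    initstate_r((unsigned) (uintptr_t) arg,
                statebuf, sizeof(statebuf), &rd);

    for (long i = 0; i < 1000000; i++)
        random_r(&rd, &value);      /* touches only this thread's state */
    return NULL;
}

int main(void)
{
    pthread_t threads[8];

    for (long t = 0; t < 8; t++)
        pthread_create(&threads[t], NULL, worker, (void *) t);
    for (long t = 0; t < 8; t++)
        pthread_join(threads[t], NULL);
    puts("done");
    return 0;
}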

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

