From: Florian Pflug <fgp@phlo.org>
Subject: Re: spinlock contention
Msg-id: 534AA79D-9B14-4A1E-A3A7-9F0B69DD93F4@phlo.org
In response to: Re: spinlock contention (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On Jun 28, 2011, at 22:18, Robert Haas wrote:
> On Tue, Jun 28, 2011 at 2:33 PM, Florian Pflug <fgp@phlo.org> wrote:
>> [ testing of various spinlock implementations ]
>
> I set T=30 and N="1 2 4 8 16 32" and tried this out on a 32-core
> loaner from Nate Boley:

Cool, thanks!

> 100 counter increments per cycle
> worker    1        2        4        8        16    32
> time    wall    user    wall    user    wall    user    wall    user    wall    user    wall    user
> none    2.8e-07    2.8e-07    1.5e-07    3.0e-07    8.0e-08    3.2e-07    4.2e-08    3.3e-07    2.1e-08    3.3e-07    1.1e-08    3.4e-07
> atomicinc    3.6e-07    3.6e-07    2.6e-07    5.1e-07    1.4e-07    5.5e-07    1.4e-07    1.1e-06    1.5e-07    2.3e-06    1.5e-07    4.9e-06
> cmpxchng    3.6e-07    3.6e-07    3.4e-07    6.9e-07    3.2e-07    1.3e-06    2.9e-07    2.3e-06    4.2e-07    6.6e-06    4.5e-07    1.4e-05
> spin    4.1e-07    4.1e-07    2.8e-07    5.7e-07    1.6e-07    6.3e-07    1.2e-06    9.4e-06    3.8e-06    6.1e-05    1.4e-05    4.3e-04
> pg_lwlock    3.8e-07    3.8e-07    2.7e-07    5.3e-07    1.5e-07    6.2e-07    3.9e-07    3.1e-06    1.6e-06    2.5e-05    6.4e-06    2.0e-04
> pg_lwlock_cas    3.7e-07    3.7e-07    2.8e-07    5.6e-07    1.4e-07    5.8e-07    1.4e-07    1.1e-06    1.9e-07    3.0e-06    2.4e-07    7.5e-06

Here's the same table, formatted with spaces.

worker          1               2               4               8               16              32
time            wall    user    wall    user    wall    user    wall    user    wall    user    wall    user
none            2.8e-07 2.8e-07 1.5e-07 3.0e-07 8.0e-08 3.2e-07 4.2e-08 3.3e-07 2.1e-08 3.3e-07 1.1e-08 3.4e-07
atomicinc       3.6e-07 3.6e-07 2.6e-07 5.1e-07 1.4e-07 5.5e-07 1.4e-07 1.1e-06 1.5e-07 2.3e-06 1.5e-07 4.9e-06
cmpxchng        3.6e-07 3.6e-07 3.4e-07 6.9e-07 3.2e-07 1.3e-06 2.9e-07 2.3e-06 4.2e-07 6.6e-06 4.5e-07 1.4e-05
spin            4.1e-07 4.1e-07 2.8e-07 5.7e-07 1.6e-07 6.3e-07 1.2e-06 9.4e-06 3.8e-06 6.1e-05 1.4e-05 4.3e-04
pg_lwlock       3.8e-07 3.8e-07 2.7e-07 5.3e-07 1.5e-07 6.2e-07 3.9e-07 3.1e-06 1.6e-06 2.5e-05 6.4e-06 2.0e-04
pg_lwlock_cas   3.7e-07 3.7e-07 2.8e-07 5.6e-07 1.4e-07 5.8e-07 1.4e-07 1.1e-06 1.9e-07 3.0e-06 2.4e-07 7.5e-06

And here's the throughput table calculated from your results,
i.e. the 1-worker wall time per cycle divided by the wall time
per cycle at each worker count (for example, "none" at 2 workers:
2.8e-07 / 1.5e-07 ~ 1.9).

workers           2   4   8  16  32
none            1.9 3.5 6.7  13  26
atomicinc       1.4 2.6 2.6 2.4 2.4
cmpxchng        1.1 1.1 1.2 0.9 0.8
spin            1.5 2.6 0.3 0.1 0.0
pg_lwlock       1.4 2.5 1.0 0.2 0.1
pg_lwlock_cas   1.3 2.6 2.6 1.9 1.5

Hm, so in the best case we get 2.6x the throughput of a single
worker, and that only for 4 and 8 workers (1.4e-07 vs. 3.6e-07
seconds per cycle). In that case, there also seems to be little
difference between pg_lwlock{_cas} and atomicinc. atomicinc again
manages to at least sustain that throughput when the worker count
is increased, while for the others the throughput actually
*decreases*.
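
(To make that comparison concrete: the following isn't the lockbench
code verbatim, just a minimal sketch of how I think of the two
lock-free variants, written with GCC's __sync builtins. atomicinc
makes forward progress on every attempt, while cmpxchng can lose the
race and has to redo the whole read-modify-write, which fits the way
its throughput falls off above.)

#include <stdint.h>
#include <stdio.h>

static volatile uint64_t counter = 0;

/* atomicinc: a single locked fetch-and-add; the hardware arbitrates,
 * so every attempt increments the counter exactly once. */
static void
increment_atomicinc(void)
{
    __sync_fetch_and_add(&counter, 1);
}

/* cmpxchng: read, compute, compare-and-swap, retry on conflict; under
 * contention the CAS fails and the read-modify-write is redone, which
 * is where the extra user time goes. */
static void
increment_cmpxchng(void)
{
    uint64_t old;

    do
    {
        old = counter;
    } while (!__sync_bool_compare_and_swap(&counter, old, old + 1));
}

int
main(void)
{
    increment_atomicinc();
    increment_cmpxchng();
    printf("counter = %llu\n", (unsigned long long) counter);
    return 0;
}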

What totally puzzles me is that your results don't show any
trace of a decreased system load for the pg_lwlock implementation,
which I'd have expected due to the sleep() in the contested path
(the sketch after the table shows the kind of contested path I
mean). Here are the user vs. wall time ratios - I'd have expected
to see values significantly below the number of workers for pg_lwlock:
workers           1   2   4   8  16  32
none            1.0 2.0 4.0 7.9  16  31
atomicinc       1.0 2.0 3.9 7.9  15  33
cmpxchng        1.0 2.0 4.1 7.9  16  31
spin            1.0 2.0 3.9 7.8  16  31
pg_lwlock       1.0 2.0 4.1 7.9  16  31
pg_lwlock_cas   1.0 2.0 4.1 7.9  16  31
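
For what it's worth, here's roughly the kind of contested path I have
in mind - a sketch only, not the lockbench or PostgreSQL code, and the
spin limit of 1000 is made up: spin a bounded number of times, then
sleep. While a worker sits in that usleep() it accumulates wall-clock
time but no user CPU time, so under heavy contention I'd have expected
the user/wall ratio to drop well below the worker count.

#include <unistd.h>

typedef volatile int slock_t;

static void
sketch_lock_acquire(slock_t *lock)
{
    int spins = 0;

    /* test-and-set returns the previous value: nonzero means the lock
     * was already held, so keep trying */
    while (__sync_lock_test_and_set(lock, 1))
    {
        if (++spins >= 1000)
        {
            usleep(1000);   /* sleeping: wall time passes, user time doesn't */
            spins = 0;
        }
    }
}

static void
sketch_lock_release(slock_t *lock)
{
    __sync_lock_release(lock);  /* store 0, releasing the lock */
}

int
main(void)
{
    slock_t lock = 0;

    sketch_lock_acquire(&lock);
    sketch_lock_release(&lock);
    return 0;
}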

> I wrote a little script to show to reorganize this data in a
> possibly-easier-to-understand format - ordering each column from
> lowest to highest, and showing each algorithm as a multiple of the
> cheapest value for that column:

If you're OK with that, I'd like to add that to the lockbench
repo.

> There seems to be something a bit funky in your 3-core data, but
> overall I read this data to indicate that 4 cores aren't really enough
> to see a severe problem with spinlock contention.

Hm, it starts to show if you lower the number of counter increments
per cycle (the D constant in run.sh). But yeah, it's never as bad as
the 32-core results above.

best regards,
Florian Pflug


