Re: pgbench vs. wait events - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pgbench vs. wait events
Date
Msg-id CAB7nPqR2_0fqRH94-sqZq0DKLQTh7yjWC3HmCkvxvKHQAKV8qw@mail.gmail.com
In response to pgbench vs. wait events  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Oct 7, 2016 at 3:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I decided to do some testing on hydra (IBM-provided community
> resource, POWER, 16 cores/64 threads, kernel 3.2.6-3.fc16.ppc64) using
> the newly-enhanced wait event stuff to try to get an idea of what
> we're waiting for during pgbench.  I did 30-minute pgbench runs with
> various configurations, but all had max_connections = 200,
> shared_buffers = 8GB, maintenance_work_mem = 4GB, synchronous_commit =
> off, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9,
> log_line_prefix = '%t [%p] ', max_wal_size = 40GB, log_checkpoints =
> on.  During each run, I ran this psql script in another window and
> captured the output:

Nice. Thanks for sharing.

> \t
> select wait_event_type, wait_event from pg_stat_activity where pid !=
> pg_backend_pid()
> \watch 0.5
>
> Then, I used a little shell-scripting to count up the number of times
> each wait event occurred in the output.
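
I guess that is roughly the following, with waits.txt standing in for
wherever the \watch output went:

    # count how often each wait event pair shows up in the samples
    grep '|' waits.txt | sort | uniq -c | sort -n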

Or an INSERT ... SELECT into an unlogged table? No need for extra math then.
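
Say, roughly (untested, and wait_samples is just an illustrative name;
the columns mirror pg_stat_activity):

    -- unlogged, so the sampling itself generates no WAL
    CREATE UNLOGGED TABLE wait_samples (wait_event_type text, wait_event text);

    -- run this under \watch instead of the plain SELECT
    INSERT INTO wait_samples
      SELECT wait_event_type, wait_event
        FROM pg_stat_activity WHERE pid != pg_backend_pid();

    -- then the per-event counts come out of a single query at the end
    SELECT count(*), wait_event_type, wait_event
      FROM wait_samples GROUP BY 2, 3 ORDER BY 1;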

> First, I tried scale factor
> 3000 with 32 clients and got these results:
>
>       1  LWLockTranche   | buffer_mapping
> [...]
>   21014  LWLockNamed     | WALWriteLock
>   28497                  |
>   58279  LWLockTranche   | wal_insert
>
> tps = 1150.803133 (including connections establishing)
>
> What jumps out here, at least to me, is that there is furious
> contention on both the wal_insert locks and on WALWriteLock.
> Apparently, the system simply can't get WAL on disk fast enough to
> keep up with this workload.  Relation extension locks and
> buffer_content locks are also pretty common, both ahead of
> ClientRead, a relatively uncommon wait event on this test.  The load
> average on the system was only about 3 during this test, indicating
> that most processes are in fact spending most of their time off-CPU.

Would increasing the number of WAL insertion locks help? In your tests
this is fixed at 8 the whole time.

> The first thing I tried was switching to unlogged tables, which
> produces these results:
>
>       1  BufferPin       | BufferPin
>       1  LWLockTranche   | lock_manager
>       2  LWLockTranche   | buffer_mapping
>       8  LWLockNamed     | ProcArrayLock
>       9  LWLockNamed     | CheckpointerCommLock
>       9  LWLockNamed     | CLogControlLock
>      11  LWLockTranche   | buffer_content
>      37  LWLockTranche   | clog
>     153  Lock            | tuple
>     388  LWLockNamed     | XidGenLock
>     827  Lock            | transactionid
>    1267  Client          | ClientRead
>   20631  Lock            | extend
>   91767                  |
>
> tps = 1223.239416 (including connections establishing)
>
> If you don't look at the TPS number, these results look like a vast
> improvement.  The overall amount of time spent not waiting for
> anything is now much higher, and the problematic locks have largely
> disappeared from the picture.  However, the load average now shoots up
> to about 30, because most of the time that the backends are "not
> waiting for anything" they are in fact in kernel wait state D; that
> is, they're stuck doing I/O.  This suggests that we might want to
> consider advertising a wait state when a backend is doing I/O, so we
> can measure this sort of thing.

Maybe something in fd.c. It may be a good idea to actually have a
look at perf output on Linux to see where this contention is
happening, use that to decide where to place a wait point, and then
see whether other OSes share a similar pattern.
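
For the perf part, a plain system-wide profile taken while the run is
hot would be the obvious first step, say:

    # profile the whole box for 30s in the middle of the pgbench run
    perf record -a -g -- sleep 30
    perf report

(That only samples on-CPU time, of course, so for backends sitting in
D state the interesting part may be what does not show up there.)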
-- 
Michael


