XLogInsert scaling, revisited - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject XLogInsert scaling, revisited
Msg-id 505B3648.1040801@vmware.com
List pgsql-hackers
I've been slowly continuing the work that I started last winter to make
XLogInsert scale better. I have tried quite a few different approaches
since then, and have settled on the attached. This is similar but not
exactly the same as what I did in the patches I posted earlier.

The basic idea, like before, is to split WAL insertion into two phases:

1. Reserve the right amount of WAL. This is done while holding just a
spinlock. Thanks to the changes I made earlier to the WAL format, the
space calculations are now much simpler and the critical section boils
down to almost just "CurBytePos += size_of_wal_record". See the
ReserveXLogInsertLocation() function.

2. Copy the WAL record to the right location in the WAL buffers. This
slower part can be done mostly in parallel.
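
As a rough illustration of the two phases (a minimal sketch, not the
code from the patch): the WalSketch struct, the fixed-size buffer, the
atomic_flag used as a stand-in for the spinlock, and the function names
reserve_insert_location() and copy_record() are all assumptions made for
this example. Only the "CurBytePos += size" idea comes from the
description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WAL_BUF_SIZE 4096          /* hypothetical buffer size for the sketch */

typedef struct {
    atomic_flag lock;              /* stand-in for the insert-position spinlock */
    uint64_t    cur_byte_pos;      /* next free byte position (cf. CurBytePos) */
    char        buf[WAL_BUF_SIZE]; /* stand-in for the WAL buffers */
} WalSketch;

void wal_init(WalSketch *w)
{
    atomic_flag_clear(&w->lock);
    w->cur_byte_pos = 0;
    memset(w->buf, 0, sizeof(w->buf));
}

/* Phase 1: reserve space. The critical section is only the addition. */
uint64_t reserve_insert_location(WalSketch *w, uint32_t size)
{
    uint64_t start;

    while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire))
        ;                          /* spin until the flag is free */
    start = w->cur_byte_pos;
    w->cur_byte_pos += size;
    atomic_flag_clear_explicit(&w->lock, memory_order_release);
    return start;
}

/*
 * Phase 2: copy the record into the reserved range. No lock is held here,
 * so inserters holding disjoint reservations can copy in parallel.
 * The sketch simply asserts that the record fits in the toy buffer.
 */
void copy_record(WalSketch *w, uint64_t start, const void *rec, uint32_t size)
{
    assert(start + size <= WAL_BUF_SIZE);
    memcpy(w->buf + start, rec, size);
}
```

Two callers that reserve 10 and 5 bytes in that order get start
positions 0 and 10, and each can then copy without blocking the other.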

The difficult part is tracking which insertions are currently in
progress, and being able to wait for an insertion to finish copying the
record data in place. I'm using a small number (7 at the moment) of WAL
insertion slots for that. The first thing that XLogInsert does is to
grab one of the slots. Each slot is protected by an LWLock, and
XLogInsert reserves a slot by acquiring its lock. It holds the lock
until it has completely finished copying the WAL record in place. In
each slot, there's an XLogRecPtr that indicates how far the current
inserter has progressed with its insertion. Typically, for a short
record that fits on a single page, it is updated after the insertion is
finished, but if the insertion needs to wait for a WAL buffer to become
available, it updates the XLogRecPtr before sleeping.
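
The slot protocol could be sketched roughly as follows. This is an
illustration under assumptions, not the patch's implementation: an
atomic_flag stands in for the per-slot LWLock, a plain 64-bit integer
stands in for XLogRecPtr, and the names InsertSlot, slot_try_acquire(),
slot_update_progress() and slot_release() are invented here. How the
patch actually picks a slot is not described above; the sketch just
takes the first free one and gives up instead of sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SLOTS 7                /* the slot count mentioned in the text */

typedef struct {
    atomic_flag      in_use;       /* stand-in for the slot's LWLock */
    _Atomic uint64_t progress;     /* stand-in for the slot's XLogRecPtr */
} InsertSlot;

void slots_init(InsertSlot *slots)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        atomic_flag_clear(&slots[i].in_use);
        atomic_init(&slots[i].progress, 0);
    }
}

/*
 * Grab the first free slot and return its index, or -1 if all are busy.
 * (Assumption: a real implementation would wait on the lock instead.)
 */
int slot_try_acquire(InsertSlot *slots)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!atomic_flag_test_and_set_explicit(&slots[i].in_use,
                                               memory_order_acquire))
            return i;
    }
    return -1;
}

/*
 * Publish how far the inserter has copied. Called before sleeping for a
 * WAL buffer, and once more when the insertion is finished. In this
 * sketch the value only ever moves forward within a slot.
 */
void slot_update_progress(InsertSlot *slot, uint64_t upto)
{
    assert(upto >= atomic_load(&slot->progress));
    atomic_store_explicit(&slot->progress, upto, memory_order_release);
}

/* Finish the insertion: publish the end position, then free the slot. */
void slot_release(InsertSlot *slot, uint64_t end_pos)
{
    slot_update_progress(slot, end_pos);
    atomic_flag_clear_explicit(&slot->in_use, memory_order_release);
}
```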

To wait for all insertions up to a point to finish, you scan all the
insertion slots, and observe that the XLogRecPtrs in them are >= the
point you're interested in. The number of slots is a tradeoff: more
slots allow more concurrency in inserting records, but make it slower
to determine how far the WAL can be safely flushed.
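
The scan itself might look something like the sketch below. It works on
a snapshot of the slots rather than on live shared memory, and the
SlotSnapshot type, the busy flag and the function name
insertions_finished_upto() are assumptions for the example. In
particular, treating a slot that nobody holds as "not blocking" is my
reading for the sketch; the text above does not spell out how free
slots are represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     busy;      /* true while an inserter holds the slot (assumption) */
    uint64_t progress;  /* how far that inserter has copied (cf. XLogRecPtr) */
} SlotSnapshot;

/*
 * Return true if every in-progress insertion has reached at least 'upto'.
 * The loop visits every slot, so the cost of each check grows linearly
 * with the number of slots.
 */
bool insertions_finished_upto(const SlotSnapshot *slots, size_t nslots,
                              uint64_t upto)
{
    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        if (slots[i].busy && slots[i].progress < upto)
            return false;   /* this inserter is still copying below 'upto' */
    }
    return true;
}
```

A caller that wants to flush up to some position would repeat this
check, sleeping in between, until it returns true.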

I did some performance tests with this, on an 8-core HP Proliant server,
in a VM running under VMware vSphere 5.1. The tests were performed with
Greg Smith's pgbench-tools kit, with one of two custom workload scripts:

1. Insert 1000 rows in each transaction. This is exactly the sort of
workload where WALInsertLock currently becomes a bottleneck. Without
the patch, the test scales very badly, with about 420 TPS with a single
client, peaking only at 520 TPS with two clients. With the patch, it
scales up to about 1200 TPS, with 7 clients. I believe the test becomes
I/O limited at that point; looking at iostat output while the test is
running shows about 200MB/s of writes, and that is roughly what the I/O
subsystem of this machine can do, according to a simple test with "dd
...; sync". Or perhaps having more "insertion slots" would allow it to
go higher - the patch uses exactly 7 slots at the moment.

http://hlinnaka.iki.fi/xloginsert-scaling/results-1k/

2. Insert only 10 rows in each transaction. This simulates an OLTP
workload with fairly small transactions. The patch doesn't make a huge
difference with that workload. It performs somewhat worse with 4-16
clients, but then somewhat better with > 16 clients. The patch adds some
overhead to flushing the WAL; I believe that's what's causing the
slowdown with 4-16 clients. But with more clients, the WALInsertLock
bottleneck becomes more significant, and you start to see a benefit again.

http://hlinnaka.iki.fi/xloginsert-scaling/results-10/

Overall, the results look pretty good. I'm going to take a closer look
at the slowdown in the second test. I think it might be fixable with
some changes to how WaitInsertionsToFinish() and WALWriteLock work
together, although I'm not sure how exactly it ought to work.

Comments, ideas?

- Heikki
