Re: XLogInsert scaling, revisited - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: XLogInsert scaling, revisited
Msg-id CAHGQGwHqT=xZ0JEgXB4XLWXj67F7p0NgL+6syGw0NO7Qzg_G0w@mail.gmail.com
In response to XLogInsert scaling, revisited  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: XLogInsert scaling, revisited  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Fri, Sep 21, 2012 at 12:29 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I've been slowly continuing the work that I started last winter to make
> XLogInsert scale better. I have tried quite a few different approaches since
> then, and have settled on the attached. This is similar but not exactly the
> same as what I did in the patches I posted earlier.
>
> The basic idea, like before, is to split WAL insertion into two phases:
>
> 1. Reserve the right amount of WAL. This is done while holding just a
> spinlock. Thanks to the changes I made earlier to the WAL format, the space
> calculations are now much simpler and the critical section boils down to
> almost just "CurBytePos += size_of_wal_record". See
> ReserveXLogInsertLocation() function.
>
> 2. Copy the WAL record to the right location in the WAL buffers. This slower
> part can be done mostly in parallel.
>
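Just to check my understanding of phase 1: the critical section under the
spinlock is essentially an atomic advance of the insert position, something
like the sketch below. This is only my reading of the description, not code
from the patch; apart from CurBytePos and ReserveXLogInsertLocation(), which
you mention above, the struct, field and helper names are invented.

    /*
     * Illustrative sketch of the reservation step (phase 1), as I
     * understand it. Initialization of the spinlock is omitted.
     */
    #include "postgres.h"
    #include "storage/spin.h"

    typedef struct
    {
        slock_t     insertpos_lck;  /* protects CurBytePos */
        uint64      CurBytePos;     /* next insert position to hand out */
    } InsertPosSketch;

    static void
    ReserveXLogInsertLocationSketch(InsertPosSketch *pos, uint64 size,
                                    uint64 *startbytepos, uint64 *endbytepos)
    {
        SpinLockAcquire(&pos->insertpos_lck);
        *startbytepos = pos->CurBytePos;
        pos->CurBytePos += size;    /* essentially the whole critical section */
        *endbytepos = pos->CurBytePos;
        SpinLockRelease(&pos->insertpos_lck);
    }

If that's right, the spinlock is held only for a couple of additions, which
would explain why this scales so much better than holding WALInsertLock for
the whole copy.
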
> The difficult part is tracking which insertions are currently in progress,
> and being able to wait for an insertion to finish copying the record data in
> place. I'm using a small number (7 at the moment) of WAL insertion slots for
> that. The first thing that XLogInsert does is to grab one of the slots. Each
> slot is protected by a LWLock, and XLogInsert reserves a slot by acquiring
> its lock. It holds the lock until it has completely finished copying the WAL
> record in place. In each slot, there's an XLogRecPtr that indicates how far
> the current inserter has progressed with its insertion. Typically, for a
> short record that fits on a single page, it is updated after the insertion
> is finished, but if the insertion needs to wait for a WAL buffer to become
> available, it updates the XLogRecPtr before sleeping.
>
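For the slots, the per-slot state sounds like roughly an LWLock plus a
progress pointer. A sketch of my mental model, just to confirm my reading;
the field and type names below are invented, not taken from the patch:

    /*
     * One WAL insertion slot as I understand the description above.
     */
    #include "postgres.h"
    #include "access/xlogdefs.h"    /* XLogRecPtr */
    #include "storage/lwlock.h"     /* LWLockId */

    typedef struct
    {
        LWLockId    lock;           /* held while the inserter copies its record */
        XLogRecPtr  insertingAt;    /* how far this inserter has progressed;
                                     * updated before sleeping on a WAL buffer,
                                     * otherwise only once the copy is done */
    } InsertSlotSketch;

    #define NUM_INSERT_SLOTS_SKETCH 7   /* the patch uses exactly 7 right now */
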
> To wait for all insertions up to a point to finish, you scan all the
> insertion slots, and observe that the XLogRecPtrs in them are >= the point
> you're interested in. The number of slots is a tradeoff: more slots allow
> more concurrency in inserting records, but make it slower to determine how
> far the WAL can be safely flushed.
>
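And the "how far can we safely flush" calculation is then a minimum over the
slots, something like the sketch below. Again this is only illustrative, with
invented names, and it ignores locking and memory-ordering details; the real
function presumably also blocks on the slot locks rather than just computing
the bound:

    /*
     * Sketch of computing the position up to which all insertions have
     * finished: the oldest in-progress position across the slots bounds
     * what can safely be flushed.
     */
    #include "postgres.h"
    #include "access/xlogdefs.h"

    static XLogRecPtr
    SafeFlushPointSketch(const XLogRecPtr *slot_insertingat, int nslots,
                         XLogRecPtr upto)
    {
        XLogRecPtr  safe = upto;
        int         i;

        for (i = 0; i < nslots; i++)
        {
            XLogRecPtr  insertingat = slot_insertingat[i];

            /* a slot still copying below 'safe' holds the flush point back */
            if (!XLogRecPtrIsInvalid(insertingat) && insertingat < safe)
                safe = insertingat;
        }
        return safe;
    }

That also makes the tradeoff you mention concrete: the cost of this scan
grows linearly with the number of slots.
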
> I did some performance tests with this, on an 8-core HP Proliant server, in
> a VM running under VMware vSphere 5.1. The tests were performed with Greg
> Smith's pgbench-tools kit, with one of two custom workload scripts:
>
> 1. Insert 1000 rows in each transaction. This is exactly the sort of
> workload where WALInsertLock currently becomes a bottleneck. Without the
> patch, the test scales very badly, with about 420 TPS with a single client,
> peaking only at 520 TPS with two clients. With the patch, it scales up to
> about 1200 TPS, with 7 clients. I believe the test becomes I/O limited at
> that point; looking at iostat output while the test is running shows about
> 200MB/s of writes, and that is roughly what the I/O subsystem of this
> machine can do, according to a simple test with 'dd ...; sync'. Or perhaps
> having more "insertion slots" would allow it to go higher - the patch uses
> exactly 7 slots at the moment.
>
> http://hlinnaka.iki.fi/xloginsert-scaling/results-1k/
>
> 2. Insert only 10 rows in each transaction. This simulates an OLTP workload
> with fairly small transactions. The patch doesn't make a huge difference
> with that workload. It performs somewhat worse with 4-16 clients, but then
> somewhat better with > 16 clients. The patch adds some overhead to flushing
> the WAL; I believe that's what's causing the slowdown with 4-16 clients. But
> with more clients, the WALInsertLock bottleneck becomes more significant,
> and you start to see a benefit again.
>
> http://hlinnaka.iki.fi/xloginsert-scaling/results-10/
>
> Overall, the results look pretty good. I'm going to take a closer look at
> the slowdown in the second test. I think it might be fixable with some
> changes to how WaitInsertionsToFinish() and WALWriteLock work together,
> although I'm not sure how exactly it ought to work.
>
> Comments, ideas?

Sounds good.

The patch applied cleanly and compiled successfully.
But when I ran initdb, I got the following assertion failure:

------------------------------------------
$ initdb -D data --locale=C --encoding=UTF-8
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C".
The default text search configuration will be set to "english".

creating directory data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
creating configuration files ... ok
creating template1 database in data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... TRAP: FailedAssertion("!(((uint64)
currpos) % 8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8)
- 1)) & ~((intptr_t) ((8) - 1))) || rdata_len == 0)", File: "xlog.c",
Line: 1363)
sh: line 1: 29537 Abort trap: 6           "/dav/hoge/bin/postgres"
--single -F -O -c search_path=pg_catalog -c exit_on_error=true
template1 > /dev/null
child process exited with exit code 134
initdb: removing data directory "data"
------------------------------------------
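For what it's worth, if I un-expand the macros by hand, the failed condition
seems to amount to the following (my reconstruction, so the exact macro names
are a guess):

    /*
     * Hand-reconstructed from the expanded assertion above: 8192 is the
     * WAL page size, and the "+ 7, & ~7" pattern looks like MAXALIGN.
     * In other words, record data must never start inside the page
     * header area, unless there is nothing to copy.
     */
    Assert(currpos % XLOG_BLCKSZ >= MAXALIGN(sizeof(XLogPageHeaderData)) ||
           rdata_len == 0);

So it looks like the copy position ends up pointing into the page header
area while rdata_len is non-zero.
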

I got the above problem on MacOS:

$ uname -a
Darwin hrk.local 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr  9
19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64

Regards,

-- 
Fujii Masao


