Re: Moving more work outside WALInsertLock - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Moving more work outside WALInsertLock
Msg-id 4EF43837.8040306@enterprisedb.com
In response to Re: Moving more work outside WALInsertLock  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Moving more work outside WALInsertLock
List pgsql-hackers
On 16.12.2011 15:42, Heikki Linnakangas wrote:
> On 16.12.2011 15:03, Simon Riggs wrote:
>> On Fri, Dec 16, 2011 at 12:50 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> On 16.12.2011 14:37, Simon Riggs wrote:
>>>>
>>>> I already proposed a design for that using page-level share locks any
>>>> reason not to go with that?
>>>
>>> Sorry, I must've missed that. Got a link?
>>
>> From nearly 4 years ago.
>>
>> http://grokbase.com/t/postgresql.org/pgsql-hackers/2008/02/reworking-wal-locking/145qrhllcqeqlfzntvn7kjefijey
>>
>
> Ah, thanks. That is similar to what I'm experimenting, but a second
> lwlock is still fairly heavy-weight. I think with many backends, you
> will be beaten badly by contention on the spinlocks alone.
>
> I'll polish up and post what I've been experimenting with, so we can
> discuss that.

So, here's a WIP patch of what I've been working on. WAL insertion 
is split into two stages:

1. Reserve the space from the WAL stream. This is done while holding a 
spinlock. The page holding the reserved space doesn't necessarily need to 
be in cache yet; the reservation can run ahead of the WAL buffer cache. 
(Quick testing suggests that an lwlock is too heavy-weight for this.)

2. Ensure the page is in the WAL buffer cache. If not, initialize it, 
evicting old pages if needed. Then finish the CRC calculation of the 
header and memcpy the record in place. (if the record spans multiple 
pages, it operates on one page at a time, to avoid problems with running 
out of WAL buffers)
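
The two stages above can be sketched roughly as follows. This is only a toy illustration, not code from the patch: the names (SketchWal, sketch_reserve, sketch_copy) are hypothetical, a C11 atomic_flag stands in for the spinlock, and the WAL is a single flat in-memory buffer, so page initialization, eviction, CRC calculation and multi-page records are all left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE 8192    /* hypothetical size of the toy WAL buffer */

typedef struct
{
    atomic_flag lock;           /* stand-in for the spinlock */
    uint64_t    next_pos;       /* next unreserved byte position */
    char        buf[SKETCH_BUF_SIZE];
} SketchWal;

static void
sketch_init(SketchWal *wal)
{
    atomic_flag_clear(&wal->lock);
    wal->next_pos = 0;
    memset(wal->buf, 0, sizeof(wal->buf));
}

/*
 * Stage 1: reserve len bytes. Only the position bump happens while the
 * lock is held, so the critical section is a few instructions long.
 */
static uint64_t
sketch_reserve(SketchWal *wal, uint64_t len)
{
    uint64_t    start;

    while (atomic_flag_test_and_set_explicit(&wal->lock, memory_order_acquire))
        ;                       /* spin until the lock is free */
    start = wal->next_pos;
    wal->next_pos += len;
    atomic_flag_clear_explicit(&wal->lock, memory_order_release);
    return start;
}

/*
 * Stage 2: copy the record into its reserved range. No lock is held, so
 * several callers can copy into disjoint ranges at the same time.
 */
static void
sketch_copy(SketchWal *wal, uint64_t start, const void *rec, uint64_t len)
{
    assert(start + len <= SKETCH_BUF_SIZE);     /* toy buffer never wraps */
    memcpy(wal->buf + start, rec, len);
}
```

Because each caller owns the range it reserved, the order in which the copies complete does not matter; only the reservation order determines where a record ends up.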

As long as wal_buffers is high enough, and the I/O can keep up, stage 2 
can happen in parallel in many backends. The WAL writer process 
pre-initializes new pages ahead of the insertions, so regular backends 
rarely need to do that.

When a page is written out, with XLogWrite(), you need to wait for any 
in-progress insertions to the pages you're about to write out to finish. 
For that, every backend has a slot with an XLogRecPtr in shared memory. 
It's set to the position where that backend is currently inserting to. 
If there's no insertion in-progress, it's invalid, but when it's valid 
it acts like a barrier, so that no-one is allowed to XLogWrite() beyond 
that position. That's very lightweight to the backends, but I'm using 
busy-waiting to wait on an insertion to finish ATM. That should be 
replaced with something smarter; that's the biggest missing part of the 
patch.
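
To make the slot mechanism concrete, here is a minimal sketch of how a writer might compute how far it is allowed to write. It is an assumption-laden illustration rather than the patch's code: SketchSlots, sketch_begin_insert, sketch_end_insert and sketch_write_limit are hypothetical names, positions are plain 64-bit integers instead of XLogRecPtr, and 0 is used as the "invalid" marker, so real positions are assumed to be nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NUM_BACKENDS 4   /* hypothetical fixed number of backends */
#define SKETCH_INVALID_POS  0   /* marks "no insertion in progress" */

typedef struct
{
    /* one slot per backend: the position it is currently inserting at */
    _Atomic uint64_t inserting_at[SKETCH_NUM_BACKENDS];
} SketchSlots;

static void
sketch_slots_init(SketchSlots *slots)
{
    for (int i = 0; i < SKETCH_NUM_BACKENDS; i++)
        atomic_init(&slots->inserting_at[i], SKETCH_INVALID_POS);
}

/* A backend publishes its insert position before copying its record. */
static void
sketch_begin_insert(SketchSlots *slots, int backend, uint64_t pos)
{
    atomic_store(&slots->inserting_at[backend], pos);
}

/* ... and marks its slot invalid again once the copy is done. */
static void
sketch_end_insert(SketchSlots *slots, int backend)
{
    atomic_store(&slots->inserting_at[backend], SKETCH_INVALID_POS);
}

/*
 * Return how far a writer may write right now when it wants to write up
 * to 'upto': the smallest valid slot position below 'upto', or 'upto'
 * itself if no in-progress insertion is in the way.
 */
static uint64_t
sketch_write_limit(SketchSlots *slots, uint64_t upto)
{
    uint64_t    limit = upto;

    for (int i = 0; i < SKETCH_NUM_BACKENDS; i++)
    {
        uint64_t    pos = atomic_load(&slots->inserting_at[i]);

        if (pos != SKETCH_INVALID_POS && pos < limit)
            limit = pos;
    }
    return limit;
}
```

A busy-waiting writer, as described above, would call sketch_write_limit() in a loop until it returns the requested position; a smarter wait mechanism would sleep until the blocking slot is cleared instead.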

One simple way to test the performance impact of this is:

psql -c "DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int4); CHECKPOINT" postgres
echo "BEGIN; INSERT INTO foo SELECT i FROM generate_series(1, 10000) i; ROLLBACK" > parallel-insert-test.sql
pgbench -n -T 10 -c4 -f parallel-insert-test.sql postgres

On my dual-core laptop, this patch increases the tps on that from about 
60 to 110.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

