Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Date
Msg-id 4F69C119.9050301@enterprisedb.com
Whole thread Raw
In response to Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)  (Robert Haas <robertmhaas@gmail.com>)
Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 21.03.2012 13:14, Fujii Masao wrote:
>      PANIC:  space reserved for WAL record does not match what was
> written, CurrPos: C/0, EndPos: B/FF000000
>
> So I think that the patch would have a bug which handles WAL boundary wrongly.

Thanks for the testing! These WAL boundary issues are really tricky, you 
found bugs in that area before, and I found and fixed one before posting 
the last version, and apparently there's still at least one left.

Overall, what do you (and others) think about the state of this patch? 
I'm starting to feel that this needs to be pushed to 9.3. That bug might 
not be very hard to fix, but the fact that these bugs are still cropping 
up at this late stage makes me uneasy. That boundary calculation in 
particular is surprisingly tricky, and I think it could be made less 
tricky with some refactoring of the WAL-logging code, replacing 
XLogRecPtr with uint64s, like Peter (IIRC) suggested a while ago. And 
that seems like 9.3 material. Also, there's still these two known issues:

1. It slows down the WAL insertion in a single backend by about 10%
2. With lots of backends inserting tiny records concurrently, you get 
spinlock contention, which consumes a lot of CPU. Without the patch, you 
get lwlock contention and bandwidth isn't any better, but you sleep 
rather than spin.

I'm afraid those issues aren't easily fixable. I haven't been able to 
identify the source of slowdown in the single-backend case, it seems to 
be simply the distributed cost of the extra bookkeeping. That might be 
acceptable, 10% slowdown of raw WAL insertion speed is not good, but WAL 
insertion only accounts for a fraction of the total CPU usage for any 
real workload, so I believe the slowdown of a real application would be 
more like 1-3%, at worst. But I would feel more comfortable if we had 
more time to test that.

The spinlock contention issue might be acceptable too. I think it would 
be hard to run into it in a real application, and even then, the 
benchmarks show that although you spend a lot of CPU time spinning, you 
get at least the same overall bandwidth with the patch, which is what 
really matters. And I think it could be alleviated by reducing the time 
the spinlock is held, and I think that could be done by making the space 
reservation calculations simpler. If we got rid of the limitation that 
the WAL record header is never split across WAL pages, and always stored 
the continuation record header on all WAL pages, the space reservation 
calculation could be reduced to essentially "currentpos += size". But 
that again seems 9.3 material.

So, although none of the issues alone is a show-stopper, but considering 
all these things together, I'm starting to feel that this needs to be 
pushed to 9.3. Thoughts?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Next
From: Robert Haas
Date:
Subject: Re: Proposal: Create index on foreign table