Re: Bulk Inserts - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Bulk Inserts
Date
Msg-id f67928030909152004i103d7c4by3353826ac21aaa5d@mail.gmail.com
Whole thread Raw
In response to Re: Bulk Inserts  (Pierre Frédéric Caillaud<lists@peufeu.com>)
Responses Re: Bulk Inserts
List pgsql-hackers
2009/9/15 Pierre Frédéric Caillaud <lists@peufeu.com>
Does that heuristic change the timings much?  If not, it seems like it would
better to keep it simple and always do the same thing, like log the tuples
(if it is done under one WALInsertLock, which I am assuming it is..)

It is the logging of whole pages that makes it faster.
If you fill a page with tuples in one operation (while holding exclusive lock) and then insert WAL records for each tuple, there is no speed gain.

Inserting a full page WAL record (since you just filled the page completely) :

- only takes WalInsertLock once instead of once per tuple


OK, that makes sense.  I thought you had hacked either XLogInsert or the heap WAL replay code so that you could just accumulate tuples in the rdata chain and then submit them all under the cover of a single WALInsertLock.  If you haven't done that, then of course doing the bulk insert doesn't help much if you still to tuple-by-tuple XLogInsert.  So in the case that it is under the limit, you first run through the tuples putting them into the block, then run through the tuples again doing the XLogInserts?
 
- reduces wal traffic
- is about 2x faster in my benchmark

And inserting a "clear new page" record (if the page was previously new/empty and relation is fsync'd at the end) :

- only takes WalInsertLock once instead of once per tuple
- reduces wal traffic a lot
- is about 4x faster in my benchmark

Do you have an IO bottleneck even in the absence of fsyncs?  My experience on multi-core machines with decent IO systems has been that the amount of WAL traffic (by volume) matters rather little, as opposed to the number WALInsertLocks taken, which matter quite a bit.  Of course this depends quite a bit on your OS and hardware.
 

...

Another angle of attack would be to make wal-writing more efficient...

If you mean to do this without changing the xlog interfaces, I'm not optimistic. 

If you have to call XLogInsert once per row that is copied (or insert...select), then my experiments show that simply taking the WALInsertLock and immediately releasing it, doing absolutely no real work while it is held, is already a substanial multi-core scalibility bottleneck.  Once we accept that this must be done, the next existing bottleneck is the memcpy of the first byte from the rdata chain into the shared wal_buffers, presumably because this copy involves fighting the cache line away from other cores.  Once you've copied the first byte, the rest of them seem to be almost free.  (Again, this is probably hardware and situation dependent). 

I've seen some suggestions that the wal_buffer block initation work be moved from being done by AdvanceXLInsert to instead be done by XLogWrite.  However, I've not seen any indication that AdvanceXLInsert is a meaningful bottlneck in the first place.  Except when wal_buffers is too small: then AdvanceXLInsert is a bottleneck, but only because XLogWrite is getting called from within it, in which case moving work from one to the other is probably not going to make things better.

Jeff

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: updated join removal patch
Next
From: Dan Colish
Date:
Subject: Re: [PATCH] pgbench: new feature allowing to launch shell commands