Re: Using multi-row technique with COPY - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Using multi-row technique with COPY
Msg-id 1133298645.2906.426.camel@localhost.localdomain
In response to Re: Using multi-row technique with COPY  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 2005-11-28 at 09:26 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I don't see why couldn't have an additional index access method entry
> > point to insert multiple rows on one call.
> 
> I think Simon was mainly on about the idea of inserting multiple *heap*
> entries with one call, ie, only one cycle of locking a heap buffer.
> It's not apparent to me that there'd be any noticeable win from batching
> index insertions, because even if the heap entries are all on one page,
> it's unlikely their index entries would be.

It might be a valid idea for monotonically increasing key indexes, but
I'll leave that idea alone for now. Tom's right: I was discussing heap
blocks only (for now).

> I'm a bit dubious about the whole thing from a concurrency standpoint,
> too: holding locks longer is not a great thing in general.  The recent
> patch to avoid repeated locking cycles during a read seems OK, because
> (1) checking visibility of a tuple is ordinarily pretty cheap, and
> (2) it's only a shared lock so other people can be doing the same thing
> concurrently.  However, heap insertion can make neither of those claims.
> You're talking about an exclusive lock, and you're talking about holding
> it while copying data around and then making multiple WAL entries.

I wasn't talking about holding locks for any longer than normal.

Here's the rough code, in heapam.c:

heap_bulk_insert(tuple, bool needsToasting)
{
    /* if tuple needs toasting, unpick the buffer */
    if (needsToasting)
    {
        for (i = 0; i < nTuplesBuffered; i++)
            heap_insert(tupleBuffer[i]);
        heap_insert(tuple);
        nTuplesBuffered = 0;
        sizeTupleBuffer = 0;
        return;
    }

    /*
     * If the buffer would overflow, write the whole tupleBuffer in
     * one go to a freshly allocated shared_buffer/table block, so
     * that we do only one buffer manager lock pair per block
     * rather than one per row.
     */
    tuple_length = length(tuple);
    if ((tuple_length + sizeTupleBuffer) > MAXDATAINBLOCK)
    {
        RelationGetBufferForTupleArray(nTuplesBuffered, tupleBuffer);
        tupleBuffer[0] = tuple;
        nTuplesBuffered = 1;
        sizeTupleBuffer = tuple_length;
    }
    else
    {
        tupleBuffer[nTuplesBuffered++] = tuple;
        sizeTupleBuffer += tuple_length;
    }

    return;
}

(needsToasting would be set for a tuple during parsing by COPY, which is
fine since it does actually know how long columns are. Actual decision
to toast or not can be wrapped in a modular call to heapam.c)

RelationGetBufferForTupleArray() goes in hio.c; it is very similar to
RelationGetBufferForTuple() but less complex, because it never needs to
handle the update case.
If we do choose to write WAL records for these inserts, then we can
simply log the whole block, rather than making repeated individual
insertions into the WAL buffers, so we save on WALInsertLock overhead
too. That's no more than we do after a checkpoint anyway, so no
problems.

If the transaction fails, then we lose the buffered tuples: so what?
That's slightly better than writing them to the table and then failing.
Most importantly, we don't actually allocate a new data block until we
decide to flush the tupleBuffer. So the exclusive lock is only held
momentarily while we insert all the rows from the tupleBuffer into the
fresh block.

There's no modularity weirdness, but we do need some code in COPY to
decide whether it should use fast mode or not.

Best Regards, Simon Riggs


