Re: Inserting heap tuples in bulk in COPY - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Inserting heap tuples in bulk in COPY
Msg-id: CA+U5nMJ_fHuwFLDCDvCBJt1jsVw4oenn46zGwaXt0-FEQj85Jw@mail.gmail.com
In response to: Re: Inserting heap tuples in bulk in COPY (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: Inserting heap tuples in bulk in COPY
           Re: Inserting heap tuples in bulk in COPY
List: pgsql-hackers
On 7 August 2012 20:58, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Aug 12, 2011 at 2:59 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 13.08.2011 00:17, Simon Riggs wrote:
>>>
>>> Also, we discussed that you would work on buffering the index inserts,
>>> which is where the main problem lies. The main heap is only a small
>>> part of the overhead if we have multiple indexes already built on a
>>> table - which is the use case that causes the most problem.
>>
>> Sure, if you have indexes on the table already, then the overhead of
>> updating them is more significant. I am actually working on that too, I
>> will make a separate post about that.
>
> Hi Heikki,
>
> Is the bulk index insert still an active area for you?
>
> If not, is there some kind of summary of design or analysis work
> already done, which would be useful for someone else interested in
> picking it up?

The main cost comes from repeatedly re-seeking down the index tree to
find the insertion point, though the repeated lock and pin operations on
index buffers are also significant. That is optimisable if the index
inserts are grouped, as they will be with monotonic indexes (e.g.
SERIAL), or with partially monotonic ones (i.e. with a Parent/Child
relationship, many inserts on the Child table will be of the form (x,1),
(x,2), (x,3) etc., e.g. Order/OrderLine). All we need do is buffer the
inserts in memory before inserting them into the main index. As long as
we flush the buffer before end of transaction, we're good.

Incidentally, we can also optimise repeated inserts within a normal
transaction using this method, by implementing deferred unique
constraints. At present we say that unique constraints aren't
deferrable, but there's no reason they can't be, if we allow buffering
in the index. (Implementing a deferred constraint that won't fail if we
do UPDATE foo SET pk = pk+1 requires a buffer the size of foo, which is
probably a bad plan, plus you'd need to sort the inputs, so that
particular nut is another issue altogether, AFAICS.)

The suggested implementation is to buffer index tuples at the generic
index API, then add a bulk-insert index API call that each AM can
implement separately (sketched below). The suggested buffer size is
work_mem. Note that we must extract the index tuples from the heap
tuples up front - refetching heap blocks to get the rows later is too
costly.

I think we need to implement buffering both to end of statement and to
end of transaction, not just one or the other.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
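For concreteness, the buffering layer might look very roughly like the
sketch below. All of the names in it (IndexInsertBuffer,
index_buffer_insert, index_buffer_flush, index_bulk_insert,
itup_compare) are illustrative only, not existing PostgreSQL symbols:

    /*
     * Sketch only.  The buffer holds index tuples, formed from the heap
     * tuples at insert time so heap blocks never need refetching, and is
     * flushed to the AM in one call when work_mem is exhausted, at end
     * of statement, or at end of transaction.
     */
    #include "postgres.h"

    #include "access/itup.h"
    #include "miscadmin.h"          /* for work_mem */
    #include "utils/rel.h"

    typedef struct IndexInsertBuffer
    {
        Relation    indexRel;   /* the index we are buffering for */
        IndexTuple *itups;      /* array sized to hold work_mem of tuples */
        int         nitups;     /* number of tuples currently buffered */
        Size        mem_used;   /* bytes buffered so far */
    } IndexInsertBuffer;

    static void index_buffer_flush(IndexInsertBuffer *buf);

    /* ordering is AM-specific; supplied by the AM in this sketch */
    static int  itup_compare(const void *a, const void *b);

    /* the proposed bulk-insert AM entry point */
    extern void index_bulk_insert(Relation indexRel,
                                  IndexTuple *itups, int nitups);

    /*
     * Queue one index tuple, flushing first if adding it would push the
     * buffer past work_mem.
     */
    static void
    index_buffer_insert(IndexInsertBuffer *buf, IndexTuple itup)
    {
        Size        itupsize = IndexTupleSize(itup);

        if (buf->mem_used + itupsize > (Size) work_mem * 1024)
            index_buffer_flush(buf);

        buf->itups[buf->nitups++] = itup;
        buf->mem_used += itupsize;
    }

    /*
     * Hand the whole batch to the AM in one call.  Sorting first groups
     * tuples that belong on the same leaf page, so the AM can hold its
     * lock and pin across neighbouring inserts instead of re-seeking
     * from the root each time.
     */
    static void
    index_buffer_flush(IndexInsertBuffer *buf)
    {
        qsort(buf->itups, buf->nitups, sizeof(IndexTuple), itup_compare);
        index_bulk_insert(buf->indexRel, buf->itups, buf->nitups);
        buf->nitups = 0;
        buf->mem_used = 0;
    }

The sort before the flush is where the win comes from: it turns one
root-to-leaf descent per tuple into roughly one descent per touched leaf
page, which is exactly the grouping that monotonic and partially
monotonic inserts already exhibit.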