Re: Bulk Insert tuning - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Bulk Insert tuning
Date
Msg-id 1204061796.4252.453.camel@ebony.site
Whole thread Raw
In response to Re: Bulk Insert tuning  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Bulk Insert tuning  (Bruce Momjian <bruce@momjian.us>)
Re: Bulk Insert tuning  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Following patch implements a simple mechanism to keep a buffer pinned
> > while we are bulk loading.
>
> This will fail to clean up nicely after a subtransaction abort, no?

Yes, will fix.

> (For that matter I don't think it's right even for a top-level abort.)
> And I'm pretty sure it will trash your table entirely if someone
> inserts into another relation while a bulk insert is happening.
> (Not at all impossible, think of triggers for instance.)

The pinned buffer is separate from the preferred block for each
relation; BulkInsertBuffer isn't used for determining the block to
insert into. If you try to insert into a block that differs from the
pinned one it unpins it and re-pins the new one. So it is always safe
with respect to the data in the table.

It can run into recursive bulk insert ops but that just destroys the
performance advantage, its not actually dangerous.

> >From a code structural point of view, we are already well past the
> number of distinct options that heap_insert ought to have.  I was
> thinking the other day that bulk inserts ought to use a ring-buffer
> strategy to avoid having COPY IN trash the whole buffer arena, just
> as we've taught COPY OUT not to.  So maybe a better idea is to
> generalize BufferAccessStrategy to be able to handle write as well
> as read concerns; or have two versions of it, one for writing and one
> for reading.  In any case the point being to encapsulate all these
> random little options in a struct, which could also carry along
> state that needs to be saved across a series of inserts, such as
> the last pinned buffer.

That was actually my first thought when I realised recursive ops were
possible. I don't think its necessary from a code correctness
perspective but it might be an appropriate re-factoring considering
those little bool-s seem to be breeding.

I think we need two Strategy types since CTAS would need one of each.
But then VACUUM is mid-way on that. Hmmm. Will consider.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: Bulk Insert tuning
Next
From: Euler Taveira de Oliveira
Date:
Subject: Re: lc_time and localized dates