Re: Bulk Insert tuning - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Bulk Insert tuning
Date
Msg-id 200803032059.m23Kxe012260@momjian.us
Whole thread Raw
In response to Re: Bulk Insert tuning  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
Added to TODO:

        o Consider using a ring buffer for COPY FROM

          http://archives.postgresql.org/pgsql-patches/2008-02/msg00140.php


---------------------------------------------------------------------------

Simon Riggs wrote:
> On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > Following patch implements a simple mechanism to keep a buffer pinned
> > > while we are bulk loading.
> >
> > This will fail to clean up nicely after a subtransaction abort, no?
>
> Yes, will fix.
>
> > (For that matter I don't think it's right even for a top-level abort.)
> > And I'm pretty sure it will trash your table entirely if someone
> > inserts into another relation while a bulk insert is happening.
> > (Not at all impossible, think of triggers for instance.)
>
> The pinned buffer is separate from the preferred block for each
> relation; BulkInsertBuffer isn't used for determining the block to
> insert into. If you try to insert into a block that differs from the
> pinned one it unpins it and re-pins the new one. So it is always safe
> with respect to the data in the table.
>
> It can run into recursive bulk insert ops but that just destroys the
> performance advantage, its not actually dangerous.
>
> > >From a code structural point of view, we are already well past the
> > number of distinct options that heap_insert ought to have.  I was
> > thinking the other day that bulk inserts ought to use a ring-buffer
> > strategy to avoid having COPY IN trash the whole buffer arena, just
> > as we've taught COPY OUT not to.  So maybe a better idea is to
> > generalize BufferAccessStrategy to be able to handle write as well
> > as read concerns; or have two versions of it, one for writing and one
> > for reading.  In any case the point being to encapsulate all these
> > random little options in a struct, which could also carry along
> > state that needs to be saved across a series of inserts, such as
> > the last pinned buffer.
>
> That was actually my first thought when I realised recursive ops were
> possible. I don't think its necessary from a code correctness
> perspective but it might be an appropriate re-factoring considering
> those little bool-s seem to be breeding.
>
> I think we need two Strategy types since CTAS would need one of each.
> But then VACUUM is mid-way on that. Hmmm. Will consider.
>
> --
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: CopyReadLineText optimization
Next
From: Bruce Momjian
Date:
Subject: Re: Reference by output in : \d