Re: Bulk Insert tuning - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Bulk Insert tuning
Date
Msg-id 1206538121.4285.1181.camel@ebony.site
Whole thread Raw
In response to Re: Bulk Insert tuning  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Following patch implements a simple mechanism to keep a buffer pinned
> > while we are bulk loading.
>
> This will fail to clean up nicely after a subtransaction abort, no?
> (For that matter I don't think it's right even for a top-level abort.)
> And I'm pretty sure it will trash your table entirely if someone
> inserts into another relation while a bulk insert is happening.
> (Not at all impossible, think of triggers for instance.)
>
> >From a code structural point of view, we are already well past the
> number of distinct options that heap_insert ought to have.  I was
> thinking the other day that bulk inserts ought to use a ring-buffer
> strategy to avoid having COPY IN trash the whole buffer arena, just
> as we've taught COPY OUT not to.

Agree with that. That was mentioned here again
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01080.php

What do you think of the "Full Block List" idea?

> So maybe a better idea is to
> generalize BufferAccessStrategy to be able to handle write as well
> as read concerns; or have two versions of it, one for writing and one
> for reading.  In any case the point being to encapsulate all these
> random little options in a struct, which could also carry along
> state that needs to be saved across a series of inserts, such as
> the last pinned buffer.

I'm trying to implement this, but it begins to look quite ugly.

First, if we allow multiple BulkInsertBuffers then we have to remember
them all in a list so we can unpin them all in case of abort. The change
isn't needed for correctness, as explained before.

Second, we need multiple BufferIOStrategy objects for various purposes.
There is no single default strategy, since normal inserts, normal
updates and toast all have different default behaviour. That makes it
ugly because we either need to differentiate between update/insert and
toast/main inserts - which leads to just as many options, or we need to
have lots of statically defined BufferIOStrategy objects for various
purposes.

I did agree that it was sensible to try to refactor the code, but now it
just looks like a big pile of ugly changes for no benefit. I'll save my
WIP so we can judge.

Coding it using flags that can be ORd together makes more sense than a
list of bools and will be easier to read. I'm going to try that approach
instead.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


pgsql-patches by date:

Previous
From: "Heikki Linnakangas"
Date:
Subject: Re: Consistent \d commands in psql
Next
From: "Heikki Linnakangas"
Date:
Subject: Re: \password in psql help