Re: Batch update of indexes on data loading - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Batch update of indexes on data loading
Date
Msg-id 1203840864.4247.2.camel@ebony.site
In response to Batch update of indexes on data loading  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: Batch update of indexes on data loading  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List pgsql-hackers
On Thu, 2008-02-21 at 13:26 +0900, ITAGAKI Takahiro wrote:
> This is a proposal for fast data loading using batch updates of indexes, for 8.4.
> It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and
> I'd like to integrate it so that it cooperates with other parts of postgres.
> 
> The basic concept is to spool incoming data, then merge the spool and
> the existing indexes into a new index at the end of data loading. It is
> 5-10 times faster than per-row index insertion, which is how 8.3 works.
> 
> 
> One of the problems is locking: index building in bulkload is similar to
> REINDEX rather than INSERT, so we need an ACCESS EXCLUSIVE lock during it.
> Bulk loading is not an upward-compatible method, so I'm thinking about
> adding a new "WITH LOCK" option to the COPY command.
> 
>   COPY tbl FROM 'datafile' WITH LOCK;
> 
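
For readers following the thread, here is a toy sketch of the spool-and-merge
idea described in the quoted proposal. It is not pg_bulkload code: a sorted
Python list stands in for a B-tree index, the (key, tuple-id) entries are
made up, and the names load_per_row and load_batched are hypothetical. It only
illustrates why deferring index maintenance to one sort plus one sequential
merge can replace many individual insertions.

```python
import bisect
import heapq

def load_per_row(index, rows):
    """Per-row style: insert each new entry into the sorted index as it arrives."""
    index = list(index)              # work on a copy of the existing index
    for entry in rows:
        bisect.insort(index, entry)  # one ordered insertion per loaded row
    return index

def load_batched(index, rows):
    """Batch style: spool entries, then merge spool and old index into a new index."""
    spool = []
    for entry in rows:
        spool.append(entry)          # no index maintenance during the load
    spool.sort()                     # sort the spool once, at end of load
    # One sequential pass over both sorted inputs builds the new index.
    return list(heapq.merge(index, spool))

# Hypothetical (key, tuple-id) index entries.
existing = [(10, "t1"), (30, "t2"), (50, "t3")]
incoming = [(40, "t4"), (20, "t5"), (60, "t6")]
new_index = load_batched(existing, incoming)
```

Both functions produce the same final index; the batched one builds a
replacement index rather than modifying the old one in place, which is the
REINDEX-like behaviour behind the locking concern in the proposal.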

I'm very excited to see these concepts going into COPY.

One reason I hadn't wanted to pursue earlier ideas that used LOCK is that
taking a lock prevents loads from running in parallel, which may
ultimately limit further performance gains.

Is there a way of doing this that will allow multiple concurrent COPYs?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 


