Re: Batch update of indexes on data loading - Mailing list pgsql-hackers

From ITAGAKI Takahiro
Subject Re: Batch update of indexes on data loading
Date
Msg-id 20080226135216.60CF.52131E4D@oss.ntt.co.jp
Whole thread Raw
In response to Re: Batch update of indexes on data loading  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Batch update of indexes on data loading  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> wrote:

> One of the reasons why I hadn't wanted to pursue earlier ideas to use
> LOCK was that applying a lock will prevent running in parallel, which
> ultimately may prevent further performance gains.
> 
> Is there a way of doing this that will allow multiple concurrent COPYs?

I think there is same difficulty as parallel queries. It requires tighter
communication among COPY threads whether we will use multi-process model
or multi-thread model.

We have independent concurrent COPYs now; COPYs are not aware of each
other because no intermediate status during COPY. However, COPY will
have "phases" if we use bulkbuild. Therefore, we will need joining
COPY threads and passing each working memories between threads.

Here is a possible multi-threaded workload:
 A. For each row:     1. Parsing new coming data     2. Add the row into the heap.     3. Spool index entries to each
indexspooler. B. Wait for all threads. C. Merge spools and corresponding existing indexes into new ones.
 

Phase A could be concurrently as same as now. A1 and A2 are independent
jobs. We could have shared spooler or per-thread spooler.
Phase B is needed to build indexes at once, or it will be double work.
Phase C could be concurrently for each indexes. A thread is responsible
to build one index. It merges the existing index and one shared spool
or multiple spools if we use per-thread spooler.

One of the issues is how to pass or share spoolers between COPY threads.
Another is how to make it transaction safe. If one of the thread fails to
build its index, all thread should be rollback.
I'm not sure how to do them...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_dump additional options for performance
Next
From: "Joshua D. Drake"
Date:
Subject: Re: 8.3 / 8.2.6 restore comparison