Toru SHIMOGAKI <shimogaki.toru@oss.ntt.co.jp> wrote:
> Andrew Dunstan wrote:
> > we could get a performance gain from building multiple indexes from a
> > single sequential pass over the base table?
>
> It is already implemented in pg_bulkload
> (http://pgbulkload.projects.postgresql.org/).

I think there are two ways to implement multiple index creation.
  1. Add multiple indexes AFTER data loading.
  2. Define multiple indexes BEFORE data loading.

pg_bulkload uses the 2nd way, but the TODO item seems to target
the 1st, right? -- Both are useful, though.
| Allow multiple indexes to be created concurrently, ideally via a
| single heap scan, and have pg_restore use it

In either case, we probably need to renovate the ambuild interface.
I'm thinking of inverting the control of the heap sequential scan:
the seq scan is currently done inside ambuild, but in the new method
it would be driven by an external loop.

Define a new IndexBuilder interface, something like:
    interface IndexBuilder {
        addTuple(IndexTuple tuple);
        finishBuild();
    }
and make ambuild() return an IndexBuilder instance implemented by each AM.
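
To make the inverted control concrete, here is a toy sketch in Python. It is
not PostgreSQL code: the class names, add_tuple/finish_build, and the
(tid, row) arguments are all invented for illustration, and the two builders
are only stand-ins for what each AM would provide. The point is just that the
external loop reads the heap once and feeds every builder.

```python
from typing import Any, Dict, Iterable, List, Protocol, Tuple

Row = Dict[str, Any]


class IndexBuilder(Protocol):
    """Hypothetical per-AM builder, mirroring the interface sketched above."""

    def add_tuple(self, tid: int, row: Row) -> None: ...
    def finish_build(self) -> Any: ...


class SortedIndexBuilder:
    """Toy stand-in for a btree-like AM: collect (key, tid) pairs, sort at the end."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.entries: List[Tuple[Any, int]] = []

    def add_tuple(self, tid: int, row: Row) -> None:
        self.entries.append((row[self.column], tid))

    def finish_build(self) -> List[Tuple[Any, int]]:
        return sorted(self.entries)


class HashIndexBuilder:
    """Toy stand-in for a hash-like AM: map each key to the tids that hold it."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.buckets: Dict[Any, List[int]] = {}

    def add_tuple(self, tid: int, row: Row) -> None:
        self.buckets.setdefault(row[self.column], []).append(tid)

    def finish_build(self) -> Dict[Any, List[int]]:
        return self.buckets


def build_indexes(heap: Iterable[Row], builders: List[IndexBuilder]) -> List[Any]:
    """External loop: a single pass over the heap feeds every builder."""
    for tid, row in enumerate(heap):
        for builder in builders:
            builder.add_tuple(tid, row)
    # Each builder finalizes its own structure once the scan is done.
    return [builder.finish_build() for builder in builders]


heap = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "a"}]
by_id, by_name = build_indexes(
    heap, [SortedIndexBuilder("id"), HashIndexBuilder("name")]
)
```

Note that the loop in build_indexes runs in a single process, so all the
builders share one CPU.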

However, this approach cannot use multiple CPUs if all indexes are built
in one process. A coarser-grained method, such as synchronized scans,
might be better for Postgres, as already pointed out.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center