Parallel index build during COPY - Mailing list pgsql-hackers

From	Jim C. Nasby
Subject	Parallel index build during COPY
Date	June 9, 2006 20:43:13
Msg-id	20060609204246.GG57289@pervasive.com
Responses	Re: Parallel index build during COPY
List	pgsql-hackers

It's not uncommon for index creation to account for a substantial share
of the time spent loading data, even when using the 'trick' of loading
the data before building the indexes. On fast RAID arrays, this can also
become a CPU-bound operation, so I've been wondering if there is some
reasonable way to parallelize it in the context of a restore from
pg_dump. Needless to say, that's a non-trivial proposition.

But the thought occurred to me: why re-read the table we just loaded
once for every index we create on it? If we're loading into an empty
table, we could feed newly created pages (or tuples) into sort
processes, one for each index. After the entire table is loaded, each
sort could then be finalized and the appropriate index written out.
It's unclear whether this would be a win on a small table, but avoiding
multiple read passes over a large table would almost certainly be a
win.
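To make the data flow concrete, here is a toy sketch in Python. It is not PostgreSQL code, and every name in it (load_with_index_builds, _sorter, the index key functions) is hypothetical. The loader appends each row to the "table" exactly once and hands it to one sorter per index; each sorter collects (key, row id) pairs and sorts them only after the load finishes, which stands in for finalizing the sort and writing out the index. Python threads share the GIL, so this only illustrates the single-pass fan-out, not any real CPU speedup; the proposal above would use separate sort processes.

```python
import queue
import threading
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]
_DONE = object()  # sentinel: tells a sorter the load is finished


def _sorter(q: queue.Queue, key_fn: Callable[[Row], Any],
            out: List[Tuple[Any, int]]) -> None:
    """Collect (key, row_id) pairs for one index, then sort once at the end."""
    entries: List[Tuple[Any, int]] = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        row_id, row = item
        entries.append((key_fn(row), row_id))
    # "Finalize" the sort; a real index build would write index pages here.
    entries.sort()
    out.extend(entries)


def load_with_index_builds(rows: Iterable[Row],
                           indexes: Dict[str, Callable[[Row], Any]]):
    """Load rows into an empty table in one pass, feeding every index's sorter."""
    table: List[Row] = []
    queues = {name: queue.Queue() for name in indexes}
    results: Dict[str, List[Tuple[Any, int]]] = {name: [] for name in indexes}
    workers = [
        threading.Thread(target=_sorter,
                         args=(queues[name], indexes[name], results[name]))
        for name in indexes
    ]
    for w in workers:
        w.start()
    for row in rows:
        row_id = len(table)      # position in the heap stands in for a TID
        table.append(row)        # the single write pass over the data
        for q in queues.values():
            q.put((row_id, row))  # fan the new tuple out to every sorter
    for q in queues.values():
        q.put(_DONE)
    for w in workers:
        w.join()
    return table, results
```

For example, loading rows [(3, "c"), (1, "b"), (2, "a")] with indexes {"by_id": lambda r: r[0], "by_name": lambda r: r[1]} yields by_id entries [(1, 1), (2, 2), (3, 0)] and by_name entries [("a", 2), ("b", 1), ("c", 0)], without ever re-reading the table after the load.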

If someone wants to hack up a patch to allow testing this, I can get
some benchmark numbers.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
