On Tuesday 26 February 2008, Joshua D. Drake wrote:
> > Think 100GB+ of data that's in a CSV or delimited file. Right now
> > the best import path is with COPY, but it won't execute very fast as
> > a single process. Splitting the file manually will take a long time
> > (time that could be spent loading instead) and substantially increase
> > disk usage, so the ideal approach would figure out how to load in
> > parallel across all available CPUs against that single file.
>
> You mean load from position? That would be very, very cool.
Did I mention pgloader now does exactly this when configured like this:

  http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading

  section_threads    = N
  split_file_reading = True
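(Those two options go in a per-section block of pgloader's INI-style
configuration; a minimal sketch could look like the lines below. Apart from
section_threads and split_file_reading, the section and key names here are
written from memory and should be checked against the manual linked above.)

  [pgsql]
  host = localhost
  base = mydb
  user = dim

  [bigcsv]
  table              = bigtable
  format             = csv
  filename           = /path/to/big.csv
  field_sep          = ,
  columns            = *
  section_threads    = 4
  split_file_reading = True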
IIRC, Simon and Greg Smith asked for pgloader to grow those parallel loading
features so we could get some first results and ideas about the performance
gain, as a first step toward designing a parallel COPY implementation in the
backend.
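In case it helps picture what split_file_reading does (and what a backend-side
parallel COPY would have to handle), here is a rough, hypothetical sketch of
the idea in Python, outside of pgloader itself: compute per-worker byte ranges
over the single input file, snap each cut to a line boundary, then run one
COPY per range in parallel. The connection string, table and file names are
placeholders, and it assumes CSV lines are much shorter than a chunk.

# Hypothetical sketch of "load from position", not pgloader's actual code.
import os
from multiprocessing import Pool

import psycopg2

DSN, TABLE, PATH, WORKERS = "dbname=test", "bigtable", "/path/to/big.csv", 4

class RangeReader:
    """File-like object exposing only the bytes in [start, end)."""
    def __init__(self, path, start, end):
        self.f = open(path, "rb")
        self.f.seek(start)
        self.left = end - start

    def read(self, size=-1):
        # psycopg2's copy_expert() pulls data through read(size)
        if size < 0 or size > self.left:
            size = self.left
        data = self.f.read(size)
        self.left -= len(data)
        return data

def boundaries(path, n):
    """Split the file into n byte ranges, snapping each cut to a line start.
    Assumes every line is much shorter than one chunk."""
    size = os.path.getsize(path)
    cuts = [i * size // n for i in range(n + 1)]
    with open(path, "rb") as f:
        for i in range(1, n):
            f.seek(cuts[i])
            f.readline()            # finish the line the cut landed in
            cuts[i] = f.tell()
    return list(zip(cuts[:-1], cuts[1:]))

def load_range(rng):
    """COPY one byte range of the file into the target table."""
    start, end = rng
    conn = psycopg2.connect(DSN)
    try:
        with conn, conn.cursor() as cur:
            cur.copy_expert("COPY %s FROM STDIN WITH CSV" % TABLE,
                            RangeReader(PATH, start, end))
    finally:
        conn.close()
    return end - start

if __name__ == "__main__":
    with Pool(WORKERS) as pool:
        loaded = pool.map(load_range, boundaries(PATH, WORKERS))
    print("loaded %d bytes in %d parallel chunks" % (sum(loaded), len(loaded)))

The only interesting part for the backend discussion is really the
boundaries() step: once each worker knows its (start, end) pair, the rest is
just a plain COPY over its own slice of the file.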
Hope this helps,
--
dim