Thread: Most effective and fast way to load a few TB of data from flat files into PostgreSQL
From: Dirk Krautschick
Hi,

what would be the fastest or most effective way to load a few (5-10) TB of data from flat files into a PostgreSQL database, including some 1 TB tables and blobs?

There is the COPY command, but it has no native parallelism, right? I have found pg_bulkload but haven't tested it yet. As far as I can see, EDB has its EDB*Loader as a commercial option.

Anything else to recommend?

Thanks and best regards

Dirk
Re: Most effective and fast way to load a few TB of data from flat files into PostgreSQL
From: "Peter J. Holzer"
On 2020-08-24 21:17:36 +0000, Dirk Krautschick wrote:
> what would be the fastest or most effective way to load few (5-10) TB
> of data from flat files into a postgresql database, including some 1TB
> tables and blobs?
>
> There is the copy command but there is no way for native parallelism,
> right? I have found pg_bulkload but haven't tested it yet. As far I
> can see EDB has its EDB*Loader as a commercial option.

A single COPY isn't parallel, but you can run several of them in
parallel (that's what pg_restore -j N does). So the total time may be
dominated by your largest table (or I/O bandwidth).

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
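[Editorial note: a minimal sketch of the "several COPYs in parallel" approach described above, for the archives. All names here (big_table, mydb, the chunk files) are hypothetical stand-ins, and the psql invocations are only printed rather than executed, so the snippet is safe to run without a database; drop the `echo` to run the loads for real.]

```shell
# Hypothetical sketch: split a flat file into chunks, then issue one
# \copy per chunk as a background job so the loads run concurrently.
printf 'id,val\n1,a\n2,b\n3,c\n4,d\n' > big_table.csv  # stand-in for a real dump

# Strip the header row and split the remainder into 2-line chunks
# (chunk_aa, chunk_ab, ...). For multi-TB files you would pick a far
# larger chunk size, e.g. split -l 10000000.
tail -n +2 big_table.csv | split -l 2 - chunk_

for f in chunk_*; do
    # Dry run: print the command each worker would execute. Remove the
    # leading "echo" (keeping the trailing "&") to actually load in parallel.
    echo psql -d mydb -c "\\copy big_table FROM '$f' WITH (FORMAT csv)" &
done
wait    # block until every background job has finished
```

The chunk count (and hence the `&`-backgrounded job count) is what controls the degree of parallelism, analogous to pg_restore's -j N.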
Re: [E] Re: Most effective and fast way to load a few TB of data from flat files into PostgreSQL
From: "Saha, Sushanta K"
You can also explore "pgloader".
.... Sushanta
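[Editorial note: pgloader is usually driven by a load command file; below is a minimal sketch of what one might look like for a CSV load. The file path, database name, and table name are hypothetical, and the option list is far from complete; see the pgloader reference for the full syntax.]

```
LOAD CSV
     FROM '/data/big_table.csv'
     INTO postgresql://localhost/mydb?big_table
     WITH skip header = 1,
          fields terminated by ','
;
```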
On Tue, Aug 25, 2020 at 7:24 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> On 2020-08-24 21:17:36 +0000, Dirk Krautschick wrote:
> > what would be the fastest or most effective way to load few (5-10) TB
> > of data from flat files into a postgresql database, including some 1TB
> > tables and blobs?
> >
> > There is the copy command but there is no way for native parallelism,
> > right? I have found pg_bulkload but haven't tested it yet. As far I
> > can see EDB has its EDB*Loader as a commercial option.
>
> A single COPY isn't parallel, but you can run several of them in
> parallel (that's what pg_restore -j N does). So the total time may be
> dominated by your largest table (or I/O bandwidth).
>
>         hp
Sushanta Saha|MTS IV-Cslt-Sys Engrg|WebIaaS_DB Group|HQ - VerizonWireless
O 770.797.1260 C 770.714.6555 Iaas Support Line 949-286-8810
Re: Most effective and fast way to load a few TB of data from flat files into PostgreSQL
From: Shaozhong SHI
On Tue, 25 Aug 2020 at 12:24, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> A single COPY isn't parallel, but you can run several of them in
> parallel (that's what pg_restore -j N does). So the total time may be
> dominated by your largest table (or I/O bandwidth).
This topic is interesting. Are there any examples of running COPY in parallel?
Regards,
SS