Thread: Fastest and most effective way to load a few TB of data from flat files into PostgreSQL

Hi,

what would be the fastest or most effective way to load a few (5-10) TB of data from flat files into
a PostgreSQL database, including some 1 TB tables and blobs?

There is the COPY command, but it has no native parallelism, right? I have found pg_bulkload
but haven't tested it yet. As far as I can see, EDB has its EDB*Loader as a commercial option.

Anything else to recommend?

Thanks and best regards

Dirk

On 2020-08-24 21:17:36 +0000, Dirk Krautschick wrote:
> what would be the fastest or most effective way to load a few (5-10)
> TB of data from flat files into a PostgreSQL database, including some
> 1 TB tables and blobs?
>
> There is the COPY command, but it has no native parallelism, right? I
> have found pg_bulkload but haven't tested it yet. As far as I can see,
> EDB has its EDB*Loader as a commercial option.

A single COPY isn't parallel, but you can run several of them in
parallel (that's what pg_restore -j N does). So the total time may be
dominated by your largest table (or I/O bandwidth).
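
For illustration, a rough sketch of that approach in Python with
psycopg2 (table name, file list, and connection string are all
placeholder values) could look like this:

    # One COPY per worker process; assumes the input is already split
    # into several CSV files that match the target table's columns.
    from concurrent.futures import ProcessPoolExecutor

    import psycopg2

    DSN = "dbname=target user=loader"        # placeholder DSN
    FILES = ["/data/part_01.csv", "/data/part_02.csv",
             "/data/part_03.csv", "/data/part_04.csv"]

    def copy_one(path):
        """Load one file with a single COPY on its own connection."""
        with psycopg2.connect(DSN) as conn, conn.cursor() as cur, \
             open(path, "r") as f:
            cur.copy_expert("COPY big_table FROM STDIN (FORMAT csv)", f)
        return path

    if __name__ == "__main__":
        # Four concurrent COPYs, analogous to pg_restore -j 4.
        with ProcessPoolExecutor(max_workers=4) as pool:
            for done in pool.map(copy_one, FILES):
                print("loaded", done)

Each worker gets its own connection and its own transaction, so a file
that fails can be re-run by itself without touching the others.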

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

You can also explore "pgloader".

.... Sushanta


On Tue, Aug 25, 2020 at 7:24 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> A single COPY isn't parallel, but you can run several of them in
> parallel (that's what pg_restore -j N does). So the total time may be
> dominated by your largest table (or I/O bandwidth).
> [...]


--

Sushanta Saha|MTS IV-Cslt-Sys Engrg|WebIaaS_DB Group|HQ - VerizonWireless
O 770.797.1260  C 770.714.6555 Iaas Support Line 949-286-8810



On Tue, 25 Aug 2020 at 12:24, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> A single COPY isn't parallel, but you can run several of them in
> parallel (that's what pg_restore -j N does). So the total time may be
> dominated by your largest table (or I/O bandwidth).
> [...]


This topic is interesting. Are there any examples of parallel COPY?

Regards,

SS
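
One way, sketched in Python with psycopg2 (placeholder names again, and
it assumes rows are newline-terminated, i.e. no embedded newlines inside
quoted CSV fields): pre-compute row-aligned byte offsets into the file,
then let each worker stream its own slice through a separate COPY:

    # Parallel COPY of ONE large file: split it into row-aligned byte
    # ranges and stream each range through its own COPY FROM STDIN.
    # Table, path, and DSN are placeholders; error handling omitted.
    import os
    from concurrent.futures import ProcessPoolExecutor

    import psycopg2

    DSN = "dbname=target user=loader"
    PATH = "/data/big_table.csv"
    WORKERS = 8

    def row_aligned_offsets(path, n):
        """Nominal n-way split points, advanced to row boundaries."""
        size = os.path.getsize(path)
        offsets = [0]
        with open(path, "rb") as f:
            for i in range(1, n):
                f.seek(i * (size // n))
                f.readline()            # finish the row we landed in
                offsets.append(min(f.tell(), size))
        offsets.append(size)
        return offsets

    class SliceReader:
        """File-like reader restricted to bytes lo..hi of a file."""
        def __init__(self, path, lo, hi):
            self.f = open(path, "rb")
            self.f.seek(lo)
            self.remaining = hi - lo

        def read(self, size=-1):
            if self.remaining <= 0:
                return b""
            if size < 0 or size > self.remaining:
                size = self.remaining
            data = self.f.read(size)
            self.remaining -= len(data)
            return data

    def copy_slice(bounds):
        lo, hi = bounds
        src = SliceReader(PATH, lo, hi)
        with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
            cur.copy_expert("COPY big_table FROM STDIN (FORMAT csv)", src)
        return bounds

    if __name__ == "__main__":
        offs = row_aligned_offsets(PATH, WORKERS)
        slices = [s for s in zip(offs, offs[1:]) if s[0] < s[1]]
        with ProcessPoolExecutor(max_workers=len(slices)) as pool:
            list(pool.map(copy_slice, slices))

Pre-splitting the file on disk (or loading many already-separate files,
as in the earlier sketch) is simpler when that's an option; the
byte-range trick mainly helps when one huge file can't easily be split
first.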