Hi,
@Francisco - Yeah, the file is around 600 MB currently, uncompressed.
You're right, our internet connection is going to be the limiting factor.
Essentially, the PostgreSQL server is in a datacentre, while the server we're dumping to is in the office.
Running a script on the PostgreSQL server in the datacentre is going to be tricky (not so much technically, just from a procedures/security point of view).
Dumping to a spare table is an interesting idea - so we'd create the table, COPY the query results into it, paginate through it with LIMIT/OFFSET, then drop the table afterwards?
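For what it's worth, here's roughly how I'm picturing that, as a minimal sketch assuming psycopg2 on the office side; the table names, connection details and page size below are all made up:

# A minimal sketch of the spare-table approach, assuming psycopg2.
# "source_table", "export_scratch", the connection string and PAGE_SIZE
# are placeholders for illustration only.
import psycopg2

PAGE_SIZE = 50000

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection details

# Materialise the query results once, server-side.
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE export_scratch AS SELECT * FROM source_table")

# Page through the scratch table; ORDER BY keeps the pages deterministic.
offset = 0
with conn, conn.cursor() as cur:
    while True:
        cur.execute(
            "SELECT * FROM export_scratch ORDER BY id LIMIT %s OFFSET %s",
            (PAGE_SIZE, offset),
        )
        rows = cur.fetchall()
        if not rows:
            break
        # ... append rows to the local dump file here ...
        offset += PAGE_SIZE

# Clean up.
with conn, conn.cursor() as cur:
    cur.execute("DROP TABLE export_scratch")
conn.close()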
Currently, I'm doing a quick hack where we download an ordered list of the ids (an auto-incrementing integer) into Python, chunk it into groups of ids, then use a WHERE ... IN clause to download each chunk via COPY.
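For reference, the current hack looks roughly like this (again a sketch assuming psycopg2; the table name, chunk size and output filename are illustrative):

# A minimal sketch of the current id-chunking hack, assuming psycopg2.
# "big_table", CHUNK_SIZE and "dump.csv" are placeholders.
import psycopg2

CHUNK_SIZE = 10000

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection details

# Pull down the full ordered list of ids first.
with conn, conn.cursor() as cur:
    cur.execute("SELECT id FROM big_table ORDER BY id")
    ids = [row[0] for row in cur.fetchall()]

# Fetch each chunk with COPY and append it to a local file.
with open("dump.csv", "wb") as out:
    with conn.cursor() as cur:
        for start in range(0, len(ids), CHUNK_SIZE):
            chunk = ids[start:start + CHUNK_SIZE]
            id_list = ",".join(str(i) for i in chunk)  # ids are ints, safe to inline
            sql = ("COPY (SELECT * FROM big_table WHERE id IN (%s) ORDER BY id) "
                   "TO STDOUT WITH CSV") % id_list
            cur.copy_expert(sql, out)
conn.close()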
Would dumping to a spare table and paginating be a better approach? If so, why? (Not challenging it, I just want to understand everything.)
Cheers,
Victor