Thread: Batch copying of databases

Batch copying of databases

From: "Tim Mohler"
Hi all,

I'm new to Postgres, and am not even the DBA for the system. I'm just a
sysadmin trying to make things run faster. Every month, we copy over a 25
million row table from the production server to the reporting server. Total
size is something like 40 gigabytes.

The copy in takes close to 24 hours, and I see the disks being hammered by
hundreds of small writes every second. The system is mostly waiting on I/O.
Is there any facility in Postgres to force batching of the I/O transactions
to something more reasonable than 8K?

Thanks for any advice,
Tim


Re: Batch copying of databases

From: Josh Berkus
Tim,

> I'm new to Postgres, and am not even the DBA for the system. I'm just a
> sysadmin trying to make things run faster. Every month, we copy over a 25
> million row table from the production server to the reporting server. Total
> size is something like 40 gigabytes.

Are you doing this through COPY files, or some other means?

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco

Re: Batch copying of databases

From: "Shridhar Daithankar"
On 5 Mar 2003 at 15:26, Tim Mohler wrote:

> Hi all,
>
> I'm new to Postgres, and am not even the DBA for the system. I'm just a
> sysadmin trying to make things run faster. Every month, we copy over a 25
> million row table from the production server to the reporting server. Total
> size is something like 40 gigabytes.
>
> The copy in takes close to 24 hours, and I see the disks being hammered by
> hundreds of small writes every second. The system is mostly waiting on I/O.
> Is there any facility in Postgres to force batching of the I/O transactions
> to something more reasonable than 8K?

Well, 8K has nothing to do with transactions in PostgreSQL; that is just the
block size it uses for on-disk storage.

You need to make sure of at least two things.

1. You are using COPY. By default, PostgreSQL runs each INSERT in its own
transaction, which is seriously slow for a bulk load. COPY bunches all the rows
into a single transaction and is quite fast (there is a short sketch of this
after point 2).

If you need to preprocess the data, batch something like 1K-10K records per
transaction.

2. PostgreSQL bulk load is not as fast as many of us would like, especially
compared to Oracle. So if you know you are going to bulk load using, say, COPY,
don't load the data from a single connection. Split the data file into say 5-10
parts and load all of them simultaneously (second sketch below). It does speed
things up; at the very least it saturates the disk bandwidth, which a single
load often does not.
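
To make point 1 concrete, here is a minimal sketch; the database name, table
name, and file path below are made up for illustration:

  # Hypothetical names throughout. A single COPY loads the whole file in
  # one transaction, so the per-row commit overhead disappears:
  psql -d reporting -c "\copy big_table FROM '/data/big_table.dat'"

  # If the rows must be preprocessed and fed in as INSERTs instead, wrap
  # each batch of a few thousand rows in an explicit transaction:
  #   BEGIN; INSERT ...; INSERT ...; COMMIT;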
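
And for point 2, a rough sketch of the parallel load, again with made-up names
(the line count per chunk is just a guess that yields 8-9 pieces out of 25
million rows):

  # Split the data file and load the pieces over parallel connections:
  split -l 3000000 /data/big_table.dat /data/part_
  for f in /data/part_*; do
      psql -d reporting -c "\copy big_table FROM '$f'" &
  done
  wait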

On a side note, for such a bulk load consider dropping any indexes and foreign
key constraints first and recreating them once the load is done.
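
For example, something along these lines (the table, index, constraint, and
column names are hypothetical):

  # Drop the index and FK constraint before loading...
  psql -d reporting -c "DROP INDEX big_table_cust_idx"
  psql -d reporting -c "ALTER TABLE big_table DROP CONSTRAINT big_table_cust_fk"

  # ...run the COPY, then rebuild them afterwards:
  psql -d reporting -c "CREATE INDEX big_table_cust_idx ON big_table (customer_id)"
  psql -d reporting -c "ALTER TABLE big_table ADD CONSTRAINT big_table_cust_fk
      FOREIGN KEY (customer_id) REFERENCES customers (id)"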

HTH


Bye
 Shridhar

--
Turnaucka's Law:    The attention span of a computer is only as long as its
electrical cord.