On 5 Mar 2003 at 15:26, Tim Mohler wrote:
> Hi all,
>
> I'm new to Postgres, and am not even the DBA for the system. I'm just a
> sysadmin trying to make things run faster. Every month, we copy over a 25
> million row table from the production server to the reporting server. Total
> size is something like 40 gigabytes.
>
> The copy in takes close to 24 hours, and I see the disks being hammered by
> hundreds of small writes every second. The system is mostly waiting on I/O.
> Is there any facility in Postgres to force batching of the I/O transactions
> to something more reasonable than 8K?
Well, 8K has nothing to do with transactions in PostgreSQL; that is just the
block size it uses for disk I/O. You need to check at least two things.
1. You are using COPY. By default PostgreSQL runs each INSERT in its own
transaction, which is seriously slow for a bulk load. COPY bunches all the rows
into a single transaction and is quite fast.
If you need to preprocess the data and go through INSERTs, batch something like
1K-10K records into a single transaction.
2. PostgreSQL bulk load is not as fast as many of us would like, especially
when compared to Oracle. So if you know you are going to bulk load using, say,
COPY, don't load the data over a single connection. Split the data file into
say 5-10 parts and load all of them simultaneously, as in the sketch below. It
does speed things up; at the very least it saturates the disk bandwidth, which
a single loading session often does not.
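Something along these lines might do. It is just an untested sketch, not taken
from your setup: it assumes the dump is a plain tab-separated COPY file already
split into part_* chunks, the target table is called bigtable, and the psycopg2
Python driver is available.

import glob
import threading

import psycopg2

def load_chunk(path):
    # One connection per chunk so the loads really run in parallel.
    conn = psycopg2.connect(dbname="reporting")
    cur = conn.cursor()
    with open(path) as f:
        # COPY pulls the whole chunk in as a single transaction.
        cur.copy_expert("COPY bigtable FROM STDIN", f)
    conn.commit()
    conn.close()

threads = [threading.Thread(target=load_chunk, args=(p,))
           for p in sorted(glob.glob("part_*"))]
for t in threads:
    t.start()
for t in threads:
    t.join()

Each chunk still goes in through COPY, so every connection gets one big
transaction instead of millions of tiny ones.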
On a side note, for a bulk load this size consider dropping any indexes and
foreign key constraints before loading and recreating them once the load is
done.
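For example (again only a sketch; the index, constraint and column names below
are made up, so substitute whatever \d bigtable shows):

import psycopg2

conn = psycopg2.connect(dbname="reporting")
cur = conn.cursor()

# Drop the expensive bits before loading (names are hypothetical).
cur.execute("ALTER TABLE bigtable DROP CONSTRAINT bigtable_cust_fk")
cur.execute("DROP INDEX bigtable_dt_idx")
conn.commit()

# ... run the bulk load here ...

# Rebuild once at the end instead of maintaining them row by row.
cur.execute("CREATE INDEX bigtable_dt_idx ON bigtable (dt)")
cur.execute("ALTER TABLE bigtable ADD CONSTRAINT bigtable_cust_fk "
            "FOREIGN KEY (cust_id) REFERENCES customers (id)")
conn.commit()
conn.close()

Rebuilding an index once over the full table is much cheaper than updating it
for every row during the load.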
HTH
Bye
Shridhar
--
Turnaucka's Law: The attention span of a computer is only as long as its
electrical cord.