Thread: postgresql multiple insert slow

postgresql multiple insert slow

From: "Michael L. Hostbaek"
Hello, 


I've got a table in an oracle database with approx. 100000 records that
I'd like to put into a table in a postgresql database. (This should be
done a couple of times per week.)

I have written a short perl script, on a server that has remote access
to both the oracle database as well as the postgresql database. I am
running postgresql 7.4.1 on FreeBSD.

My perl script looks something like this:

[...]
my $sth2 = $cnx2->prepare('SELECT * FROM oracle_table');
my $res2 = $sth2->execute();

while(my($field2,$field5,$field6) = ($sth2->fetchrow_array)) {
    if(defined($field2)) {
        my $sth = $cnx->prepare('INSERT INTO the_pg_table(field1, field2) VALUES(?,?)');
        my $result = $sth->execute($field2,$field5);
        $sth->finish;
    }
}
[...]

It runs fine - and I get no errors - but it takes almost 25 minutes to
complete. I tried running the script while just grabbing the rows from
the oracle database and writing them to a text file - and then it only takes
a couple of minutes. So it must be the INSERT command that chokes - is
there a better way to do it?

Any advice much appreciated.

/mich


-- 
Best Regards,
Michael L. Hostbaek
*/ PGP-key available upon request /*


Re: postgresql multiple insert slow

From: Tom Lane
"Michael L. Hostbaek" <mich@freebsdcluster.org> writes:
> It runs fine - and I get no errors - but it takes almost 25 minutes to
> complete.. I tried running the script while just grabbing the rows from
> the oracle database and writing to a text file - and then it only takes
> a couple of minutes .. So it must be the INSERT command that chokes - is
> there a better way to do it ? 

You probably want to add BEGIN/COMMIT operations around the loop.  See
the documentation's tips on bulk data loading:
http://www.postgresql.org/docs/7.4/static/populate.html
        regards, tom lane
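A minimal sketch of that change, reusing the table and column names from the original script (begin_work/commit are DBI's interface to BEGIN/COMMIT; preparing the INSERT once outside the loop is an additional, smaller win):

```perl
# Sketch only: assumes DBI handles $cnx (PostgreSQL) and $cnx2 (Oracle)
# are already connected, as in the original script.
my $sth2 = $cnx2->prepare('SELECT * FROM oracle_table');
$sth2->execute();

# Prepare the INSERT once, not on every iteration.
my $sth = $cnx->prepare(
    'INSERT INTO the_pg_table(field1, field2) VALUES (?, ?)');

$cnx->begin_work;            # BEGIN: one transaction for all rows
while (my ($field2, $field5, $field6) = $sth2->fetchrow_array) {
    next unless defined $field2;
    $sth->execute($field2, $field5);
}
$cnx->commit;                # COMMIT: one commit instead of 100000
$sth->finish;
```

With one transaction, PostgreSQL only has to flush the WAL to disk once at the COMMIT, instead of once per INSERT.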


Re: postgresql multiple insert slow

From: Christopher Browne
Centuries ago, Nostradamus foresaw when mich@freebsdcluster.org ("Michael L. Hostbaek") would write:
> It runs fine - and I get no errors - but it takes almost 25 minutes to
> complete.. I tried running the script while just grabbing the rows from
> the oracle database and writing to a text file - and then it only takes
> a couple of minutes .. So it must be the INSERT command that chokes - is
> there a better way to do it ? 
>
> Any advice much appreciated.

The problem here is that each insert implicitly sets up a transaction
BEGIN and COMMIT.

If you were to wrap the loop with a BEGIN at the start and a COMMIT at
the end, you would likely find it would complete much, much faster.
-- 
output = ("cbbrowne" "@" "acm.org")
http://www3.sympatico.ca/cbbrowne/wp.html
"Java and C++ make you think that the new ideas are like the old ones.
Java is the most distressing thing to hit computing since MS-DOS."
-- Alan Kay


Re: postgresql multiple insert slow

From: Harald Fuchs
In article <20040219163238.GD10913@mich2.itxmarket.com>,
"Michael L. Hostbaek" <mich@freebsdcluster.org> writes:

> Hello, 
> I've got a table in an oracle database with approx. 100000 records, that
> I'd like to put into a table in a postgresql database. (This should be
> done a couple of times per week)

> I have written a short perl script, on a server that has remote access
> to both the oracle database as well as the postgresql database. I am
> running postgresql 7.4.1 on FreeBSD.

> My perl script looks something like this:

> [...]
> my $sth2 = $cnx2->prepare('SELECT * FROM oracle_table');
> my $res2 = $sth2->execute();

> while(my($field2,$field5,$field6) = ($sth2->fetchrow_array)) {
>     if(defined($field2)) {
>         my $sth = $cnx->prepare('INSERT INTO
> the_pg_table(field1, field2) VALUES(?,?)');
>         my $result = $sth->execute($field2,$field5);
>         $sth->finish;

>     }
> }
> [...]

> It runs fine - and I get no errors - but it takes almost 25 minutes to
> complete.. I tried running the script while just grabbing the rows from
> the oracle database and writing to a text file - and then it only takes
> a couple of minutes .. So it must be the INSERT command that chokes - is
> there a better way to do it ? 

First of all, you should prepare the insert statement only once,
outside of the loop.  Then you could use fetchrow_arrayref instead of
fetchrow_array; this should eliminate a copy operation.

But the biggest win would be not to use INSERT at all.  Instruct
Oracle to dump the rows into a CSV file, and then do just

$cnx->do("COPY the_pg_table FROM 'csv.file'");

(Note that COPY FROM a file is executed by the server process, so the
file must be readable on the database server itself and the command
requires superuser rights.)
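If getting a file onto the database server is inconvenient, the rows can also be streamed through COPY ... FROM STDIN directly from the Perl script, which avoids both the per-row INSERT overhead and the server-side file. A sketch under the assumption of a reasonably recent DBD::Pg (pg_putcopydata/pg_putcopyend; older releases exposed this as pg_putline/pg_endcopy instead):

```perl
# Sketch: stream rows from Oracle straight into PostgreSQL via COPY.
# Assumes connected DBI handles $cnx2 (Oracle) and $cnx (PostgreSQL).
my $sth2 = $cnx2->prepare('SELECT field2, field5 FROM oracle_table');
$sth2->execute();

$cnx->do('COPY the_pg_table (field1, field2) FROM STDIN');
while (my $row = $sth2->fetchrow_arrayref) {   # arrayref: no extra copy
    next unless defined $row->[0];
    # COPY's default text format: tab-separated, newline-terminated.
    # (Real data may need tabs, newlines and backslashes escaped.)
    $cnx->pg_putcopydata(join("\t", @$row) . "\n");
}
$cnx->pg_putcopyend();                         # finish the COPY
```

Like the single-transaction approach, this does all the work in one command, but COPY additionally skips per-statement parse/plan overhead, so it is usually the fastest option for bulk loads.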