Thread: slow inserts
I am currently using a Perl data loader that was set up to load data into three particular tables. The majority of the data goes into one table. The scripts obtain the data by parsing it out of an Excel spreadsheet. My loads seem to take a very long time: each file only has about 12,000 rows, yet it takes close to 25 minutes to load one file. I have added some debugging output to the code, and it appears that the extra time is related to Postgres; I had originally thought it might have to do with the parsing of the Excel file.
I have tried turning off FSYNC, but there is no difference in load time. I also tried removing the indexes from the table into which most of the data is loaded, but no luck. By the way, is there any way to confirm that the FSYNC option was actually turned off? A way to view its status or something?
Any thoughts on what might be going on, or another performance tuning trick that I have not thought of?
Thanks for your help,
Jodi Kanter
_______________________________
Jodi L Kanter
BioInformatics Database Administrator
University of Virginia
(434) 924-2846
jkanter@virginia.edu
Jodi,
I believe DBI defaults to autocommitting queries, which slows it down. If you aren't specifically turning autocommit off, I'd suggest doing so; then all the inserts go in one transaction, which is much faster. You'd do something like:
DBI->connect("dbi:Pg:dbname=dbname;host=hostname", 'user', 'password', {AutoCommit => 0})
Thanks,
Peter Darley
Jodi-

Have you tried turning autocommit off & doing a single commit after the load?

-NickF
None of the data is actually committed to the database until the scripts complete, so I believe that autocommit is turned off.
Hi Jodi,

> None of the data is actually committed to the database until the scripts
> complete, so I believe that autocommit is turned off.

What if you write the output of your script into a separate file and pipe it to psql as input? What I mean is to strip off the processing time for the "Excel part". Does it still take 25 minutes to do the job? We often insert data in similar quantities (10,000 - 100,000 rows per job) within a few seconds or minutes.

A few months ago I had the same problem "writing" a dbf file from Postgres data: the SELECT statement took milliseconds, but the conversion into dbf format seemed to be endless.

BTW: We also had a table (tens of thousands of rows, vacuumed daily) that was rather slow during inserts (PostgreSQL 7.1.2). After upgrading to version 7.1.3 and completely rebuilding the tables, the problem went away.

Hope it helps,
R. Luettecke

--
MICHAEL TELECOM AG
Bruchheide 34 - 49163 Bohmte
Fon: +49 5471 806-0
rolf.luettecke@michael-telecom.de
http://www.michael-telecom.de
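One way to try this is to have the loader dump its parsed rows to a plain text file and then load that file with psql in a separate step, so the Excel-parsing time and the database time can be measured on their own. A minimal sketch, assuming a hypothetical table "expression_data" and stubbing out the parsing with fixed rows:

#!/usr/bin/perl
# Sketch only: write tab-separated rows to a file instead of inserting
# them directly; the rows below stand in for the parsed Excel data.
use strict;
use warnings;

my @rows = ( [ 1, 'probe_a', 0.52 ],
             [ 2, 'probe_b', 1.73 ] );

open my $out, '>', 'load_data.tsv' or die "cannot open load_data.tsv: $!";
print {$out} join("\t", @$_), "\n" for @rows;
close $out;

# Then load the file in one step from the shell, for example:
#   psql dbname -c "\copy expression_data from load_data.tsv"

If the file is written in seconds but the psql load is still slow, the time really is going to Postgres; if the load is fast, the bottleneck is on the parsing side.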
Jodi Kanter wrote:
> I am currently using a Perl data loader that was set up to load data to three particular
> tables. (snip)
> I have placed some debugging syntax in the code and it seems that the extra time is
> related to postgres; I had originally thought it may have to do with the parsing of
> the Excel file.

You don't mention it, but I assume you are using DBI/Pg. Are you sure you are setting up your insert query handles once and then reusing them? Setting up a query handle takes a lot of time, i.e.

my $dbh = DBI->connect("dbi:Pg:...");

my $insh = $dbh->prepare("INSERT INTO mytable VALUES (?,?,?)");
foreach my $excelrow (@excelrows) {
    # parse $excelrow into $data1, $data2, $data3
    $insh->execute($data1, $data2, $data3);
}

I have written a few scripts of that kind myself, and I was really surprised how much it mattered when I managed to move a $dbh->prepare out of the insert loop.

regards
--
Morten Sickel
Norwegian Radiation Protection Authority
<ZIP>
> my $dbh = DBI->connect("dbi:Pg:...");
>
> my $insh = $dbh->prepare("INSERT INTO mytable VALUES (?,?,?)");
$dbh->begin_work;
> foreach my $excelrow (@excelrows) {
>     # parse $excelrow into $data1, $data2, $data3
>     $insh->execute($data1, $data2, $data3);
> }
$dbh->commit;
>
> I have written a few scripts of that kind myself, and I was really
> surprised how much it mattered when I managed to move a $dbh->prepare out
> of the insert loop.

Try to use transactions; that increases the speed too.

Regards,
Ferdinand
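Spelled out, the begin_work/commit pattern looks like this (a sketch with an invented table; it is equivalent to passing AutoCommit => 0 at connect time, but scoped to the load itself):

#!/usr/bin/perl
# Sketch: begin_work turns AutoCommit off until commit, so the whole
# load becomes a single transaction. Table and rows are placeholders.
use strict;
use warnings;
use DBI;

my $dbh  = DBI->connect("dbi:Pg:dbname=dbname", 'user', 'password', { RaiseError => 1 });
my $insh = $dbh->prepare("INSERT INTO mytable (a, b, c) VALUES (?, ?, ?)");

$dbh->begin_work;
$insh->execute(@$_) for ( [ 1, 'x', 0.5 ], [ 2, 'y', 1.7 ] );   # stand-in rows
$dbh->commit;
$dbh->disconnect;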