psql vs perl prepared inserts - Mailing list pgsql-general

From Matt Van Mater
Subject psql vs perl prepared inserts
Date
Msg-id 424db35505041214582ebef95d@mail.gmail.com
Whole thread Raw
Responses Re: psql vs perl prepared inserts  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: psql vs perl prepared inserts  (Dawid Kuroczko <qnex42@gmail.com>)
List pgsql-general
I've been experimenting with loading a large amount of data into a
fairly simple database using both psql and perl prepared statements.
Unfortunately I'm seeing no appreciable differences between the two
methods, where I was under the impression that prepared statements
should be much faster (in my case, they are slightly slower).

I have included a pseudocode/subset of the perl code I use below.  You
can see the prepare statement outside the loop and the execute
statement inside the loop.  Alternatively you can see that I write
every INSERT statement to a text file which I then load by using `psql
dbname -f bulksql_load.sql`.  Normally I only have either the prepare
or the print-to-file in the loop, but i've included both to make the
pseudocode smaller.

Using a simple `time` command from my system it looks like the
execution time for loading all the data in both scenarios is about 50
seconds.  FYI, the infile has 18k lines, and after parsing and the db
enforcing uniqueness there are 15.5k rows in the results table of the
db.  This works out to ~300 inserts per second with on pgsql 7.3.2
with fsync turned off.  I think that is a decent performance for this
old box, I'm just confused as to why the prepared statements don't
seem to give any speed boost as advertised.

Could the fact that many of my inserts have 15 columns slow down the
prepared statements to the point where they're no longer useful as a
speed enhancement?  Or perhaps it's because I'm explicitly inserting
each field/value pair, even if many of them are null (which I think is
generally considered a good habit).  Any other suggestions you might
have would be welcome.  Please forgive me if I should have posted this
to pgsql-performance or some perl list, but I think the comparison
with psql makes it relevant.


Hopefully this won't be too messy:

#!/usr/bin/perl

open (IN,"scan.nbe");
open (OUT,">bulksql_load.sql");


use DBI;
$dbh = DBI->connect("dbi:Pg:dbname=nessus");
$sth_res  = $dbh->prepare("INSERT INTO results

(scan_id,subnet,host,service,plugin_id,type,description,see_also,solution,risk_factor,cve_id,bid_id,port,proto,service_name)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
$sth_tim  = $dbh->prepare("INSERT INTO timestamps
(scan_id,hostip_or_nul,start_or_end,scan_time) VALUES (?,?,?,?)");

while (<IN>){

    if (/^timestamps/){
        parse_ts();

        $sth_tim->execute($scan_id,$hostip_or_nul,$start_or_end,$scan_time);

        print OUT "INSERT INTO timestamps
(scan_id,hostip_or_nul,start_or_end,scan_time) VALUES
($scan_id,$hostip_or_nul,$start_or_end,$scan_time)\n";

    } elsif (/^results/) {
        parse_res();


$sth_res->execute($scan_id,$subnet,$host,$service,$plugin_id,$type,$description,$see_also,$solution,$risk_factor,$cve_id,$bid_id,$port,$proto,$service_name);

        print OUT "INSERT INTO results

(scan_id,subnet,host,service,plugin_id,type,description,see_also,solution,risk_factor,cve_id,bid_id,port,proto,service_name)
VALUES
($scan_id,$subnet,$host,$service,$plugin_id,$type,$description,$see_also,$solution,$risk_factor,$cve_id,$bid_id,$port,$proto,$service_name)\n";

    }
}

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Composite type versus Domain constraints.
Next
From: Tom Lane
Date:
Subject: Re: psql vs perl prepared inserts