Thread: Array interface
I wrote a little Perl script intended to test the difference that array insert makes with PostgreSQL. Imagine my surprise when a single record insert into a local database was faster than batches of 100 records. Here are the two respective routines:

sub do_ssql {
    my $exec_cnt = 0;
    while (<FL>) {
        chomp;
        my @row = split /$sep/;
        $sth->execute(@row);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}

sub do_msql {
    my $bsz = shift;
    die("Batch size must be >0!\n") unless $bsz > 0;
    my $exec_cnt = 0;
    my @tstat;
    my (@col1, @col2, @col3);
    while (<FL>) {
        chomp;
        my @row = split /$sep/;
        push @col1, $row[0];
        push @col2, $row[1];
        push @col3, $row[2];
        if ($. % $bsz == 0) {
            my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                             \@col1, \@col2, \@col3);
            die("Multiple insert failed!\n") if (!$tuples);
            @col1 = ();
            @col2 = ();
            @col3 = ();
            $exec_cnt++;
        }
    }
    if ($#col1 >= 0) {
        my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                         \@col1, \@col2, \@col3);
        die("Multiple insert failed!\n") if (!$tuples);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}

The variable "$sth" is a prepared statement handle.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions
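For reference, the routines above rely on a prepared handle $sth, a database handle $dbh, a field separator $sep, and a file handle FL, none of which the post shows. A minimal sketch of that setup, assuming a placeholder connection string, data file, table, and column names (none of these come from the original post):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection - AutoCommit off so $dbh->commit() in the
    # routines above actually ends the transaction.
    my $dbh = DBI->connect('dbi:Pg:dbname=testdb', '', '',
                           { AutoCommit => 0, RaiseError => 1 });

    # One INSERT with three bind variables; both do_ssql() and do_msql()
    # reuse this same prepared handle.
    my $sth = $dbh->prepare(
        'INSERT INTO test_tab (col1, col2, col3) VALUES (?, ?, ?)');

    my $sep = ';';    # field separator assumed by split /$sep/
    open(FL, '<', 'data.txt') or die "Cannot open data file: $!\n";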
On 03/11/10 08:46, Mladen Gogala wrote:
> I wrote a little Perl script intended to test the difference that
> array insert makes with PostgreSQL. Imagine my surprise when a single
> record insert into a local database was faster than batches of 100
> records. Here are the two respective routines:

Interesting - I'm seeing a modest but repeatable improvement with bigger array sizes (using the attached program to insert pgbench_accounts) on an older dual-core AMD box with a single SATA drive running Ubuntu 10.04 i686:

rows      arraysize  elapsed(s)
1000000   1          161
1000000   10         115
1000000   100        110
1000000   1000       109

This is *despite* the fact that tracing the executed SQL (by setting log_min_duration_statement = 0) shows *no* difference between the runs: 1000000 INSERT executions are performed in each case. I'm guessing that some Perl driver overhead is being saved here. I'd be interested to see whether you can reproduce the same or a similar effect.

What might also be interesting is doing each INSERT with an array-load of bind variables appended to the VALUES clause - as this will only do one insert call per "array" of values.

Cheers

Mark
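A minimal sketch of that multi-VALUES idea, reusing the hypothetical $dbh, FL, $sep, and test_tab from the setup sketch above: buffer one (?,?,?) tuple's worth of fields per row, then bind all of them in a single do() call per batch.

    # Build one INSERT whose VALUES clause holds $bsz tuples, so a single
    # "insert call" covers $bsz rows (assumes three fields per input line).
    sub do_multivalues {
        my $bsz = shift;
        my (@binds, $rows);
        while (<FL>) {
            chomp;
            push @binds, split /$sep/;
            $rows++;
            if ($rows == $bsz) {
                my $values = join ',', ('(?,?,?)') x $rows;
                $dbh->do("INSERT INTO test_tab (col1, col2, col3) VALUES $values",
                         undef, @binds);
                @binds = ();
                $rows  = 0;
            }
        }
        # Flush any remaining partial batch.
        if ($rows) {
            my $values = join ',', ('(?,?,?)') x $rows;
            $dbh->do("INSERT INTO test_tab (col1, col2, col3) VALUES $values",
                     undef, @binds);
        }
        $dbh->commit();
    }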
On 10/11/10 22:10, Mark Kirkwood wrote:
>
> What might also be interesting is doing each INSERT with an array-load
> of bind variables appended to the VALUES clause - as this will only do
> one insert call per "array" of values.

This is probably more like what you were expecting:

rows      num VALUES tuples (i.e. array size)  elapsed(s)
1000000   1                                    106
1000000   10                                   14
1000000   100                                  13
1000000   1000                                 14

I didn't try to use PREPARE + EXECUTE here, just used "do" with an INSERT containing (array size) VALUES tuples (EXECUTE could well be faster). The obvious difference here is that we only make rows/(array size) insert calls.

Cheers

Mark
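For the PREPARE + EXECUTE variant that wasn't tried, a hedged sketch along the same lines (again using the hypothetical test_tab and file handle from above): every full batch has the same shape, so the multi-row statement can be prepared once and reused across execute() calls, with a one-off do() only for the final partial batch.

    # Prepare the $bsz-tuple INSERT once, so the statement (and its plan)
    # is reused for every full batch instead of being re-parsed each time.
    sub do_multivalues_prepared {
        my $bsz = shift;
        my $values = join ',', ('(?,?,?)') x $bsz;
        my $msth = $dbh->prepare(
            "INSERT INTO test_tab (col1, col2, col3) VALUES $values");
        my (@binds, $rows);
        while (<FL>) {
            chomp;
            push @binds, split /$sep/;
            if (++$rows == $bsz) {
                $msth->execute(@binds);
                @binds = ();
                $rows  = 0;
            }
        }
        # The leftover rows need a statement with fewer VALUES tuples.
        if ($rows) {
            my $tail = join ',', ('(?,?,?)') x $rows;
            $dbh->do("INSERT INTO test_tab (col1, col2, col3) VALUES $tail",
                     undef, @binds);
        }
        $dbh->commit();
    }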