Thread: Array interface

Array interface

From: Mladen Gogala
Date:
I wrote a little Perl script intended to test the difference that array
inserts make with PostgreSQL. Imagine my surprise when a single record
insert into a local database was faster than batches of 100 records.
Here are the two respective routines:

sub do_ssql
{
    my $exec_cnt = 0;
    while (<FL>)                      # FL: input filehandle, opened by the caller
    {
        chomp;
        my @row = split /$sep/;       # $sep: field separator
        $sth->execute(@row);          # one INSERT per input row
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}

sub do_msql
{
    my $bsz = shift;                  # batch size
    die("Batch size must be >0!\n") unless $bsz > 0;
    my $exec_cnt = 0;
    my @tstat;                        # per-tuple status from execute_array
    my (@col1, @col2, @col3);         # column-wise buffers for the batch
    while (<FL>)
    {
        chomp;
        my @row = split /$sep/;
        push @col1, $row[0];
        push @col2, $row[1];
        push @col3, $row[2];
        if ($. % $bsz == 0)           # $. is the current input line number
        {
            my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                             \@col1, \@col2, \@col3);
            die("Multiple insert failed!\n") if (!$tuples);
            @col1 = ();
            @col2 = ();
            @col3 = ();
            $exec_cnt++;
        }
    }
    if (@col1)                        # flush any partial final batch
    {
        my $tuples = $sth->execute_array({ArrayTupleStatus => \@tstat},
                                         \@col1, \@col2, \@col3);
        die("Multiple insert failed!\n") if (!$tuples);
        $exec_cnt++;
    }
    $dbh->commit();
    print "Insert executed $exec_cnt times.\n";
}


The variable "$sth" is a prepared statement handle.
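For reference, the per-row path boils down to splitting each input line on the separator and binding the resulting fields. A minimal sketch of that parsing step (the separator and sample line are assumptions, not from the script):

```perl
use strict;
use warnings;

my $sep  = ',';                      # assumed field separator
my $line = "1,alice,42\n";           # hypothetical input row

chomp $line;
my @row = split /\Q$sep\E/, $line;   # \Q..\E guards regex metacharacters in $sep

# Each @row would then feed $sth->execute(@row) in do_ssql().
print "@row\n";                      # → 1 alice 42
```

Note the original code uses a bare `split /$sep/`, which misbehaves if the separator happens to be a regex metacharacter such as `|`; the `\Q..\E` form avoids that.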

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Array interface

From: Mark Kirkwood
Date:
On 03/11/10 08:46, Mladen Gogala wrote:
> I wrote a little Perl script, intended to test the difference that
> array insert makes with PostgreSQL. Imagine my surprise when a single
> record insert into a local database was faster than batches of 100
> records. Here are the two respective routines:

Interesting - I'm seeing a modest but repeatable improvement with bigger
array sizes (using the attached program to insert into pgbench_accounts) on
an older dual-core AMD box with a single SATA drive running Ubuntu 10.04 i686.

    rows arraysize elapsed(s)
1000000         1        161
1000000        10        115
1000000       100        110
1000000      1000        109

This is *despite* the fact that tracing the executed SQL (setting
log_min_duration_statement = 0) shows *no* difference between runs (i.e.
1000000 INSERT executions are performed in each case). I'm guessing that
some Perl driver overhead is being saved here.

I'd be interested to see if you can reproduce the same or similar effect.

What might also be interesting is doing each INSERT with an array-load
of bind variables appended to the VALUES clause - as this will only do 1
insert call per "array" of values.
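The suggestion above amounts to building one INSERT whose VALUES clause holds a placeholder tuple per buffered row, then executing it once per batch. A minimal sketch of that statement construction (table name, column count, and rows here are hypothetical, and the helper is mine, not from the attached program):

```perl
use strict;
use warnings;

# Build "INSERT INTO t VALUES (?,?,?),(?,?,?),..." for one batch of rows.
sub multi_values_sql {
    my ($table, $ncols, $nrows) = @_;
    my $tuple = '(' . join(',', ('?') x $ncols) . ')';
    return "INSERT INTO $table VALUES " . join(',', ($tuple) x $nrows);
}

my @batch = ([1, 'a', 10], [2, 'b', 20]);            # hypothetical rows
my $sql   = multi_values_sql('t', 3, scalar @batch);
print "$sql\n";   # → INSERT INTO t VALUES (?,?,?),(?,?,?)

# One round trip per batch, flattening the rows into a single bind list:
#   $dbh->do($sql, undef, map { @$_ } @batch);
```

This issues rows/(array size) insert calls instead of one per row, which is where the large speedup below comes from.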

Cheers

Mark


Re: Array interface

From: Mark Kirkwood
Date:
On 10/11/10 22:10, Mark Kirkwood wrote:
>
> What might also be interesting is doing each INSERT with an array-load
> of bind variables appended to the VALUES clause - as this will only do
> 1 insert call per "array" of values.

This is probably more like what you were expecting:

   rows  VALUES tuples (i.e. array size)  elapsed(s)
1000000                                1         106
1000000                               10          14
1000000                              100          13
1000000                             1000          14

I didn't try PREPARE + EXECUTE here; I just used "do" with an INSERT
containing array-size-many VALUES tuples (EXECUTE could well be faster
still). The obvious difference is that we now perform only
rows/(array size) insert calls.

Cheers

Mark

