Thread: DBI/AutoCommit/Postgres

DBI/AutoCommit/Postgres

From
Fran Fabrizio
Date:
Hello all,

I'm trying to speed up some insert statements.  I have been tinkering
with the postmaster and DBI parameters I did some timings on my insert
and copy commands.  Here is a sample insert query:

010430.18:31:18.199  [2604] query: insert into log values
(0,0,lower('blah.blah.mydomain.com'),lower('foo'),lower('bar'),lower('blah'),upper('Me'),
upper('Myself'), upper('I'), upper('INFO'), 'String Here', '20010430
16:00:00')

Pretty straightforward.  Table log looks like:

             Table "log"
  Attribute   |   Type    | Modifier
--------------+-----------+----------
 site_id      | bigint    |
 host_id      | bigint    |
 fqdn         | varchar() | not null
 site         | varchar() | not null
 region       | varchar() | not null
 hostname     | varchar() | not null
 product      | varchar() | not null
 class        | varchar() | not null
 subclass     | varchar() | not null
 status       | varchar() | not null
 msg          | varchar() | not null
 remote_stamp | timestamp | not null
 tstamp       | timestamp | not null

Here are my non-scientific timings:
with AutoCommit on, using DBI across TCP/IP:   1.3 INSERTS/second
with AutoCommit off, DBI, TCP/IP, committing after every 100:   1.6
INSERTS/second
using psql -h host -U user -c "copy log from stdin" dbname < datafile
1.73 rows/second
using COPY LOG FROM 'filename' on the db machine itself:  1.73
rows/second

Another crucial piece of information is that each insert kicks off a
trigger.  I did not write the trigger, and do not know how to write
triggers, but I think that might be the contributing factor to the
slowness.  Here is the text file used to create the trigger:

drop function update_host_table();
drop trigger incoming_trigger on incoming ;

create function update_host_table()
returns opaque
as 'declare

myrec           record;
new_hostid      int4;
begin

new.timestamp := now() ;
/* check to see if we have see this machine before */

select * into myrec
from knownhosts k
        where k.fqdn = new.fqdn and
        k.hostname = new.hostname ;

/* -- if we have not found the machine name we are going to
insert a new record into the knownhosts table and set the init_contact
to now
*/

if not found
then
        insert into knownhosts
        values (new.fqdn,new.hostname,new.timestamp,new.timestamp) ;
else
        update knownhosts
        set last_contact = new.timestamp
        where knownhosts.fqdn = new.fqdn ;
end if ;
/* now we are going to update the status table with the new record */

select * into myrec
        from status s where
        s.fqdn = new.fqdn and s.hostname=new.hostname
        and s.class=new.class and s.sub_class=new.sub_class ;

if not found
then
        insert into status
        values (new.fqdn,new.hostname,new.class,
        new.sub_class,new.level,new.msg,new.timestamp) ;
else
        update status
        set level = new.level,
        timestamp = new.timestamp
        where fqdn=new.fqdn and hostname=new.hostname and
                class = new.class and sub_class = new.sub_class ;
end if;

return new;
end ;'
language 'plpgsql';

create trigger incoming_trigger
before insert on incoming
for each row
execute procedure update_host_table();

1.73 INSERTS/second seems awfully slow, but maybe I have set my
expectations too high.  Now that you all can see the table and the kind
of data I am trying to put into it, do you have any suggestions?  The
hardware specs of the database machine are:   Pentium III 733Mhz, 512
megs memory, 7 gigs free on the partition.  Seems like I should be
getting a lot more horsepower.  I really need to speed this up somehow.
Does anyone see anything in the trigger or otherwise that would cause
this to be so slow?

Thank you very much,
Fran


Re: DBI/AutoCommit/Postgres

From
Tom Lane
Date:
Fran Fabrizio <ffabrizio@Exchange.WebMD.net> writes:
> Another crucial piece of information is that each insert kicks off a
> trigger.  I did not write the trigger, and do not know how to write
> triggers, but I think that might be the contributing factor to the
> slowness.

Bingo.  Your trigger is adding four complete queries (two selects and
two inserts or updates) for every row inserted.

Think about ways to avoid some or all of those.  (For example, does the
knownhosts table really need a last_contact column, or could you obtain
that on-the-fly from a query over the incoming or status table when
needed?)  Also look to see if these are being done with reasonable query
plans ... perhaps you need some indexes and/or vacuum operations ...

            regards, tom lane