Re: INSERT extremely slow with large data sets - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: INSERT extremely slow with large data sets
Date
Msg-id D90A5A6C612A39408103E6ECDD77B829408C54@voyager.corporate.connx.com
Whole thread Raw
In response to INSERT extremely slow with large data sets  (Slavisa Garic <Slavisa.Garic@infotech.monash.edu.au>)
List pgsql-hackers
> -----Original Message-----
> From: Slavisa Garic [mailto:Slavisa.Garic@infotech.monash.edu.au]
> Sent: Friday, November 14, 2003 5:12 PM
> To: Dann Corbit
> Cc: Slavisa Garic; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] INSERT extremely slow with large data sets
>
>
> Hi Dann
>
> Here is the schema and also could you just be more specific
> on COPY command.

http://www.postgresql.org/docs/7.3/static/sql-copy.html

And

http://techdocs.postgresql.org/techdocs/usingcopy.php

May be helpful.

> ALso does talking dirrectly to API speed
> things up ? (I am new to databases but i am learning quickly)

Not particularly.  This is the copy command API:
http://www.postgresql.org/docs/7.3/static/libpq-copy.html

What the API can allow you to do (for instance) would be to never let
the data touch the ground.  Instead of writing to a text file or even a
binary format copy input file, you use the API to take the incoming data
and insert it directly.

Like everything else, there is a dark side.  Read the documents and they
will explain it.  But if you need to move a giant pile of data into the
database as fast as possible, it is the copy command that is the most
efficient.
>     -- NimrodEnfJob --
>
>     create table NimrodEnfJob(
>         exp_id  INTEGER not null references NimrodEnfExperiment,
>         task_id INTEGER not null references NimrodTask,
>         pgroup_id   INTEGER not null references
> NimrodParameterGroup,
>         agent_id    INTEGER references NimrodAgent on
> delete set null,
>         jobname varchar(255) not null,
>         admin   char(1) not null default 'F'
>                 check (admin in ('F', 'T')),
>         taskname    varchar(255) not null,
>         param_text  TEXT not null,
>         open    char(1) not null default 'F'
>                 check (open in ('F', 'T')),
>         control varchar(8) not null default 'start'
>                 check (control in ('nostart', 'start', 'stop')),
>         status  varchar(16) not null default 'ready'
>                 check (status in ('ready', 'executing', 'failed',
>                     'done')),
>         cpulength   real not null,
>         sleeptime   real not null,
>         filesize    real not null,
>         cputime   real,
>         waittime    real,
>         filetime    real,
>         filebytes   integer,
>         priority    integer not null default 100,
>         create_time timestamp not null default CURRENT_TIMESTAMP,
>         start_time timestamp,
>         finish_time timestamp,
>         budget      real not null default 0.0,
>         servername  varchar(255),
>         error_info  varchar(255) not null default '',
>         more_info   TEXT not null default '',
>         primary key (exp_id, jobname),
>         foreign key (exp_id, taskname) references NimrodEnfTask
>     );
>
> Also these are the indexes on this table. I created them on
> the columnt that are most commonly accessed:
>     create unique index nej_idx
>     ON NimrodEnfJob (exp_id, pgroup_id);
>
>     create unique index nej_idx1
>     ON NimrodEnfJob (pgroup_id);
>
>     create index nej_idx2
>     ON NimrodEnfJob (status);
>
>     create unique index nej_idx3
>     ON NimrodEnfJob (status, pgroup_id);
>
>     create index nej_idx4
>     ON NimrodEnfJob (status, agent_id);
>
>     create index nej_idx5
>     ON NimrodEnfJob (agent_id);
>
> I did notice that removing those indexes doesn't import by
> much. Similar behaviour is observed but it just takes a bit
> less time to insert (0.01 less then usually at 60000 records)

I am quite surprised that removing the indexes does not have a large
impact on insert speed, especially, since you have 6 of them.  Most of
the other costs that I can think of are fixed for inserts into a "bare
table".  Perhaps someone with more intimate knowledge of the inner
working may know why inserts into a table without any index will trail
off in speed as the table grows.


[snip]


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: oh dear ...
Next
From: "Marc G. Fournier"
Date:
Subject: Re: oh dear ...