Re: INSERT extremely slow with large data sets - Mailing list pgsql-hackers
| From | Dann Corbit |
|---|---|
| Subject | Re: INSERT extremely slow with large data sets |
| Date | |
| Msg-id | D90A5A6C612A39408103E6ECDD77B829408C54@voyager.corporate.connx.com |
| In response to | INSERT extremely slow with large data sets (Slavisa Garic <Slavisa.Garic@infotech.monash.edu.au>) |
| List | pgsql-hackers |
> -----Original Message-----
> From: Slavisa Garic [mailto:Slavisa.Garic@infotech.monash.edu.au]
> Sent: Friday, November 14, 2003 5:12 PM
> To: Dann Corbit
> Cc: Slavisa Garic; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] INSERT extremely slow with large data sets
>
> Hi Dann,
>
> Here is the schema. Also, could you be more specific about the COPY
> command?

http://www.postgresql.org/docs/7.3/static/sql-copy.html

and

http://techdocs.postgresql.org/techdocs/usingcopy.php

may be helpful.

> Also, does talking directly to the API speed things up? (I am new to
> databases but I am learning quickly.)

Not particularly. This is the copy command API:

http://www.postgresql.org/docs/7.3/static/libpq-copy.html

What the API allows you to do (for instance) is never let the data touch the ground. Instead of writing to a text file, or even a binary-format copy input file, you use the API to take the incoming data and insert it directly. Like everything else, there is a dark side; read the documents and they will explain it. But if you need to move a giant pile of data into the database as fast as possible, the COPY command is the most efficient route. (A minimal sketch of the streaming approach appears at the end of this message.)

> -- NimrodEnfJob --
>
> create table NimrodEnfJob(
>     exp_id      INTEGER not null references NimrodEnfExperiment,
>     task_id     INTEGER not null references NimrodTask,
>     pgroup_id   INTEGER not null references NimrodParameterGroup,
>     agent_id    INTEGER references NimrodAgent on delete set null,
>     jobname     varchar(255) not null,
>     admin       char(1) not null default 'F'
>                 check (admin in ('F', 'T')),
>     taskname    varchar(255) not null,
>     param_text  TEXT not null,
>     open        char(1) not null default 'F'
>                 check (open in ('F', 'T')),
>     control     varchar(8) not null default 'start'
>                 check (control in ('nostart', 'start', 'stop')),
>     status      varchar(16) not null default 'ready'
>                 check (status in ('ready', 'executing', 'failed', 'done')),
>     cpulength   real not null,
>     sleeptime   real not null,
>     filesize    real not null,
>     cputime     real,
>     waittime    real,
>     filetime    real,
>     filebytes   integer,
>     priority    integer not null default 100,
>     create_time timestamp not null default CURRENT_TIMESTAMP,
>     start_time  timestamp,
>     finish_time timestamp,
>     budget      real not null default 0.0,
>     servername  varchar(255),
>     error_info  varchar(255) not null default '',
>     more_info   TEXT not null default '',
>     primary key (exp_id, jobname),
>     foreign key (exp_id, taskname) references NimrodEnfTask
> );
>
> Also, these are the indexes on this table. I created them on the
> columns that are most commonly accessed:
>
> create unique index nej_idx
>     ON NimrodEnfJob (exp_id, pgroup_id);
>
> create unique index nej_idx1
>     ON NimrodEnfJob (pgroup_id);
>
> create index nej_idx2
>     ON NimrodEnfJob (status);
>
> create unique index nej_idx3
>     ON NimrodEnfJob (status, pgroup_id);
>
> create index nej_idx4
>     ON NimrodEnfJob (status, agent_id);
>
> create index nej_idx5
>     ON NimrodEnfJob (agent_id);
>
> I did notice that removing those indexes doesn't improve the import by
> much. Similar behaviour is observed; it just takes a bit less time to
> insert (0.01 less than usual at 60000 records).

I am quite surprised that removing the indexes does not have a large impact on insert speed, especially since you have six of them. Most of the other costs that I can think of are fixed for inserts into a "bare table". Perhaps someone with more intimate knowledge of the inner workings may know why inserts into a table without any index will trail off in speed as the table grows.

[snip]
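[Editor's sketch] The following is a minimal illustration of the "never let the data touch the ground" approach described above: rows are streamed straight into a table through libpq's 7.3-era copy calls (PQputline/PQendcopy, the interface documented at the libpq-copy link above) instead of writing an intermediate file or issuing one INSERT per row. The table `jobs (id, name)` and the connection string are hypothetical placeholders, not anything from this thread; adapt them to the real schema.

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn;
    PGresult *res;
    int       i;

    /* Hypothetical connection string; adjust for the real database. */
    conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Switch the connection into COPY IN mode; 'jobs' is a stand-in table. */
    res = PQexec(conn, "COPY jobs (id, name) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    /*
     * Feed rows as tab-separated text, one per line, straight from
     * memory. No intermediate file ever hits the disk.
     */
    for (i = 0; i < 60000; i++)
    {
        char line[64];

        snprintf(line, sizeof line, "%d\tjob_%d\n", i, i);
        PQputline(conn, line);
    }

    /* End-of-data marker required by the pre-7.4 copy protocol. */
    PQputline(conn, "\\.\n");

    if (PQendcopy(conn) != 0)
    {
        fprintf(stderr, "copy did not complete: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    PQfinish(conn);
    return 0;
}
```

On 7.4 and later, PQputCopyData/PQputCopyEnd supersede the deprecated PQputline/PQendcopy calls shown here, but the idea is the same: one COPY statement, many rows per round trip, which is why it so decisively beats row-at-a-time INSERTs for bulk loads.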