Thread: INSERT performance
I'm running PostgreSQL 7.3.4 on Windows 2000/Cygwin (AMD Athlon XP+ 1800, 512 MB RAM).

I want to insert about 500 records into a table. The table is heavily indexed (it has about 10-12 indices created on it). The insert is performed in a transaction block.

If I keep the indices on the table, the insert takes about 12 seconds. If I drop the indices, do the insert, and then re-create the indices, the entire process takes about 3-4 seconds.

I am somewhat surprised at both of those performance numbers -- I would have expected both of them to be a lot smaller (about half of what they are). 500 records is not that big a number! In particular, the transacted insert without indices should be very fast!

Are these numbers in the range of what I should expect from PostgreSQL? I looked through the various performance FAQs, and I cannot use the COPY command, since the data needs to be processed first, and this can only be done in memory.

Any suggestions/pointers would be much appreciated.

Thanks,
Razvan.
> -----Original Message-----
> From: Razvan Surdulescu [mailto:surdules@yahoo.com]
> Sent: Wednesday, October 29, 2003 8:41 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] INSERT performance
>
> I'm running PostgreSQL 7.3.4 on Windows 2000/Cygwin (AMD Athlon XP+
> 1800, 512MB RAM).
>
> I want to insert about 500 records into a table. The table is heavily
> indexed (has about 10-12 indices created on it). The insert is
> performed in a transaction block.
>
> If I keep the indices on the table, the insert takes about 12 seconds.
> If I drop the indices, do the insert, and then re-create the indices,
> the entire process takes about 3-4 seconds.
>
> I am somewhat surprised at both of those performance numbers above --
> I would have expected both of them to be a lot smaller (about half of
> what they are). 500 records is not that big of a number! In
> particular, the transacted insert without indices should be very fast!
>
> Are these numbers about in range to what I should expect from
> PostgreSQL? I looked through the various performance FAQs and I cannot
> use the COPY command since the data needs to be processed first, and
> this can only be done in memory.
>
> Any suggestions/pointers would be much appreciated.

Cygwin? It will be faster under Linux.

500 records in 12 seconds is about 42/second. Hard to know if that is good or bad. Is the machine under heavy use? Are the records extremely long?

You can still use the COPY command as an API. It will be faster than the inserts, but there are (of course) caveats with its use:
http://developer.postgresql.org/docs/postgres/libpq-copy.html

Provide the SQL that defines the table and its indexes.
Provide the cardinality of the table.
Provide the average machine load during the insert operation.

Probably, you can get better answers if you provide more information.

What kind of disk drives do you have on your machine? (More importantly, where does the PostgreSQL data reside?)
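The COPY route Dann describes is compatible with the "data must be processed in memory first" constraint: the processed rows can be serialized into COPY text format in memory and then streamed through the COPY API instead of issuing individual INSERTs. Below is a minimal sketch of that serialization step -- in Python rather than the libpq C API, purely for illustration; the field values are made up, and the escaping follows COPY's text format (tab-delimited fields, backslash escapes, \N for NULL):

```python
# Sketch: serialize already-processed, in-memory records into
# PostgreSQL COPY text format. The resulting lines would then be fed
# to the server via the COPY API (e.g. libpq's PQputline in 7.3-era
# clients) rather than as individual INSERT statements.

def copy_escape(value):
    """Escape one field for COPY text format; None becomes \\N."""
    if value is None:
        return r"\N"
    s = str(value)
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))

def to_copy_lines(rows):
    """Yield one COPY-format line per row (a row is a list of fields)."""
    for row in rows:
        yield "\t".join(copy_escape(f) for f in row) + "\n"

# Hypothetical processed records, including a NULL and an embedded tab.
rows = [["abc123", "red", None],
        ["def456", "blue\tgreen", "x"]]
lines = list(to_copy_lines(rows))
```

The point is that COPY does not require the data to originate in a file; only the final wire format matters, so any in-memory processing can happen before serialization.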
DCorbit@connx.com ("Dann Corbit") wrote in message news:<D90A5A6C612A39408103E6ECDD77B829408C2E@voyager.corporate.connx.com>...

> Cygwin? It will be faster under Linux.

I agree, I would also expect it to be faster under Linux. Would you expect the performance under Linux to be an order of magnitude faster? I know that Cygwin I/O goes through something akin to a translation layer, but I don't know how much of a performance hit I should expect from that.

> 500 records in 12 seconds is about 42/second. Hard to know if that is
> good or bad. Is the machine under heavy use? Are the records extremely
> long?

No one else is using the machine, and the records are short (at most around 1K each).

> You can still use the copy command as an API. It will be faster than
> the inserts, but there are (of course) caveats with its use.
> http://developer.postgresql.org/docs/postgres/libpq-copy.html

Thanks, I'll look into it.

> Provide the SQL that defines the table and its indexes.

Here is the approximate SQL statement (I cannot provide the original statement for intellectual property reasons):

CREATE TABLE data (
    id char(32) NOT NULL, -- auto-generated from PHP using md5(...)
    -- the fieldN fields below have different lengths
    field1 varchar(5),
    field2 varchar(50),
    field3 varchar(10),
    ...
    field20 varchar(255),
    PRIMARY KEY (id)
);

CREATE INDEX idx_field1 ON data(field1);
CREATE INDEX idx_field2 ON data(field2);
...
CREATE INDEX idx_field20 ON data(field20);

> Provide the cardinality of the table.

The table is empty (cardinality = 0).

> Provide the average machine load during the insert operation.

I will have to measure this and get back to you with it in a future post. Here is what I can say from memory right now:

* If I do the INSERT with the indexes enabled, the HDD thrashes visibly (audibly?) and the operation takes about 12 seconds.
* If I drop the indices, do the INSERT, and re-create the indices, the HDD no longer thrashes, and the operation takes about 3-4 seconds.
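As an aside, the char(32) primary key described above (PHP's md5() of some fields) is easy to reproduce outside PHP, which matters if the processing step ever moves out of the web tier. A small sketch in Python -- the field values and the join separator are hypothetical, chosen only for illustration:

```python
import hashlib

def make_id(*fields):
    """Build a 32-character lowercase hex id, like PHP's md5().

    Joining with "|" is an arbitrary illustrative choice; the real
    schema presumably has its own convention for what gets hashed.
    """
    payload = "|".join(fields).encode("utf-8")
    return hashlib.md5(payload).hexdigest()

row_id = make_id("field1-value", "field2-value")
# hexdigest() always yields exactly 32 hex characters, matching char(32)
```

One design note: md5 hex ids are effectively random, so every insert lands at a random point in the primary-key index -- one more source of the scattered I/O discussed in this thread.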
> Probably, you can get better answers if you provide more information.

I agree -- I hope the information above is more illuminating.

> What kind of disk drives do you have on your machine? (More
> importantly, where does PostgreSQL data reside?)

I have an ATA-100 7200 RPM HDD. The PostgreSQL data resides on this drive (which also contains the Cygwin installation).

Thanks again,
Razvan.
surdules@yahoo.com (Razvan Surdulescu) writes:
> DCorbit@connx.com ("Dann Corbit") wrote in message news:<D90A5A6C612A39408103E6ECDD77B829408C2E@voyager.corporate.connx.com>...
>> 500 records in 12 seconds is about 42/second. Hard to know if that is
>> good or bad. Is the machine under heavy use? Are the records extremely
>> long?
>> Provide the SQL that defines the table and its indexes.

> CREATE INDEX idx_field1 ON data(field1);
> CREATE INDEX idx_field2 ON data(field2);
> ...
> CREATE INDEX idx_field20 ON data(field20);

Uh, do you actually need an index on every column? It's obvious that the index insertions are where the time is going. You're getting close to 900 index insertions per second, which is not bad at all on consumer-grade ATA disk hardware, if you ask me.

It might help to raise shared_buffers, if you didn't already do that ... but the real solution here is to only index the columns that you are actually intending to search on.

			regards, tom lane
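Tom's "close to 900" figure can be sanity-checked from the numbers in the thread: the approximate schema implies 20 secondary indexes plus the primary-key index (the count of 21 is an assumption based on that approximate schema), 500 rows, about 12 seconds:

```python
# Back-of-the-envelope check of the "close to 900 index insertions
# per second" figure, using the numbers quoted in this thread.
indexes = 20 + 1          # idx_field1..idx_field20 plus the PRIMARY KEY
rows = 500
seconds = 12
insertions_per_second = indexes * rows / seconds
# 21 * 500 / 12 = 875.0 insertions/second, i.e. "close to 900"
```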
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> surdules@yahoo.com (Razvan Surdulescu) writes:
> > CREATE INDEX idx_field1 ON data(field1);
> > CREATE INDEX idx_field2 ON data(field2);
> > ...
> > CREATE INDEX idx_field20 ON data(field20);
>
> Uh, do you actually need an index on every column?

Yes -- I need to search on all these columns, the size of the table is expected to get very large, and each column contains very heterogeneous data (so indexing makes sense).

> It's obvious that the index insertions are where the time is going.
> You're getting close to 900 index insertions per second, which is not
> bad at all on consumer-grade ATA disk hardware, if you ask me.

That's helpful to know -- thanks.

> It might help to raise shared_buffers, if you didn't already do that
> ... but the real solution here is to only index the columns that you
> are actually intending to search on.

I will look into the shared_buffers setting -- I have not done anything with it thus far.

Razvan.
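For readers following along: in the 7.3 era, shared_buffers lives in postgresql.conf and is measured in 8 KB buffer pages. The sketch below is illustrative only, not a tuning recommendation -- the right value depends on the workload and on how much RAM the rest of the system needs:

```
# postgresql.conf (PostgreSQL 7.3): shared_buffers is in 8 KB pages.
# The stock default of that era was very small (64 pages = 512 KB).
# A hypothetical starting point on a 512 MB machine:
shared_buffers = 4096        # 4096 * 8 KB = 32 MB
```

A server restart is required for this setting to take effect, and the OS shared-memory limits (e.g. SHMMAX) may need to be raised to match.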