Thread: slow inserts and updates on large tables
i'm developing a database for one of my clients.

unfortunately, the combination of PostgreSQL 6.4 and FreeBSD 3.0 doesn't
seem to be giving me the performance i would expect.

i have a table which has some 60 fields, largely fixed-length strings
(of a variety of sizes), indexed on a 20-character string field.

currently i have some 5,866,667 records in the table, and when i use
a c program to insert records, i get a max of 102 inserts per second.

updates average about 40/second, sometimes much slower.

i intend to have a sliding group of about 10-15 million records in this
table (deleting some each night, while constantly adding new records).

if 100 inserts/second and 40 updates/second is all i can realistically
expect, then i will have to investigate alternate databases (likely
commercial ones like Progress on SCO, yech!).

if anyone has any pointers as to why this is so slow, lemme know.

the system is:

FreeBSD 3.0-RELEASE
CPU: Pentium II (299.17-MHz 686-class CPU)
avail memory = 62836736 (61364K bytes)
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 15 on pci1.4.0
da1: <IBM DDRS-34560W S71D> Fixed Direct Access SCSI2 device
da1: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: <IBM DDRS-34560W S71D> Fixed Direct Access SCSI2 device
da2: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da3: <QUANTUM VIKING 4.5 WSE 880P> Fixed Direct Access SCSI2 device
da3: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da4: <QUANTUM VIKING 4.5 WSE 880P> Fixed Direct Access SCSI2 device
da4: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled

postgres v6.4 (not 6.4.2 yet)

the databases live on a 16 gig striped (not RAID) array across the above
4 drives.  the operating system lives on a separate drive.

we are going to upgrade to 512M RAM soon, but i don't think that RAM is
the issue on this beast.  bonnie and iozone show that the array is kicking
ass as far as throughput goes.

-- 
[ Jim Mercer    Reptilian Research    jim@reptiles.org    +1 416 410-5633  ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.           ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]
Jim Mercer wrote:
>
> i'm developing a database for one of my clients.
>
> unfortunately, the combination of PostgreSQL 6.4 and FreeBSD 3.0 doesn't
> seem to be giving me the performance i would expect.
>
> i have a table which has some 60 fields, largely fixed-length strings
> (of a variety of sizes), indexed on a 20-character string field.
>
> currently i have some 5,866,667 records in the table, and when i use
> a c program to insert records, i get a max of 102 inserts per second.
>
> updates average about 40/second, sometimes much slower.
>
> i intend to have a sliding group of about 10-15 million records in this
> table (deleting some each night, while constantly adding new records).
>
> if 100 inserts/second and 40 updates/second is all i can realistically
> expect, then i will have to investigate alternate databases (likely
> commercial ones like Progress on SCO, yech!).
>
> if anyone has any pointers as to why this is so slow, lemme know.

What's -B ?
Do you use BEGIN/END to run _many_ updates/inserts in a _single_ transaction ?

Vadim
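A minimal sketch of the batching Vadim is suggesting: many INSERTs wrapped in
one explicit transaction rather than one implicit transaction per statement.
The table and column names ("testtable", "callid", "duration") are invented
for illustration and error handling is omitted.

- - - -

#include <stdio.h>
#include "libpq-fe.h"

/* sketch: nrecs INSERTs inside a single explicit transaction;
 * con is an already-open libpq connection */
void bulk_insert(PGconn *con, int nrecs)
{
    char      command[256];
    PGresult *res;
    int       i;

    res = PQexec(con, "BEGIN");
    PQclear(res);

    for (i = 0; i < nrecs; i++) {
        sprintf(command,
                "INSERT INTO testtable (callid, duration) VALUES ('%06d', %d)",
                i, i % 3600);
        res = PQexec(con, command);   /* error checking omitted for brevity */
        PQclear(res);
    }

    res = PQexec(con, "COMMIT");      /* one commit for the whole batch */
    PQclear(res);
}

- - - -

The other knob Vadim mentions, -B, is the postmaster's shared disk buffer
count; the 6.4 default is quite small (a few dozen 8K buffers, if memory
serves), so starting the backend with something like "postmaster -B 256 ..."
is a common first experiment.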
> > if 100 inserts/second and 40 updates/second is all i can realistically
> > expect, then i will have to investigate alternate databases (likely
> > commercial ones like Progress on SCO, yech!).
> >
> > if anyone has any pointers as to why this is so slow, lemme know.
>
> What's -B ?

it is whatever the default is.

> Do you use BEGIN/END to run _many_ updates/inserts in a
> _single_ transaction ?

i tried that and it didn't seem to make much difference.

-- 
[ Jim Mercer    Reptilian Research    jim@reptiles.org    +1 416 410-5633  ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.           ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]
At 5:02 +0200 on 17/2/99, Jim Mercer wrote:

> if anyone has any pointers as to why this is so slow, lemme know.

Have you checked the usual stuff:

1) Each insert and update creates its own transaction. That is, an
   insert is in fact:

      BEGIN TRANSACTION;
      INSERT...;
      COMMIT;

   So, to make things faster you should BEGIN TRANSACTION explicitly
   before all the inserts and COMMIT after them. Or separate them into
   manageable bulks if you run into a memory problem.

2) Indexing tables means faster SELECTs at the expense of slower INSERTs
   and UPDATEs. There is no magic. The reasoning is that normally you
   query the data a lot more than you change it.

   Thus, it is preferable, before doing bulk inserts, to drop the
   indices and recreate them afterwards. This is true when you are not
   expecting the database to be queried at the same time the inserts are
   made.

   As for updates, it's trickier, because you actually use the index
   for the WHERE part of the update. If speed is of the essence, I would
   probably try the following:

      SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
      DELETE FROM your_table WHERE update_condition;
      DROP INDEX...;
      INSERT INTO your_table SELECT ... FROM temp_table; -- update within select
      CREATE INDEX...;

3) Back to the issue of INSERTs - copies are faster. If you can transform
   the data into the tab-delimited format required by COPY, you save a lot
   of time on parsing, planning etc.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
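To make point (2) concrete, here is a sketch of the drop-index / bulk-load /
recreate-index pattern as it might look from a C loader. The table, column
and index names are hypothetical, the input file is assumed to hold one
callid per line, and quoting/error handling are ignored.

- - - -

#include <stdio.h>
#include <string.h>
#include "libpq-fe.h"

/* sketch: load in bulk with the index out of the way, then rebuild it once */
void bulk_load(PGconn *con, FILE *in)
{
    char      line[1024], command[1200];
    PGresult *res;

    res = PQexec(con, "DROP INDEX testtable_callid_idx");
    PQclear(res);

    res = PQexec(con, "BEGIN");
    PQclear(res);
    while (fgets(line, sizeof(line), in) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        sprintf(command, "INSERT INTO testtable (callid) VALUES ('%s')", line);
        res = PQexec(con, command);
        PQclear(res);
    }
    res = PQexec(con, "COMMIT");
    PQclear(res);

    /* the index is built once here instead of being updated on every insert */
    res = PQexec(con, "CREATE INDEX testtable_callid_idx ON testtable (callid)");
    PQclear(res);
}

- - - -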
> > if anyone has any pointers as to why this is so slow, lemme know.
>
> Have you checked the usual stuff:
>
> 1) Each insert and update creates its own transaction. That is, an
>    insert is in fact:
>
>       BEGIN TRANSACTION;
>       INSERT...;
>       COMMIT;
>
>    So, to make things faster you should BEGIN TRANSACTION explicitly
>    before all the inserts and COMMIT after them. Or separate them into
>    manageable bulks if you run into a memory problem.

i have a script which times various methods:

------
data is 6047 records
insertama  time was 56 seconds
psqlinsert time was 51 seconds
traninsert time was 51 seconds
psqlselect time was 0 seconds
------

the script drops and recreates testtable before each test.

insertama reads the data from a text file and uses PQexec to insert each
record.  it does not use begin/commit.

psqlinsert is "psql -f insert.sql", where insert.sql is a file of insert
commands (created by printf'ing the PQexec args from insertama).

traninsert is the same as psqlinsert, except it has "begin transaction"
and "commit transaction" at the start and end.

psqlselect is "psql -c 'select count(callid) from testtable'".

so, according to my tests, the begin/commit stuff doesn't do me any good.

> 2) Indexing tables means faster SELECTs at the expense of slower INSERTs
>    and UPDATEs. There is no magic. The reasoning is that normally you
>    query the data a lot more than you change it.
>
>    Thus, it is preferable, before doing bulk inserts, to drop the
>    indices and recreate them afterwards. This is true when you are not
>    expecting the database to be queried at the same time the inserts are
>    made.

the production database currently has over 5 million records, and is
expected to run at 10-15 million in full production.  the inserts happen
in batches, every 10 minutes or so.  recreating the index takes some 30
minutes as it is.

however, if i use the above bench test script, but with the index removed,
i get:

------
data is 6047 records
insertama  time was 53 seconds
psqlinsert time was 47 seconds
traninsert time was 48 seconds
psqlselect time was 0 seconds
------

as you can see, it didn't make much difference.

>    As for updates, it's trickier, because you actually use the index
>    for the WHERE part of the update. If speed is of the essence, I would
>    probably try the following:
>
>       SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
>       DELETE FROM your_table WHERE update_condition;
>       DROP INDEX...;
>       INSERT INTO your_table SELECT ... FROM temp_table; -- update within select
>       CREATE INDEX...;

as stated above, the size of the table means i can't do this on every
update.

> 3) Back to the issue of INSERTs - copies are faster. If you can transform
>    the data into the tab-delimited format required by COPY, you save a lot
>    of time on parsing, planning etc.

this sorta defeats the purpose of putting the data in an SQL database. 8^)

-- 
[ Jim Mercer    Reptilian Research    jim@reptiles.org    +1 416 410-5633  ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.           ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]
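For context, the numbers above work out to roughly 6047/56, about 108 inserts
per second, for insertama, and 6047/51, about 119 per second, with an explicit
transaction - in line with the ~100/sec reported for the production table. A
timing wrapper of the kind the script implies can be as simple as the sketch
below; the loader function and record count stand in for whichever variant is
being measured.

- - - -

#include <stdio.h>
#include <sys/time.h>

/* sketch: time an arbitrary loader function and report rows per second */
void time_loader(void (*loader)(void), int nrecs, const char *name)
{
    struct timeval t0, t1;
    double elapsed;

    gettimeofday(&t0, NULL);
    loader();
    gettimeofday(&t1, NULL);

    elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%s time was %.0f seconds (%.1f records/sec)\n",
           name, elapsed, nrecs / elapsed);
}

- - - -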
At 16:10 +0200 on 17/2/99, Jim Mercer wrote:

> > 3) Back to the issue of INSERTs - copies are faster. If you can transform
> >    the data into the tab-delimited format required by COPY, you save a lot
> >    of time on parsing, planning etc.
>
> this sorta defeats the purpose of putting the data in an SQL database. 8^)

You probably didn't understand me. If you convert it to tab-delimited text
and then use COPY table_name FROM filename/stdin instead of INSERT, it will
be much faster, because you don't have to do the parsing and planning on
each line, but only once for the whole copy.

I didn't tell you to use the data directly from those text files... In fact,
it doesn't require using text files at all, just reformatting your program.
If until now it did

- - - -

   while (data_still_coming) {

      sprintf( command, "INSERT INTO table1 VALUES( %s, %s, %s )",
               item1, item2, item3 );
      PQexec( con, command );

   }

- - - -

Now you have to do instead

- - - -

   PQexec( con, "COPY table1 FROM stdin" );

   while (data_still_coming) {

      sprintf( line, "%s\t%s\t%s\n", item1, item2, item3 );
      PQputline( con, line );

   }

   PQputline( con, "\\.\n" );
   PQendcopy( con );

- - - -

It's simply a different way of formatting your data insertion.

Herouth
> At 16:10 +0200 on 17/2/99, Jim Mercer wrote:
> You probably didn't understand me. If you convert it to tab-delimited text
> and then use COPY table_name FROM filename/stdin instead of INSERT, it will
> be much faster, because you don't have to do the parsing and planning on
> each line, but only once for the whole copy.
>
> I didn't tell you to use the data directly from those text files...
>
>    PQexec( con, "COPY table1 FROM stdin" );
>
>    while (data_still_coming) {
>
>       sprintf( line, "%s\t%s\t%s\n", item1, item2, item3 );
>       PQputline( con, line );
>
>    }
>
>    PQputline( con, "\\.\n" );
>    PQendcopy( con );

i will test this with my insertama program, but i see some problems with
this.

firstly, it assumes that all of your application programs are updated each
time you modify the structure of the table.  i am using

   "insert into testtable (fieldname1, fieldname2) values ('1', '2');"

which allows the applications to remain unchanged if new fields are added.

also, it doesn't seem to address the issue of updates, which suffer from
worse performance than inserts.

what is a realistic number of inserts per second under postgresql, with or
without an index?

-- 
[ Jim Mercer    Reptilian Research    jim@reptiles.org    +1 416 410-5633  ]
[ The telephone, for those of you who have forgotten, was a commonly used ]
[ communications technology in the days before electronic mail.           ]
[ They're still easy to find in most large cities. -- Nathaniel Borenstein ]
At 16:47 +0200 on 17/2/99, Jim Mercer wrote:

> i will test this with my insertama program, but i see some problems with
> this.
>
> firstly, it assumes that all of your application programs are updated each
> time you modify the structure of the table.

This is true. That's the sacrifice you make for COPY's fast transfers.

> also, it doesn't seem to address the issue of updates, which suffer from
> worse performance than inserts.

Did you try my trick, but without removing the indices? Move the data over
to a temporary table, delete it from the original, and insert the updated
data back? (Assuming you don't have a separate update for each line.)

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
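A sketch of what that might look like when the updates can be expressed as a
single set-oriented pass. The table calls(callid, status) and the status
values 'rated'/'billed' are invented for illustration; an ordinary scratch
table is created and dropped, and error handling is omitted.

- - - -

#include "libpq-fe.h"

/* sketch: batch the updates by copying the affected rows aside, deleting
 * them, and re-inserting them with the changed value applied in the SELECT
 * list -- the index on calls stays in place throughout */
void batch_update(PGconn *con)
{
    PGresult *res;

    res = PQexec(con, "SELECT * INTO TABLE scratch_calls "
                      "FROM calls WHERE status = 'rated'");
    PQclear(res);

    res = PQexec(con, "DELETE FROM calls WHERE status = 'rated'");
    PQclear(res);

    /* the "update" happens here, in the re-insert */
    res = PQexec(con, "INSERT INTO calls "
                      "SELECT callid, 'billed' FROM scratch_calls");
    PQclear(res);

    res = PQexec(con, "DROP TABLE scratch_calls");
    PQclear(res);
}

- - - -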