Re: [GENERAL] slow inserts and updates on large tables - Mailing list pgsql-general
From:            jim@reptiles.org (Jim Mercer)
Subject:         Re: [GENERAL] slow inserts and updates on large tables
Date:
Msg-id:          m10D7g0-00080dC@mailbox.reptiles.org
In response to:  Re: [GENERAL] slow inserts and updates on large tables
                 (Herouth Maoz <herouth@oumail.openu.ac.il>)
Responses:       Re: [GENERAL] slow inserts and updates on large tables
List:            pgsql-general
> > if anyone has any pointers as to why this is so slow, lemme know.
>
> Have you checked the usual stuff:
>
> 1) Each insert and update creates its own transaction. That is, an
>    insert is in fact:
>
>       BEGIN TRANSACTION;
>       INSERT...;
>       COMMIT;
>
>    So, to make things faster you should BEGIN TRANSACTION explicitly
>    before all the inserts and COMMIT after them. Or separate into
>    manageable bulks if you run into a memory problem.

i have a script which times various methods:

------
data is 6047 records
insertama  time was 56 seconds
psqlinsert time was 51 seconds
traninsert time was 51 seconds
psqlselect time was 0 seconds
------

the script drops and recreates the testtable before each test.

insertama reads the data from a text file and uses PQexec to insert
each record.  it does not use begin/commit.

psqlinsert is "psql -f insert.sql", where insert.sql is a file of
insert commands (created by printf'ing the PQexec args from insertama).

traninsert is the same as psqlinsert, except it has "begin transaction"
and "commit transaction" at the start and end.

psqlselect is "psql -c 'select count(callid) from testtable'".

so, according to my tests, the begin/commit stuff doesn't do me any good.

> 2) Indexing tables means faster SELECTs at the expense of slower INSERTs
>    and UPDATEs. There is no magic. The reasoning is that normally you
>    query the data a lot more than you change it.
>
>    Thus, it is preferable, before doing bulk inserts, to drop the
>    indices and recreate them afterwards. This is true when you are not
>    expecting the database to be queried at the same time the inserts
>    are made.

the production database currently has over 5 million records, and is
expected to run at 10-15 million in full production.  the inserts happen
in batches, every 10 minutes or so.  recreating the index takes some
30 minutes as it is.

however, if i use the above bench test script, but with the index
removed, i get:

------
data is 6047 records
insertama  time was 53 seconds
psqlinsert time was 47 seconds
traninsert time was 48 seconds
psqlselect time was 0 seconds
------

as you can see, it didn't make much difference.

>    As for updates, it's trickier, because you actually use the index
>    for the WHERE part of the update. If speed is of the essence, I would
>    probably try the following:
>
>       SELECT * INTO TABLE temp_table FROM your_table WHERE update_condition;
>       DELETE FROM your_table WHERE update_condition;
>       DROP INDEX...;
>       INSERT INTO your_table SELECT ... FROM temp_table; -- update within select
>       CREATE INDEX...;

as stated above, the size of the table means i can't do this on every
update.

> 3) Back to the issue of INSERTs - copies are faster. If you can transform
>    the data into tab-delimited format as required by COPY, you save a lot
>    of time on parsing, planning, etc.

this sorta defeats the purpose of putting the data in an SQL database.  8^)

--
[ Jim Mercer         Reptilian Research         jim@reptiles.org  +1 416 410-5633 ]
[ The telephone, for those of you who have forgotten, was a commonly used         ]
[ communications technology in the days before electronic mail.                   ]
[ They're still easy to find in most large cities.  -- Nathaniel Borenstein       ]
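
For readers who want to reproduce the "traninsert"-style test, a minimal
libpq sketch along those lines follows: one explicit BEGIN/COMMIT wrapped
around a loop of PQexec INSERTs.  The testtable/callid names and the
6047-row count come from the post; the connection string ("dbname=testdb"),
the generated integer values, and the bare-bones error handling are
illustrative assumptions -- the original insertama read its rows from a
text file rather than generating them.

    /*
     * Sketch only: batch of INSERTs inside a single transaction via libpq.
     * Assumptions: database "testdb" exists and testtable has an integer
     * column named callid (names taken from the post above).
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn;
        PGresult *res;
        int       i;

        conn = PQconnectdb("dbname=testdb");   /* assumed connection string */
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* one transaction around the whole batch, as in traninsert */
        res = PQexec(conn, "BEGIN");
        PQclear(res);

        for (i = 0; i < 6047; i++)             /* row count from the post */
        {
            char sql[256];

            snprintf(sql, sizeof(sql),
                     "INSERT INTO testtable (callid) VALUES (%d)", i);
            res = PQexec(conn, sql);
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
            PQclear(res);
        }

        res = PQexec(conn, "COMMIT");
        PQclear(res);

        PQfinish(conn);
        return 0;
    }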
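Herouth's point 3 (COPY with tab-delimited data) can also be driven from C,
so the data need not leave the program to take the faster path.  Below is a
sketch using libpq's old COPY-IN calls, PQputline and PQendcopy.  Same
caveats as above: only the table name is from the post; the single integer
column and connection parameters are assumptions, and rows with several
fields would have the fields separated by tabs.

    /*
     * Sketch only: bulk-load the same rows with COPY instead of per-row
     * INSERTs, using libpq's older PQputline/PQendcopy COPY-IN interface.
     * Each line is one row; fields (only one here) are tab-separated and
     * the stream ends with the "\." marker.
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=testdb");  /* assumed */
        PGresult *res;
        int       i;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            return 1;
        }

        res = PQexec(conn, "COPY testtable FROM STDIN");
        if (PQresultStatus(res) != PGRES_COPY_IN)
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);

        for (i = 0; i < 6047; i++)
        {
            char line[64];

            /* one column per row here; use '\t' between multiple fields */
            snprintf(line, sizeof(line), "%d\n", i);
            PQputline(conn, line);
        }

        PQputline(conn, "\\.\n");   /* end-of-data marker */
        PQendcopy(conn);

        PQfinish(conn);
        return 0;
    }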