Thread: How to import large amounts of data?
Hi all,

What is the best way to import data into tables? I have to import 90000 rows into a table and doing it as inserts takes ages. Would it be faster with copy? Is there any other alternative to insert/copy?

Cheers!
On Thu, Dec 29, 2005 at 10:48:26AM +0100, Arnau wrote:
> What is the best way to import data into tables? I have to import
> 90000 rows into a table and doing it as inserts takes ages. Would it be
> faster with copy? Is there any other alternative to insert/copy?

There are multiple reasons why your INSERT might be slow:

- Are you using multiple transactions instead of batching the rows into one or a few transactions? (Usually, the per-transaction cost is a lot higher than the per-row insertion cost.)

- Do you have a foreign key without a matching index in the other table? (In newer versions of PostgreSQL, EXPLAIN ANALYZE can help with this; do a single insert and see where it spends its time. Older versions won't show such things, though.)

- Do you have an insertion trigger taking time? (Ditto wrt. EXPLAIN ANALYZE.)

COPY will be faster than INSERT regardless, though (for more than a few rows, at least).

/* Steinar */
--
Homepage: http://www.sesse.net/
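For reference, the COPY alternative mentioned above looks roughly like this; a minimal sketch, where the table mytable(value1, value2) and the file path are made up for illustration:

```sql
-- Server-side COPY: the file path is read by the PostgreSQL server process,
-- so it must exist and be readable on the server machine.
COPY mytable (value1, value2) FROM '/tmp/data.csv' WITH CSV;

-- From psql, \copy reads the file on the client side instead:
-- \copy mytable (value1, value2) FROM 'data.csv' WITH CSV
```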
On Thu, 29 Dec 2005, Arnau wrote:
> What is the best way to import data into tables? I have to import
> 90000 rows into a table and doing it as inserts takes ages. Would it be
> faster with copy? Is there any other alternative to insert/copy?

Wrap the inserts inside a BEGIN/COMMIT block and it will be a lot faster. COPY is even faster, but for just 90000 rows I wouldn't bother.

--
/Dennis Björklund
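The BEGIN/COMMIT advice amounts to the following; a minimal sketch, with a hypothetical table mytable:

```sql
BEGIN;
INSERT INTO mytable (value1, value2) VALUES (1, 2);
INSERT INTO mytable (value1, value2) VALUES (2, 3);
-- ... the remaining rows, all inside the same transaction ...
COMMIT;
```

This way the per-transaction overhead (such as the commit flush to the write-ahead log) is paid once instead of once per row.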
At 04:48 AM 12/29/2005, Arnau wrote:
>Hi all,
>
> What is the best way to import data into tables? I have to import
> 90000 rows into a table and doing it as inserts takes ages. Would
> it be faster with copy? Is there any other alternative to insert/copy?

Compared to some imports, 90K rows is not that large.

Assuming you want the table(s) to be in some sorted order when you are done, the fastest way to import a large enough amount of data is:

- put the new data into a temp table (works best if the temp table fits into RAM)
- merge the rows from the original table and the temp table into a new table
- create the indexes you want on the new table
- DROP the old table and its indexes
- rename the new table and its indexes to replace the old ones

If you _don't_ care about having the table in some sorted order:

- put the new data into a new table
- COPY the old data to the new table
- create the indexes you want on the new table
- DROP the old table and its indexes
- rename the new table and its indexes to replace the old ones

Either of these procedures will also minimize your downtime while you are importing.

If you don't want to go to all the trouble of either of the above, at least DROP your indexes, do your INSERTs in batches, and rebuild your indexes afterwards. Doing 90K individual INSERTs should usually be avoided.

cheers,
Ron
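The sorted-order procedure above can be sketched as follows; all table, column, index, and file names here are hypothetical:

```sql
-- 1. Put the new data into a temp table
CREATE TEMP TABLE new_rows (value1 numeric, value2 numeric);
COPY new_rows FROM '/tmp/data.csv' WITH CSV;

-- 2. Merge the old table and the temp table into a new table,
--    in the desired sort order
CREATE TABLE mytable_new AS
    SELECT value1, value2 FROM mytable
    UNION ALL
    SELECT value1, value2 FROM new_rows
    ORDER BY value1;

-- 3. Create the indexes you want on the new table
CREATE INDEX mytable_new_value1_idx ON mytable_new (value1);

-- 4./5. Drop the old table and rename the new one into place;
--       one transaction keeps the downtime to a minimum
BEGIN;
DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;
ALTER INDEX mytable_new_value1_idx RENAME TO mytable_value1_idx;
COMMIT;
```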
On Thursday 29 December 2005 10:48, Arnau wrote:
> What is the best way to import data into tables? I have to import
> 90000 rows into a table and doing it as inserts takes ages. Would it be
> faster with copy? Is there any other alternative to insert/copy?

I am doing imports twice as big daily, and found the following method most efficient (other than using copy):

- Use a plpgsql function to do the actual insert (or update/insert if needed).

- Inside a transaction, execute SELECT statements with the maximum possible number of insert function calls in one go. This minimizes the number of round trips between the client and the server.

Teemu
> I am doing imports twice as big daily, and found the following method
> most efficient (other than using copy):
>
> - Use a plpgsql function to do the actual insert (or update/insert if
>   needed).
>
> - Inside a transaction, execute SELECT statements with the maximum
>   possible number of insert function calls in one go. This minimizes
>   the number of round trips between the client and the server.

Thanks Teemu! Could you paste an example of one of those functions? ;-) An example of those SELECTs would also be great; I'm not sure I have completely understood what you mean.

--
Arnau
On Thursday 29 December 2005 17:19, Arnau wrote:
> > - Use a plpgsql function to do the actual insert (or update/insert if
> >   needed).
> >
> > - Inside a transaction, execute SELECT statements with the maximum
> >   possible number of insert function calls in one go. This minimizes
> >   the number of round trips between the client and the server.
>
> Thanks Teemu! Could you paste an example of one of those functions?
> ;-) An example of those SELECTs would also be great; I'm not sure I
> have completely understood what you mean.

An insert function like:

CREATE OR REPLACE FUNCTION insert_values (the_value1 numeric, the_value2 numeric)
RETURNS void
LANGUAGE plpgsql VOLATILE
AS $$
BEGIN
    INSERT INTO "values" (value1, value2) VALUES (the_value1, the_value2);
    RETURN;
END;
$$;

(Note that "values" is a reserved word, so a table by that name must be quoted.)

Then execute queries like

SELECT insert_values(1,2), insert_values(2,3), insert_values(3,4);

with as many insert_values calls as possible. I think the transaction (BEGIN/COMMIT) gives little additional benefit once you have at least hundreds of calls in one SELECT.

Teemu