Thread: How to import large amounts of data?

How to import large amounts of data?

From
Arnau
Date:
Hi all,

   What is the best way to import data into tables? I have to import
90,000 rows into a table, and doing it as individual INSERTs takes ages.
Would it be faster with COPY? Is there any other alternative to INSERT/COPY?

Cheers!

Re: How to import large amounts of data?

From
"Steinar H. Gunderson"
Date:
On Thu, Dec 29, 2005 at 10:48:26AM +0100, Arnau wrote:
>    What is the best way to import data into tables? I have to import
> 90,000 rows into a table, and doing it as individual INSERTs takes ages.
> Would it be faster with COPY? Is there any other alternative to INSERT/COPY?

There are multiple reasons why your INSERT might be slow:

- Are you using multiple transactions instead of batching the inserts into
  one or a few transactions? (Usually, the per-transaction cost is a lot
  higher than the per-row insertion cost.)
- Do you have a foreign key without a matching index on the other table? (In
  newer versions of PostgreSQL, EXPLAIN ANALYZE can help with this; do a
  single insert and see where it ends up doing work. Older versions won't
  show such things, though.)
- Do you have an insert trigger that is taking time? (Ditto with regard to
  EXPLAIN ANALYZE.)

COPY will be faster than INSERT regardless, though (for more than a few rows,
at least).
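
For illustration (the table, column, and file names here are invented), checking
where a single insert spends its time and then bulk-loading with COPY could look
like this:

BEGIN;
-- EXPLAIN ANALYZE actually executes the INSERT, so roll it back afterwards;
-- time spent in foreign-key checks and triggers shows up in its output.
EXPLAIN ANALYZE INSERT INTO orders (id, customer_id) VALUES (1, 42);
ROLLBACK;

-- Bulk load from a server-side file (or use \copy in psql for a client-side file).
COPY orders (id, customer_id) FROM '/tmp/orders.csv' WITH CSV;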

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: How to import large amounts of data?

From
Dennis Bjorklund
Date:
On Thu, 29 Dec 2005, Arnau wrote:

>    What is the best way to import data into tables? I have to import
> 90,000 rows into a table, and doing it as individual INSERTs takes ages.
> Would it be faster with COPY? Is there any other alternative to INSERT/COPY?

Wrap the INSERTs inside a BEGIN/COMMIT block and they will run a lot faster.
COPY is even faster, but for just 90,000 rows I wouldn't bother.
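
For example (table and column names here are made up), instead of sending each
INSERT in its own transaction, do something like:

BEGIN;
INSERT INTO mytable (id, name) VALUES (1, 'first');
INSERT INTO mytable (id, name) VALUES (2, 'second');
-- ... and so on for the remaining rows ...
COMMIT;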

--
/Dennis Björklund


Re: How to import large amounts of data?

From
Ron
Date:
At 04:48 AM 12/29/2005, Arnau wrote:
>Hi all,
>
>    What is the best way to import data into tables? I have to import
> 90,000 rows into a table, and doing it as individual INSERTs takes ages.
> Would it be faster with COPY? Is there any other alternative to INSERT/COPY?
Compared to some imports, 90K rows is not that large.

Assuming you want the table(s) to be in some sorted order when you
are done, the fastest way to import a sufficiently large amount of data is:
- put the new data into a temp table (works best if the temp table fits into RAM)
- merge the rows from the original table and the temp table into a new table
- create the indexes you want on the new table
- DROP the old table and its indexes
- rename the new table and its indexes to replace the old ones.

If you _don't_ care about having the table in some sorted order:
- put the new data into a new table
- COPY the old data to the new table
- create the indexes you want on the new table
- DROP the old table and its indexes
- rename the new table and its indexes to replace the old ones.

Either of these procedures will also minimize your downtime while you
are importing.
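
A rough SQL sketch of the second (unsorted) variant, with made-up table and
index names, could look like the following; for the sorted variant you would
instead fill the new table with an INSERT ... SELECT ... ORDER BY over the old
table plus a temp table holding the new rows:

BEGIN;
CREATE TABLE items_new (LIKE items);
INSERT INTO items_new SELECT * FROM items;               -- copy the old data over
COPY items_new FROM '/tmp/new_items.csv' WITH CSV;       -- load the new data
CREATE INDEX items_new_name_idx ON items_new (name);     -- recreate the indexes you need
DROP TABLE items;
ALTER TABLE items_new RENAME TO items;
ALTER INDEX items_new_name_idx RENAME TO items_name_idx;
COMMIT;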

If one doesn't want to go to all of the trouble of either of the
above, at least DROP your indexes, do your INSERTs in batches, and
rebuild your indexes.
Doing 90K individual INSERTs should usually be avoided.
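
A minimal sketch of that fallback, again with invented names:

DROP INDEX items_name_idx;                    -- drop expensive indexes first
-- ... run the batched INSERTs (or a COPY) here ...
CREATE INDEX items_name_idx ON items (name);  -- rebuild them once at the end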

cheers,
Ron



Re: How to import large amounts of data?

From
Teemu Torma
Date:
On Thursday 29 December 2005 10:48, Arnau wrote:
>    What is the best way to import data into tables? I have to import
> 90,000 rows into a table, and doing it as individual INSERTs takes ages.
> Would it be faster with COPY? Is there any other alternative to INSERT/COPY?

I am doing imports twice that size daily, and have found the following
method most efficient (other than using COPY):

- Use a PL/pgSQL function to do the actual insert (or update/insert if
needed).

- Inside a transaction, execute SELECT statements with the maximum possible
number of insert-function calls in one go.  This minimizes the number
of round trips between the client and the server.

Teemu

Re: How to import large amounts of data?

From
Arnau
Date:
>
> I am doing imports twice that size daily, and have found the following
> method most efficient (other than using COPY):
>
> - Use a PL/pgSQL function to do the actual insert (or update/insert if
> needed).
>
> - Inside a transaction, execute SELECT statements with the maximum possible
> number of insert-function calls in one go.  This minimizes the number
> of round trips between the client and the server.

Thanks Teemu! Could you paste an example of one of those functions? ;-)
An example of those SELECTs would also be great; I'm not sure I have
completely understood what you mean.

--
Arnau

Re: How to import large amounts of data?

From
Teemu Torma
Date:
On Thursday 29 December 2005 17:19, Arnau wrote:
> > - Use a PL/pgSQL function to do the actual insert (or update/insert if
> > needed).
> >
> > - Inside a transaction, execute SELECT statements with the maximum
> > possible number of insert-function calls in one go.  This minimizes
> > the number of round trips between the client and the server.
>
> Thanks Teemu! Could you paste an example of one of those functions?
> ;-) An example of those SELECTs would also be great; I'm not sure I
> have completely understood what you mean.

An insert function like:

CREATE OR REPLACE FUNCTION
insert_values (the_value1 numeric, the_value2 numeric)
RETURNS void
LANGUAGE plpgsql VOLATILE AS $$
BEGIN
  -- "values" must be double-quoted here, since VALUES is a reserved word.
  INSERT INTO "values" (value1, value2)
    VALUES (the_value1, the_value2);
  RETURN;
END;
$$;

Then execute queries like

SELECT insert_values(1,2), insert_values(2,3), insert_values(3,4);

with as many insert_values calls in each statement as possible.

I think an explicit transaction (BEGIN/COMMIT) adds little speed benefit
once you have at least hundreds of calls in one SELECT.

Teemu