Thread: Fastest method to insert data.
What would be the fastest way to insert a bunch of data (in 7.1)? Currently, a bunch of addBatch()es followed by executeBatch() is pretty fast, but not as fast as COPY TABLE via psql. I suppose I can use a single executeBatch() to insert the entire table, but AFAIK addBatch() just buffers the SQL in memory, and I prefer to have some kind of an upper limit on my memory footprint. -- Sam
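For illustration, the pattern being described boils down to a minimal sketch like the following (the table, columns, and connection details are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchInsert {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "password");
            conn.setAutoCommit(false);  // one transaction for the whole load
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, name) VALUES (?, ?)");
            for (int i = 0; i < 100000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();          // buffered client-side until executed
            }
            ps.executeBatch();          // a single flush: every buffered row
                                        // sits in memory until this point
            conn.commit();
            ps.close();
            conn.close();
        }
    }

The single executeBatch() at the end is exactly the memory concern raised above: every buffered row stays in the client until it runs.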
COPY will always be your best option if speed is the most important factor.

--Barry

Sam Varshavchik wrote:
> What would be the fastest way to insert a bunch of data (in 7.1)?
> Currently, a bunch of addBatch()es followed by executeBatch() is pretty
> fast, but not as fast as COPY TABLE via psql. I suppose I can use a
> single executeBatch() to insert the entire table, but AFAIK addBatch()
> just buffers the SQL in memory, and I prefer to have some kind of an
> upper limit on my memory footprint.
On Fri, Apr 19, 2002 at 12:49:38PM -0700, Barry Lind wrote:
> COPY will always be your best option if speed is the most important factor.

Would COPY issued through JDBC actually work? My understanding is that for
non-postgres uid clients the COPY command is restricted to using stdin/stdout
only. Or perhaps it's an artificial limitation imposed by psql. I don't think
I've seen a COPY TABLE JDBC extension in the documentation.

Resetting System.in/System.out to a file, then issuing a COPY TABLE, might be
worth investigating, but that's a hack.

-- Sam
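No such extension existed in the 7.1-era driver. Later versions of the PostgreSQL JDBC driver did add a client-side COPY API, org.postgresql.copy.CopyManager, which streams the data over the ordinary client connection rather than through the server's stdin. A rough sketch against one of those later drivers, with made-up table and file names:

    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyLoad {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "password");
            // getCopyAPI() is a driver extension on PGConnection.
            CopyManager copy = ((PGConnection) conn).getCopyAPI();
            // Streams the file through the client connection; no server-side
            // file access is needed.
            long rows = copy.copyIn("COPY items FROM STDIN",
                                    new FileReader("items.dat"));
            System.out.println("loaded " + rows + " rows");
            conn.close();
        }
    }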
I'd be interested to hear thoughts on a "COPY TABLE" method or set of libraries. In my case I'm reading the table structure of tables in a Microsoft SQL database, using this information to recreate these tables in PostgreSQL (some mapping of data types was needed, of course), then pulling the data contained in the MS tables and storing it in PG.
In the course of actually transferring the data I encountered some memory problems when using the .addBatch() methods. I probably should have expected this, since the tables I'm seeking to draw data from are kind of large. I was thinking that using .addBatch() would be a good approach, since I could treat the whole batch as a transaction, helping to ensure that I pulled everything I intended.
Now I'm reading a record from the MS database and then executing the insert statement on the PG database right away. My routines are pretty rough at this point (I'm still new to the language), but I would have to imagine that more sophisticated programmers have already worked this out.
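That read-one-row, insert-one-row loop amounts to something like the sketch below, assuming msConn and pgConn are already-open java.sql.Connections and a made-up two-column table:

    Statement sel = msConn.createStatement();
    ResultSet rs = sel.executeQuery("SELECT id, name FROM items");
    PreparedStatement ins = pgConn.prepareStatement(
            "INSERT INTO items (id, name) VALUES (?, ?)");
    while (rs.next()) {
        ins.setInt(1, rs.getInt("id"));
        ins.setString(2, rs.getString("name"));
        ins.executeUpdate();   // one round trip per row: simple, but slow
    }
    rs.close();
    sel.close();
    ins.close();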
Earlier I was thinking that I would loop through the MS resultset and then use an updatable resultset on the PG side, using moveToInsertRow() and the other methods available to an updatable resultset. I don't think updatable result sets are supported by the PG JDBC driver yet (this may be in there now), but from a performance point of view I don't know if it would make a huge difference.
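For reference, the updatable-resultset route would look roughly like this where a driver supports it (made-up table again; the query generally has to be a single-table SELECT including a key column for CONCUR_UPDATABLE to be honored):

    Statement st = pgConn.createStatement(
            ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
    ResultSet rs = st.executeQuery("SELECT id, name FROM items");
    rs.moveToInsertRow();             // position on the special insert row
    rs.updateInt("id", 42);
    rs.updateString("name", "new row");
    rs.insertRow();                   // sends the INSERT to the server
    rs.moveToCurrentRow();            // back to the normal cursor position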
Now, if there were classes that could GULP whole tables from a database (PG, MS or otherwise) in one shot and recreate these tables in PG, that would be great!
Sam: Is this the direction in which you were thinking?
If this were doable, I would have to wonder how much of a speed difference there would be in a scenario closer to Sam's, where a table is being copied both to and from a PG database, versus just a COPY TABLE from the command line of the command interpreter. Maybe Sam's is a scenario where a stored procedure would be useful.
Dennis
Sam Varshavchik wrote:
> On Fri, Apr 19, 2002 at 12:49:38PM -0700, Barry Lind wrote:
> > COPY will always be your best option if speed is the most important factor.
>
> Would COPY issued through JDBC actually work? My understanding is that for
> non-postgres uid clients the COPY command is restricted to using stdin/stdout
> only. Or perhaps it's an artificial limitation imposed by psql. I don't think
> I've seen a COPY TABLE JDBC extension in the documentation.
>
> Resetting System.in/System.out to a file, then issuing a COPY TABLE, might be
> worth investigating, but that's a hack.
--
Dennis Roman Gesker | ICQ: 194047
mailto:dennis@gesker.com | Fax: 413.740.4653
http://www.gesker.com
On Sat, 20 Apr 2002, Dennis R. Gesker wrote:

> In the course of actually transferring the data I encountered some memory
> problems when using the .addBatch() methods. I probably should have
> expected this, since the tables I'm seeking to draw data from are kind of
> large. I was thinking that using .addBatch() would be a good approach,
> since I could treat the whole batch as a transaction, helping to ensure
> that I pulled everything I intended.

You'll need to issue executeBatch() every once in a while. My current
approach is to call executeBatch() for every thousand rows. I've tried many
things, and this is the one that proved to be the fastest. But I think that
this is still much slower than it needs to be. COPY TABLE is still much
faster.

> Now, if there were classes that could GULP whole tables from a database
> (PG, MS or otherwise) in one shot and recreate these tables in PG, that
> would be great!
>
> Sam: Is this the direction in which you were thinking?

Yes. The Sybase SQL server has a completely separate set of APIs that are
designed to quickly upload a bunch of data to a table. Sybase's bulk-copy
API is very similar to addBatch()/executeBatch(), except that there's no
SQL involved. You just specify the table, then start feeding it rows.
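The flush-every-thousand-rows approach is a small variation on the plain batch loop; only the loop body changes (names are again made up, with ps and conn as in the earlier sketch and rs a resultset of rows to load):

    int count = 0;
    while (rs.next()) {
        ps.setInt(1, rs.getInt(1));
        ps.setString(2, rs.getString(2));
        ps.addBatch();
        if (++count % 1000 == 0) {
            ps.executeBatch();   // flush every thousand rows to bound memory
        }
    }
    ps.executeBatch();           // flush whatever is left over
    conn.commit();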