Thread: Fastest method to insert data.
What would be the fastest way to insert a bunch of data (in 7.1)? Currently, a bunch of addBatch()es followed by executeBatch() is pretty fast, but not as fast as COPY TABLE via psql. I suppose I can use a single executeBatch() to insert the entire table, but AFAIK addBatch() just buffers the SQL in memory, and I prefer to have some kind of an upper limit on my memory footprint. -- Sam
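For illustration, the pattern being described boils down to a minimal sketch like the following (the table, columns, and connection details are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BatchInsert {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "password");
            conn.setAutoCommit(false);  // one transaction for the whole load
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO items (id, name) VALUES (?, ?)");
            for (int i = 0; i < 100000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();          // buffered client-side until executed
            }
            ps.executeBatch();          // a single flush: every buffered row
                                        // sits in memory until this point
            conn.commit();
            ps.close();
            conn.close();
        }
    }

The single executeBatch() at the end is exactly the memory concern raised above: every buffered row stays in the client until it runs.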
COPY will always be your best option if speed is the most important factor.

--Barry

Sam Varshavchik wrote:
> What would be the fastest way to insert a bunch of data (in 7.1)?
> Currently, a bunch of addBatch()es followed by executeBatch() is pretty
> fast, but not as fast as COPY TABLE via psql. I suppose I can use a
> single executeBatch() to insert the entire table, but AFAIK addBatch()
> just buffers the SQL in memory, and I prefer to have some kind of an
> upper limit on my memory footprint.
On Fri, Apr 19, 2002 at 12:49:38PM -0700, Barry Lind wrote:
> COPY will always be your best option if speed is the most important factor.

Would COPY issued through JDBC actually work? My understanding is that for
non-postgres uid clients the COPY command is restricted to using stdin/stdout
only. Or perhaps it's an artificial limitation imposed by psql. I don't think
I've seen a COPY TABLE JDBC extension in the documentation.

Resetting System.in/System.out to a file, then issuing a COPY TABLE, might be
worth investigating, but that's a hack.

-- Sam
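No such extension existed in the 7.1-era driver. Later versions of the PostgreSQL JDBC driver did add a client-side COPY API, org.postgresql.copy.CopyManager, which streams the data over the ordinary client connection rather than through the server's stdin. A rough sketch against one of those later drivers, with made-up table and file names:

    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyLoad {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "password");
            // getCopyAPI() is a driver extension on PGConnection.
            CopyManager copy = ((PGConnection) conn).getCopyAPI();
            // Streams the file through the client connection; no server-side
            // file access is needed.
            long rows = copy.copyIn("COPY items FROM STDIN",
                                    new FileReader("items.dat"));
            System.out.println("loaded " + rows + " rows");
            conn.close();
        }
    }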
I'd be interested to hear thoughts on a "COPY TABLE" method or set of libraries. In my case I'm reading the table structure of tables in a Microsoft SQL database, using this information to recreate these tables in PostgreSQL (some mapping of data types was needed, of course), then pulling the data contained in the MS tables and storing it in PG.
In the course of actually transferring the data I encountered some memory problems when using the .addBatch() methods. I probably should have expected this, since the tables I'm seeking to draw data from are kind of large. I was thinking that using .addBatch() would be a good approach, since I could treat the whole batch as a transaction, helping to ensure that I pulled everything I intended.
Now I'm reading a record from the MS database and then executing the insert statement on the PG database right away. My routines are pretty rough at this point (I'm still new to the language), but I would have to imagine that more sophisticated programmers have already worked this out.
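That read-one-row, insert-one-row loop amounts to something like the sketch below, assuming msConn and pgConn are already-open java.sql.Connections and a made-up two-column table:

    Statement sel = msConn.createStatement();
    ResultSet rs = sel.executeQuery("SELECT id, name FROM items");
    PreparedStatement ins = pgConn.prepareStatement(
            "INSERT INTO items (id, name) VALUES (?, ?)");
    while (rs.next()) {
        ins.setInt(1, rs.getInt("id"));
        ins.setString(2, rs.getString("name"));
        ins.executeUpdate();   // one round trip per row: simple, but slow
    }
    rs.close();
    sel.close();
    ins.close();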
Earlier I was thinking that I would loop through the MS resultset and then use an updatable resultset on the PG side, using moveToInsertRow() and the other methods available to an updatable resultset. I don't think updatable result sets are supported by the PG JDBC driver yet (this may be in there now), but from a performance point of view I don't know if it would make a huge difference.
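For reference, the updatable-resultset route would look roughly like this where a driver supports it (made-up table again; the query generally has to be a single-table SELECT including a key column for CONCUR_UPDATABLE to be honored):

    Statement st = pgConn.createStatement(
            ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
    ResultSet rs = st.executeQuery("SELECT id, name FROM items");
    rs.moveToInsertRow();             // position on the special insert row
    rs.updateInt("id", 42);
    rs.updateString("name", "new row");
    rs.insertRow();                   // sends the INSERT to the server
    rs.moveToCurrentRow();            // back to the normal cursor position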
Now, if there were classes that could GULP whole tables from a database (PG, MS or otherwise) in one shot and recreate these tables in PG, that would be great!
Sam: Is this the direction in which you were thinking?
If this were doable, I would have to wonder how much of a speed difference there would be in a scenario closer to Sam's, where a table is being copied both to and from a PG database, versus just a COPY TABLE from the command line of the command interpreter. Maybe Sam's is a scenario where a stored procedure would be useful.
Dennis
Sam Varshavchik wrote:
> On Fri, Apr 19, 2002 at 12:49:38PM -0700, Barry Lind wrote:
> > COPY will always be your best option if speed is the most important factor.
>
> Would COPY issued through JDBC actually work? My understanding is that for
> non-postgres uid clients the COPY command is restricted to using stdin/stdout
> only. Or perhaps it's an artificial limitation imposed by psql. I don't think
> I've seen a COPY TABLE JDBC extension in the documentation.
>
> Resetting System.in/System.out to a file, then issuing a COPY TABLE, might be
> worth investigating, but that's a hack.
--
Dennis Roman Gesker | ICQ: 194047
mailto:dennis@gesker.com | Fax: 413.740.4653
http://www.gesker.com
On Sat, 20 Apr 2002, Dennis R. Gesker wrote:

> In the course of actually transferring the data I encountered some memory
> problems when using the .addBatch() methods. I probably should have
> expected this, since the tables I'm seeking to draw data from are kind of
> large. I was thinking that using .addBatch() would be a good approach,
> since I could treat the whole batch as a transaction, helping to ensure
> that I pulled everything I intended.

You'll need to issue executeBatch() every once in a while. My current
approach is to call executeBatch() for every thousand rows. I've tried many
things, and this is the one that proved to be the fastest. But I think that
this is still much slower than it needs to be. COPY TABLE is still much
faster.

> Now, if there were classes that could GULP whole tables from a database
> (PG, MS or otherwise) in one shot and recreate these tables in PG, that
> would be great!
>
> Sam: Is this the direction in which you were thinking?

Yes. The Sybase SQL server has a completely separate set of APIs that are
designed to quickly upload a bunch of data to a table. Sybase's bulk-copy
API is very similar to addBatch()/executeBatch(), except that there's no
SQL involved. You just specify the table, then start feeding it rows.
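The flush-every-thousand-rows approach is a small variation on the plain batch loop; only the loop body changes (names are again made up, with ps and conn as in the earlier sketch and rs a resultset of rows to load):

    int count = 0;
    while (rs.next()) {
        ps.setInt(1, rs.getInt(1));
        ps.setString(2, rs.getString(2));
        ps.addBatch();
        if (++count % 1000 == 0) {
            ps.executeBatch();   // flush every thousand rows to bound memory
        }
    }
    ps.executeBatch();           // flush whatever is left over
    conn.commit();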