Thread: Inserting 'large' amounts of data

Inserting 'large' amounts of data

From
Mario Splivalo
Date:
I have a web application which allows users to upload a lot of phone
numbers. I need to store those numbers in a database. Usually, one would
upload around 70k-100k records, totaling around 2 MB in size.

I'm using Tomcat as the application server, and JDBC to connect to a
pg 8.3 database.

I will have around 20-50 concurrent users at peak hours, and even that
is quite an overestimate.

I could create a temporary file on the filesystem where the database
cluster is located and then execute COPY mytable FROM
'/tmp/upload-data/uuidofsomesort.csv' WITH CSV, but the 'problem' is
that the database server and Tomcat reside on different physical machines.

What would one recommend as the best way to insert this data?

    Mario

Re: Inserting 'large' amounts of data

From
John R Pierce
Date:
Mario Splivalo wrote:
> I have a web application which allows users to upload a lot of phone
> numbers. I need to store those numbers in a database. Usually, one would
> upload around 70k-100k records, totaling around 2 MB in size.
>
> ...
> What would one recommend as the best way to insert this data?
>


I believe you can use org.postgresql.copy.CopyIn() ...  there are
variants that use a writeToCopy() call to send the data, or a
java.io.InputStream, or a java.io.Reader ...
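
A rough sketch of that lower-level route, assuming a phone_numbers table
and an iterable of CSV lines (both placeholders, not anything from the
thread):

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyIn;
import org.postgresql.copy.CopyManager;

public class CopyInSketch {
    // Pushes CSV lines into an open COPY ... FROM STDIN operation.
    public static long push(Connection conn, Iterable<String> csvLines) throws Exception {
        CopyManager mgr = ((PGConnection) conn).getCopyAPI();
        CopyIn op = mgr.copyIn("COPY phone_numbers FROM STDIN WITH CSV");
        try {
            for (String line : csvLines) {
                byte[] buf = (line + "\n").getBytes(StandardCharsets.UTF_8);
                op.writeToCopy(buf, 0, buf.length);   // send raw bytes to the server
            }
            return op.endCopy();                      // number of rows copied
        } catch (Exception e) {
            op.cancelCopy();                          // abort the COPY on failure
            throw e;
        }
    }
}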



Re: Inserting 'large' amounts of data

From
Maciek Sakrejda
Date:
Right. As of the 8.4 release, you have a JDBC API to the PostgreSQL
COPY functionality. Cast your Connection to a PGConnection and call
.getCopyAPI(). You can then use the CopyManager[1] to copy data in
from your Tomcat servlet.

[1] http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
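
A minimal sketch of that approach, assuming a phone_numbers(number)
table and a Reader over the uploaded CSV (both placeholders; a pooled
connection may need unwrapping before the cast):

import java.io.Reader;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class PhoneNumberLoader {
    // Streams the uploaded CSV over the existing JDBC connection,
    // so no temporary file is needed on the database host.
    public static long load(Connection conn, Reader uploadedCsv) throws Exception {
        CopyManager copyApi = ((PGConnection) conn).getCopyAPI();
        return copyApi.copyIn(
                "COPY phone_numbers (number) FROM STDIN WITH CSV", uploadedCsv);
    }
}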

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: Inserting 'large' amounts of data

From
dmp
Date:
>
>
>I have a web application which allows users to upload a lot of phone
>numbers. I need to store those numbers in a database. Usually, one would
>upload around 70k-100k records, totaling around 2 MB in size.
>
>I'm using Tomcat as the application server, and JDBC to connect to a
>pg 8.3 database.
>
>I will have around 20-50 concurrent users at peak hours, and even that
>is quite an overestimate.
>
>I could create a temporary file on the filesystem where the database
>cluster is located and then execute COPY mytable FROM
>'/tmp/upload-data/uuidofsomesort.csv' WITH CSV, but the 'problem' is
>that the database server and Tomcat reside on different physical machines.
>
>What would one recommend as the best way to insert this data?
>
>    Mario
>
Hello Mario,
If the users already have the data in CSV format, why not let them do
it via the app server? The connection can be made across machines if
set up properly.

http://dandymadeproductions.com/projects/MyJSQLView/docs/javadocs/index.html
CSVDataImportThread.java

This class could be used as a basis, with some work. I have a bug with
data that contains semicolons, and it could be more robust, but it
could be used for a start.
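
For illustration only (this is not CSVDataImportThread itself), a
bare-bones version of that idea: read the uploaded CSV in the app
server and insert rows over JDBC in batches. Table and column names
are made up:

import java.io.BufferedReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class CsvBatchInsert {
    // Reads one phone number per line and inserts in batches of 1000.
    public static void insert(Connection conn, BufferedReader csv) throws Exception {
        conn.setAutoCommit(false);
        PreparedStatement ps =
                conn.prepareStatement("INSERT INTO phone_numbers (number) VALUES (?)");
        String line;
        int pending = 0;
        while ((line = csv.readLine()) != null) {
            ps.setString(1, line.trim());
            ps.addBatch();
            if (++pending % 1000 == 0) {
                ps.executeBatch();        // flush every 1000 rows
            }
        }
        ps.executeBatch();                // flush the remainder
        conn.commit();
        ps.close();
    }
}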

John wrote:

> I believe you can use org.postgresql.copy.CopyIn() ...  there are
> variants that use a writeToCopy() call to send the data, or a
> java.io.InputStream, or a java.io.Reader ...

This sounds a lot cleaner.

danap.

Re: Inserting 'large' amounts of data

From
Mario Splivalo
Date:
John R Pierce wrote:
> Mario Splivalo wrote:
>> I have a web application which allows users to upload a lot of phone
>> numbers. I need to store those numbers in a database. Usually, one would
>> upload around 70k-100k records, totaling around 2 MB in size.
>>
>> ...
>> What would one recommend as the best way to insert this data?
>
> I believe you can use org.postgresql.copy.CopyIn() ...  there are
> variants that use a writeToCopy() call to send the data, or a
> java.io.InputStream, or a java.io.Reader ...

Yes, I guess this is the way to go. I'll have to upgrade to 8.4, but
somehow I'd like to wait for a month or so before making the switch.

Thank you all.


    Mike

Re: Inserting 'large' amounts of data

From
Kris Jurka
Date:

On Thu, 27 Aug 2009, Mario Splivalo wrote:

> Yes, I guess this is the way to go. I'll have to upgrade to 8.4, but somehow
> I'd like to wait for a month or so before making the switch.
>

You only need the 8.4 JDBC driver, not an 8.4 server. Copy support
should work for all 7.4 and later servers.

Kris Jurka

Re: Inserting 'large' amounts of data

From
John R Pierce
Date:
Kris Jurka wrote:
>
>
> On Thu, 27 Aug 2009, Mario Splivalo wrote:
>
>> Yes, I guess this is the way to go. I'll have to upgrade to 8.4, but
>> somehow I'd like to wait for a month or so before making the switch.
>>
>
> You only need the 8.4 JDBC driver, not an 8.4 server. Copy support
> should work for all 7.4 and later servers.

Speaking of using JDBC COPY FROM STDIN via this CopyManager
interface... how does that handle constraint violations, like a
duplicate primary key?



Re: Inserting 'large' amounts of data

From
Maciek Sakrejda
Date:
In the same manner as regular DML--you'll get a constraint violation:

cqdb=# create table foo(a int unique);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key"
for table "foo"
CREATE TABLE
cqdb=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 1
>> 1
>> \.
ERROR:  duplicate key value violates unique constraint "foo_a_key"
CONTEXT:  COPY foo, line 2: "1"

(This is straight through psql, but essentially the same thing would
happen through JDBC.)

--
Maciek Sakrejda | Software Engineer | Truviso
(650) 242-3500 Main
(650) 242-3501 F
msakrejda@truviso.com
www.truviso.com

Re: Inserting 'large' amounts of data

From
Kris Jurka
Date:

John R Pierce wrote:
>
> Speaking of using JDBC COPY FROM STDIN via this CopyManager
> interface... how does that handle constraint violations, like a
> duplicate primary key?
>

Just like a regular COPY failure or constraint violation via psql. An
exception is thrown and the transaction is aborted (which you must
roll back).
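
A small sketch of handling that from the JDBC side, assuming the same
placeholder phone_numbers table; the key point is the rollback() after
a failed COPY:

import java.io.Reader;
import java.sql.Connection;
import java.sql.SQLException;
import org.postgresql.PGConnection;

public class CopyWithRollback {
    // Runs the COPY inside a transaction and rolls it back on failure,
    // e.g. "duplicate key value violates unique constraint".
    public static long copyOrRollback(Connection conn, Reader csv) throws Exception {
        conn.setAutoCommit(false);
        try {
            long rows = ((PGConnection) conn).getCopyAPI()
                    .copyIn("COPY phone_numbers FROM STDIN WITH CSV", csv);
            conn.commit();
            return rows;
        } catch (SQLException e) {
            conn.rollback();   // the aborted transaction must be rolled back
            throw e;
        }
    }
}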

Kris Jurka

Re: Inserting 'large' amounts of data

From
Mario Splivalo
Date:
Kris Jurka wrote:
>
>
> On Thu, 27 Aug 2009, Mario Splivalo wrote:
>
>> Yes, I guess this is the way to go. I'll have to upgrade to 8.4, but
>> somehow I'd like to wait for a month or so before making the switch.
>>
>
> You only need the 8.4 JDBC driver, not an 8.4 server. Copy support
> should work for all 7.4 and later servers.

Works like a charm! :)

    Mike