Populating large tables with occasional bad values - Mailing list pgsql-jdbc

From John T. Dow
Subject Populating large tables with occasional bad values
Date
Msg-id 200806111511.m5BFBlK7026168@web2.nidhog.com
Whole thread Raw
Responses Re: Populating large tables with occasional bad values
List pgsql-jdbc
Question: why are inserts approximately 10 times slower over the Internet than locally?

Background information:

I have a Java program that inserts rows using data derived from a legacy database which can have occasional bad values
(egtext in a numeric column) as well as duplicate primary key values. 

Postgres rejects the offending rows and my Java program writes the offending insert statement and the error message to
alog file. This allows the user to take action, perhaps cleaning the original data and reloading, perhaps entering a
fewrows manually, whatever. 

In the case of duplicate key values, for certain tables the program is told to modify the key (eg appending "A" or "B")
tomake it unique. In that case, the original insert is an error but after one or two retries, a computed unique key
allowsit to be inserted. 

While I understand that INSERT is not as fast as COPY, inserting is fast enough and provides the needed flexibility
(especiallyfor custom logic to alter duplicate keys). 

However, inserting rows is only fast enough if the database is on the same computer as the Java program. For example,
200,000rows can be inserted locally in under an hour. Over the Internet (using ssh tunneling with Trilead's code) it
tookover six hours. That's the problem. 

I've tinkered some with setting autocommit off and doing commits every 100 inserts, but if there's an error the entire
setis lost, not just the offending row. 

Also, postgres apparently doesn't allow setting the transaction isolation to NONE. Am I wrong about that? If it did,
wouldthat help? 

I fail to understand why this is so much slower over the Internet. Even if autocommit is on, that's more work for the
server,I shouldn't think that it increases the network traffic. I must be entirely in the dark on this. Does autocommit
causejdbc code on my client computer to send a BEGIN transaction, the insert, and a COMMIT for each row? 

John


pgsql-jdbc by date:

Previous
From: "Sanjay Thomas"
Date:
Subject: Error creating connection using postgres
Next
From: Craig Ringer
Date:
Subject: Re: Populating large tables with occasional bad values