Re: V3 protocol, batch statements and binary transfer - Mailing list pgsql-jdbc

From Alan Stange
Subject Re: V3 protocol, batch statements and binary transfer
Msg-id 4069F058.9060801@rentec.com
In response to Re: V3 protocol, batch statements and binary transfer  (Dave Cramer <pg@fastcrypt.com>)
Responses Re: V3 protocol, batch statements and binary transfer  (Andrea Aime <andrea.aime@aliceposta.it>)
List pgsql-jdbc
Hello all,

We have the same performance problems with bulk data inserts from JDBC.
We also used batches, but made sure that each statement in the batch was
large (~128KB) and inserted many rows at a time.  This cut down on the
number of round trips to the PostgreSQL server.
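
A minimal sketch of that idea, not our actual code: the class and method
names are hypothetical, and it assumes a server that accepts multi-row
VALUES lists (PostgreSQL 8.2 or later) and row tuples that the caller
has already quoted and escaped.  It packs rows into statements of at most
about 128KB and sends them as a single JDBC batch.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class MultiRowBatch {
    // Assumed per-statement size limit, taken from the ~128KB figure above.
    static final int MAX_STATEMENT_CHARS = 128 * 1024;

    /**
     * Groups pre-rendered row tuples such as "(1, 'a')" into multi-row
     * INSERT statements of at most maxChars characters each (a single
     * oversized tuple still gets a statement of its own).
     */
    static List<String> buildStatements(String table, List<String> tuples, int maxChars) {
        String prefix = "INSERT INTO " + table + " VALUES ";
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder(prefix);
        boolean empty = true;
        for (String tuple : tuples) {
            // Start a new statement if this tuple would push us past the limit.
            if (!empty && current.length() + 1 + tuple.length() > maxChars) {
                statements.add(current.toString());
                current = new StringBuilder(prefix);
                empty = true;
            }
            if (!empty) {
                current.append(',');
            }
            current.append(tuple);
            empty = false;
        }
        if (!empty) {
            statements.add(current.toString());
        }
        return statements;
    }

    /** Sends all statements as one JDBC batch inside a single transaction. */
    static void insertAll(Connection conn, List<String> statements) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            for (String sql : statements) {
                st.addBatch(sql);
            }
            st.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With this layout the number of statements sent drops from one per row to
roughly (total data size / 128KB).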

In addition to a) and b) below, I'd add that the read size off the
sockets is too small.   It's a few KB currently and this should
definitely be bumped up to a larger number.
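
To illustrate why the read size matters, here is a small self-contained
sketch.  It is not the driver's code: the class names are made up and an
in-memory stream stands in for the socket.  It counts how many reads
reach the underlying stream when the same data is drained through
buffers of different sizes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadSizeDemo {
    /** Wraps a stream and counts how many read calls reach it. */
    static class CountingInputStream extends InputStream {
        private final InputStream in;
        int reads = 0;

        CountingInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            reads++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return in.read(b, off, len);
        }
    }

    /**
     * Drains totalBytes of dummy data in 256-byte chunks through a
     * BufferedInputStream with the given buffer size and returns the
     * number of reads issued against the underlying stream.
     */
    static int underlyingReads(int totalBytes, int bufferSize) {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        try (InputStream in = new BufferedInputStream(source, bufferSize)) {
            byte[] chunk = new byte[256];
            while (in.read(chunk, 0, chunk.length) != -1) {
                // Discard the data; only the read count is of interest.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.reads;
    }

    public static void main(String[] args) {
        System.out.println("4KB buffer:  " + underlyingReads(1 << 20, 4 * 1024) + " reads");
        System.out.println("64KB buffer: " + underlyingReads(1 << 20, 64 * 1024) + " reads");
    }
}
```

On a real socket each read returns only what has already arrived, so the
gain will be smaller than in this in-memory test, but a larger buffer
still means fewer system calls per megabyte received.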

We're running on a gigE network and see data rates of about 50MB/s coming
off the server (using a 2GB shared memory region).   This sounds nice,
but one has to keep in mind that this is binary data encoded as text.

Anyway, count me in to work on the jdbc client as well (in my limited
time).   To start, I have a couple of local performance hacks for which
I should submit proper patches.

-- Alan



Dave Cramer wrote:

>Andrea,
>
>Feel free to chip in, if you can help with the V3 implementation your
>patches would be greatly appreciated.
>
>Dave
>On Tue, 2004-03-30 at 03:44, Andrea Aime wrote:
>
>
>>Hi jdbc driver hackers,
>>my name's Andrea and I'm hitting a serious performance problem with the driver.
>>At present I'm working with the Postgis GIS extension and I'm hitting very low
>>performance during mass data insertion due to the driver limitations. Basically,
>>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import
>>function on the client side (windows pc) so I can't just go to the command line and
>>issue a copy. But that's just an example, in general I need to perform mass insert
>>or updates in a transactional environment from a client, usually a Windows PC.
>>
>>As far as I can tell the low performance level is due to:
>>a) lack of true support for batch statements as introduced by the V3 protocol, which
>>    makes the network latency bite me very badly while I'm inserting those 100000 rows
>>b) use of the text mode instead of the binary one, more than doubling the size of the data
>>    that is actually transferred over the wire
>>
>>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet
>>(oh, I have to pass thru 3 switches, so the latency is not that good). A
>>reasonable transfer time for that amount of data should be less than 30 seconds IMHO.
>>
>>I'm wondering, why do you use the text mode instead of the more efficient binary one?
>>Secondly, reading the e-mails on the archive it appears that you are short of time
>>for implementing the V3 protocol. Can I help somehow?
>>
>>Best regards
>>Andrea Aime
>>
