Thread: V3 protocol, batch statements and binary transfer

V3 protocol, batch statements and binary transfer

From

Andrea Aime

Date:

30 March 2004, 02:40:56

Hi jdbc driver hackers,
my name's Andrea and I'm hitting some serious performance problem with the driver.
At present I'm working with the Postgis GIS extension and I'm hitting very low
performance during mass data insertion due to the driver limitations. Basically,
I want to turn a 20 MB shapefile into a postgres table, but it has to be an import
function on the client side (windows pc) so I can't just go to the command line and
issue a copy. But that's just an example, in general I need to perform mass insert
or updates in a transactional environment from a client, usually a Windows PC.

As far as I can tell the low performance level is due to:
a) lack of true support of batch statements as introduced by the V3 protocol, that
    makes the network latency bite me very badly while I'm inserting that 100000 rows
b) use of the text mode instead of the binary one, more than doubling the size of data
    that are really transfered over the wire

That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet
(oh, I have to pass thru 3 switches, so the latency is not that good). A
reasonable transfer time for that amount of data should be less than 30 seconds IMHO.

I'm wondering, why do you use the text mode instead of the more efficient binary one?
Secondly, reading the e-mails on the archive it appears that you are short of time
for implementing the V3 protocol. Can I help somehow?

Best regards
Andrea Aime

Re: V3 protocol, batch statements and binary transfer

From

Dave Cramer

Date:

30 March 2004, 13:36:56

Andrea,

In general, yes, try to follow our style. Barry Lind wrote the initial
V3 Protocol, so bounce some ideas off the list.

submit your patch as a context diff, to the list. Also make sure you are
using the code from the gborg project.
http://gborg.postgresql.org/project/pgjdbc/projdisplay.php

Also test cases are nice ;)

Dave
On Tue, 2004-03-30 at 08:18, Andrea Aime wrote:
> Well, I have to do it in my spare time along with my regular Geotools2
> involvement, so it will require time. Is anyone willing to give directions
> or I just try to replicate the "look & feel" of the current code?
>
> Best regards
> Andrea Aime
>
> Dave Cramer wrote:
>
> > Andrea,
> >
> > Feel free to chip in, if you can help with the V3 implementation your
> > patches would be greatly appreciated.
> >
> > Dave
> > On Tue, 2004-03-30 at 03:44, Andrea Aime wrote:
> >
> >>Hi jdbc driver hackers,
> >>my name's Andrea and I'm hitting some serious performance problem with the driver.
> >>At present I'm working with the Postgis GIS extension and I'm hitting very low
> >>performance during mass data insertion due to the driver limitations. Basically,
> >>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import
> >>function on the client side (windows pc) so I can't just go to the command line and
> >>issue a copy. But that's just an example, in general I need to perform mass insert
> >>or updates in a transactional environment from a client, usually a Windows PC.
> >>
> >>As far as I can tell the low performance level is due to:
> >>a) lack of true support of batch statements as introduced by the V3 protocol, that
> >>    makes the network latency bite me very badly while I'm inserting that 100000 rows
> >>b) use of the text mode instead of the binary one, more than doubling the size of data
> >>    that are really transfered over the wire
> >>
> >>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet
> >>(oh, I have to pass thru 3 switches, so the latency is not that good). A
> >>reasonable transfer time for that amount of data should be less than 30 seconds IMHO.
> >>
> >>I'm wondering, why do you use the text mode instead of the more efficient binary one?
> >>Secondly, reading the e-mails on the archive it appears that you are short of time
> >>for implementing the V3 protocol. Can I help somehow?
> >>
> >>Best regards
> >>Andrea Aime
> >>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 6: Have you searched our list archives?
> >>
> >>               http://archives.postgresql.org
> >>
>
>
--
Dave Cramer
519 939 0336
ICQ # 14675561

Re: V3 protocol, batch statements and binary transfer

From

Dave Cramer

Date:

30 March 2004, 13:45:49

Andrea,

Feel free to chip in, if you can help with the V3 implementation your
patches would be greatly appreciated.

Dave
On Tue, 2004-03-30 at 03:44, Andrea Aime wrote:
> Hi jdbc driver hackers,
> my name's Andrea and I'm hitting some serious performance problem with the driver.
> At present I'm working with the Postgis GIS extension and I'm hitting very low
> performance during mass data insertion due to the driver limitations. Basically,
> I want to turn a 20 MB shapefile into a postgres table, but it has to be an import
> function on the client side (windows pc) so I can't just go to the command line and
> issue a copy. But that's just an example, in general I need to perform mass insert
> or updates in a transactional environment from a client, usually a Windows PC.
>
> As far as I can tell the low performance level is due to:
> a) lack of true support of batch statements as introduced by the V3 protocol, that
>     makes the network latency bite me very badly while I'm inserting that 100000 rows
> b) use of the text mode instead of the binary one, more than doubling the size of data
>     that are really transfered over the wire
>
> That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet
> (oh, I have to pass thru 3 switches, so the latency is not that good). A
> reasonable transfer time for that amount of data should be less than 30 seconds IMHO.
>
> I'm wondering, why do you use the text mode instead of the more efficient binary one?
> Secondly, reading the e-mails on the archive it appears that you are short of time
> for implementing the V3 protocol. Can I help somehow?
>
> Best regards
> Andrea Aime
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>
--
Dave Cramer
519 939 0336
ICQ # 14675561

Re: V3 protocol, batch statements and binary transfer

From

Andrea Aime

Date:

30 March 2004, 17:02:43

Well, I have to do it in my spare time along with my regular Geotools2
involvement, so it will require time. Is anyone willing to give directions
or I just try to replicate the "look & feel" of the current code?

Best regards
Andrea Aime

Dave Cramer wrote:

> Andrea,
>
> Feel free to chip in, if you can help with the V3 implementation your
> patches would be greatly appreciated.
>
> Dave
> On Tue, 2004-03-30 at 03:44, Andrea Aime wrote:
>
>>Hi jdbc driver hackers,
>>my name's Andrea and I'm hitting some serious performance problem with the driver.
>>At present I'm working with the Postgis GIS extension and I'm hitting very low
>>performance during mass data insertion due to the driver limitations. Basically,
>>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import
>>function on the client side (windows pc) so I can't just go to the command line and
>>issue a copy. But that's just an example, in general I need to perform mass insert
>>or updates in a transactional environment from a client, usually a Windows PC.
>>
>>As far as I can tell the low performance level is due to:
>>a) lack of true support of batch statements as introduced by the V3 protocol, that
>>    makes the network latency bite me very badly while I'm inserting that 100000 rows
>>b) use of the text mode instead of the binary one, more than doubling the size of data
>>    that are really transfered over the wire
>>
>>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet
>>(oh, I have to pass thru 3 switches, so the latency is not that good). A
>>reasonable transfer time for that amount of data should be less than 30 seconds IMHO.
>>
>>I'm wondering, why do you use the text mode instead of the more efficient binary one?
>>Secondly, reading the e-mails on the archive it appears that you are short of time
>>for implementing the V3 protocol. Can I help somehow?
>>
>>Best regards
>>Andrea Aime
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 6: Have you searched our list archives?
>>
>>               http://archives.postgresql.org
>>

Re: V3 protocol, batch statements and binary transfer

From

Alan Stange

Date:

30 March 2004, 18:11:55

Hello all,

We have the same performance problems with bulk data inserts from jdbc
as well.   We used batches as well but made sure that each statement in
the batch was large ~128KB and inserted on many rows at a time.  This
cut down on the number of round trips to to the postgresql server.

In addition to a) and b) below, I'd add that the read size off the
sockets is too small.   It's a few KB currently and this should
definitely be bumped up to a larger number.

We're running on a gigE network and see about 50MB/s data rates coming
off the server (using a 2GB shared memory region).   This sounds nice,
but one has to keep in mind that the data is binary encoded in text.

Anyway, count me in to work on the jdbc client as well (in my limited
time).   To start, I have a couple of local performance hacks for which
I should submit proper patches.

-- Alan

Dave Cramer wrote:

>Andrea,
>
>Feel free to chip in, if you can help with the V3 implementation your
>patches would be greatly appreciated.
>
>Dave
>On Tue, 2004-03-30 at 03:44, Andrea Aime wrote:
>
>
>>Hi jdbc driver hackers,
>>my name's Andrea and I'm hitting some serious performance problem with the driver.
>>At present I'm working with the Postgis GIS extension and I'm hitting very low
>>performance during mass data insertion due to the driver limitations. Basically,
>>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import
>>function on the client side (windows pc) so I can't just go to the command line and
>>issue a copy. But that's just an example, in general I need to perform mass insert
>>or updates in a transactional environment from a client, usually a Windows PC.
>>
>>As far as I can tell the low performance level is due to:
>>a) lack of true support of batch statements as introduced by the V3 protocol, that
>>    makes the network latency bite me very badly while I'm inserting that 100000 rows
>>b) use of the text mode instead of the binary one, more than doubling the size of data
>>    that are really transfered over the wire
>>
>>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet
>>(oh, I have to pass thru 3 switches, so the latency is not that good). A
>>reasonable transfer time for that amount of data should be less than 30 seconds IMHO.
>>
>>I'm wondering, why do you use the text mode instead of the more efficient binary one?
>>Secondly, reading the e-mails on the archive it appears that you are short of time
>>for implementing the V3 protocol. Can I help somehow?
>>
>>Best regards
>>Andrea Aime
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 6: Have you searched our list archives?
>>
>>               http://archives.postgresql.org
>>
>>
>>

Re: V3 protocol, batch statements and binary transfer

From

Andrea Aime

Date:

31 March 2004, 04:06:03

Alan Stange wrote:
> Hello all,
>
> We have the same performance problems with bulk data inserts from jdbc
> as well.   We used batches as well but made sure that each statement in
> the batch was large ~128KB and inserted on many rows at a time.  This
> cut down on the number of round trips to to the postgresql server.

Yes, I also did it but putting togheter many inserts into a single statement
and in fact it halved the time required to perform the inserts, still, it
takes too much time anyway: 1 minute for insertion and 5 seconds to read the
data...

> In addition to a) and b) below, I'd add that the read size off the
> sockets is too small.   It's a few KB currently and this should
> definitely be bumped up to a larger number.

In fact I've tried to bump up the 8kb value that's hardwired in the code
to 16,64,128Kb but saw no improvement on a 100Mb full switched LAN...

> We're running on a gigE network and see about 50MB/s data rates coming
> off the server (using a 2GB shared memory region).   This sounds nice,
> but one has to keep in mind that the data is binary encoded in text.
>
> Anyway, count me in to work on the jdbc client as well (in my limited
> time).   To start, I have a couple of local performance hacks for which
> I should submit proper patches.
>

I'm eager to have a look at them :-)

Best regards
Andrea Aime