Thread: V3 protocol, batch statements and binary transfer
Hi jdbc driver hackers, my name's Andrea and I'm hitting some serious performance problem with the driver. At present I'm working with the Postgis GIS extension and I'm hitting very low performance during mass data insertion due to the driver limitations. Basically, I want to turn a 20 MB shapefile into a postgres table, but it has to be an import function on the client side (windows pc) so I can't just go to the command line and issue a copy. But that's just an example, in general I need to perform mass insert or updates in a transactional environment from a client, usually a Windows PC. As far as I can tell the low performance level is due to: a) lack of true support of batch statements as introduced by the V3 protocol, that makes the network latency bite me very badly while I'm inserting that 100000 rows b) use of the text mode instead of the binary one, more than doubling the size of data that are really transfered over the wire That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet (oh, I have to pass thru 3 switches, so the latency is not that good). A reasonable transfer time for that amount of data should be less than 30 seconds IMHO. I'm wondering, why do you use the text mode instead of the more efficient binary one? Secondly, reading the e-mails on the archive it appears that you are short of time for implementing the V3 protocol. Can I help somehow? Best regards Andrea Aime
Andrea, In general, yes, try to follow our style. Barry Lind wrote the initial V3 Protocol, so bounce some ideas off the list. submit your patch as a context diff, to the list. Also make sure you are using the code from the gborg project. http://gborg.postgresql.org/project/pgjdbc/projdisplay.php Also test cases are nice ;) Dave On Tue, 2004-03-30 at 08:18, Andrea Aime wrote: > Well, I have to do it in my spare time along with my regular Geotools2 > involvement, so it will require time. Is anyone willing to give directions > or I just try to replicate the "look & feel" of the current code? > > Best regards > Andrea Aime > > Dave Cramer wrote: > > > Andrea, > > > > Feel free to chip in, if you can help with the V3 implementation your > > patches would be greatly appreciated. > > > > Dave > > On Tue, 2004-03-30 at 03:44, Andrea Aime wrote: > > > >>Hi jdbc driver hackers, > >>my name's Andrea and I'm hitting some serious performance problem with the driver. > >>At present I'm working with the Postgis GIS extension and I'm hitting very low > >>performance during mass data insertion due to the driver limitations. Basically, > >>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import > >>function on the client side (windows pc) so I can't just go to the command line and > >>issue a copy. But that's just an example, in general I need to perform mass insert > >>or updates in a transactional environment from a client, usually a Windows PC. > >> > >>As far as I can tell the low performance level is due to: > >>a) lack of true support of batch statements as introduced by the V3 protocol, that > >> makes the network latency bite me very badly while I'm inserting that 100000 rows > >>b) use of the text mode instead of the binary one, more than doubling the size of data > >> that are really transfered over the wire > >> > >>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet > >>(oh, I have to pass thru 3 switches, so the latency is not that good). A > >>reasonable transfer time for that amount of data should be less than 30 seconds IMHO. > >> > >>I'm wondering, why do you use the text mode instead of the more efficient binary one? > >>Secondly, reading the e-mails on the archive it appears that you are short of time > >>for implementing the V3 protocol. Can I help somehow? > >> > >>Best regards > >>Andrea Aime > >> > >> > >>---------------------------(end of broadcast)--------------------------- > >>TIP 6: Have you searched our list archives? > >> > >> http://archives.postgresql.org > >> > > -- Dave Cramer 519 939 0336 ICQ # 14675561
Andrea, Feel free to chip in, if you can help with the V3 implementation your patches would be greatly appreciated. Dave On Tue, 2004-03-30 at 03:44, Andrea Aime wrote: > Hi jdbc driver hackers, > my name's Andrea and I'm hitting some serious performance problem with the driver. > At present I'm working with the Postgis GIS extension and I'm hitting very low > performance during mass data insertion due to the driver limitations. Basically, > I want to turn a 20 MB shapefile into a postgres table, but it has to be an import > function on the client side (windows pc) so I can't just go to the command line and > issue a copy. But that's just an example, in general I need to perform mass insert > or updates in a transactional environment from a client, usually a Windows PC. > > As far as I can tell the low performance level is due to: > a) lack of true support of batch statements as introduced by the V3 protocol, that > makes the network latency bite me very badly while I'm inserting that 100000 rows > b) use of the text mode instead of the binary one, more than doubling the size of data > that are really transfered over the wire > > That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet > (oh, I have to pass thru 3 switches, so the latency is not that good). A > reasonable transfer time for that amount of data should be less than 30 seconds IMHO. > > I'm wondering, why do you use the text mode instead of the more efficient binary one? > Secondly, reading the e-mails on the archive it appears that you are short of time > for implementing the V3 protocol. Can I help somehow? > > Best regards > Andrea Aime > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- Dave Cramer 519 939 0336 ICQ # 14675561
Well, I have to do it in my spare time along with my regular Geotools2 involvement, so it will require time. Is anyone willing to give directions or I just try to replicate the "look & feel" of the current code? Best regards Andrea Aime Dave Cramer wrote: > Andrea, > > Feel free to chip in, if you can help with the V3 implementation your > patches would be greatly appreciated. > > Dave > On Tue, 2004-03-30 at 03:44, Andrea Aime wrote: > >>Hi jdbc driver hackers, >>my name's Andrea and I'm hitting some serious performance problem with the driver. >>At present I'm working with the Postgis GIS extension and I'm hitting very low >>performance during mass data insertion due to the driver limitations. Basically, >>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import >>function on the client side (windows pc) so I can't just go to the command line and >>issue a copy. But that's just an example, in general I need to perform mass insert >>or updates in a transactional environment from a client, usually a Windows PC. >> >>As far as I can tell the low performance level is due to: >>a) lack of true support of batch statements as introduced by the V3 protocol, that >> makes the network latency bite me very badly while I'm inserting that 100000 rows >>b) use of the text mode instead of the binary one, more than doubling the size of data >> that are really transfered over the wire >> >>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet >>(oh, I have to pass thru 3 switches, so the latency is not that good). A >>reasonable transfer time for that amount of data should be less than 30 seconds IMHO. >> >>I'm wondering, why do you use the text mode instead of the more efficient binary one? >>Secondly, reading the e-mails on the archive it appears that you are short of time >>for implementing the V3 protocol. Can I help somehow? >> >>Best regards >>Andrea Aime >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 6: Have you searched our list archives? >> >> http://archives.postgresql.org >>
Hello all, We have the same performance problems with bulk data inserts from jdbc as well. We used batches as well but made sure that each statement in the batch was large ~128KB and inserted on many rows at a time. This cut down on the number of round trips to to the postgresql server. In addition to a) and b) below, I'd add that the read size off the sockets is too small. It's a few KB currently and this should definitely be bumped up to a larger number. We're running on a gigE network and see about 50MB/s data rates coming off the server (using a 2GB shared memory region). This sounds nice, but one has to keep in mind that the data is binary encoded in text. Anyway, count me in to work on the jdbc client as well (in my limited time). To start, I have a couple of local performance hacks for which I should submit proper patches. -- Alan Dave Cramer wrote: >Andrea, > >Feel free to chip in, if you can help with the V3 implementation your >patches would be greatly appreciated. > >Dave >On Tue, 2004-03-30 at 03:44, Andrea Aime wrote: > > >>Hi jdbc driver hackers, >>my name's Andrea and I'm hitting some serious performance problem with the driver. >>At present I'm working with the Postgis GIS extension and I'm hitting very low >>performance during mass data insertion due to the driver limitations. Basically, >>I want to turn a 20 MB shapefile into a postgres table, but it has to be an import >>function on the client side (windows pc) so I can't just go to the command line and >>issue a copy. But that's just an example, in general I need to perform mass insert >>or updates in a transactional environment from a client, usually a Windows PC. >> >>As far as I can tell the low performance level is due to: >>a) lack of true support of batch statements as introduced by the V3 protocol, that >> makes the network latency bite me very badly while I'm inserting that 100000 rows >>b) use of the text mode instead of the binary one, more than doubling the size of data >> that are really transfered over the wire >> >>That makes the insertion of the above file take more than 2 minutes on a 100MB ethernet >>(oh, I have to pass thru 3 switches, so the latency is not that good). A >>reasonable transfer time for that amount of data should be less than 30 seconds IMHO. >> >>I'm wondering, why do you use the text mode instead of the more efficient binary one? >>Secondly, reading the e-mails on the archive it appears that you are short of time >>for implementing the V3 protocol. Can I help somehow? >> >>Best regards >>Andrea Aime >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 6: Have you searched our list archives? >> >> http://archives.postgresql.org >> >> >>
Alan Stange wrote: > Hello all, > > We have the same performance problems with bulk data inserts from jdbc > as well. We used batches as well but made sure that each statement in > the batch was large ~128KB and inserted on many rows at a time. This > cut down on the number of round trips to to the postgresql server. Yes, I also did it but putting togheter many inserts into a single statement and in fact it halved the time required to perform the inserts, still, it takes too much time anyway: 1 minute for insertion and 5 seconds to read the data... > In addition to a) and b) below, I'd add that the read size off the > sockets is too small. It's a few KB currently and this should > definitely be bumped up to a larger number. In fact I've tried to bump up the 8kb value that's hardwired in the code to 16,64,128Kb but saw no improvement on a 100Mb full switched LAN... > We're running on a gigE network and see about 50MB/s data rates coming > off the server (using a 2GB shared memory region). This sounds nice, > but one has to keep in mind that the data is binary encoded in text. > > Anyway, count me in to work on the jdbc client as well (in my limited > time). To start, I have a couple of local performance hacks for which > I should submit proper patches. > I'm eager to have a look at them :-) Best regards Andrea Aime