Re: bytea size limit? - Mailing list pgsql-jdbc
From | Michael Privat |
---|---|
Subject | Re: bytea size limit? |
Date | |
Msg-id | 1012075029.20040412223613@ceci.mit.edu Whole thread Raw |
In response to | Re: bytea size limit? (Oliver Jowett <oliver@opencloud.com>) |
Responses |
Re: bytea size limit?
Re: bytea size limit? |
List | pgsql-jdbc |
Frankly, I haven't touched the V2 and you are right, I did not apply the patch to it. It's the first time I take a look at the driver so it took me a little bit of adaptation, bare with me :) With regards to individually escaping, it's just as expensive as how it was before, except that it doesn't make several copies of the array like it used to (hence the memory saving). I don't think there is any performance impact at all. Basically just a memory gain. As far as the encoding. I think in your original email you had mentioned that the driver used UTF-8 (in which case there is an obvious optimization that can be made), but I couldn't find it in the driver. Everything looked like it was inheriting from the encoding scheme set in the connection. The stream is not necessary but I didn't like the idea of having bytea logic inside the stream so I put it into that inner class to make it easier to move if need be (although I couldn't find a good place for it). Plus it makes it easier to help stream directly (once we figure this out). Good point about over-escaping. This would probably resolve this whole problem of pre-determining the size. I didn't think about this (but then again, it assumes UTF-8 in all cases and I'm not familiar enough with how the postgresql driver works to make this call). Feel free to rework the patch as you want it. I just wanted to provide a starting point and I'm happy to help out. Monday, April 12, 2004, 10:08:44 PM, you wrote: OJ> Michael Privat wrote: >> Here you go. Coded against the CVS HEAD. Let me know if you want me to >> change anything. Obviously, because I have to send the size ahead, >> there is no other way but store the entire byte[] in memory (can't >> stream), but with this code, I was able to load a blob of 30Mb+ (with >> a heap size of 32Mb) against 1.5Mb yesterday with the same heap. So >> there is some improvement. >> >> http://jupiter.incodea.com/trans/pgsql-jdbc.diff OJ> Comments: OJ> Does this work for V2 connections? I didn't see a patch to the V2 OJ> execution path, but you've changed how setBytes() binds its argument. OJ> The argument to setBytes() is not copied when bound; changes made to the OJ> array after the setBytes() call but before query execution will be OJ> visible. Personally I don't mind this behaviour but others disagree :) OJ> -- see earlier discussion about mutable parameters in the pgsql-jdbc OJ> archives. OJ> Individually escaping and encoding each byte (in SendBytea and OJ> GetByteaSize) seems overly expensive. We should be able to do this in OJ> blocks at a minimum. OJ> Why is EncodedStream necessary? Can't you just unroll that logic OJ> directly in SendBytea()? OJ> It's probably possible to work around needing the whole array in memory OJ> (i.e. we could stream from an InputStream + length via setBinaryStream). OJ> If we escape *every* byte, we need exactly 5 bytes of UTF8 output for OJ> every input byte ('\' and '0'-'9' are all single bytes in UTF8, I OJ> believe) so we could compute the output size directly from the input length. OJ> ==== OJ> That said, this patch is a lot simpler than I expected it to be! It OJ> looks like a good solution until the driver knows how to speak V3 OJ> fluently. If you like, I'll see if I can rework your patch over the next OJ> couple of days to address the above points. OJ> -O OJ> ---------------------------(end of OJ> broadcast)--------------------------- OJ> TIP 6: Have you searched our list archives? OJ> http://archives.postgresql.org
pgsql-jdbc by date: