Thread: PreparedStatement parameters and mutable objects
I'm examining ways to reduce the amount of garbage generated by the driver, and one approach I'm looking at is to delay converting the parameters of a PreparedStatement to string form until we're actually streaming data to the backend, to avoid creating an intermediate copy. This leads to a question .. does this code have well-defined behaviour: Connection conn = /* ... */; PreparedStatement pstmt = conn.prepareStatement("INSERT INTO t(data) VALUES (?)"); byte[] data = new byte[] { (byte)1, (byte)2, (byte)3 }; pstmt.setBytes(1, data); data[0] = (byte)42; pstmt.executeUpdate(); i.e. is the driver required to capture the value of the byte array passed to setBytes() at a particular time (either the time of the call or the time of execution), or is it free to choose? Currently we capture the value at the time of call. The same problem applies to all parameter-setting methods that take mutable objects. I can't see anything about this in the (PDF) JDBC specification. The javadoc says: The driver converts this to an SQL VARBINARY or LONGVARBINARY (depending on the argument's size relative to the driver's limits on VARBINARY values) when it sends it to the database. which implies the driver can delay the conversion (or perhaps must delay the conversion). Anyone have an opinion on this? -O
Oliver, while I think the code snippet below has defined behaviour I think it is fraught with danger. As long as everyone knows that the code does this and writes code that does not break then yes it would work, however I don't believe this is the case. Also reading the spec that you provided, I think the phrase "when it sends" refers to the actual conversion that takes place not the specific time that it is done. In general while I can see that this would indeed speed up the driver, I don't think I can agree to the mechanism proposed however, I'm willing to be convinced. Dave On Tue, 2004-01-06 at 20:05, Oliver Jowett wrote: > I'm examining ways to reduce the amount of garbage generated by the > driver, and one approach I'm looking at is to delay converting the > parameters of a PreparedStatement to string form until we're actually > streaming data to the backend, to avoid creating an intermediate copy. > > This leads to a question .. does this code have well-defined behaviour: > > Connection conn = /* ... */; > PreparedStatement pstmt = > conn.prepareStatement("INSERT INTO t(data) VALUES (?)"); > byte[] data = new byte[] { (byte)1, (byte)2, (byte)3 }; > pstmt.setBytes(1, data); > data[0] = (byte)42; > pstmt.executeUpdate(); > > i.e. is the driver required to capture the value of the byte array > passed to setBytes() at a particular time (either the time of the call > or the time of execution), or is it free to choose? Currently we capture > the value at the time of call. > > The same problem applies to all parameter-setting methods that take > mutable objects. > > I can't see anything about this in the (PDF) JDBC specification. The > javadoc says: > > The driver converts this to an SQL VARBINARY or LONGVARBINARY > (depending on the argument's size relative to the driver's limits on > VARBINARY values) when it sends it to the database. > > which implies the driver can delay the conversion (or perhaps must delay > the conversion). > > Anyone have an opinion on this? > > -O > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > -- Dave Cramer 519 939 0336 ICQ # 1467551
Oliver, I think we are free to do it either way. I suspect that most jdbc drivers delay this work until the actual call to the server. I am curious why you are choosing to optimize this particular aspect? It seems the bigger win is on query results where the common code path does the following: read from socket, put value into a new byte[], convert byte[] to a String, convert String to required object/type. There are three copies in different forms of the same data being generated for each value. Ugh! I have long wanted to fix this, but it always ends up being too big a project for the time I have to work on it. Have you given any thought to this bigger problem? Also, your email below indicates to me that you are trying to do this with the old V2 protocol methods of executing queries. The new V3 protocol provides a much, much better interface to the driver to do this more efficiently. Have you looked at using the V3 protocol features to execute the prepared statements? thanks, --Barry Oliver Jowett wrote: > I'm examining ways to reduce the amount of garbage generated by the > driver, and one approach I'm looking at is to delay converting the > parameters of a PreparedStatement to string form until we're actually > streaming data to the backend, to avoid creating an intermediate copy. > > This leads to a question .. does this code have well-defined behaviour: > > Connection conn = /* ... */; > PreparedStatement pstmt = > conn.prepareStatement("INSERT INTO t(data) VALUES (?)"); > byte[] data = new byte[] { (byte)1, (byte)2, (byte)3 }; > pstmt.setBytes(1, data); > data[0] = (byte)42; > pstmt.executeUpdate(); > > i.e. is the driver required to capture the value of the byte array > passed to setBytes() at a particular time (either the time of the call > or the time of execution), or is it free to choose? Currently we capture > the value at the time of call. > > The same problem applies to all parameter-setting methods that take > mutable objects. > > I can't see anything about this in the (PDF) JDBC specification. The > javadoc says: > > The driver converts this to an SQL VARBINARY or LONGVARBINARY > (depending on the argument's size relative to the driver's limits on > VARBINARY values) when it sends it to the database. > > which implies the driver can delay the conversion (or perhaps must delay > the conversion). > > Anyone have an opinion on this? > > -O > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match >
Barry Lind wrote: > Oliver, > > I think we are free to do it either way. I suspect that most jdbc > drivers delay this work until the actual call to the server. Good! > I am curious why you are choosing to optimize this particular aspect? It > seems the bigger win is on query results where the common code path does > the following: read from socket, put value into a new byte[], convert > byte[] to a String, convert String to required object/type. There are > three copies in different forms of the same data being generated for > each value. Ugh! In our case, we don't pull much data from the DB in normal operation. The normal case is for each transaction to query the DB to check the state hasn't shifted unexpectedly (this is a tiny ResultSet -- one row with one int8 column), then push out a large batch of inserts/updates/deletes where the bulk of the data is bytearrays set via PreparedStatement.setBytes(). We only have large ResultSets when pulling data out of the database at startup, and at that point we're not so worried about garbage as we're not doing latency-sensitive processing yet. > I have long wanted to fix this, but it always ends up being too big a > project for the time I have to work on it. Have you given any thought > to this bigger problem? It's somewhere on my list of things to do.. but that list currently fills a whiteboard. I've given a bit of thought to it but didn't get much beyond looking at the current code. It seems like we should be doing something like maintaining the original bytearray as a single object (or perhaps one per row), and doing conversions from ranges of that array on demand .. would need to maintain some metadata to easily locate column and row boundaries .. using the V3 binary tuple format might make things easier. > Also, your email below indicates to me that you are trying to do this > with the old V2 protocol methods of executing queries. The new V3 > protocol provides a much, much better interface to the driver to do this > more efficiently. Have you looked at using the V3 protocol features to > execute the prepared statements? Yes, I'm looking at using V3 (also probably for COPY support). There's a fair amount of reorganization needed to do this properly, though; the main part would be abstracting the conversion of parameters to wire representations, so we can support different representations for V2 and V3 easily, and possibly something similar for query execution methods (to support reusing queries via V3's Parse message). There's also a whole slew of other changes that could be made to use V3's features, especially around cursors and not materializing all the rows in a resultset at once; it looks like we can use a named portal and set a row limit, then use FETCH or another Execute to navigate around the rows within that portal, for any row-returning command -- much cleaner than the current "transform-to-DECLARE" approach. But I'm not sure if I want to take that on at the moment :) -O
Dave Cramer wrote: > Oliver, > > while I think the code snippet below has defined behaviour What is the defined behaviour, then? It's not obvious to me whether {1,2,3}, {42,2,3}, or a driver-dependent value (i.e. undefined behaviour) gets inserted. > I think it is > fraught with danger. Yes :) > As long as everyone knows that the code does this and writes code that > does not break then yes it would work, however I don't believe this is > the case. I'm not sure what you mean here. Are you saying that the driver would be incorrect to delay the conversion, as there is existing (correct) code that expects other behaviour? Or are you saying that delaying the conversion is valid, but there is existing (buggy) code that expects it not to be delayed, and we shouldn't break that code? Or something else entirely? :) > Also reading the spec that you provided, I think the phrase "when it > sends" refers to the actual conversion that takes place not the specific > time that it is done. Yeah .. the spec just doesn't seem to address this area at all. -O
On Thu, 2004-01-08 at 01:03, Oliver Jowett wrote: > Dave Cramer wrote: > > Oliver, > > > > while I think the code snippet below has defined behaviour > > What is the defined behaviour, then? It's not obvious to me whether > {1,2,3}, {42,2,3}, or a driver-dependent value (i.e. undefined > behaviour) gets inserted. defined by who, I meant it was defined in that it didn't have any surprises. If you implement your idea, then 42,2,3 will be inserted otherwise 1,2,3 > > > I think it is > > fraught with danger. > > Yes :) > > > As long as everyone knows that the code does this and writes code that > > does not break then yes it would work, however I don't believe this is > > the case. > > I'm not sure what you mean here. > > Are you saying that the driver would be incorrect to delay the > conversion, as there is existing (correct) code that expects other > behaviour? > > Or are you saying that delaying the conversion is valid, but there is > existing (buggy) code that expects it not to be delayed, and we > shouldn't break that code? > > Or something else entirely? :) I'm saying that doing it this way will likely expose buggy code, which we will end up having to figure out why it doesn't work, when the user says "my code used to work, and now it doesn't", plus they don't send us code to look at. > > > Also reading the spec that you provided, I think the phrase "when it > > sends" refers to the actual conversion that takes place not the specific > > time that it is done. > > Yeah .. the spec just doesn't seem to address this area at all. :( > > -O > -- Dave Cramer 519 939 0336 ICQ # 1467551
Oliver Jowett wrote: > Barry Lind wrote: >> I am curious why you are choosing to optimize this particular aspect? >> It seems the bigger win is on query results where the common code path >> does the following: read from socket, put value into a new byte[], >> convert byte[] to a String, convert String to required object/type. >> There are three copies in different forms of the same data being >> generated for each value. Ugh! >> >> I have long wanted to fix this, but it always ends up being too big a >> project for the time I have to work on it. Have you given any thought >> to this bigger problem? > > It's somewhere on my list of things to do.. but that list currently > fills a whiteboard. I've given a bit of thought to it but didn't get > much beyond looking at the current code. It seems like we should be > doing something like maintaining the original bytearray as a single > object (or perhaps one per row), and doing conversions from ranges of > that array on demand .. would need to maintain some metadata to easily > locate column and row boundaries .. using the V3 binary tuple format > might make things easier. My idea for handling this problem was to extend the use of PGobject to cover all datatypes (not just special ones as is the case today). And then store data as PGobject[][] where each column value would be some subtype of PGobject specific for that datatype. Thus 'PGint' would know how to handle the v2 and v3 protocol format for ints and have accessor methods for getting the value as a String or possibly other formats, etc. I think this would allow the code in most cases to only need one copy of the data around at a time. >> Also, your email below indicates to me that you are trying to do this >> with the old V2 protocol methods of executing queries. The new V3 >> protocol provides a much, much better interface to the driver to do >> this more efficiently. Have you looked at using the V3 protocol >> features to execute the prepared statements? > > Yes, I'm looking at using V3 (also probably for COPY support). There's a > fair amount of reorganization needed to do this properly, though; the > main part would be abstracting the conversion of parameters to wire > representations, so we can support different representations for V2 and > V3 easily, and possibly something similar for query execution methods > (to support reusing queries via V3's Parse message). > > There's also a whole slew of other changes that could be made to use > V3's features, especially around cursors and not materializing all the > rows in a resultset at once; it looks like we can use a named portal and > set a row limit, then use FETCH or another Execute to navigate around > the rows within that portal, for any row-returning command -- much > cleaner than the current "transform-to-DECLARE" approach. But I'm not > sure if I want to take that on at the moment :) > Yup. There is a lot of work to do here to fully utilize the V3 protocol. I worked with Tom to insure that his protocol rewrite had all the features needed by jdbc (which I believe it does), we just need to take advantage of them. thanks, --Barry
Dave Cramer wrote: > On Thu, 2004-01-08 at 01:03, Oliver Jowett wrote: > >>Dave Cramer wrote: >> >>>Oliver, >>> >>>while I think the code snippet below has defined behaviour >> >>What is the defined behaviour, then? It's not obvious to me whether >>{1,2,3}, {42,2,3}, or a driver-dependent value (i.e. undefined >>behaviour) gets inserted. > > defined by who, I meant it was defined in that it didn't have any > surprises. If you implement your idea, then 42,2,3 will be inserted > otherwise 1,2,3 Ok. I meant "defined by the JDBC spec" i.e. portable between drivers. It sounds like you think it's undefined in this sense. > I'm saying that doing it this way will likely expose buggy code, which > we will end up having to figure out why it doesn't work, when the user > says "my code used to work, and now it doesn't", plus they don't send us > code to look at. How far do we go to support buggy code though? If we can't make this sort of change, we lose several opportunities for optimization. If the only objection is supporting buggy code, I'll probably do the changes anyway and apply the patch locally even if it doesn't hit CVS, since we don't have buggy code here ;) -O
On Fri, 9 Jan 2004, Oliver Jowett wrote: > > I'm saying that doing it this way will likely expose buggy code, which > > we will end up having to figure out why it doesn't work, when the user > > says "my code used to work, and now it doesn't", plus they don't send us > > code to look at. > > How far do we go to support buggy code though? If we can't make this > sort of change, we lose several opportunities for optimization. I don't think you can label this as buggy code unless you can point to the spec and say where it is disallowed. It is certainly something that looks dangerous and is unlikely to be written by the average developer, but that doesn't make it illegal. I lean towards the notion that when I say setXXX that's the time the value must be saved because that's certainly more intuitive and specific than "sometime later when the driver decides to." Kris Jurka
Well there is some hint that this is incorrect: pstmnt = con.prepareStatment("insert into foo values(?,?)"); pstmnt.setString(1,"abcd"); for( i=0; i< 5; i++) { Integer intx = new Integer(i); pstmnt.setInt(2,intx) pstmnt.executeUpdate(); } The above code should insert ("abcd", 0 .. 4 ) into five rows The point being that the value for setXXX does not change until setXXX is called. This is from section 24.1.2 in the JDBC api tutorial In all I'm sort of sitting on the fence here. I guess we could try it and see how many people get tripped up? Dave On Fri, 2004-01-09 at 03:18, Kris Jurka wrote: > On Fri, 9 Jan 2004, Oliver Jowett wrote: > > > > I'm saying that doing it this way will likely expose buggy code, which > > > we will end up having to figure out why it doesn't work, when the user > > > says "my code used to work, and now it doesn't", plus they don't send us > > > code to look at. > > > > How far do we go to support buggy code though? If we can't make this > > sort of change, we lose several opportunities for optimization. > > I don't think you can label this as buggy code unless you can point to the > spec and say where it is disallowed. It is certainly something that looks > dangerous and is unlikely to be written by the average developer, but that > doesn't make it illegal. I lean towards the notion that when I say setXXX > that's the time the value must be saved because that's certainly more > intuitive and specific than "sometime later when the driver decides to." > > Kris Jurka > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Dave Cramer 519 939 0336 ICQ # 1467551
If anyone is looking for behavior from others, Oracle (my apologies for using the 'O' word...) behaves like this: byte[] b = new byte[] {(byte)1,(byte)2,(byte)3}; stmnt.setBytes(1,b); b[2] = (byte) 33; stmnt.setBytes(1,b); stmnt.execute(); results in {1,2,33}. (Disclaimer - I'm not advocating something just because 'Oracle does it this way') Why someone would do this is, of course, I don't know. While I am all for the idea of improved performance, I find myself forced to agree with the idea that, while something like the above is kooky, it isn't illegal, and falls into the realm of expected (but not necessarily mandated) behavior from a driver. Ambiguity tends to have to favor the idiot, unfortunately. On Jan 9, 2004, at 8:46 AM, Dave Cramer wrote: > Well there is some hint that this is incorrect: > > > pstmnt = con.prepareStatment("insert into foo values(?,?)"); > > pstmnt.setString(1,"abcd"); > > for( i=0; i< 5; i++) > { > Integer intx = new Integer(i); > pstmnt.setInt(2,intx) > pstmnt.executeUpdate(); > } > > The above code should insert ("abcd", 0 .. 4 ) into five rows > > The point being that the value for setXXX does not change until setXXX > is called. > > This is from section 24.1.2 in the JDBC api tutorial > > In all I'm sort of sitting on the fence here. I guess we could try it > and see how many people get tripped up? > > Dave > > On Fri, 2004-01-09 at 03:18, Kris Jurka wrote: >> On Fri, 9 Jan 2004, Oliver Jowett wrote: >> >>>> I'm saying that doing it this way will likely expose buggy code, >>>> which >>>> we will end up having to figure out why it doesn't work, when the >>>> user >>>> says "my code used to work, and now it doesn't", plus they don't >>>> send us >>>> code to look at. >>> >>> How far do we go to support buggy code though? If we can't make this >>> sort of change, we lose several opportunities for optimization. >> >> I don't think you can label this as buggy code unless you can point >> to the >> spec and say where it is disallowed. It is certainly something that >> looks >> dangerous and is unlikely to be written by the average developer, but >> that >> doesn't make it illegal. I lean towards the notion that when I say >> setXXX >> that's the time the value must be saved because that's certainly more >> intuitive and specific than "sometime later when the driver decides >> to." >> >> Kris Jurka >> >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 4: Don't 'kill -9' the postmaster >> > -- > Dave Cramer > 519 939 0336 > ICQ # 1467551 > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > -------------------- Andrew Rawnsley President The Ravensfield Digital Resource Group, Ltd. (740) 587-0114 www.ravensfield.com
Andrew, What happens in Oracle if he second setBytes is not called? does it still insert {1,2,33} ? I think my biggest concern is Andrew's last statement: "Ambiguity tends to have to favor the idiot, unfortunately" Dave On Fri, 2004-01-09 at 09:28, Andrew Rawnsley wrote: > If anyone is looking for behavior from others, Oracle (my apologies for > using the 'O' word...) behaves like this: > > byte[] b = new byte[] {(byte)1,(byte)2,(byte)3}; > stmnt.setBytes(1,b); > b[2] = (byte) 33; > stmnt.setBytes(1,b); > stmnt.execute(); > > results in {1,2,33}. (Disclaimer - I'm not advocating something just > because 'Oracle does it this way') > > Why someone would do this is, of course, I don't know. > > While I am all for the idea of improved performance, I find myself > forced to agree with the idea that, while something > like the above is kooky, it isn't illegal, and falls into the realm of > expected (but not necessarily mandated) behavior from a driver. > Ambiguity tends to have to favor the idiot, unfortunately. > > On Jan 9, 2004, at 8:46 AM, Dave Cramer wrote: > > > Well there is some hint that this is incorrect: > > > > > > pstmnt = con.prepareStatment("insert into foo values(?,?)"); > > > > pstmnt.setString(1,"abcd"); > > > > for( i=0; i< 5; i++) > > { > > Integer intx = new Integer(i); > > pstmnt.setInt(2,intx) > > pstmnt.executeUpdate(); > > } > > > > The above code should insert ("abcd", 0 .. 4 ) into five rows > > > > The point being that the value for setXXX does not change until setXXX > > is called. > > > > This is from section 24.1.2 in the JDBC api tutorial > > > > In all I'm sort of sitting on the fence here. I guess we could try it > > and see how many people get tripped up? > > > > Dave > > > > On Fri, 2004-01-09 at 03:18, Kris Jurka wrote: > >> On Fri, 9 Jan 2004, Oliver Jowett wrote: > >> > >>>> I'm saying that doing it this way will likely expose buggy code, > >>>> which > >>>> we will end up having to figure out why it doesn't work, when the > >>>> user > >>>> says "my code used to work, and now it doesn't", plus they don't > >>>> send us > >>>> code to look at. > >>> > >>> How far do we go to support buggy code though? If we can't make this > >>> sort of change, we lose several opportunities for optimization. > >> > >> I don't think you can label this as buggy code unless you can point > >> to the > >> spec and say where it is disallowed. It is certainly something that > >> looks > >> dangerous and is unlikely to be written by the average developer, but > >> that > >> doesn't make it illegal. I lean towards the notion that when I say > >> setXXX > >> that's the time the value must be saved because that's certainly more > >> intuitive and specific than "sometime later when the driver decides > >> to." > >> > >> Kris Jurka > >> > >> > >> ---------------------------(end of > >> broadcast)--------------------------- > >> TIP 4: Don't 'kill -9' the postmaster > >> > > -- > > Dave Cramer > > 519 939 0336 > > ICQ # 1467551 > > > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 5: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faqs/FAQ.html > > > -------------------- > > Andrew Rawnsley > President > The Ravensfield Digital Resource Group, Ltd. > (740) 587-0114 > www.ravensfield.com > -- Dave Cramer 519 939 0336 ICQ # 1467551
Kris Jurka wrote: > > On Fri, 9 Jan 2004, Oliver Jowett wrote: > > >>>I'm saying that doing it this way will likely expose buggy code, which >>>we will end up having to figure out why it doesn't work, when the user >>>says "my code used to work, and now it doesn't", plus they don't send us >>>code to look at. >> >>How far do we go to support buggy code though? If we can't make this >>sort of change, we lose several opportunities for optimization. > > > I don't think you can label this as buggy code unless you can point to the > spec and say where it is disallowed. It is certainly something that looks > dangerous and is unlikely to be written by the average developer, but that > doesn't make it illegal. I lean towards the notion that when I say setXXX > that's the time the value must be saved because that's certainly more > intuitive and specific than "sometime later when the driver decides to." > A counter argument is that by definition the setXXXStream() methods do it the other way. The spec defines them (IIRC) to be read at execute time. There purpose after all is to allow the driver not to load the entire contents of the stream into memory, but to simply stream it to the server when executing the statement. So are the setXXXStream() methods a special case, or is the intention of the spec to be consistent and have all values bound at execute time. --Barry
Ah, yes, my bad, tested wrong. Whenever I use the word 'idiot' in post it ends up pointing back to me... Oracle produces {1,2,3} with the following: byte[] b = new byte[] {(byte)1,(byte)2,(byte)3}; stmnt.setInt(1,1); stmnt.setBytes(2,b); b[2] = (byte) 33; stmnt.executeUpdate(); So its doing it at the actual 'setXXX' call. On Jan 9, 2004, at 9:37 AM, Dave Cramer wrote: > Andrew, > > What happens in Oracle if he second setBytes is not called? > does it still insert {1,2,33} ? > > I think my biggest concern is Andrew's last statement: > "Ambiguity tends to have to favor the idiot, unfortunately" > > > Dave > > On Fri, 2004-01-09 at 09:28, Andrew Rawnsley wrote: >> If anyone is looking for behavior from others, Oracle (my apologies >> for >> using the 'O' word...) behaves like this: >> >> byte[] b = new byte[] {(byte)1,(byte)2,(byte)3}; >> stmnt.setBytes(1,b); >> b[2] = (byte) 33; >> stmnt.setBytes(1,b); >> stmnt.execute(); >> >> results in {1,2,33}. (Disclaimer - I'm not advocating something just >> because 'Oracle does it this way') >> >> Why someone would do this is, of course, I don't know. >> >> While I am all for the idea of improved performance, I find myself >> forced to agree with the idea that, while something >> like the above is kooky, it isn't illegal, and falls into the realm of >> expected (but not necessarily mandated) behavior from a driver. >> Ambiguity tends to have to favor the idiot, unfortunately. >> >> On Jan 9, 2004, at 8:46 AM, Dave Cramer wrote: >> >>> Well there is some hint that this is incorrect: >>> >>> >>> pstmnt = con.prepareStatment("insert into foo values(?,?)"); >>> >>> pstmnt.setString(1,"abcd"); >>> >>> for( i=0; i< 5; i++) >>> { >>> Integer intx = new Integer(i); >>> pstmnt.setInt(2,intx) >>> pstmnt.executeUpdate(); >>> } >>> >>> The above code should insert ("abcd", 0 .. 4 ) into five rows >>> >>> The point being that the value for setXXX does not change until >>> setXXX >>> is called. >>> >>> This is from section 24.1.2 in the JDBC api tutorial >>> >>> In all I'm sort of sitting on the fence here. I guess we could try it >>> and see how many people get tripped up? >>> >>> Dave >>> >>> On Fri, 2004-01-09 at 03:18, Kris Jurka wrote: >>>> On Fri, 9 Jan 2004, Oliver Jowett wrote: >>>> >>>>>> I'm saying that doing it this way will likely expose buggy code, >>>>>> which >>>>>> we will end up having to figure out why it doesn't work, when the >>>>>> user >>>>>> says "my code used to work, and now it doesn't", plus they don't >>>>>> send us >>>>>> code to look at. >>>>> >>>>> How far do we go to support buggy code though? If we can't make >>>>> this >>>>> sort of change, we lose several opportunities for optimization. >>>> >>>> I don't think you can label this as buggy code unless you can point >>>> to the >>>> spec and say where it is disallowed. It is certainly something that >>>> looks >>>> dangerous and is unlikely to be written by the average developer, >>>> but >>>> that >>>> doesn't make it illegal. I lean towards the notion that when I say >>>> setXXX >>>> that's the time the value must be saved because that's certainly >>>> more >>>> intuitive and specific than "sometime later when the driver decides >>>> to." >>>> >>>> Kris Jurka >>>> >>>> >>>> ---------------------------(end of >>>> broadcast)--------------------------- >>>> TIP 4: Don't 'kill -9' the postmaster >>>> >>> -- >>> Dave Cramer >>> 519 939 0336 >>> ICQ # 1467551 >>> >>> >>> ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 5: Have you checked our extensive FAQ? >>> >>> http://www.postgresql.org/docs/faqs/FAQ.html >>> >> -------------------- >> >> Andrew Rawnsley >> President >> The Ravensfield Digital Resource Group, Ltd. >> (740) 587-0114 >> www.ravensfield.com >> > -- > Dave Cramer > 519 939 0336 > ICQ # 1467551 > -------------------- Andrew Rawnsley President The Ravensfield Digital Resource Group, Ltd. (740) 587-0114 www.ravensfield.com
Dave Cramer wrote: > Well there is some hint that this is incorrect: > > > pstmnt = con.prepareStatment("insert into foo values(?,?)"); > > pstmnt.setString(1,"abcd"); > > for( i=0; i< 5; i++) > { > Integer intx = new Integer(i); > pstmnt.setInt(2,intx) > pstmnt.executeUpdate(); > } > > The above code should insert ("abcd", 0 .. 4 ) into five rows > > The point being that the value for setXXX does not change until setXXX > is called. Well, this is a different case -- Integer is immutable, so when the driver transforms it to a DB representation is irrelevant. (I assume you mean setObject(), above, since setInt() expects a primitive int..). Even if the implication of the above code is that a parameter doesn't change from what you last set it to on execution, that doesn't really help us answer the question .. so you don't set it, and have the same mutable object reference, but that's no different to the case where you set it to a new mutable object reference .. it could still mutate before execution. If we were using e.g. setBytes() the analogous case is something like this: byte[] data = new byte[10]; // (1) for (int i = 0; i < 5; ++i) { data[0] = i; pstmt.setBytes(2, data); // (2) pstmt.executeUpdate(); } If binding was guaranteed to happen at execution you could get away with a setBytes() at (1) and remove (2). But I think the above code is the only way to be safe under all drivers. The above case is relatively boring as 'data' does not change between setBytes() and executeUpdate() anyway. More interesting is this one.. byte[] data = new byte[10]; for (int i = 0; i < 5; ++i) { data[0] = i; pstmt.setBytes(2, data); pstmt.addBatch(); } pstmt.executeBatch(); I think this is the case that is more likely to bite us. "set parameters, modify mutable objects, execute" seems like an unlikely pattern to use in real code. "reuse mutable objects, set parameters, add to batch, loop" seems more likely. I'm still in favour of an "undefined behaviour" interpretation here. There's not much benefit to application code in nailing down one behaviour or the other, and leaving it undefined gives the driver the flexibility to do whichever is a better implementation for the DB in question. I might send a query to the JDBC expert group about this. Anyone have a better contact address than the spec feedback address? (the Sun JDBC-INTEREST list seems to be pretty useless, it seems to be the "where can I find an oracle driver?" mailing list) -O
On Mon, 12 Jan 2004, Oliver Jowett wrote: > I'm still in favour of an "undefined behaviour" interpretation here. > There's not much benefit to application code in nailing down one > behaviour or the other, and leaving it undefined gives the driver the > flexibility to do whichever is a better implementation for the DB in > question. > The question that has yet been unanswered is how much taking advantage of the "undefined behavior" will get us. You personally seem most interested in the setBytes() case because it is relevent to your application and it can potentially be quite large. I don't know how much this would gain on say a Date object. For your particular problem it seems you could simply wrap the byte array in question inside a ByteArrayInputStream (which does not copy it) and use setBinaryStream which would allow the delayed reading of it. Kris Jurka
On 11/01/2004 22:40 Oliver Jowett wrote: > [snip] > I'm still in favour of an "undefined behaviour" interpretation here. > There's not much benefit to application code in nailing down one > behaviour or the other, and leaving it undefined gives the driver the > flexibility to do whichever is a better implementation for the DB in > question. Having followed this very interesting thread, I'm still wondering exactly how much measurable improvement could be achieved. I read an article on IBM developerWorks (sorry can't remember the URL) which stated that, on modern VMs, things like object creation aren't the performance bogeys that they once were. So I'm thinking that before we make a decision about committing to a change which might break someones app, is there any way by which we could measure the effects of the proposed change? > I might send a query to the JDBC expert group about this. I, for one, would be interested to know. -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for the Smaller Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+
Paul Thomas wrote: > > On 11/01/2004 22:40 Oliver Jowett wrote: > >> [snip] >> I'm still in favour of an "undefined behaviour" interpretation here. >> There's not much benefit to application code in nailing down one >> behaviour or the other, and leaving it undefined gives the driver the >> flexibility to do whichever is a better implementation for the DB in >> question. > > > Having followed this very interesting thread, I'm still wondering > exactly how much measurable improvement could be achieved. I read an > article on IBM developerWorks (sorry can't remember the URL) which > stated that, on modern VMs, things like object creation aren't the > performance bogeys that they once were. So I'm thinking that before we > make a decision about committing to a change which might break someones > app, is there any way by which we could measure the effects of the > proposed change? The problem is that it's very specific to the application workload; what case do we measure? The reason I'm interested in doing this for is not the direct CPU overhead of object creation (none of the JDBC code is on our main execution path), but the effect that object creation has on GC interval and pause. We're running a low-latency app where an extra 50ms pause due to GC has a large impact on our latency figures .. so the less garbage we generate in the first place, the better. We could farm the JDBC work out to a separate VM, but that gets complex quite fast. Aside from the change in behaviour, the change I'm proposing has no real downside I can see -- it's not a CPU-vs-memory tradeoff, we're just generating fewer intermediate copies altogether. More of a code complexity-vs-runtime cost tradeoff I suppose. -O
Kris Jurka wrote: > On Mon, 12 Jan 2004, Oliver Jowett wrote: > > >>I'm still in favour of an "undefined behaviour" interpretation here. >>There's not much benefit to application code in nailing down one >>behaviour or the other, and leaving it undefined gives the driver the >>flexibility to do whichever is a better implementation for the DB in >>question. >> > > > The question that has yet been unanswered is how much taking advantage of > the "undefined behavior" will get us. You personally seem most interested > in the setBytes() case because it is relevent to your application and it > can potentially be quite large. I don't know how much this would gain on > say a Date object. For your particular problem it seems you could simply > wrap the byte array in question inside a ByteArrayInputStream (which does > not copy it) and use setBinaryStream which would allow the delayed reading > of it. Yeah, modifying the driver to support setBinaryStream better is my second choice (the current implementation just delegates to setBytes). Most of that work overlaps with setBytes() though, it'll be almost as easy to do both. And wrapping the array just to have the driver unwrap it again seems a bit perverse :) I'm interested in the official word on this anyway since even if it's not useful to implement for some types, it's still a correctness issue. -O
Oliver, I think finding the "offical" word on this is going to be unlikely, at best it will be someone's opinion. It's not in the spec so it will be an interpretation. I think more important is meeting the expected behaviour from the users POV. That being said, my example showing mutable objects and the expected behaviour was just that an example, I think the behaviour should be the same for mutable/un-mutable objects. I would expect there would be a population of programmers out there that isn't even aware of the fact that some objects are un-mutable/mutable. Unfortunately we need to code to the lowest common denominator. Dave On Sun, 2004-01-11 at 19:49, Oliver Jowett wrote: > Kris Jurka wrote: > > On Mon, 12 Jan 2004, Oliver Jowett wrote: > > > > > >>I'm still in favour of an "undefined behaviour" interpretation here. > >>There's not much benefit to application code in nailing down one > >>behaviour or the other, and leaving it undefined gives the driver the > >>flexibility to do whichever is a better implementation for the DB in > >>question. > >> > > > > > > The question that has yet been unanswered is how much taking advantage of > > the "undefined behavior" will get us. You personally seem most interested > > in the setBytes() case because it is relevent to your application and it > > can potentially be quite large. I don't know how much this would gain on > > say a Date object. For your particular problem it seems you could simply > > wrap the byte array in question inside a ByteArrayInputStream (which does > > not copy it) and use setBinaryStream which would allow the delayed reading > > of it. > > Yeah, modifying the driver to support setBinaryStream better is my > second choice (the current implementation just delegates to setBytes). > Most of that work overlaps with setBytes() though, it'll be almost as > easy to do both. And wrapping the array just to have the driver unwrap > it again seems a bit perverse :) > > I'm interested in the official word on this anyway since even if it's > not useful to implement for some types, it's still a correctness issue. > > -O > -- Dave Cramer 519 939 0336 ICQ # 1467551
On 12/01/2004 00:39 Oliver Jowett wrote: > Paul Thomas wrote: >> Having followed this very interesting thread, I'm still wondering >> exactly how much measurable improvement could be achieved. I read an >> article on IBM developerWorks (sorry can't remember the URL) which >> stated that, on modern VMs, things like object creation aren't the >> performance bogeys that they once were. So I'm thinking that before we >> make a decision about committing to a change which might break someones >> app, is there any way by which we could measure the effects of the >> proposed change? > > The problem is that it's very specific to the application workload; what > case do we measure? I think you would need something which closely models you app's work load. > The reason I'm interested in doing this for is not the direct CPU > overhead of object creation (none of the JDBC code is on our main > execution path), but the effect that object creation has on GC interval > and pause. We're running a low-latency app where an extra 50ms pause due > to GC has a large impact on our latency figures .. so the less garbage > we generate in the first place, the better. We could farm the JDBC work > out to a separate VM, but that gets complex quite fast. Reducing the amount of garbage created is never a bad thing. Do you have an estimate of the % reduction you think you could achieve? > > Aside from the change in behaviour, the change I'm proposing has no real > downside I can see -- it's not a CPU-vs-memory tradeoff, we're just > generating fewer intermediate copies altogether. More of a code > complexity-vs-runtime cost tradeoff I suppose. Sounds fine to me. -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for the Smaller Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+
Dave Cramer wrote: > Oliver, > > I think finding the "offical" word on this is going to be unlikely, at > best it will be someone's opinion. It's not in the spec so it will be an > interpretation. I think more important is meeting the expected behaviour > from the users POV. I've sent an email asking for clarification to the feedback address in the 3.0 specification. Hopefully whatever response I get will make it into future versions of the JDBC specification. > That being said, my example showing mutable objects and the expected > behaviour was just that an example, I think the behaviour should be the > same for mutable/un-mutable objects. I would expect there would be a > population of programmers out there that isn't even aware of the fact > that some objects are un-mutable/mutable. Unfortunately we need to code > to the lowest common denominator. Ouch. Pass-by-reference semantics are pretty fundamental to Java. Do we really have to assume that our users aren't aware of this? BTW the behaviour for mutable vs. immutable objects *will* be the same .. it's just that for immutable objects, they don't happen to expose an interface that allows you to change their value (that's why you call them immutable ;). The reference semantics are identical. -O
Paul Thomas wrote: > On 12/01/2004 00:39 Oliver Jowett wrote: >> The problem is that it's very specific to the application workload; >> what case do we measure? > > I think you would need something which closely models you app's work load. > >> The reason I'm interested in doing this for is not the direct CPU >> overhead of object creation (none of the JDBC code is on our main >> execution path), but the effect that object creation has on GC >> interval and pause. We're running a low-latency app where an extra >> 50ms pause due to GC has a large impact on our latency figures .. so >> the less garbage we generate in the first place, the better. We could >> farm the JDBC work out to a separate VM, but that gets complex quite >> fast. > > > Reducing the amount of garbage created is never a bad thing. Do you have > an estimate of the % reduction you think you could achieve? See the attached testcase, it does batched inserts of byte arrays. I've run this as: java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \ -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \ TestInsertGarbage \ 'jdbc:postgresql:rhino?user=oliver&password=oliver' \ 100000 100 100 which does 100k inserts each with a 100 byte array, in batches of 100. This is roughly the same as our application workload when it is doing a DB resync of a reasonably-sized provisioning database. In the resulting heap dump, the allocation hotspots are: QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes (100000 byte[], 143738400 bytes) (100000 byte[], 37144800 bytes) (100000 byte[], 20000000 bytes) // This is encoding the query text Statement.setBytes -> setString -> escapeString -> StringBuffer.append (63000 char[], 81608000 bytes) (19200 char[], 25018400 bytes) (17800 char[], 23091200 bytes) Statement.setBytes -> setString -> StringBuffer.ensureCapacity (100000 char[], 65445600 bytes) Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init> (100000 char[], 41600000 bytes) Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append (21800 char[], 17788800 bytes) (21400 char[], 17462400 bytes) (21300 char[], 17380800 bytes) (20800 char[], 16972800 bytes) (14400 char[], 11750400 bytes) QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet (100000 Jdbc3ResultSet, 13600000 bytes) All other allocation points allocate <10MB. There's quite a bit of other object allocation that could be cleaned up, e.g. repeated allocation of QueryExecutor objects. Total allocations were around 631800000 bytes. I estimate that streaming the parameter data would generate about 499000000 bytes less garbage (basically all of the above goes away except for encoding the query text and creating the resultsets), which is a 79% reduction (this is actually a lot more than I expected!). There are improvements we can make here without streaming, though. Not creating resultset objects for updates, and doing all necessary escaping in a single pass rather than two, both seem like low-hanging fruit. -O import java.sql.*; public class TestInsertGarbage { public static void main(String args[]) throws Exception { if (args.length < 4) { System.err.println("usage: java TestInsertGarbage <jdbc: URL> <# of inserts> <array size> <batch size>"); return; } String driverURL = args[0]; int numInserts = Integer.parseInt(args[1]); int arraySize = Integer.parseInt(args[2]); int batchSize = Integer.parseInt(args[3]); // We don't want to count the data to be inserted itself, since that's // mostly static and isn't garbage. So just create a single array // to represent all the data. We'll reuse that on each insert. // we also give it a random distribution of values since that affects // the exact length needed to escape the value. byte[] dummyArray = new byte[arraySize]; Class.forName("org.postgresql.Driver"); Connection conn = DriverManager.getConnection(driverURL); // Schema setup. Statement stmt = conn.createStatement(); try { stmt.executeUpdate("DROP TABLE test_insert_garbage"); } catch (SQLException e) {} stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)"); // Testcase itself. conn.setAutoCommit(false); PreparedStatement inserter = conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)"); // Main loop. java.util.Random randomizer = new java.util.Random(); for (int i = 0; i < numInserts; ) { randomizer.nextBytes(dummyArray); // Hee hee. for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) { inserter.setBytes(1, dummyArray); inserter.addBatch(); } inserter.executeBatch(); System.err.println("Inserted " + i + " rows"); } // Done. conn.commit(); System.err.println("Committed transaction."); } }
On Mon, 2004-01-12 at 17:44, Oliver Jowett wrote: > Dave Cramer wrote: > > Oliver, > > > > I think finding the "offical" word on this is going to be unlikely, at > > best it will be someone's opinion. It's not in the spec so it will be an > > interpretation. I think more important is meeting the expected behaviour > > from the users POV. > > I've sent an email asking for clarification to the feedback address in > the 3.0 specification. Hopefully whatever response I get will make it > into future versions of the JDBC specification. Ok, lets see what they say. > > > That being said, my example showing mutable objects and the expected > > behaviour was just that an example, I think the behaviour should be the > > same for mutable/un-mutable objects. I would expect there would be a > > population of programmers out there that isn't even aware of the fact > > that some objects are un-mutable/mutable. Unfortunately we need to code > > to the lowest common denominator. > > Ouch. Pass-by-reference semantics are pretty fundamental to Java. Do we > really have to assume that our users aren't aware of this? This is essentially what Andrew Rawnsley was referring to in his post where he said "Ambiguity tends to have to favor the idiot, unfortunately" > > BTW the behaviour for mutable vs. immutable objects *will* be the same > .. it's just that for immutable objects, they don't happen to expose an > interface that allows you to change their value (that's why you call > them immutable ;). The reference semantics are identical. > > -O > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- Dave Cramer 519 939 0336 ICQ # 1467551
Oliver, OK, now how do we test your estimates? If it truly shows this kind of improvement you would sway me! Dave On Mon, 2004-01-12 at 21:02, Oliver Jowett wrote: > Paul Thomas wrote: > > On 12/01/2004 00:39 Oliver Jowett wrote: > >> The problem is that it's very specific to the application workload; > >> what case do we measure? > > > > I think you would need something which closely models you app's work load. > > > >> The reason I'm interested in doing this for is not the direct CPU > >> overhead of object creation (none of the JDBC code is on our main > >> execution path), but the effect that object creation has on GC > >> interval and pause. We're running a low-latency app where an extra > >> 50ms pause due to GC has a large impact on our latency figures .. so > >> the less garbage we generate in the first place, the better. We could > >> farm the JDBC work out to a separate VM, but that gets complex quite > >> fast. > > > > > > Reducing the amount of garbage created is never a bad thing. Do you have > > an estimate of the % reduction you think you could achieve? > > See the attached testcase, it does batched inserts of byte arrays. I've > run this as: > > java -Xrunhprof:heap=all,file=insertgarbage.txt,depth=10,cutoff=0 \ > -verbose:gc -cp .:/net/java/java_libs/postgresql.jar \ > TestInsertGarbage \ > 'jdbc:postgresql:rhino?user=oliver&password=oliver' \ > 100000 100 100 > > which does 100k inserts each with a 100 byte array, in batches of 100. > This is roughly the same as our application workload when it is doing a > DB resync of a reasonably-sized provisioning database. > > In the resulting heap dump, the allocation hotspots are: > > QueryExecutor.sendQueryV2 -> Encoding.encode -> String.getBytes > (100000 byte[], 143738400 bytes) > (100000 byte[], 37144800 bytes) > (100000 byte[], 20000000 bytes) // This is encoding the query text > Statement.setBytes -> setString -> escapeString -> StringBuffer.append > (63000 char[], 81608000 bytes) > (19200 char[], 25018400 bytes) > (17800 char[], 23091200 bytes) > Statement.setBytes -> setString -> StringBuffer.ensureCapacity > (100000 char[], 65445600 bytes) > Statement.setBytes -> PGbytea.toPGString -> StringBuffer.<init> > (100000 char[], 41600000 bytes) > Statement.setBytes -> PGbytea.toPGString -> StringBuffer.append > (21800 char[], 17788800 bytes) > (21400 char[], 17462400 bytes) > (21300 char[], 17380800 bytes) > (20800 char[], 16972800 bytes) > (14400 char[], 11750400 bytes) > QueryExecutor.executeV2 -> Jdbc3PreparedStatement.createResultSet > (100000 Jdbc3ResultSet, 13600000 bytes) > > All other allocation points allocate <10MB. There's quite a bit of other > object allocation that could be cleaned up, e.g. repeated allocation of > QueryExecutor objects. > > Total allocations were around 631800000 bytes. I estimate that streaming > the parameter data would generate about 499000000 bytes less garbage > (basically all of the above goes away except for encoding the query text > and creating the resultsets), which is a 79% reduction (this is actually > a lot more than I expected!). > > There are improvements we can make here without streaming, though. Not > creating resultset objects for updates, and doing all necessary escaping > in a single pass rather than two, both seem like low-hanging fruit. > > -O > > ______________________________________________________________________ > > import java.sql.*; > > public class TestInsertGarbage { > public static void main(String args[]) throws Exception { > if (args.length < 4) { > System.err.println("usage: java TestInsertGarbage <jdbc: URL> <# of inserts> <array size> <batch size>"); > return; > } > > String driverURL = args[0]; > int numInserts = Integer.parseInt(args[1]); > int arraySize = Integer.parseInt(args[2]); > int batchSize = Integer.parseInt(args[3]); > > // We don't want to count the data to be inserted itself, since that's > // mostly static and isn't garbage. So just create a single array > // to represent all the data. We'll reuse that on each insert. > > // we also give it a random distribution of values since that affects > // the exact length needed to escape the value. > byte[] dummyArray = new byte[arraySize]; > > Class.forName("org.postgresql.Driver"); > Connection conn = DriverManager.getConnection(driverURL); > > // Schema setup. > Statement stmt = conn.createStatement(); > try { > stmt.executeUpdate("DROP TABLE test_insert_garbage"); > } catch (SQLException e) {} > stmt.executeUpdate("CREATE TABLE test_insert_garbage(data bytea)"); > > // Testcase itself. > conn.setAutoCommit(false); > PreparedStatement inserter = > conn.prepareStatement("INSERT INTO test_insert_garbage(data) VALUES (?)"); > > // Main loop. > java.util.Random randomizer = new java.util.Random(); > for (int i = 0; i < numInserts; ) { > randomizer.nextBytes(dummyArray); // Hee hee. > for (int j = 0; j < batchSize && i < numInserts; ++i, ++j) { > inserter.setBytes(1, dummyArray); > inserter.addBatch(); > } > inserter.executeBatch(); > System.err.println("Inserted " + i + " rows"); > } > > // Done. > conn.commit(); > System.err.println("Committed transaction."); > } > } > > ______________________________________________________________________ > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- Dave Cramer 519 939 0336 ICQ # 1467551