Thread: binary tuple receiving patch v2
Hi, This patch includes the code from my first try with subject "ResultSet.getBytes() of bytea speedup batch". It also adds support for receiving date,timestamp,timestamptz,time and timetz types using binary encoding in addition to the bytea that my first patch supported. My quick benchmark for binary encoded time types gave the following results (with postgresql 8.1.5 compiled in two different date handling modes): intdatetime=off txt: speed 1298.70 (max memory: 9.4MB) bin: speed 4953.26 (max memory: 2.5MB) - over 250% faster intdatetime=on txt: speed 1449.28 (max memory: 9.4MB) bin: speed 5882.35 (max memory: 2.5MB) - over 300% faster And as my previous mail showed the binary encoding for bytea gave 20-400% speedup depending on the column size. ------ Changes since previous patch: - added time/timetz/timestamp/timestamptz/date support - binary encoding used only with v3 protocol - new "binaryEncoding=0/1" connection property for controlling the feature - currently defaults to 1 - possibility to control binary encoding per oid type - fixed sending Bind request when not all result set fields were binary What the patch does: After executing a prepared statement the first time the result set field information (Field[]) is cached into the prepared query object. Subsequent executions of the prepared statements use the field types to decide when to request binary encoding from the server. The caching seems safe because the Field objects are immutable. ----- Still open questions before the patch can be integrated - any help with answering these is appreciated: - is there any better way to obtain the result set field information in QueryExecutor? (org/postgresql/jdbc2/AbstractJdbc2Statement.java) * could the result set type information be obtained during the prepare so that they would be available even on the first execute? - in which conditions should a prepared statement drop the cached fields? for example: can the result set fields ever change after the statement has been prepared? - How big fuzzyness should be supported by the driver. If a column is of type date and ResulSet.getTimestamp is called, should it work? * I currently support only getString on the time fields * I can make the getTime/getDate/getTimestamp interchangeable if requested. It would even be possible to use getLong for time fields which the text encoding never supported - Must the different SQL time formats in java have different extreme small/large value handling than what postgresql internally uses? - if caching of Field information is allowed (at least in some circumstances), could the driver be optimised to only request a Describe on the first excution of a prepared statement (or once during prepare)? -Mikko
Attachment
On Sun, 19 Nov 2006, Mikko Tiihonen wrote: I have not yet looked at this patch, but hopefully I can answer some of the questions below. > - is there any better way to obtain the result set field information > in QueryExecutor? (org/postgresql/jdbc2/AbstractJdbc2Statement.java) > * could the result set type information be obtained during the > prepare so that they would be available even on the first execute? Getting information on the prepare before execution is possible, but would require an extra network roundtrip. There has been a lot of speculation about the impact of such a change, but I don't believe anyone has measured it. The other complication is that we often want/need to do a Bind before Describe so we know what's coming back. Consider "SELECT ?", calling setInt will make the describe come back with an int type. Without that I think it may error with an unresolved type. So you might need to do: Parse/Bind/Describe/Sync, process results of Describe, Bind/Execute/Sync while all that happens right now in one network trip. > - in which conditions should a prepared statement drop the cached > fields? for example: can the result set fields ever change after > the statement has been prepared? PreparedStatement ps = conn.prepareStatement("SELECT ?"); ps.setInt(1,1); ResultSet rs = ps.executeQuery(); ps.setLong(1,1); rs = ps.executeQuery(); Here the returned type has changed. In this case the driver recognized that the input types have changed and reparses the query behind the scenes. I'm not sure where you are operating so you might be alright or you might need extra work to handle this case. > - How big fuzzyness should be supported by the driver. If a column is > of type date and ResulSet.getTimestamp is called, should it work? > * I currently support only getString on the time fields > * I can make the getTime/getDate/getTimestamp interchangeable if > requested. It would even be possible to use getLong for time > fields which the text encoding never supported Yes, getTimestamp on a date should work. Consult the JDBC spec for details of all the conversions. In the JDBC 3 spec, appendix B table B-6 shows all the required conversions. > - Must the different SQL time formats in java have different extreme > small/large value handling than what postgresql internally uses? Don't know what you mean. > - if caching of Field information is allowed (at least in some > circumstances), could the driver be optimised to only request a > Describe on the first excution of a prepared statement (or once > during prepare)? > Well it does depend on the Bind parameters, but yes doing it repeatedly is often a waste of time. Hopefully that helped and I'll be able to look at the patch this weekend. Kris Jurka