Thread: Register arbitrary types Framework
Hello @all,

I don't know whether this list is the correct place to ask this, so I also added a similar feature request to the gborg project: http://gborg.postgresql.org/project/pgjdbc/bugs/bugupdate.php?709

For my project, I need to register third-party classes as postgres classes (specifically, I want JTS objects to be created directly when calling getObject(int)), and patching those third-party classes is not an option.

However, the current user-type registration system only allows subtypes of PGObject to be registered. I have two ideas on how to change this. I would prefer the second one, and I am willing to write and provide patches for both of them if they are to be accepted:

- When a registered class is not a subclass of PGObject, use the constructor that takes a single String argument for reading, and toString() for writing.

- Allow instances of a factory interface to be registered with the PostgreSQL connection. Then, when reading this type, a creator function of the factory is called, and when writing, we can use a map to find the factory from the object's class.

At least the second solution would also allow cleaning up the PostGIS JDBC drivers, as they now use a PGGeometry class that acts as a pure wrapper, and the user has to call getGeometry() on the received PGGeometry to get the real Geometry.

Thanks for your patience,
Markus

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com
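The first idea in the message above (a single-String constructor for reading, toString() for writing) might look roughly like the following. This is only a sketch under assumptions: `StringConstructorMapper` is a hypothetical helper and not part of the driver, and it works only for classes that expose a public constructor taking one String.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper (not driver code): converts between the text form
// of a database value and an arbitrary registered class.
final class StringConstructorMapper {
    private StringConstructorMapper() {}

    // Reading: build the object from the text the server returned,
    // using the class's public single-String constructor.
    static <T> T read(Class<T> type, String value) {
        try {
            Constructor<T> ctor = type.getConstructor(String.class);
            return ctor.newInstance(value);
        } catch (ReflectiveOperationException e) {
            // No suitable constructor, or the constructor itself failed.
            throw new IllegalArgumentException(
                "Cannot create " + type.getName() + " from a String", e);
        }
    }

    // Writing: rely on toString() producing text the server can parse.
    static String write(Object value) {
        return value.toString();
    }
}
```

For example, `StringConstructorMapper.read(java.math.BigDecimal.class, "1.50")` yields a BigDecimal, and `write` turns it back into text. The approach stands or falls with toString() producing something the server accepts for the target type.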
On Wed, 10 Mar 2004, Markus Schaber wrote:

> Hello @all,
>
> For my project, I need to register third-party classes as postgres classes (specifically, I want JTS objects to be created directly when calling getObject(int)), and patching those third-party classes is not an option.
>
> However, the current user-type registration system only allows subtypes of PGObject to be registered. I have two ideas on how to change this. I would prefer the second one, and I am willing to write and provide patches for both of them if they are to be accepted:
>
> - When a registered class is not a subclass of PGObject, use the constructor that takes a single String argument for reading, and toString() for writing.
>
> - Allow instances of a factory interface to be registered with the PostgreSQL connection. Then, when reading this type, a creator function of the factory is called, and when writing, we can use a map to find the factory from the object's class.

I see what you are getting at, but I don't see this as all that useful. How is this much better than creating a PGObject adapter class to bridge the differences between your object and the driver? Wouldn't you need to make some kind of modification to the JTS object when doing a setObject() call? I find it difficult to see its toString() implementation mapping directly to a pg server side type.

The JDBC spec provides another type mapping interface based on SQLData. This is more focused on complex types, but we could do something like only allowing one read or write call per SQLInput/SQLOutput object. This still doesn't solve your problem, because your class would need to implement SQLData.

Perhaps you could give us some more specifics on the database types involved and why getObject has to return an instance of your object instead of an intermediary.

Kris Jurka
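To illustrate the SQLData route mentioned above, here is a minimal sketch of a class implementing the standard `java.sql.SQLData` interface with exactly one read call and one write call. The class `WktValue` and the type name "geometry" are assumptions made for illustration; whether and how a given driver supports SQLData for such types is not shown here. It also makes the objection concrete: the mapped class itself has to implement the interface, which is not possible for unmodifiable third-party classes.

```java
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Hypothetical user type: a geometry carried around as its text form.
class WktValue implements SQLData {
    private String typeName = "geometry"; // assumed server-side type name
    private String wkt;

    @Override
    public String getSQLTypeName() throws SQLException {
        return typeName;
    }

    // Exactly one read call: the whole value arrives as a single string.
    @Override
    public void readSQL(SQLInput stream, String typeName) throws SQLException {
        this.typeName = typeName;
        this.wkt = stream.readString();
    }

    // Exactly one write call: the whole value is sent as a single string.
    @Override
    public void writeSQL(SQLOutput stream) throws SQLException {
        stream.writeString(wkt);
    }

    String getWkt() {
        return wkt;
    }
}
```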
Hi, Kris,

On Thu, 11 Mar 2004 00:55:20 -0500 (EST) Kris Jurka <books@ejurka.com> wrote:

> I see what you are getting at, but I don't see this as all that useful. How is this much better than creating a PGObject adapter class to bridge the differences between your object and the driver?

In my eyes, it has the following advantages:

- Less overhead. For every object read, an additional wrapper has to be instantiated just to be thrown away afterwards.

- More readable code. For every getObject(), you need an additional line to call getGeometry() or whatever unwrapping method you use.

- Consistency for the user with "internal" objects like Timestamp, Integer or String, where getObject() returns the object directly.

- Possible simplification of the driver code, as we could even change the aforementioned internal objects to use the new type system; they would just be preloaded into the type table.

- This way, we could even allow users to "override" the internal types, e.g. to return Collections instead of arrays, or customized/localized calendar objects instead of Timestamps, etc.

- We can have one set of geometry Java objects for different SQL databases, and we only have to deal with the differences in the database/driver-specific connection initialization, not in the code that fires the queries.

> Wouldn't you need to make some kind of modification to the JTS object when doing a setObject() call? I find it difficult to see its toString() implementation mapping directly to a pg server side type.

Luckily, both JTS and PostGIS objects return their WKT representation when calling toString(), and PostgreSQL understands this.

But this problem is why I prefer the second solution using factories, although it is more work to start with. The factory instances basically have two methods: one for reading and another one for writing.

> Perhaps you could give us some more specifics on the database types involved and why getObject has to return an instance of your object instead of an intermediary.

Currently, we have PostGIS geometries in the database, which can be read into PostGIS Java objects and JTS Java objects by using the PostGIS wrapper or our own JTS wrapper, respectively.

Our app itself doesn't know anything about the wrappers (and doesn't want to know, because it is designed to transparently support other I/O methods, too). As our queries are generic and user-configurable, and support other built-in types such as timestamps, our SQL layer currently calls an unwrap() method after each getObject(). This unwrap() function tests for known wrapper classes and calls their getGeometry() or similar method.

Thanks,
Markus
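The factory with one method for reading and one for writing, as described in the message above, could be sketched as follows. All names here (`TypeFactory`, `ThirdPartyPoint`, `PointFactory`, `ReadRegistry`) are hypothetical, the point class merely stands in for an unmodifiable third-party class such as a JTS geometry, and the text parsing handles only a simplified "POINT(x y)" form rather than real WKT.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory contract: one method for reading, one for writing.
interface TypeFactory<T> {
    T fromDatabase(String pgTypeName, String value); // reading
    String toDatabase(T object);                     // writing
}

// Stand-in for a third-party class that cannot be modified.
final class ThirdPartyPoint {
    final double x;
    final double y;

    ThirdPartyPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Converts between a simplified "POINT(x y)" text form and the class above.
final class PointFactory implements TypeFactory<ThirdPartyPoint> {
    @Override
    public ThirdPartyPoint fromDatabase(String pgTypeName, String value) {
        String inner = value.substring(value.indexOf('(') + 1, value.indexOf(')'));
        String[] xy = inner.trim().split("\\s+");
        return new ThirdPartyPoint(Double.parseDouble(xy[0]), Double.parseDouble(xy[1]));
    }

    @Override
    public String toDatabase(ThirdPartyPoint p) {
        return "POINT(" + p.x + " " + p.y + ")";
    }
}

// Read side of a registry: factories are looked up by server type name,
// so the caller receives the final object and needs no unwrap() step.
final class ReadRegistry {
    private final Map<String, TypeFactory<?>> byTypeName = new HashMap<>();

    void register(String pgTypeName, TypeFactory<?> factory) {
        byTypeName.put(pgTypeName, factory);
    }

    Object getObject(String pgTypeName, String rawValue) {
        TypeFactory<?> factory = byTypeName.get(pgTypeName);
        // Unknown types fall back to the raw text in this sketch.
        return factory == null ? rawValue : factory.fromDatabase(pgTypeName, rawValue);
    }
}
```

With a factory registered for "geometry", `getObject("geometry", "POINT(1 2)")` returns a `ThirdPartyPoint` directly, which is the behavior the unwrap() workaround emulates today.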
[Discussing not having ResultSet.getObject return a PGObject]

I did not receive your reply through the list for some reason, but I stumbled across it in the archives: http://archives.postgresql.org/pgsql-jdbc/2004-03/msg00067.php

I asked what about symmetry with setObject and you replied,

    But this problem is why I prefer the second solution using factories although it is more work to start with. The factory instances basically have 2 methods - one for reading and another one for writing.

With getObject you can register the factory with a pg internal type name that the driver knows, but with setObject you have nothing to determine which (if any) factory to use other than the object itself. You could work on some kind of reflection based scheme, but this is certainly not symmetric with how getObject works.

Kris Jurka
Hi, Kris,

On Wed, 17 Mar 2004 02:29:07 -0500 (EST) Kris Jurka <books@ejurka.com> wrote:

> I did not receive your reply through the list for some reason, but I stumbled across it in the archives:

Strange... I'll send you a Cc: to be on the safe side :-)

>     But this problem is why I prefer the second solution using factories although it is more work to start with. The factory instances basically have 2 methods - one for reading and another one for writing.
>
> With getObject you can register the factory with a pg internal type name that the driver knows, but with setObject you have nothing to determine which (if any) factory to use other than the object itself. You could work on some kind of reflection based scheme, but this is certainly not symmetric with how getObject works.

Every object's class can be obtained with the getClass() method. So (as I vaguely hinted in my original post) we can have a Map with the classes (or fully qualified class names) as keys, and the factories as values. Using a HashMap as the Map, this allows us to find the factory in constant time (*).

I also think that the current setObject(int, Object), which uses a 13-branch if(instanceof)-else construct, could be sped up by this - of course, at the cost of creating the appropriate factory classes.

On the other hand, this allows some code cleanup by moving most of the type-specific code out of the statement and resultset classes, reducing the specific setXXX methods to simple wrappers.

But this is not my primary intention - IMHO, it's more a positive side effect to have the possibility. Whether this refactoring is appropriate with regard to design and speed still has to be discussed.

Thanks for your patience,
Markus

(*) Of course, correct handling of subclasses is a little more tricky, but I already have some ideas on how to handle this problem. As I don't want to overload this message, I'll outline them in another followup.
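The class-keyed lookup proposed in the message above might be sketched like this. `ValueWriter` and `WriteRegistry` are hypothetical names, and the method called `setObject` here only mimics the dispatch step, returning the text that would be sent instead of binding a statement parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write half of a factory.
interface ValueWriter {
    String toDatabase(Object value);
}

// Write side of a registry: the object's exact runtime class is the key.
final class WriteRegistry {
    private final Map<Class<?>, ValueWriter> byClass = new HashMap<>();

    void register(Class<?> type, ValueWriter writer) {
        byClass.put(type, writer);
    }

    // Mimics the dispatch inside setObject(int, Object): a single hash
    // lookup on getClass() replaces the chain of instanceof tests.
    String setObject(Object value) {
        ValueWriter writer = byClass.get(value.getClass());
        if (writer == null) {
            throw new IllegalArgumentException(
                "No factory registered for " + value.getClass().getName());
        }
        return writer.toDatabase(value);
    }
}
```

Because the key is the exact class, an instance of an unregistered subclass is not found even when its superclass is registered (for example, a `java.sql.Timestamp` when only `java.util.Date` is registered). That is the subclass problem the footnote refers to.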
On Thu, 18 Mar 2004, Markus Schaber wrote:

> > With getObject you can register the factory with a pg internal type name that the driver knows, but with setObject you have nothing to determine which (if any) factory to use other than the object itself. You could work on some kind of reflection based scheme, but this is certainly not symmetric with how getObject works.
>
> Every object's class can be obtained with the getClass() method. So (as I vaguely hinted in my original post) we can have a Map with the classes (or fully qualified class names) as keys, and the factories as values. Using a HashMap as the Map, this allows us to find the factory in constant time (*).
>
> I also think that the current setObject(int, Object), which uses a 13-branch if(instanceof)-else construct, could be sped up by this - of course, at the cost of creating the appropriate factory classes.

Well, this cleanup is something I'm actually more interested in. My work on adding COPY support to the driver has kind of stalled because I need the ability to read/write internal data types, and all of this knowledge is contained in the Statement/ResultSet classes.

> (*) Of course, correct handling of subclasses is a little more tricky, but I already have some ideas on how to handle this problem. As I don't want to overload this message, I'll outline them in another followup.

Yes, this is an issue I hadn't considered, and I am interested to see your reply.

Kris Jurka
Markus Schaber wrote:

> I also think that the current setObject(int, Object), which uses a 13-branch if(instanceof)-else construct, could be sped up by this - of course, at the cost of creating the appropriate factory classes.

'instanceof' is such a common VM operation (it's implied by every cast) that I'd expect it to be pretty fast. Is a hashmap lookup actually faster than an inlined multibranch 'if' for the number of comparisons we do?

-O
Hi, Kris,

On Thu, 18 Mar 2004 12:16:13 -0500 (EST) Kris Jurka <books@ejurka.com> wrote:

> Well, this cleanup is something I'm actually more interested in. My work on adding COPY support to the driver has kind of stalled because I need the ability to read/write internal data types, and all of this knowledge is contained in the Statement/ResultSet classes.

Okay, so maybe my ideas will have a good side effect :-)

> > (*) Of course, correct handling of subclasses is a little more tricky, but I already have some ideas on how to handle this problem. As I don't want to overload this message, I'll outline them in another followup.
>
> Yes, this is an issue I hadn't considered, and I am interested to see your reply.

Okay, here are my thoughts:

- Have the user explicitly register them. Each factory instance has a list of SQL types and a list of Java classes it knows how to process, and the author of the factory must know all possible subclasses. Most of the simple classes (Integer, String etc.) are declared final, so they don't cause any problems. This also works fine for PostGIS or JTS geometries, where we create different Java subclasses of Geometry from the same SQL type, because the factory knows what to create (just as PGGeometry does now). However, whenever the user intends to setObject() his own subclasses of those known classes, he has to register those classes manually (e.g. by providing his own derived factory).

- Fall-back factory probing. The factories have a probe() method that returns true when the factory believes it can handle the Java class. Whenever we don't find the class in our hash, we probe the factories until the first "hit", and register this factory for the class. The probe() method can internally use instanceof or reflection. This way it is possible to test for interfaces, or to handle all objects that have a getSQLrepresentation() bean property, etc.

- Superclass probing. Whenever we don't know the class, we try again with the getSuperclass() result instead. This is repeated until we get to java.lang.Object, which has a "write-only" factory calling toString(). This can be enhanced by calling the getInterfaces() method on every tested class and checking whether we have a registration for one of the implemented interfaces (which are represented by Class objects, too).

Disclaimer: It's late, so I can't guarantee that the above thoughts are understandable. And I have not yet thought about arrays and primitive types...

Thanks for your patience,
Markus
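The superclass probing idea from the message above, including the getInterfaces() enhancement, could be sketched as follows. `HierarchyLookup` is a hypothetical name, the fallback passed to the constructor plays the role of the "write-only" factory for java.lang.Object, and caching the result for the concrete class is an assumption added here so that later lookups stay a single hash access. The sketch simplifies in one respect: `getInterfaces()` returns only the interfaces a class declares directly, so superinterfaces of those interfaces are not searched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup that resolves a factory for a class by walking up
// its superclass chain and checking directly implemented interfaces.
final class HierarchyLookup<F> {
    private final Map<Class<?>, F> registered = new HashMap<>();
    private final Map<Class<?>, F> cache = new HashMap<>();
    private final F fallback; // stands in for the toString()-based Object factory

    HierarchyLookup(F fallback) {
        this.fallback = fallback;
    }

    void register(Class<?> type, F factory) {
        registered.put(type, factory);
        cache.clear(); // earlier resolutions may no longer be the best match
    }

    F find(Class<?> type) {
        F cached = cache.get(type);
        if (cached != null) {
            return cached;
        }
        F resolved = resolve(type);
        cache.put(type, resolved);
        return resolved;
    }

    private F resolve(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            F factory = registered.get(c);
            if (factory != null) {
                return factory;
            }
            // Only interfaces declared directly on c are checked here.
            for (Class<?> iface : c.getInterfaces()) {
                factory = registered.get(iface);
                if (factory != null) {
                    return factory;
                }
            }
        }
        return fallback;
    }
}
```

With factories registered for `Number` and `CharSequence`, a lookup for `Integer` resolves through its superclass, a lookup for `String` resolves through an implemented interface, and anything else ends at the fallback.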
Hi, Oliver,

On Fri, 19 Mar 2004 10:19:58 +1300 Oliver Jowett <oliver@opencloud.com> wrote:

> > I also think that the current setObject(int, Object), which uses a 13-branch if(instanceof)-else construct, could be sped up by this - of course, at the cost of creating the appropriate factory classes.
>
> 'instanceof' is such a common VM operation (it's implied by every cast) that I'd expect it to be pretty fast. Is a hashmap lookup actually faster than an inlined multibranch 'if' for the number of comparisons we do?

As I don't have the answer to this question, I invite you to benchmark this. You should expect your results to vary with different JVMs, JITs and Map implementations (maybe we could use e.g. an Apache Commons map).

However, this is why I think reworking the built-in types needs to be discussed. We have to think about speed, but also about code maintenance and clean architecture.

Have a nice sleep,
Markus
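A very rough starting point for the benchmark suggested above might look like this. It is only a naive timing loop under assumptions: `DispatchTiming` is a hypothetical class, the six types and their codes are arbitrary and do not reflect the driver's actual branches, and wall-clock timing without warm-up control is easily distorted by the JIT. A dedicated harness such as JMH would give more trustworthy numbers, and results will differ between JVMs, as noted in the message.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical micro-benchmark: instanceof chain versus HashMap lookup.
final class DispatchTiming {
    private static final Map<Class<?>, Integer> CODES = new HashMap<>();
    static volatile long sink; // keeps the JIT from discarding the loops

    static {
        CODES.put(String.class, 1);
        CODES.put(Integer.class, 2);
        CODES.put(Long.class, 3);
        CODES.put(Double.class, 4);
        CODES.put(Boolean.class, 5);
        CODES.put(BigDecimal.class, 6);
    }

    static int viaInstanceof(Object o) {
        if (o instanceof String) return 1;
        if (o instanceof Integer) return 2;
        if (o instanceof Long) return 3;
        if (o instanceof Double) return 4;
        if (o instanceof Boolean) return 5;
        if (o instanceof BigDecimal) return 6;
        return 0;
    }

    static int viaMap(Object o) {
        Integer code = CODES.get(o.getClass());
        return code == null ? 0 : code;
    }

    static Object[] samples() {
        return new Object[] { "a", 1, 2L, 3.0, true, new BigDecimal("4") };
    }

    // Returns elapsed nanoseconds for the chosen dispatch strategy.
    static long time(boolean useMap, Object[] samples, int rounds) {
        long sum = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (Object o : samples) {
                sum += useMap ? viaMap(o) : viaInstanceof(o);
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = sum;
        return elapsed;
    }

    public static void main(String[] args) {
        Object[] samples = samples();
        time(false, samples, 1_000_000); // crude warm-up
        time(true, samples, 1_000_000);
        System.out.println("instanceof ns: " + time(false, samples, 5_000_000));
        System.out.println("HashMap ns:    " + time(true, samples, 5_000_000));
    }
}
```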