Thread: Register arbitrary types Framework

Register arbitrary types Framework

From
Markus Schaber
Date:
Hello @all,

I don't know whether this list is the correct place to ask this, so
I also added a similar feature request to the gborg project:
http://gborg.postgresql.org/project/pgjdbc/bugs/bugupdate.php?709


For my project, I need to register third-party classes as postgres
classes (specifically, I want JTS objects to be created directly when
calling getObject(int)), and patching those third-party classes is
not an option.

However, the current user-type registration system only allows subtypes
of PGObject to be registered. I have two ideas on how to change this. I
would prefer the second one, and I am willing to write and provide
patches for either of them if they would be accepted:

- When a registered class is not a subclass of PGObject, use its
constructor that takes a single String argument for reading and
toString() for writing.

- Allow instances of a factory interface to be registered with the
PostgreSQL connection. Then, when reading this type, a creator method
of the factory is called, and when writing, we can use a map to find the
factory from the object's class.

At least the second solution would also allow cleaning up the PostGIS
JDBC drivers, as they now use a PGGeometry class that acts as a pure
wrapper, and the user has to call getGeometry() on the received
PGGeometry to get the real Geometry.
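
To make the two ideas concrete, here is a minimal sketch of each. All
names in it (StringConstructorReader, TypeFactory, fromSql, toSql, and
the Point class standing in for a third-party type) are hypothetical
and are not part of the driver; the "POINT(x y)" text form is only an
illustration.

```java
// Hypothetical names throughout; none of this is existing driver API.

// Idea 1: build the object reflectively from its single-String constructor.
final class StringConstructorReader {
    static <T> T read(Class<T> type, String value) {
        try {
            return type.getConstructor(String.class).newInstance(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    type + " has no usable String constructor", e);
        }
    }
}

// Idea 2: a factory with one method for reading and one for writing.
interface TypeFactory<T> {
    T fromSql(String sqlTypeName, String value);
    String toSql(T object);
}

// Stand-in for a third-party class that cannot be modified.
final class Point {
    final double x;
    final double y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// Converts between the text form "POINT(x y)" and Point.
final class PointFactory implements TypeFactory<Point> {
    @Override
    public Point fromSql(String sqlTypeName, String value) {
        String body = value.substring(value.indexOf('(') + 1, value.indexOf(')'));
        String[] parts = body.trim().split("\\s+");
        return new Point(Double.parseDouble(parts[0]), Double.parseDouble(parts[1]));
    }

    @Override
    public String toSql(Point p) {
        return "POINT(" + p.x + " " + p.y + ")";
    }
}
```

With the factory, the third-party class needs neither a String
constructor nor a suitable toString(), which is the main reason to
prefer the second idea.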



Thanks for your patience,

Markus
--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@logi-track.com | www.logi-track.com

Re: Register arbitrary types Framework

From
Kris Jurka
Date:

On Wed, 10 Mar 2004, Markus Schaber wrote:

> Hello @all,
>
> For my project, I need to register third-party classes as postgres
> classes, (specifically, I want JTS objects to be directly created when
> calling getObject(int)) and it is not eligible to patch those
> third-party classes.
>
> However, the current user-type register system only allows subtypes of
> PGObject to be registered. I have two Ideas on how to change this,
> whereas I would prefer the second one, and am willing to write and
> provide patches for both of them if they are to be accepted:
>
> - When a registered object is no subclass of PGObject, use the
> constructor that uses a single String as arguments for reading and
> toString() for writing.
>
> - Allow instances of a factory Interface to be registered to the
> PostGreSQL connection. Then, when reading this type, a creator function
> of the factory is called, and when writing, we can use a map to find the
> factory from the object's class.
>

I see what you are getting at, but I don't see this as all that useful.
How is this much better than creating a PGObject adapter class to bridge
the differences between your object and the driver?  Wouldn't you need to
make some kind of modification to the JTS object when doing a setObject()
call?  I find it difficult to see its toString() implementation mapping
directly to a pg server side type.

The JDBC spec provides another type mapping interface based on SQLData.
This is more focused on complex types, but we could do something like only
allowing one read or write call per SQLInput/SQLOutput object.  This still
doesn't solve your problem because it needs to implement SQLData.
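
For reference, a rough sketch of what an SQLData-based mapping with a
single read and a single write call could look like. SQLData, SQLInput
and SQLOutput are the standard java.sql interfaces; the class name
WktValue and the type name "geometry" are assumptions made for the
example. Whether and how the driver would feed such a class is not
settled here.

```java
import java.sql.SQLData;
import java.sql.SQLException;
import java.sql.SQLInput;
import java.sql.SQLOutput;

// Hypothetical value class; "geometry" as SQL type name is an assumption.
final class WktValue implements SQLData {
    private String sqlTypeName = "geometry";
    private String wkt;

    // A no-arg constructor so that the class can be instantiated reflectively.
    public WktValue() { }

    WktValue(String wkt) { this.wkt = wkt; }

    String getWkt() { return wkt; }

    @Override
    public String getSQLTypeName() { return sqlTypeName; }

    @Override
    public void readSQL(SQLInput stream, String typeName) throws SQLException {
        sqlTypeName = typeName;
        wkt = stream.readString();   // the single read call
    }

    @Override
    public void writeSQL(SQLOutput stream) throws SQLException {
        stream.writeString(wkt);     // the single write call
    }
}
```

The mapped class itself has to implement the interface, so a
third-party class would still need an adapter of this kind.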

Perhaps you could give us some more specifics on the database types
involved and why getObject has to return an instance of your object
instead of an intermediary.

Kris Jurka


Re: Register arbitrary types Framework

From
Markus Schaber
Date:
Hi, Kris,

On Thu, 11 Mar 2004 00:55:20 -0500 (EST)
Kris Jurka <books@ejurka.com> wrote:

> I see what you are getting at, but I don't see this as all that
> useful.  How is this much better than creating a PGObject adapter
> class to bridge the differences between your object and the driver?

In my eyes, it has the following advantages:

- Less overhead. For every object read, an additional wrapper has to be
instantiated just to be thrown away afterwards.

- More readable code. For every getObject(), you need an additional line
to call getGeometry() or whatever unwrapping method you use.

- Unification for the user with "internal" objects like Timestamp,
Integer or String, where getObject() returns the object directly.

- Possible simplification of the driver code, as we may even change the
aforementioned internal objects to use the new type system; they would
just be preloaded into the type table.

- This way, we could even allow users to "override" the internal
types, e.g. to return Collections instead of arrays, or customized/
localized calendar objects instead of Timestamps etc.

- We can have one set of Geometry Java Objects for different SQL
databases, and we only have to mess with it at the database/driver
specific connection initialization, but not in the code that fires the
queries.

> Wouldn't you need to make some kind of modification to the JTS object
> when doing a setObject() call?  I find it difficult to see its
> toString() implementation mapping directly to a pg server side type.

Luckily, both JTS and PostGIS objects return their WKT representation
from toString(), and PostgreSQL understands this.

But this problem is why I prefer the second solution using factories
although it is more work to start with. The factory instances basically
have 2 methods - one for reading and another one for writing.

> Perhaps you could give us some more specifics on the database types
> involved and why getObject has to return an instance of your object
> instead of an intermediary.

Currently, we have PostGIS geometries in the database, which can be read
into PostGIS Java objects or JTS Java objects by using the PostGIS
wrapper or our own JTS wrapper, respectively.

Our app itself doesn't know anything about the wrappers (and doesn't
want to know, because it is designed to transparently support other I/O
methods, too). And as our queries are generic and user-configurable, and
support other built-in types such as timestamps, our SQL layer currently
calls an unwrap() method after each getObject(). This unwrap() method
tests for known wrapper classes and calls their getGeometry() or similar
method.
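
Roughly, the unwrap step looks like the sketch below. GeometryWrapper
is a hypothetical stand-in for the real wrapper classes, which are
tested for individually in our code.

```java
// Hypothetical stand-in for the PostGIS and JTS wrapper classes.
interface GeometryWrapper {
    Object getGeometry();
}

final class Unwrapper {
    // Called on every getObject() result: known wrappers are replaced by
    // the object they carry, everything else is passed through unchanged.
    static Object unwrap(Object fromDriver) {
        if (fromDriver instanceof GeometryWrapper) {
            return ((GeometryWrapper) fromDriver).getGeometry();
        }
        return fromDriver;
    }
}
```

Every new wrapper type means another branch here, which is the part a
registration mechanism in the driver would make unnecessary.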

Thanks,
Markus


Re: Register arbitrary types Framework

From
Kris Jurka
Date:
[Discussing not having ResultSet.getObject return a PGObject]

I did not receive your reply through the list for some reason, but I
stumbled across it in the archives:

http://archives.postgresql.org/pgsql-jdbc/2004-03/msg00067.php

I asked about symmetry with setObject, and you replied:

    But this problem is why I prefer the second solution using
    factories although it is more work to start with. The factory
    instances basically have 2 methods - one for reading and
    another one for writing.

With getObject you can register the factory with a pg internal type name
that the driver knows, but with setObject you have nothing to determine
which (if any) factory to use other than the object itself.  You could
work on some kind of reflection based scheme, but this is certainly not
symmetric with how getObject works.

Kris Jurka


Re: Register arbitrary types Framework

From
Markus Schaber
Date:
Hi, Kris,

On Wed, 17 Mar 2004 02:29:07 -0500 (EST)
Kris Jurka <books@ejurka.com> wrote:

> I did not receive your reply through the list for some reason, but I
> stumbled across it in the archives:

Strange... I'll send you a Cc: to be on the safe side :-)

>     But this problem is why I prefer the second solution using
>     factories although it is more work to start with. The factory
>     instances basically have 2 methods - one for reading and
>     another one for writing.
>
> With getObject you can register the factory with a pg internal type
> name that the driver knows, but with setObject you have nothing to
> determine which (if any) factory to use other than the object itself.
> You could work on some kind of reflection based scheme, but this is
> certainly not symmetric with how getObject works.

Every object's class can be obtained with the getClass() method. So (as
I weakly hinted in my original post) we can have a Map with the
classes (or fully qualified class names) as keys and the factories as
values. Using a HashMap, this allows us to find the factory in
expected constant time (*).
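
A small sketch of such a lookup, under the assumption of a write-side
factory interface (SqlWriter and WriterRegistry are hypothetical
names, not driver classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-side factory: turns a Java object into its SQL text form.
interface SqlWriter {
    String toSql(Object value);
}

final class WriterRegistry {
    private final Map<Class<?>, SqlWriter> byClass = new HashMap<>();

    void register(Class<?> javaClass, SqlWriter writer) {
        byClass.put(javaClass, writer);
    }

    // Exact-class lookup via getClass(). A subclass of a registered class
    // is NOT found this way; that is the issue behind the footnote.
    SqlWriter find(Object value) {
        return byClass.get(value.getClass());
    }
}
```

For example, after register(java.util.Date.class, ...), a
java.sql.Timestamp argument gets no factory from find(), because its
class is a subclass of Date and not Date itself.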

I also think that the current setObject(int, Object), which uses a
13-branch if(instanceof)-else construct, could be sped up by this - of
course, at the cost of creating the appropriate factory classes.

On the other hand, this allows some code cleanup by moving most of the
type-specific code out of the statement and resultset classes, reducing
the specific setXXX methods to simple wrappers. But this is not my
primary intention - IMHO, it's more like a positive side effect to
have the possibility. Whether this refactoring is appropriate with
regard to design and speed still has to be discussed.

Thanks for your patience,
Markus


(*) Of course, correct handling of subclasses is a little more tricky,
but I already have some ideas on how to handle this problem. But
as I don't want to overload this message, I'll outline them in another
Followup.


Re: Register arbitrary types Framework

From
Kris Jurka
Date:

On Thu, 18 Mar 2004, Markus Schaber wrote:

> > With getObject you can register the factory with a pg internal type
> > name that the driver knows, but with setObject you have nothing to
> > determine which (if any) factory to use other than the object itself.
> > You could work on some kind of reflection based scheme, but this is
> > certainly not symmetric with how getObject works.
>
> Every Object's class can be obtained with the getClass() method. So (as
> I weakly hinted in my original post) we can have a Map with the
> classes (or fully qualified class names) as keys, and the factories as
> values. Using a HashMap as Map, this allows us to find the factory in
> constant time (*).
>
> I also think that the current setObject(int, Object) using an 13-branch
> if(instanceof)-else construct could be sped up by this - of course, at
> the cost of creating the appropriate factory classes.
>

Well, this cleanup is something I'm actually more interested in. My work
on adding COPY support to the driver has kind of stalled because I need
the ability to read/write internal data types, and all of this knowledge
is contained in the Statement/ResultSet classes.

> (*) Of course, correct handling of subclasses is a little more tricky,
> but I already have some ideas on how to handle this problem. But
> as I don't want to overload this message, I'll outline them in another
> Followup.

Yes, this is an issue I hadn't considered and am interested to see your
reply.

Kris Jurka

Re: Register arbitrary types Framework

From
Oliver Jowett
Date:
Markus Schaber wrote:

> I also think that the current setObject(int, Object) using an 13-branch
> if(instanceof)-else construct could be sped up by this - of course, at
> the cost of creating the appropriate factory classes.

'instanceof' is such a common VM operation (it's implied by every cast)
that I'd expect it to be pretty fast. Is a hashmap lookup actually
faster than an inlined multibranch 'if' for the number of comparisons we do?

-O

Re: Register arbitrary types Framework

From
Markus Schaber
Date:
Hi, Kris,

On Thu, 18 Mar 2004 12:16:13 -0500 (EST)
Kris Jurka <books@ejurka.com> wrote:

> Well this cleanup is something I'm actually more interested in, my
> work on adding COPY support to the driver has kind of stalled because
> I need the ability to read/write internal data types and all of this
> knowledge is contained in the Statement/ResultSet classes.

Okay, so maybe my Ideas will have a good side-effect :-)

> > (*) Of course, correct handling of subclasses is a little more
> > tricky, but I already have some ideas on how to handle this problem.
> > But as I don't want to overload this message, I'll outline them in
> > another Followup.
>
> Yes, this is an issue I hadn't considered and am interested to see
> your reply.

Okay, here are my thoughts:


- Have the user explicitly register them. Each factory instance has a
list of SQL types and a list of Java classes it knows how to process,
and the factory author must know all possible subclasses. Most of the
simple classes (Integer, String etc.) are declared final, so they don't
have this problem. This also works fine for PostGIS or JTS geometries,
where we create different Java subclasses of Geometry from the same SQL
type, because the factory knows what to create (just as PGGeometry does
now).

However, whenever the user intends to setObject() his own subclasses
of those known classes, he has to register those classes manually (e.g.
by providing his own derived factory).


- Fall-back factory probing. The factories have a probe() method that
returns true when the factory believes it can handle the Java class.
Whenever we don't find the class in our hash, we probe the factories
until the first "hit", and register this factory for the class.

The probe() method can internally use instanceof or
reflection. This way it is possible to test for interfaces or to handle
all objects that have a getSQLrepresentation() bean property etc...


- Superclass probing. Whenever we don't know the class, we try with the
getSuperclass() result instead. This is repeated until we get to
java.lang.Object, which has a "write-only" factory calling toString().

This can be enhanced by calling the getInterfaces() method on every
tested class and checking whether we have a registration for one of the
implemented interfaces (which are represented by Class objects, too).
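
The second and third ideas can be sketched as follows. This is only an
illustration: ProbingWriter, ResolvingRegistry and their methods are
hypothetical names, and only the write side is shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical write-side factory with the probe() method of the second idea.
interface ProbingWriter {
    String toSql(Object value);

    // By default a factory claims nothing beyond its explicit registrations.
    default boolean probe(Class<?> type) { return false; }
}

final class ResolvingRegistry {
    private final Map<Class<?>, ProbingWriter> byClass = new HashMap<>();
    private final List<ProbingWriter> factories = new ArrayList<>();

    ResolvingRegistry() {
        // java.lang.Object gets the "write-only" factory calling toString().
        byClass.put(Object.class, Object::toString);
    }

    void register(Class<?> type, ProbingWriter writer) {
        byClass.put(type, writer);
        factories.add(writer);
    }

    // Second idea: on a miss, probe the factories until the first hit
    // and remember that factory for the class.
    ProbingWriter findByProbe(Class<?> type) {
        ProbingWriter known = byClass.get(type);
        if (known != null) return known;
        for (ProbingWriter f : factories) {
            if (f.probe(type)) {
                byClass.put(type, f);
                return f;
            }
        }
        return null;
    }

    // Third idea: walk up the superclass chain, checking the implemented
    // interfaces at every level, until java.lang.Object is reached.
    ProbingWriter findByHierarchy(Class<?> type) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            ProbingWriter w = byClass.get(c);
            if (w == null) w = findByInterface(c);
            if (w != null) return w;
        }
        // Only reached for interface types, whose getSuperclass() is null.
        return byClass.get(Object.class);
    }

    private ProbingWriter findByInterface(Class<?> c) {
        for (Class<?> i : c.getInterfaces()) {
            ProbingWriter w = byClass.get(i);
            if (w == null) w = findByInterface(i); // super-interfaces, too
            if (w != null) return w;
        }
        return null;
    }
}
```

In this sketch the hierarchy walk does not cache its result; caching
the hit under the concrete class would restore the direct hash lookup
for later calls, at the price of stale entries if factories are
registered afterwards.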


Disclaimer: It's late, so I can't guarantee that the above thoughts are
understandable. And I have not yet thought about arrays and primitive
types...

Thanks for your patience,
Markus

Re: Register arbitrary types Framework

From
Markus Schaber
Date:
Hi, Oliver,

On Fri, 19 Mar 2004 10:19:58 +1300
Oliver Jowett <oliver@opencloud.com> wrote:

> > I also think that the current setObject(int, Object) using an
> > 13-branch if(instanceof)-else construct could be sped up by this -
> > of course, at the cost of creating the appropriate factory classes.
>
> 'instanceof' is such a common VM operation (it's implied by every
> cast) that I'd expect it to be pretty fast. Is a hashmap lookup
> actually faster than an inlined multibranch 'if' for the number of
> comparisons we do?

As I don't have the answer to this question, I invite you to benchmark
this. You should expect your results to vary with different JVMs, JITs
and Map implementations (maybe we could use e.g. an Apache Commons
map).
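
As a starting point for such a benchmark, here is a deliberately crude
sketch. The class name DispatchComparison, the five branches (instead
of the driver's 13) and the type-name strings are assumptions for
illustration only. A hand-rolled timing loop like this is easily
distorted by JIT warm-up and inlining, so a proper harness such as JMH
would be needed before drawing conclusions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class DispatchComparison {
    // Illustrative class-to-type-name table; not the driver's real mapping.
    private static final Map<Class<?>, String> BY_CLASS = new HashMap<>();
    static {
        BY_CLASS.put(String.class, "text");
        BY_CLASS.put(Integer.class, "int4");
        BY_CLASS.put(Long.class, "int8");
        BY_CLASS.put(Double.class, "float8");
        BY_CLASS.put(Boolean.class, "bool");
    }

    // Multibranch instanceof chain, as in the current setObject() style.
    static String viaInstanceof(Object o) {
        if (o instanceof String) return "text";
        else if (o instanceof Integer) return "int4";
        else if (o instanceof Long) return "int8";
        else if (o instanceof Double) return "float8";
        else if (o instanceof Boolean) return "bool";
        return null;
    }

    // Hash lookup keyed by the exact runtime class.
    static String viaMap(Object o) {
        return BY_CLASS.get(o.getClass());
    }

    // Crude timing loop; returns the elapsed time in nanoseconds.
    static long time(Function<Object, String> dispatch, Object[] inputs, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (Object o : inputs) {
                sink += dispatch.apply(o).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println(); // use sink so the loop is not dropped
        return elapsed;
    }

    public static void main(String[] args) {
        Object[] inputs = { "a", 1, 2L, 3.0, true };
        int rounds = 1_000_000;
        time(DispatchComparison::viaInstanceof, inputs, rounds); // warm-up
        time(DispatchComparison::viaMap, inputs, rounds);        // warm-up
        System.out.println("instanceof chain: "
                + time(DispatchComparison::viaInstanceof, inputs, rounds) + " ns");
        System.out.println("HashMap lookup:   "
                + time(DispatchComparison::viaMap, inputs, rounds) + " ns");
    }
}
```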

However, this is why I think reworking the built-in types needs to be
discussed. We have to think about speed issues, but also about code
maintenance and clean architecture.


Have a nice sleep,
Markus