Re: Pl/Java - next step? - Mailing list pgsql-hackers

From Thomas Hallgren
Subject Re: Pl/Java - next step?
Msg-id 00ca01c3f967$c0d443b0$6401a8c0@ad.eoncompany.com
In response to Pl/Java - next step?  ("Thomas Hallgren" <thhal@mailblocks.com>)
List pgsql-hackers
Hi Dave,
Comments on your comments inline...

> What I would like to see is an abstraction of the interface that
> communicates with the JVM so that we can use either, as you have pointed
> out  the JNI mechanism has advantages, as does the remote mechanism.
>
Yes, we must agree 100% on the "interface" that the user of Pl/Java will
see. I've done some work that we could use as a common base. I've tried to
follow the proposed SQL standard whenever possible.

Everything the user of my implementation will see is a set of interfaces.
Most of them standard java.sql.* stuff. ResultSet plays a very important
role as I pass complex types and sets using that interface. I also represent
the :new and :old row in triggers using ResultSet. The only "proprietary"
interface in use is the TriggerData interface.
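
As an illustration of that idea only, here is a minimal sketch of a trigger body that sees its rows purely through java.sql.ResultSet. The method names getNew() and getOld(), the InventoryTrigger class, the "quantity" column and the SingleRow helper are all assumptions made up for this example; the mail does not spell out the real TriggerData signature.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

/** Hypothetical shape of the one proprietary interface (names are assumed). */
interface TriggerData {
    ResultSet getNew() throws SQLException; // the :new row, null when there is none
    ResultSet getOld() throws SQLException; // the :old row, null when there is none
}

/** A trigger body that depends only on java.sql.* and TriggerData. */
final class InventoryTrigger {
    static void checkStock(TriggerData td) throws SQLException {
        ResultSet row = td.getNew();
        if (row != null && row.getInt("quantity") < 0) {
            throw new SQLException("quantity must not be negative");
        }
    }
}

/** Stand-in for the backend: a one-row ResultSet view over a map. */
final class SingleRow {
    /** Supports only getInt(String) and getString(String); enough for a demo. */
    static ResultSet of(Map<String, ?> columns) {
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getInt":
                        return ((Number) columns.get((String) args[0])).intValue();
                    case "getString":
                        return String.valueOf(columns.get((String) args[0]));
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }
}
```

Because the trigger code touches nothing but standard JDBC types plus one small interface, the same class could in principle run unchanged on a JNI-based or a remote implementation, which is the portability goal discussed here.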

I'm eager to commence a discussion around this. And of course, also around
how the actual CREATE FUNCTION etc. is written, how jar files are loaded,
how classpaths are set up etc.

> 1) Using JNI, you probably still want to communicate with another
> running java process. For instance lets say that you have an application
> which is viewing live inventory. For simplicity's sake we will use a Swing
> client.
>
> Now we have a pl/java trigger function which gets called when the
> inventory gets changed, ostensibly the java trigger would want to notify
> the client, so it will have to utilize some sort of remote function call
> to do this.
>
> So at this point we now have pl/java -> JNI -> JVM/procedure -> remote
> call -> JVM running client. This would further be exacerbated, by the
> fact that all your connections are going to have to make the same remote
> function call.
>
I agree. And as I pointed out initially, this is one of the advantages of
your solution, "Reuse of existing JVM". Not because the call chain is longer
(since the JNI call is more or less free) but because two or more JVMs need
to be running instead of just one.

Separation of concerns is an issue though. The load of the application server
will have a negative impact on the database performance. Especially in cases
where the database resides on a different server (which is very common).

Another issue, related to your example (and not solved in either of the
Pl/Java solutions), is transaction demarcation. Typically, the scenario you
describe, or any similar scenario where notifications are sent, would
require a transaction coordinated message queue. You don't want the message
to reach Tomcat unless the transaction commits. I've raised that issue
before (more than once). In order to solve this, we'd need the ability to
subscribe to transaction events in the backend.

Using a message queue, you will get an asynchronous delivery mechanism, and
the load of the appserver has no effect on database performance.
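
The commit-coordinated delivery described above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from either Pl/Java implementation: the TransactionalOutbox class and its onCommit()/onRollback() callbacks are hypothetical, and they stand in for exactly the transaction event subscription that the backend does not yet offer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers notifications raised inside a transaction; delivers only on commit. */
final class TransactionalOutbox {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<String> queue; // stand-in for a real message queue

    TransactionalOutbox(Consumer<String> queue) {
        this.queue = queue;
    }

    /** Called from trigger code while the transaction is still open. */
    void enqueue(String message) {
        pending.add(message);
    }

    /** Hypothetical backend callback: the transaction committed. */
    void onCommit() {
        List<String> toSend = new ArrayList<>(pending);
        pending.clear();
        toSend.forEach(queue); // hand off; the consumer reads asynchronously
    }

    /** Hypothetical backend callback: the transaction rolled back. */
    void onRollback() {
        pending.clear(); // nothing ever reaches the consumer
    }
}
```

With this shape the trigger never talks to the application server directly, so a rolled-back inventory change produces no notification at all.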

> 2) While I haven't done the real work to determine if the following is
> true, I'm not willing to buy the argument that the keeping multiple
> class loaders is about as expensive as multiple JVM's. My argument is
> based on the fact that servlet containers (Tomcat etal) do this without
> showing onerous memory footprints.
>
I agree, that argument is a bit far-fetched. It all depends on what
isolation you want to establish between the connections. If you want full
isolation, i.e. like the one Oracle provides, where different sessions
cannot share data through static variables of a class, then you need more or
less completely separate classloader chains. This, combined with the fact
that you already have a process running for each connection, makes the
differences in resource consumption fairly small.

But, a design where the isolation is less enforced (like in a servlet
container) is in your favor. JSR 121, enabling virtual JVMs within a JVM,
will also be a remedy for the problem when it arrives.

> As far as using a heavyweight RPC mechanism, I have talked to laszlo
> about removing CORBA and replacing it with a lightweight custom protocol
> much like the FE/BE protocol that the server uses, this seems to be a
> better solution, as I agree installing a CORBA environment would be a
> huge barrier to entry.
>
I've been down that road a couple of times in earlier assignments. Keep in
mind that you have to pass objects by reference in order to call back into
the caller process. And that the caller in turn might call you. And that
exceptions might be thrown at any time. CORBA caters for all of that. It is
expensive, no matter how you do it. One major cost of RPC is related to
process context switching and that will not go away regardless of how
efficient your protocol is. You might find the gain of a home-brewed
protocol to be very small (if any).
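
For reference, the framing part of an FE/BE-style protocol is small. The sketch below assumes the same layout the PostgreSQL wire protocol uses for its messages (a one-byte type, then a big-endian 32-bit length that counts itself, then the payload). The Frame class and the 'C' type code used in the usage note are invented for illustration and belong to no existing protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** One message: type byte, int32 length (including itself), payload. */
final class Frame {
    final char type;
    final byte[] payload;

    Frame(char type, byte[] payload) {
        this.type = type;
        this.payload = payload;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(type);
        out.writeInt(payload.length + 4); // big-endian; length counts its own 4 bytes
        out.write(payload);
    }

    static Frame readFrom(DataInputStream in) throws IOException {
        char type = (char) in.readUnsignedByte();
        int length = in.readInt();
        if (length < 4) {
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length - 4];
        in.readFully(payload); // blocks until the whole payload has arrived
        return new Frame(type, payload);
    }
}
```

A call request could then be written as new Frame('C', "inventory.onChange".getBytes(StandardCharsets.UTF_8)). Framing like this is the cheap part; the object references, re-entrant callbacks and exception propagation listed above would each need further message types on top of it, and the process context switch per round trip remains.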

RPC is drastically more expensive than in-process calls, no matter what you
do. This is the major disadvantage of your solution, just as the resource
consumption imposed by several JVMs is the major disadvantage of mine.

> Regarding Transaction Visibility pl-j (remote version) already uses SPI
> to do its calls, and the jdbc layer there would do the same, I'm not
> sure how it matters if it is jni, or remote? Perhaps I am missing
> something ?
>
My only argument is that you get a vast amount of RPC calls doing it that
way.

> In the end as we said before, laszlo is interested in collaborating so
> hopefully we can find some middle ground here ?
>
I'm interested too. As I wrote in the beginning of this mail, I've done some
work specifying interfaces etc. with an intention to follow the proposed SQL
standard. All jar manipulation, and a classloader that utilizes the database
etc. is written in pure java and uses plain JDBC on the default connection.
Perhaps you've done similar things?
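
To show what a database-backed classloader over plain JDBC can look like, here is a minimal sketch. It is not the actual implementation: the table class_image, its columns entry_name and image, and the DatabaseClassLoader name are assumptions chosen for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Loads classes whose bytes are stored in a (hypothetical) database table. */
final class DatabaseClassLoader extends ClassLoader {
    // Assumed schema: class_image(entry_name text, image bytea)
    private static final String QUERY =
        "SELECT image FROM class_image WHERE entry_name = ?";

    private final Connection connection;

    DatabaseClassLoader(ClassLoader parent, Connection connection) {
        super(parent);
        this.connection = connection;
    }

    /** Maps a binary class name to the entry name it would have in a jar. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Called by loadClass() only after the parent loader failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try (PreparedStatement stmt = connection.prepareStatement(QUERY)) {
            stmt.setString(1, entryName(name));
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    throw new ClassNotFoundException(name);
                }
                byte[] image = rs.getBytes(1);
                return defineClass(name, image, 0, image.length);
            }
        } catch (SQLException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

Inside the backend the connection would presumably come from the default connection (the SQLJ proposal names it with the URL "jdbc:default:connection"), which keeps the loader itself free of any JNI or RPC specifics and therefore shareable between implementations.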

I think the Java interfaces and the SQL statements that will be exposed to
the Pl/Java user are the most important things to agree on. If a trigger is
written for one implementation, it should be usable on the other, no matter
what. The second most important thing is to store jar files etc. in a similar
way in the database and to strive to write as much as possible of the
manager functions in Java so that we can share them.

Do you store jar files in the database?
How do you manage different classpaths for different schemas?
Do you have some user guide or other reading that you can send to me?

Regards,

Thomas Hallgren


