Re: Pl/Java - next step? - Mailing list pgsql-hackers

From Dave Cramer
Subject Re: Pl/Java - next step?
Msg-id 1077464809.1642.234.camel@localhost.localdomain
In response to Pl/Java - next step?  ("Thomas Hallgren" <thhal@mailblocks.com>)
List pgsql-hackers
Thomas,


What I would like to see is an abstraction of the interface that
communicates with the JVM, so that we can use either mechanism. As you
have pointed out, JNI has its advantages, and so does the remote
approach.
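
To make the idea concrete, here is a minimal Java sketch of such an
abstraction. Every name in it (JvmBridge, InProcessBridge, RemoteBridge)
is made up for illustration and comes from neither implementation. A real
version would sit on the backend side in C, and the remote variant would
write to a socket or shared memory rather than to a local byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical abstraction: callers see one interface, whatever the transport.
interface JvmBridge {
    String invoke(String functionName, String argument);
}

// Stand-in for the JNI style: a call is a plain in-process method invocation.
final class InProcessBridge implements JvmBridge {
    private final Map<String, Function<String, String>> functions = new HashMap<>();

    void register(String name, Function<String, String> body) {
        functions.put(name, body);
    }

    @Override
    public String invoke(String functionName, String argument) {
        Function<String, String> body = functions.get(functionName);
        if (body == null) {
            throw new IllegalArgumentException("unknown function: " + functionName);
        }
        return body.apply(argument);
    }
}

// Stand-in for the remote style: the call is flattened to bytes and parsed
// again, which is the extra work a real inter-process hop would add.
final class RemoteBridge implements JvmBridge {
    private final JvmBridge target;

    RemoteBridge(JvmBridge target) {
        this.target = target;
    }

    @Override
    public String invoke(String functionName, String argument) {
        // Assumption for the sketch: function names never contain a newline.
        byte[] wire = (functionName + "\n" + argument).getBytes(StandardCharsets.UTF_8);
        String[] parts = new String(wire, StandardCharsets.UTF_8).split("\n", 2);
        return target.invoke(parts[0], parts[1]);
    }
}
```

Code written against JvmBridge does not need to know which of the two
transports is behind it, which is the property I am after.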

I recently did an analysis of the two methods and there are a couple of
other points.

1) Even with JNI, you will probably still want to communicate with
another running Java process. For instance, let's say that you have an
application which is viewing live inventory. For simplicity's sake we
will use a Swing client.

Now we have a pl/java trigger function which gets called when the
inventory changes. Presumably the Java trigger would want to notify
the client, so it will have to use some sort of remote function call
to do this.

So at this point we have pl/java -> JNI -> JVM/procedure -> remote
call -> JVM running client. This is made worse by the fact that every
one of your connections is going to have to make the same remote
function call.
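
A rough Java sketch of that last hop, assuming a plain TCP socket on the
loopback interface as the "remote function call". The class and method
names are hypothetical, and the listening socket stands in for the Swing
client's JVM. The point is only that the trigger body has to open a
connection out of its own JVM even when it was reached through JNI.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class InventoryNotifyDemo {
    // Hypothetical trigger body: runs inside the backend's JVM and has to
    // reach the client's JVM over the network.
    static void onInventoryChanged(int clientPort, String item, int quantity)
            throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), clientPort);
             Writer out = new OutputStreamWriter(socket.getOutputStream(),
                                                 StandardCharsets.UTF_8)) {
            out.write(item + "=" + quantity + "\n");
        }
    }

    // Stand-in for the client: accepts one notification and returns it.
    static String receiveOne(ServerSocket server) throws IOException {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(),
                                           StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    // Runs both ends in one process. The connection waits in the listen
    // backlog until accept() is called, so no second thread is needed.
    static String roundTrip(String item, int quantity) {
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            onInventoryChanged(server.getLocalPort(), item, quantity);
            return receiveOne(server);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("widget", 7));
    }
}
```

With one JVM per connection, each backend that fires the trigger would
open its own connection like this one.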

2) While I haven't done the real work to determine whether the following
is true, I'm not willing to buy the argument that keeping multiple
class loaders is about as expensive as running multiple JVMs. My
argument is based on the fact that servlet containers (Tomcat et al.)
do this without showing onerous memory footprints.


As for using a heavyweight RPC mechanism, I have talked to Laszlo
about removing CORBA and replacing it with a lightweight custom protocol
much like the FE/BE protocol that the server uses. This seems to be a
better solution, as I agree that installing a CORBA environment would be
a huge barrier to entry.
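
As a sketch of what "lightweight" could mean here, the Java below frames a
message the way version 3 of the FE/BE protocol does: one type byte, then
a four-byte big-endian length that counts itself and the payload but not
the type byte, then the payload. The Frame class, the 'C' type letter and
the payload format are all invented for illustration; nothing here is
pl-j's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Frame {
    final char type;
    final byte[] payload;

    Frame(char type, byte[] payload) {
        this.type = type;
        this.payload = payload;
    }

    // Layout: [type:1][length:4, big-endian, includes itself][payload]
    static byte[] encode(char type, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + payload.length);
        buf.put((byte) type);
        buf.putInt(4 + payload.length);
        buf.put(payload);
        return buf.array();
    }

    static Frame decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        char type = (char) (buf.get() & 0xFF);
        int length = buf.getInt();
        // Reject lengths that are too small or run past the end of the data.
        if (length < 4 || length - 4 > buf.remaining()) {
            throw new IllegalArgumentException("bad frame length: " + length);
        }
        byte[] payload = new byte[length - 4];
        buf.get(payload);
        return new Frame(type, payload);
    }

    public static void main(String[] args) {
        // 'C' is a made-up "call" message type for this sketch.
        byte[] wire = encode('C', "add(1,2)".getBytes(StandardCharsets.UTF_8));
        Frame frame = decode(wire);
        System.out.println(frame.type + " "
                + new String(frame.payload, StandardCharsets.UTF_8));
    }
}
```

Five bytes of header per message and no ORB to install, which is the
whole attraction compared to CORBA.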

Regarding transaction visibility, pl-j (the remote version) already uses
SPI to do its calls, and the JDBC layer there would do the same, so I'm
not sure how it matters whether it is JNI or remote. Perhaps I am
missing something?

In the end, as we said before, Laszlo is interested in collaborating, so
hopefully we can find some middle ground here.


Dave

On Sat, 2004-02-21 at 05:04, Thomas Hallgren wrote:
> Two Pl/Java implementations exist today. Due to the architecture of
> PostgreSQL, compromises have been made in both of them to deal with the fact
> that each connection lives in its own process. One, I'll call it
> "Pl/Java_JNI" will spawn a JVM on demand for each connection and the other,
> "Pl/Java_remote", will spawn at least one JVM that lives in a process of its
> own and use an inter-process calling mechanism.
> 
> I can see PostgreSQL moving forward in one of four different directions:
> 
> 1. Select Pl/Java_JNI.
> 2. Select Pl/Java_remote
> 3. Choose both and agree on the SQL + Java semantics
> 4. Make the postmaster spawn threads rather than processes (controversial?
> Nah :-) )
> 
> As the one behind Pl/Java_JNI I'm perhaps not the most objective person when
> it comes to choice, but I'll make an effort here and try to list the pros
> and cons with each choice. My objective is to start a healthy discussion. I
> think Pl/Java might boost usability of PostgreSQL quite a bit, and with an
> almost explosive growth of the Java community it's essential that we conclude
> this sooner rather than later.
> 
> 
> 
> ** 1. Select Pl/Java_JNI **
> #Pros:#
> - Each call becomes extremely lightweight.
> JNI is in essence a straightforward in-process function invocation.
> Minimizing call overhead becomes very important for functions that a) are
> called very often and b) need to call back into the backend several
> times.
> 
> - Minimum resource utilization when passing values.
> Values can be passed by reference. TriggerData, TupleDesc, HeapTuple, byte
> arrays etc. need not be copied. Return values can be allocated directly in
> the correct MemoryContext.
> 
> - Transaction visibility
> Using a JDBC driver that's implemented directly on top of SPI ensures that
> the transaction visibility is correct without the need to either propagate a
> transaction context or make remote calls back into the backend.
> 
> - Connection isolation
> Easy to use since the developer "owns" the whole JVM. There's no need to
> terminate all connections in order to replace code or to establish a debug
> session. Migration can take place gradually.
> 
> - Simplicity
> No hassle setting up inter-process communication or maintaining a separate
> JVM.
> 
> - Modern JVMs are less demanding
> Sun and other JVM vendors are making serious efforts to make the JVM more
> adaptable. Java is no longer used only for heavyweight server processing;
> small utility programs are becoming more and more common. Thus, decreasing
> start-up time and the ability to adapt resource consumption have very high
> priority. See what Java 1.5 does here:
> http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#vm
> 
> - Well-known programming environment
> JNI is standard. A potential developer of the code has access to online
> training.
> 
> #Cons:#
> - Resource consumption.
> A JVM is expensive from a resource perspective.
> 
> - Connection start-up time is high.
> Booting a JVM takes time. Setups where connections that make invocations to
> Pl/Java are closed and created frequently will suffer from this.
> 
> - Java execution model differs from the one used by PostgreSQL
> Java uses multithreading whether you like it or not, and the JVM will throw
> exceptions. Pl/Java_JNI handles this by introducing some macros that a
> developer who makes additions to the port must be aware of. This
> also introduces limitations for the user of Pl/Java_JNI (such as very
> limited functionality once an error has been generated by the backend).
> 
> 
> 
> 
> 
> ** 2. Select Pl/Java_remote **
> #Pros:#
> - Each connection becomes fairly lightweight.
> A connection is represented as a thread in the remote JVM. Threads are much
> less expensive than a full-blown JVM.
> 
> - Connection start-up time is low
> Startup time will be very quick since thread creation is cheap. Even quicker
> if a thread-pool is utilized.
> 
> - Reuse of an existing JVM
> Small systems might use the same JVM to run an app-server as the one used by
> triggers and functions. Albeit not great from a "separation of concern"
> perspective, it might be very efficient for special needs.
> 
> - Ability to run the JVM on another server
> The JVM can run on a server different from the one running the backend
> process. If the number of calls is small in relation to the actual work
> performed in each call, this might be interesting.
> 
> #Cons:#
> - RPC calls are slow
> Calls between processes are inherently very slow compared to in-process
> calls.
> 
> - RPC resources needed
> Each connection will need an additional socket or shared memory segment.
> 
> - Transaction visibility
> A connection established in the remote JVM must have the same transaction
> visibility as the invoker. In essence, a transaction context must be
> propagated to the remote JVM, or the remote JVM must have a JDBC driver that
> calls back into the backend.
> 
> - RPC management
> CORBA or some other mechanism must be installed and maintained.
> 
> - Starting/Stopping JVM affects all connections
> Attaching a debugger or generating profiling information implies a restart
> of the JVM, killing all existing connections that make use of
> Pl/Java_remote. Code migration implies full stop + restart (The JSR121
> Isolation API didn't make it into the 1.5 release).
> 
> - Complex programming environment
> A potential developer of the code base has a lot to learn. The API between
> backend and Java code is non-standard.
> 
> 
> 
> 
> 
> ** 3. Choose both and agree on the SQL + Java semantics **
> #Pros:#
> - Best of two worlds
> The user can decide, depending on his/her setup, thus gaining optimal
> performance.
> 
> - Everyone wins
> Nobody needs to feel sad when their implementation was rejected.
> 
> #Cons:#
> - Might be perceived as a kludge
> The competitors don't need multiple implementations. Introducing two ways of
> doing it might be perceived as a way to get around a less than perfect
> design, with uncertainty and the choice of another database as the result.
> 
> - The choice is not evident
> The user has to make a choice. Sometimes the choice is not evident.
> 
> - Project synchronization
> Someone needs to synchronize the projects.
> 
> - Double effort
> Almost everything needs to be developed twice since the approaches have
> fundamental differences.
> 
> 
> 
> 
> 
> ** 4. Make the postmaster spawn threads rather than processes **
> I know this is very controversial and perhaps I should not bring it up at
> all. But then again, why not? Most readers are open-minded right?
> 
> #Pros:#
> - Really best of two worlds
> There would be one JVM per postmaster and in-process calls would be used
> throughout.
> 
> - Other pl<lang> could benefit?
> Other languages where multithreading is an option could benefit the same way
> Java does.
> 
> - Other pros
> Beyond the scope of the topic.
> 
> #Cons:#
> - Code rewrite
> Right. All PostgreSQL code would need an overhaul. That would be a serious
> effort to say the least.
> 
> - Code base selection
> We'd still need to choose which existing Pl/Java implementation should
> be used as the base for the in-process + multithreaded implementation.
> 
> - Other cons
> Beyond the scope of the topic.
> 
> What are the next steps? Setting up benchmarks and testing performance,
> perhaps? This should not be done by me, nor by the people behind
> Pl/Java_remote, but rather by someone who is truly objective.
> 
> Kind regards
> 
> Thomas Hallgren
> 
> 
> 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561


