Thread: Pl/Java - next step?

Pl/Java - next step?

From: "Thomas Hallgren"
Two Pl/Java implementations exist today. Due to the architecture of
PostgreSQL, compromises have been made in both of them to deal with the fact
that each connection lives in its own process. One, which I'll call
"Pl/Java_JNI", spawns a JVM on demand for each connection; the other,
"Pl/Java_remote", spawns at least one JVM that lives in a process of its
own and uses an inter-process calling mechanism.

I can see PostgreSQL moving forward in one of four different directions:

1. Select Pl/Java_JNI.
2. Select Pl/Java_remote
3. Choose both and agree on the SQL + Java semantics
4. Make the postmaster spawn threads rather than processes (controversial?
Nah :-) )

As the one behind Pl/Java_JNI I'm perhaps not the most objective person when
it comes to this choice, but I'll make an effort here and try to list the pros
and cons of each option. My objective is to start a healthy discussion. I
think Pl/Java might boost the usability of PostgreSQL quite a bit, and with
the almost explosive growth of the Java community it's essential that we
conclude this sooner rather than later.



** 1. Select Pl/Java_JNI **
#Pros:#
- Each call becomes extremely lightweight.
JNI is in essence a straightforward in-process function invocation.
Minimizing call overhead becomes very important for functions that a) are
called very often or b) need to call back into the backend several times.

- Minimum resource utilization when passing values.
Values can be passed by reference. TriggerData, TupleDesc, HeapTuple, byte
arrays etc. need not be copied. Return values can be allocated directly in
the correct MemoryContext.

- Transaction visibility
Using a JDBC driver that's implemented directly on top of SPI ensures that
the transaction visibility is correct without the need to either propagate a
transaction context or make remote calls back into the backend.

- Connection isolation
Easy to use since the developer "owns" the whole JVM. There's no need to
terminate all connections in order to replace code or to establish a debug
session. Migration can take place gradually.

- Simplicity
No hassle setting up inter-process communication or maintaining a separate
JVM.

- Modern JVMs are less demanding
Sun and other JVM vendors are making serious efforts to make the JVM more
adaptable. Java is no longer used only for heavyweight server processing;
small utility programs are becoming more and more common. Thus, decreasing
start-up time and the ability to adapt resource consumption have very high
priority. See what Java 1.5 does here:
http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#vm

- Well-known programming environment
JNI is standard. A potential developer of the code has access to online
training material.

#Cons:#
- Resource consumption.
A JVM is expensive from a resource perspective.

- Connection start-up time is high.
Booting a JVM takes time. Setups where connections that make calls to
Pl/Java are created and closed frequently will suffer from this.

- Java execution model differs from the one used by PostgreSQL
Java uses multithreading whether you like it or not, and the JVM will throw
exceptions. Pl/Java_JNI handles this by introducing some macros that a
developer who makes additions to the port must be aware of. This also
introduces limitations for the user of Pl/Java_JNI (such as very limited
functionality once the backend has raised an error).





** 2. Select Pl/Java_remote **
#Pros:#
- Each connection becomes fairly lightweight.
A connection is represented as a thread in the remote JVM. Threads are much
less expensive than a full-blown JVM.

- Connection start-up time is low
Startup time will be very quick since thread creation is cheap. Even quicker
if a thread-pool is utilized.

- Reuse of an existing JVM
Small systems might use the same JVM to run an app-server as the one used by
triggers and functions. Albeit not great from a "separation of concerns"
perspective, it might be very efficient for special needs.

- Ability to run the JVM on another server
The JVM can run on a server different from the one running the backend
process. If the number of calls is small relative to the actual work
performed in each call, this might be interesting.

#Cons:#
- RPC calls are slow
Calls between processes are inherently much slower than in-process calls.

- RPC resources needed
Each connection will need an additional socket or shared memory segment.

- Transaction visibility
A connection established in the remote JVM must have the same transaction
visibility as the invoker. In essence, a transaction context must be
propagated to the remote JVM, or the remote JVM must have a JDBC driver that
calls back into the backend.

- RPC management
CORBA or some other mechanism must be installed and maintained.

- Starting/Stopping JVM affects all connections
Attaching a debugger or generating profiling information implies a restart
of the JVM, killing all existing connections that make use of
Pl/Java_remote. Code migration implies full stop + restart (The JSR121
Isolation API didn't make it into the 1.5 release).

- Complex programming environment
A potential developer of the code base has a lot to learn. The API between
the backend and the Java code is non-standard.
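A minimal Java sketch of the thread-per-connection model listed among the Pl/Java_remote pros, assuming a fixed-size thread pool inside one shared JVM. The class and member names are hypothetical and the request loop is only a placeholder; the point is that many backend connections are served by a few reused threads instead of one JVM each.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical remote JVM: each backend connection becomes a task on a shared
// thread pool, so accepting a connection costs a queued task, not a JVM boot.
final class RemoteJvm {
    private final ExecutorService pool;
    final AtomicInteger sessionsServed = new AtomicInteger();
    final Set<String> workerThreads = ConcurrentHashMap.newKeySet();

    RemoteJvm(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Accepts one backend connection and serves it on a pooled thread.
    void accept(int connectionId) {
        pool.execute(() -> {
            workerThreads.add(Thread.currentThread().getName());
            // Placeholder: read calls for connectionId, run the Java function and
            // send results back over the socket or shared memory segment.
            sessionsServed.incrementAndGet();
        });
    }

    // Stops accepting work and waits for running sessions; false on timeout.
    boolean shutdown() {
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A fixed pool also caps the number of Java threads, at the price of queuing connections while all workers are busy.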





** 3. Choose both and agree on the SQL + Java semantics **
#Pros:#
- Best of both worlds
The user can decide, depending on his/her setup, thus gaining optimal
performance.

- Everyone wins
Nobody needs to feel sad about having their implementation rejected.

#Cons:#
- Might be perceived as a kludge
The competitors don't need multiple implementations. Introducing two ways of
doing it might be perceived as a way to get around a less than perfect
design, resulting in uncertainty and users choosing another database.

- The choice is not evident
The user has to make a choice, and sometimes the right choice is not evident.

- Project synchronization
Someone needs to synchronize the projects.

- Double effort
Almost everything needs to be developed twice since the approaches have
fundamental differences.





** 4. Make the postmaster spawn threads rather than processes **
I know this is very controversial and perhaps I should not bring it up at
all. But then again, why not? Most readers are open-minded, right?

#Pros:#
- Really the best of both worlds
There would be one JVM per postmaster, and in-process calls would be used
throughout.

- Other pl<lang> could benefit?
Other languages where multithreading is an option could benefit the same way
Java does.

- Other pros
Beyond the scope of the topic.

#Cons:#
- Code rewrite
Right. All PostgreSQL code would need an overhaul. That would be a serious
effort to say the least.

- Code base selection
We'd still need to choose which existing Pl/Java implementation should be
used as the base for the in-process + multithreaded implementation.

- Other cons
Beyond the scope of the topic.

What are the next steps? Setting up benchmarks and testing performance,
perhaps? That should not be done by me, nor by the people behind
Pl/Java_remote, but rather by someone who is truly objective.

Kind regards

Thomas Hallgren




Re: Pl/Java - next step?

From: Tom Lane
"Thomas Hallgren" <thhal@mailblocks.com> writes:
> ** 4. Make the postmaster spawn threads rather than processes **
> I know this is very controversial and perhaps I should not bring it up at
> all. But then again, why not? Most readers are open-minded right?

It's been considered and rejected before, and pljava isn't going to tilt
the scales.  In fact, the main thing that bothers me about your
description of JNI is "Java uses multithreading whether you like it or
not".  I am very afraid of what impact a JVM will have on the stability
of the surrounding backend.

Other than that fear, though, the JNI approach seems to have pretty
considerable advantages.  You listed startup time as the main
disadvantage, but perhaps that could be worked around.  Suppose the
postmaster started a JVM --- would that state inherit correctly into
subsequently forked backends?

Also, regarding your option #3 (do both), do you really think something
different is going to happen in practice?  The developers of the other
implementation aren't likely to give it up just because yours exists.
        regards, tom lane


Re: Pl/Java - next step?

From: "Thomas Hallgren"
> It's been considered and rejected before, and pljava isn't going to tilt
> the scales.
>
Didn't think it would. Thought it worth mentioning anyway, partly to get
your reaction.

> In fact, the main thing that bothers me about your
> description of JNI is "Java uses multithreading whether you like it or
> not".  I am very afraid of what impact a JVM will have on the stability
> of the surrounding backend.
>
I have taken extensive measures to prevent multiple threads from accessing
the backend simultaneously. I encourage you, and anyone else who is
interested in how this is done, to read my "Some problems and their
solution" document posted here:
http://gborg.postgresql.org/project/pljava/genpage.php?solutions
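The linked document describes the actual approach. Purely to illustrate the general idea, here is a minimal Java sketch assuming a hypothetical `BackendGate` class through which every call into the backend is routed, so that at most one Java thread executes backend code at a time. Serialization alone may not be sufficient in a real backend (for example, its stack-depth checks are computed relative to the main thread's stack), so treat this as a sketch, not a complete solution.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical gate: every Java-to-backend call must pass through call(),
// so the single-threaded backend never sees two Java threads at once.
final class BackendGate {
    private static final ReentrantLock LOCK = new ReentrantLock();

    private BackendGate() {}

    // Runs one backend call while holding the gate; other threads block here.
    static <T> T call(Supplier<T> backendCall) {
        LOCK.lock();
        try {
            return backendCall.get();
        } finally {
            LOCK.unlock(); // released even if the backend call throws
        }
    }
}
```

A reentrant lock is used so that a backend call which calls back into Java, which in turn calls the backend again on the same thread, does not deadlock on itself.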

> Other than that fear, though, the JNI approach seems to have pretty
> considerable advantages.  You listed startup time as the main
> disadvantage, but perhaps that could be worked around.  Suppose the
> postmaster started a JVM --- would that state inherit correctly into
> subsequently forked backends?
>
That's an interesting thought. The postmaster just forks; it never execs,
right? Is this true for win32 as well? I've never tried it, but it might be
worth pursuing. Sun's new Java 1.5 JVM does something similar, albeit a bit
differently: an initializer process starts up and persists its state, and
subsequent JVMs then reuse that state. I definitely plan for Pl/Java_JNI to
take advantage of that.

> Also, regarding your option #3 (do both), do you really think something
> different is going to happen in practice?  The developers of the other
> implementation aren't likely to give it up just because yours exists.
>
My objective is not that they or I should give up. I want us to reach a
consensus on what PostgreSQL should offer. If we can find ways to
collaborate and create a two-way solution, that's great. If we can
collaborate around one of the solutions, that's perhaps even better, at
least from a developer resource perspective.

Regards,

Thomas Hallgren



Re: Pl/Java - next step?

From: Andrew Dunstan

Thomas Hallgren wrote:

>>Other than that fear, though, the JNI approach seems to have pretty
>>considerable advantages.  You listed startup time as the main
>>disadvantage, but perhaps that could be worked around.  Suppose the
>>postmaster started a JVM --- would that state inherit correctly into
>>subsequently forked backends?
>>
>That's an interesting thought. The postmaster just forks. It never exec's
>right? Is this true for win32 as well? I've never tried it but it might be
>worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently.
>An initializer process starts up and persists its state. Subsequent JVM's
>then reuse that state. I definitely plan for Pl/Java_JNI to take advantage
>of that.
>

Unfortunately, WIN32 has no fork(), and we have to exec the backend, in 
effect. You would need to handle both scenarios (#ifdef EXEC_BACKEND). 
For Unix this could be nice, though, and would eliminate most of the
disadvantage of your approach.



cheers

andrew



Re: Pl/Java - next step?

From: Joe Conway
Thomas Hallgren wrote:
> That's an interesting thought. The postmaster just forks. It never exec's
> right? Is this true for win32 as well? I've never tried it but it might be
> worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently.
> An initializer process starts up and persists its state. Subsequent JVM's
> then reuse that state. I definitely plan for Pl/Java_JNI to take advantage
> of that.

It would be easy enough to test. Just put a line in postgresql.conf like:

    preload_libraries = '$libdir/plr:plr_init'

substituting the specifics for PL/Java.

This causes the postmaster to load the library and execute the "init" 
function. Subsequent forked backends get a copy. I don't honestly know 
what happens with the win32 port though -- anyone out there know?

Joe


Re: Pl/Java - next step?

From: Dave Cramer
Thomas,


What I would like to see is an abstraction of the interface that
communicates with the JVM so that we can use either. As you have pointed
out, the JNI mechanism has advantages, as does the remote mechanism.

I recently did an analysis of the two methods and there are a couple of
other points.

1) Using JNI, you probably still want to communicate with another
running Java process. For instance, let's say that you have an application
that displays live inventory. For simplicity's sake we will use a Swing
client.

Now we have a pl/java trigger function which gets called when the
inventory changes. Presumably the Java trigger would want to notify
the client, so it will have to use some sort of remote function call
to do this.

So at this point we now have pl/java -> JNI -> JVM/procedure -> remote
call -> JVM running client. This is further exacerbated by the fact that
all your connections are going to have to make the same remote function
call.

2) While I haven't done the real work to determine if the following is
true, I'm not willing to buy the argument that keeping multiple class
loaders is about as expensive as multiple JVMs. My argument is based on
the fact that servlet containers (Tomcat et al.) do this without showing
onerous memory footprints.


As far as using a heavyweight RPC mechanism goes, I have talked to Laszlo
about removing CORBA and replacing it with a lightweight custom protocol
much like the FE/BE protocol that the server uses. This seems to be a
better solution, as I agree that installing a CORBA environment would be a
huge barrier to entry.

Regarding transaction visibility, pl-j (the remote version) already uses
SPI to do its calls, and the JDBC layer there would do the same, so I'm
not sure why it matters whether it is JNI or remote. Perhaps I am missing
something?

In the end, as we said before, Laszlo is interested in collaborating, so
hopefully we can find some middle ground here?


Dave

On Sat, 2004-02-21 at 05:04, Thomas Hallgren wrote:
> [...]
-- 
Dave Cramer
519 939 0336
ICQ # 14675561



Re: Pl/Java - next step?

From: "Thomas Hallgren"
Hi Dave,
Comments on your comments inline...

> What I would like to see is an abstraction of the interface that
> communicates with the JVM so that we can use either, as you have pointed
> out  the JNI mechanism has advantages, as does the remote mechanism.
>
Yes, we must agree 100% on the "interface" that the user of Pl/Java will
see. I've done some work that we could use as a common base. I've tried to
follow the proposed SQL standard whenever possible.

Everything the user of my implementation will see is a set of interfaces.
Most of them standard java.sql.* stuff. ResultSet plays a very important
role as I pass complex types and sets using that interface. I also represent
the :new and :old rows in triggers using ResultSet. The only "proprietary"
interface in use is the TriggerData interface.

I'm eager to start a discussion around this, and of course also around
how the actual CREATE FUNCTION etc. is written, how jar files are loaded,
how classpaths are set up, etc.

> 1) Using JNI, you probably still want to communicate with another
> running java process. For instance lets say that you have an application
> which is viewing live inventory. For simplicity sake we will us a swing
> client.
>
> Now we have a pl/java trigger function which gets called when the
> inventory gets changed, ostensibly the java trigger would want to notify
> the client, so it will have to utilize some sort of remote function call
> to do this.
>
> So at this point we now have pl/java -> JNI -> JVM/procedure -> remote
> call -> JVM running client. This would further be exacerbated, by the
> fact that all your connections are going to have to make the same remote
> function call.
>
I agree. And as I pointed out initially, this is one of the advantages of
your solution ("Reuse of an existing JVM"). Not because the call chain is
longer (the JNI call is more or less free) but because two or more JVMs
need to be running instead of just one.

Separation of concerns is an issue, though. The load of the application
server will have a negative impact on database performance, especially in
cases where the database resides on a different server (which is very
common).

Another issue, related to your example (and not solved in either of the
Pl/Java solutions), is transaction demarcation. Typically, the scenario you
describe, or any similar scenario where notifications are sent, would
require a transaction coordinated message queue. You don't want the message
to reach Tomcat unless the transaction commits. I've raised that issue
before (more than once). In order to solve this, we'd need the ability to
subscribe to transaction events in the backend.

Using a message queue, you get an asynchronous delivery mechanism, and
the load of the app server has no effect on database performance.
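A minimal Java sketch of the buffering half of such a transaction-coordinated queue, assuming hypothetical `onCommit`/`onRollback` callbacks. Those callbacks stand for exactly the backend transaction events that, as noted above, neither Pl/Java implementation can subscribe to yet. Messages published by trigger code are held back and handed to the queue only if the transaction commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical publisher used from trigger code. Nothing leaves the database
// until the (assumed) backend commit event fires.
final class TransactionalPublisher {
    private final Consumer<String> queue;            // e.g. a message-queue producer
    private final List<String> pending = new ArrayList<>();

    TransactionalPublisher(Consumer<String> queue) {
        this.queue = queue;
    }

    // Called by the trigger: buffer the message for the current transaction.
    void publish(String message) {
        pending.add(message);
    }

    // Would be invoked by a backend "transaction committed" event.
    void onCommit() {
        for (String m : pending) queue.accept(m);
        pending.clear();
    }

    // Would be invoked by a backend "transaction aborted" event.
    void onRollback() {
        pending.clear();
    }
}
```

This only approximates transactional delivery: if the queue is unreachable after the commit, the message is lost. Closing that gap would require a queue that takes part in the commit itself (two-phase commit).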

> 2) While I haven't done the real work to determine if the following is
> true, I'm not willing to buy the argument that the keeping multiple
> class loaders is about as expensive as multiple JVM's. My argument is
> based on the fact that servlet containers (Tomcat etal) do this without
> showing onerous memory footprints.
>
I agree, that argument is a bit far-fetched. It all depends on what
isolation you want to establish between the connections. If you want full
isolation, i.e. like the one Oracle provides, where different sessions
cannot share data through static variables of a class, then you need more or
less completely separate classloader chains. This, combined with the fact
that you already have a process running for each connection, makes the
differences in resource consumption fairly small.

But a design where the isolation is less strict (like in a servlet
container) is in your favor. JSR 121, which enables virtual JVMs within a
JVM, will also remedy the problem when it arrives.
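A minimal Java sketch of that kind of isolation, assuming a hypothetical per-session class loader whose class bytes come from a `Map` standing in for jar contents stored in the database. Each simulated session loads its own copy of `Counter`, so the static field is not shared; the loader's parent is `null`, so only bootstrap (`java.*`) classes are common to all sessions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

final class SessionIsolation {

    // User code with static state that sessions must not share.
    public static class Counter {
        public static int count = 0;

        public static int increment() {
            return ++count;
        }
    }

    // Hypothetical per-session loader: the map stands in for jar contents stored
    // in a database table. A null parent means only bootstrap classes are shared.
    static final class SessionLoader extends ClassLoader {
        private final Map<String, byte[]> classBytes;

        SessionLoader(Map<String, byte[]> classBytes) {
            super(null);
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] b = classBytes.get(name);
            if (b == null) throw new ClassNotFoundException(name);
            return defineClass(name, b, 0, b.length);
        }
    }

    // Builds the stand-in "jar" by reading compiled class files from the class path.
    static Map<String, byte[]> jarOf(Class<?>... classes) {
        Map<String, byte[]> jar = new HashMap<>();
        for (Class<?> c : classes) {
            String resource = c.getName().replace('.', '/') + ".class";
            try (InputStream in = c.getClassLoader().getResourceAsStream(resource)) {
                if (in == null) throw new IOException("class file not found: " + resource);
                jar.put(c.getName(), in.readAllBytes());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return jar;
    }

    // Simulates one session: load Counter in a fresh loader, call increment() n times.
    static int runSession(Map<String, byte[]> jar, int n) {
        try {
            Class<?> counter = new SessionLoader(jar).loadClass(Counter.class.getName());
            Method increment = counter.getMethod("increment");
            int last = 0;
            for (int i = 0; i < n; i++) last = (Integer) increment.invoke(null);
            return last;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every session pays for its own copy of each class it loads, which is the resource cost under discussion; a servlet-container-style design shares more classes and pays less.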

> As far as using a heavyweight RPC mechanism, I have talked to laszlo
> about removing CORBA and replacing it with a lightweight custom protocol
> much like the FE/BE protocol that the server uses, this seems to be a
> better solution, as I agree installing a CORBA environment would be a
> huge barrier to entry.
>
I've been down that road a couple of times in earlier assignments. Keep in
mind that you have to pass objects by reference in order to call back into
the caller process, that the caller in turn might call you, and that
exceptions might be thrown at any time. CORBA caters for all of that. It is
expensive, no matter how you do it. One major cost of RPC is process
context switching, and that will not go away regardless of how efficient
your protocol is. You might find the gain of a home-brewed protocol to be
very small (if any).

RPC is drastically more expensive than in-process calls, no matter what you
do. This is the major disadvantage of your solution, just as the resource
consumption imposed by several JVMs is of mine.
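For reference, the framing layer of a protocol modeled on the FE/BE protocol is small. This Java sketch assumes the protocol 3.0 layout (a one-byte message type, then a big-endian Int32 length that counts itself but not the type byte, then the payload); the class name and the type code in the usage example are hypothetical. It covers none of the hard parts listed above: object references for callbacks, nested calls in both directions, and exceptions thrown mid-call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Framing in the style of the FE/BE protocol: type byte, Int32 length
// (big-endian, counting itself but not the type byte), then the payload.
final class Frame {
    final byte type;
    final byte[] payload;

    Frame(byte type, byte[] payload) {
        this.type = type;
        this.payload = payload;
    }

    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(type);
        out.writeInt(4 + payload.length); // DataOutputStream writes big-endian
        out.write(payload);
        out.flush();
    }

    static Frame readFrom(DataInputStream in) throws IOException {
        byte type = in.readByte();
        int length = in.readInt();
        if (length < 4) throw new IOException("invalid frame length: " + length);
        byte[] payload = new byte[length - 4];
        in.readFully(payload); // blocks until the whole payload has arrived
        return new Frame(type, payload);
    }

    // In-memory helpers, e.g. for tests.
    static byte[] toBytes(Frame f) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            f.writeTo(new DataOutputStream(buf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static Frame fromBytes(byte[] wire) {
        try {
            return readFrom(new DataInputStream(new ByteArrayInputStream(wire)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage: `Frame.fromBytes(Frame.toBytes(new Frame((byte) 'C', bytes)))` round-trips a message. Over a socket, each call still costs a write, a context switch to the peer process, and a blocking read for the reply.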

> Regarding Transaction Visibility pl-j (remote version) already uses SPI
> to do it's calls, and the jdbc layer there would do the same, I'm not
> sure how that it matters if it is jni, or remote? Perhaps I am missing
> something ?
>
My only argument is that you get a vast number of RPC calls doing it that
way.
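A rough Java illustration of that point, assuming a hypothetical `Cursor` interface that a remote JVM's JDBC layer would use to call back into the backend. A `java.lang.reflect.Proxy` counts every method call as one round trip: reading 100 rows with two columns each costs 3 * 100 + 1 = 301 calls (one `next()` plus two getters per row, and a final `next()` that returns false). In the remote design each of those crosses a process boundary; with JNI each is a plain in-process call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical minimal cursor API that JDBC-style code would call into the backend.
interface Cursor {
    boolean next();
    String getString(int column);
}

// In-process cursor over rows; stands in for an SPI-backed result.
final class LocalCursor implements Cursor {
    private final List<String[]> rows;
    private int pos = -1;

    LocalCursor(List<String[]> rows) {
        this.rows = rows;
    }

    public boolean next() {
        return ++pos < rows.size();
    }

    public String getString(int column) {
        return rows.get(pos)[column - 1]; // JDBC-style 1-based columns
    }
}

final class RoundTrips {
    // Wraps a cursor so that every method call is counted as one round trip.
    static Cursor remote(Cursor target, AtomicInteger trips) {
        InvocationHandler handler = (proxy, method, args) -> {
            trips.incrementAndGet(); // remotely: one request/response over the wire
            return method.invoke(target, args);
        };
        return (Cursor) Proxy.newProxyInstance(
                Cursor.class.getClassLoader(), new Class<?>[] {Cursor.class}, handler);
    }
}
```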

> In the end as we said before, laszlo is interested in collaborating so
> hopefully we can find some middle ground here ?
>
I'm interested too. As I wrote at the beginning of this mail, I've done some
work specifying interfaces etc. with the intention of following the proposed
SQL standard. All jar manipulation, a classloader that utilizes the database,
etc. is written in pure Java and uses plain JDBC on the default connection.
Perhaps you've done similar things?

I think the Java interfaces and the SQL statements that will be exposed to
the Pl/Java user are the most important things to agree on. If a trigger is
written for one implementation, it should be usable on the other, no matter
what. The second most important thing is to store jar files etc. in a
similar way in the database and to strive to write as much as possible of
the manager functions in Java so that we can share them.

Do you store jar files in the database?
How do you manage different classpaths for different schemas?
Do you have some user guide or other reading that you can send to me?

Regards,

Thomas Hallgren



Re: Pl/Java - next step?

From: "Thomas Hallgren"
> 1) Using JNI, you probably still want to communicate with another
> running java process.

BTW, I don't really agree with "probably". There are numerous cases where you
will be happy just communicating with the database, communicating with
another remote resource (typically a message queue), or not communicating
at all (calculations, etc.).

Regards,
Thomas Hallgren




Re: Pl/Java - next step?

From: Dave Cramer
Not to minimize your work, as I think it is great, but this particular
use-case I consider to be overkill for pl/java. It is probably easier to
use pl/pgsql if all you want to do is calculations.

We had suggested an online chat to discuss this; when would you be
available for that? What time zone are you in? Laszlo is in Hungary and
I am in Canada, so we are likely spread all over the map.

Dave


On Sun, 2004-02-22 at 12:28, Thomas Hallgren wrote:
> > 1) Using JNI, you probably still want to communicate with another
> > running java process.
> 
> B.T.W. I don't really agree on "probably". There are numerous cases when you
> will be happy just communicating with the database, communicate with another
> remote resource (message queue typically), or not communicate at all
> (calculations, etc.).
> 
> Regards,
> Thomas Hallgren
> 
> 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561



Re: Pl/Java - next step?

From: "Thomas Hallgren"
I'm in Sweden. Some time Tuesday evening (European time), perhaps?

Why is your work not made public somewhere? The project on SourceForge
seems to be inactive. Do you have a private CVS setup?

> Not to minimize your work, as I think it is great, but this particular
> use-case I consider to be overkill for pl/java. It is probably easier to
> use pl/pgsql if all you want to do is calculations.
>
Not to minimize your work, but if the only thing you want to do is to send a
request to a servlet, that is very easy to do with Pl/Perl ;-)

Seriously, when I say calculations, I mean any computed value that doesn't
involve database access. It could for instance be an implementation of a
soundex algorithm comparing two values, or something similar like graphic
image matching. Regardless of whether such things can be implemented in
PL/pgSQL or not, the fact that there's a bunch of downloadable Java code out
there that can be used with little or no effort is enough to justify my
statement.
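As an illustration of the kind of self-contained calculation meant here, a minimal Java sketch of American Soundex plus a comparison helper. The class and method names are hypothetical and not part of either Pl/Java implementation; how such a static method would be bound to an SQL function was still open in this thread, so no CREATE FUNCTION statement is shown.

```java
import java.util.Locale;

// Minimal American Soundex: first letter kept, remaining consonants coded as
// digits, result padded or truncated to four characters.
final class Soundex {
    private Soundex() {}

    // Digit for a letter; '0' marks vowels and the letters H, W, Y (not coded).
    private static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder(4);
        char prev = 0;
        for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') continue;   // ignore non-letters
            char d = code(c);
            if (out.length() == 0) {            // keep the first letter as-is
                out.append(c);
                prev = d;
            } else if (c == 'H' || c == 'W') {
                // H and W do not separate letters that share a code
            } else if (d == '0') {
                prev = '0';                     // vowels do separate them
            } else {
                if (d != prev) out.append(d);
                prev = d;
                if (out.length() == 4) break;
            }
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // True if both values produce the same non-empty Soundex code.
    static boolean soundsAlike(String a, String b) {
        String x = encode(a);
        return !x.isEmpty() && x.equals(encode(b));
    }
}
```

For example, `Soundex.encode("Robert")` and `Soundex.encode("Rupert")` both give `R163`, so `soundsAlike("Robert", "Rupert")` is true.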

Regards,

Thomas Hallgren



Re: Pl/Java - next step?

From: Andrew Dunstan
One perfectly good reason for this scenario is portability between 
postgres and any database implementing the standard (e.g. Oracle).

cheers

andrew

Dave Cramer wrote:

>Not to minimize your work, as I think it is great, but this particular
>use-case I consider to be overkill for pl/java. It is probably easier to
>use pl/pgsql if all you want to do is calculations.
>
>We had suggested an online chat to discuss this, when would you be
>available for that? What timezone are you in. Laszlo is in hungary, and
>I am in canada, so we are likely spread allover the the map .
>
>Dave
>
>
>On Sun, 2004-02-22 at 12:28, Thomas Hallgren wrote:
>>>1) Using JNI, you probably still want to communicate with another
>>>running java process.
>>B.T.W. I don't really agree on "probably". There are numerous cases when you
>>will be happy just communicating with the database, communicate with another
>>remote resource (message queue typically), or not communicate at all
>>(calculations, etc.).
>>
>>Regards,
>>Thomas Hallgren
>>



Re: Pl/Java - next step?

From: Dave Cramer
Tuesday evening European time is fine with me. I am at GMT-5, so it will be
afternoon for me. What time?

We should attempt an agenda?

Dave
On Sun, 2004-02-22 at 15:33, HORNYAK Laszlo wrote:
> Hi all!
> 
> Sorry for my latencies.
> An IRC chat is ok for me, anytime.
> 
> On Sun, Feb 22, 2004 at 08:08:00PM +0100, Thomas Hallgren wrote:
> > I'm in Sweden. Some time tuesday evening (european time) perhaps?
> > 
> > Why is your work not made public somewhere? The project on sourceforge is
> > inactive it seems. Do you have a CVS setup privately?
> 
> Yes, actually, the sf.net CVS was used very rarely, so I simply dropped it
> after a while. Now we use the CVS on Dave's server, but it will move to
> a new server.
> 
> > 
> > > Not to minimize your work, as I think it is great, but this particular
> > > use-case I consider to be overkill for pl/java. It is probably easier to
> > > use pl/pgsql if all you want to do is calculations.
> > >
> > Not to minimize your work, but if the only thing you want to do is to send a
> > request to a servlet, that is very easy to do with Pl/Perl ;-)
> > 
> > Seriously, when I say calculations, I mean any computed value that doesn't
> > involve database accesses. It could for instance be an implementation of a
> > soundex algorithm comparing two values or something similar like graphic
> > image matching. Regardless if such things can be implemented in pgsql or
> > not, the fact that there's a bunch of downloadable Java code out there that
> > can be used, with little or no effort, is enough to motivate my statement.
> > 
> 
> Java in the database has quite a lot of advantages, and most people
> would prefer using Java instead of learning one more language for stored
> procedures. If we can show that it can be stable and portable, people
> will love it. It is their problem what they use it for :))
> I think one could use it for sending data into message queues, calling
> validation with EJB methods, doing complex analysis on the data, checking
> if a key exists in another database (a db-platform-independent distributed
> RDBMS), or whatever. It will make a DB really intelligent, and will help a
> lot in keeping two-tier systems out of trouble.
> 
> Laszlo Hornyak
> 
> > Regards,
> > 
> > Thomas Hallgren
> 
-- 
Dave Cramer
519 939 0336
ICQ # 14675561



Re: Pl/Java - next step?

From
Peter Eisentraut
Date:
Thomas Hallgren wrote:
> 1. Select Pl/Java_JNI.
> 2. Select Pl/Java_remote
> 3. Choose both and agree on the SQL + Java semantics
> 4. Make the postmaster spawn threads rather than processes
> (controversial? Nah :-) )

Option 5 (or 0) would be to use GCJ.  This is likely to be the fastest 
and most lightweight solution, but perhaps not the most featureful.



Re: Pl/Java - next step?

From
"Thomas Hallgren"
Date:
> Option 5 (or 0) would be to use GCJ.  This is likely to be the fastest
> and most lightweight solution, but perhaps not the most featureful.
>
GCJ is definitely an alternative for the reasons you mention. I didn't
mention it (nor any other JVM) because I see it as one of several "JVMs"
that Pl/Java should be able to use. It comes with JNI (and with CNI, which
they claim is a much faster alternative). I'm currently looking into what's
needed in order to use GCJ for Pl/Java_JNI.

Regards,

Thomas Hallgren



Re: Pl/Java - next step?

From
"Thomas Hallgren"
Date:
For me it would be fine at 7pm GMT Tuesday. Here's an attempted agenda:

1. Try to identify the common public interface (SQL and Java).
This is the most important item in my view since it enables a user to
seamlessly switch between our two solutions. You describe your design from
this angle, and I describe mine. Let's also look at what the proposed
standard says.

2. Find things that we can collaborate on.
For instance, I have some stuff written in Java that enables Java classes
to be stored and loaded from the database. You might have similar things or
Java code that covers other areas that I can reuse.

3. Probe deeper and see if there's more that we can share (C-code
essentially).
I have my doubts about sharing C-code since we do things fundamentally
differently. I know you have a generic call mechanism that we could use to
establish a common ground, but I think it would be bad for both designs. We
have different objectives. You strive to minimize the number of RPC calls. I
strive to minimize call overhead and resource consumption.

You probably have more ideas so please add to this.

Please confirm the time and tell me where we meet (which IRC server and channel).

Regards,

Thomas Hallgren

----- Original Message ----- 
From: "Dave Cramer" <pg@fastcrypt.com>
To: "HORNYAK Laszlo" <hornyakl@inf.elte.hu>
Cc: "Thomas Hallgren" <thhal@mailblocks.com>; <pgsql-hackers@postgresql.org>
Sent: Monday, February 23, 2004 13:41
Subject: Re: [HACKERS] Pl/Java - next step?





Re: Pl/Java - next step?

From
"Rob Butler"
Date:
Hello all,
>
> 3. Probe deeper and see if there's more that we can share (C-code
> essentially).
> I have my doubts about sharing C-code since we do things fundamentally
> differently. I know you have a generic call mechanism that we could use to
> establish a common ground, but I think it would be bad for both designs. We
> have different objectives. You strive to minimize the number of RPC calls. I
> strive to minimize call overhead and resource consumption.
>
I've been following this thread, but don't know much about either
implementation. On the re-use front it would be VERY nice if you could
somehow have a single patch for PostgreSQL's C code that called a set of
Java interfaces.  Then each of your implementations could implement that set
of Java interfaces (one using JNI, the other using RMI).  This would allow
the user to swap between either implementation, but would also reduce the
amount of similar C code in Postgres.  Something I think the PostgreSQL
hackers would much prefer.
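A minimal sketch of this idea, assuming a single Java-side interface that shared C glue would call into. `CallHandler`, `FunctionRegistry`, `InProcessHandler` and `SerializingHandler` are hypothetical names, not part of either implementation. The "remote" variant only simulates RMI by round-tripping arguments and results through Java serialization in memory, which is enough to show the by-reference versus by-value difference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical common entry point the backend's C code would call.
interface CallHandler {
    Object call(String function, Object... args);
}

// Stand-in for however each implementation locates user functions.
final class FunctionRegistry {
    private final Map<String, Function<Object[], Object>> functions = new HashMap<>();

    void register(String name, Function<Object[], Object> f) {
        functions.put(name, f);
    }

    Object invoke(String name, Object[] args) {
        Function<Object[], Object> f = functions.get(name);
        if (f == null) {
            throw new IllegalArgumentException("no such function: " + name);
        }
        return f.apply(args);
    }
}

// JNI-style: arguments are handed over by reference, nothing is copied.
final class InProcessHandler implements CallHandler {
    private final FunctionRegistry registry;

    InProcessHandler(FunctionRegistry registry) { this.registry = registry; }

    @Override
    public Object call(String function, Object... args) {
        return registry.invoke(function, args);
    }
}

// RMI-style (simulated): arguments and result cross a serialization boundary,
// so the callee only ever sees copies.
final class SerializingHandler implements CallHandler {
    private final FunctionRegistry registry;

    SerializingHandler(FunctionRegistry registry) { this.registry = registry; }

    @Override
    public Object call(String function, Object... args) {
        Object[] copied = (Object[]) roundTrip(args);
        return roundTrip(registry.invoke(function, copied));
    }

    private static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Callers depend only on `CallHandler`, so switching between the two is a one-line change. Whether one set of C entry points can serve both designs efficiently is exactly what the reply in this thread questions.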

Later
Rob



Re: Pl/Java - next step?

From
"Thomas Hallgren"
Date:
> On the re-use front it would be VERY nice if you could
> somehow have a single patch for PostgreSQL's C code that called a set of
> Java interfaces.  Then each of your implementations could implement that set
> of Java interfaces (one using JNI, the other using RMI).  This would allow
> the user to swap between either implementation, but would also reduce the
> amount of similar C code in Postgres.  Something I think the PostgreSQL
> hackers would much prefer.
>
> Later
> Rob
>
I understand your concern. I'm all for code reuse and all the advantages that
it will bring. In my experience, however, the design patterns used for
solutions that involve RPC differ a great deal from the ones used when you
have in-process calls. The driving forces are quite different. Let me give
you a concrete example.

Let's assume that we implement a trigger function that fires BEFORE UPDATE,
FOR EACH ROW.

Using RPC, you'd like to minimize the number of calls that are made between
the two processes. Ideally, you'd like to have one call only. This can be
achieved by packing all information you have in one structure (the old row,
the new row, parameters etc.) and pass that data, by value, to the remote
process. In the remote process, all options are now open. You can read
parameters, the old row, and the new row, etc. Typically, some change would
be made to the new row and it would be sent back to the caller, again the
data is passed by value and streamed.

Using in-process calls, you'd like to minimize resource consumption. Thus,
you want to minimize copying of data and you want to make data available on
demand. So, a JNI solution would typically wrap the TriggerData in a Java
object with accessor methods that enable the Java developer to obtain the
old row, the new row, and the parameters. An old row would be a wrapper of
the actual HeapTuple contained in the TriggerData, etc. No copies anywhere
and no streaming, but a radically increased number of calls between the Java
and the C domain compared to the RPC solution.
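The contrast can be sketched in plain Java under heavy simplifying assumptions: `BackendTrigger` is a hypothetical stand-in for the backend's TriggerData, rows are String arrays instead of HeapTuples, and the `crossings` counter merely models calls across the C/Java (or process) boundary. No real JNI or RPC plumbing is shown.

```java
import java.io.Serializable;

// Hypothetical stand-in for the backend's TriggerData. Each method call
// models one crossing of the C/Java boundary (JNI) or process boundary (RPC).
final class BackendTrigger {
    int crossings = 0;
    private final String[] oldRow;
    private final String[] newRow;
    private final String[] args;

    BackendTrigger(String[] oldRow, String[] newRow, String[] args) {
        this.oldRow = oldRow;
        this.newRow = newRow;
        this.args = args;
    }

    // Fine-grained accessors, as a JNI wrapper would use: no copying.
    String[] oldRow() { crossings++; return oldRow; }
    String[] newRow() { crossings++; return newRow; }
    String[] args()   { crossings++; return args; }

    // One bulk call that packs everything by value, as an RPC design would.
    TriggerSnapshot snapshot() {
        crossings++;
        return new TriggerSnapshot(oldRow.clone(), newRow.clone(), args.clone());
    }
}

// RPC pattern: a serializable value object carrying all trigger data at once.
// Reading its fields costs no further crossings, but everything was copied.
final class TriggerSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    final String[] oldRow;
    final String[] newRow;
    final String[] args;

    TriggerSnapshot(String[] oldRow, String[] newRow, String[] args) {
        this.oldRow = oldRow;
        this.newRow = newRow;
        this.args = args;
    }
}

// JNI pattern: a thin wrapper that copies nothing and fetches each piece
// on demand, paying one crossing per accessor call.
final class TriggerWrapper {
    private final BackendTrigger backend;

    TriggerWrapper(BackendTrigger backend) { this.backend = backend; }

    String[] getOldRow() { return backend.oldRow(); }
    String[] getNewRow() { return backend.newRow(); }
    String[] getArgs()   { return backend.args(); }
}
```

Reading the old row, new row and parameters costs one crossing plus three array copies in the snapshot style, versus three crossings and no copies through the wrapper.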

Now, consider that the one and only motivation for the JNI approach is to
have extremely fast integration between C and Java. This is accomplished at
the cost of the resource consumption caused by multiple JVMs. Also take into
account that the major drawback with the RPC approach is the high number of
RPC calls that some scenarios will result in. It becomes very clear
(at least to me) that in order to get the best out of each solution, it's
essential that we use different design patterns. Otherwise, we get a
situation where optimizing the former means degrading the latter.

IMO, we can make the solutions identical from a Pl/Java user's
perspective, and in all the Java code used to administer each
solution, but not in the C-code.

Regards,

Thomas Hallgren






Re: Pl/Java - next step?

From
hornyakl@inf.elte.hu (HORNYAK Laszlo)
Date:
Hi all!

Sorry for my latencies.
An IRC chat is ok for me, anytime.

On Sun, Feb 22, 2004 at 08:08:00PM +0100, Thomas Hallgren wrote:
> I'm in Sweden. Some time tuesday evening (european time) perhaps?
> 
> Why is your work not made public somewhere? The project on sourceforge is
> inactive it seems. Do you have a CVS setup privately?

Yes, actually, the sf.net CVS was used very rarely, so I simply dropped it
after a while. Now we use the CVS on Dave's server, but it will move to
a new server.

> 
> > Not to minimize your work, as I think it is great, but this particular
> > use-case I consider to be overkill for pl/java. It is probably easier to
> > use pl/pgsql if all you want to do is calculations.
> >
> Not to minimize your work, but if the only thing you want to do is to send a
> request to a servlet, that is very easy to do with Pl/Perl ;-)
> 
> Seriously, when I say calculations, I mean any computed value that doesn't
> involve database accesses. It could for instance be an implementation of a
> soundex algorithm comparing two values or something similar like graphic
> image matching. Regardless if such things can be implemented in pgsql or
> not, the fact that there's a bunch of downloadable Java code out there that
> can be used, with little or no effort, is enough to motivate my statement.
> 

Java in the database has quite a lot of advantages, and most people
would prefer using Java instead of learning one more language for stored
procedures. If we can show that it can be stable and portable, people
will love it. It is their problem what they use it for :))
I think one could use it for sending data into message queues, calling
validation with EJB methods, doing complex analysis on the data, checking
if a key exists in another database (a db-platform-independent distributed
RDBMS), or whatever. It will make a DB really intelligent, and will help a
lot in keeping two-tier systems out of trouble.

Laszlo Hornyak

> Regards,
> 
> Thomas Hallgren


Re: Pl/Java - next step?

From
Bruce Momjian
Date:
Tom Lane wrote:
> "Thomas Hallgren" <thhal@mailblocks.com> writes:
> > ** 4. Make the postmaster spawn threads rather than processes **
> > I know this is very controversial and perhaps I should not bring it up at
> > all. But then again, why not? Most readers are open-minded right?
> 
> It's been considered and rejected before, and pljava isn't going to tilt
> the scales.  In fact, the main thing that bothers me about your
> description of JNI is "Java uses multithreading whether you like it or
> not".  I am very afraid of what impact a JVM will have on the stability
> of the surrounding backend.
> 
> Other than that fear, though, the JNI approach seems to have pretty
> considerable advantages.  You listed startup time as the main
> disadvantage, but perhaps that could be worked around.  Suppose the
> postmaster started a JVM --- would that state inherit correctly into
> subsequently forked backends?
> 
> Also, regarding your option #3 (do both), do you really think something
> different is going to happen in practice?  The developers of the other
> implementation aren't likely to give it up just because yours exists.

As I understand it, the JNI approach has one JVM per backend using Java,
while the Java/remote approach uses a single JVM for all backends and
isolates them via classes.

The JNI side says function execution will be faster and cleaner, while
the Java/remote side feels system resource usage and startup time will be lower.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Pl/Java - next step?

From
Alvaro Herrera
Date:
On Mon, Feb 23, 2004 at 05:14:09PM +0100, Peter Eisentraut wrote:
> Thomas Hallgren wrote:
> > 1. Select Pl/Java_JNI.
> > 2. Select Pl/Java_remote
> > 3. Choose both and agree on the SQL + Java semantics
> > 4. Make the postmaster spawn threads rather than processes
> > (controversial? Nah :-) )
> 
> Option 5 (or 0) would be to use GCJ.  This is likely to be the fastest 
> and most lightweight solution, but perhaps not the most featureful.

Hm, last time I tried this it just SIGSEGV'd the backend after loading
libgcj.so or something like that.  I didn't peek further because I feel
strange in Java land.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
The web brings people together because no matter what kind of sexual mutant
you are, you have millions of possible partners. Type "find people who have
sex with burning deer" and the computer will say "specify the type of deer"
(Jason Alexander)