Thread: Pl/Java - next step?
Two Pl/Java implementations exist today. Due to the architecture of PostgreSQL, compromises have been made in both of them to deal with the fact that each connection lives in its own process. One, which I'll call "Pl/Java_JNI", spawns a JVM on demand for each connection. The other, "Pl/Java_remote", spawns at least one JVM that lives in a process of its own and uses an inter-process calling mechanism.

I can see PostgreSQL moving forward in one of four different directions:

1. Select Pl/Java_JNI.
2. Select Pl/Java_remote.
3. Choose both and agree on the SQL + Java semantics.
4. Make the postmaster spawn threads rather than processes (controversial? Nah :-) )

As the one behind Pl/Java_JNI I'm perhaps not the most objective person when it comes to choice, but I'll make an effort here and try to list the pros and cons of each choice. My objective is to start a healthy discussion. I think Pl/Java might boost the usability of PostgreSQL quite a bit, and with the almost explosive growth of the Java community it's essential that we conclude this sooner rather than later.

** 1. Select Pl/Java_JNI **
#Pros:#
- Each call becomes extremely lightweight.
JNI is in essence a straightforward in-process function invocation. Minimizing call overhead becomes very important for functions that a) are called very often and b) need to call back into the backend several times.

- Minimum resource utilization when passing values.
Values can be passed by reference. TriggerData, TupleDesc, HeapTuple, byte arrays etc. need not be copied. Return values can be allocated directly in the correct MemoryContext.

- Transaction visibility
Using a JDBC driver that's implemented directly on top of SPI ensures that the transaction visibility is correct without the need to either propagate a transaction context or make remote calls back into the backend.

- Connection isolation
Easy to use since the developer "owns" the whole JVM.
There's no need to terminate all connections in order to replace code or to establish a debug session. Migration can take place gradually.

- Simplicity
No hassle setting up inter-process communication or maintaining a separate JVM.

- Modern JVM's are less demanding
Sun and other JVM vendors are making serious efforts to make the JVM more adaptable. Java is not used for heavyweight server processing only. Small utility programs are becoming more and more common. Thus, decreasing start-up time and the ability to adapt resource consumption have very high priority. Look here at what Java 1.5 does:
http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#vm

- Well-known programming environment
JNI is standard. A potential developer of the code has access to on-line training.

#Cons:#
- Resource consumption.
A JVM is expensive from a resource perspective.

- Connection start-up time is high.
Booting a JVM takes time. Setups where connections that make invocations to Pl/Java are closed and created frequently will suffer from this.

- Java execution model differs from the one used by PostgreSQL
Java uses multithreading whether you like it or not. And the JVM will throw exceptions. Pl/Java_JNI handles this by introducing some macros that a potential developer who makes additions to the port must be aware of. This also introduces limitations for the user of Pl/Java_JNI (such as very limited functionality once an error has been generated by the backend).

** 2. Select Pl/Java_remote **
#Pros:#
- Each connection becomes fairly lightweight.
A connection is represented as a thread in the remote JVM. Threads are much less expensive than a full-blown JVM.

- Connection start-up time is low
Startup time will be very quick since thread creation is cheap. Even quicker if a thread pool is utilized.

- Reuse of an existing JVM
Small systems might use the same JVM to run an app-server as the one used by triggers and functions.
Although not great from a "separation of concern" perspective, it might be very efficient for special needs.

- Ability to run the JVM on another server
The JVM can run on a server different from the one running the backend process. If the number of calls is small in relation to the actual work performed in each call, this might be interesting.

#Cons:#
- RPC calls are slow
Calls between processes are inherently very slow compared to in-process calls.

- RPC resources needed
Each connection will need an additional socket or shared memory segment.

- Transaction visibility
A connection established in the remote JVM must have the same transaction visibility as the invoker. In essence, a transaction context must be propagated to the remote JVM, or the remote JVM must have a JDBC driver that calls back into the backend.

- RPC management
CORBA or some other mechanism must be installed and maintained.

- Starting/Stopping JVM affects all connections
Attaching a debugger or generating profiling information implies a restart of the JVM, killing all existing connections that make use of Pl/Java_remote. Code migration implies full stop + restart (the JSR121 Isolation API didn't make it into the 1.5 release).

- Complex programming environment
A potential developer of the code base has a lot to learn. The API between backend and Java code is non-standard.

** 3. Choose both and agree on the SQL + Java semantics **
#Pros:#
- Best of two worlds
The user can decide, depending on his/her setup, thus gaining optimal performance.

- Everyone wins
Nobody needs to feel sad because their implementation was rejected.

#Cons:#
- Might be perceived as a kludge
The competitors don't need multiple implementations. Introducing two ways of doing it might be perceived as a way to get around a less than perfect design, with uncertainty and the choice of another database as the result.

- The choice is not evident
The user has to make a choice. Sometimes the choice is not evident.
- Project synchronization
Someone needs to synchronize the projects.

- Double effort
Almost everything needs to be developed twice since the approaches have fundamental differences.

** 4. Make the postmaster spawn threads rather than processes **
I know this is very controversial and perhaps I should not bring it up at all. But then again, why not? Most readers are open-minded, right?

#Pros:#
- Really best of two worlds
There would be one JVM per postmaster and in-process calls would be used throughout.

- Other pl<lang> could benefit?
Other languages where multithreading is an option could benefit the same way Java does.

- Other pros
Beyond the scope of the topic.

#Cons:#
- Code rewrite
Right. All PostgreSQL code would need an overhaul. That would be a serious effort to say the least.

- Code base selection
We'd still need to choose which existing Pl/Java implementation should be used as the base for the in-process + multithreaded implementation.

- Other cons
Beyond the scope of the topic.

What are the next steps? Setting up benchmarks and testing performance, perhaps? That should not be done by me, nor by the people behind Pl/Java_remote, but rather by someone who is truly objective.

Kind regards
Thomas Hallgren
"Thomas Hallgren" <thhal@mailblocks.com> writes: > ** 4. Make the postmaster spawn threads rather than processes ** > I know this is very controversial and perhaps I should not bring it up at > all. But then again, why not? Most readers are open-minded right? It's been considered and rejected before, and pljava isn't going to tilt the scales. In fact, the main thing that bothers me about your description of JNI is "Java uses multithreading wether you like it or not". I am very afraid of what impact a JVM will have on the stability of the surrounding backend. Other than that fear, though, the JNI approach seems to have pretty considerable advantages. You listed startup time as the main disadvantage, but perhaps that could be worked around. Suppose the postmaster started a JVM --- would that state inherit correctly into subsequently forked backends? Also, regarding your option #3 (do both), do you really think something different is going to happen in practice? The developers of the other implementation aren't likely to give it up just because yours exists. regards, tom lane
> It's been considered and rejected before, and pljava isn't going to tilt
> the scales.

Didn't think it would. Thought it worth mentioning anyway, partly to get your reaction.

> In fact, the main thing that bothers me about your
> description of JNI is "Java uses multithreading whether you like it or
> not". I am very afraid of what impact a JVM will have on the stability
> of the surrounding backend.

I have taken extensive measures to prevent multiple threads from accessing the backend simultaneously. I encourage you and anyone else who has an interest in how this is done to read my "Some problems and their solution" document posted here:
http://gborg.postgresql.org/project/pljava/genpage.php?solutions

> Other than that fear, though, the JNI approach seems to have pretty
> considerable advantages. You listed startup time as the main
> disadvantage, but perhaps that could be worked around. Suppose the
> postmaster started a JVM --- would that state inherit correctly into
> subsequently forked backends?

That's an interesting thought. The postmaster just forks. It never exec's, right? Is this true for win32 as well? I've never tried it but it might be worth pursuing. Sun's new Java 1.5 JVM does this, albeit a bit differently. An initializer process starts up and persists its state. Subsequent JVM's then reuse that state. I definitely plan for Pl/Java_JNI to take advantage of that.

> Also, regarding your option #3 (do both), do you really think something
> different is going to happen in practice? The developers of the other
> implementation aren't likely to give it up just because yours exists.

My objective is not that they or I should give up. I want us to reach a consensus around what PostgreSQL should offer. If we can find ways to collaborate and create a two-way solution, that's great. If we can collaborate around one of the solutions, that's perhaps even better, at least from a developer resource perspective.

Regards,
Thomas Hallgren
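To make the idea of keeping JVM threads out of the backend at the same time more concrete, here is a minimal sketch. It is not taken from the Pl/Java source or from the "Some problems and their solution" document. The class BackendGate and all of its methods are hypothetical names, and a single process-wide lock is only one simple way such serialization could be done.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: every call from Java into the single-threaded
 * backend is funneled through one lock, so only one thread at a time
 * can be "inside" the backend.
 */
final class BackendGate {
    private static final Object BACKEND_LOCK = new Object();

    // Instrumentation for the demo only. Nested calls made by the
    // thread that already holds the lock would be counted as well.
    private static final AtomicInteger inside = new AtomicInteger();
    private static final AtomicInteger maxInside = new AtomicInteger();

    /** Work that needs exclusive access to the backend. */
    interface BackendCall<T> {
        T run();
    }

    /** Runs the given work while holding the backend lock. */
    static <T> T call(BackendCall<T> work) {
        synchronized (BACKEND_LOCK) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
            try {
                return work.run();
            } finally {
                inside.decrementAndGet();
            }
        }
    }

    /** Highest number of callers observed inside the gate at once. */
    static int maxConcurrentCallers() {
        return maxInside.get();
    }

    /** Demo: several threads call through the gate; returns the number of completed calls. */
    static int runDemo(int threads, int callsPerThread) {
        AtomicInteger completed = new AtomicInteger();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    call(() -> completed.incrementAndGet());
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completed.get();
    }
}
```

In this sketch, user code that spawns its own threads can still run in parallel inside the JVM. Only the points where it crosses into the backend are serialized.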
Thomas Hallgren wrote:

>> Other than that fear, though, the JNI approach seems to have pretty
>> considerable advantages. You listed startup time as the main
>> disadvantage, but perhaps that could be worked around. Suppose the
>> postmaster started a JVM --- would that state inherit correctly into
>> subsequently forked backends?
>
> That's an interesting thought. The postmaster just forks. It never exec's
> right? Is this true for win32 as well? I've never tried it but it might be
> worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently.
> An initializer process starts up and persists its state. Subsequent JVM's
> then reuse that state. I definitely plan for Pl/Java_JNI to take advantage
> of that.

Unfortunately, WIN32 has no fork(), and we have to exec the backend, in effect. You would need to handle both scenarios (#ifdef EXEC_BACKEND). For Unix this could be nice, though, and eliminate most of the disadvantage of your approach.

cheers

andrew
Thomas Hallgren wrote:
> That's an interesting thought. The postmaster just forks. It never exec's
> right? Is this true for win32 as well? I've never tried it but it might be
> worth pursuing. Sun's new Java 1.5 jvm does this albeit a bit differently.
> An initializer process starts up and persists its state. Subsequent JVM's
> then reuse that state. I definitely plan for Pl/Java_JNI to take advantage
> of that.

It would be easy enough to test. Just put a line in postgresql.conf like:

    preload_libraries = '$libdir/plr:plr_init'

substituting the specifics for PL/Java. This causes the postmaster to load the library and execute the "init" function. Subsequent forked backends get a copy. I don't honestly know what happens with the win32 port though -- anyone out there know?

Joe
Thomas,

What I would like to see is an abstraction of the interface that communicates with the JVM so that we can use either. As you have pointed out, the JNI mechanism has advantages, as does the remote mechanism.

I recently did an analysis of the two methods and there are a couple of other points.

1) Using JNI, you probably still want to communicate with another running Java process. For instance, let's say that you have an application which is viewing live inventory. For simplicity's sake we will use a Swing client.

Now we have a pl/java trigger function which gets called when the inventory gets changed. Ostensibly the Java trigger would want to notify the client, so it will have to utilize some sort of remote function call to do this.

So at this point we now have pl/java -> JNI -> JVM/procedure -> remote call -> JVM running client. This would be further exacerbated by the fact that all your connections are going to have to make the same remote function call.

2) While I haven't done the real work to determine if the following is true, I'm not willing to buy the argument that keeping multiple class loaders is about as expensive as multiple JVM's. My argument is based on the fact that servlet containers (Tomcat et al.) do this without showing onerous memory footprints.

As far as using a heavyweight RPC mechanism goes, I have talked to Laszlo about removing CORBA and replacing it with a lightweight custom protocol much like the FE/BE protocol that the server uses. This seems to be a better solution, as I agree installing a CORBA environment would be a huge barrier to entry.

Regarding transaction visibility, pl-j (the remote version) already uses SPI to do its calls, and the JDBC layer there would do the same, so I'm not sure how it matters whether it is JNI or remote. Perhaps I am missing something?

In the end, as we said before, Laszlo is interested in collaborating, so hopefully we can find some middle ground here?
Dave

--
Dave Cramer
519 939 0336
ICQ # 14675561
Hi Dave,

Comments on your comments inline...

> What I would like to see is an abstraction of the interface that
> communicates with the JVM so that we can use either, as you have pointed
> out the JNI mechanism has advantages, as does the remote mechanism.

Yes, we must agree 100% on the "interface" that the user of Pl/Java will see. I've done some work that we could use as a common base. I've tried to follow the proposed SQL standard whenever possible. Everything the user of my implementation will see is a set of interfaces, most of them standard java.sql.* stuff. ResultSet plays a very important role as I pass complex types and sets using that interface. I also represent the :new and :old row in triggers using ResultSet. The only "proprietary" interface in use is the TriggerData interface.

I'm eager to commence a discussion around this. And of course, also around how the actual CREATE FUNCTION etc. is written, how jar files are loaded, how classpaths are set up etc.

> 1) Using JNI, you probably still want to communicate with another
> running java process. For instance lets say that you have an application
> which is viewing live inventory. For simplicity's sake we will use a swing
> client.
>
> Now we have a pl/java trigger function which gets called when the
> inventory gets changed, ostensibly the java trigger would want to notify
> the client, so it will have to utilize some sort of remote function call
> to do this.
>
> So at this point we now have pl/java -> JNI -> JVM/procedure -> remote
> call -> JVM running client. This would further be exacerbated, by the
> fact that all your connections are going to have to make the same remote
> function call.

I agree. And as I pointed out initially, this is one of the advantages of your solution: "Reuse of an existing JVM". Not because the call chain is longer (since the JNI call is more or less free) but because two or more JVM's need to be running instead of just one. Separation of concern is an issue though.
The load of the application server will have a negative impact on the database performance, especially in cases where the database resides on a different server (which is very common).

Another issue, related to your example (and not solved in either of the Pl/Java solutions), is transaction demarcation. Typically, the scenario you describe, or any similar scenario where notifications are sent, would require a transaction-coordinated message queue. You don't want the message to reach Tomcat unless the transaction commits. I've raised that issue before (more than once). In order to solve this, we'd need the ability to subscribe to transaction events in the backend. Using a message queue, you get an asynchronous delivery mechanism and the load of the appserver has no effect on database performance.

> 2) While I haven't done the real work to determine if the following is
> true, I'm not willing to buy the argument that the keeping multiple
> class loaders is about as expensive as multiple JVM's. My argument is
> based on the fact that servlet containers (Tomcat etal) do this without
> showing onerous memory footprints.

I agree, that argument is a bit far-fetched. It all depends on what isolation you want to establish between the connections. If you want full isolation, i.e. like the one Oracle provides, where different sessions cannot share data through static variables of a class, then you need more or less completely separate classloader chains. This, combined with the fact that you already have a process running for each connection, makes the differences in resource consumption fairly small. But a design where the isolation is less enforced (like in a servlet container) is in your favor. JSR121, enabling virtual JVM's within a JVM, will also be a remedy for the problem when it arrives.
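The transaction-coordinated delivery asked for above can be sketched as a small buffer that holds messages until the transaction outcome is known. This is only an illustration under stated assumptions: neither Pl/Java implementation is said to provide it, the class TransactionalOutbox and its onCommit/onAbort hooks are hypothetical, and something in the backend would have to call those hooks, which is exactly the missing "subscribe to transaction events" ability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: messages sent by a trigger are held back until
 * the surrounding transaction commits, and are dropped if it aborts.
 */
final class TransactionalOutbox {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<String> queue; // stands in for a real message queue

    TransactionalOutbox(Consumer<String> queue) {
        this.queue = queue;
    }

    /** Called from trigger code; nothing leaves the process yet. */
    void send(String message) {
        pending.add(message);
    }

    /** Assumed to be invoked by a (hypothetical) commit event from the backend. */
    void onCommit() {
        for (String message : pending) {
            queue.accept(message);
        }
        pending.clear();
    }

    /** Assumed to be invoked by a (hypothetical) abort event from the backend. */
    void onAbort() {
        pending.clear();
    }

    /** Number of messages still waiting for the transaction outcome. */
    int pendingCount() {
        return pending.size();
    }
}
```

With something like this in place, a trigger would call send() for each inventory change, and the client would only ever see changes that were actually committed.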
> As far as using a heavyweight RPC mechanism, I have talked to laszlo
> about removing CORBA and replacing it with a lightweight custom protocol
> much like the FE/BE protocol that the server uses, this seems to be a
> better solution, as I agree installing a CORBA environment would be a
> huge barrier to entry.

I've been down that road a couple of times in earlier assignments. Keep in mind that you have to pass objects by reference in order to call back into the caller process. And that the caller in turn might call you. And that exceptions might be thrown at any time. CORBA caters for all of that. It is expensive, no matter how you do it. One major cost of RPC is related to process context switching, and that will not go away regardless of how efficient your protocol is. You might find the gain of a home-brewed protocol to be very small (if any).

RPC is drastically more expensive than in-process calls, no matter what you do. This is the major disadvantage of your solution, just as the resource consumption imposed by several JVM's is mine.

> Regarding Transaction Visibility pl-j (remote version) already uses SPI
> to do it's calls, and the jdbc layer there would do the same, I'm not
> sure how that it matters if it is jni, or remote? Perhaps I am missing
> something ?

My only argument is that you get a vast number of RPC calls doing it that way.

> In the end as we said before, laszlo is interested in collaborating so
> hopefully we can find some middle ground here ?

I'm interested too. As I wrote in the beginning of this mail, I've done some work specifying interfaces etc. with the intention of following the proposed SQL standard. All jar manipulation, a classloader that utilizes the database etc. is written in pure Java and uses plain JDBC on the default connection. Perhaps you've done similar things?

I think the Java interfaces and the SQL statements that will be exposed to the Pl/Java user are the most important thing to agree on.
If a trigger is written for one implementation, it should be usable on the other, no matter what. The second most important thing is to store jar files etc. in a similar way in the database and to strive to write as much as possible of the manager functions in Java so that we can share them.

Do you store jar files in the database? How do you manage different classpaths for different schemas? Do you have some user guide or other reading that you can send to me?

Regards,
Thomas Hallgren
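As an illustration of what "a trigger written for one implementation should be usable on the other" could look like, here is a minimal sketch of a trigger body that depends only on java.sql.ResultSet plus one small TriggerData interface, as described earlier in this message. The method names getNew() and getOld(), the "quantity" column, and the RowStub test helper are all assumptions made for this example and are not taken from either implementation.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the one proprietary interface; method names are assumed. */
interface TriggerData {
    ResultSet getNew() throws SQLException;
    ResultSet getOld() throws SQLException;
}

final class InventoryTrigger {
    /** Trigger body: a negative quantity in the new row is reset to the old value. */
    static void keepStockNonNegative(TriggerData td) throws SQLException {
        ResultSet newRow = td.getNew();
        ResultSet oldRow = td.getOld();
        if (newRow.getInt("quantity") < 0) {
            newRow.updateInt("quantity", oldRow.getInt("quantity"));
        }
    }
}

/** Demo-only stub: a single-row ResultSet backed by a map, supporting just two methods. */
final class RowStub {
    static ResultSet of(Map<String, Object> columns) {
        return (ResultSet) Proxy.newProxyInstance(
            RowStub.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getInt":      // only the by-name variant is supported here
                        return (Integer) columns.get((String) args[0]);
                    case "updateInt":
                        columns.put((String) args[0], (Integer) args[1]);
                        return null;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    /** Runs the trigger against stub rows and returns the resulting new quantity. */
    static int demo(int oldQuantity, int newQuantity) {
        Map<String, Object> oldCols = new HashMap<>();
        oldCols.put("quantity", oldQuantity);
        Map<String, Object> newCols = new HashMap<>();
        newCols.put("quantity", newQuantity);
        TriggerData td = new TriggerData() {
            public ResultSet getNew() { return RowStub.of(newCols); }
            public ResultSet getOld() { return RowStub.of(oldCols); }
        };
        try {
            InventoryTrigger.keepStockNonNegative(td);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return (Integer) newCols.get("quantity");
    }
}
```

Because the trigger body never touches anything implementation-specific, the same compiled class could in principle be loaded by either an in-process or a remote implementation, provided both agree on the TriggerData contract.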
> 1) Using JNI, you probably still want to communicate with another
> running java process.

B.T.W. I don't really agree on "probably". There are numerous cases when you will be happy just communicating with the database, communicating with another remote resource (typically a message queue), or not communicating at all (calculations, etc.).

Regards,
Thomas Hallgren
Not to minimize your work, as I think it is great, but this particular use-case I consider to be overkill for pl/java. It is probably easier to use pl/pgsql if all you want to do is calculations.

We had suggested an online chat to discuss this. When would you be available for that? What timezone are you in? Laszlo is in Hungary, and I am in Canada, so we are likely spread all over the map.

Dave

On Sun, 2004-02-22 at 12:28, Thomas Hallgren wrote:
> > 1) Using JNI, you probably still want to communicate with another
> > running java process.
>
> B.T.W. I don't really agree on "probably". There are numerous cases when you
> will be happy just communicating with the database, communicate with another
> remote resource (message queue typically), or not communicate at all
> (calculations, etc.).
>
> Regards,
> Thomas Hallgren

--
Dave Cramer
519 939 0336
ICQ # 14675561
I'm in Sweden. Some time Tuesday evening (European time) perhaps?

Why is your work not made public somewhere? The project on SourceForge is inactive, it seems. Do you have a CVS setup privately?

> Not to minimize your work, as I think it is great, but this particular
> use-case I consider to be overkill for pl/java. It is probably easier to
> use pl/pgsql if all you want to do is calculations.

Not to minimize your work, but if the only thing you want to do is to send a request to a servlet, that is very easy to do with Pl/Perl ;-)

Seriously, when I say calculations, I mean any computed value that doesn't involve database accesses. It could for instance be an implementation of a soundex algorithm comparing two values, or something similar like graphic image matching. Regardless of whether such things can be implemented in pgsql or not, the fact that there's a bunch of downloadable Java code out there that can be used, with little or no effort, is enough to motivate my statement.

Regards,
Thomas Hallgren
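To make the "calculation" case concrete, here is a minimal sketch of the kind of pure-Java function meant above: a static method that compares two values by their Soundex codes and needs no database access at all. The class name and method names are chosen for this example only, and the code follows the commonly described American Soundex rules (H and W do not separate equal codes, vowels do). How such a method would be declared to the database is left out, since that is part of what is still under discussion.

```java
/** Illustrative Soundex implementation; names and structure are assumptions for this example. */
final class Soundex {
    // Code for each letter A..Z: '0' marks vowels and Y, '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    /** Returns the four-character Soundex code, or "" if the input has no ASCII letters. */
    static String encode(String name) {
        StringBuilder out = new StringBuilder(4);
        char last = '0';
        for (int i = 0; i < name.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue;                       // ignore anything that is not A..Z
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                  // the first letter is kept as-is
                last = code;
            } else if (code != '-') {           // H and W are skipped without resetting 'last'
                if (code != '0' && code != last) {
                    out.append(code);           // new consonant group
                }
                last = code;                    // a vowel resets 'last' to '0'
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');                    // pad short codes with zeros
        }
        return out.toString();
    }

    /** True if both values have the same Soundex code. */
    static boolean soundsLike(String a, String b) {
        return encode(a).equals(encode(b));
    }
}
```

For example, Soundex.encode("Robert") and Soundex.encode("Rupert") both yield "R163", so soundsLike("Robert", "Rupert") is true.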
One perfectly good reason for this scenario is portability between postgres and any database implementing the standard (e.g. Oracle).

cheers

andrew

Dave Cramer wrote:

> Not to minimize your work, as I think it is great, but this particular
> use-case I consider to be overkill for pl/java. It is probably easier to
> use pl/pgsql if all you want to do is calculations.
>
> We had suggested an online chat to discuss this, when would you be
> available for that? What timezone are you in. Laszlo is in hungary, and
> I am in canada, so we are likely spread all over the map.
>
> Dave
>
> On Sun, 2004-02-22 at 12:28, Thomas Hallgren wrote:
>
>>> 1) Using JNI, you probably still want to communicate with another
>>> running java process.
>>
>> B.T.W. I don't really agree on "probably". There are numerous cases when you
>> will be happy just communicating with the database, communicate with another
>> remote resource (message queue typically), or not communicate at all
>> (calculations, etc.).
>>
>> Regards,
>> Thomas Hallgren
Tuesday evening European time is fine with me. I am at GMT-5 so it will be afternoon for me. What time?

Should we attempt an agenda?

Dave

On Sun, 2004-02-22 at 15:33, HORNYAK Laszlo wrote:
> Hi all!
>
> Sorry for my latencies.
> An IRC chat is ok for me, anytime.
>
> On Sun, Feb 22, 2004 at 08:08:00PM +0100, Thomas Hallgren wrote:
> > I'm in Sweden. Some time tuesday evening (european time) perhaps?
> >
> > Why is your work not made public somewhere? The project on sourceforge is
> > inactive it seems. Do you have a CVS setup privately?
>
> Yes, actually, the sf.net cvs was used very rarely, so I simply dropped it
> after a while. Now we use the CVS on Dave's server, but it will move to
> a new server.
>
> > > Not to minimize your work, as I think it is great, but this particular
> > > use-case I consider to be overkill for pl/java. It is probably easier to
> > > use pl/pgsql if all you want to do is calculations.
> > >
> > Not to minimize your work, but if the only thing you want to do is to send a
> > request to a servlet, that is very easy to do with Pl/Perl ;-)
> >
> > Seriously, when I say calculations, I mean any computed value that doesn't
> > involve database accesses. It could for instance be an implementation of a
> > soundex algorithm comparing two values or something similar like graphic
> > image matching. Regardless if such things can be implemented in pgsql or
> > not, the fact that there's a bunch of downloadable Java code out there that
> > can be used, with little or no effort, is enough to motivate my statement.
> >
>
> Java in the database has quite a lot of advantages, and most people
> would prefer using java instead of learning one more language for stored
> procedures. If we can show that it can be stable and portable, people
> will love it. It is their problem what they use it for :))
> I think one could use it for sending data into message queues, call
> validation with EJB methods, do complex analysis on it, check if a key
> exists in another database (db platform independent distributed RDBMS),
> or whatever, it would make a DB really intelligent, and would help a lot
> keeping 2 tier systems out of trouble.
> /s/would/will
>
> Laszlo Hornyak
>
> > Regards,
> >
> > Thomas Hallgren

--
Dave Cramer
519 939 0336
ICQ # 14675561
Thomas Hallgren wrote:
> 1. Select Pl/Java_JNI.
> 2. Select Pl/Java_remote
> 3. Choose both and agree on the SQL + Java semantics
> 4. Make the postmaster spawn threads rather than processes
> (controversial? Nah :-) )

Option 5 (or 0) would be to use GCJ. This is likely to be the fastest and most lightweight solution, but perhaps not the most featureful.
> Option 5 (or 0) would be to use GCJ. This is likely to be the fastest
> and most lightweight solution, but perhaps not the most featureful.
>
GCJ is definitely an alternative for the reasons you mention. I didn't mention it (nor any other JVM) because I see it as one of several "JVMs" that Pl/Java should be able to use. It comes with JNI (and what they claim is a much faster alternative). I'm currently looking into what's needed in order to use GCJ for Pl/Java_JNI.

Regards,
Thomas Hallgren
For me it would be fine at 7pm GMT Tuesday. Here's an attempted agenda:

1. Try to identify the common public interface (SQL and Java).
This is the most important item in my view since it enables a user to seamlessly switch between our two solutions. You describe your design from this angle, and I describe mine. Let's also look at what the proposed standard says.

2. Find things that we can collaborate on.
For instance, I have some stuff written in Java that enables Java classes to be stored and loaded from the database. You might have similar things or Java code that covers other areas that I can reuse.

3. Probe deeper and see if there's more that we can share (C-code essentially).
I have my doubts about sharing C-code since we do things fundamentally differently. I know you have a generic call mechanism that we could use to establish a common ground, but I think it would be bad for both designs. We have different objectives. You strive to minimize the number of RPC calls. I strive to minimize call overhead and resource consumption.

You probably have more ideas so please add to this. Please confirm the time and tell me where we meet (what IRC).

Regards,
Thomas Hallgren

----- Original Message -----
From: "Dave Cramer" <pg@fastcrypt.com>
To: "HORNYAK Laszlo" <hornyakl@inf.elte.hu>
Cc: "Thomas Hallgren" <thhal@mailblocks.com>; <pgsql-hackers@postgresql.org>
Sent: Monday, February 23, 2004 13:41
Subject: Re: [HACKERS] Pl/Java - next step?

> Tuesday evening European time is fine with me. I am at GMT-5, so it will be
> afternoon for me. What time?
>
> Should we attempt an agenda?
>
> Dave
> On Sun, 2004-02-22 at 15:33, HORNYAK Laszlo wrote:
> > Hi all!
> >
> > Sorry for my latencies.
> > An IRC chat is ok for me, anytime.
> >
> > On Sun, Feb 22, 2004 at 08:08:00PM +0100, Thomas Hallgren wrote:
> > > I'm in Sweden. Some time Tuesday evening (European time) perhaps?
> > >
> > > Why is your work not made public somewhere? The project on sourceforge is
> > > inactive it seems. Do you have a CVS setup privately?
> >
> > Yes, actually, the sf.net CVS was used very rarely, so I simply dropped it
> > after a while. Now we use the CVS on Dave's server, but it will move to
> > a new server.
> >
> > > > Not to minimize your work, as I think it is great, but this particular
> > > > use-case I consider to be overkill for pl/java. It is probably easier to
> > > > use pl/pgsql if all you want to do is calculations.
> > > >
> > > Not to minimize your work, but if the only thing you want to do is to send a
> > > request to a servlet, that is very easy to do with Pl/Perl ;-)
> > >
> > > Seriously, when I say calculations, I mean any computed value that doesn't
> > > involve database accesses. It could for instance be an implementation of a
> > > soundex algorithm comparing two values or something similar like graphic
> > > image matching. Regardless if such things can be implemented in pgsql or
> > > not, the fact that there's a bunch of downloadable Java code out there that
> > > can be used, with little or no effort, is enough to motivate my statement.
> > >
> >
> > Java in the database has quite a lot of advantages, and most people
> > would prefer using Java instead of learning one more language for stored
> > procedures. If we can show that it can be stable and portable, people
> > will love it. It is their problem what they use it for :))
> > I think one could use it for sending data into message queues, call
> > validation with EJB methods, do complex analysis on it, check if a key
> > exists in another database (db platform independent distributed RDBMS),
> > or whatever, it would make a DB really intelligent, and would help a lot
> > keeping 2 tier systems out of trouble.
> > /s/would/will
> >
> > Laszlo Hornyak
> >
> > > Regards,
> > >
> > > Thomas Hallgren
> >
> --
> Dave Cramer
> 519 939 0336
> ICQ # 14675561
Hello all,

> 3. Probe deeper and see if there's more that we can share (C-code
> essentially).
> I have my doubts about sharing C-code since we do things fundamentally
> differently. I know you have a generic call mechanism that we could use to
> establish a common ground, but I think it would be bad for both designs. We
> have different objectives. You strive to minimize the number of RPC calls. I
> strive to minimize call overhead and resource consumption.
>

I've been following this thread - and don't know much about either implementation. On the re-use front it would be VERY nice if you could somehow have a single patch for PostgreSQL's C code that called a set of Java interfaces. Then each of your implementations could implement that set of Java interfaces (one using JNI, the other using RMI). This would allow the user to swap between either implementation, but would also reduce the amount of similar C code in Postgres. Something I think the PostgreSQL hackers would much prefer.

Later
Rob
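A minimal sketch of the layering Rob describes, assuming a single Java-side contract with two interchangeable implementations. Every name here (JavaCallHandler, InProcessCallHandler, RemoteCallHandler) is hypothetical and comes from neither Pl/Java implementation; the remote variant only imitates RMI by copying arguments and results by value through serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical contract that a shared C call handler would program against.
interface JavaCallHandler {
    Object call(String functionName, Object[] args) throws Exception;
}

// Stand-in for the JNI design: a direct, in-process method invocation.
class InProcessCallHandler implements JavaCallHandler {
    private final Map<String, Function<Object[], Object>> functions = new HashMap<>();

    void register(String name, Function<Object[], Object> body) {
        functions.put(name, body);
    }

    @Override
    public Object call(String functionName, Object[] args) {
        Function<Object[], Object> body = functions.get(functionName);
        if (body == null) {
            throw new IllegalArgumentException("unknown function: " + functionName);
        }
        return body.apply(args);
    }
}

// Stand-in for the remote design: arguments and result are copied by value,
// as they would be when crossing a process boundary.
class RemoteCallHandler implements JavaCallHandler {
    private final JavaCallHandler remoteSide;

    RemoteCallHandler(JavaCallHandler remoteSide) {
        this.remoteSide = remoteSide;
    }

    @Override
    public Object call(String functionName, Object[] args) throws Exception {
        Object[] copiedArgs = (Object[]) copyByValue(args);
        return copyByValue(remoteSide.call(functionName, copiedArgs));
    }

    // Serializes and deserializes the value to produce an independent copy.
    private static Object copyByValue(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }
}

class CallHandlerDemo {
    public static void main(String[] argv) throws Exception {
        InProcessCallHandler local = new InProcessCallHandler();
        local.register("add", a -> ((Integer) a[0]) + ((Integer) a[1]));
        JavaCallHandler remote = new RemoteCallHandler(local);
        // The caller cannot tell which implementation it is talking to.
        for (JavaCallHandler handler : new JavaCallHandler[] {local, remote}) {
            System.out.println(handler.call("add", new Object[] {2, 3}));
        }
    }
}
```

The caller sees the same result from both handlers, which is the swap-ability Rob asks for. What the sketch cannot show is whether one C-side patch could serve both designs equally well.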
> On the re-use front it would be VERY nice if you could
> somehow have a single patch for PostgreSQL's C code that called a set of
> Java interfaces. Then each of your implementations could implement that set
> of Java interfaces (one using JNI, the other using RMI). This would allow
> the user to swap between either implementation, but would also reduce the
> amount of similar C code in Postgres. Something I think the PostgreSQL
> hackers would much prefer.
>
> Later
> Rob
>
I understand your concern. I'm all for code reuse and all the advantages that it will bring. In my experience however, the design patterns used for solutions that involve RPC differ a great deal from the ones used when you have in-process calls. The driving forces are quite different.

Let me give you a concrete example. Let's assume that we implement a trigger function, triggered before update and on each row.

Using RPC, you'd like to minimize the number of calls that are made between the two processes. Ideally, you'd like to have one call only. This can be achieved by packing all information you have in one structure (the old row, the new row, parameters etc.) and passing that data, by value, to the remote process. In the remote process, all options are now open. You can read parameters, the old row, the new row, etc. Typically, some change would be made to the new row and it would be sent back to the caller; again the data is passed by value and streamed.

Using in-process calls, you'd like to minimize resource consumption. Thus, you want to minimize copying of data and you want to make data available on demand. So, a JNI solution would typically wrap the TriggerData in a Java object with accessor methods that enable the Java developer to obtain the old row, the new row, and the parameters. An old row would be a wrapper of the actual HeapTuple contained in the TriggerData, etc. No copies anywhere and no streaming. But a radically increased number of calls between the Java and the C domain compared to the RPC solution.

Now, consider that the one and only motivation for the JNI approach is to have extremely fast integration between C and Java. This is accomplished at the cost of resource consumption caused by multiple JVMs. Also take into account that the major drawback with the RPC approach is the high number of RPC calls that will be the result of some scenarios. It becomes very clear (at least to me) that in order to get the best out of each solution, it's essential that we use different design patterns. Otherwise, we get a situation where optimizing the former means degrading the latter.

IMO, we can make the solutions identical from a Pl/Java user's perspective, and when it comes to all Java code used to administer each solution, but not in C-code.

Regards,
Thomas Hallgren
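The trigger example above can be sketched in plain Java to make the trade-off visible. This is only an illustration under stated assumptions: BackendTriggerData, TriggerSnapshot and TriggerView are hypothetical stand-ins, not PostgreSQL's TriggerData or either implementation's classes, and the boundaryCalls counter merely models crossings between Java and the backend.

```java
// Hypothetical stand-in for trigger data owned by the backend.
class BackendTriggerData {
    final Object[] oldRow;
    final Object[] newRow;
    int boundaryCalls = 0; // models crossings between Java and the backend

    BackendTriggerData(Object[] oldRow, Object[] newRow) {
        this.oldRow = oldRow;
        this.newRow = newRow;
    }
}

// RPC style: everything is packed once and passed by value.
class TriggerSnapshot {
    final Object[] oldRow;
    final Object[] newRow;

    TriggerSnapshot(BackendTriggerData data) {
        data.boundaryCalls++;               // the single remote call
        this.oldRow = data.oldRow.clone();  // copies: the price of one call
        this.newRow = data.newRow.clone();
    }
}

// In-process style: a thin wrapper, no copies, one crossing per accessor.
class TriggerView {
    private final BackendTriggerData data;

    TriggerView(BackendTriggerData data) {
        this.data = data;
    }

    Object getOld(int column) {
        data.boundaryCalls++;
        return data.oldRow[column];
    }

    void setNew(int column, Object value) {
        data.boundaryCalls++;
        data.newRow[column] = value;
    }
}

class TriggerStyles {
    static final int VERSION = 1; // column index of a version counter

    // Works on the by-value copy and returns the modified row to the caller.
    static Object[] bumpVersionRemote(TriggerSnapshot snapshot) {
        snapshot.newRow[VERSION] = ((Integer) snapshot.oldRow[VERSION]) + 1;
        return snapshot.newRow;
    }

    // Works directly on backend data through on-demand accessors.
    static void bumpVersionInProcess(TriggerView view) {
        view.setNew(VERSION, ((Integer) view.getOld(VERSION)) + 1);
    }

    public static void main(String[] argv) {
        BackendTriggerData rpc =
            new BackendTriggerData(new Object[] {"alice", 1}, new Object[] {"alice", 1});
        Object[] returned = bumpVersionRemote(new TriggerSnapshot(rpc));
        System.out.println("remote: crossings=" + rpc.boundaryCalls
            + ", returned version=" + returned[VERSION]);

        BackendTriggerData local =
            new BackendTriggerData(new Object[] {"alice", 1}, new Object[] {"alice", 1});
        bumpVersionInProcess(new TriggerView(local));
        System.out.println("in-process: crossings=" + local.boundaryCalls
            + ", version=" + local.newRow[VERSION]);
    }
}
```

The remote flavour crosses the boundary once but copies both rows, and the backend row stays unchanged until the caller applies the returned copy. The in-process flavour copies nothing but crosses once per accessor call, which is the pattern difference the message argues makes shared C code impractical.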
Hi all!

Sorry for my latencies.
An IRC chat is ok for me, anytime.

On Sun, Feb 22, 2004 at 08:08:00PM +0100, Thomas Hallgren wrote:
> I'm in Sweden. Some time Tuesday evening (European time) perhaps?
>
> Why is your work not made public somewhere? The project on sourceforge is
> inactive it seems. Do you have a CVS setup privately?

Yes, actually, the sf.net CVS was used very rarely, so I simply dropped it after a while. Now we use the CVS on Dave's server, but it will move to a new server.

> > Not to minimize your work, as I think it is great, but this particular
> > use-case I consider to be overkill for pl/java. It is probably easier to
> > use pl/pgsql if all you want to do is calculations.
> >
> Not to minimize your work, but if the only thing you want to do is to send a
> request to a servlet, that is very easy to do with Pl/Perl ;-)
>
> Seriously, when I say calculations, I mean any computed value that doesn't
> involve database accesses. It could for instance be an implementation of a
> soundex algorithm comparing two values or something similar like graphic
> image matching. Regardless if such things can be implemented in pgsql or
> not, the fact that there's a bunch of downloadable Java code out there that
> can be used, with little or no effort, is enough to motivate my statement.
>

Java in the database has quite a lot of advantages, and most people would prefer using Java instead of learning one more language for stored procedures. If we can show that it can be stable and portable, people will love it. It is their problem what they use it for :))
I think one could use it for sending data into message queues, call validation with EJB methods, do complex analysis on it, check if a key exists in another database (db platform independent distributed RDBMS), or whatever, it would make a DB really intelligent, and would help a lot keeping 2 tier systems out of trouble.
/s/would/will

Laszlo Hornyak

> Regards,
>
> Thomas Hallgren
Tom Lane wrote:
> "Thomas Hallgren" <thhal@mailblocks.com> writes:
> > ** 4. Make the postmaster spawn threads rather than processes **
> > I know this is very controversial and perhaps I should not bring it up at
> > all. But then again, why not? Most readers are open-minded right?
>
> It's been considered and rejected before, and pljava isn't going to tilt
> the scales. In fact, the main thing that bothers me about your
> description of JNI is "Java uses multithreading whether you like it or
> not". I am very afraid of what impact a JVM will have on the stability
> of the surrounding backend.
>
> Other than that fear, though, the JNI approach seems to have pretty
> considerable advantages. You listed startup time as the main
> disadvantage, but perhaps that could be worked around. Suppose the
> postmaster started a JVM --- would that state inherit correctly into
> subsequently forked backends?
>
> Also, regarding your option #3 (do both), do you really think something
> different is going to happen in practice? The developers of the other
> implementation aren't likely to give it up just because yours exists.

As I understand it, the JNI approach has one JVM per backend using Java, while the Java/remote approach uses a single JVM for all backends and isolates them via classes. The JNI side says function execution will be faster and cleaner, while the Java/remote side expects lower system resource usage and startup time.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Mon, Feb 23, 2004 at 05:14:09PM +0100, Peter Eisentraut wrote:
> Thomas Hallgren wrote:
> > 1. Select Pl/Java_JNI.
> > 2. Select Pl/Java_remote
> > 3. Choose both and agree on the SQL + Java semantics
> > 4. Make the postmaster spawn threads rather than processes
> > (controversial? Nah :-) )
>
> Option 5 (or 0) would be to use GCJ. This is likely to be the fastest
> and most lightweight solution, but perhaps not the most featureful.

Hm, last time I tried this it just SIGSEGV'd the backend after loading libgcj.so or something like that. I didn't peek further because I feel strange in Java land.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
The web brings people together because no matter what kind of sexual mutant you are, you have millions of possible partners. Type "find people who have sex with deer on fire", and the computer will say "specify the type of deer" (Jason Alexander)