Thread: java stored procedures
Hi! A few months ago I asked if anyone started working on PL/JAVA, the ansver was no. Now I started to write a java stored procedure language and environment for PostgreSQL. Some code is already working, and it is geting interresting. So, I would like to ask you to write me your ideas, suggestions, etc for this environment. The source code will be available under GPL when it is worth for distributing it (this will take for a while). thanks. Laszlo Hornyak
Thanks for your work on it; although I'm not a java programmer, I'm sure a number of people will find this useful. I do have a small licensing concern though. Postgresql is currently distributed under a BSD-style license. If your goal is to have this included with the Postgresql distribution, it would probably keep things a lot simpler if the Java extension were kept under a compatible license, or even the same one. Please note, I am NOT trying start a flame war over which license is better; I'm just proposing that major software packages like Postgresql use a consistent licensing scheme. Sort of like Perl modules are typically licensed "under the same terms as Perl itself" so as to guarantee they don't diverge from perl, and people aren't surprised by a different license agreement when they add another perl module from CPAN. Thanks again, Wes Sheldahl Laszlo Hornyak <hornyakl%freemail.hu@interlock.lexmark.com> on 11/30/2001 04:14:12 AM Please respond to hornyakl%users.sourceforge.net@interlock.lexmark.com To: pgsql-general%postgresql.org@interlock.lexmark.com cc: (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: [GENERAL] java stored procedures Hi! A few months ago I asked if anyone started working on PL/JAVA, the ansver was no. Now I started to write a java stored procedure language and environment for PostgreSQL. Some code is already working, and it is geting interresting. So, I would like to ask you to write me your ideas, suggestions, etc for this environment. The source code will be available under GPL when it is worth for distributing it (this will take for a while). thanks. Laszlo Hornyak ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
Laszlo, In my mind it would be more useful if this code was under the same license as the rest of postgresql. That way it could become part of the product as opposed to always being a separate component. (Just like plpgsql, pltcl and the other procedural languages). thanks, --Barry Laszlo Hornyak wrote: > Hi! > > A few months ago I asked if anyone started working on PL/JAVA, the > ansver was no. Now I started to write a java stored procedure language > and environment for PostgreSQL. Some code is already working, and it is > geting interresting. So, I would like to ask you to write me your ideas, > suggestions, etc for this environment. > The source code will be available under GPL when it is worth for > distributing it (this will take for a while). > thanks. > > Laszlo Hornyak > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
Laszlo, I think it would help a lot if you could take a little time to write down what your planned architecture for a pljava would be. It then becomes much easier for myself and probably others reading these lists to make suggestions on ways to improve what you are planning (or possible problems with your strategy). Without knowing what exactly you are thinking of doing it is difficult to comment. But let me try throwing out a few thoughts about how I think this should be done. First question is how will the jvm be run? Since postgres is a multiprocess implementation (i.e. each connection has a separate process on the server) and since java is a multithreaded implementation (i.e. one process supporting multiple threads), what should the pljava implementation look like? I think there should be a single jvm process for the entire db server that each postgresql process connects to through sockets/rmi. It will be too expensive to create a new jvm process for each postgresql connection (expensive in both terms of memory and cpu, since the startup time for the jvm is significant and it requires a lot of memory). Having one jvm that all the postgres backend processes communicate with makes the whole feature much more complicated, but is necessary in my opinion. Then the question becomes how does the jvm process interact with the database since they are two different processes. You will need some sort of interprocess communication between the two to execute sql statements. This could be accomplished by using the existing jdbc driver. But the bigest problem here is getting the transaction semantics right. How does a sql statement being run by a java stored procedure get access to the same connection/transaction as the original client? What you don't want happening is that sql issued in a stored java procedure executes in a different transaction as the caller, what would rollback of the stored function call mean in that case? I am very interested in hearing what your plans are for pl/java. I think this is a very difficult project, but one that would be very useful and welcome. thanks, --Barry Laszlo Hornyak wrote: > Hi! > > I am such a lame in the licensing area. As much as I know, BSD license > is more free than GPL. I think it is too early to think about licensing, > but it`s ok, you won :), when it will be ready(or it will seem to get > closer to a working thing, currently it looks more like a interresting > test), I will ask you if you want to distribute it with Postgres, and if > you say yes, the license will be the same as Postgresql`s license. > Anyway is this neccessary when it is the part of the distribution? > Is this ok for you? > > thanks, > Laszlo Hornyak > > ps: still waiting for your ideas, suggestions, etc :) I am not memeber > of the mailing list, please write me dirrectly! > > Barry Lind wrote: > >> Laszlo, >> >> In my mind it would be more useful if this code was under the same >> license as the rest of postgresql. That way it could become part of >> the product as opposed to always being a separate component. (Just >> like plpgsql, pltcl and the other procedural languages). >> >> thanks, >> --Barry >> >> > >
Barry Lind <barry@xythos.com> writes: > Having one jvm that all the postgres backend processes communicate with makes > the whole feature much more complicated, but is necessary in my opinion. Agreed. Also, the JVM is a multithreaded app, and running it inside a non-threaded program (the backend) might cause problems. > Then the question becomes how does the jvm process interact with the database > since they are two different processes. You will need some sort of > interprocess communication between the two to execute sql statements. This > could be accomplished by using the existing jdbc driver. But the bigest > problem here is getting the transaction semantics right. How does a sql > statement being run by a java stored procedure get access to the same > connection/transaction as the original client? What you don't want happening > is that sql issued in a stored java procedure executes in a different > transaction as the caller, what would rollback of the stored function call > mean in that case? I think you would have to to expose the SPI layer to Java running in a separate process, either using an RMI server written in C or a custom protocol over a TCP socket (Java of course can't do Unix sockets). This raises some thorny issues of authentication and security but I don't think they're insurmountable. You could, for example, create a cryptographically strong "cookie" in the backend when a Java function is called. The cookie would be passed to the Java function when it gets invoked, and then must be passed back to the SPI layer in order for the latter to accept the call. A bit clunky but should be safe as far as I can see. The cookie would be needed anyhow, I think, in order for the SPI layer to be able to find the transaction that the Java function was originally invoked in. You could make the SPI layer stuff look like a normal JDBC driver to user code--PL/Perl does this kind of thing with the Perl DBI interface. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Laszlo, > I am very far from features like this. > PL/JAVA now: > -there is a separate process running java (kaffe). this process creates > a sys v message queue, that holds requests. almost forgot, a shared > memory segment too. I didn`t find better way to tell postgres the > informations about the java process. Does the mechanism you are planning support running any JVM? In my opionion Kaffe isn't good enough to be widely useful. I think you should be able to plugin whatever jvm is best on your platform, which will likely be either the Sun or IBM JVMs. Also, can you explain this a little bit more. How does the jvm process get started? (I would hope that the postgresql server processes would start it when needed, as opposed to requiring that it be started separately.) How does the jvm access these shared memory structures? Since there aren't any methods in the java API to do such things that I am aware of. > -the java request_handler function on the server side attaches to the > shared memory, reads the key of the message queue., attaches to it, > sends the data of the function, and a signal for the pl/java. after, it > is waiting for a signal from the java thread. I don't understand how you do this in java? I must not be understanding something correctly here. > -when java thread receives the signal, it reads the message(s) from the > queue, and starts some actions. When done it tells postgres with a > signal that it is ready, and it can come for its results. This will be > rewritten see below problems. Are signals the best way to accomplish this? > -And postgres is runing, while java is waiting for postgres to say > something. But in reality if the postgres process is executing a stored function it needs to wait for the result of that function call before continuing doesn't it? > > Threading on the java process side is not done yet, ok, it is not that > hard, I will write it, if it will be realy neccessary. Agreed, this is important. > > The problems, for now: > I had a very simple system, that passed a very limited scale of argument > types, with a very limited quantity of parameters (int, varchar, bool). > Postgres has limits for the argument count too, but not for types. It > had too much limits, so I am working (or to tell the truth now only > thinking) on a new type handling that fits the felxibility of > Postgresql`s type flexibility. For this I will have to learn a lot about > Postgres`s type system. This will be my program this weekend. :) Shouldn't this code use all or most of the logic found in the FE/BE protocol? Why invent and code another mechanism to transfer data when one already exists. (I will admit that the current FE/BE mechanism isn't the ideal choice, but it seems easier to reuse what exists for now and improve on it later). > > thanks, > Laszlo Hornyak > You didn't mention how you plan to deal with the transaction symantics. So what happens when the pl/java function calls through jdbc back to the server to insert some data? That should happen in the same transaction as the caller correct? thanks, --Barry
* Barry Lind <barry@xythos.com> wrote: | | possible problems with your strategy). Without knowing what exactly | you are thinking of doing it is difficult to comment. Agreed. | Having one jvm that all the postgres backend processes communicate | with makes the whole feature much more complicated, but is necessary | in my opinion. I'm not quite sure if I agree here. Startup time is not an issue if you are using connection pooling on the client and memory is cheap ;-) Having a separate process would indeed introduce overhead where you don't want it. In the critical path when executing queries. | I am very interested in hearing what your plans are for pl/java. I | think this is a very difficult project, but one that would be very | useful and welcome. I would very much like to hear about the plans myself. -- Gunnar Rønning - gunnar@polygnosis.com Senior Consultant, Polygnosis AS, http://www.polygnosis.com/
Hi! Barry Lind wrote: > Laszlo, > > I think it would help a lot if you could take a little time to write > down what your planned architecture for a pljava would be. It then > becomes much easier for myself and probably others reading these lists > to make suggestions on ways to improve what you are planning (or > possible problems with your strategy). Without knowing what exactly > you are thinking of doing it is difficult to comment. > > > But let me try throwing out a few thoughts about how I think this > should be done. > > First question is how will the jvm be run? Since postgres is a > multiprocess implementation (i.e. each connection has a separate > process on the server) and since java is a multithreaded > implementation (i.e. one process supporting multiple threads), what > should the pljava implementation look like? I think there should be a > single jvm process for the entire db server that each postgresql > process connects to through sockets/rmi. It will be too expensive to > create a new jvm process for each postgresql connection (expensive in > both terms of memory and cpu, since the startup time for the jvm is > significant and it requires a lot of memory). I absolutely agree. OK, it`s done. So, a late-night-brainstorming here: What I would like to see in PL/JAVA is the object oriented features, that makes postgresql nice. Creating a new table creates a new class in the java side too. Instantiating an object of the newly created class inserts a row into the table. In postgresql tables can be inherited, and this could be easyly done by pl/java too. I think this would look nice. But this is not the main feature. Why I would like to see a nice java procedural language inside postgres is java`s advanced communication features (I mean CORBA, jdbc, other protocols). This is the sugar in the caffe. I am very far from features like this. PL/JAVA now: -there is a separate process running java (kaffe). this process creates a sys v message queue, that holds requests. almost forgot, a shared memory segment too. I didn`t find better way to tell postgres the informations about the java process. -the java request_handler function on the server side attaches to the shared memory, reads the key of the message queue., attaches to it, sends the data of the function, and a signal for the pl/java. after, it is waiting for a signal from the java thread. -when java thread receives the signal, it reads the message(s) from the queue, and starts some actions. When done it tells postgres with a signal that it is ready, and it can come for its results. This will be rewritten see below problems. -And postgres is runing, while java is waiting for postgres to say something. Threading on the java process side is not done yet, ok, it is not that hard, I will write it, if it will be realy neccessary. The problems, for now: I had a very simple system, that passed a very limited scale of argument types, with a very limited quantity of parameters (int, varchar, bool). Postgres has limits for the argument count too, but not for types. It had too much limits, so I am working (or to tell the truth now only thinking) on a new type handling that fits the felxibility of Postgresql`s type flexibility. For this I will have to learn a lot about Postgres`s type system. This will be my program this weekend. :) thanks, Laszlo Hornyak
Hi! I am such a lame in the licensing area. As much as I know, BSD license is more free than GPL. I think it is too early to think about licensing, but it`s ok, you won :), when it will be ready(or it will seem to get closer to a working thing, currently it looks more like a interresting test), I will ask you if you want to distribute it with Postgres, and if you say yes, the license will be the same as Postgresql`s license. Anyway is this neccessary when it is the part of the distribution? Is this ok for you? thanks, Laszlo Hornyak ps: still waiting for your ideas, suggestions, etc :) I am not memeber of the mailing list, please write me dirrectly! Barry Lind wrote: > Laszlo, > > In my mind it would be more useful if this code was under the same > license as the rest of postgresql. That way it could become part of > the product as opposed to always being a separate component. (Just > like plpgsql, pltcl and the other procedural languages). > > thanks, > --Barry > >
Hi! Barry Lind wrote: > Does the mechanism you are planning support running any JVM? In my > opionion Kaffe isn't good enough to be widely useful. I think you > should be able to plugin whatever jvm is best on your platform, which > will likely be either the Sun or IBM JVMs. Ok, I also had problems with caffe, but it may work. I like it becouse it is small (the source is about 6M). As much as I know Java VM`s has a somewhat standard native interface called JNI. I use this to start the VM, and communicate with it. If you think I should change I will do it, but it may take a long time to get the new VM. For then I have to run kaffe. > Also, can you explain this a little bit more. How does the jvm > process get started? (I would hope that the postgresql server > processes would start it when needed, as opposed to requiring that it > be started separately.) How does the jvm access these shared memory > structures? Since there aren't any methods in the java API to do such > things that I am aware of. JVM does not. 'the java process' does with simple posix calls. I use debian potatoe, on any other posix system it should work, on any other somewhat posix compatible system it may work, I am not sure... > > I don't understand how you do this in java? I must not be > understanding something correctly here. My failure. The 'java request_handler' is not a java function, it is the C call_handler in the Postgres side, that is started when a function of language 'pljava' is called. I made some failure in my previous mail. At home I named the pl/java language pl/pizza (something that is not caffe, but well known enough :). The application has two running binaries: -pizza (which was called 'java process' last time) This is a small C program that uses JNI to start VM and call java methods. -plpizza.so the shared object that contains the call_handler function. > > >> -when java thread receives the signal, it reads the message(s) from >> the queue, and starts some actions. When done it tells postgres with >> a signal that it is ready, and it can come for its results. This will >> be rewritten see below problems. > > > > Are signals the best way to accomplish this? I don`t know if it is the best, it is the only way I know :) Do you know any other ways? > > >> -And postgres is runing, while java is waiting for postgres to say >> something. > > But in reality if the postgres process is executing a stored function > it needs to wait for the result of that function call before > continuing doesn't it? Surely, this is done. How could Postgres tell the result anyway ? :) > >> >> Threading on the java process side is not done yet, ok, it is not >> that hard, I will write it, if it will be realy neccessary. > > Agreed, this is important. > > Shouldn't this code use all or most of the logic found in the FE/BE > protocol? Why invent and code another mechanism to transfer data when > one already exists. (I will admit that the current FE/BE mechanism > isn't the ideal choice, but it seems easier to reuse what exists for > now and improve on it later). Well, I am relatively new to Postgres, and I don`t know these protocols. In the weekend I will start to learn it, and in Sunday or Monday I maybe I will understand it, if not, next weekend.. > > You didn't mention how you plan to deal with the transaction > symantics. So what happens when the pl/java function calls through > jdbc back to the server to insert some data? That should happen in > the same transaction as the caller correct? I don`t think this will be a problem, I have ideas for this. Idea mean: I know how I will start it, it may be good, or it may be fataly stupid idea, it will turn out when I tried it. Simply: The same way plpizza tells pizza the request, pizza can talk back to plpizza. This is planed to work with similar mechanism I described last time (shm+signals). Monday I will try to send a little pieces of code to make thing clear, ok? thanks, Laszlo Hornyak