Thread: pljava revisited
Hi,

I'm working on a new pl/java prototype that I hope will become production quality some time in the future. Before my project gets too far, I'd like to gather some input from other users. I've taken a slightly different approach from the other attempts that I've managed to dig up. Here are some highlights of my approach:

1. A new Java VM is spawned for each connection. I know that this will give a performance hit when a new connection is created. The alternative, however, implies that all calls become inter-process calls, which I think is a much worse scenario, especially since most modern environments today have some kind of connection pooling. Another reason is that the connections represent sessions, and those sessions get a very natural isolation using separate VMs. A third reason is that the "current connection" would become unavailable in a remote process (see #5).

2. There's no actual Java code in the body of a function, simply a reference to a static method. My reasoning is that when writing (and debugging) Java, you want to use your favorite IDE. Mixing Java with SQL just gets messy.

3. As opposed to the Tcl, Python, and Perl languages, which for obvious reasons use strings, my pl/java will use native types wherever possible. A flag can be added to the function definition if real objects are preferred instead of primitives (motivated by the fact that primitives cannot reflect NULL values).

4. The code is actually written using JNI and C++, but without any templates, &-style object references, operator overloads, external class libraries etc. I use C++ simply to get better quality, readability and structure in the code.

5. I plan to write a JDBC layer using JNI on top of the SPI calls to enable JDBC functionality on the current connection. Some things will be limited (begin/commit etc. will not be possible here, for instance).

Current status is that my first calls from Postgres to Java have been made. Lots of work remains.
What are your thoughts and ideas? Thomas Hallgren
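A minimal sketch, not taken from the original post, of what points 2 and 3 might look like on the Java side. The class name `Arithmetic` and both method names are hypothetical. The post does not show how the SQL function definition would reference the method or how the "real objects" flag is spelled, so neither is shown here. The sketch only illustrates why primitives cannot carry NULL while object types can.

```java
// Hypothetical example class; names are illustrative and not part of pl/java.
final class Arithmetic {
    private Arithmetic() {}

    // Primitive parameters: cheap to pass, but a Java int has no way
    // to represent an SQL NULL argument.
    static int add(int a, int b) {
        return a + b;
    }

    // Object parameters: a null reference can stand in for SQL NULL.
    static Integer addNullable(Integer a, Integer b) {
        if (a == null || b == null) {
            return null; // propagate NULL, as SQL arithmetic does
        }
        return a + b;
    }
}
```

Under this reading, an SQL function would name something like `Arithmetic.add` instead of containing Java source, and the class itself would be written, compiled and debugged in an ordinary IDE.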
Thomas Hallgren wrote:

>Hi,
>I'm working on a new pl/java prototype that I hope will become production
>quality some time in the future. Before my project gets too far, I'd like to
>gather some input from other users. I've taken a slightly different approach
>from the other attempts that I've managed to dig up. Here are some
>highlights of my approach:
>
>1. A new Java VM is spawned for each connection. I know that this will give
>a performance hit when a new connection is created. The alternative, however,
>implies that all calls become inter-process calls, which I think is a much
>worse scenario, especially since most modern environments today have some
>kind of connection pooling. Another reason is that the connections
>represent sessions, and those sessions get a very natural isolation using
>separate VMs. A third reason is that the "current connection" would become
>unavailable in a remote process (see #5).
>

Maybe on-demand might be better - if the particular backend doesn't need it, why incur the overhead?

>
>2. There's no actual Java code in the body of a function, simply a reference
>to a static method. My reasoning is that when writing (and debugging) Java,
>you want to use your favorite IDE. Mixing Java with SQL just gets messy.
>

Perhaps an example or two might help me understand better how this would work.

>
>3. As opposed to the Tcl, Python, and Perl languages, which for obvious
>reasons use strings, my pl/java will use native types wherever possible. A
>flag can be added to the function definition if real objects are preferred
>instead of primitives (motivated by the fact that primitives cannot reflect
>NULL values).
>
>4. The code is actually written using JNI and C++, but without any templates,
>&-style object references, operator overloads, external class
>libraries etc. I use C++ simply to get better quality, readability and
>structure in the code.
>

Other pl* (perl, python, tcl) languages have vanilla C glue code. Might be better to stick to this. If you aren't using advanced C++ features that shouldn't be too hard - well structured C can be just as readable as well structured C++. At the very lowest level, about the only things C++ buys you are the ability to declare variables in arbitrary places, and // style comments.

>
>5. I plan to write a JDBC layer using JNI on top of the SPI calls to enable
>JDBC functionality on the current connection. Some things will be limited
>(begin/commit etc. will not be possible here, for instance).
>

Again, examples would help me understand better.

Is there a web page for your project?

cheers

andrew
On Dec 10, 2003, at 11:23 AM, Andrew Dunstan wrote:

> Thomas Hallgren wrote:
>
>> 1. A new Java VM is spawned for each connection. I know that this
>> will give a performance hit when a new connection is created. The
>> alternative, however, implies that all calls become inter-process
>> calls, which I think is a much worse scenario, especially since most
>> modern environments today have some kind of connection pooling.
>> Another reason is that the connections represent sessions, and those
>> sessions get a very natural isolation using separate VMs. A third
>> reason is that the "current connection" would become unavailable in
>> a remote process (see #5).
>>
>
> Maybe on-demand might be better - if the particular backend doesn't
> need it, why incur the overhead?
>

I think a JVM per connection is going to add too much overhead, even if it's on-demand. Some platforms handle multiple JVMs better than others, but still, 25 or so individual JVMs are going to be a mess in terms of resource consumption.

Start time/connect time will be an issue. Saying 'people use pools', while generally accurate, kind of sweeps the problem under the carpet instead of the dust bin.

>> 4. The code is actually written using JNI and C++, but without any
>> templates, &-style object references, operator overloads, external
>> class libraries etc. I use C++ simply to get better quality,
>> readability and structure in the code.
>>
>
> Other pl* (perl, python, tcl) languages have vanilla C glue code.
> Might be better to stick to this. If you aren't using advanced C++
> features that shouldn't be too hard - well structured C can be just as
> readable as well structured C++. At the very lowest level, about the
> only things C++ buys you are the ability to declare variables in
> arbitrary places, and // style comments.
>

Agreed. Given that the rest of the code base is C... I would imagine that the Powers that Be would frown a bit on merging C++ code in, and relegate it to contrib for eternity...

Not knocking the idea, mind you - I think it would be great if it can be pulled off. Was thinking about it myself as a way to learn more of the backend code and scrape the thick layer of rust off of my C skills. Would like to see where you are with it.

--------------------

Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com
The JVM will be started on-demand.

Although I realize that one JVM per connection will consume a fair amount of resources, I still think it is the best solution. The description of this system must of course make it very clear that this is what happens and ultimately provide the means of tuning the JVMs as much as possible.

I advocate this solution because I think that the people who have the primary interest in a pl/java will be those who write enterprise systems using Java. J2EE systems are always equipped with connection pools.

But I'm of course open to other alternatives. Let's say that there's a JVM with a thread pool that the Postgres sessions will connect to using some kind of RPC. This implies that each call will have an overhead of at least 2 OS context switches. Compared to in-process calls, this will severely cripple the performance. How do you suggest that we circumvent this problem?

Another problem is that we will immediately lose the ability to use the "current connection" provided by the SPI interfaces. We can of course establish a back-channel to the original process, but that will incur even more performance hits. A third alternative is to establish brand new connections in the remote JVM. The problem then is to propagate the transaction context correctly. Albeit solvable, the performance using distributed transactions will be much worse than in-process. How do we solve this?

C++ or C is not a big issue. I might rewrite it into pure C. The main reason for C++ is to be able to use objects with virtual methods. I know how to do that in C too, but I don't quite agree that it's "just as clean" :-)

- thomas

> I think a JVM per connection is going to add too much overhead, even if
> it's on-demand. Some platforms handle multiple JVMs better than others,
> but still, 25 or so individual JVMs are going to be a mess in terms of
> resource consumption.
>
> Start time/connect time will be an issue. Saying 'people use pools',
> while generally accurate, kind of sweeps the problem under the carpet
> instead of the dust bin.
>
> Agreed. Given that the rest of the code base is C... I would imagine
> that the Powers that Be would frown a bit on merging C++ code in, and
> relegate it to contrib for eternity...
Thomas Hallgren wrote:

>The JVM will be started on-demand.
>Although I realize that one JVM per connection will consume a fair amount of
>resources, I still think it is the best solution. The description of this
>system must of course make it very clear that this is what happens and
>ultimately provide the means of tuning the JVMs as much as possible.
>
>I advocate this solution because I think that the people who have the
>primary interest in a pl/java will be those who write enterprise systems
>using Java. J2EE systems are always equipped with connection pools.
>

Yes, but as was pointed out, even if I use connection pooling I would rather not have, say, 25 JVMs loaded if I can help it.

>
>But I'm of course open to other alternatives. Let's say that there's a JVM
>with a thread pool that the Postgres sessions will connect to using some
>kind of RPC. This implies that each call will have an overhead of at least 2
>OS context switches. Compared to in-process calls, this will severely
>cripple the performance. How do you suggest that we circumvent this problem?
>

Context switches are not likely to be more expensive than loading an extra JVM, I suspect. Depending on your OS/hw they can be incredibly cheap, in fact.

>
>Another problem is that we will immediately lose the ability to use the
>"current connection" provided by the SPI interfaces. We can of course
>establish a back-channel to the original process, but that will incur even
>more performance hits. A third alternative is to establish brand new
>connections in the remote JVM. The problem then is to propagate the
>transaction context correctly. Albeit solvable, the performance using
>distributed transactions will be much worse than in-process. How do we
>solve this?
>

We are theorising ahead of data, somewhat. My suggestion would be to continue in the direction you are going, and later, when you can, stress test it. Ideally, if you then need to move to a shared JVM this would be transparent to upper levels of the code.

>
>C++ or C is not a big issue. I might rewrite it into pure C. The main reason
>for C++ is to be able to use objects with virtual methods. I know how to do
>that in C too, but I don't quite agree that it's "just as clean" :-)
>

Maybe not, but it's what is used in the core Pg distribution. Go with the flow :-)

cheers

andrew
On Dec 10, 2003, at 1:51 PM, Andrew Dunstan wrote:

> Thomas Hallgren wrote:
>
>> The JVM will be started on-demand.
>> Although I realize that one JVM per connection will consume a fair
>> amount of resources, I still think it is the best solution. The
>> description of this system must of course make it very clear that
>> this is what happens and ultimately provide the means of tuning the
>> JVMs as much as possible.
>>
>> I advocate this solution because I think that the people who have the
>> primary interest in a pl/java will be those who write enterprise
>> systems using Java. J2EE systems are always equipped with connection
>> pools.
>>
>
> Yes, but as was pointed out, even if I use connection pooling I would
> rather not have, say, 25 JVMs loaded if I can help it.
>

It's also a bit of a solution by circumstance, rather than a solution by design.

>>
>> But I'm of course open to other alternatives. Let's say that there's
>> a JVM with a thread pool that the Postgres sessions will connect to
>> using some kind of RPC. This implies that each call will have an
>> overhead of at least 2 OS context switches. Compared to in-process
>> calls, this will severely cripple the performance. How do you suggest
>> that we circumvent this problem?
>>

My comments here are pretty off the cuff. You've thought about this far more than I have.

>
> Context switches are not likely to be more expensive than loading an
> extra JVM, I suspect. Depending on your OS/hw they can be incredibly
> cheap, in fact.
>
> We are theorising ahead of data, somewhat. My suggestion would be to
> continue in the direction you are going, and later, when you can,
> stress test it. Ideally, if you then need to move to a shared JVM this
> would be transparent to upper levels of the code.
>

Agreed - sounds like you've done a fair amount of ground work. I at least am interested in where you're going with it.

--------------------

Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com
Andrew Rawnsley wrote:

>> Other pl* (perl, python, tcl) languages have vanilla C glue code.
>> Might be better to stick to this. If you aren't using advanced C++
>> features that shouldn't be too hard - well structured C can be just as
>> readable as well structured C++. At the very lowest level, about the
>> only things C++ buys you are the ability to declare variables in
>> arbitrary places, and // style comments.
>>
>
> Agreed. Given that the rest of the code base is C....I would imagine
> that the Powers that Be would frown a bit on merging
> C++ code in, and relegate it to contrib for eternity...

It will probably have to live on GBorg right from the beginning anyway, so "the Powers" might not care at all.

Thus far _all_ procedural languages are loadable modules. VM or not, I don't see why this one would be any different. That also answers the "on demand" question to some extent, doesn't it?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Andrew Dunstan <andrew@dunslane.net> writes:
> Thomas Hallgren wrote:
>> C++ or C is not a big issue. I might rewrite it into pure C. The main reason
>> for C++ is to be able to use objects with virtual methods. I know how to do
>> that in C too but I don't quite agree that its "just as clean" :-)

> Maybe not, but it's what is used in the core Pg distribution. Go with
> the flow :-)

If you have any hope of someday seeing pljava merged into the main PG distribution, you had better stick to C. IMHO there would be essentially no chance of adopting a module that requires C++, simply because the additional configuration and portability work would be too much of a pain in the neck. libpq++ got heaved overboard largely because the autoconf burden for it was too high, and we're unlikely to look favorably on something that would make us put that back in.

Of course, if you don't think pljava will ever become mainstream, this argument won't have much force to you ...

regards, tom lane
On Wed, 2003-12-10 at 13:04, Jan Wieck wrote:
> Andrew Rawnsley wrote:
>
> > Agreed. Given that the rest of the code base is C....I would imagine
> > that the Powers that Be would frown a bit on merging
> > C++ code in, and relegate it to contrib for eternity...
>
> It will probably have to live on GBorg right from the beginning anyway,
> so "the Powers" might not care at all.
>
> Thus far _all_ procedural languages are loadable modules. VM or not, I
> don't see why this one would be any different. That also answers the "on
> demand" question to some extent, doesn't it?
>

Maybe I'm mixing concepts here, but didn't Joe Conway create the ability to do pl module loading on demand or on connection creation via GUC? ISTR he needed this due to R's overhead. If so, it seems this could be implemented both ways, with a recommendation on which is best to follow.

Speaking of plR, I'd recommend that anyone interested in developing pl's, whether enhancing old ones or creating new ones, check out the plR code on gborg; it was written recently and is pretty advanced.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
--- Thomas Hallgren <thhal@mailblocks.com> wrote:
> The JVM will be started on-demand.
> Although I realize that one JVM per connection will consume a fair amount of
> resources, I still think it is the best solution. The description of this
> system must of course make it very clear that this is what happens and
> ultimately provide the means of tuning the JVMs as much as possible.

I think the new 1.5 JDK "Tiger" (to be released soon) will feature the "shared VM" option, i.e. one JVM could be used to run multiple independent apps. Maybe worth looking into this.

> I advocate this solution because I think that the people who have the
> primary interest in a pl/java will be those who write enterprise systems
> using Java. J2EE systems are always equipped with connection pools.

IMHO, pl/java would be a great feature for PostgreSQL to have. It would increase pgSql's chances of being considered an "enterprise" RDBMS, since most enterprise apps are written in Java nowadays.

Regards,
Two comments.

Context switches are of course much cheaper than loading a JVM. No argument there. The point is that the JVM is loaded once for each connection (when the connection makes its first call to a Java function). Millions of calls that reuse the same JVM may follow. Each of those calls will suffer from context switches if the JVM is remote. A ratio of one to a million (or more) is in fact very likely when function calls are used in predicates and/or projections of selects on larger tables.

Regarding C++, as I said, no big deal. I'll change it, for the reasons mentioned, before I release my first cut.

Thanks,

- thomas
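The amortization argument in this message can be put into a small back-of-envelope model. This is only a sketch under stated assumptions: the class `CallCostModel`, its method names and every figure passed to it are hypothetical, and nothing in the thread has been measured. It compares a one-time JVM startup followed by in-process calls against a remote JVM where every call pays for a fixed number of context switches.

```java
// Hypothetical cost model; all inputs are assumptions, not measurements.
final class CallCostModel {
    private CallCostModel() {}

    /** Total cost when a connection starts its own JVM once, then calls in-process. */
    static double inProcessTotalMicros(double jvmStartupMicros, double callMicros, long calls) {
        return jvmStartupMicros + callMicros * calls;
    }

    /** Total cost when every call goes to a remote JVM and pays for context switches. */
    static double remoteTotalMicros(double callMicros, double switchMicros,
                                    int switchesPerCall, long calls) {
        return (callMicros + switchMicros * switchesPerCall) * calls;
    }

    /** Number of calls after which the one-time JVM startup has been paid back. */
    static long breakEvenCalls(double jvmStartupMicros, double switchMicros, int switchesPerCall) {
        return (long) Math.ceil(jvmStartupMicros / (switchMicros * switchesPerCall));
    }
}
```

With an invented startup cost of 500,000 microseconds and an invented 5 microseconds per switch at 2 switches per call, `CallCostModel.breakEvenCalls(500_000, 5, 2)` returns 50000. The model deliberately leaves out the memory held by each per-connection JVM, which is the other side of the argument made earlier in the thread, so it says nothing about which design is better overall.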
Tom Lane wrote:
> libpq++ got heaved overboard largely
> because the autoconf burden for it was too high,

That's news to me. Certainly the overhead doesn't grow smaller by splitting stuff up in smaller pieces.
Thomas Hallgren wrote:
> What are your thoughts and ideas?

Instead of making up your own stuff, there's a whole SQL standard that tells you how Java embedded in an SQL server should work. Of course that doesn't tell you about implementation details.
Peter Eisentraut wrote:

>Thomas Hallgren wrote:
>
>>What are your thoughts and ideas?
>>
>
>Instead of making up your own stuff, there's a whole SQL standard that
>tells you how Java embedded in an SQL server should work. Of course
>that doesn't tell you about implementation details.
>

Where can it be found?

cheers

andrew
Andrew Dunstan wrote:
> Peter Eisentraut wrote:
> >Thomas Hallgren wrote:
> >>What are your thoughts and ideas?
> >
> >Instead of making up your own stuff, there's a whole SQL standard
> > that tells you how Java embedded in an SQL server should work. Of
> > course that doesn't tell you about implementation details.
>
> Where can it be found?

Developer FAQ 1.12

But the sqlstandards.org server appears to be down right now.
The sqlstandards.org server is still down, I think. Is this something new in the upcoming 200x spec? I could not see it mentioned in SQL-99.

I'm a great fan of standards. If there is one, I'll make my pljava adhere to it. Any information on this topic is greatly appreciated.

Thanks,

- thomas

"Peter Eisentraut" <peter_e@gmx.net> wrote in message news:200312120059.12142.peter_e@gmx.net...
> Andrew Dunstan wrote:
> > Where can it be found?
>
> Developer FAQ 1.12
>
> But the sqlstandards.org server appears to be down right now.
Peter Eisentraut wrote:
> Tom Lane wrote:
> > libpq++ got heaved overboard largely
> > because the autoconf burden for it was too high,
>
> That's news to me. Certainly the overhead doesn't grow smaller by
> splitting stuff up in smaller pieces.

Yea, now there is no configure for libpq++ at all, so you have to muck around with it to get it to compile.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073