Thread: Regarding GSoc Application
Hi All,
I submitted a GSoc application yesterday. Please review it and let me know if anyone needs any clarifications.
Atri
Hi Atri, Is there some JDBC API that supports this in newer versions of the API ? Dave Cramer dave.cramer(at)credativ(dot)ca http://www.credativ.ca On Sat, Apr 7, 2012 at 7:07 AM, Atri Sharma <atri.jiit@gmail.com> wrote: > Hi All, > > > > I submitted a GSoc application yesterday. Please review it and let me know > if anyone needs any clarifications. > > > > Atri
On Sun, Apr 8, 2012 at 8:56 AM, Dave Cramer <pg@fastcrypt.com> wrote: > Hi Atri, > > Is there some JDBC API that supports this in newer versions of the API ? Didn't parse that question. My understanding is that the only JDBC features needed are what's already there, to make connections to databases and execute queries. The GSoC proposal is here: https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2012/atrisharma/1001 merlin
How will the user access this? Will it be a normal query through the existing API ? Will it be a private postgresql API ? How will they set it up ? It appears complicated as you have to setup PL/Java as well Dave Cramer dave.cramer(at)credativ(dot)ca http://www.credativ.ca On Mon, Apr 9, 2012 at 11:45 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Sun, Apr 8, 2012 at 8:56 AM, Dave Cramer <pg@fastcrypt.com> wrote: >> Hi Atri, >> >> Is there some JDBC API that supports this in newer versions of the API ? > > Didn't parse that question. My understanding is that the only JDBC > features needed are what's already there, to make connections to > databases and execute queries. > > The GSoC proposal is here: > > https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2012/atrisharma/1001 > > merlin
On Mon, Apr 9, 2012 at 10:47 AM, Dave Cramer <pg@fastcrypt.com> wrote: > How will the user access this? Will it be a normal query through the > existing API ? Will it be a private postgresql API ? > > How will they set it up ? It appears complicated as you have to setup > PL/Java as well Yeah -- it will run through pl/java (at least, that's the idea). What pl/java brings to the table is well thought out integration of the JVM to postgres so that you can invoke java as functions from postgres. PL/java of course is a heavy dependency and non-trivial to set up and install. But to access the jdbc from postgres I think it's the easiest way forward. Straight JNI to the JVM from FDW might be a better/cleaner route but we haven't done the research to see exactly what's involved there. I suspect that invoking java from postgres is non trivial any way you slice it and that's not a wheel worth re-inventing. In other words, the basic idea is to do two things: a dblink-ish wrapper for JDBC via pl/java and a FDW wrapper through that via SPI. Better ideas and criticism are welcome of course. merlin
On Mon, Apr 9, 2012 at 11:55 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Apr 9, 2012 at 10:47 AM, Dave Cramer <pg@fastcrypt.com> wrote: >> How will the user access this? Will it be a normal query through the >> existing API ? Will it be a private postgresql API ? >> >> How will they set it up ? It appears complicated as you have to setup >> PL/Java as well > > Yeah -- it will run through pl/java (at least, that's the idea). What > pl/java brings to the table is well thought out integration of the JVM > to postgres so that you can invoke java as functions from postgres. > PL/java of course is a heavy dependency and non-trivial to set up and > install. But to access the jdbc from postgres I think it's the > easiest way forward. Straight JNI to the JVM from FDW might be a > better/cleaner route but we haven't done the research to see exactly > what's involved there. I suspect that invoking java from postgres is > non trivial any way you slice it and that's not a wheel worth > re-inventing. > > In other words, the basic idea is to do two things: a dblink-ish > wrapper for JDBC via pl/java and a FDW wrapper through that via SPI. > Better ideas and criticism are welcome of course. > > merlin So I'm confused, once they link a file to an FDW can't you just read it with an normal select ? What additional functionality will this provide ? Dave
On Mon, Apr 9, 2012 at 11:14 AM, Dave Cramer <pg@fastcrypt.com> wrote: > So I'm confused, once they link a file to an FDW can't you just read > it with an normal select ? > > What additional functionality will this provide ? > > Dave The basic objective is to expose the JDBC to postgres for grabbing external data. FDW is a C API and JDBC is java routines so the main challenge is to figure out how to jump from a FDW call into java. pl/java is one way to solve that problem. Once done, you should be able to FDW to any jdbc supporting data source, which is basically everything. merlin
On 04/09/2012 12:14 PM, Dave Cramer wrote: > > So I'm confused, once they link a file to an FDW can't you just read > it with an normal select ? > > What additional functionality will this provide ? > I'm confused about what you're confused about. Surely this won't be linking files to an FDW, but foreign DBMS tables, in anything you can access via JDBC. All you'll need on the postgres side is the relevant JDBC driver, so you'd have instant access via standard select queries to anything you can get a JDBC driver to talk to. That seems to me something worth having. I imagine it would look rather like this: CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler OPTIONS (driver 'jdbc.foodb.org'); CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host '1.2.3.4', user 'foouser', password 'foopw'); CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; SELECT * from footbl; cheers andrew
On Mon, Apr 9, 2012 at 12:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 04/09/2012 12:14 PM, Dave Cramer wrote: >> >> >> So I'm confused, once they link a file to an FDW can't you just read >> it with an normal select ? >> >> What additional functionality will this provide ? >> > > > > I'm confused about what you're confused about. Surely this won't be linking > files to an FDW, but foreign DBMS tables, in anything you can access via > JDBC. All you'll need on the postgres side is the relevant JDBC driver, so > you'd have instant access via standard select queries to anything you can > get a JDBC driver to talk to. That seems to me something worth having. > > I imagine it would look rather like this: > > CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler > OPTIONS (driver 'jdbc.foodb.org'); > CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host > '1.2.3.4', user 'foouser', password 'foopw'); > CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; > SELECT * from footbl; > Well this is certainly more explanation than we have so far. Is this the intended use case ? Dave
Dave Cramer <pg@fastcrypt.com> wrote: > Andrew Dunstan <andrew@dunslane.net> wrote: >> All you'll need on the postgres side is the relevant JDBC driver, >> so you'd have instant access via standard select queries to >> anything you can get a JDBC driver to talk to. That seems to me >> something worth having. >> >> I imagine it would look rather like this: >> >> CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler >> OPTIONS (driver 'jdbc.foodb.org'); >> CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host >> '1.2.3.4', user 'foouser', password 'foopw'); >> CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; >> SELECT * from footbl; >> > Well this is certainly more explanation than we have so far. > > Is this the intended use case ? That is how I've understood it from the discussion I've seen -- an FDW to connect to JDBC so that you can wrap anything accessible from JDBC. The use of pl/Java seems to be the easiest way to get there. -Kevin
On Mon, Apr 9, 2012 at 10:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 04/09/2012 12:14 PM, Dave Cramer wrote: >> >> >> So I'm confused, once they link a file to an FDW can't you just read >> it with an normal select ? >> >> What additional functionality will this provide ? >> > > > > I'm confused about what you're confused about. Surely this won't be linking > files to an FDW, but foreign DBMS tables, in anything you can access via > JDBC. All you'll need on the postgres side is the relevant JDBC driver, so > you'd have instant access via standard select queries to anything you can > get a JDBC driver to talk to. That seems to me something worth having. > > I imagine it would look rather like this: > > CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler > OPTIONS (driver 'jdbc.foodb.org'); > CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host > '1.2.3.4', user 'foouser', password 'foopw'); > CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; > SELECT * from footbl; > > > cheers > > andrew Hi Andrew, Thanks for going through my proposal and commenting on it. I think you have hit the nail on the head.We will be connecting the foreign DBMS tables.The main aim of the project is to wrap JDBC so we can connect to anything that can be reached through a JDBC URL. I am considering two paths for doing this: The first one takes the help of the SPI(Server Programming Interface) and the second one directly connects through Pl/Java and JNI(Java Native Interface). Please let me know your further comments and also,please advise me on how to proceed further. Atri -- Regards, Atri l'apprenant
On Mon, Apr 9, 2012 at 12:25 PM, Atri Sharma <atri.jiit@gmail.com> wrote: > On Mon, Apr 9, 2012 at 10:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> On 04/09/2012 12:14 PM, Dave Cramer wrote: >>> So I'm confused, once they link a file to an FDW can't you just read >>> it with an normal select ? >>> >>> What additional functionality will this provide ? >>> >> >> >> >> I'm confused about what you're confused about. Surely this won't be linking >> files to an FDW, but foreign DBMS tables, in anything you can access via >> JDBC. All you'll need on the postgres side is the relevant JDBC driver, so >> you'd have instant access via standard select queries to anything you can >> get a JDBC driver to talk to. That seems to me something worth having. >> >> I imagine it would look rather like this: >> >> CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler >> OPTIONS (driver 'jdbc.foodb.org'); >> CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host >> '1.2.3.4', user 'foouser', password 'foopw'); >> CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; >> SELECT * from footbl; >> >> >> cheers >> >> andrew > > Hi Andrew, > > Thanks for going through my proposal and commenting on it. > > I think you have hit the nail on the head.We will be connecting the > foreign DBMS tables.The main aim of the project is to wrap JDBC so we > can connect to anything that can be reached through a JDBC URL. > > I am considering two paths for doing this: > The first one takes the help of the SPI(Server Programming Interface) > and the second one directly connects through Pl/Java and JNI(Java > Native Interface). > > Please let me know your further comments and also,please advise me on > how to proceed further. I think the best way to go is as planned. Step one is to get pl/java installed and attempt a minimal set of functions that can connect to and gather data from an external source...let's start with pl/java 'hello world' and go from there. Once done it's time to start thinking about how the java internals will look like -- we can crib from dblink for that though. merlin
On Mon, Apr 9, 2012 at 11:40 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Apr 9, 2012 at 12:25 PM, Atri Sharma <atri.jiit@gmail.com> wrote: >> On Mon, Apr 9, 2012 at 10:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >>> On 04/09/2012 12:14 PM, Dave Cramer wrote: >>>> So I'm confused, once they link a file to an FDW can't you just read >>>> it with an normal select ? >>>> >>>> What additional functionality will this provide ? >>>> >>> >>> >>> >>> I'm confused about what you're confused about. Surely this won't be linking >>> files to an FDW, but foreign DBMS tables, in anything you can access via >>> JDBC. All you'll need on the postgres side is the relevant JDBC driver, so >>> you'd have instant access via standard select queries to anything you can >>> get a JDBC driver to talk to. That seems to me something worth having. >>> >>> I imagine it would look rather like this: >>> >>> CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler >>> OPTIONS (driver 'jdbc.foodb.org'); >>> CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host >>> '1.2.3.4', user 'foouser', password 'foopw'); >>> CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; >>> SELECT * from footbl; >>> >>> >>> cheers >>> >>> andrew >> >> Hi Andrew, >> >> Thanks for going through my proposal and commenting on it. >> >> I think you have hit the nail on the head.We will be connecting the >> foreign DBMS tables.The main aim of the project is to wrap JDBC so we >> can connect to anything that can be reached through a JDBC URL. >> >> I am considering two paths for doing this: >> The first one takes the help of the SPI(Server Programming Interface) >> and the second one directly connects through Pl/Java and JNI(Java >> Native Interface). >> >> Please let me know your further comments and also,please advise me on >> how to proceed further. > > I think the best way to go is as planned. Step one is to get pl/java > installed and attempt a minimal set of functions that can connect to > and gather data from an external source...let's start with pl/java > 'hello world' and go from there. Once done it's time to start > thinking about how the java internals will look like -- we can crib > from dblink for that though. > > merlin I agree,this should be the correct path. Atri -- Regards, Atri l'apprenant
On 04/09/2012 01:25 PM, Atri Sharma wrote: > On Mon, Apr 9, 2012 at 10:15 PM, Andrew Dunstan<andrew@dunslane.net> wrote: >> >> On 04/09/2012 12:14 PM, Dave Cramer wrote: >>> >>> So I'm confused, once they link a file to an FDW can't you just read >>> it with an normal select ? >>> >>> What additional functionality will this provide ? >>> >> >> >> I'm confused about what you're confused about. Surely this won't be linking >> files to an FDW, but foreign DBMS tables, in anything you can access via >> JDBC. All you'll need on the postgres side is the relevant JDBC driver, so >> you'd have instant access via standard select queries to anything you can >> get a JDBC driver to talk to. That seems to me something worth having. >> >> I imagine it would look rather like this: >> >> CREATE FOREIGN DATA WRAPPER foodb HANDLER pljava_jdbc_handler >> OPTIONS (driver 'jdbc.foodb.org'); >> CREATE SERVER myfoodb FOREIGN DATA WRAPPER foodb OPTIONS(host >> '1.2.3.4', user 'foouser', password 'foopw'); >> CREATE FOREIGN TABLE footbl (id int, data text) SERVER myfoodb; >> SELECT * from footbl; >> >> >> cheers >> >> andrew > Hi Andrew, > > Thanks for going through my proposal and commenting on it. > > I think you have hit the nail on the head.We will be connecting the > foreign DBMS tables.The main aim of the project is to wrap JDBC so we > can connect to anything that can be reached through a JDBC URL. > > I am considering two paths for doing this: > The first one takes the help of the SPI(Server Programming Interface) > and the second one directly connects through Pl/Java and JNI(Java > Native Interface). > I'd say forget SPI - I don't think it's going to help you here. Just concentrate on getting the functionality via a PL/Java wrapper. I wouldn't worry too much about jdbc style URLs either, since logically I think you'd want to specify the connection parameters via server and FDW options as in my example above - that way it would be consistent with other FDWs. But that's a piece of bikeshedding for now. Basically, you want to implement the handler function to start with. cheers andrew
On Tue, Apr 10, 2012 at 8:42 AM, Andrew Dunstan <andrew@dunslane.net> wrote: >> I am considering two paths for doing this: >> The first one takes the help of the SPI(Server Programming Interface) >> and the second one directly connects through Pl/Java and JNI(Java >> Native Interface). >> > > I'd say forget SPI - I don't think it's going to help you here. Just > concentrate on getting the functionality via a PL/Java wrapper. I wouldn't > worry too much about jdbc style URLs either, since logically I think you'd > want to specify the connection parameters via server and FDW options as in > my example above - that way it would be consistent with other FDWs. But > that's a piece of bikeshedding for now. Basically, you want to implement the > handler function to start with. how do you cross from FDW into a pl/java routine without SPI? merlin
On 04/10/2012 09:48 AM, Merlin Moncure wrote: > On Tue, Apr 10, 2012 at 8:42 AM, Andrew Dunstan<andrew@dunslane.net> wrote: >>> I am considering two paths for doing this: >>> The first one takes the help of the SPI(Server Programming Interface) >>> and the second one directly connects through Pl/Java and JNI(Java >>> Native Interface). >>> >> I'd say forget SPI - I don't think it's going to help you here. Just >> concentrate on getting the functionality via a PL/Java wrapper. I wouldn't >> worry too much about jdbc style URLs either, since logically I think you'd >> want to specify the connection parameters via server and FDW options as in >> my example above - that way it would be consistent with other FDWs. But >> that's a piece of bikeshedding for now. Basically, you want to implement the >> handler function to start with. > how do you cross from FDW into a pl/java routine without SPI? > Add the FDW handler as a sibling function of the function call handler. At least that would be my first approach to writing a DBI::DBD FDW wrapper for plperl, which I naturally know rather better than the PL/Java code. cheers andrew
On Tue, Apr 10, 2012 at 9:15 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 04/10/2012 09:48 AM, Merlin Moncure wrote: >> >> On Tue, Apr 10, 2012 at 8:42 AM, Andrew Dunstan<andrew@dunslane.net> >> wrote: >>>> >>>> I am considering two paths for doing this: >>>> The first one takes the help of the SPI(Server Programming Interface) >>>> and the second one directly connects through Pl/Java and JNI(Java >>>> Native Interface). >>>> >>> I'd say forget SPI - I don't think it's going to help you here. Just >>> concentrate on getting the functionality via a PL/Java wrapper. I >>> wouldn't >>> worry too much about jdbc style URLs either, since logically I think >>> you'd >>> want to specify the connection parameters via server and FDW options as >>> in >>> my example above - that way it would be consistent with other FDWs. But >>> that's a piece of bikeshedding for now. Basically, you want to implement >>> the >>> handler function to start with. >> >> how do you cross from FDW into a pl/java routine without SPI? >> > > > Add the FDW handler as a sibling function of the function call handler. At > least that would be my first approach to writing a DBI::DBD FDW wrapper for > plperl, which I naturally know rather better than the PL/Java code. right -- well looking at plperl.c, I take it you are meaning to invoke (or write a similar function to) the plperl_call_handler, right? hm -- if you could do that then yes, that would work for pl/java as well and it would bypass the SPI interface. this is avoiding the public APIs, so it's going to take some time to figure out how to set up the call and walk the result since I don't know how to do that off the top of my head. merlin
On 04/10/2012 10:34 AM, Merlin Moncure wrote: > On Tue, Apr 10, 2012 at 9:15 AM, Andrew Dunstan<andrew@dunslane.net> wrote: >> >> On 04/10/2012 09:48 AM, Merlin Moncure wrote: >>> On Tue, Apr 10, 2012 at 8:42 AM, Andrew Dunstan<andrew@dunslane.net> >>> wrote: >>>>> I am considering two paths for doing this: >>>>> The first one takes the help of the SPI(Server Programming Interface) >>>>> and the second one directly connects through Pl/Java and JNI(Java >>>>> Native Interface). >>>>> >>>> I'd say forget SPI - I don't think it's going to help you here. Just >>>> concentrate on getting the functionality via a PL/Java wrapper. I >>>> wouldn't >>>> worry too much about jdbc style URLs either, since logically I think >>>> you'd >>>> want to specify the connection parameters via server and FDW options as >>>> in >>>> my example above - that way it would be consistent with other FDWs. But >>>> that's a piece of bikeshedding for now. Basically, you want to implement >>>> the >>>> handler function to start with. >>> how do you cross from FDW into a pl/java routine without SPI? >>> >> >> Add the FDW handler as a sibling function of the function call handler. At >> least that would be my first approach to writing a DBI::DBD FDW wrapper for >> plperl, which I naturally know rather better than the PL/Java code. > right -- well looking at plperl.c, I take it you are meaning to invoke > (or write a similar function to) the plperl_call_handler, right? hm > -- if you could do that then yes, that would work for pl/java as well > and it would bypass the SPI interface. this is avoiding the public > APIs, so it's going to take some time to figure out how to set up the > call and walk the result since I don't know how to do that off the top > of my head. > I don't understand what the heck you're talking about, TBH. From a user perspective there is nothing to work out. It will look like any other FDW. The implementor of the FDW handler will have to work out the glue between postgres and the JVM, but that's not going to be you, right? cheers andrew
On Tue, Apr 10, 2012 at 9:47 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > I don't understand what the heck you're talking about, TBH. From a user > perspective there is nothing to work out. It will look like any other FDW. yes, that is correct. > The implementor of the FDW handler will have to work out the glue between > postgres and the JVM, but that's not going to be you, right? Correct. I think I understand what you're driving at. Basically, pl/java is the glue. my thinking was inside the FDW callbacks to to do SPI calls to invoke the pl/java routines. Unlike other fdw implementations which mostly wrap C libraries -- which makes things very easy since you can directly jump into the routine for foreign execution -- a luxury we don't have. We have to invoke java and there are two basic ways to tie into the java runtime: one is to jump through SPI via the SQL executor. The other is JNI into the pl/java jvm which I think you were hinting was the better approach. Doing an SPI call from a FDW callback is inefficient -- that's an extra call into the executor (although you can prepare it) and you have to walk the SPI result just to build it up again in the FDW iterator. A JNI solution instead would jump into the jvm and do java invocation and I believe would drive the difficulty of this project up a couple of notches whereas a SPI approach utilizes a well documented interface. We're not stuck on the approach though -- I'm pushing Atri to get the environment set up so we can explore alternative solutions. In other words, our proposal is basically pretty similar to what you'd end up with if you wrapped dblink into a fdw making dblink calls inside the fdw over spi. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > ... We have to invoke java and there > are two basic ways to tie into the java runtime: one is to jump > through SPI via the SQL executor. The other is JNI into the pl/java > jvm which I think you were hinting was the better approach. Hm? SPI doesn't know anything about Java either. regards, tom lane
On Tue, Apr 10, 2012 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> ... We have to invoke java and there >> are two basic ways to tie into the java runtime: one is to jump >> through SPI via the SQL executor. The other is JNI into the pl/java >> jvm which I think you were hinting was the better approach. > > Hm? SPI doesn't know anything about Java either. > > regards, tom lane Having pl/java as a dependancy here makes this a very complex "feature" to setup. The potential benefits are quite minimal since almost any decent ETL tool can handle multiple data sources Dave
On Tue, Apr 10, 2012 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> ... We have to invoke java and there >> are two basic ways to tie into the java runtime: one is to jump >> through SPI via the SQL executor. The other is JNI into the pl/java >> jvm which I think you were hinting was the better approach. > > Hm? SPI doesn't know anything about Java either. > > regards, tom lane Hi Tom, We plan to call SQL through SPI from the FDW,which in turn would call the Pl/Java routine. We are working on the JNI Invocation API approach also.We will be currently researching the differences between the two approaches(SPI and JNI). Atri -- Regards, Atri l'apprenant
Atri Sharma <atri.jiit@gmail.com> writes: > On Tue, Apr 10, 2012 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hm? SPI doesn't know anything about Java either. > We plan to call SQL through SPI from the FDW,which in turn would call > the Pl/Java routine. If you're saying that every Java function that the FDW needs would have to be exposed as a SQL function, that seems like a pretty high-risk (not to mention low performance) approach. Not only do you have to design a SQL representation for every datatype you need, but you have to be sure that you do not have any security holes arising from unscrupulous users calling those SQL functions manually with arguments of their choosing. regards, tom lane
On 04/10/2012 11:36 AM, Tom Lane wrote: > Atri Sharma<atri.jiit@gmail.com> writes: >> On Tue, Apr 10, 2012 at 8:55 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Hm? SPI doesn't know anything about Java either. >> We plan to call SQL through SPI from the FDW,which in turn would call >> the Pl/Java routine. > If you're saying that every Java function that the FDW needs would have > to be exposed as a SQL function, that seems like a pretty high-risk > (not to mention low performance) approach. Not only do you have to > design a SQL representation for every datatype you need, but you have to > be sure that you do not have any security holes arising from > unscrupulous users calling those SQL functions manually with arguments > of their choosing. > > Yeah. I think this design is horribly baroque and unnecessary. SPI is for talking SQL. It's completely in the way of a straight-forward implementation of this feature IMNSHO. cheers andrew
On Tue, Apr 10, 2012 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Atri Sharma <atri.jiit@gmail.com> writes: >> On Tue, Apr 10, 2012 at 8:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Hm? SPI doesn't know anything about Java either. > >> We plan to call SQL through SPI from the FDW,which in turn would call >> the Pl/Java routine. > > If you're saying that every Java function that the FDW needs would have > to be exposed as a SQL function, that seems like a pretty high-risk > (not to mention low performance) approach. Not only do you have to > design a SQL representation for every datatype you need, but you have to > be sure that you do not have any security holes arising from > unscrupulous users calling those SQL functions manually with arguments > of their choosing. Hm, well, for data type representation, an 'all text' representation would avoid that requirement (although could certainly add it back in later for performance reasons). That's not all that different from what the other fdw projects are doing -- mostly wrapping BuildTupleFromCStrings and such. But totally agree that for top performance you'd need direct native transfer. I'm in the 'perfect is the enemy of the good' mindset here. I think the security argument is mostly bogus -- pl/java is already well into the untrusted side of things and I was figuring being able to bypass the fdw layer and invoke the functions dblink style was a feature, not a bug. But adding up all the comments I see healthy skepticism that running through SPI is the proper approach and it is noted. So the way forward is a more direct hook to the jvm or to go back to the drawing board I suppose. I agree that JNI isn't required -- we're going to have to study the pl/java system a bit to determine the best way to hook in. This could end up getting us into the 'biting of more than can chew' territory admittedly, but Atri is enthusiastic and wants to give it a go. Additionally, Dave is skeptical that pl/java dependency is a good foundation for a generally useful library. I'm not buying that -- pl/java is the 'best of class' for implementing java inside the database that I'm aware of. I see absolutely no reason why it couldn't be packaged as an extension -- the project is a bit dusty and needs some TLC but does what it does very well. I also respectfully disagree that the presence of high quality ETL engines eliminate the usefulness of a direct database to database transfer mechanism. I personally never go the ETL route when I can just dblink the data across and do the massaging in SQL. Other developers may think differently of course. Of course, if there was a good way to implement jdbc/fdw without using pl/java that would be good to know. merlin
On Tue, Apr 10, 2012 at 11:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > I agree that JNI isn't required -- we're going to > have to study the pl/java system a bit to determine the best way to > hook in. This could end up getting us into the 'biting of more than > can chew' territory admittedly, but Atri is enthusiastic and wants to > give it a go. Well. maybe I spoke too soon...JNI is probably the best route. Since SPI is off the table, all we're really pulling in from pl/java is the (non-trivial) proper installation of a jvm into a postgres process. pl/java is essentially a wrapper to JNI that does postgres to jni type conversion and builds a database driven class lookup system that allows building the java environment inside the database. That part is not needed at all; since we're not going to install our routines as pl/java installed they will be installed into a separate and more typical class path from a jvm point of view. There are no exposed routines in pljava.so that allow for easy manual invocation of a java routine -- it's all pretty much hardwired to the language function call handler system. This begs the question about why pl/java should be used at all. Well, as I said the jvm insertion into postgres is non-trivial so that alone is worth something. Maybe though we should just rip out the jvm guts that we really need (mostly some parts of backend.c and some of the type files) and do a completely standalone jni wrapper. pl/java's build system is a mess anyways so we're not losing all that much in trying to go off on our own and do something. OTOH, throwing a build system on top of the growing pile of things to do is turning this into a daunting project vs the (admittedly cobbled together) approach I was hoping to do earlier. Anyways, it's really Atri's call if he's comfortable proceeding. merlin
>> I agree that JNI isn't required -- we're going to >> have to study the pl/java system a bit to determine the best way to >> hook in. This could end up getting us into the 'biting of more than >> can chew' territory admittedly, but Atri is enthusiastic and wants to >> give it a go. > >Well. maybe I spoke too soon...JNI is probably the best route. Since >SPI is off the table, all we're really pulling in from pl/java is the >(non-trivial) proper installation of a jvm into a postgres process. >pl/java is essentially a wrapper to JNI that does postgres to jni type >conversion and builds a database driven class lookup system that >allows building the java environment inside the database. That part >is not needed at all; since we're not going to install our routines as >pl/java installed they will be installed into a separate and more >typical class path from a jvm point of view. There are no exposed >routines in pljava.so that allow for easy manual invocation of a java >routine -- it's all pretty much hardwired to the language function >call handler system. > >This begs the question about why pl/java should be used at all. Well, >as I said the jvm insertion into postgres is non-trivial so that alone >is worth something. Maybe though we should just rip out the jvm guts >that we really need (mostly some parts of backend.c and some of the >type files) and do a completely standalone jni wrapper. pl/java's >build system is a mess anyways so we're not losing all that much in >trying to go off on our own and do something. OTOH, throwing a build >system on top of the growing pile of things to do is turning this into >a daunting project vs the (admittedly cobbled together) approach I was >hoping to do earlier. Anyways, it's really Atri's call if he's >comfortable proceeding. Hi All, I think we are back on the initial approach I proposed(hooking directly into the JVM and executing Java code that calls JDBC).I think the best way to do this is create a JVM that executes the Java code and give the control of the JVM to the native API. I agree,the only need of Pl/Java that is apparent here is the need of the Java internals(JDK et al).If we set them up independently,then,we can have the FDW wrapping JDBC directly through JNI.JNI would call pure Java functions to connect to the JDBC. I think we can proceed with this.Once we are done with the API calling Java functions,I think the rest of the path is easily mapped(writing Java functions to connect to JDBC). Please let me know your opinions on this. Atri
On 04/10/12 8:41 PM, Atri Sharma wrote: > I agree,the only need of Pl/Java that is apparent here is the need of the > Java internals(JDK et al).If we set them up independently,then,we can have > the FDW wrapping JDBC directly through JNI.JNI would call pure Java > functions to connect to the JDBC. PL/Java also provides for passing SQL arguments in various data types to/from your java code. not usinig pl/java would mean you'd have to reinvent all those wheels, they exist now in a standard mannner, why not use them? as I see it, your proposal could be implemented as a set of pljava functions callable from SQL that would in turn invoke JDBC to connect to and fetch data from your foreign database, then return it to the calling SQL procedure, much as dblink does now for pg to pg database links. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
Hi John, Yes,I agree,that can be done,but we had an extensive discussion on it yesterday and Andrew and Tom believe that would pose serious security issues as any malicious user can change the arguments sent to the SQL and cause problems. Atri -----Original Message----- From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of John R Pierce Sent: 11 April 2012 10:00 To: pgsql-jdbc@postgresql.org Subject: Re: [HACKERS] [JDBC] Regarding GSoc Application On 04/10/12 8:41 PM, Atri Sharma wrote: > I agree,the only need of Pl/Java that is apparent here is the need of the > Java internals(JDK et al).If we set them up independently,then,we can have > the FDW wrapping JDBC directly through JNI.JNI would call pure Java > functions to connect to the JDBC. PL/Java also provides for passing SQL arguments in various data types to/from your java code. not usinig pl/java would mean you'd have to reinvent all those wheels, they exist now in a standard mannner, why not use them? as I see it, your proposal could be implemented as a set of pljava functions callable from SQL that would in turn invoke JDBC to connect to and fetch data from your foreign database, then return it to the calling SQL procedure, much as dblink does now for pg to pg database links.
On 04/10/12 9:36 PM, Atri Sharma wrote: > Hi John, > > Yes,I agree,that can be done,but we had an extensive discussion on it > yesterday and Andrew and Tom believe that would pose serious security issues > as any malicious user can change the arguments sent to the SQL and cause > problems. > I'm not sure what "change the arguments sent to SQL" means. A malicious user with sufficient privileges can do all sorts of damage, and there's not much much you can do about it short of not letting malicious users have privileges. your foreign data wrapper code should probably require that the user who creates a FDW connection to an external database have adequate permissions. the foreign database servr already has its own authentication hoops tha this FDW user will have to provide. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
>I'm not sure what "change the arguments sent to SQL" means. A >malicious user with sufficient privileges can do all sorts of damage, >and there's not much much you can do about it short of not letting >malicious users have privileges. > >your foreign data wrapper code should probably require that the user who >creates a FDW connection to an external database have adequate >permissions. the foreign database servr already has its own >authentication hoops tha this FDW user will have to provide. Hi John, Please find the reply from Tom on the idea of SQL-Pl/Java calls: This was what I wrote: > We plan to call SQL through SPI from the FDW,which in turn would call > the Pl/Java routine. This was what Tom replied: ******* If you're saying that every Java function that the FDW needs would have to be exposed as a SQL function, that seems like a pretty high-risk (not to mention low performance) approach. Not only do you have to design a SQL representation for every datatype you need, but you have to be sure that you do not have any security holes arising from unscrupulous users calling those SQL functions manually with arguments of their choosing. ************ Please let me know your opinion on this. Atri
On 04/10/12 10:44 PM, Atri Sharma wrote: > We plan to call SQL through SPI from the FDW,which in turn would call > > the Pl/Java routine. that seems overly complex. SPI is an internal mechanism used by C functions that are called from SQL why not just implement it as.... user application -> SQL -> PLjava FDW functions -> pljava code -> JDBC -> foreign database (and of course, back to return the data from said foreign database) this would look to the user very much like dblink looks now, and have similar security issues and mitigations. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
> >that seems overly complex. SPI is an internal mechanism used by C >functions that are called from SQL > >why not just implement it as.... > >user application -> SQL -> PLjava FDW functions -> pljava code -> JDBC >-> foreign database > >(and of course, back to return the data from said foreign database) > Hi John, Hmm.That sounds good,but where does the FDW come in?Will the user application interact with the FDW,and FDW handle the SQL calls and everything else from there i.e. FDW wraps the entire process,right? Also,will it solve Tom's concern? Atri
On Tue, Apr 10, 2012 at 8:41 PM, Atri Sharma <atri.jiit@gmail.com> wrote: > Hi All, > > I think we are back on the initial approach I proposed(hooking directly into > the JVM and executing Java code that calls JDBC).I think the best way to do > this is create a JVM that executes the Java code and give the control of the > JVM to the native API. > > I agree,the only need of Pl/Java that is apparent here is the need of the > Java internals(JDK et al).If we set them up independently,then,we can have > the FDW wrapping JDBC directly through JNI.JNI would call pure Java > functions to connect to the JDBC. > > I think we can proceed with this.Once we are done with the API calling Java > functions,I think the rest of the path is easily mapped(writing Java > functions to connect to JDBC). > > Please let me know your opinions on this. > I think Multicorn is a good example, which invokes Python from FDW routines though it is not using PL/Python. http://multicorn.org/ Thanks, -- Hitoshi Harada
On 04/10/12 11:09 PM, Atri Sharma wrote: >> why not just implement it as.... >> > >> >user application -> SQL -> PLjava FDW functions -> pljava code -> JDBC >> >-> foreign database >> > >> >(and of course, back to return the data from said foreign database) >> > > Hi John, > > Hmm.That sounds good,but where does the FDW come in?Will the user > application interact with the FDW,and FDW handle the SQL calls and > everything else from there i.e. FDW wraps the entire process,right? see dblink ofr an example API http://www.postgresql.org/docs/current/static/dblink.html the 'pljava fdw functions' are the publicly visible pljava functions implementing your api... while my 'pljava code' are additional functions/methods you use internally that aren't visible externally -- john r pierce N 37, W 122 santa cruz ca mid-left coast
> >I think Multicorn is a good example, which invokes Python from FDW >routines though it is not using PL/Python. > >http://multicorn.org/ > > Hi Hitoshi, Thanks for the link. You mean,I should try to build something like Multicorn for Java? Atri
>see dblink ofr an example API >http://www.postgresql.org/docs/current/static/dblink.html > >the 'pljava fdw functions' are the publicly visible pljava functions >implementing your api... while my 'pljava code' are additional >functions/methods you use internally that aren't visible externally > > Hi John, Thanks for the link and advice. Ok,now I think I got your point.The Pl/Java FDW functions are what are used for the public FDW API....while Pl/Java code actually does the job,but it is hidden from the user.The user actually never gets to talk to the SQL part of the FDW,so he cannot cause much harm,right? If my understanding is correct,I think it will be a sweet solution. Please let me know your opinion. Atri
On Tue, Apr 10, 2012 at 10:41 PM, Atri Sharma <atri.jiit@gmail.com> wrote: >>Well. maybe I spoke too soon...JNI is probably the best route. Since >>SPI is off the table, all we're really pulling in from pl/java is the >>(non-trivial) proper installation of a jvm into a postgres process. >>pl/java is essentially a wrapper to JNI that does postgres to jni type >>conversion and builds a database driven class lookup system that >>allows building the java environment inside the database. That part >>is not needed at all; since we're not going to install our routines as >>pl/java installed they will be installed into a separate and more >>typical class path from a jvm point of view. There are no exposed >>routines in pljava.so that allow for easy manual invocation of a java >>routine -- it's all pretty much hardwired to the language function >>call handler system. >> >>This begs the question about why pl/java should be used at all. Well, >>as I said the jvm insertion into postgres is non-trivial so that alone >>is worth something. Maybe though we should just rip out the jvm guts >>that we really need (mostly some parts of backend.c and some of the >>type files) and do a completely standalone jni wrapper. pl/java's >>build system is a mess anyways so we're not losing all that much in >>trying to go off on our own and do something. OTOH, throwing a build >>system on top of the growing pile of things to do is turning this into >>a daunting project vs the (admittedly cobbled together) approach I was >>hoping to do earlier. Anyways, it's really Atri's call if he's >>comfortable proceeding. > > Hi All, > > I think we are back on the initial approach I proposed(hooking directly into > the JVM and executing Java code that calls JDBC).I think the best way to do > this is create a JVM that executes the Java code and give the control of the > JVM to the native API. > > I agree,the only need of Pl/Java that is apparent here is the need of the > Java internals(JDK et al).If we set them up independently,then,we can have > the FDW wrapping JDBC directly through JNI.JNI would call pure Java > functions to connect to the JDBC. > > I think we can proceed with this.Once we are done with the API calling Java > functions,I think the rest of the path is easily mapped(writing Java > functions to connect to JDBC). yeah -- it sound plausible. I think the next step is to pull one of the fdw projects that is working and currently builds with pgxs. Make sure it is bsd licensed and that you retain the original copyright in the code (feel free to add your own). The mysql_fdw project for example is a good candidate. After that we need to strip out all the mysql specific parts so that we have a basic skeleton of the fdw wrapper that still builds. From there we will need to build in the jni calls as well as jvm initialization code we are going to more or less directly copy from pl/java, but one thing at a time: IMO, getting the jvm installed and executing a 'hello world' jni is a good milestone to reach before getting into all the mechanics of the remote querying. The jvm initialization code in pl/java we are going to use is in src/C/pljava/backend.c. A lot of the initialization work done there is irrelevant to what we are trying to do but there are some very important bits like the on_proc_exit handler that does cleanup when the postgres process exits. Ideally we can drop just the stuff we need from there and get our project to compile. merlin
Hi all, In continuation to the discussion regarding my JDBC wrapping FDW,I have been talking to members of the community and I have two approaches on which I would request your suggestions and opinions: >I think we are back on the initial approach I proposed(hooking directly into the JVM and executing Java code that calls JDBC). >I think the best way to do this is create a JVM that executes the Java code and give the control of the JVM to the native API. >I agree,the only need of Pl/Java that is apparent here is the need of the Java internals(JDK et al).If we set them up >independently,then,we can have the FDW wrapping JDBC directly through JNI.JNI would call pure Java functions to connect to the >JDBC. In the above context,I think if we directly wrap JNI and call pure Java routines(not using Pl/Java codes directly),we can build something on the lines of Multicorn(a nice suggestion by Hitoshi) http://multicorn.org/ On the other hand,as suggested by John,we can use the Pl/Java routines that already exist to build our FDW's API.This shall be the flow: user application -> SQL -> PLjava FDW functions -> pljava code -> JDBC -> foreign database The user will have to pass security levels and permissions before being able to get to the FDW.This would look to the user very much like dblink looks now, and have similar security issues and mitigations. Please let me know how to proceed further. Atri
On Tue, Apr 10, 2012 at 10:41 PM, Atri Sharma <atri.jiit@gmail.com> wrote: >>>Well. maybe I spoke too soon...JNI is probably the best route. Since >>>SPI is off the table, all we're really pulling in from pl/java is the >>>(non-trivial) proper installation of a jvm into a postgres process. >>>pl/java is essentially a wrapper to JNI that does postgres to jni type >>>conversion and builds a database driven class lookup system that >>>allows building the java environment inside the database. That part >>>is not needed at all; since we're not going to install our routines as >>>pl/java installed they will be installed into a separate and more >>>typical class path from a jvm point of view. There are no exposed >>>routines in pljava.so that allow for easy manual invocation of a java >>>routine -- it's all pretty much hardwired to the language function >>>call handler system. >>> >>>This begs the question about why pl/java should be used at all. Well, >>>as I said the jvm insertion into postgres is non-trivial so that alone >>>is worth something. Maybe though we should just rip out the jvm guts >>>that we really need (mostly some parts of backend.c and some of the >>>type files) and do a completely standalone jni wrapper. pl/java's >>>build system is a mess anyways so we're not losing all that much in >>>trying to go off on our own and do something. OTOH, throwing a build >>>system on top of the growing pile of things to do is turning this into >>>a daunting project vs the (admittedly cobbled together) approach I was >>>hoping to do earlier. Anyways, it's really Atri's call if he's >>>comfortable proceeding. >> >> Hi All, >> >> I think we are back on the initial approach I proposed(hooking directly into >> the JVM and executing Java code that calls JDBC).I think the best way to do >> this is create a JVM that executes the Java code and give the control of the >> JVM to the native API. >> >> I agree,the only need of Pl/Java that is apparent here is the need of the >> Java internals(JDK et al).If we set them up independently,then,we can have >> the FDW wrapping JDBC directly through JNI.JNI would call pure Java >> functions to connect to the JDBC. >> >> I think we can proceed with this.Once we are done with the API calling Java >> functions,I think the rest of the path is easily mapped(writing Java >> functions to connect to JDBC). > >yeah -- it sound plausible. I think the next step is to pull one of >the fdw projects that is working and currently builds with pgxs. Make >sure it is bsd licensed and that you retain the original copyright in >the code (feel free to add your own). The mysql_fdw project for >example is a good candidate. After that we need to strip out all the >mysql specific parts so that we have a basic skeleton of the fdw >wrapper that still builds. From there we will need to build in the >jni calls as well as jvm initialization code we are going to more or >less directly copy from pl/java, but one thing at a time: IMO, >getting the jvm installed and executing a 'hello world' jni is a good >milestone to reach before getting into all the mechanics of the remote >querying. > >The jvm initialization code in pl/java we are going to use is in >src/C/pljava/backend.c. A lot of the initialization work done there >is irrelevant to what we are trying to do but there are some very >important bits like the on_proc_exit handler that does cleanup when >the postgres process exits. Ideally we can drop just the stuff we >need from there and get our project to compile. > I agree,atm I'll work on getting JNI set up and to build a JVM using it and getting the "Hello,World" running in it in PostGreSQl. I'll keep you posted, Atri