Thread: new procedural language - PL/R
I'm nearing completion of a new procedural language, PL/R. It provides an interface to the R Statistical Computing language. R is similar to the commercial package S-Plus; for more on R see: http://www.r-project.org/ Here is the first paragraph of their intro: "R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R." Before I post the source somewhere, I have a few questions: 1) R itself is under GPL, and as far as I can tell the shared library libR.so is also under GPL, not LGPL. I assume that means I need to release plr under GPL -- does that sound correct? 2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and assuming plr is released under GPL, is there any chance that it would be accepted into src/pl or contrib, or should I start a gborg project (I'd prefer if it could at least live in contrib)? If I am somehow able to release it under a BSD license, would that change the answer (if so, I'll at least ask the r-devel list about LGPL on the shared library)? 3) The only major feature not yet developed is the ability to handle triggers. Any strong feelings on whether this is necessary for a first release? I see that pl/perl doesn't handle triggers. It seems like using a plpgsql trigger to call a plr function is a reasonable workaround. Thanks for any thoughts or comments. Joe
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and >>assuming plr is released under GPL, is there any chance that it would be >>accepted into src/pl or contrib, or should I start a gborg project (I'd prefer >>if it could at least live in contrib)? > > I think we'd have to insist on gborg. The only reason there are any > non-BSD-license items left in contrib is that I haven't finished my TODO > item to get their licenses changed or remove 'em. Thanks for the confirmation. That's what I suspected. >>If I am somehow able to release it >>under a BSD license, would that change the answer (if so, I'll at least ask >>the r-devel list about LGPL on the shared library)? > > BSD would be good. I agree that it'll be a pain in the neck to > maintain a PL that is not in the main tree, so I'd support accepting it > if we can get the license right. OK -- I'll see what they have to say about it over on r-devel. >>3) The only major feature not yet developed is the ability to handle triggers. >>Any strong feelings on whether this is necessary for a first release? > > No. I'm not sure you'd really need triggers written in R ever ;-) Yeah, that's what I figured too. Thanks for the feedback! Joe
> >>Any strong feelings on whether this is necessary for a first release? > > > > No. I'm not sure you'd really need triggers written in R ever ;-) > > Yeah, that's what I figured too. Indeed. R sounds like it might be an interesting platform from which to do "data mining," and in that sort of context, you're almost exclusively reading data, so the loss of triggers would be of no great importance. What might be "nifty" would be to have some mappings that did Clever Transformations of Queries Into Views, particularly if that allowed harnessing the DBMS to do some of the statistical analysis behind your back... -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://www3.sympatico.ca/cbbrowne/rdbms.html "After you've heard two eyewitness accounts of an auto accident it makes you wonder about history." -- Bits and Pieces
cbbrowne@cbbrowne.com wrote: > What might be "nifty" would be to have some mappings that did Clever > Transformations of Queries Into Views, particularly if that allowed > harnessing the DBMS to do some of the statistical analysis behind your > back... I'm not quite sure what you mean here, but it does support pulling data into the R interpreter as a "data.frame" via SPI, and returning R matricies/vectors/data.frames as either Postgres arrays or as rows and columns of a table function. Here's two contrived, but illustrative, examples: create or replace function test_dtup() returns record as 'data.frame(letters[1:10],1:10)' language 'plr'; select * from test_dtup() as t(f1 text, f2 int); f1 | f2 ----+---- a | 1 b | 2 c | 3 d | 4 e | 5 f | 6 g | 7 h | 8 i | 9 j | 10 (10 rows) create or replace function test_spi_tup(text) returns record as 'pg.spi.exec(arg1)' language 'plr'; select * from test_spi_tup('select oid, typname from pg_type where typname = ''oid'' or typname = ''text''') as t(typeid oid, typename name); typeid | typename --------+---------- 25 | text 26 | oid (2 rows) You could easily perform a parameterized query via SPI, retrieve the results into an R data.frame, do some statistical manipulations, and then return the results as a table function. The table function itself could be wrapped in a view to hide the whole thing from the end-user. You can also create custom aggregates. There has been at least one thread not too long ago regarding an aggregate to calculate median, for instance. Here it is in plr: create table foo(f1 text, f2 float8); insert into foo values('cat1',1.21); insert into foo values('cat1',1.24); insert into foo values('cat1',1.18); insert into foo values('cat1',1.26); insert into foo values('cat1',1.15); insert into foo values('cat2',1.15); insert into foo values('cat2',1.26); insert into foo values('cat2',1.32); insert into foo values('cat2',1.30); create or replace function r_median(_float8) returns float as 'median(arg1)' language 'plr'; CREATE AGGREGATE median (sfunc = array_accum, basetype = float8, stype = _float8, finalfunc = r_median); select f1, median(f2) from foo group by f1 order by f1; f1 | median ------+-------- cat1 | 1.21 cat2 | 1.28 (2 rows) It's not as fast as the native PostgreSQL functions if you just need average or standard deviation, but it's alot easier and faster than writing your own for something more out-of-the-ordinary. Joe
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and >>assuming plr is released under GPL, is there any chance that it would be >>accepted into src/pl or contrib, or should I start a gborg project (I'd prefer >>if it could at least live in contrib)? > > I think we'd have to insist on gborg. The only reason there are any > non-BSD-license items left in contrib is that I haven't finished my TODO > item to get their licenses changed or remove 'em. [...snip...] > BSD would be good. I agree that it'll be a pain in the neck to > maintain a PL that is not in the main tree, so I'd support accepting it > if we can get the license right. I finally got a response from one of the core R developers: "libR is GPL-ed, and that is unlikely to change" -- so I guess gborg it is :-( (not that I have anything against gborg ;-)) Before making any release announcements, I'd be interested in feedback if anyone feels so inclined. The source is currently available here: http://www.joeconway.com/plr/plr.0.1.1.alpha.tar.gz The documentation, including preprocessed html, is in the tar ball. I've also posted the html docs here: http://www.joeconway.com/plr/index.html From the README (more or less): ------------------------------- Installation: Place tar file in 'contrib' in the PostgreSQL source tree and untar. Then run: make make install make installcheck You can use plr.sql to create the language and functions in your database of choice, e.g. psql mydatabase < plr.sql ------------------------------- In addition to the documentation, the plr.out file in plr/expected is a good source of usage examples. PL/R should build cleanly with PostgreSQL 7.3.x and cvs HEAD. It was developed using libR from R 1.6.2 under Red Hat 7.3 & 8.0 -- I've not tested against other versions of R or different OSes. Please let me know how it goes. Thanks, Joe