Thread: new procedural language - PL/R

new procedural language - PL/R

From
Joe Conway
Date:
I'm nearing completion of a new procedural language, PL/R. It provides an 
interface to the R Statistical Computing language. R is similar to the 
commercial package S-Plus; for more on R see:  http://www.r-project.org/

Here is the first paragraph of their intro:
"R is a language and environment for statistical computing and graphics. It is 
a GNU project which is similar to the S language and environment which was 
developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by 
John Chambers and colleagues. R can be considered as a different 
implementation of S. There are some important differences, but much code 
written for S runs unaltered under R."

Before I post the source somewhere, I have a few questions:

1) R itself is under GPL, and as far as I can tell the shared library libR.so 
is also under GPL, not LGPL. I assume that means I need to release plr under 
GPL -- does that sound correct?

2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and 
assuming plr is released under GPL, is there any chance that it would be 
accepted into src/pl or contrib, or should I start a gborg project (I'd prefer 
if it could at least live in contrib)? If I am somehow able to release it 
under a BSD license, would that change the answer (if so, I'll at least ask 
the r-devel list about LGPL on the shared library)?

3) The only major feature not yet developed is the ability to handle triggers. 
Any strong feelings on whether this is necessary for a first release? I see 
that pl/perl doesn't handle triggers. It seems like using a plpgsql trigger to 
call a plr function is a reasonable workaround.

Thanks for any thoughts or comments.

Joe



Re: new procedural language - PL/R

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and 
>>assuming plr is released under GPL, is there any chance that it would be 
>>accepted into src/pl or contrib, or should I start a gborg project (I'd prefer 
>>if it could at least live in contrib)?
> 
> I think we'd have to insist on gborg.  The only reason there are any
> non-BSD-license items left in contrib is that I haven't finished my TODO
> item to get their licenses changed or remove 'em.

Thanks for the confirmation. That's what I suspected.

>>If I am somehow able to release it 
>>under a BSD license, would that change the answer (if so, I'll at least ask 
>>the r-devel list about LGPL on the shared library)?
> 
> BSD would be good.  I agree that it'll be a pain in the neck to
> maintain a PL that is not in the main tree, so I'd support accepting it
> if we can get the license right.

OK -- I'll see what they have to say about it over on r-devel.


>>3) The only major feature not yet developed is the ability to handle triggers. 
>>Any strong feelings on whether this is necessary for a first release?
> 
> No.  I'm not sure you'd really need triggers written in R ever ;-)

Yeah, that's what I figured too.

Thanks for the feedback!

Joe




Re: new procedural language - PL/R

From
cbbrowne@cbbrowne.com
Date:
> >>Any strong feelings on whether this is necessary for a first release?
> > 
> > No.  I'm not sure you'd really need triggers written in R ever ;-)
> 
> Yeah, that's what I figured too.

Indeed.  R sounds like it might be an interesting platform from which to
do "data mining," and in that sort of context, you're almost exclusively
reading data, so the loss of triggers would be of no great importance.

What might be "nifty" would be to have some mappings that did Clever
Transformations of Queries Into Views, particularly if that allowed
harnessing the DBMS to do some of the statistical analysis behind your
back...
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://www3.sympatico.ca/cbbrowne/rdbms.html
"After you've  heard two  eyewitness accounts of  an auto  accident it
makes you wonder about history." -- Bits and Pieces


Re: new procedural language - PL/R

From
Joe Conway
Date:
cbbrowne@cbbrowne.com wrote:
> What might be "nifty" would be to have some mappings that did Clever
> Transformations of Queries Into Views, particularly if that allowed
> harnessing the DBMS to do some of the statistical analysis behind your
> back...

I'm not quite sure what you mean here, but it does support pulling data into 
the R interpreter as a "data.frame" via SPI, and returning R 
matricies/vectors/data.frames as either Postgres arrays or as rows and columns 
of a table function. Here's two contrived, but illustrative, examples:

create or replace function test_dtup() returns record as 
'data.frame(letters[1:10],1:10)' language 'plr';
select * from test_dtup() as t(f1 text, f2 int); f1 | f2
----+---- a  |  1 b  |  2 c  |  3 d  |  4 e  |  5 f  |  6 g  |  7 h  |  8 i  |  9 j  | 10
(10 rows)

create or replace function test_spi_tup(text) returns record as 
'pg.spi.exec(arg1)' language 'plr';
select * from test_spi_tup('select oid, typname from pg_type where typname = 
''oid'' or typname = ''text''') as t(typeid oid, typename name); typeid | typename
--------+----------     25 | text     26 | oid
(2 rows)

You could easily perform a parameterized query via SPI, retrieve the results 
into an R data.frame, do some statistical manipulations, and then return the 
results as a table function. The table function itself could be wrapped in a 
view to hide the whole thing from the end-user.

You can also create custom aggregates. There has been at least one thread not 
too long ago regarding an aggregate to calculate median, for instance. Here it 
is in plr:

create table foo(f1 text, f2 float8);
insert into foo values('cat1',1.21);
insert into foo values('cat1',1.24);
insert into foo values('cat1',1.18);
insert into foo values('cat1',1.26);
insert into foo values('cat1',1.15);
insert into foo values('cat2',1.15);
insert into foo values('cat2',1.26);
insert into foo values('cat2',1.32);
insert into foo values('cat2',1.30);
create or replace function r_median(_float8) returns float as 'median(arg1)' 
language 'plr';
CREATE AGGREGATE median (sfunc = array_accum, basetype = float8, stype = 
_float8, finalfunc = r_median);
select f1, median(f2) from foo group by f1 order by f1;  f1  | median
------+-------- cat1 |   1.21 cat2 |   1.28
(2 rows)

It's not as fast as the native PostgreSQL functions if you just need average 
or standard deviation, but it's alot easier and faster than writing your own 
for something more out-of-the-ordinary.

Joe



Re: new procedural language - PL/R

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and 
>>assuming plr is released under GPL, is there any chance that it would be 
>>accepted into src/pl or contrib, or should I start a gborg project (I'd prefer 
>>if it could at least live in contrib)?
> 
> I think we'd have to insist on gborg.  The only reason there are any
> non-BSD-license items left in contrib is that I haven't finished my TODO
> item to get their licenses changed or remove 'em.

[...snip...]

> BSD would be good.  I agree that it'll be a pain in the neck to
> maintain a PL that is not in the main tree, so I'd support accepting it
> if we can get the license right.

I finally got a response from one of the core R developers: "libR is 
GPL-ed, and that is unlikely to change" -- so I guess gborg it is :-(
(not that I have anything against gborg ;-))

Before making any release announcements, I'd be interested in feedback 
if anyone feels so inclined. The source is currently available here:
http://www.joeconway.com/plr/plr.0.1.1.alpha.tar.gz

The documentation, including preprocessed html, is in the tar ball. I've 
also posted the html docs here:  http://www.joeconway.com/plr/index.html
From the README (more or less):
-------------------------------
Installation:  Place tar file in 'contrib' in the PostgreSQL source tree and untar.  Then run:
    make    make install    make installcheck
  You can use plr.sql to create the language and functions in your  database of choice, e.g.
    psql mydatabase < plr.sql
-------------------------------

In addition to the documentation, the plr.out file in plr/expected is a 
good source of usage examples.

PL/R should build cleanly with PostgreSQL 7.3.x and cvs HEAD. It was 
developed using libR from R 1.6.2 under Red Hat 7.3 & 8.0 -- I've not 
tested against other versions of R or different OSes.

Please let me know how it goes.

Thanks,

Joe