Re: new procedural language - PL/R - Mailing list pgsql-hackers

From Joe Conway
Subject Re: new procedural language - PL/R
Date
Msg-id 3E3EE17A.4030107@joeconway.com
Whole thread Raw
In response to Re: new procedural language - PL/R  (cbbrowne@cbbrowne.com)
List pgsql-hackers
cbbrowne@cbbrowne.com wrote:
> What might be "nifty" would be to have some mappings that did Clever
> Transformations of Queries Into Views, particularly if that allowed
> harnessing the DBMS to do some of the statistical analysis behind your
> back...

I'm not quite sure what you mean here, but it does support pulling data into 
the R interpreter as a "data.frame" via SPI, and returning R 
matricies/vectors/data.frames as either Postgres arrays or as rows and columns 
of a table function. Here's two contrived, but illustrative, examples:

create or replace function test_dtup() returns record as 
'data.frame(letters[1:10],1:10)' language 'plr';
select * from test_dtup() as t(f1 text, f2 int); f1 | f2
----+---- a  |  1 b  |  2 c  |  3 d  |  4 e  |  5 f  |  6 g  |  7 h  |  8 i  |  9 j  | 10
(10 rows)

create or replace function test_spi_tup(text) returns record as 
'pg.spi.exec(arg1)' language 'plr';
select * from test_spi_tup('select oid, typname from pg_type where typname = 
''oid'' or typname = ''text''') as t(typeid oid, typename name); typeid | typename
--------+----------     25 | text     26 | oid
(2 rows)

You could easily perform a parameterized query via SPI, retrieve the results 
into an R data.frame, do some statistical manipulations, and then return the 
results as a table function. The table function itself could be wrapped in a 
view to hide the whole thing from the end-user.

You can also create custom aggregates. There has been at least one thread not 
too long ago regarding an aggregate to calculate median, for instance. Here it 
is in plr:

create table foo(f1 text, f2 float8);
insert into foo values('cat1',1.21);
insert into foo values('cat1',1.24);
insert into foo values('cat1',1.18);
insert into foo values('cat1',1.26);
insert into foo values('cat1',1.15);
insert into foo values('cat2',1.15);
insert into foo values('cat2',1.26);
insert into foo values('cat2',1.32);
insert into foo values('cat2',1.30);
create or replace function r_median(_float8) returns float as 'median(arg1)' 
language 'plr';
CREATE AGGREGATE median (sfunc = array_accum, basetype = float8, stype = 
_float8, finalfunc = r_median);
select f1, median(f2) from foo group by f1 order by f1;  f1  | median
------+-------- cat1 |   1.21 cat2 |   1.28
(2 rows)

It's not as fast as the native PostgreSQL functions if you just need average 
or standard deviation, but it's alot easier and faster than writing your own 
for something more out-of-the-ordinary.

Joe



pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: [mail] Re: Windows Build System
Next
From: Tom Lane
Date:
Subject: Re: Win32 Powerfail testing - results