Thread: PL/R etc.
I don't know whether anybody active on the list has R (and in particular PL/R) experience, but just in case... :-)

i) Something like APL can operate on an array with minimal regard for index order, i.e. operations across the array are as easily expressed and as efficient as operations down the array. Does this apply to PL/R?

ii) Things like OpenOffice can be very inefficient if operating over a table comprising a non-trivial number of rows. Does PL/R offer a significant improvement, e.g. by using a cursor rather than trying to read an entire resultset into memory?

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd <markMLl.pgsql-general@telemetry.co.uk> wrote:
> ii) Things like OpenOffice can be very inefficient if operating over a
> table comprising a non-trivial number of rows. Does PL/R offer a significant
> improvement, e.g. by using a cursor rather than trying to read an entire
> resultset into memory?

pl/r (via R) is very terse and expressive. it will probably meet or beat any performance expectations you have coming from openoffice. that said, it's definitely a memory-bound language; typically problem solving involves stuffing data into huge data frames which are then passed to the high level problem solving functions like glm.

you have full access to sql within the pl/r function, so nothing is keeping you from paging data into the frame via a cursor, but that only helps so much.

a lot depends on the specific problem you solve of course.

merlin
On 05/10/2013 06:33 AM, Merlin Moncure wrote:
> you have full access to sql within the pl/r function, so nothing
> is keeping you from paging data into the frame via a cursor, but
> that only helps so much.

I have thought about implementing the paging transparently beneath the dataframe object, but have not gotten around to it. Apparently the Oracle connector to R is able to do that, so I'm sure it is a SMOP.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
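[Editor's sketch] The paging idea discussed above can be illustrated outside PL/R. What follows is a minimal Python sketch, not PL/R code: the standard-library sqlite3 module stands in for PostgreSQL, and the table name `readings`, the column `value`, the helper `paged_mean` and the page sizes are all hypothetical. It reads rows through a cursor in fixed-size pages and folds each page into a running aggregate, so only one page is held in memory at a time.

```python
import sqlite3


def paged_mean(conn, query, page_size=1000):
    """Mean of the first column of `query`, reading one page at a time."""
    cur = conn.cursor()
    cur.execute(query)
    total = 0.0
    count = 0
    while True:
        page = cur.fetchmany(page_size)  # at most page_size rows in memory
        if not page:
            break
        total += sum(row[0] for row in page)
        count += len(page)
    cur.close()
    return total / count if count else None


# Hypothetical demo data: a small in-memory table standing in for a large one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    [(float(i),) for i in range(1, 10001)],
)
result = paged_mean(conn, "SELECT value FROM readings", page_size=500)
print(result)
```

This works because a mean can be accumulated incrementally. An analysis that needs every row at once gains much less from paging, which is presumably what "only helps so much" refers to.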
Merlin Moncure wrote:
> pl/r (via R) is very terse and expressive. it will probably meet or beat
> any performance expectations you have coming from openoffice.

Thanks Merlin and Joe. As an occasional APL user "terse and oppressive" doesn't really bother me :-)

As a particular example of the sort of thing I'm thinking of: using "pure" SQL, the operations of summing the columns in each row and summing the rows in each column are very different.

In contrast, in APL if I have an array

      B
 1  2  3  4
 5  6  7  8
 9 10 11 12

I can perform a reduction operation using + over whichever axis I specify:

      +/[1]B
15 18 21 24
      +/[2]B
10 26 42

or even by default

      +/B
10 26 42

Does PL/R provide that sort of abstraction in a uniform fashion?

-- 
Mark Morgan Lloyd
On Fri, May 10, 2013 at 2:32 PM, Mark Morgan Lloyd <markMLl.pgsql-general@telemetry.co.uk> wrote:
> I can perform a reduction operation using + over whichever axis I specify:
>
> Does PL/R provide that sort of abstraction in a uniform fashion?

certainly (for example see here: http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame) -- getting good at R can take some time but it's worth it. R is "hot" right now with all the buzz around big data lately. The main challenge actually is that the language is so rich it can be difficult to zero in on the precise behaviors you need. Also, the documentation is all over the place.

pl/r plays in nicely because with some thought you can marry the R analysis functions directly to the query in terms of both inputs and outputs -- basically very, very sweet syntax sugar. It's a little capricious though (and be advised: Joe has put up some very important and necessary fixes quite recently) so usually I work out the R code in the R console first before putting it in the database.

merlin
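[Editor's sketch] The APL reductions asked about amount to applying a function along a chosen axis. In R the usual built-ins for this are rowSums(), colSums() and the more general apply(B, MARGIN, FUN), where MARGIN 1 gives one result per row and MARGIN 2 one per column. As a language-neutral illustration, here is a minimal pure-Python sketch that reproduces the numbers from the APL example. The helper name `reduce_axis` is hypothetical, and its axis numbering deliberately follows the APL forms rather than R's MARGIN.

```python
from functools import reduce
import operator


def reduce_axis(op, matrix, axis=2):
    """Reduce a list-of-rows matrix with binary `op` along `axis`.

    axis=1 collapses the rows (one result per column) and
    axis=2 collapses the columns (one result per row), mirroring
    the APL +/[1]B and +/[2]B forms.
    """
    if axis == 1:
        vectors = zip(*matrix)   # iterate over columns
    elif axis == 2:
        vectors = matrix         # iterate over rows
    else:
        raise ValueError("axis must be 1 or 2")
    return [reduce(op, vec) for vec in vectors]


B = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

col_sums = reduce_axis(operator.add, B, axis=1)   # [15, 18, 21, 24]
row_sums = reduce_axis(operator.add, B, axis=2)   # [10, 26, 42]
default_sums = reduce_axis(operator.add, B)       # last axis by default, like +/B
print(col_sums, row_sums, default_sums)
```

The same call shape works for any binary operator (for example `operator.mul` or `max`), which is the uniformity the question is about.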
Merlin Moncure wrote:
> certainly (for example see here:
> http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame)
> -- getting good at R can take some time but it's worth it.

[Peruse] Thanks, I think I get the general idea.

I'm aware of the significance of R, and in particular that it's attracting attention because of the undesirability of hiding functionality in spreadsheets, which usurped APL for certain types of operation.

-- 
Mark Morgan Lloyd