Thread: PL/R etc.

PL/R etc.

From
Mark Morgan Lloyd
Date:
I don't know whether anybody active on the list has R (and in particular
PL/R) experience, but just in case... :-)

i)   Something like APL can operate on an array with minimal regard for
index order, i.e. operations across the array are as easily expressed
and as efficient as operations down the array. Does this apply to PL/R?

ii)  Things like OpenOffice can be very inefficient if operating over a
table comprising a non-trivial number of rows. Does PL/R offer a
significant improvement, e.g. by using a cursor rather than trying to
read an entire resultset into memory?

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: PL/R etc.

From
Merlin Moncure
Date:
On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd
<markMLl.pgsql-general@telemetry.co.uk> wrote:
> I don't know whether anybody active on the list has R (and in particular
> PL/R) experience, but just in case... :-)
>
> i)   Something like APL can operate on an array with minimal regard for
> index order, i.e. operations across the array are as easily-expressed and as
> efficient as operations down the array. Does this apply to PL/R?
>
> ii)  Things like OpenOffice can be very inefficient if operating over a
> table comprising a non-trivial number of rows. Does PL/R offer a significant
> improvement, e.g. by using a cursor rather than trying to read an entire
> resultset into memory?

pl/r (via R) is very terse and expressive.  it will probably meet or beat
any performance expectations you have coming from openoffice.  that
said, it's definitely a memory-bound language; typically problem
solving involves stuffing data into huge data frames, which are then
passed to high-level problem-solving functions like glm.

you have full access to sql within the pl/r function, so nothing is
keeping you from paging data into the frame via a cursor, but that
only helps so much.
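A minimal sketch of the paging idea, for illustration only. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; inside PL/R itself the corresponding calls would be PL/R's SPI cursor functions (pg.spi.cursor_open, pg.spi.cursor_fetch). The table b and the helper column_sums_paged are hypothetical. The point is that only page_size rows are held at any one time while a running per-column total is updated.

```python
import sqlite3

def column_sums_paged(conn, query, page_size=2):
    """Hypothetical helper: accumulate per-column sums while holding
    at most page_size rows of the result in memory at a time."""
    cur = conn.execute(query)
    totals = None
    while True:
        page = cur.fetchmany(page_size)  # fetch the next page only
        if not page:
            break
        if totals is None:
            totals = [0] * len(page[0])  # one running total per column
        for row in page:
            for i, value in enumerate(row):
                totals[i] += value
    return totals or []  # empty result set -> no columns summed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)")
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)])
print(column_sums_paged(conn, "SELECT c1, c2, c3, c4 FROM b"))  # [15, 18, 21, 24]
```

This also shows why paging "only helps so much": it works for statistics that can be accumulated incrementally, like sums, but a function such as glm expects the whole data frame in memory at once.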

a lot depends on the specific problem you're solving, of course.

merlin


Re: PL/R etc.

From
Joe Conway
Date:
On 05/10/2013 06:33 AM, Merlin Moncure wrote:
> you have full access to sql within the pl/r function, so nothing
> is keeping you from paging data into the frame via a cursor, but
> that only helps so much.

I have thought about implementing the paging transparently beneath the
dataframe object, but have not gotten around to it. Apparently the
Oracle connector to R is able to do that, so I'm sure it is a SMOP
(small matter of programming).
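Joe's idea of hiding the paging beneath the data frame can be illustrated, very loosely, with a hypothetical Python class that behaves like an indexable sequence of rows but only ever fetches one page at a time. This is not how PL/R or the Oracle connector works; sqlite3 again stands in for PostgreSQL, and LIMIT/OFFSET is used for brevity where a server-side cursor would be the more natural choice.

```python
import sqlite3

class PagedTable:
    """Hypothetical read-only view of a table that fetches rows one page
    at a time on demand, so callers can index or iterate over it
    without the whole table being loaded."""

    def __init__(self, conn, table, page_size=2):
        self.conn = conn
        self.table = table        # assumed to be a trusted identifier
        self.page_size = page_size
        self._page_no = None      # index of the page currently cached
        self._page = []
        self.fetches = 0          # number of page queries issued

    def _load(self, page_no):
        # Only query the database when a different page is needed.
        if page_no != self._page_no:
            self._page = self.conn.execute(
                f"SELECT * FROM {self.table} ORDER BY rowid LIMIT ? OFFSET ?",
                (self.page_size, page_no * self.page_size)).fetchall()
            self._page_no = page_no
            self.fetches += 1

    def __getitem__(self, i):
        if i < 0:
            raise IndexError("negative indices not supported in this sketch")
        page_no, offset = divmod(i, self.page_size)
        self._load(page_no)
        if offset >= len(self._page):
            raise IndexError(i)   # also ends iteration via the sequence protocol
        return self._page[offset]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)")
conn.executemany("INSERT INTO b VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)])
t = PagedTable(conn, "b", page_size=2)
print(sum(row[0] for row in t))  # 15
print(t.fetches)                 # 2
print(t[1])                      # (5, 6, 7, 8)
```

Iterating over the three rows with page_size=2 issues two page queries, and the caller never sees the paging.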

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Re: PL/R etc.

From
Mark Morgan Lloyd
Date:
Merlin Moncure wrote:
> pl/r (via R) very terse and expressive.  it will probably meet or beat
> any performance expectations you have coming from openoffice.   that
> said, it's definitely a memory bound language; typically problem
> solving involves stuffing data into huge data frames which then pass
> to the high level problem solving functions like glm.
>
> you have full access to sql within the pl/r function, so nothing is
> keeping you from paging data into the frame via a cursor, but that
> only helps so much.
>
> a lot depends on the specific problem you solve of course.

Thanks Merlin and Joe. As an occasional APL user "terse and oppressive"
doesn't really bother me :-)

As a particular example of the sort of thing I'm thinking of, using "pure"
SQL the operations of summing the columns in each row and of summing the
rows in each column are very different.
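To make the asymmetry concrete, here is a sketch that uses Python's built-in sqlite3 module in place of PostgreSQL. The table b and its columns c1 to c4 are hypothetical and hold the same values as the array B below. Summing across a row means writing an expression that names every column, while summing down a column needs a separate aggregate per column, so the two directions really are expressed differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, "
             "c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)")
conn.executemany("INSERT INTO b (c1, c2, c3, c4) VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)])

# Summing across each row: an arithmetic expression naming every column.
row_sums = [r[0] for r in conn.execute("SELECT c1 + c2 + c3 + c4 FROM b ORDER BY id")]

# Summing down each column: one aggregate per column, collapsing all rows.
col_sums = list(conn.execute("SELECT sum(c1), sum(c2), sum(c3), sum(c4) FROM b").fetchone())

print(row_sums)  # [10, 26, 42]
print(col_sums)  # [15, 18, 21, 24]
```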

In contrast, in APL if I have an array

         B
1  2  3  4
5  6  7  8
9 10 11 12

I can perform a reduction operation using + over whichever axis I specify:

         +/[1]B
15 18 21 24
         +/[2]B
10 26 42

or even by default

         +/B
10 26 42

Does PL/R provide that sort of abstraction in a uniform fashion?
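For comparison, the axis-directed reduction can be sketched in Python with a hypothetical reduce_axis helper over a list of rows. The operator and the axis are both parameters, so reducing across and down the array are expressed identically, and, as in APL, the default is the last axis. In R itself the closest built-ins are apply(B, 1, sum) and apply(B, 2, sum), or rowSums and colSums; note that R's MARGIN argument names the dimension that is kept, whereas APL's axis names the one that is collapsed.

```python
from functools import reduce
import operator

def reduce_axis(op, array, axis=2):
    """Hypothetical APL-style reduction of a 2-D list of rows.
    axis=1 collapses the first dimension (one result per column);
    axis=2, the default, collapses the last (one result per row)."""
    if axis == 1:
        return [reduce(op, column) for column in zip(*array)]
    if axis == 2:
        return [reduce(op, row) for row in array]
    raise ValueError("axis must be 1 or 2 for a 2-D array")

B = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

print(reduce_axis(operator.add, B, 1))  # [15, 18, 21, 24]  like +/[1]B
print(reduce_axis(operator.add, B, 2))  # [10, 26, 42]      like +/[2]B
print(reduce_axis(operator.add, B))     # [10, 26, 42]      like +/B
print(reduce_axis(max, B, 1))           # [9, 10, 11, 12]   any operator works
```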

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: PL/R etc.

From
Merlin Moncure
Date:
On Fri, May 10, 2013 at 2:32 PM, Mark Morgan Lloyd
<markMLl.pgsql-general@telemetry.co.uk> wrote:
> Thanks Merlin and Joe. As an occasional APL user "terse and oppressive"
> doesn't really bother me :-)
>
> As a particular example of the sort of thing I'm thinking, using "pure" SQL
> the operation of summing the columns in each row and summing the rows in
> each column are very different.
>
> In contrast, in APL if I have an array
>
>         B
> 1  2  3  4
> 5  6  7  8
> 9 10 11 12
>
> I can perform a reduction operation using + over whichever axis I specify:
>
>         +/[1]B
> 15 18 21 24
>         +/[2]B
> 10 26 42
>
> or even by default
>
>         +/B
> 10 26 42
>
> Does PL/R provide that sort of abstraction in a uniform fashion?

certainly (for example see here:
http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame)
-- getting good at R can take some time, but it's worth it.  R is
"hot" right now with all the buzz around big data.  The main challenge,
actually, is that the language is so rich that it can be difficult to
zero in on the precise behaviors you need.  Also, the documentation
is all over the place.

pl/r fits in nicely because, with some thought, you can marry the R
analysis functions directly to the query in terms of both inputs and
outputs -- basically very, very sweet syntax sugar.  It's a little
capricious though (and be advised: Joe has put up some very important
and necessary fixes quite recently), so I usually work out the R code
in the R console first before putting it in the database.
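The point about marrying analysis functions to the query can be illustrated by analogy. In PL/R the analysis code is R placed in the body of a CREATE FUNCTION ... LANGUAGE plr definition; the sketch below instead uses Python's sqlite3 create_aggregate to register host-language code (here statistics.stdev) so that SQL supplies the inputs through GROUP BY and receives the results as ordinary columns. The aggregate name host_sd, the class SampleStdDev and the table b are hypothetical.

```python
import sqlite3
import statistics

class SampleStdDev:
    """Hypothetical aggregate whose body is ordinary host-language analysis code."""
    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per input row; NULLs are skipped, as SQL aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # statistics.stdev needs at least two data points.
        return statistics.stdev(self.values) if len(self.values) > 1 else None

conn = sqlite3.connect(":memory:")
conn.create_aggregate("host_sd", 1, SampleStdDev)
conn.execute("CREATE TABLE b (grp TEXT, x REAL)")
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 14)])

# SQL drives the grouping; the host-language code does the statistics.
for grp, sd in conn.execute("SELECT grp, host_sd(x) FROM b GROUP BY grp ORDER BY grp"):
    print(grp, sd)
```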

merlin


Re: PL/R etc.

From
Mark Morgan Lloyd
Date:
Merlin Moncure wrote:
> certainly (for example see here:
> http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame)
> -- getting good at R can take some time but it's worth it.   R is
> "hot" right now with all the buzz around big data lately.  The main
> challenge actually is the language is so rich it can be difficult to
> zero in on the precise behaviors you need.   Also, the documentation
> is all over the place.
>
> pl/r plays in nicely because with some thought you can marry the R
> analysis functions directly to the query in terms of both inputs and
> outputs -- basically very, very sweet syntax sugar.   It's a little
> capricious though (and be advised: Joe has put up some very important
> and necessary fixes quite recently) so usually I work out the R code
> in the R console first before putting in the database.

[Peruse] Thanks, I think I get the general idea. I'm aware of the
significance of R, and in particular that it's attracting attention
because of the undesirability of hiding functionality in spreadsheets,
which usurped APL for certain types of operation.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]