Re: PL/R etc. - Mailing list pgsql-general

From Merlin Moncure
Subject Re: PL/R etc.
Msg-id CAHyXU0yvD9C30vM4452Xa_uPW1OR9vXaieN_GfuUmvGixv4+aw@mail.gmail.com
In response to Re: PL/R etc.  (Mark Morgan Lloyd <markMLl.pgsql-general@telemetry.co.uk>)
Responses Re: PL/R etc.  (Mark Morgan Lloyd <markMLl.pgsql-general@telemetry.co.uk>)
List pgsql-general
On Fri, May 10, 2013 at 2:32 PM, Mark Morgan Lloyd
<markMLl.pgsql-general@telemetry.co.uk> wrote:
> Merlin Moncure wrote:
>>
>> On Fri, May 10, 2013 at 4:41 AM, Mark Morgan Lloyd
>> <markMLl.pgsql-general@telemetry.co.uk> wrote:
>>>
>>> I don't know whether anybody active on the list has R (and in particular
>>> PL/R) experience, but just in case... :-)
>>>
>>> i)   Something like APL can operate on an array with minimal regard for
>>> index order, i.e. operations across the array are as easily-expressed and
>>> as efficient as operations down the array. Does this apply to PL/R?
>>>
>>> ii)  Things like OpenOffice can be very inefficient if operating over a
>>> table comprising a non-trivial number of rows. Does PL/R offer a
>>> significant improvement, e.g. by using a cursor rather than trying to
>>> read an entire resultset into memory?
>>
>>
>> pl/r (via R) is very terse and expressive.  it will probably meet or beat
>> any performance expectations you have coming from openoffice.   that
>> said, it's definitely a memory bound language; typically problem
>> solving involves stuffing data into huge data frames which then pass
>> to the high level problem solving functions like glm.
>>
>> you have full access to sql within the pl/r function, so nothing is
>> keeping you from paging data into the frame via a cursor, but that
>> only helps so much.
>>
>> a lot depends on the specific problem you solve of course.
>
>
> Thanks Merlin and Joe. As an occasional APL user "terse and oppressive"
> doesn't really bother me :-)
>
> As a particular example of the sort of thing I'm thinking, using "pure" SQL
> the operation of summing the columns in each row and summing the rows in
> each column are very different.
>
> In contrast, in APL if I have an array
>
>         B
> 1  2  3  4
> 5  6  7  8
> 9 10 11 12
>
> I can perform a reduction operation using + over whichever axis I specify:
>
>         +/[1]B
> 15 18 21 24
>         +/[2]B
> 10 26 42
>
> or even by default
>
>         +/B
> 10 26 42
>
> Does PL/R provide that sort of abstraction in a uniform fashion?

certainly (for example see here:
http://stackoverflow.com/questions/13352180/sum-different-columns-in-a-data-frame)
-- getting good at R can take some time but it's worth it.   R is
"hot" right now with all the buzz around big data lately.  The main
challenge is actually that the language is so rich it can be difficult
to zero in on the precise behaviors you need.   Also, the documentation
is all over the place.
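
As a rough illustration of the axis reductions asked about above, here is a minimal R sketch using the same matrix B as the APL example. The variable names are only for illustration. Note that apply()'s MARGIN argument names the dimension that is kept, whereas the APL axis names the one that is reduced away.

```r
# The matrix B from the APL example, filled row by row.
B <- matrix(1:12, nrow = 3, byrow = TRUE)

# Sum down each column (the APL +/[1]B case).
col_totals <- colSums(B)

# Sum across each row (the APL +/[2]B case, also the default +/B).
row_totals <- rowSums(B)

# apply() does the same for an arbitrary function:
# MARGIN = 1 keeps rows, MARGIN = 2 keeps columns.
col_totals_apply <- apply(B, 2, sum)
row_totals_apply <- apply(B, 1, sum)

print(col_totals)  # [1] 15 18 21 24
print(row_totals)  # [1] 10 26 42
```

The same calls work on the numeric columns of a data frame, which is the case the linked Stack Overflow question deals with.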

pl/r fits in nicely because with some thought you can marry the R
analysis functions directly to the query in terms of both inputs and
outputs -- basically very, very sweet syntax sugar.   It's a little
capricious though (and be advised: Joe has put up some very important
and necessary fixes quite recently) so usually I work out the R code
in the R console first before putting it in the database.
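
A sketch of that console-first workflow, under stated assumptions: axis_sums is a hypothetical helper made up for illustration, and the PL/R wrapper in the trailing comment is untested. The wrapper assumes PL/R hands a two-dimensional PostgreSQL array to R as a matrix.

```r
# Hypothetical helper, worked out in the plain R console first.
axis_sums <- function(m, by_row = TRUE) {
  if (by_row) rowSums(m) else colSums(m)
}

# Try it on the same matrix as the APL example.
B <- matrix(1:12, nrow = 3, byrow = TRUE)
by_row_result <- axis_sums(B)
by_col_result <- axis_sums(B, by_row = FALSE)
print(by_row_result)
print(by_col_result)

# Once it behaves, the body could be moved into the database, roughly like
# this (untested sketch; the function and argument names are assumptions):
#
#   CREATE OR REPLACE FUNCTION row_sums(m float8[]) RETURNS float8[] AS $$
#     rowSums(m)
#   $$ LANGUAGE plr;
```

Keeping the R body small and free of database calls makes it easy to test in the console before any SQL is involved.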

merlin

