pl/R questions - Mailing list pgsql-general
From | Mike Mascari |
---|---|
Subject | pl/R questions |
Date | |
Msg-id | 3F2B606E.6060800@mascari.com Whole thread Raw |
Responses |
Re: pl/R questions
|
List | pgsql-general |
Hello. I've downloaded and installed pl/R version 0.4.4. My goal is to be able to perform multivariate linear regression analysis. However, for the sake of simplicity, I'm performing a single variable regression and prediction as a test: 1) CREATE TABLE entries(float8 x, float8 y); I have the following values: x | y ----+-------- 1 | 133890 2 | 135000 3 | 135790 4 | 137300 5 | 138130 6 | 139100 7 | 139900 8 | 141120 9 | 141890 10 | 143230 11 | 144000 12 | 145290 2) CREATE TABLE predictions(float8 x); I have the following new x values: x ---- 13 14 15 16 17 3) I have the following functions: -- Reset R global variables CREATE OR REPLACE FUNCTION r_resetlm() RETURNS integer AS ' xs <<- c() ys <<- c() nxs <<- c() return(1) ' LANGUAGE 'plr' WITH (isStrict); -- Add a new known x, f(x) CREATE OR REPLACE FUNCTION r_initknowns(float8, float8) RETURNS integer AS ' xs <<- c(xs, arg1) ys <<- c(ys, arg2) return(1) ' LANGUAGE 'plr' WITH (isStrict); -- Add a predicting x CREATE OR REPLACE FUNCTION r_initpredicts(float8) RETURNS integer AS ' nxs <<- c(nxs, arg1) return(1) ' LANGUAGE 'plr' WITH (isStrict); -- Generate the predictions CREATE OR REPLACE FUNCTION r_predict() RETURNS SETOF RECORD AS ' samples <- data.frame(xs=nxs) result <- predict(lm(ys ~ xs), samples) return (result) ' LANGUAGE 'plr' WITH (isStrict); 4) I perform the prediction like so: select r_resetlm(); select r_initknowns(x, y) from entries; select r_initpredicts(x) from predictions; select * from r_predict() as trend(ny float8); This works, but there are several potential problems using this method: (A) The function r_resetlm() must be called to reset the global values before each invocation. Not a big problem, but I would like to avoid globals, if possible. The relations supplying the data are temporary tables and thus I cannot refer to their names in static pl/R. I can't figure out a way to use pg.spi.prepare()/pg.spi.execp() to initialize R variables with the result of the executed queries. I would like to do something like this, instead: CREATE OR REPLACE FUNCTION r_predict(text, text) RETURNS SETOF RECORD AS ' sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x") plan <- pg.spi.prepare(sql, NA) pg.spi.execp(plan, NA) ??? Read results into appropriate vectors samples <- data.frame(xs=nxs) result <- predict(lm(ys ~ xs), samples) return (result) ' LANGUAGE 'plr' WITH (isStrict); (B) I suppose an unqualified SELECT will always invoke r_initknowns() and r_initpredicts() but is this guaranteed? And guaranteed to only be executed once for each tuple? If so, then I'm somewhat less bothered by the use of R globals. Is using the VOLATILE attribute in the CREATE FUNTION statement sufficient to guarantee that the call will always be made? (C) For the life of me, and this is an R question, I cannot figure out how to get R to perform predictions on multivariate data: ys <- c(133890, 135000, 135790, 137300, 138130, 139100, 139900, 141120, 141890, 143230, 144000, 145290) xs1 <- c(1:12) xs2 <- c(22, 24.5, 27, 33, 36.8, 40, 44, 57, 59, 62, 74, 77) xm <- cbind(xs1, xs2) nx1 <- c(13:17) nx2 <- c(82, 85, 88.3, 90, 95) samples <- data.frame(xs1=nx1, xs2=nx2) f <- predict(lm(ys ~ xm), samples) data.frame(f) yields: f 1 133949.8 2 134970.2 3 135990.6 4 137008.1 5 138027.5 6 139047.3 7 140066.5 8 141078.3 9 142099.1 10 143119.1 11 144131.7 12 145151.7 Not the predicted y's for the new x1's and x2's. I tried: f <- predict.mlm(lm(ys ~ xm), samples) and got: Error in object$coefficients[piv, ] : incorrect number of dimensions And I have no mlm() to use. Any clues? I'm a bit of an R newbie. I wrote my own multivariate linear regression functions years ago in C++, but I'd prefer to leverage R from PostgreSQL. Mike Mascari mascarm@mascari.com PS: Thanks to Joe Conway for this procedural language.
pgsql-general by date: