Thread: pl/R questions
Hello. I've downloaded and installed pl/R version 0.4.4. My goal is to be able to perform multivariate linear regression analysis. However, for the sake of simplicity, I'm performing a single variable regression and prediction as a test: 1) CREATE TABLE entries(float8 x, float8 y); I have the following values: x | y ----+-------- 1 | 133890 2 | 135000 3 | 135790 4 | 137300 5 | 138130 6 | 139100 7 | 139900 8 | 141120 9 | 141890 10 | 143230 11 | 144000 12 | 145290 2) CREATE TABLE predictions(float8 x); I have the following new x values: x ---- 13 14 15 16 17 3) I have the following functions: -- Reset R global variables CREATE OR REPLACE FUNCTION r_resetlm() RETURNS integer AS ' xs <<- c() ys <<- c() nxs <<- c() return(1) ' LANGUAGE 'plr' WITH (isStrict); -- Add a new known x, f(x) CREATE OR REPLACE FUNCTION r_initknowns(float8, float8) RETURNS integer AS ' xs <<- c(xs, arg1) ys <<- c(ys, arg2) return(1) ' LANGUAGE 'plr' WITH (isStrict); -- Add a predicting x CREATE OR REPLACE FUNCTION r_initpredicts(float8) RETURNS integer AS ' nxs <<- c(nxs, arg1) return(1) ' LANGUAGE 'plr' WITH (isStrict); -- Generate the predictions CREATE OR REPLACE FUNCTION r_predict() RETURNS SETOF RECORD AS ' samples <- data.frame(xs=nxs) result <- predict(lm(ys ~ xs), samples) return (result) ' LANGUAGE 'plr' WITH (isStrict); 4) I perform the prediction like so: select r_resetlm(); select r_initknowns(x, y) from entries; select r_initpredicts(x) from predictions; select * from r_predict() as trend(ny float8); This works, but there are several potential problems using this method: (A) The function r_resetlm() must be called to reset the global values before each invocation. Not a big problem, but I would like to avoid globals, if possible. The relations supplying the data are temporary tables and thus I cannot refer to their names in static pl/R. I can't figure out a way to use pg.spi.prepare()/pg.spi.execp() to initialize R variables with the result of the executed queries. I would like to do something like this, instead: CREATE OR REPLACE FUNCTION r_predict(text, text) RETURNS SETOF RECORD AS ' sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x") plan <- pg.spi.prepare(sql, NA) pg.spi.execp(plan, NA) ??? Read results into appropriate vectors samples <- data.frame(xs=nxs) result <- predict(lm(ys ~ xs), samples) return (result) ' LANGUAGE 'plr' WITH (isStrict); (B) I suppose an unqualified SELECT will always invoke r_initknowns() and r_initpredicts() but is this guaranteed? And guaranteed to only be executed once for each tuple? If so, then I'm somewhat less bothered by the use of R globals. Is using the VOLATILE attribute in the CREATE FUNTION statement sufficient to guarantee that the call will always be made? (C) For the life of me, and this is an R question, I cannot figure out how to get R to perform predictions on multivariate data: ys <- c(133890, 135000, 135790, 137300, 138130, 139100, 139900, 141120, 141890, 143230, 144000, 145290) xs1 <- c(1:12) xs2 <- c(22, 24.5, 27, 33, 36.8, 40, 44, 57, 59, 62, 74, 77) xm <- cbind(xs1, xs2) nx1 <- c(13:17) nx2 <- c(82, 85, 88.3, 90, 95) samples <- data.frame(xs1=nx1, xs2=nx2) f <- predict(lm(ys ~ xm), samples) data.frame(f) yields: f 1 133949.8 2 134970.2 3 135990.6 4 137008.1 5 138027.5 6 139047.3 7 140066.5 8 141078.3 9 142099.1 10 143119.1 11 144131.7 12 145151.7 Not the predicted y's for the new x1's and x2's. I tried: f <- predict.mlm(lm(ys ~ xm), samples) and got: Error in object$coefficients[piv, ] : incorrect number of dimensions And I have no mlm() to use. Any clues? I'm a bit of an R newbie. I wrote my own multivariate linear regression functions years ago in C++, but I'd prefer to leverage R from PostgreSQL. Mike Mascari mascarm@mascari.com PS: Thanks to Joe Conway for this procedural language.
Mike Mascari wrote: > (A) The function r_resetlm() must be called to reset the global values > before each invocation. Not a big problem, but I would like to avoid > globals, if possible. The relations supplying the data are temporary > tables and thus I cannot refer to their names in static pl/R. I can't > figure out a way to use pg.spi.prepare()/pg.spi.execp() to initialize > R variables with the result of the executed queries. I would like to > do something like this, instead: > > CREATE OR REPLACE FUNCTION r_predict(text, text) > RETURNS SETOF RECORD AS ' > > sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x") > plan <- pg.spi.prepare(sql, NA) > pg.spi.execp(plan, NA) > > ??? Read results into appropriate vectors > > samples <- data.frame(xs=nxs) > result <- predict(lm(ys ~ xs), samples) > return (result) > > ' LANGUAGE 'plr' WITH (isStrict); I don't think you can do a prepared plan if the table itself is going to change, only when parameters change. Maybe something like this works: CREATE OR REPLACE FUNCTION r_predict(text, text) RETURNS SETOF RECORD AS ' sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x") xyknowns <- pg.spi.exec(sql) xs <- as.numeric(xyknowns[,1]) ys <- as.numeric(xyknowns[,2]) sql <- paste("SELECT x FROM", arg2, "ORDER BY x") xypred <- pg.spi.exec(sql) nxs <- as.numeric(xypred[,1]) samples <- data.frame(xs=nxs) result <- predict(lm(ys ~ xs), samples) return (result) ' LANGUAGE 'plr' WITH (isStrict); regression=# select * from r_predict('entries', 'predictions') as trend(ny float8); ny ------------------ 146171.515151515 147189.696969697 148207.878787879 149226.060606061 150244.242424242 (5 rows) > (B) I suppose an unqualified SELECT will always invoke r_initknowns() > and r_initpredicts() but is this guaranteed? And guaranteed to only be > executed once for each tuple? If so, then I'm somewhat less bothered > by the use of R globals. Is using the VOLATILE attribute in the CREATE > FUNTION statement sufficient to guarantee that the call will always be > made? Use the above -- I think your original multistep process is not the way to go anyway > (C) For the life of me, and this is an R question, I cannot figure out > how to get R to perform predictions on multivariate data: I'm sure there is support for multivariate linear regression in R, but I'm still too new at R to know the answer myself. You should try posting that one to R-help. BTW, I created a PL/R specific mailing list on gborg, but no one is subscribed currently. If people on this list find PL/R specific questions too off-topic, perhaps we should move there. R specific questions should definitely be posted to R-help though. Regards, Joe
Joe Conway wrote: > Mike Mascari wrote: > >> (A) The function r_resetlm() must be called to reset the global values >> before each invocation. Not a big problem, but I would like to avoid >> globals, if possible. The relations supplying the data are temporary >> tables and thus I cannot refer to their names in static pl/R. I can't >> figure out a way to use pg.spi.prepare()/pg.spi.execp() to initialize >> R variables with the result of the executed queries. I would like to >> do something like this, instead: > > I don't think you can do a prepared plan if the table itself is going to > change, only when parameters change. Maybe something like this works: > > CREATE OR REPLACE FUNCTION r_predict(text, text) > RETURNS SETOF RECORD AS ' > sql <- paste("SELECT x, y FROM", arg1, "ORDER BY x") > xyknowns <- pg.spi.exec(sql) > xs <- as.numeric(xyknowns[,1]) > ys <- as.numeric(xyknowns[,2]) > > sql <- paste("SELECT x FROM", arg2, "ORDER BY x") > xypred <- pg.spi.exec(sql) > nxs <- as.numeric(xypred[,1]) > > samples <- data.frame(xs=nxs) > result <- predict(lm(ys ~ xs), samples) > return (result) > ' LANGUAGE 'plr' WITH (isStrict); > > regression=# select * from r_predict('entries', 'predictions') as > trend(ny float8); > ny > ------------------ > 146171.515151515 > 147189.696969697 > 148207.878787879 > 149226.060606061 > 150244.242424242 > (5 rows) Very nice. Thanks a bunch! > I'm sure there is support for multivariate linear regression in R, but > I'm still too new at R to know the answer myself. You should try posting > that one to R-help. Okay. > BTW, I created a PL/R specific mailing list on gborg, but no one is > subscribed currently. If people on this list find PL/R specific > questions too off-topic, perhaps we should move there. R specific > questions should definitely be posted to R-help though. Okay. Great! Thanks again! Mike Mascari mascarm@mascari.com