Postgres Analysis Tool-Pak - Mailing list pgsql-hackers

From Mike Mascari
Subject Postgres Analysis Tool-Pak
Date
Msg-id 39260C4F.3BA68B77@mascari.com
Whole thread Raw
Responses Re: Postgres Analysis Tool-Pak
List pgsql-hackers
Hello guys,

As a personal project, I have written an Excel-compatible
spreadsheet as a component of a 3D charting and graphing
application. It uses a Phong-style renderer with texture mapping,
bump mapping, Z-buffer shadow mapping, non-refractive
transparency and antialiasing to produce some very nice looking
images. Anyway, a component of the data analysis tool is, as
mentioned, a spreadsheet. I've been able to implement around 300
of the 320 Excel-style functions and was wondering if they would
be useful to PostgreSQL. Here's the list:

ABS
ACCRINT
ACCRINTM
ACOS
ACOSH
ADDRESS
AMORDEGRC
AMORLINC
AND
AREAS
ASIN
ASINH
ATAN
ATAN2
ATANH
AVEDEV
AVERAGE
BESSELI
BESSELJ
BESSELK
BESSELY
BETADIST
BETAINV
BIN2DEC
BIN2HEX
BIN2OCT
BINOMDIST
CALL
CEILING
CELL
CHAR
CHIDIST
CHIINV
CHITEST
CHOOSE
CLEAN
CODE
COLUMN
COLUMNS
COMBIN
COMPLEX
CONCATENATE
CONFIDENCE
CONVERT
CORREL
COS
COSH
COUNT
COUNTA
COUNTBLANK
COUNTIF
COUPDAYS
COUPDAYSBS
COUPDAYSNC
COUPNCD
COUPNUM
COUPPCD
COVAR
CRITBINOM
CUMIPMT
CUMPRINC
DATE
DATEVALUE
DAVERAGE
DAY
DAYS360
DB
DCOUNT
DCOUNTA
DDB
DEC2BIN
DEC2HEX
DEC2OCT
DEGREES
DELTA
DEVSQ
DGET
DISC
DMAX
DMIN
DOLLAR
DOLLARDE
DOLLARFR
DPRODUCT
DSTDEV
DSTDEVP
DSUM
DURATION
DVAR
DVARP
EDATE
EFFECT
EOMONTH
ERF
ERFC
ERRORTYPE
EVEN
EXACT
EXP
EXPONDIST
FACT
FACTDOUBLE
FALSE
FDIST
FIND
FINV
FISHER
FISHERINV
FIXED
FLOOR
FORECAST
FREQUENCY
FTEST
FV
FVSCHEDULE
GAMMADIST
GAMMAINV
GAMMALN
GCD
GEOMEAN
GESTEP
GROWTH
HARMEAN
HEX2BIN
HEX2DEC
HEX2OCT
HLOOKUP
HOUR
HYPGEOMDIST
IF
IMABS
IMAGINARY
IMARGUMENT
IMCONJUGATE
IMCOS
IMDIV
IMEXP
IMLN
IMLOG10
IMLOG2
IMPOWER
IMPRODUCT
IMREAL
IMSIN
IMSQRT
IMSUB
IMSUM
INDEX
INDIRECT
INFO
INT
INTERCEPT
INTRATE
IPMT
IRR
ISBLANK
ISERR
ISERROR
ISEVEN
ISLOGICAL
ISNA
ISNONTEXT
ISNUMBER
ISODD
ISREF
ISTEXT
KURT
LARGE
LCM
LEFT
LEN
LINEST
LN
LOG
LOG10
LOGEST
LOGINV
LOGNORMDIST
LOOKUP
LSOLVE
LOWER
MATCH
MAX
MDETERM
MDURATION
MEDIAN
MID
MIN
MINUTE
MINVERSE
MIRR
MMULT
MOD
MODE
MONTH
MROUND
MULTINOMIAL
N
NA
NEGBINOMDIST
NETWORKDAYS
NOMINAL
NORMDIST
NORMINV
NORMSDIST
NORMSINV
NOT
NOW
NPER
NPV
OCT2BIN
OCT2DEC
OCT2HEX
ODD
ODDFPPRICE
ODDFYIELD
ODDLPRICE
ODDLYIELD
OFFSET
OR
PEARSON
PERCENTILE
PERCENTRANK
PERMUT
PI
PMT
POISSON
POWER
PPMT
PRICE
PRICEDISC
PRICEMAT
PROB
PRODUCT
PROPER
PV
QUARTILE
QUOTIENT
RADIANS
RAND
RANDBETWEEN
RANK
RATE
RECEIVED
REGISTERID
REPLACE
REPT
RIGHT
ROMAN
ROUND
ROUNDDOWN
ROUNDUP
ROW
ROWS
RSQ
SEARCH
SECOND
SERIESSUM
SIGN
SIN
SINH
SKEW
SLN
SLOPE
SMALL
SQLREQUEST
SQRT
SQRTPI
STANDARDIZE
STDEV
STDEVP
STEYX
SUBSTITUTE
SUBTOTAL
SUM
SUMIF
SUMPRODUCT
SUMSQ
SUMX2MY2
SUMX2PY2
SUMXMY2
SYD
T
TAN
TANH
TBILLEQ
TBILLPRICE
TBILLYIELD
TDIST
TEXT
TIME
TIMEVALUE
TINV
TODAY
TRANSPOSE
TREND
TRIM
TRIMMEAN
TRUE
TRUNC
TTEST
TYPE
UPPER
VALUE
VAR
VARP
VDB
VLOOKUP
WEEKDAY
WEIBULL
WORKDAY
XIRR
XNPV
YEAR
YEARFRAC
YIELD
YIELDDISC
YIELDMAT

Naturally, some of these functions already map to existing
PostgreSQL functions and some are irrelevant to PostgreSQL (such
as ADDRESS, CALL, HLOOKUP for example). Others map with different
SQL names. All of them have been written using double precision
numbers (like Excel) and therefore would take some time to use
numeric -- if that's what is desired. The cumulative distribution
functions and their inverses I derived solely from recurrence
relations, polynomial approximations, and evalutation of various
infinite series to n-terms from the "Handbook of Mathematical
Functions" and so, like Excel, only realize 32-bit float
precision. I suspect there may be some BSD licensed libraries out
there that do a better job (hopefully with arbitrary precision).

Some of the functions, it would seem to me, would be very useful,
but I'm not sure of how the semantics would map. For example,
TREND() performs multi-variate linear regression analysis (i.e.:
good for picking stocks), but, not only would it require the new
function manager interface to return a result set, but it takes
several 'sets' as input as well:

TREND(known y's, known x1's, known x2's, ..., new x's)

and so it would act almost like an aggregate except that it would
yield a result set, not just a single-valued result. I visualize
something like:

SELECT TREND(quotes.prices, quotes.date1, quotes.date2) FROM
quotes;

Anyways, I guess I'm like Chris Bitmead. What do you want me to
do with this crap? I could port the non-set returning functions
and single-pass aggregates now (such as DDB(), RATE(), PMT(),
etc.) and wait on the others until the rewrite. Or I could just
wait until the rewrite if the function call interface is going to
change dramatically for this such as NULL handling etc.

Any comments?

Mike Mascari


pgsql-hackers by date:

Previous
From: Thomas Lockhart
Date:
Subject: Re: Performance (was: The New Slashdot Setup (includes MySql server))
Next
From: Thomas Lockhart
Date:
Subject: Re: OO / fe-be protocol