Re: Add error functions: erf() and erfc() - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Add error functions: erf() and erfc()
Msg-id CA+hUKGKJAcB8Q5qziKTTSnkA4Mnv_6f+7-_XUgbh9jFjSdEFQg@mail.gmail.com
In response to Add error functions: erf() and erfc()  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Add error functions: erf() and erfc()
List pgsql-hackers
On Tue, Feb 28, 2023 at 1:54 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Now that we have random_normal(), it seems like it would be useful to
> add the error functions erf() and erfc(), which I think are
> potentially useful to the people who will find random_normal() useful,
> and possibly others.
>
> An immediate use for erf() is that it allows us to do a
> Kolmogorov-Smirnov test for random_normal(), similar to the one for
> random().
>
> Both of these functions are defined in POSIX and C99, so in theory
> they should be available on all platforms. If that turns out not to be
> the case, then there's a commonly used implementation (e.g., see [1]),
> which we could include. I played around with that (replacing the
> direct bit manipulation stuff with frexp()/ldexp(), see pg_erf.c
> attached), and it appeared to be accurate to +/-1 ULP across the full
> range of inputs. Hopefully we won't need that though.

Hi,

No comment on the maths, but I'm pretty sure we won't need a fallback
implementation.  That stuff goes back to the math libraries of 80s
Unixes, even though it didn't make it into C until '99.  I just
checked the man pages for all our target systems and they all show it.
(There might be some portability details around the tgmath.h versions
on some systems, e.g. to support different float sizes, I dunno, but
you're using the plain math.h versions.)

I wonder if the SQL standard has anything to say about these, or the
related normal CDF.  I can't check right now, but I doubt it, based on
searches and other systems' manuals.

Two related functions that also arrived in C99 are lgamma() and
tgamma().  If you'll excuse the digression, that reminded me of
something I was trying to figure out once, for a practical problem.
My statistics knowledge is extremely patchy, but I have been trying to
up my benchmarking game, and that led to a bit of remedial reading on
Student's t tests and related stuff.  A few shaven yaks later, I
understood that you could probably (if you like pain) do that sort of
stuff inside PostgreSQL using our existing aggregates, if you took the
approach of ministat[1].  That tool has a table of critical values
inside it, indexed by degrees-of-freedom (1-100) and confidence level
(80, 90, 95, 98, 99, 99.5), and one could probably write SQL queries
that spit out an answer like "p is less than 5%, ship it!", if we
stole that table.  But what if I actually want to know p?  Of course
you can do all that good stuff very easily with tools like R, SciPy,
etc., and maybe that's the best way to do it.  But Oracle, and I think
several other analytics-focused SQL systems, can do it in a very easy
built-in way.  I think to get at that you probably need the t CDF, and
in there[2] I see... Γ().  Huh.

[1] https://man.freebsd.org/cgi/man.cgi?query=ministat
[2] https://www.mathworks.com/help/stats/tcdf.html


