Re: Weighted Stats - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: Weighted Stats |
Date | |
Msg-id | 20160320223840.GA17918@fetter.org Whole thread Raw |
In response to | Re: Weighted Stats (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Weighted Stats
|
List | pgsql-hackers |
On Sat, Mar 19, 2016 at 05:04:08PM +0100, Tomas Vondra wrote: > Hi, > > On 03/19/2016 07:34 AM, David Fetter wrote: > >On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: > >>On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david@fetter.org> wrote: > >>> > >>>Please find attached a patch that uses the float8 version to cover the > >>>numeric types. > >> > >>Is there a well-defined meaning for having a negative weight? If no, > >>should it be disallowed? > > > >Opinions on this appear to vary. A Wikipedia article defines weights > >as non-negative, while a manual to which it refers only uses non-zero. > > > >https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition > >https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html > > I don't think that actually allows negative weights. It says that > > w_i = 1/\sigma_i^2 > > and variance is always > 0, thus w_i > 0. The zero is used as a special flag > to remove the sample from the data set in a simple way. > > >I'm not sure which if either would be authoritative, but I could > >certainly make up variants for each assumption. > > > >The assumption they have in common about weights is that a zero > >weight is not part of the calculation, which assumption is > >implemented in the previously submitted code. > > I think that if we're not sure what should happen with negative weights, > then we should disallow them. It's easy to allow them later once we have a > reasonable definition, but if we allow them now and later realize it should > behave differently, we'll be in trouble because of breaking existing uses. OK > I can't really come up with a reasonable example that would actually use > negative weights. Can you? That would probably help with defining the > behavior correctly. No, but I'm not a statistician. I've seen them mentioned in contexts that appear to be discussions among same, and again opinions vary. > Allowing negative weights has other consequences. For example, what if > sum(W) ends up being 0? For example > > CREATE TABLE t (a float, b float); > INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i); > INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i); > > SELECT weighted_avg(a,b) FROM t; > weighted_avg > -------------- > NaN > (1 row) > > Is that the correct behavior? Why? It's not, and you're right. I will send a patch that disallows negative weights this evening or tomorrow. It will be slightly more complicated as I believe I will need to create a new accumulator function for the weighted_avg() case where I had been using an extant one before. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
pgsql-hackers by date: