Re: Weighted Stats - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Weighted Stats
Date
Msg-id 0caf72a3-0d81-cdf8-f525-f58aa8df3449@2ndquadrant.com
Whole thread Raw
In response to Re: Weighted Stats  (David Fetter <david@fetter.org>)
Responses Re: Weighted Stats
List pgsql-hackers
Hi,

On 03/19/2016 07:34 AM, David Fetter wrote:
> On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote:
>> On Tue, Mar 15, 2016 at 8:36 AM, David Fetter <david@fetter.org> wrote:
>>>
>>> Please find attached a patch that uses the float8 version to cover the
>>> numeric types.
>>
>> Is there a well-defined meaning for having a negative weight?  If no,
>> should it be disallowed?
>
> Opinions on this appear to vary.  A Wikipedia article defines weights
> as non-negative, while a manual to which it refers only uses non-zero.
>
> https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition
> https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html

I don't think that actually allows negative weights. It says that
   w_i = 1/\sigma_i^2

and variance is always > 0, thus w_i > 0. The zero is used as a special 
flag to remove the sample from the data set in a simple way.

> I'm not sure which if either would be authoritative, but I could
> certainly make up variants for each assumption.
>
> The assumption they have in common about weights is that a zero
> weight is not part of the calculation, which assumption is
> implemented in the previously submitted code.

I think that if we're not sure what should happen with negative weights, 
then we should disallow them. It's easy to allow them later once we have 
a reasonable definition, but if we allow them now and later realize it 
should behave differently, we'll be in trouble because of breaking 
existing uses.

I can't really come up with a reasonable example that would actually use 
negative weights. Can you? That would probably help with defining the 
behavior correctly.

Allowing negative weights has other consequences. For example, what if 
sum(W) ends up being 0? For example

CREATE TABLE t (a float, b float);
INSERT INTO  t SELECT i,  1 FROM generate_series(1,1000) s(i);
INSERT INTO  t SELECT i, -1 FROM generate_series(1,1000) s(i);

SELECT weighted_avg(a,b) FROM t; weighted_avg
--------------          NaN
(1 row)

Is that the correct behavior? Why?

So -1 to allowing negative weights, unless we can come up with proper 
definition or at least good examples demonstrating the usefulness.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pgbench stats per script & other stuff
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench stats per script & other stuff