Thread: Calculating Median value

Calculating Median value

From
Chandru Aroor
Date:
Hi,

I am trying to debug a function that was written by someone else.  As part of what this function does, it calculates the median value of a values passed on to it as a single column.  The value that it return is wrong. The user written function, lets call it calculate median_room_rate. It selects a list of qualifying values and uses median to determine the median.  At first I thought this was a inbuilt SQL function such as min and max. After many hours of trying to troubleshoot I discovered median might be a user written or an extension (I still don't know) function. My pgAdmin browser was not enabled to display the Aggregates Object where I found it.  median is defines as follows:

CREATE AGGREGATE public.median(anyelement) (
  SFUNC=array_append,
  STYPE=anyarray,
  FINALFUNC=array_median
 
I will be honest. I have no clue how that works or what it is supposed to do.  

As an example, I am passing on two values 0.1303 and 0.0757. These are being cast as  According to online calculators and Excel I am supposed to get back 0.1030. However, the median function is returning 1.0757 which totally does not make sense.

I tried to create my own median function in the Aggregates Object thinking I will pass in a numeric[] object, but I am unable to select any State Function or Final Function in the UI.

 I suppose I can write my own function, but not sure how/what.

Any help is mightily appreciated. 

Chandru



Re: Calculating Median value

From
Tom Lane
Date:
Chandru Aroor <caroor@yahoo.com> writes:
> ... median is defines as follows:
> CREATE AGGREGATE public.median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=array_median
> I will be honest. I have no clue how that works or what it is supposed
> to do.

It's a user-defined aggregate.  See

https://www.postgresql.org/docs/current/static/xaggr.html

array_append is a built-in function that, in this usage, would just serve
to collect all the input values into an array.  array_median is not the
name of any built-in function, so it must be user-written code.  What
I'd expect it to do, if it's trying to implement the usual definition of
median, is to sort the array and then take the middle element.  Judging
from your description, it's not doing that.

If you're using PG 9.4 or later, you could skip trying to debug this
homegrown version of median and instead use the standard ordered-set
aggregate percentile_cont, or possibly percentile_disc.  See

https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES

            regards, tom lane


Re: Calculating Median value

From
Chandru Aroor
Date:
Thanks Tom.  You are actually correct. In talking with the business user, we need to get the 50th percentile, and picking the higher value for even number of members,  so median as Excel calculates it is actually not appropriate. Will aggregate percentile_cont, or possibly percentile_disc do it?

On Sunday, May 6, 2018, 10:44:19 AM CDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Chandru Aroor <caroor@yahoo.com> writes:

> ... median is defines as follows:
> CREATE AGGREGATE public.median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=array_median
> I will be honest. I have no clue how that works or what it is supposed
> to do.


It's a user-defined aggregate.  See

https://www.postgresql.org/docs/current/static/xaggr.html

array_append is a built-in function that, in this usage, would just serve
to collect all the input values into an array.  array_median is not the
name of any built-in function, so it must be user-written code.  What
I'd expect it to do, if it's trying to implement the usual definition of
median, is to sort the array and then take the middle element.  Judging
from your description, it's not doing that.

If you're using PG 9.4 or later, you could skip trying to debug this
homegrown version of median and instead use the standard ordered-set
aggregate percentile_cont, or possibly percentile_disc.  See

https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES

            regards, tom lane

Re: Calculating Median value

From
Tom Lane
Date:
Chandru Aroor <caroor@yahoo.com> writes:
>  Thanks Tom.  You are actually correct. In talking with the business user, we need to get the 50th percentile, and
pickingthe higher value for even number of members,  so median as Excel calculates it is actually not appropriate. Will
aggregatepercentile_cont, or possibly percentile_disc do it? 

If you don't want to interpolate between the two middle values, you must
use percentile_disc not percentile_cont --- that's exactly what the
difference is between those two functions.

I think though that percentile_disc picks the first not the second of the
two middle values.  You can fix that by reversing the sort order, so it'd
look like

SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x DESC) FROM ...

Here's some examples;

regression=# SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x) FROM (VALUES (1),(3),(7),(9)) v(x);
 percentile_disc
-----------------
               3
(1 row)

regression=# SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x DESC) FROM (VALUES (1),(3),(7),(9)) v(x);
 percentile_disc
-----------------
               7
(1 row)

regression=# SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY x) FROM (VALUES (1),(3),(7),(9)) v(x);
 percentile_cont
-----------------
               5
(1 row)


            regards, tom lane