Thread: Median

Median

From
"Kermani, Bahram"
Date:
Hello,
 
I am trying to do a Median or Trimmed-mean operation in postgreSQL. I was wondering if anybody knew how to do it. I appreciate it if you reply to my email address.
 
Thanks,
Bahram Kermani

Re: Median

From
"omid omoomi"
Date:
Hi,
I'll be glad if you describe more details about the problem. Is it a kind of 
statistical analysis or what?
Omid Omoomi

>From: "Kermani, Bahram" <BKermani@illumina.com>
>To: "'pgsql-sql@postgresql.org'" <pgsql-sql@postgresql.org>
>Subject: [SQL] Median
>Date: Fri, 30 Jun 2000 17:37:06 -0700
>
>Hello,
>
>I am trying to do a Median or Trimmed-mean operation in postgreSQL. I was
>wondering if anybody knew how to do it. I appreciate it if you reply to my
>email address.
>
>Thanks,
>Bahram Kermani
>bkermani@illumina.com <mailto:bkermani@illumina.com>

________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com



Re: Median

From
JanWieck@t-online.de (Jan Wieck)
Date:
omid omoomi wrote:
> Hi,
> I'll be glad if you describe more details about the problem. Is it a kind of
> statistical analysis or what?
> Omid Omoomi
   Sorry to respond that slow.
   It's something, Ulf Mehlig described to me a couple of months   ago.  The median is the  value,  below  and  above
of which   exactly  half  of  the  sorted  items would be. Having an odd   number of entries, it's exactly the one in
themiddle like
 
       1 2 3 4 5 6 7       ------|------             M = 4
   Having an even number of entries, it's  the  average  between   the two in the middle like
       1 2 3 4 5 6 7 8       -------|-------              M = (4 + 5) / 2
   As  he  said,  other ranges like Quartile (the position where   25% of the entries are below and 75% are above)
wouldalso be   of  interest.  So the most useful thing would be an aggregate   like qantil(n), where n is value between
0.0 and  100.0,  so   that  quantil(50.0) is the Median, quantil(25.0) is the first   Quartile and so on.
 
   I don't see any quick solution how to solve this problem with   an aggregate.  Aggregates get all selected values in
unsorted  order, and don't know ahead how many  items  there  will  be.   Even if, all this wouldn't be of any use,
becauseyou need to   look at the entire sorted list of selected items.
 
   Maybe someone else has an idea.

>
> >From: "Kermani, Bahram" <BKermani@illumina.com>
> >To: "'pgsql-sql@postgresql.org'" <pgsql-sql@postgresql.org>
> >Subject: [SQL] Median
> >Date: Fri, 30 Jun 2000 17:37:06 -0700
> >
> >Hello,
> >
> >I am trying to do a Median or Trimmed-mean operation in postgreSQL. I was
> >wondering if anybody knew how to do it. I appreciate it if you reply to my
> >email address.
> >
> >Thanks,
> >Bahram Kermani
> >bkermani@illumina.com <mailto:bkermani@illumina.com>
>
> ________________________________________________________________________
> Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com
>


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Median

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>     I don't see any quick solution how to solve this problem with
>     an aggregate.  Aggregates get all selected values in unsorted
>     order, and don't know ahead how many  items  there  will  be.
>     Even if, all this wouldn't be of any use, because you need to
>     look at the entire sorted list of selected items.

Hmm.  It would be a pretty straightforward extension of the existing
support for DISTINCT aggregates to allow the agg function to receive
all the inputs in sorted order along with a count of how many there
are, whereupon a percentile aggregate would be trivial.

Slow, but trivial.

Probably a better way would be to skip evaluating the agg's transition
function as such, and instead call the agg's final function just once
with a pointer to the tuplesort object that contains the sorted input
data.  Then you reach in and pull out just the items you want, instead
of having to read 'em all.  Or you can scan 'em if you want.  We
might need to add a few features to the tuplesort API to allow access
to the N'th item in the sorted data, but it's surely doable.

Anyone care to work up a detailed proposal for something along this
line?  It seems like more work than it's worth to me, but if someone
else wants to do the legwork...
        regards, tom lane


Re: Median

From
Thomas Lockhart
Date:
> Maybe someone else has an idea.

I implemented a different algorithm several years ago. It is an
O(log(N)) process (unlike most other techniques), and was borrowed from
the "Algorithms" book (it's at work; but it is the classic "yellow
jacket" book with Fortran code and the other volume with *really bad* C
code which looks like Fortran ;)

Anyway, once the bugs were fixed (and there were several :( and a better
endgame algorithm was coded, it was suitable for hypercube distributed
processing, where you just exchange a small amount of info between
iterations. And it involved simply *counting* how many values were above
and below an initial or updated guess, then iterating on a new guess.

afaik that wouldn't be suitable for the existing aggregate capabilities,
but multi-pass algorithms would be nice to be able to do.
                   - Thomas