Thread: How do work tercile, percentile & funcion percentile_cont() ?

How do work tercile, percentile & funcion percentile_cont() ?

From
PALAYRET Jacques
Date:
Hello,

From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 20.3333333333333
(1 ligne)

If ordering,
the first 10 lines (c1 from 1 to 10) have  ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have  ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have  ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11)   and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21)   and not 20.3333333333333

Thank in advance for explanation

----- Météo-France -----
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319

Re: How do work tercile, percentile & funcion percentile_cont() ?

From
PALAYRET Jacques
Date:
Of course, the same result with this table x(c1) for example :
SELECT c1,  ntile(3) OVER(ORDER BY c1)  FROM x ORDER BY c1 ;
 c1 | ntile
----+-------
  1 |     1
  2 |     1
  3 |     1
  4 |     1
  5 |     1
  6 |     1
  7 |     1
  8 |     1
  9 |     1
 10 |     1
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 20 |     2
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
(30 lignes)

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 10.6666666666667
(1 ligne)

SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 20.3333333333333
(1 ligne)




De: "PALAYRET Jacques" <jacques.palayret@meteo.fr>
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:00:49
Objet: How do work tercile, percentile & funcion percentile_cont() ?

Hello,

From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 20.3333333333333
(1 ligne)

If ordering,
the first 10 lines (c1 from 1 to 10) have  ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have  ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have  ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11)   and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21)   and not 20.3333333333333

Thank in advance for explanation

----- Météo-France -----
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319

Re: How do work tercile, percentile & funcion percentile_cont() ?

From
PALAYRET Jacques
Date:
With Libre Office Calc, same result than with PostgreSQL :
=CENTILE({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30}; 0.3333333333)
gives
10.6666666657   (and NOT 10.5)

I didn't think so.

De: "PALAYRET Jacques" <jacques.palayret@meteo.fr>
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:15:26
Objet: Re: How do work tercile, percentile & funcion percentile_cont() ?

Of course, the same result with this table x(c1) for example :
SELECT c1,  ntile(3) OVER(ORDER BY c1)  FROM x ORDER BY c1 ;
 c1 | ntile
----+-------
  1 |     1
  2 |     1
  3 |     1
  4 |     1
  5 |     1
  6 |     1
  7 |     1
  8 |     1
  9 |     1
 10 |     1
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 11 |     2
 20 |     2
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
 21 |     3
(30 lignes)

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 10.6666666666667
(1 ligne)

SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 20.3333333333333
(1 ligne)




De: "PALAYRET Jacques" <jacques.palayret@meteo.fr>
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:00:49
Objet: How do work tercile, percentile & funcion percentile_cont() ?

Hello,

From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :

SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
 percentile_cont
------------------
 20.3333333333333
(1 ligne)

If ordering,
the first 10 lines (c1 from 1 to 10) have  ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have  ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have  ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11)   and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21)   and not 20.3333333333333

Thank in advance for explanation

----- Météo-France -----
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319

Re: How do work tercile, percentile & funcion percentile_cont() ?

From
Tom Lane
Date:
PALAYRET Jacques <jacques.palayret@meteo.fr> writes:
> SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
> percentile_cont
> ------------------
> 10.6666666666667
> (1 ligne)

> So, I though it should be :
> percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667

As far as I can see from looking at the SQL spec, this is correct;
at least, it's clear that we're *not* supposed to just interpolate
halfway between the nearest rows.  The result is defined to be

T0.Y + FACTOR * (T1.Y - T0.Y)

where T0.Y and T1.Y are the nearest row values and FACTOR is

NVE*(N–1) – floor(NVE*(N–1))

where NVE is the argument of percentile_cont and N is the number of rows.
(In SQL:2003, see 10.9 <aggregate function> general rule 7)h)i).)

            regards, tom lane



Re: How do work tercile, percentile & funcion percentile_cont() ?

From
"Peter J. Holzer"
Date:
On 2020-04-22 10:00:49 +0200, PALAYRET Jacques wrote:
> From a table x(c1) containing 30 lines with integer values (column c1) from 1
> to 30 :
>
> SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
>  percentile_cont
> ------------------
>  10.6666666666667
> (1 ligne)
> SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
>  percentile_cont
> ------------------
>  20.3333333333333
> (1 ligne)

Think of the 30 values as points in a graph: The x values run from 0 to
29, the y values are your values. The points are connected with straight
lines. Since your y values just increase by 1 you have a straight line
from (0, 1) to (29, 30)

Then percentile_cont(1./3) is the value y value 1/3 along that line and
percentile_cont(2./3) is the y value 2/3 along that line. So you need to
find the y values corresponding to 29*(1/3) = 9.667 and 29*(2/3) = 19.333.
These are obviously 10.667 and 20.333 respectively.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment