Thread: How do work tercile, percentile & funcion percentile_cont() ?
Hello,
From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
If ordering,
the first 10 lines (c1 from 1 to 10) have ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21) and not 20.3333333333333
Thank in advance for explanation
From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
If ordering,
the first 10 lines (c1 from 1 to 10) have ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21) and not 20.3333333333333
Thank in advance for explanation
----- Météo-France -----
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319
Of course, the same result with this table x(c1) for example :
SELECT c1, ntile(3) OVER(ORDER BY c1) FROM x ORDER BY c1 ;
c1 | ntile
----+-------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 1
8 | 1
9 | 1
10 | 1
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
20 | 2
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
(30 lignes)
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
c1 | ntile
----+-------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 1
8 | 1
9 | 1
10 | 1
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
20 | 2
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
(30 lignes)
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
De: "PALAYRET Jacques" <jacques.palayret@meteo.fr>
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:00:49
Objet: How do work tercile, percentile & funcion percentile_cont() ?
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:00:49
Objet: How do work tercile, percentile & funcion percentile_cont() ?
Hello,
From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
If ordering,
the first 10 lines (c1 from 1 to 10) have ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21) and not 20.3333333333333
Thank in advance for explanation
From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
If ordering,
the first 10 lines (c1 from 1 to 10) have ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21) and not 20.3333333333333
Thank in advance for explanation
----- Météo-France -----
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319
With Libre Office Calc, same result than with PostgreSQL :
=CENTILE({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30}; 0.3333333333)
gives
10.6666666657 (and NOT 10.5)
=CENTILE({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30}; 0.3333333333)
gives
10.6666666657 (and NOT 10.5)
I didn't think so.
De: "PALAYRET Jacques" <jacques.palayret@meteo.fr>
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:15:26
Objet: Re: How do work tercile, percentile & funcion percentile_cont() ?
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:15:26
Objet: Re: How do work tercile, percentile & funcion percentile_cont() ?
Of course, the same result with this table x(c1) for example :
SELECT c1, ntile(3) OVER(ORDER BY c1) FROM x ORDER BY c1 ;
c1 | ntile
----+-------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 1
8 | 1
9 | 1
10 | 1
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
20 | 2
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
(30 lignes)
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
c1 | ntile
----+-------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 1
6 | 1
7 | 1
8 | 1
9 | 1
10 | 1
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
11 | 2
20 | 2
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
21 | 3
(30 lignes)
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
De: "PALAYRET Jacques" <jacques.palayret@meteo.fr>
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:00:49
Objet: How do work tercile, percentile & funcion percentile_cont() ?
À: pgsql-general@lists.postgresql.org
Envoyé: Mercredi 22 Avril 2020 10:00:49
Objet: How do work tercile, percentile & funcion percentile_cont() ?
Hello,
From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
If ordering,
the first 10 lines (c1 from 1 to 10) have ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21) and not 20.3333333333333
Thank in advance for explanation
From a table x(c1) containing 30 lines with integer values (column c1) from 1 to 30 :
SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
10.6666666666667
(1 ligne)
SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ;
percentile_cont
------------------
20.3333333333333
(1 ligne)
If ordering,
the first 10 lines (c1 from 1 to 10) have ntile(3) OVER(ORDER BY c1) = 1,
the 10 following lines (c1 from 11 to 20) have ntile(3) OVER(ORDER BY c1) = 2,
the last 10 lines (c1 from 21 to 30) have ntile(3) OVER(ORDER BY c1) = 3.
So, I though it should be :
percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667
percentile_cont(2./3) = 20.5 (the arithmetic average between 20 et 21) and not 20.3333333333333
Thank in advance for explanation
----- Météo-France -----
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319
PALAYRET JACQUES
DCSC/MBD
jacques.palayret@meteo.fr
Fixe : +33 561078319
PALAYRET Jacques <jacques.palayret@meteo.fr> writes: > SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ; > percentile_cont > ------------------ > 10.6666666666667 > (1 ligne) > So, I though it should be : > percentile_cont(1./3) = 10.5 (the arithmetic average between 10 et 11) and not 10.6666666666667 As far as I can see from looking at the SQL spec, this is correct; at least, it's clear that we're *not* supposed to just interpolate halfway between the nearest rows. The result is defined to be T0.Y + FACTOR * (T1.Y - T0.Y) where T0.Y and T1.Y are the nearest row values and FACTOR is NVE*(N–1) – floor(NVE*(N–1)) where NVE is the argument of percentile_cont and N is the number of rows. (In SQL:2003, see 10.9 <aggregate function> general rule 7)h)i).) regards, tom lane
On 2020-04-22 10:00:49 +0200, PALAYRET Jacques wrote: > From a table x(c1) containing 30 lines with integer values (column c1) from 1 > to 30 : > > SELECT percentile_cont(1./3) WITHIN GROUP (ORDER BY c1) FROM x ; > percentile_cont > ------------------ > 10.6666666666667 > (1 ligne) > SELECT percentile_cont(2./3) WITHIN GROUP (ORDER BY c1) FROM x ; > percentile_cont > ------------------ > 20.3333333333333 > (1 ligne) Think of the 30 values as points in a graph: The x values run from 0 to 29, the y values are your values. The points are connected with straight lines. Since your y values just increase by 1 you have a straight line from (0, 1) to (29, 30) Then percentile_cont(1./3) is the value y value 1/3 along that line and percentile_cont(2./3) is the y value 2/3 along that line. So you need to find the y values corresponding to 29*(1/3) = 9.667 and 29*(2/3) = 19.333. These are obviously 10.667 and 20.333 respectively. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"