Thread: sorting problem with distinct on()
Hi there ans thx for reading and answering this post if you can:
Let's regard an example of a table (stars) containing a code for a star (symb)
and its index (nb) in a file.
nb | symb
----+------------
0 | alTau
1 | alTau
2 | bePer
3 | alSco
4 | alLeo
5 | alCMa
6 | alVir
7 | Trappist-1
8 | SgrA*
9 | SgrA*
10 | GA
11 | VC
12 | M31
13 | M44
14 | alUMi
15 | M87
16 | alCyg
17 | alCyg
18 | beOri
19 | omiCet
20 | epTau
21 | epCas
22 | alAnd
23 | alAnd
24 | beAnd
25 | ga-1And
26 | ga-1And
27 | ga-1And
28 | ga-1And
29 | xiAnd
30 | upAnd
31 | upAnd
32 | 14And
33 | 51And
34 | M31
35 | alAql
36 | alAql
37 | beAql
38 | gaAql
39 | deAql
40 | epAql
41 | zeAql
42 | zeAql
43 | etAql
44 | thAql
45 | ioAql
46 | laAql
47 | xiAql
48 | 12Aql
49 | alAqr
50 | alAqr
51 | beAqr
52 | gaAqr
53 | gaAqr
54 | deAqr
55 | epAqr
56 | epAqr
57 | ze-1Aqr
58 | etAqr
59 | etAqr
60 | thAqr
61 | kaAqr
62 | laAqr
63 | laAqr
64 | laAqr
65 | laAqr
66 | nuAqr
67 | piAqr
68 | xiAqr
69 | Trappist-1
70 | alAra
71 | muAra
72 | alAri
73 | beAri
74 | beAri
75 | gaAri
76 | deAri
77 | 39Ari
78 | 41Ari
79 | alAur
80 | alAur
81 | beAur
82 | deAur
83 | epAur
84 | epAur
85 | epAur
86 | zeAur
87 | zeAur
88 | zeAur
89 | zeAur
90 | etAur
91 | etAur
92 | thAur
93 | thAur
94 | thAur
95 | ioAur
96 | ioAur
97 | ioAur
98 | alBoo
99 | alBoo
Notice that some symbs are duplicated and I want to select only the
distinct symbs.
So, I could use
select distinct on(symb) * from stars;
I get this result:
nb | symb
----+-------
48 | 12Aql
32 | 14And
77 | 39Ari
78 | 41Ari
33 | 51And
10 | GA
34 | M31
13 | M44
15 | M87
9 | SgrA*
and so on...
But when I want to reorder it by nb with:
with x as (select distinct on(symb) * from stars) select * from x order by nb;
nb | symb
----+------------
1 | alTau
2 | bePer
3 | alSco
4 | alLeo
5 | alCMa
6 | alVir
7 | Trappist-1
9 | SgrA*
10 | GA
11 | VC
13 | M44
14 | alUMi
15 | M87
16 | alCyg
18 | beOri
19 | omiCet
20 | epTau
21 | epCas
23 | alAnd
24 | beAnd
28 | ga-1And
...
Notice that 12 is missing in the list.
Even with x as (select distinct on(symb) * from stars) select * from x
where nb = 12 order by nb;
returns nothing, whereas it should, but
with x as (select distinct on(symb) * from stars) select * from x where
symb = 'M31' order by nb;
returns
nb | symb
----+------
12 | M31
which is correct.
What should I do in order to get the wanted results.
This phenomenon occurs with or without index. On postgres 14.7 linux x86_64.
Thx again.
Let's regard an example of a table (stars) containing a code for a star (symb)
and its index (nb) in a file.
nb | symb
----+------------
0 | alTau
1 | alTau
2 | bePer
3 | alSco
4 | alLeo
5 | alCMa
6 | alVir
7 | Trappist-1
8 | SgrA*
9 | SgrA*
10 | GA
11 | VC
12 | M31
13 | M44
14 | alUMi
15 | M87
16 | alCyg
17 | alCyg
18 | beOri
19 | omiCet
20 | epTau
21 | epCas
22 | alAnd
23 | alAnd
24 | beAnd
25 | ga-1And
26 | ga-1And
27 | ga-1And
28 | ga-1And
29 | xiAnd
30 | upAnd
31 | upAnd
32 | 14And
33 | 51And
34 | M31
35 | alAql
36 | alAql
37 | beAql
38 | gaAql
39 | deAql
40 | epAql
41 | zeAql
42 | zeAql
43 | etAql
44 | thAql
45 | ioAql
46 | laAql
47 | xiAql
48 | 12Aql
49 | alAqr
50 | alAqr
51 | beAqr
52 | gaAqr
53 | gaAqr
54 | deAqr
55 | epAqr
56 | epAqr
57 | ze-1Aqr
58 | etAqr
59 | etAqr
60 | thAqr
61 | kaAqr
62 | laAqr
63 | laAqr
64 | laAqr
65 | laAqr
66 | nuAqr
67 | piAqr
68 | xiAqr
69 | Trappist-1
70 | alAra
71 | muAra
72 | alAri
73 | beAri
74 | beAri
75 | gaAri
76 | deAri
77 | 39Ari
78 | 41Ari
79 | alAur
80 | alAur
81 | beAur
82 | deAur
83 | epAur
84 | epAur
85 | epAur
86 | zeAur
87 | zeAur
88 | zeAur
89 | zeAur
90 | etAur
91 | etAur
92 | thAur
93 | thAur
94 | thAur
95 | ioAur
96 | ioAur
97 | ioAur
98 | alBoo
99 | alBoo
Notice that some symbs are duplicated and I want to select only the
distinct symbs.
So, I could use
select distinct on(symb) * from stars;
I get this result:
nb | symb
----+-------
48 | 12Aql
32 | 14And
77 | 39Ari
78 | 41Ari
33 | 51And
10 | GA
34 | M31
13 | M44
15 | M87
9 | SgrA*
and so on...
But when I want to reorder it by nb with:
with x as (select distinct on(symb) * from stars) select * from x order by nb;
nb | symb
----+------------
1 | alTau
2 | bePer
3 | alSco
4 | alLeo
5 | alCMa
6 | alVir
7 | Trappist-1
9 | SgrA*
10 | GA
11 | VC
13 | M44
14 | alUMi
15 | M87
16 | alCyg
18 | beOri
19 | omiCet
20 | epTau
21 | epCas
23 | alAnd
24 | beAnd
28 | ga-1And
...
Notice that 12 is missing in the list.
Even with x as (select distinct on(symb) * from stars) select * from x
where nb = 12 order by nb;
returns nothing, whereas it should, but
with x as (select distinct on(symb) * from stars) select * from x where
symb = 'M31' order by nb;
returns
nb | symb
----+------
12 | M31
which is correct.
What should I do in order to get the wanted results.
This phenomenon occurs with or without index. On postgres 14.7 linux x86_64.
Thx again.
On Fri, May 5, 2023 at 6:55 AM Sky Lendar <skylendar@gmail.com> wrote:
Hi there ans thx for reading and answering this post if you can:
Let's regard an example of a table (stars) containing a code for a star (symb)
and its index (nb) in a file.
nb | symb
----+------------
0 | alTau
1 | alTau
2 | bePer
3 | alSco
4 | alLeo
5 | alCMa
6 | alVir
7 | Trappist-1
8 | SgrA*
9 | SgrA*
10 | GA
11 | VC
12 | M31
13 | M44
14 | alUMi
15 | M87
16 | alCyg
17 | alCyg
18 | beOri
19 | omiCet
20 | epTau
21 | epCas
22 | alAnd
23 | alAnd
24 | beAnd
25 | ga-1And
26 | ga-1And
27 | ga-1And
28 | ga-1And
29 | xiAnd
30 | upAnd
31 | upAnd
32 | 14And
33 | 51And
34 | M31
35 | alAql
36 | alAql
37 | beAql
38 | gaAql
39 | deAql
40 | epAql
41 | zeAql
42 | zeAql
43 | etAql
44 | thAql
45 | ioAql
46 | laAql
47 | xiAql
48 | 12Aql
49 | alAqr
50 | alAqr
51 | beAqr
52 | gaAqr
53 | gaAqr
54 | deAqr
55 | epAqr
56 | epAqr
57 | ze-1Aqr
58 | etAqr
59 | etAqr
60 | thAqr
61 | kaAqr
62 | laAqr
63 | laAqr
64 | laAqr
65 | laAqr
66 | nuAqr
67 | piAqr
68 | xiAqr
69 | Trappist-1
70 | alAra
71 | muAra
72 | alAri
73 | beAri
74 | beAri
75 | gaAri
76 | deAri
77 | 39Ari
78 | 41Ari
79 | alAur
80 | alAur
81 | beAur
82 | deAur
83 | epAur
84 | epAur
85 | epAur
86 | zeAur
87 | zeAur
88 | zeAur
89 | zeAur
90 | etAur
91 | etAur
92 | thAur
93 | thAur
94 | thAur
95 | ioAur
96 | ioAur
97 | ioAur
98 | alBoo
99 | alBoo
Notice that some symbs are duplicated and I want to select only the
distinct symbs.
So, I could use
select distinct on(symb) * from stars;
I get this result:
nb | symb
----+-------
48 | 12Aql
32 | 14And
77 | 39Ari
78 | 41Ari
33 | 51And
10 | GA
34 | M31
13 | M44
15 | M87
9 | SgrA*
Notice that 12 is missing in the list.
Even with x as (select distinct on(symb) * from stars) select * from x
where nb = 12 order by nb;
nb = 12 is a duplicate with np = 34
Since your DISTINCT ON *subquery* doesn't specify an ordering which of those two are chosen as the representative record for M31 is non-determinstic.
If you want to ensure the lowest valued nb is chosen you need to sort the *subquery*. The first record the DISTINCT encounters is the one selected to represent.
Sorting in the outer/main query happens after the DISTINCT and so the record is already gone.
David J.
Thank you for the clarification.
Le ven. 5 mai 2023 à 16:16, David G. Johnston <david.g.johnston@gmail.com> a écrit :
On Fri, May 5, 2023 at 6:55 AM Sky Lendar <skylendar@gmail.com> wrote:Hi there ans thx for reading and answering this post if you can:
Let's regard an example of a table (stars) containing a code for a star (symb)
and its index (nb) in a file.
nb | symb
----+------------
0 | alTau
1 | alTau
2 | bePer
3 | alSco
4 | alLeo
5 | alCMa
6 | alVir
7 | Trappist-1
8 | SgrA*
9 | SgrA*
10 | GA
11 | VC
12 | M31
13 | M44
14 | alUMi
15 | M87
16 | alCyg
17 | alCyg
18 | beOri
19 | omiCet
20 | epTau
21 | epCas
22 | alAnd
23 | alAnd
24 | beAnd
25 | ga-1And
26 | ga-1And
27 | ga-1And
28 | ga-1And
29 | xiAnd
30 | upAnd
31 | upAnd
32 | 14And
33 | 51And
34 | M31
35 | alAql
36 | alAql
37 | beAql
38 | gaAql
39 | deAql
40 | epAql
41 | zeAql
42 | zeAql
43 | etAql
44 | thAql
45 | ioAql
46 | laAql
47 | xiAql
48 | 12Aql
49 | alAqr
50 | alAqr
51 | beAqr
52 | gaAqr
53 | gaAqr
54 | deAqr
55 | epAqr
56 | epAqr
57 | ze-1Aqr
58 | etAqr
59 | etAqr
60 | thAqr
61 | kaAqr
62 | laAqr
63 | laAqr
64 | laAqr
65 | laAqr
66 | nuAqr
67 | piAqr
68 | xiAqr
69 | Trappist-1
70 | alAra
71 | muAra
72 | alAri
73 | beAri
74 | beAri
75 | gaAri
76 | deAri
77 | 39Ari
78 | 41Ari
79 | alAur
80 | alAur
81 | beAur
82 | deAur
83 | epAur
84 | epAur
85 | epAur
86 | zeAur
87 | zeAur
88 | zeAur
89 | zeAur
90 | etAur
91 | etAur
92 | thAur
93 | thAur
94 | thAur
95 | ioAur
96 | ioAur
97 | ioAur
98 | alBoo
99 | alBoo
Notice that some symbs are duplicated and I want to select only the
distinct symbs.
So, I could use
select distinct on(symb) * from stars;
I get this result:
nb | symb
----+-------
48 | 12Aql
32 | 14And
77 | 39Ari
78 | 41Ari
33 | 51And
10 | GA
34 | M31
13 | M44
15 | M87
9 | SgrA*
Notice that 12 is missing in the list.
Even with x as (select distinct on(symb) * from stars) select * from x
where nb = 12 order by nb;nb = 12 is a duplicate with np = 34Since your DISTINCT ON *subquery* doesn't specify an ordering which of those two are chosen as the representative record for M31 is non-determinstic.If you want to ensure the lowest valued nb is chosen you need to sort the *subquery*. The first record the DISTINCT encounters is the one selected to represent.Sorting in the outer/main query happens after the DISTINCT and so the record is already gone.David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Fri, May 5, 2023 at 6:55 AM Sky Lendar <skylendar@gmail.com> wrote: >> Notice that 12 is missing in the list. > Since your DISTINCT ON *subquery* doesn't specify an ordering which of > those two are chosen as the representative record for M31 is > non-determinstic. > If you want to ensure the lowest valued nb is chosen you need to sort the > *subquery*. The "weather_reports" example here might help clarify this for you: https://www.postgresql.org/docs/current/sql-select.html regards, tom lane