Thread: DISTINCT in STRING_AGG

DISTINCT in STRING_AGG

From
"Sterpu Victor"
Date:
Hello
 
Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;
 
And "childen" contain doubles. The result is:
id          ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"
 
There are doubles because of the join aqjs3 witch is producing this problem.
Can I make it so the children ID's are unique?
 
Thank you.

DISCLAIMER:
Acest mesaj de posta electronica si documentele aferente sunt confidentiale. Este interzisa distribuirea, dezvaluirea sau orice alt mod de utilizare a lor. Daca nu sunteti destinatarul acestui mesaj, este interzis sa actionati in baza acestor informatii. Citirea, copierea, distribuirea, dezvaluirea sau utilizarea in alt mod a informatiei continute in acest mesaj constituie o incalcare a legii. Daca ati primit mesajul din greseala, va rugam sa il distrugeti, anuntand expeditorul de eroarea comisa. Intrucat nu poate fi garantat faptul ca posta electronica este un mod sigur si lipsit de erori de transmitere a informatiilor, este responsabilitatea dvs. sa va asigurati ca mesajul (inclusiv documentele alaturate lui) este validat si autorizat spre a fi utilizat in mediul dvs.

Re: DISTINCT in STRING_AGG

From
"John J. Turner"
Date:
On Nov 28, 2015, at 1:35 PM, Sterpu Victor <victor@caido.ro> wrote:
Hello
 
Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs 
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query) 
LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND aqjs1.to_right<atjs.to_right) 
LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND aqjs2.to_right>aqjs1.to_right) 
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id 
ORDER BY aq.id ASC, atjs.to_left ASC;
 
And "childen" contain doubles. The result is:
id          ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"
 
There are doubles because of the join aqjs3 witch is producing this problem.Can I make it so the children ID's are unique?

Just to mention, this looks like a good candidate for range types and CTE’s.

The > / < comparisons appear to be mutually exclusive in each LEFT JOIN clause, so it’s not apparent why aqjs3 is causing duplication, as you’ve stated.

As far as I can see, without providing us with your table constraints/keys, there’s no way to determine what makes your ID values unique…

However, if you defer your STRING_AGG until after you derive a distinct “staging” result set from the joins, then you can effect uniqueness -
e.g. (air code):
   WITH q AS 
      (SELECT aq.id aq_parent_id, atjs.id atjs_parent_id, CAST(aqjs1.id AS VARCHAR) child FROM ... GROUP BY aq.id, atjs.id, aqjs1.id)
   SELECT atjs_parent_id, STRING_AGG(child,’,’ ORDER BY aqjs.to_left) children 
   FROM q LEFT JOIN (SELECT DISTINCT id_ad_query, to_left FROM administration.ad_query_join_select) aqjs ON …
   GROUP BY aq_parent_id, parent
   ORDER BY aq_parent_id, atjs.to_left;

Something along these lines ‘may’ produce a unique set of child values for each id by which to perform a STRING_AGG on, but again, I can only guess based on the lack of definition provided for your table constraints.

John 

Re: DISTINCT in STRING_AGG

From
Geoff Winkless
Date:
On 28 November 2015 at 18:35, Sterpu Victor <victor@caido.ro> wrote:
Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;
 
And "childen" contain doubles. The result is:
id          ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"
 
There are doubles because of the join aqjs3 witch is producing this problem.
Can I make it so the children ID's are unique?
 

Well if you can live with losing the to_left ordering, then you could just do

SELECT STRING_AGG(DISTINCT CAST(aqjs1.id​ AS VARCHAR), '') AS children
​ ...​

​no?

Geoff​

Re: DISTINCT in STRING_AGG

From
"Sterpu Victor"
Date:
I can't skip the ordering.
I'm sure aqjs3 is the one that produces the duplication.
I guess subqueries are the only option, like this:
SELECT atjs.id,  tmp.children AS children
FROM administration.ad_query_join_select atjs
 
JOIN (SELECT
 atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY aqjs1.to_left) AS children
 FROM administration.ad_query_join_select atjs
 LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND aqjs1.to_right<atjs.to_right)
 LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND aqjs2.to_right>aqjs1.to_right)
 WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
 GROUP BY atjs.id) tmp ON (tmp.id = atjs.id)
 
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND aqjs3.to_right>atjs.to_right)
WHERE atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id, tmp.children
ORDER BY aq.id ASC, atjs.to_left ASC;
 
 
Result is:
id          ; children
1399029;"1399031"
1399031;"1399032,1399033"
Is there a better way? I usualy try to avoid subqueries.
 
------ Original Message ------
From: "Geoff Winkless" <pgsqladmin@geoff.dj>
To: "Sterpu Victor" <victor@caido.ro>
Cc: "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: 11/29/2015 6:42:18 PM
Subject: Re: [GENERAL] DISTINCT in STRING_AGG
 
On 28 November 2015 at 18:35, Sterpu Victor <victor@caido.ro> wrote:
Can I make a distinct STRING_AGG?
This is my query :
SELECT atjs.id, STRING_AGG(CAST(aqjs1.id AS VARCHAR), ',' ORDER BY aqjs1.to_left) AS children
FROM administration.ad_query_join_select atjs
JOIN administration.ad_query aq ON (aq.id=atjs.id_ad_query)
LEFT JOIN administration.ad_query_join_select aqjs1 ON (aqjs1.id_ad_query = atjs.id_ad_query AND aqjs1.to_left>atjs.to_left AND aqjs1.to_right<atjs.to_right)
LEFT JOIN administration.ad_query_join_select aqjs2 ON (aqjs2.id_ad_query = atjs.id_ad_query AND aqjs2.to_left>atjs.to_left AND aqjs2.to_right<atjs.to_right AND aqjs2.to_left<aqjs1.to_left AND aqjs2.to_right>aqjs1.to_right)
LEFT JOIN administration.ad_query_join_select aqjs3 ON (aqjs3.id_ad_query = atjs.id_ad_query AND aqjs3.to_left<atjs.to_left AND aqjs3.to_right>atjs.to_right)
WHERE aqjs2.id IS NULL AND atjs.id_ad_query = 475543
GROUP BY aq.id, atjs.id
ORDER BY aq.id ASC, atjs.to_left ASC;
 
And "childen" contain doubles. The result is:
id          ; children
1399029;"1399031,1399031"
1399031;"1399032,1399032,1399032,1399033,1399033,1399033"
 
There are doubles because of the join aqjs3 witch is producing this problem.
Can I make it so the children ID's are unique?
 

Well if you can live with losing the to_left ordering, then you could just do

SELECT STRING_AGG(DISTINCT CAST(aqjs1.id​ AS VARCHAR), '') AS children
​ ...​

​no?

Geoff​


DISCLAIMER:
Acest mesaj de posta electronica si documentele aferente sunt confidentiale. Este interzisa distribuirea, dezvaluirea sau orice alt mod de utilizare a lor. Daca nu sunteti destinatarul acestui mesaj, este interzis sa actionati in baza acestor informatii. Citirea, copierea, distribuirea, dezvaluirea sau utilizarea in alt mod a informatiei continute in acest mesaj constituie o incalcare a legii. Daca ati primit mesajul din greseala, va rugam sa il distrugeti, anuntand expeditorul de eroarea comisa. Intrucat nu poate fi garantat faptul ca posta electronica este un mod sigur si lipsit de erori de transmitere a informatiilor, este responsabilitatea dvs. sa va asigurati ca mesajul (inclusiv documentele alaturate lui) este validat si autorizat spre a fi utilizat in mediul dvs.

Re: DISTINCT in STRING_AGG

From
Geoff Winkless
Date:
On 29 November 2015 at 18:59, Sterpu Victor <victor@caido.ro> wrote:
I can't skip the ordering.
I'm sure aqjs3 is the one that produces the duplication.
I guess subqueries are the only option, like this:

​Well you could look at the intarray ​extension and a combination of array_agg, uniq() and string_to_array:


but that's probably sledgehammer:nut time.

Not sure why you need to order the values you're getting back by something other than the values themselves - is there a reason you wouldn't want the "children"​ set to be ordered numerically? You can still order the outer query by whatever you like, or you can order the aggregate by the values themselves, it's just the DISTINCT inside the aggregate query requires that an internal ORDER includes the ordering term in the result.

​Geoff​

Re: DISTINCT in STRING_AGG

From
Geoff Winkless
Date:
On 29 November 2015 at 20:51,
​I
 wrote:
Well you could look at the intarray ​extension and a combination of array_agg, uniq() and
​​
string_to_array
:

​Mind blip, apologies, obviously I meant array_to_string :)​

​Geoff​

Re: DISTINCT in STRING_AGG

From
"Sterpu Victor"
Date:
Thank you.
 
I think there is no native function that will solve this problem, intarray extension ​can't order as in the example query.
I could write a new SQL function but I solved the problem with subqueries and works fine.
 
I need order because I use nested trees.
To_left and to_right are the coordinates for each node and the result must be ordered by this to obtain the correct result. 
 
 
 
 
------ Original Message ------
From: "Geoff Winkless" <pgsqladmin@geoff.dj>
To: "Sterpu Victor" <victor@caido.ro>
Cc: "PostgreSQL General" <pgsql-general@postgresql.org>
Sent: 11/29/2015 10:51:32 PM
Subject: Re: Re[2]: [GENERAL] DISTINCT in STRING_AGG
 
On 29 November 2015 at 18:59, Sterpu Victor <victor@caido.ro> wrote:
I can't skip the ordering.
I'm sure aqjs3 is the one that produces the duplication.
I guess subqueries are the only option, like this:

​Well you could look at the intarray ​extension and a combination of array_agg, uniq() and string_to_array:


but that's probably sledgehammer:nut time.

Not sure why you need to order the values you're getting back by something other than the values themselves - is there a reason you wouldn't want the "children"​ set to be ordered numerically? You can still order the outer query by whatever you like, or you can order the aggregate by the values themselves, it's just the DISTINCT inside the aggregate query requires that an internal ORDER includes the ordering term in the result.

​Geoff​


DISCLAIMER:
Acest mesaj de posta electronica si documentele aferente sunt confidentiale. Este interzisa distribuirea, dezvaluirea sau orice alt mod de utilizare a lor. Daca nu sunteti destinatarul acestui mesaj, este interzis sa actionati in baza acestor informatii. Citirea, copierea, distribuirea, dezvaluirea sau utilizarea in alt mod a informatiei continute in acest mesaj constituie o incalcare a legii. Daca ati primit mesajul din greseala, va rugam sa il distrugeti, anuntand expeditorul de eroarea comisa. Intrucat nu poate fi garantat faptul ca posta electronica este un mod sigur si lipsit de erori de transmitere a informatiilor, este responsabilitatea dvs. sa va asigurati ca mesajul (inclusiv documentele alaturate lui) este validat si autorizat spre a fi utilizat in mediul dvs.