Thread: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?

Hi list,

I have a basic need, often encountered in spatial analysis: I have a list of cities, parks, childcare centres, schools. I need to count the number of items for each city (0 if no item exists for this city)

I have tested 3 different SQL queries to achieve this goal:

* one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
* one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
* one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

I love the first one, with LEFT JOINS, because it is concise, seems simple, and allows querying easily more than one aggregated field from the child items. But we need to use the DISTINCT clause inside the aggregation functions (count, string_agg), in order to count each child only once.

The one with the subqueries seems the more common way (I have seen it a lot) but I find it cumbersome, and I guess it won't scale well for bigger datasets.

The one with the LATERAL joins seems overcomplicated, but I probably missed some easier way to use the lateral join. I do not know well how the LATERAL differs from the subqueries...

I would like to have your opinion on this scenario. What is the best query for this use case, considering the fact that it should perform well on heavier datasets (1 million cities and thousands of children).

NB: I used SQL Fiddle to help everyone see the data and SQL queries. Not sure if everyone can modify it or not. Please try to keep the 3 example unchanged. It seems SQL Fiddle has not been update since at least 2018, so PostgreSQL version is 9.6.

Regards
Michaël


Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a list
> of cities, parks, childcare centres, schools. I need to count the number of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:  134.62
query 2: 8522.32
query 3:  134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select  c.*
        , coalesce(s.cnt,0) as cnt_school
        , s.schools
        , coalesce(cc.cnt,0) as cnt_childcare
        , cc.childcares
        , coalesce(p.cnt,0) as cnt_park
        , p.parks
  from city c
    left outer join
       (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS schools
           from school
           group by fk_id_city) s
      on s.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS childcares
            from childcare
           group by fk_id_city) cc
      on cc.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS parks
         from park
         group by fk_id_city) p
      on p.fk_id_city = c.id
  order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount
of data.

Regards,
Frank




So you

Le lun. 23 mai 2022 à 15:14, Frank Streitzig <fstreitzig@gmx.net> a écrit :
Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a list
> of cities, parks, childcare centres, schools. I need to count the number of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:  134.62
query 2: 8522.32
query 3:  134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select  c.*
        , coalesce(s.cnt,0) as cnt_school
        , s.schools
        , coalesce(cc.cnt,0) as cnt_childcare
        , cc.childcares
        , coalesce(p.cnt,0) as cnt_park
        , p.parks
  from city c
    left outer join
       (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS schools
           from school
           group by fk_id_city) s
      on s.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS childcares
            from childcare
           group by fk_id_city) cc
      on cc.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS parks
         from park
         group by fk_id_city) p
      on p.fk_id_city = c.id
  order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount
of data.

Regards,
Frank

Hi Frank,

Thanks for your answer !

It seems it would perform better to aggregate as soon as possible, like you illustrated in your example.
I will rewrite the query with "WITH" clauses to improve readability.

Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.

Michaël

Le lun. 23 mai 2022 à 16:15, kimaidou <kimaidou@gmail.com> a écrit :
So you

Le lun. 23 mai 2022 à 15:14, Frank Streitzig <fstreitzig@gmx.net> a écrit :
Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a list
> of cities, parks, childcare centres, schools. I need to count the number of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:  134.62
query 2: 8522.32
query 3:  134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select  c.*
        , coalesce(s.cnt,0) as cnt_school
        , s.schools
        , coalesce(cc.cnt,0) as cnt_childcare
        , cc.childcares
        , coalesce(p.cnt,0) as cnt_park
        , p.parks
  from city c
    left outer join
       (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS schools
           from school
           group by fk_id_city) s
      on s.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS childcares
            from childcare
           group by fk_id_city) cc
      on cc.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS parks
         from park
         group by fk_id_city) p
      on p.fk_id_city = c.id
  order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount
of data.

Regards,
Frank

By the way, I was in fact aware of the duplicate count for the "nb_schools" and other fields, this is why I used a count(DISTINCT ) to have a correct count in the first example. I kept the nb_schools and 2 other fields to illustrate the cost of using DISTINCT in the aggregate functions.

Le lun. 23 mai 2022 à 16:20, kimaidou <kimaidou@gmail.com> a écrit :
Hi Frank,

Thanks for your answer !

It seems it would perform better to aggregate as soon as possible, like you illustrated in your example.
I will rewrite the query with "WITH" clauses to improve readability.

Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.

Michaël

Le lun. 23 mai 2022 à 16:15, kimaidou <kimaidou@gmail.com> a écrit :
So you

Le lun. 23 mai 2022 à 15:14, Frank Streitzig <fstreitzig@gmx.net> a écrit :
Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a list
> of cities, parks, childcare centres, schools. I need to count the number of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:  134.62
query 2: 8522.32
query 3:  134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select  c.*
        , coalesce(s.cnt,0) as cnt_school
        , s.schools
        , coalesce(cc.cnt,0) as cnt_childcare
        , cc.childcares
        , coalesce(p.cnt,0) as cnt_park
        , p.parks
  from city c
    left outer join
       (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS schools
           from school
           group by fk_id_city) s
      on s.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS childcares
            from childcare
           group by fk_id_city) cc
      on cc.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS parks
         from park
         group by fk_id_city) p
      on p.fk_id_city = c.id
  order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount
of data.

Regards,
Frank

Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses

Le lun. 23 mai 2022 à 16:22, kimaidou <kimaidou@gmail.com> a écrit :
By the way, I was in fact aware of the duplicate count for the "nb_schools" and other fields, this is why I used a count(DISTINCT ) to have a correct count in the first example. I kept the nb_schools and 2 other fields to illustrate the cost of using DISTINCT in the aggregate functions.

Le lun. 23 mai 2022 à 16:20, kimaidou <kimaidou@gmail.com> a écrit :
Hi Frank,

Thanks for your answer !

It seems it would perform better to aggregate as soon as possible, like you illustrated in your example.
I will rewrite the query with "WITH" clauses to improve readability.

Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.

Michaël

Le lun. 23 mai 2022 à 16:15, kimaidou <kimaidou@gmail.com> a écrit :
So you

Le lun. 23 mai 2022 à 15:14, Frank Streitzig <fstreitzig@gmx.net> a écrit :
Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a list
> of cities, parks, childcare centres, schools. I need to count the number of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:  134.62
query 2: 8522.32
query 3:  134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select  c.*
        , coalesce(s.cnt,0) as cnt_school
        , s.schools
        , coalesce(cc.cnt,0) as cnt_childcare
        , cc.childcares
        , coalesce(p.cnt,0) as cnt_park
        , p.parks
  from city c
    left outer join
       (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS schools
           from school
           group by fk_id_city) s
      on s.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS childcares
            from childcare
           group by fk_id_city) cc
      on cc.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS parks
         from park
         group by fk_id_city) p
      on p.fk_id_city = c.id
  order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount
of data.

Regards,
Frank

Am Mon, May 23, 2022 at 04:33:16PM +0200 schrieb kimaidou:
> Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses
> http://sqlfiddle.com/#!17/fe902/31/0

yes, you can do it like this.

Regards
Frank