Home > mailing lists

Using distinct in an aggregate prevents parallel execution? - Mailing list pgsql-general

From	Thomas Kellerer
Subject	Using distinct in an aggregate prevents parallel execution?
Date	June 6, 2018 15:41:25
Msg-id	pf8kli$i7a$1@blaine.gmane.org Whole thread Raw
Responses	Re: Using distinct in an aggregate prevents parallel execution?
List	pgsql-general

Tree view

Consider this simplified example:

    select c.id, 
           count(*) as total_orders,
           sum(p.price) as total_value
    from customer c
      join orders o ON c.id = o.customer_id 
      join order_line ol ON o.id = ol.order_id 
      join product p ON ol.product_id = p.id
    group by c.id;

This uses parallel execution quite nicely: https://explain.depesz.com/s/aSPNn

However, the query is incorrect as it does not count the number of orders, but (essentially) the number of
order_lines.

This can easily be fixed using:

    select c.id, 
           count(distinct o.id) as total_orders,
           sum(p.price) as total_value
    from customer c
      join orders o ON c.id = o.customer_id 
      join order_line ol ON o.id = ol.order_id 
      join product p ON ol.product_id = p.id
    group by c.id;  

But in that case Postgres 10.4 decides to no longer use parallel execution: https://explain.depesz.com/s/7Ua3

Which increases the query execution time quite a bit (from 3 to 8 seconds). 

Is this a known limitation? 

Thomas

pgsql-general by date:

From: Lionel Tressens
Date: 06 June 2018, 15:16:22
Subject: Re: Setting up replication from 9.4 to 10.4

From: Adrian Klaver
Date: 06 June 2018, 17:06:37
Subject: Re: Which backend using which pg_temp_N schema?

Using distinct in an aggregate prevents parallel execution? - Mailing list pgsql-general

Previous

Next