Thread: Sort is generating rows

Sort is generating rows

From
Nicolas Seinlet
Date:
Hi,

I have a query with a strange query plan.

This query roughly searches for sales and converts them using a currency rate. As the currency rate changes from time to time, the table contains the currency, the company, the rate, the start date of availability of this rate, and the end date of availability.

The join is done using:
    left join currency_rate cr on (cr.currency_id = pp.currency_id and
          cr.company_id = s.company_id and
          cr.date_start <= coalesce(s.date_order, now()) and
         (cr.date_end is null or cr.date_end > coalesce(s.date_order, now())))

The tricky part is the date range on the currency rate, which is not an equality.

The query plan shows:
->  Sort  (cost=120.13..124.22 rows=1637 width=56) (actual time=14.300..72084.758 rows=308054684 loops=1)
                          Sort Key: cr.currency_id, cr.company_id
                          Sort Method: quicksort  Memory: 172kB
                          ->  CTE Scan on currency_rate cr  (cost=0.00..32.74 rows=1637 width=56) (actual time=1.403..13.610 rows=1576 loops=1)

There are two challenging things here:
- the planner estimates 1,637 rows but gets ~308 million
- the sort is generating rows

Later in the query plan, you find the join:
->  Merge Left Join  (cost=341056.75..351344.76 rows=1165112 width=224) (actual time=9792.635..269120.409 rows=1170055 loops=1)
                    Merge Cond: ((pp.currency_id = cr.currency_id) AND (s.company_id = cr.company_id))
                    Join Filter: ((cr.date_start <= COALESCE((s.date_order)::timestamp with time zone, now())) AND ((cr.date_end IS NULL) OR (cr.date_end > COALESCE((s.date_order)::timestamp with time zone, now()))))
                    Rows Removed by Join Filter: 307266434

It seems the join filter then removes nearly all of the generated rows (307,266,434 of them), which is correct.

My question, then: is there a better way to join one table to another using a date range, knowing that the date ranges don't overlap?
Should we generate a virtual table with rates for all dates and join using an equality? Something like the sketch below, perhaps.
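
A rough, untested sketch of that idea (column names taken from the join above; the daily expansion itself is hypothetical):

    -- Expand each rate into one row per day, so the join becomes a plain
    -- equality on (currency_id, company_id, day). The upper bound stays
    -- exclusive, matching "date_end > date_order" above. Caveat: open-ended
    -- rates (date_end is null) are only expanded up to now(), so the
    -- expansion would have to be recomputed as time passes.
    with daily_rate as (
        select cr.currency_id, cr.company_id, cr.rate, d::date as day
        from currency_rate cr
        cross join lateral generate_series(
            cr.date_start,
            coalesce(cr.date_end, now()) - interval '1 day',
            interval '1 day') as d
    )
    -- ... and in the main query:
    left join daily_rate dr on (dr.currency_id = pp.currency_id and
          dr.company_id = s.company_id and
          dr.day = coalesce(s.date_order, now())::date)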

For now, the more currency rates there are, the slower the query. There aren't that many currency rates (around 1k in this case), as you can only have one rate per day per currency.

Have a nice day,

Nicolas.

Re: Sort is generating rows

From
Adrian Klaver
Date:
On 05/31/2018 02:09 AM, Nicolas Seinlet wrote:
> Hi,
> 
> I have a query with a strange query plan.
> 
> This query roughly searches for sales and converts them using a 
> currency rate. As the currency rate changes from time to time, the 
> table contains the currency, the company, the rate, the start date of 
> availability of this rate, and the end date of availability.

My guess is that to get a complete answer you are going to need to provide:

1) The complete query.

2) The complete EXPLAIN ANALYZE.

More comments inline below.

> 
> The join is done using:
>      left join currency_rate cr on (cr.currency_id = pp.currency_id and
>            cr.company_id = s.company_id and
>            cr.date_start <= coalesce(s.date_order, now()) and
>           (cr.date_end is null or cr.date_end > coalesce(s.date_order, 
> now())))
> 
> The tricky part is the date range on the currency rate, which is not an 
> equality.

> 
> My question, then: is there a better way to join one table to another 
> using a date range, knowing that the date ranges don't overlap?

Use date ranges?:

https://www.postgresql.org/docs/10/static/functions-range.html
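
For example, something along these lines (an untested sketch reusing the 
aliases from your join; it assumes date_start and date_end are, or can 
be cast to, dates):

    -- One daterange containment test replaces the two inequalities. The
    -- default '[)' bounds (lower inclusive, upper exclusive) match the
    -- original <= / > tests, and a NULL date_end yields an open-ended
    -- range, matching the "date_end is null" branch.
    left join currency_rate cr on (cr.currency_id = pp.currency_id and
          cr.company_id = s.company_id and
          daterange(cr.date_start::date, cr.date_end::date)
              @> coalesce(s.date_order, now())::date)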


> Should we generate a virtual table with rates for all dates and join 
> using an equality?
> 
> For now, the more currency rates there are, the slower the query. 
> There aren't that many currency rates (around 1k in this case), as you 
> can only have one rate per day per currency.
> 
> Have a nice day,
> 
> Nicolas.


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sort is generating rows

From
Nicolas Seinlet
Date:
Hi,

Thanks for the answer. The query is based on a view, so here are the view, the query, and the query plan.
I've already taken your remarks, like using date ranges, into account.

 SELECT min(l.id) AS id,
    l.product_id,
    t.uom_id AS product_uom,
    sum(l.product_uom_qty / u.factor * u2.factor) AS product_uom_qty,
    sum(l.qty_delivered / u.factor * u2.factor) AS qty_delivered,
    sum(l.qty_invoiced / u.factor * u2.factor) AS qty_invoiced,
    sum(l.qty_to_invoice / u.factor * u2.factor) AS qty_to_invoice,
    sum(l.price_total / COALESCE(cr.rate, 1.0)) AS price_total,
    sum(l.price_subtotal / COALESCE(cr.rate, 1.0)) AS price_subtotal,
    sum(l.price_reduce * l.qty_to_invoice / COALESCE(cr.rate, 1.0)) AS amount_to_invoice,
    sum(l.price_reduce * l.qty_invoiced / COALESCE(cr.rate, 1.0)) AS amount_invoiced,
    count(*) AS nbr,
    s.name,
    s.date_order AS date,
    s.confirmation_date,
    s.state,
    s.partner_id,
    s.user_id,
    s.company_id,
    date_part('epoch'::text, avg(date_trunc('day'::text, s.date_order) - date_trunc('day'::text, s.create_date))) / (24 * 60 * 60)::numeric(16,2)::double precision AS delay,
    t.categ_id,
    s.pricelist_id,
    s.analytic_account_id,
    s.team_id,
    p.product_tmpl_id,
    partner.country_id,
    partner.commercial_partner_id,
    sum(p.weight * l.product_uom_qty / u.factor * u2.factor) AS weight,
    sum(p.volume * l.product_uom_qty::double precision / u.factor::double precision * u2.factor::double precision) AS volume
   FROM sale_order_line l
     JOIN sale_order s ON l.order_id = s.id
     JOIN res_partner partner ON s.partner_id = partner.id
     LEFT JOIN product_product p ON l.product_id = p.id
     LEFT JOIN product_template t ON p.product_tmpl_id = t.id
     LEFT JOIN uom_uom u ON u.id = l.product_uom
     LEFT JOIN uom_uom u2 ON u2.id = t.uom_id
     JOIN product_pricelist pp ON s.pricelist_id = pp.id
     LEFT JOIN LATERAL ( SELECT res_currency_rate.rate
           FROM res_currency_rate
          WHERE res_currency_rate.currency_id = pp.currency_id AND (res_currency_rate.company_id = s.company_id OR res_currency_rate.company_id IS NULL) AND daterange(res_currency_rate.name, COALESCE(res_currency_rate.date_end, now()::date)) @> COALESCE(s.date_order::timestamp with time zone, now())::date
         LIMIT 1) cr ON true
  GROUP BY l.product_id, l.order_id, t.uom_id, t.categ_id, s.name, s.date_order, s.confirmation_date, s.partner_id, s.user_id, s.state, s.company_id, s.pricelist_id, s.analytic_account_id, s.team_id, p.product_tmpl_id, partner.country_id, partner.commercial_partner_id;

explain analyse select team_id,partner_id,sum(price_total) from sale_report group by team_id,partner_id;

                                                                                                                                              QUERY PLAN                                                                                                                                               
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=1344575.91..1344986.97 rows=3654 width=40) (actual time=8934.915..8944.487 rows=43 loops=1)
   Group Key: sale_report.team_id, sale_report.partner_id
   ->  Sort  (cost=1344575.91..1344667.26 rows=36539 width=40) (actual time=8934.686..8937.833 rows=32732 loops=1)
         Sort Key: sale_report.team_id, sale_report.partner_id
         Sort Method: quicksort  Memory: 3323kB
         ->  Subquery Scan on sale_report  (cost=1339157.70..1341806.77 rows=36539 width=40) (actual time=8870.269..8923.114 rows=32732 loops=1)
               ->  GroupAggregate  (cost=1339157.70..1341441.38 rows=36539 width=395) (actual time=8870.268..8920.155 rows=32732 loops=1)
                     Group Key: l.product_id, l.order_id, t.uom_id, t.categ_id, s.name, s.date_order, s.confirmation_date, s.partner_id, s.user_id, s.state, s.company_id, s.pricelist_id, s.analytic_account_id, s.team_id, p.product_tmpl_id, partner.country_id, partner.commercial_partner_id
                     ->  Sort  (cost=1339157.70..1339249.04 rows=36539 width=92) (actual time=8870.247..8875.191 rows=32732 loops=1)
                           Sort Key: l.product_id, l.order_id, t.uom_id, t.categ_id, s.name, s.date_order, s.confirmation_date, s.partner_id, s.user_id, s.state, s.company_id, s.pricelist_id, s.analytic_account_id, s.team_id, p.product_tmpl_id, partner.country_id, partner.commercial_partner_id
                           Sort Method: quicksort  Memory: 5371kB
                           ->  Nested Loop Left Join  (cost=695.71..1336388.56 rows=36539 width=92) (actual time=13.468..8797.655 rows=32732 loops=1)
                                 ->  Hash Left Join  (cost=695.43..3338.19 rows=36539 width=88) (actual time=13.323..65.600 rows=32732 loops=1)
                                       Hash Cond: (l.product_id = p.id)
                                       ->  Hash Join  (cost=656.36..2796.71 rows=36539 width=76) (actual time=13.236..49.047 rows=32732 loops=1)
                                             Hash Cond: (l.order_id = s.id)
                                             ->  Seq Scan on sale_order_line l  (cost=0.00..1673.39 rows=36539 width=17) (actual time=0.019..7.338 rows=32732 loops=1)
                                             ->  Hash  (cost=550.72..550.72 rows=8451 width=67) (actual time=13.184..13.184 rows=8382 loops=1)
                                                   Buckets: 16384  Batches: 1  Memory Usage: 947kB
                                                   ->  Hash Join  (cost=37.69..550.72 rows=8451 width=67) (actual time=0.164..10.135 rows=8382 loops=1)
                                                         Hash Cond: (s.pricelist_id = pp.id)
                                                         ->  Hash Join  (cost=13.97..420.42 rows=8451 width=63) (actual time=0.151..7.064 rows=8382 loops=1)
                                                               Hash Cond: (s.partner_id = partner.id)
                                                               ->  Seq Scan on sale_order s  (cost=0.00..301.51 rows=8451 width=55) (actual time=0.005..1.807 rows=8382 loops=1)
                                                               ->  Hash  (cost=13.43..13.43 rows=43 width=12) (actual time=0.136..0.136 rows=43 loops=1)
                                                                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                                                     ->  Seq Scan on res_partner partner  (cost=0.00..13.43 rows=43 width=12) (actual time=0.013..0.112 rows=43 loops=1)
                                                         ->  Hash  (cost=16.10..16.10 rows=610 width=8) (actual time=0.007..0.007 rows=1 loops=1)
                                                               Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                               ->  Seq Scan on product_pricelist pp  (cost=0.00..16.10 rows=610 width=8) (actual time=0.005..0.005 rows=1 loops=1)
                                       ->  Hash  (cost=32.95..32.95 rows=490 width=16) (actual time=0.076..0.076 rows=43 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 11kB
                                             ->  Hash Left Join  (cost=11.88..32.95 rows=490 width=16) (actual time=0.051..0.068 rows=43 loops=1)
                                                   Hash Cond: (p.product_tmpl_id = t.id)
                                                   ->  Seq Scan on product_product p  (cost=0.00..14.90 rows=490 width=8) (actual time=0.007..0.010 rows=43 loops=1)
                                                   ->  Hash  (cost=11.39..11.39 rows=39 width=12) (actual time=0.039..0.039 rows=39 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                                         ->  Seq Scan on product_template t  (cost=0.00..11.39 rows=39 width=12) (actual time=0.006..0.030 rows=39 loops=1)
                                 ->  Limit  (cost=0.28..36.46 rows=1 width=8) (actual time=0.266..0.266 rows=1 loops=32732)
                                       ->  Index Scan using res_currency_rate_currency_id_index on res_currency_rate  (cost=0.28..36.46 rows=1 width=8) (actual time=0.266..0.266 rows=1 loops=32732)
                                             Index Cond: (currency_id = pp.currency_id)
                                             Filter: (((company_id = s.company_id) OR (company_id IS NULL)) AND (daterange(name, COALESCE(date_end, (now())::date)) @> (COALESCE((s.date_order)::timestamp with time zone, now()))::date))
                                             Rows Removed by Filter: 502
 Planning time: 5.731 ms
 Execution time: 8944.950 ms
(45 rows)


Have a nice day,

Nicolas.


Re: Sort is generating rows

From
Adrian Klaver
Date:
On 06/01/2018 02:36 AM, Nicolas Seinlet wrote:
> Hi,
> 
> thanks for the answer. The query is based on a view, so here are the 
> view, the query as well as the query plan.
> I've already taken into account remarks like date ranges.

You changed the query from the original: besides adding the daterange, I see the addition of a LATERAL. Were there other changes?

The changes did eliminate the 300-million-row sort, from what I can see.

The new query takes ~9 secs; is that an improvement over the old one?

I took the liberty of running the EXPLAIN ANALYZE through 
explain.depesz.com:

https://explain.depesz.com/s/9thl

The largest amount of time was in the Index Scan (8,706.712 ms), and that 
was because the scan was looped 32,732 times. I have not used LATERAL in 
my own code, so I looked it up:

https://www.postgresql.org/docs/10/static/sql-select.html

LATERAL

"
...

When a FROM item contains LATERAL cross-references, evaluation proceeds 
as follows: for each row of the FROM item providing the cross-referenced 
column(s), or set of rows of multiple FROM items providing the columns, 
the LATERAL item is evaluated using that row or row set's values of the 
columns. The resulting row(s) are joined as usual with the rows they 
were computed from. This is repeated for each row or set of rows from 
the column source table(s).
...
"

If I am following correctly, that might explain some of the looping seen above.
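
For reference, the usual shape of that pattern is something like this (a 
generic sketch with made-up tables, not your schema; note the ORDER BY, 
without which LIMIT 1 returns an arbitrary matching row):

    -- The lateral subquery is evaluated once per outer row, which is
    -- why the plan shows loops=<number of outer rows>.
    SELECT o.id, latest.rate
    FROM orders o
    LEFT JOIN LATERAL (
        SELECT r.rate
        FROM rates r
        WHERE r.currency_id = o.currency_id
          AND r.valid_from <= o.ordered_on
        ORDER BY r.valid_from DESC   -- makes the LIMIT 1 row deterministic
        LIMIT 1) latest ON true;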



-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sort is generating rows

From
Nicolas Seinlet
Date:
Hi,

You're right: besides the daterange, the lateral join was also a change. The lateral join removes the row generation and uses a loop instead.
I removed the lateral join, on the same dataset as the one used for the provided explain plan, and the execution time grows to 18s:
 SELECT min(l.id) AS id,
    l.product_id,
    t.uom_id AS product_uom,
    sum(l.product_uom_qty / u.factor * u2.factor) AS product_uom_qty,
    sum(l.qty_delivered / u.factor * u2.factor) AS qty_delivered,
    sum(l.qty_invoiced / u.factor * u2.factor) AS qty_invoiced,
    sum(l.qty_to_invoice / u.factor * u2.factor) AS qty_to_invoice,
    sum(l.price_total / COALESCE(cr.rate, 1.0)) AS price_total,
    sum(l.price_subtotal / COALESCE(cr.rate, 1.0)) AS price_subtotal,
    sum(l.price_reduce * l.qty_to_invoice / COALESCE(cr.rate, 1.0)) AS amount_to_invoice,
    sum(l.price_reduce * l.qty_invoiced / COALESCE(cr.rate, 1.0)) AS amount_invoiced,
    count(*) AS nbr,
    s.name,
    s.date_order AS date,
    s.confirmation_date,
    s.state,
    s.partner_id,
    s.user_id,
    s.company_id,
    date_part('epoch'::text, avg(date_trunc('day'::text, s.date_order) - date_trunc('day'::text, s.create_date))) / (24 * 60 * 60)::numeric(16,2)::double precision AS delay,
    t.categ_id,
    s.pricelist_id,
    s.analytic_account_id,
    s.team_id,
    p.product_tmpl_id,
    partner.country_id,
    partner.commercial_partner_id,
    sum(p.weight * l.product_uom_qty / u.factor * u2.factor) AS weight,
    sum(p.volume * l.product_uom_qty::double precision / u.factor::double precision * u2.factor::double precision) AS volume
   FROM sale_order_line l
     JOIN sale_order s ON l.order_id = s.id
     JOIN res_partner partner ON s.partner_id = partner.id
     LEFT JOIN product_product p ON l.product_id = p.id
     LEFT JOIN product_template t ON p.product_tmpl_id = t.id
     LEFT JOIN uom_uom u ON u.id = l.product_uom
     LEFT JOIN uom_uom u2 ON u2.id = t.uom_id
     JOIN product_pricelist pp ON s.pricelist_id = pp.id
     LEFT JOIN ( SELECT r.currency_id,
            COALESCE(r.company_id, c.id) AS company_id,
            r.rate,
            r.name AS date_start,
            r.date_end
           FROM res_currency_rate r
             JOIN res_company c ON r.company_id IS NULL OR r.company_id = c.id) cr ON cr.currency_id = pp.currency_id AND (cr.company_id = s.company_id OR cr.company_id IS NULL) AND daterange(cr.date_start, COALESCE(cr.date_end, now()::date)) @> COALESCE(s.date_order::timestamp with time zone, now())::date
  GROUP BY l.product_id, l.order_id, t.uom_id, t.categ_id, s.name, s.date_order, s.confirmation_date, s.partner_id, s.user_id, s.state, s.company_id, s.pricelist_id, s.analytic_account_id, s.team_id, p.product_tmpl_id, partner.country_id, partner.commercial_partner_id;

I've posted the new explain plan as an optimisation of the previous one:

If I remove the daterange and go back to the original comparisons, the execution time is better and falls to 13s:

 SELECT min(l.id) AS id,
    l.product_id,
    t.uom_id AS product_uom,
    sum(l.product_uom_qty / u.factor * u2.factor) AS product_uom_qty,
    sum(l.qty_delivered / u.factor * u2.factor) AS qty_delivered,
    sum(l.qty_invoiced / u.factor * u2.factor) AS qty_invoiced,
    sum(l.qty_to_invoice / u.factor * u2.factor) AS qty_to_invoice,
    sum(l.price_total / COALESCE(cr.rate, 1.0)) AS price_total,
    sum(l.price_subtotal / COALESCE(cr.rate, 1.0)) AS price_subtotal,
    sum(l.price_reduce * l.qty_to_invoice / COALESCE(cr.rate, 1.0)) AS amount_to_invoice,
    sum(l.price_reduce * l.qty_invoiced / COALESCE(cr.rate, 1.0)) AS amount_invoiced,
    count(*) AS nbr,
    s.name,
    s.date_order AS date,
    s.confirmation_date,
    s.state,
    s.partner_id,
    s.user_id,
    s.company_id,
    date_part('epoch'::text, avg(date_trunc('day'::text, s.date_order) - date_trunc('day'::text, s.create_date))) / (24 * 60 * 60)::numeric(16,2)::double precision AS delay,
    t.categ_id,
    s.pricelist_id,
    s.analytic_account_id,
    s.team_id,
    p.product_tmpl_id,
    partner.country_id,
    partner.commercial_partner_id,
    sum(p.weight * l.product_uom_qty / u.factor * u2.factor) AS weight,
    sum(p.volume * l.product_uom_qty::double precision / u.factor::double precision * u2.factor::double precision) AS volume
   FROM sale_order_line l
     JOIN sale_order s ON l.order_id = s.id
     JOIN res_partner partner ON s.partner_id = partner.id
     LEFT JOIN product_product p ON l.product_id = p.id
     LEFT JOIN product_template t ON p.product_tmpl_id = t.id
     LEFT JOIN uom_uom u ON u.id = l.product_uom
     LEFT JOIN uom_uom u2 ON u2.id = t.uom_id
     JOIN product_pricelist pp ON s.pricelist_id = pp.id
     LEFT JOIN ( SELECT r.currency_id,
            COALESCE(r.company_id, c.id) AS company_id,
            r.rate,
            r.name AS date_start,
            r.date_end
           FROM res_currency_rate r
             JOIN res_company c ON r.company_id IS NULL OR r.company_id = c.id) cr ON cr.currency_id = pp.currency_id AND (cr.company_id = s.company_id OR cr.company_id IS NULL) AND cr.date_start <= COALESCE(s.date_order::timestamp with time zone, now()) AND (cr.date_end IS NULL OR cr.date_end > COALESCE(s.date_order::timestamp with time zone, now()))
  GROUP BY l.product_id, l.order_id, t.uom_id, t.categ_id, s.name, s.date_order, s.confirmation_date, s.partner_id, s.user_id, s.state, s.company_id, s.pricelist_id, s.analytic_account_id, s.team_id, p.product_tmpl_id, partner.country_id, partner.commercial_partner_id;

And thanks again for the help.

Have a nice day,

Nicolas



Re: Sort is generating rows

From
Nicolas Seinlet
Date:
Hi,

I've run some tests, generating various data in the res_currency_rate table.

If I generate res_currency_rate rows for unused currencies, this doesn't influence the execution time.
If I generate more res_currency_rate rows for used currencies, this slows the query.
If I generate 100 rates on a one-per-day basis, I get an execution time of 4.5 seconds.
If I generate 100 rates on a one-per-3-days basis, the execution time drops to 4 seconds.
If I generate 100 rates on a one-per-6-days basis, the execution time drops to 3.8 seconds.

I've executed these tests many times, each time after vacuuming the table, to avoid cache- or buffer-related issues.

The execution time with a daterange-type join:
res_currency_rate.currency_id = pp.currency_id AND (res_currency_rate.company_id = s.company_id OR res_currency_rate.company_id IS NULL) AND daterange(res_currency_rate.name, res_currency_rate.date_end) @> COALESCE(s.date_order::timestamp with time zone, now())::date

is slower than the date comparison equivalent:
res_currency_rate.currency_id = pp.currency_id AND (res_currency_rate.company_id = s.company_id OR res_currency_rate.company_id IS NULL) AND res_currency_rate.name <= COALESCE(s.date_order::timestamp with time zone, now()) AND (res_currency_rate.date_end IS NULL OR res_currency_rate.date_end > COALESCE(s.date_order::timestamp with time zone, now()))
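
One thing that might help the daterange version (an untested sketch; the 
index expression must match the query's expression exactly, and it only 
helps if the planner actually chooses an index scan):

    -- btree_gist is needed to mix a scalar column (currency_id) into a
    -- GiST index.
    create extension if not exists btree_gist;
    -- daterange(name, date_end) is immutable, so it is indexable; the
    -- coalesce(date_end, now()::date) variant is not, since now() is
    -- not immutable.
    create index res_currency_rate_range_gist
        on res_currency_rate
        using gist (currency_id, daterange(name, date_end));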


Re: Sort is generating rows

From
Adrian Klaver
Date:
On 06/05/2018 07:58 AM, Nicolas Seinlet wrote:
> Hi,
> 
> I've run some tests, generating various data in 
> the res_currency_rate table.
> 
> If I generate res_currency_rate rows for unused currencies, this 
> doesn't influence the execution time.
> If I generate more res_currency_rate rows for used currencies, this 
> slows the query.
> If I generate 100 rates on a one-per-day basis, I get an execution 
> time of 4.5 seconds.
> If I generate 100 rates on a one-per-3-days basis, the execution time 
> drops to 4 seconds.
> If I generate 100 rates on a one-per-6-days basis, the execution time 
> drops to 3.8 seconds.

I took another look at the query and got to wondering about the snippet 
below:


LEFT JOIN (SELECT
                 r.currency_id,
                 COALESCE(r.company_id, c.id) AS company_id,
                 r.rate,
                 r.name AS date_start,
                 r.date_end
            FROM
                 res_currency_rate r
            JOIN
                 res_company c
            ON
                 r.company_id IS NULL OR r.company_id = c.id
            ) cr
ON
     cr.currency_id = pp.currency_id
AND
     (cr.company_id = s.company_id OR cr.company_id IS NULL)
AND
     daterange(cr.date_start, COALESCE(cr.date_end, now()::date)) @>
          COALESCE(s.date_order::timestamp with time zone, now())::date

In particular:

FROM
                 res_currency_rate r
            JOIN
                 res_company c
            ON
                 r.company_id IS NULL OR r.company_id = c.id

Are there NULL company_id values in res_currency_rate?

If so, I am trying to figure out how the JOIN to res_company would work 
in that situation.

What happens if you eliminate the r.company_id IS NULL?
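
That is, for testing, something like (sketch):

    -- With the IS NULL branch dropped, the join is a plain equality;
    -- rates with a NULL company_id would then no longer match anything.
    FROM
         res_currency_rate r
    JOIN
         res_company c
    ON
         r.company_id = c.id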

> 
> I've executed these tests many times, each time after vacuuming the 
> table, to avoid cache- or buffer-related issues.
> 
> The execution time with a daterange-type join:
> res_currency_rate.currency_id = pp.currency_id AND 
> (res_currency_rate.company_id = s.company_id OR 
> res_currency_rate.company_id IS NULL) AND 
> daterange(res_currency_rate.name, 
> res_currency_rate.date_end) @> COALESCE(s.date_order::timestamp with 
> time zone, now())::date
> 
> is slower than the date comparison equivalent:
> res_currency_rate.currency_id = pp.currency_id AND 
> (res_currency_rate.company_id = s.company_id OR 
> res_currency_rate.company_id IS NULL) AND 
> res_currency_rate.name <= COALESCE(s.date_order::timestamp with 
> time zone, now()) AND (res_currency_rate.date_end IS NULL OR 
> res_currency_rate.date_end > COALESCE(s.date_order::timestamp with time 
> zone, now()))
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Sort is generating rows

From
Nicolas Seinlet
Date:
Hi,

A currency rate can have no company; it is then applicable to currencies that have no company-specific rate.


Re: Sort is generating rows

From
Adrian Klaver
Date:
On 06/10/2018 11:47 PM, Nicolas Seinlet wrote:
> Hi,
> 
> A currency rate can have no company; it is then applicable to 
> currencies that have no company-specific rate.
> 

I see. So what happens if, for testing purposes, you eliminate the

r.company_id IS NULL OR

part?



-- 
Adrian Klaver
adrian.klaver@aklaver.com