Re: Query plan: SELECT vs INSERT from same select - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Query plan: SELECT vs INSERT from same select
Date
Msg-id 346DB5C5-3B26-4FA6-9B17-93FDED88A738@gmail.com
Whole thread Raw
In response to Query plan: SELECT vs INSERT from same select  (Alexander Voytsekhovskyy <young.inbox@gmail.com>)
List pgsql-general
> On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy <young.inbox@gmail.com> wrote:
>
> I have quite complicated query:
>
> SELECT axis_x1, axis_y1, SUM(delivery_price)  as v_1 FROM (
> SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount
*production_price.price * groups.discount as delivery_price 
>
> FROM delivery_data
> JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
> JOIN clients ON (client_tt.id_client = clients.id_client)
> JOIN production ON (production.id = delivery_data.id_product)
> JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id =
clients.id_group) 

Are client_tt.id_group and clients.id_group ever different from each other? It looks like you might have redundant
informationthere, but... If they are guaranteed to be the same then you don’t need the JOIN to clients, which would
bothremove a JOIN and reduce the complexity of the JOIN condition on groups. 

Or (assuming the group id’s are indeed supposed to be equal), you could
 JOIN clients ON (client_tt.id_client = clients.id_client AND client_tt.id_group = clients.id_group)
instead of putting that condition within the JOIN condition on groups.

I don’t think either option will make a huge difference (the first probably more than the second, as it reduces an
entirejoin), but it could be enough to help the database figure out a better plan. 

> LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id =
production_price.price_list_idAND delivery_data.delivery_date BETWEEN production_price.date_from AND
production_price.date_to) 
>
> WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
> AND delivery_data.delivery_group_id IN (...short list of values...)
> AND delivery_data.id_product IN ()) AS tmpsource

You don’t have a price if your goods weren’t produced in the delivery window you set? Or do you have goods that get
deliveredwithout having a price? 

You seem to be using this query for a report on nett sales by month, but I have my doubts whether that LEFT JOIN, and
especiallythe condition on the production date window, is really what you want: Your formula for delivery_price
includesthe price column from that LEFT JOIN, so you’re going to get 0 values when there is no production_price record
inyour delivery-window, resulting in a SUM that’s too low if the product was produced before (or after, but that seems
unlikely)the delivery window. 

> WHERE TRUE

This line is unnecessary.

> GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

Apparently (I’m new to these statements), CUBE (axis_x1, axis_y1) is a shorthand for the above. They seem to have been
introducedat the same time (in 9.6?). See:
https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS

> It runs well, took 1s and returns 4000 rows.

I won’t go into the performance issue ash this point, other more knowledgeable people already did.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




pgsql-general by date:

Previous
From: Imre Samu
Date:
Subject: Re: partition table slow planning
Next
From: Thomas Tignor
Date:
Subject: postgres 9.5 DB corruption