Thread: Problems pushing down WHERE-clause to underlying view

Problems pushing down WHERE-clause to underlying view

From
Nicklas Avén
Date:
Hi


We have a system with 2 layers of views. It is about forestry.

The first layer contains the logic like grouping volumes in logs 
together to stems or harvesting areas and joining species names to codes 
and things like that.

The second layer just joins this underlying views to a table with user 
ids and machine ids. So, when used by the application there is a where 
clause containing the user id which gives access to the correct data

in the underlying view based on machine id.


The underlying view in this case can return approx 22000 rows, grouped 
from approx 8 million logs by harvest date, harvest object and so on.


The problem is that it always calculates all those 22000 rows even if 
the user id I use only gives 250 rows.

So, the query uses 4 seconds instead of under 100 ms.


I have tried this on 2 servers and my laptop, PostgreSQL 9.6 and 10 and 
get the same issue.


I have tried to pick the query apart to understand what is happening.


First, the underlying view looks like this except I removed some fields 
that doesn't affect the case to save some space:

CREATE OR REPLACE VIEW underlying_view AS
  SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     s.species_group_name,
     p.product_group_name ,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM
     (
        SELECT
         hl.contractor_id,
         hl.machine_key,
         hl.operator_key,
         hl.object_key,
         hl.sub_object_key,
         date(hl.harvest_date) AS harvest_date, --this is timestamptz 
since we use the time in other places
         hl.species_group_key,
         hl.product_key,
         sum(hl.m3_sub) AS m3_sub,
         count(*) AS number_of_logs
     FROM version_union_tables_r02.harvester_logs hl
     GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key
     ) l
      LEFT JOIN version_union_tables_r02.machine_info mi ON 
l.machine_key::text = mi.machine_key::text
      LEFT JOIN version_union_tables_r02.objects o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
      LEFT JOIN version_union_tables_r02.products p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
      LEFT JOIN version_union_tables_r02.species s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key;


In the next level is a view that looks like this:


CREATE OR REPLACE VIEW top_level_view AS
  SELECT *
    FROM underlying_view a
      JOIN contractor_access b ON a.machine_key = b.machine_key WHERE 
b.active <> 0;


If I query this top_level_view like:

SELECT * FROM top_level_view WHERE user_id = 'name@email.address';

I haven't succeeded to avoid the underlying view to compute the full 
dataset.


The user_id 'name@email.address' returns 1 machine_key from 
contractor_access table that has any hits in the underlying view (4 in 
total but 3 machines are not represented in the underlying view)

We call the machine that we get from contractor_id table 'machine1'

Here is what I have tried to find out when the plan changes:


1) SELECT * FROM underlying_view WHERE machine_key = 'machine1'; -- 
returns 250 rows in approx 100 ms, so, works as expected


2) select * from (select 'machine1' machine_key) a, underlying_view b 
where a.machine_key = b.machine_key; --same as above, works as expected


3) select * from (select * from contractor_access where user_id = 
'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;      -- Here I am 
hit. this returns the same 250 rows, but in over 4 seconds


/*So I thought I should try to force down the machine_key to the 
underlying view with lateral like this*/


4) select * from (select * from contractor_access where user_id = 
'name@email.address') a,
lateral (select * from underlying_view where machine_key = a.machine_key) b;

But this doesn't work either. It returns the same 250 rows in approx 4 
seconds.


My question is, is there some trick to force the planner to push down 
the machine_key.

I cannot understand what is fooling the planner.

The table is analyzed, I have tried on several machines, so I do not 
think it is miss leading statistics.


I haven't done any configuration more than tried with 
max_parallel_workers_per_gather to 0 since the workers makes it harder 
to understand what is happening.


Here is the quer plan on query number 3 above:

EXPLAIN select * from (select * from contractor_access where user_id = 
'name@email.address') a,
underlying_view b where a.machine_key = b.machine_key;

Aggregate  (cost=543839.03..543839.04 rows=1 width=8)
   ->  Hash Join  (cost=395402.74..543798.72 rows=16123 width=0)
         Hash Cond: ((hl.machine_key)::text = contractor_access.machine_key)
         ->  Hash Left Join  (cost=395395.10..533563.59 rows=806147 
width=400)
               Hash Cond: (((hl.machine_key)::text = 
(s.machine_key)::text) AND (hl.species_group_key = s.species_group_key))
               ->  Hash Left Join  (cost=395380.73..485122.31 
rows=806147 width=32)
                     Hash Cond: (((hl.machine_key)::text = 
(p.machine_key)::text) AND (hl.product_key = p.product_key))
                     ->  Hash Left Join (cost=395320.48..444697.18 
rows=806147 width=36)
                           Hash Cond: (((hl.machine_key)::text = 
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
(hl.sub_object_key = o.sub_object_key))
                           ->  Hash Left Join (cost=395257.01..417426.05 
rows=806147 width=36)
                                 Hash Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                                 Join Filter: (((hl.contractor_id)::text 
= (mi.contractor_id)::text) OR ((hl.contractor_id IS NULL) AND 
(mi.contractor_id IS NULL)))
                                 ->  HashAggregate 
(cost=395254.66..403316.13 rows=806147 width=86)
                                       Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
hl.harvest_date, hl.species_group_key, hl.product_key
                                       ->  Seq Scan on harvester_logs 
hl  (cost=0.00..234025.22 rows=8061472 width=54)
                                 ->  Hash  (cost=1.60..1.60 rows=60 
width=35)
                                       ->  Seq Scan on machine_info mi  
(cost=0.00..1.60 rows=60 width=35)
                           ->  Hash  (cost=33.26..33.26 rows=1726 width=23)
                                 ->  Seq Scan on objects o 
(cost=0.00..33.26 rows=1726 width=23)
                     ->  Hash  (cost=37.90..37.90 rows=1490 width=29)
                           ->  Seq Scan on products p (cost=0.00..37.90 
rows=1490 width=29)
               ->  Hash  (cost=7.55..7.55 rows=455 width=31)
                     ->  Seq Scan on species s  (cost=0.00..7.55 
rows=455 width=31)
         ->  Hash  (cost=7.59..7.59 rows=4 width=21)
               ->  Seq Scan on contractor_access  (cost=0.00..7.59 
rows=4 width=21)
                     Filter: (t4e_contractor_id = 
'name@email.address'::text)



Thanks


Nicklas Avén










Re: Problems pushing down WHERE-clause to underlying view

From
Adrian Klaver
Date:
On 2/15/19 7:28 AM, Nicklas Avén wrote:
> Hi
> 
> 
> We have a system with 2 layers of views. It is about forestry.
> 
> The first layer contains the logic like grouping volumes in logs 
> together to stems or harvesting areas and joining species names to codes 
> and things like that.
> 
> The second layer just joins this underlying views to a table with user 
> ids and machine ids. So, when used by the application there is a where 
> clause containing the user id which gives access to the correct data
> 
> in the underlying view based on machine id.
> 
> 
> The underlying view in this case can return approx 22000 rows, grouped 
> from approx 8 million logs by harvest date, harvest object and so on.
> 
> 
> The problem is that it always calculates all those 22000 rows even if 
> the user id I use only gives 250 rows.
> 
> So, the query uses 4 seconds instead of under 100 ms.

https://www.postgresql.org/docs/10/sql-createview.html

"CREATE VIEW defines a view of a query. The view is not physically 
materialized. Instead, the query is run every time the view is 
referenced in a query."


Might want to look at materialized view:
https://www.postgresql.org/docs/10/sql-creatematerializedview.html

"CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that it 
also remembers the query used to initialize the view, so that it can be 
refreshed later upon demand. A materialized view has many of the same 
properties as a table, but there is no support for temporary 
materialized views or automatic generation of OIDs."


I would also suggest running the EXPLAIN below with ANALYZE so actual 
timings are returned. Also try:

SELECT
    *
FROM
    underlying_view AS b
JOIN
    contractor_access AS b
ON
    a.machine_key = b.machine_key
WHERE
    user_id = 'name@email.address'
> 
> 
> I have tried this on 2 servers and my laptop, PostgreSQL 9.6 and 10 and 
> get the same issue.
> 
> 
> I have tried to pick the query apart to understand what is happening.
> 
> 
> First, the underlying view looks like this except I removed some fields 
> that doesn't affect the case to save some space:
> 
> CREATE OR REPLACE VIEW underlying_view AS
>   SELECT
>      l.machine_key,
>      o.object_name,
>      o.sub_object_name,
>      s.species_group_name,
>      p.product_group_name ,
>      l.m3_sub AS volume_m3sub,
>      l.number_of_logs,
>      mi.basemachine_manufacturer,
>      mi.basemachine_model
>     FROM
>      (
>         SELECT
>          hl.contractor_id,
>          hl.machine_key,
>          hl.operator_key,
>          hl.object_key,
>          hl.sub_object_key,
>          date(hl.harvest_date) AS harvest_date, --this is timestamptz 
> since we use the time in other places
>          hl.species_group_key,
>          hl.product_key,
>          sum(hl.m3_sub) AS m3_sub,
>          count(*) AS number_of_logs
>      FROM version_union_tables_r02.harvester_logs hl
>      GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
> hl.species_group_key, hl.product_key
>      ) l
>       LEFT JOIN version_union_tables_r02.machine_info mi ON 
> l.machine_key::text = mi.machine_key::text
>       LEFT JOIN version_union_tables_r02.objects o ON 
> l.machine_key::text = o.machine_key::text AND l.object_key = 
> o.object_key AND l.sub_object_key = o.sub_object_key
>       LEFT JOIN version_union_tables_r02.products p ON 
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>       LEFT JOIN version_union_tables_r02.species s ON 
> l.machine_key::text = s.machine_key::text AND l.species_group_key = 
> s.species_group_key;
> 
> 
> In the next level is a view that looks like this:
> 
> 
> CREATE OR REPLACE VIEW top_level_view AS
>   SELECT *
>     FROM underlying_view a
>       JOIN contractor_access b ON a.machine_key = b.machine_key WHERE 
> b.active <> 0;
> 
> 
> If I query this top_level_view like:
> 
> SELECT * FROM top_level_view WHERE user_id = 'name@email.address';
> 
> I haven't succeeded to avoid the underlying view to compute the full 
> dataset.
> 
> 
> The user_id 'name@email.address' returns 1 machine_key from 
> contractor_access table that has any hits in the underlying view (4 in 
> total but 3 machines are not represented in the underlying view)
> 
> We call the machine that we get from contractor_id table 'machine1'
> 
> Here is what I have tried to find out when the plan changes:
> 
> 
> 1) SELECT * FROM underlying_view WHERE machine_key = 'machine1'; -- 
> returns 250 rows in approx 100 ms, so, works as expected
> 
> 
> 2) select * from (select 'machine1' machine_key) a, underlying_view b 
> where a.machine_key = b.machine_key; --same as above, works as expected
> 
> 
> 3) select * from (select * from contractor_access where user_id = 
> 'name@email.address') a,
> underlying_view b where a.machine_key = b.machine_key;      -- Here I am 
> hit. this returns the same 250 rows, but in over 4 seconds
> 
> 
> /*So I thought I should try to force down the machine_key to the 
> underlying view with lateral like this*/
> 
> 
> 4) select * from (select * from contractor_access where user_id = 
> 'name@email.address') a,
> lateral (select * from underlying_view where machine_key = 
> a.machine_key) b;
> 
> But this doesn't work either. It returns the same 250 rows in approx 4 
> seconds.
> 
> 
> My question is, is there some trick to force the planner to push down 
> the machine_key.
> 
> I cannot understand what is fooling the planner.
> 
> The table is analyzed, I have tried on several machines, so I do not 
> think it is miss leading statistics.
> 
> 
> I haven't done any configuration more than tried with 
> max_parallel_workers_per_gather to 0 since the workers makes it harder 
> to understand what is happening.
> 
> 
> Here is the quer plan on query number 3 above:
> 
> EXPLAIN select * from (select * from contractor_access where user_id = 
> 'name@email.address') a,
> underlying_view b where a.machine_key = b.machine_key;
> 
> Aggregate  (cost=543839.03..543839.04 rows=1 width=8)
>    ->  Hash Join  (cost=395402.74..543798.72 rows=16123 width=0)
>          Hash Cond: ((hl.machine_key)::text = 
> contractor_access.machine_key)
>          ->  Hash Left Join  (cost=395395.10..533563.59 rows=806147 
> width=400)
>                Hash Cond: (((hl.machine_key)::text = 
> (s.machine_key)::text) AND (hl.species_group_key = s.species_group_key))
>                ->  Hash Left Join  (cost=395380.73..485122.31 
> rows=806147 width=32)
>                      Hash Cond: (((hl.machine_key)::text = 
> (p.machine_key)::text) AND (hl.product_key = p.product_key))
>                      ->  Hash Left Join (cost=395320.48..444697.18 
> rows=806147 width=36)
>                            Hash Cond: (((hl.machine_key)::text = 
> (o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
> (hl.sub_object_key = o.sub_object_key))
>                            ->  Hash Left Join (cost=395257.01..417426.05 
> rows=806147 width=36)
>                                  Hash Cond: ((hl.machine_key)::text = 
> (mi.machine_key)::text)
>                                  Join Filter: (((hl.contractor_id)::text 
> = (mi.contractor_id)::text) OR ((hl.contractor_id IS NULL) AND 
> (mi.contractor_id IS NULL)))
>                                  ->  HashAggregate 
> (cost=395254.66..403316.13 rows=806147 width=86)
>                                        Group Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> hl.harvest_date, hl.species_group_key, hl.product_key
>                                        ->  Seq Scan on harvester_logs 
> hl  (cost=0.00..234025.22 rows=8061472 width=54)
>                                  ->  Hash  (cost=1.60..1.60 rows=60 
> width=35)
>                                        ->  Seq Scan on machine_info mi 
> (cost=0.00..1.60 rows=60 width=35)
>                            ->  Hash  (cost=33.26..33.26 rows=1726 width=23)
>                                  ->  Seq Scan on objects o 
> (cost=0.00..33.26 rows=1726 width=23)
>                      ->  Hash  (cost=37.90..37.90 rows=1490 width=29)
>                            ->  Seq Scan on products p (cost=0.00..37.90 
> rows=1490 width=29)
>                ->  Hash  (cost=7.55..7.55 rows=455 width=31)
>                      ->  Seq Scan on species s  (cost=0.00..7.55 
> rows=455 width=31)
>          ->  Hash  (cost=7.59..7.59 rows=4 width=21)
>                ->  Seq Scan on contractor_access  (cost=0.00..7.59 
> rows=4 width=21)
>                      Filter: (t4e_contractor_id = 
> 'name@email.address'::text)
> 
> 
> 
> Thanks
> 
> 
> Nicklas Avén
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Problems pushing down WHERE-clause to underlying view

From
Nicklas Avén
Date:
On 2/15/19 5:06 PM, Adrian Klaver wrote:
 > On 2/15/19 7:28 AM, Nicklas Avén wrote:
 >> Hi
 >>
 >> The problem is that it always calculates all those 22000 rows even 
if the user id I use only gives 250 rows.
 >>
 >> So, the query uses 4 seconds instead of under 100 ms.
 >
 > https://www.postgresql.org/docs/10/sql-createview.html
 >
 > "CREATE VIEW defines a view of a query. The view is not physically 
materialized. Instead, the query is run every time the view is 
referenced in a query."

 >



Sorry, I must have expressed what I mean bad. Of course a view is not 
materialized.
I will explain without views what I mean here below



 > Might want to look at materialized view:
 > https://www.postgresql.org/docs/10/sql-creatematerializedview.html
 >
 > "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that 
it also remembers the query used to initialize the view, so that it can 
be refreshed later upon demand. A materialized view has many of the same 
properties as a table, but there is no support for temporary 
materialized views or automatic generation of OIDs."

 >


No, materialized views is not an option. We get some data into those 
tables daily. Recalculating the full dataset on a lot of views like this 
doesn't make sense.
Instead we have tables maintained with processed new data. But I want 
this last part of logic on top as views for flexibility,
to not need cached tables for each possible type of grouping that we need.
Started out with materialized views and it didn't work out well.




 >
 > I would also suggest running the EXPLAIN below with ANALYZE so actual 
timings are returned. Also try:
 >
 > SELECT
 >     *
 > FROM
 >     underlying_view AS b
 > JOIN
 >     contractor_access AS b
 > ON
 >     a.machine_key = b.machine_key
 > WHERE
 >     user_id = 'name@email.address'
 >>

Sorry again, I didn't mention. This I have tried this since this is what 
the top level view do.
So first step when trying to understand this was (of course) to apply 
the where-clause directly to the query



So, let's do that also on the underlying query (view) .


Here I have 2 queries, where I apply the where clause directly to the 
query in the underlying view

(joining the contractor_access table directly on that query).

The first takes 30-40 ms ms and returns the same 250 rows as the second.
In the first I use the machine_key in the where clause.
In the second query that takes about 16 seconds to return the same 250 
rows I use the user_id in the contractor_access table.

I have also cleaned up the contractor_access table. So there is only 1 
row now, with my email as user_id and the same machine_key as used in 
the first query.



Query 1:


EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM ( SELECT hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
            FROM version_union_tables_r02.harvester_logs hl
           GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key) l
      LEFT JOIN version_union_tables_r02.machine_info mi ON 
l.machine_key::text = mi.machine_key::text
      LEFT JOIN version_union_tables_r02.objects o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
      LEFT JOIN version_union_tables_r02.products p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
      LEFT JOIN version_union_tables_r02.species s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where l.machine_key = '887655635442600'
;

which results in this query plan

Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual 
time=25.804..27.134 rows=250 loops=1)
   ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1 
width=19) (actual time=0.009..0.010 rows=1 loops=1)
         Filter: (machine_key = '887655635442600'::text)
   ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122) 
(actual time=25.793..26.959 rows=250 loops=1)
         Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) 
AND (hl.species_group_key = s.species_group_key))
         ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624 
width=120) (actual time=25.755..26.763 rows=250 loops=1)
               Hash Cond: (((hl.machine_key)::text = 
(p.machine_key)::text) AND (hl.product_key = p.product_key))
               ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624 
width=118) (actual time=25.706..26.543 rows=250 loops=1)
                     Hash Cond: (((hl.machine_key)::text = 
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
(hl.sub_object_key = o.sub_object_key))
                     ->  Hash Left Join  (cost=61799.78..62619.90 
rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
                           Hash Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                           ->  HashAggregate (cost=61796.99..62079.79 
rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
                                 Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                 ->  Bitmap Heap Scan on harvester_logs 
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual 
time=0.909..11.573 rows=24151 loops=1)
                                       Recheck Cond: 
((machine_key)::text = '887655635442600'::text)
                                       Heap Blocks: exact=538
                                       ->  Bitmap Index Scan on 
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 
rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
                                             Index Cond: 
((machine_key)::text = '887655635442600'::text)
                           ->  Hash  (cost=2.77..2.77 rows=1 width=38) 
(actual time=0.023..0.023 rows=1 loops=1)
                                 Buckets: 1024  Batches: 1  Memory 
Usage: 9kB
                                 ->  Seq Scan on machine_info mi 
(cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
                                       Filter: ((machine_key)::text = 
'887655635442600'::text)
                                       Rows Removed by Filter: 61
                     ->  Hash  (cost=16.12..16.12 rows=4 width=84) 
(actual time=0.025..0.026 rows=3 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Bitmap Heap Scan on objects o 
(cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 loops=1)
                                 Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                                 Heap Blocks: exact=1
                                 ->  Bitmap Index Scan on 
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4 
width=0) (actual time=0.015..0.015 rows=3 loops=1)
                                       Index Cond: ((machine_key)::text 
= '887655635442600'::text)
               ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual 
time=0.037..0.037 rows=26 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  Bitmap Heap Scan on products p 
(cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26 
loops=1)
                           Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                           Heap Blocks: exact=1
                           ->  Bitmap Index Scan on 
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26 
width=0) (actual time=0.009..0.009 rows=26 loops=1)
                                 Index Cond: ((machine_key)::text = 
'887655635442600'::text)
         ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual 
time=0.028..0.029 rows=12 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Bitmap Heap Scan on species s (cost=4.37..10.52 
rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
                     Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12 
width=0) (actual time=0.008..0.008 rows=12 loops=1)
                           Index Cond: ((machine_key)::text = 
'887655635442600'::text)
Planning time: 0.434 ms
Execution time: 27.370 ms



Next query, the slow one that calculates the whole dataset:


EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM ( SELECT hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
            FROM version_union_tables_r02.harvester_logs hl
           GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key) l
      LEFT JOIN version_union_tables_r02.machine_info mi ON 
l.machine_key::text = mi.machine_key::text
      LEFT JOIN version_union_tables_r02.objects o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
      LEFT JOIN version_union_tables_r02.products p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
      LEFT JOIN version_union_tables_r02.species s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where t4e_contractor_id = 'nicklas.aven@jordogskog.no';


results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122) 
(actual time=15860.900..15888.766 rows=250 loops=1)
   Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND 
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
   ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69) 
(actual time=15859.604..15887.287 rows=250 loops=1)
         Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
         Join Filter: (hl.species_group_key = s.species_group_key)
         Rows Removed by Join Filter: 2750
         ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044 
width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
               Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
               Join Filter: (hl.product_key = p.product_key)
               Rows Removed by Join Filter: 6250
               ->  Merge Left Join  (cost=1779788.20..2022471.20 
rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
                     Merge Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                     ->  Merge Join  (cost=1779783.74..2022437.81 
rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
                           Merge Cond: ((hl.machine_key)::text = 
ci.machine_key)
                           ->  GroupAggregate 
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual 
time=15088.353..15878.172 rows=2683 loops=1)
                                 Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                 ->  Sort (cost=1779782.72..1800000.52 
rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552 
loops=1)
                                       Sort Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                       Sort Method: external merge Disk: 
543456kB
                                       ->  Seq Scan on harvester_logs 
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual 
time=0.007..3169.984 rows=8084464 loops=1)
                           ->  Sort  (cost=1.02..1.03 rows=1 width=19) 
(actual time=0.019..0.020 rows=1 loops=1)
                                 Sort Key: ci.machine_key
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Seq Scan on contractor_access ci  
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                       Filter: (t4e_contractor_id = 
'nicklas.aven@jordogskog.no'::text)
                     ->  Sort  (cost=4.47..4.62 rows=62 width=38) 
(actual time=0.102..0.173 rows=266 loops=1)
                           Sort Key: mi.machine_key
                           Sort Method: quicksort  Memory: 30kB
                           ->  Seq Scan on machine_info mi 
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62
loops=1)
               ->  Sort  (cost=127.50..131.23 rows=1491 width=35) 
(actual time=1.205..3.071 rows=7204 loops=1)
                     Sort Key: p.machine_key
                     Sort Method: quicksort  Memory: 175kB
                     ->  Seq Scan on products p  (cost=0.00..48.91 
rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
         ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual 
time=0.385..1.233 rows=3259 loops=1)
               Sort Key: s.machine_key
               Sort Method: quicksort  Memory: 65kB
               ->  Seq Scan on species s  (cost=0.00..10.60 rows=460 
width=37) (actual time=0.004..0.146 rows=460 loops=1)
   ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual 
time=1.286..1.287 rows=1690 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 190kB
         ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725 
width=84) (actual time=0.004..0.600 rows=1725 loops=1)
Planning time: 0.527 ms
Execution time: 15945.641 ms




Thanks

Nicklas


Re: Problems pushing down WHERE-clause to underlying view

From
Adrian Klaver
Date:
On 2/15/19 9:27 AM, Nicklas Avén wrote:
> 
> On 2/15/19 5:06 PM, Adrian Klaver wrote:
>  > On 2/15/19 7:28 AM, Nicklas Avén wrote:
>  >> Hi
>  >>
>  >> The problem is that it always calculates all those 22000 rows even 
> if the user id I use only gives 250 rows.
>  >>
>  >> So, the query uses 4 seconds instead of under 100 ms.
>  >
>  > https://www.postgresql.org/docs/10/sql-createview.html
>  >
>  > "CREATE VIEW defines a view of a query. The view is not physically 
> materialized. Instead, the query is run every time the view is 
> referenced in a query."
> 
>  >
> 
> 
> 
> Sorry, I must have expressed what I mean bad. Of course a view is not 
> materialized.
> I will explain without views what I mean here below
> 
> 
> 
>  > Might want to look at materialized view:
>  > https://www.postgresql.org/docs/10/sql-creatematerializedview.html
>  >
>  > "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that 
> it also remembers the query used to initialize the view, so that it can 
> be refreshed later upon demand. A materialized view has many of the same 
> properties as a table, but there is no support for temporary 
> materialized views or automatic generation of OIDs."
> 
>  >
> 
> 
> No, materialized views is not an option. We get some data into those 
> tables daily. Recalculating the full dataset on a lot of views like this 
> doesn't make sense.
> Instead we have tables maintained with processed new data. But I want 
> this last part of logic on top as views for flexibility,
> to not need cached tables for each possible type of grouping that we need.
> Started out with materialized views and it didn't work out well.
> 
> 
> 
> 
>  >
>  > I would also suggest running the EXPLAIN below with ANALYZE so actual 
> timings are returned. Also try:
>  >
>  > SELECT
>  >     *
>  > FROM
>  >     underlying_view AS b
>  > JOIN
>  >     contractor_access AS b
>  > ON
>  >     a.machine_key = b.machine_key
>  > WHERE
>  >     user_id = 'name@email.address'
>  >>
> 
> Sorry again, I didn't mention. This I have tried this since this is what 
> the top level view do.
> So first step when trying to understand this was (of course) to apply 
> the where-clause directly to the query
> 
> 
> 
> So, let's do that also on the underlying query (view) .
> 
> 
> Here I have 2 queries, where I apply the where clause directly to the 
> query in the underlying view
> 
> (joining the contractor_access table directly on that query).
> 
> The first takes 30-40 ms ms and returns the same 250 rows as the second.
> In the first I use the machine_key in the where clause.
> In the second query that takes about 16 seconds to return the same 250 
> rows I use the user_id in the contractor_access table.
> 
> I have also cleaned up the contractor_access table. So there is only 1 
> row now, with my email as user_id and the same machine_key as used in 
> the first query.
> 

I have not had chance to fully go through all of below. Some 
questions/suggestions:

1) Thanks for the formatted queries. If I could make a suggestion, when 
aliasing could you include AS. It would make finding what l.* refers to 
easier for those of us with old eyes:)

2) t4e_contractor_id is in the shiny_adm.contractor_access table?
If not where?

3) What is the schema for shiny_adm.contractor_access?
In particular what indexes are on it?

> 
> 
> Query 1:
> 
> 
> EXPLAIN ANALYZE
> SELECT
>      l.machine_key,
>      o.object_name,
>      o.sub_object_name,
>      o.object_user_id,
>      o.sub_object_user_id,
>      o.start_date AS object_start_date,
>      s.species_group_name,
>      p.product_group_name,
>      l.m3_sub AS volume_m3sub,
>      l.number_of_logs,
>      mi.basemachine_manufacturer,
>      mi.basemachine_model
>     FROM ( SELECT hl.contractor_id,
>              hl.machine_key,
>              hl.operator_key,
>              hl.object_key,
>              hl.sub_object_key,
>              date(hl.harvest_date) AS harvest_date,
>              hl.species_group_key,
>              hl.product_key,
>              sum(hl.m3_sub) AS m3_sub,
>              count(*) AS number_of_logs
>             FROM version_union_tables_r02.harvester_logs hl
>            GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
> hl.species_group_key, hl.product_key) l
>       LEFT JOIN version_union_tables_r02.machine_info mi ON 
> l.machine_key::text = mi.machine_key::text
>       LEFT JOIN version_union_tables_r02.objects o ON 
> l.machine_key::text = o.machine_key::text AND l.object_key = 
> o.object_key AND l.sub_object_key = o.sub_object_key
>       LEFT JOIN version_union_tables_r02.products p ON 
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>       LEFT JOIN version_union_tables_r02.species s ON 
> l.machine_key::text = s.machine_key::text AND l.species_group_key = 
> s.species_group_key
> join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
> where l.machine_key = '887655635442600'
> ;
> 
> which results in this query plan
> 
> Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual 
> time=25.804..27.134 rows=250 loops=1)
>    ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1 
> width=19) (actual time=0.009..0.010 rows=1 loops=1)
>          Filter: (machine_key = '887655635442600'::text)
>    ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122) 
> (actual time=25.793..26.959 rows=250 loops=1)
>          Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) 
> AND (hl.species_group_key = s.species_group_key))
>          ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624 
> width=120) (actual time=25.755..26.763 rows=250 loops=1)
>                Hash Cond: (((hl.machine_key)::text = 
> (p.machine_key)::text) AND (hl.product_key = p.product_key))
>                ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624 
> width=118) (actual time=25.706..26.543 rows=250 loops=1)
>                      Hash Cond: (((hl.machine_key)::text = 
> (o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
> (hl.sub_object_key = o.sub_object_key))
>                      ->  Hash Left Join  (cost=61799.78..62619.90 
> rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
>                            Hash Cond: ((hl.machine_key)::text = 
> (mi.machine_key)::text)
>                            ->  HashAggregate (cost=61796.99..62079.79 
> rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
>                                  Group Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> date(hl.harvest_date), hl.species_group_key, hl.product_key
>                                  ->  Bitmap Heap Scan on harvester_logs 
> hl  (cost=570.14..61224.14 rows=22914 width=61) (actual 
> time=0.909..11.573 rows=24151 loops=1)
>                                        Recheck Cond: 
> ((machine_key)::text = '887655635442600'::text)
>                                        Heap Blocks: exact=538
>                                        ->  Bitmap Index Scan on 
> version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 
> rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
>                                              Index Cond: 
> ((machine_key)::text = '887655635442600'::text)
>                            ->  Hash  (cost=2.77..2.77 rows=1 width=38)
> (actual time=0.023..0.023 rows=1 loops=1)
>                                  Buckets: 1024  Batches: 1  Memory 
> Usage: 9kB
>                                  ->  Seq Scan on machine_info mi 
> (cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
>                                        Filter: ((machine_key)::text = 
> '887655635442600'::text)
>                                        Rows Removed by Filter: 61
>                      ->  Hash  (cost=16.12..16.12 rows=4 width=84) 
> (actual time=0.025..0.026 rows=3 loops=1)
>                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                            ->  Bitmap Heap Scan on objects o 
> (cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 
> loops=1)
>                                  Recheck Cond: ((machine_key)::text = 
> '887655635442600'::text)
>                                  Heap Blocks: exact=1
>                                  ->  Bitmap Index Scan on 
> version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4 
> width=0) (actual time=0.015..0.015 rows=3 loops=1)
>                                        Index Cond: ((machine_key)::text 
> = '887655635442600'::text)
>                ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual 
> time=0.037..0.037 rows=26 loops=1)
>                      Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                      ->  Bitmap Heap Scan on products p 
> (cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26 
> loops=1)
>                            Recheck Cond: ((machine_key)::text = 
> '887655635442600'::text)
>                            Heap Blocks: exact=1
>                            ->  Bitmap Index Scan on 
> version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26 
> width=0) (actual time=0.009..0.009 rows=26 loops=1)
>                                  Index Cond: ((machine_key)::text = 
> '887655635442600'::text)
>          ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual 
> time=0.028..0.029 rows=12 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                ->  Bitmap Heap Scan on species s (cost=4.37..10.52 
> rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
>                      Recheck Cond: ((machine_key)::text = 
> '887655635442600'::text)
>                      Heap Blocks: exact=1
>                      ->  Bitmap Index Scan on 
> version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12 
> width=0) (actual time=0.008..0.008 rows=12 loops=1)
>                            Index Cond: ((machine_key)::text = 
> '887655635442600'::text)
> Planning time: 0.434 ms
> Execution time: 27.370 ms
> 
> 
> 
> Next query, the slow one that calculates the whole dataset:
> 
> 
> EXPLAIN ANALYZE
> SELECT
>      l.machine_key,
>      o.object_name,
>      o.sub_object_name,
>      o.object_user_id,
>      o.sub_object_user_id,
>      o.start_date AS object_start_date,
>      s.species_group_name,
>      p.product_group_name,
>      l.m3_sub AS volume_m3sub,
>      l.number_of_logs,
>      mi.basemachine_manufacturer,
>      mi.basemachine_model
>     FROM ( SELECT hl.contractor_id,
>              hl.machine_key,
>              hl.operator_key,
>              hl.object_key,
>              hl.sub_object_key,
>              date(hl.harvest_date) AS harvest_date,
>              hl.species_group_key,
>              hl.product_key,
>              sum(hl.m3_sub) AS m3_sub,
>              count(*) AS number_of_logs
>             FROM version_union_tables_r02.harvester_logs hl
>            GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
> hl.species_group_key, hl.product_key) l
>       LEFT JOIN version_union_tables_r02.machine_info mi ON 
> l.machine_key::text = mi.machine_key::text
>       LEFT JOIN version_union_tables_r02.objects o ON 
> l.machine_key::text = o.machine_key::text AND l.object_key = 
> o.object_key AND l.sub_object_key = o.sub_object_key
>       LEFT JOIN version_union_tables_r02.products p ON 
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>       LEFT JOIN version_union_tables_r02.species s ON 
> l.machine_key::text = s.machine_key::text AND l.species_group_key = 
> s.species_group_key
> join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
> where t4e_contractor_id = 'nicklas.aven@jordogskog.no';
> 
> 
> results in this query plan:
> 
> Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122) 
> (actual time=15860.900..15888.766 rows=250 loops=1)
>    Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND 
> (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
>    ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69) 
> (actual time=15859.604..15887.287 rows=250 loops=1)
>          Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
>          Join Filter: (hl.species_group_key = s.species_group_key)
>          Rows Removed by Join Filter: 2750
>          ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044 
> width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
>                Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
>                Join Filter: (hl.product_key = p.product_key)
>                Rows Removed by Join Filter: 6250
>                ->  Merge Left Join  (cost=1779788.20..2022471.20 
> rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
>                      Merge Cond: ((hl.machine_key)::text = 
> (mi.machine_key)::text)
>                      ->  Merge Join  (cost=1779783.74..2022437.81 
> rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
>                            Merge Cond: ((hl.machine_key)::text = 
> ci.machine_key)
>                            ->  GroupAggregate 
> (cost=1779782.72..2012287.44 rows=808712 width=69) (actual 
> time=15088.353..15878.172 rows=2683 loops=1)
>                                  Group Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> (date(hl.harvest_date)), hl.species_group_key, hl.product_key
>                                  ->  Sort (cost=1779782.72..1800000.52 
> rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552 
> loops=1)
>                                        Sort Key: hl.machine_key, 
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
> (date(hl.harvest_date)), hl.species_group_key, hl.product_key
>                                        Sort Method: external merge Disk: 
> 543456kB
>                                        ->  Seq Scan on harvester_logs 
> hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual 
> time=0.007..3169.984 rows=8084464 loops=1)
>                            ->  Sort  (cost=1.02..1.03 rows=1 width=19)
> (actual time=0.019..0.020 rows=1 loops=1)
>                                  Sort Key: ci.machine_key
>                                  Sort Method: quicksort  Memory: 25kB
>                                  ->  Seq Scan on contractor_access ci 
> (cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
>                                        Filter: (t4e_contractor_id = 
> 'nicklas.aven@jordogskog.no'::text)
>                      ->  Sort  (cost=4.47..4.62 rows=62 width=38) 
> (actual time=0.102..0.173 rows=266 loops=1)
>                            Sort Key: mi.machine_key
>                            Sort Method: quicksort  Memory: 30kB
>                            ->  Seq Scan on machine_info mi 
> (cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62 
> loops=1)
>                ->  Sort  (cost=127.50..131.23 rows=1491 width=35) 
> (actual time=1.205..3.071 rows=7204 loops=1)
>                      Sort Key: p.machine_key
>                      Sort Method: quicksort  Memory: 175kB
>                      ->  Seq Scan on products p  (cost=0.00..48.91 
> rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
>          ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual 
> time=0.385..1.233 rows=3259 loops=1)
>                Sort Key: s.machine_key
>                Sort Method: quicksort  Memory: 65kB
>                ->  Seq Scan on species s  (cost=0.00..10.60 rows=460 
> width=37) (actual time=0.004..0.146 rows=460 loops=1)
>    ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual 
> time=1.286..1.287 rows=1690 loops=1)
>          Buckets: 2048  Batches: 1  Memory Usage: 190kB
>          ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725 
> width=84) (actual time=0.004..0.600 rows=1725 loops=1)
> Planning time: 0.527 ms
> Execution time: 15945.641 ms
> 
> 
> 
> 
> Thanks
> 
> Nicklas
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Problems pushing down WHERE-clause to underlying view

From
Nicklas Avén
Date:
 > I have not had chance to fully go through all of below. Some 
questions/suggestions:
 >
 > 1) Thanks for the formatted queries. If I could make a suggestion, 
when aliasing could you include AS. It would make finding what l.* 
refers to easier for those of us with old eyes:)
 >
Yes, of course, sorry :-)


 > 2) t4e_contractor_id is in the shiny_adm.contractor_access table?
 > If not where?

Yes, sorry again, it is there


 >
 > 3) What is the schema for shiny_adm.contractor_access?
 > In particular what indexes are on it?
 >
shiny_adm.contractor_access looks like this:


CREATE TABLE shiny_adm.contractor_access
(
   machine_key text,
   t4e_contractor_id text,
   active integer DEFAULT 1,
   id serial NOT NULL,
   CONSTRAINT contractor_access_pkey PRIMARY KEY (id),
   CONSTRAINT contractor_unique UNIQUE (machine_key, t4e_contractor_id),
   CONSTRAINT co_check_t4e_co_email CHECK 
(utils.verify_email(t4e_contractor_id))
)


CREATE INDEX idx_contractor
   ON shiny_adm.contractor_access
   USING btree
   (t4e_contractor_id COLLATE pg_catalog."default");

CREATE INDEX idx_contractor_mk
   ON shiny_adm.contractor_access
   USING btree
   (machine_key COLLATE pg_catalog."default");


I tried to format the below a little better with AS and some more 
consistent indents.


I also, in the first query, changed the where clause to filter on 
machine_key in table contractor _access. Just to illustrate the problem 
better.

Both queries filter on the same table which is joined the same way. But 
in the second example the where clause is not pushed to the subquery l


Thanks a lot for looking into it


Nicklas




Query 1:
EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
FROM
     (
         SELECT
             hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
         FROM
             version_union_tables_r02.harvester_logs hl
         GROUP BY
             hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key
     )  AS l
     LEFT JOIN version_union_tables_r02.machine_info  AS mi ON 
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects AS  o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products AS  p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species  AS s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
     join shiny_adm.contractor_access AS ci on l.machine_key=ci.machine_key
where ci.machine_key = '887655635442600';


Resulting in this query plan:
Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual 
time=27.801..29.225 rows=250 loops=1)
   ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1 
width=19) (actual time=0.005..0.006 rows=1 loops=1)
         Filter: (machine_key = '887655635442600'::text)
   ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122) 
(actual time=27.794..29.070 rows=250 loops=1)
         Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) 
AND (hl.species_group_key = s.species_group_key))
         ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624 
width=120) (actual time=27.771..28.851 rows=250 loops=1)
               Hash Cond: (((hl.machine_key)::text = 
(p.machine_key)::text) AND (hl.product_key = p.product_key))
               ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624 
width=118) (actual time=27.736..28.628 rows=250 loops=1)
                     Hash Cond: (((hl.machine_key)::text = 
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
(hl.sub_object_key = o.sub_object_key))
                     ->  Hash Left Join  (cost=61799.78..62619.90 
rows=22624 width=65) (actual time=27.709..28.416 rows=250 loops=1)
                           Hash Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                           ->  HashAggregate (cost=61796.99..62079.79 
rows=22624 width=69) (actual time=27.677..28.217 rows=250 loops=1)
                                 Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                 ->  Bitmap Heap Scan on harvester_logs 
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual 
time=1.040..12.977 rows=24151 loops=1)
                                       Recheck Cond: 
((machine_key)::text = '887655635442600'::text)
                                       Heap Blocks: exact=538
                                       ->  Bitmap Index Scan on 
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 
rows=22914 width=0) (actual time=0.996..0.997 rows=24151 loops=1)
                                             Index Cond: 
((machine_key)::text = '887655635442600'::text)
                           ->  Hash  (cost=2.77..2.77 rows=1 width=38) 
(actual time=0.018..0.018 rows=1 loops=1)
                                 Buckets: 1024  Batches: 1  Memory 
Usage: 9kB
                                 ->  Seq Scan on machine_info mi 
(cost=0.00..2.77 rows=1 width=38) (actual time=0.013..0.014 rows=1 loops=1)
                                       Filter: ((machine_key)::text = 
'887655635442600'::text)
                                       Rows Removed by Filter: 61
                     ->  Hash  (cost=16.12..16.12 rows=4 width=84) 
(actual time=0.020..0.020 rows=3 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Bitmap Heap Scan on objects o 
(cost=4.31..16.12 rows=4 width=84) (actual time=0.015..0.016 rows=3 loops=1)
                                 Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                                 Heap Blocks: exact=1
                                 ->  Bitmap Index Scan on 
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4 
width=0) (actual time=0.011..0.012 rows=3 loops=1)
                                       Index Cond: ((machine_key)::text 
= '887655635442600'::text)
               ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual 
time=0.030..0.031 rows=26 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  Bitmap Heap Scan on products p
(cost=4.48..38.19 rows=26 width=35) (actual time=0.010..0.019 rows=26 
loops=1)
                           Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                           Heap Blocks: exact=1
                           ->  Bitmap Index Scan on 
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26 
width=0) (actual time=0.006..0.006 rows=26 loops=1)
                                 Index Cond: ((machine_key)::text = 
'887655635442600'::text)
         ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual 
time=0.018..0.018 rows=12 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Bitmap Heap Scan on species s (cost=4.37..10.52 
rows=12 width=37) (actual time=0.008..0.012 rows=12 loops=1)
                     Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12 
width=0) (actual time=0.004..0.005 rows=12 loops=1)
                           Index Cond: ((machine_key)::text = 
'887655635442600'::text)
Planning time: 0.376 ms
Execution time: 29.435 ms


Next query, the slow one that calculates the whole dataset:



EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
FROM  shiny_adm.contractor_access ci join
     (
         SELECT
             hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
         FROM
             version_union_tables_r02.harvester_logs AS hl
         GROUP BY
             hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key
     )  AS l on l.machine_key=ci.machine_key
     LEFT JOIN version_union_tables_r02.machine_info  AS mi ON 
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects AS o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products  AS p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species  AS s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no';

results in this query plan:


Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122) 
(actual time=16336.200..16366.486 rows=250 loops=1)
   Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND 
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
   ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69) 
(actual time=16334.747..16364.834 rows=250 loops=1)
         Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
         Join Filter: (hl.species_group_key = s.species_group_key)
         Rows Removed by Join Filter: 2750
         ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044 
width=67) (actual time=16334.145..16362.241 rows=250 loops=1)
               Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
               Join Filter: (hl.product_key = p.product_key)
               Rows Removed by Join Filter: 6250
               ->  Merge Left Join  (cost=1779788.20..2022471.20 
rows=4044 width=65) (actual time=16332.364..16356.313 rows=250 loops=1)
                     Merge Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                     ->  Merge Join  (cost=1779783.74..2022437.81 
rows=4044 width=48) (actual time=16332.238..16355.855 rows=250 loops=1)
                           Merge Cond: ((hl.machine_key)::text = 
ci.machine_key)
                           ->  GroupAggregate 
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual 
time=15552.813..16354.893 rows=2683 loops=1)
                                 Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                 ->  Sort (cost=1779782.72..1800000.52 
rows=8087121 width=61) (actual time=15552.795..15959.066 rows=942552 
loops=1)
                                       Sort Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                       Sort Method: external merge Disk: 
543456kB
                                       ->  Seq Scan on harvester_logs 
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual 
time=0.008..3221.502 rows=8084464 loops=1)
                           ->  Sort  (cost=1.02..1.03 rows=1 width=19) 
(actual time=0.018..0.019 rows=1 loops=1)
                                 Sort Key: ci.machine_key
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Seq Scan on contractor_access ci  
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                       Filter: (t4e_contractor_id = 
'nicklas.aven@jordogskog.no'::text)
                     ->  Sort  (cost=4.47..4.62 rows=62 width=38) 
(actual time=0.112..0.197 rows=266 loops=1)
                           Sort Key: mi.machine_key
                           Sort Method: quicksort  Memory: 30kB
                           ->  Seq Scan on machine_info mi 
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.032 rows=62 
loops=1)
               ->  Sort  (cost=127.50..131.23 rows=1491 width=35) 
(actual time=1.353..3.404 rows=7204 loops=1)
                     Sort Key: p.machine_key
                     Sort Method: quicksort  Memory: 175kB
                     ->  Seq Scan on products p  (cost=0.00..48.91 
rows=1491 width=35) (actual time=0.005..0.556 rows=1491 loops=1)
         ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual 
time=0.436..1.363 rows=3259 loops=1)
               Sort Key: s.machine_key
               Sort Method: quicksort  Memory: 65kB
               ->  Seq Scan on species s  (cost=0.00..10.60 rows=460 
width=37) (actual time=0.004..0.161 rows=460 loops=1)
   ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual 
time=1.444..1.444 rows=1690 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 190kB
         ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725 
width=84) (actual time=0.004..0.656 rows=1725 loops=1)
Planning time: 0.653 ms
Execution time: 16428.966 ms






Re: Problems pushing down WHERE-clause to underlying view

From
Adrian Klaver
Date:
On 2/15/19 12:43 PM, Nicklas Avén wrote:
> 
>  > I have not had chance to fully go through all of below. Some 
> questions/suggestions:
>  >
>  > 1) Thanks for the formatted queries. If I could make a suggestion, 
> when aliasing could you include AS. It would make finding what l.* 
> refers to easier for those of us with old eyes:)
>  >
> Yes, of course, sorry :-)
> 
> 
>  > 2) t4e_contractor_id is in the shiny_adm.contractor_access table?
>  > If not where?
> 
> Yes, sorry again, it is there
> 
> 
>  >
>  > 3) What is the schema for shiny_adm.contractor_access?
>  > In particular what indexes are on it?
>  >
> shiny_adm.contractor_access looks like this:
> 
> 
> CREATE TABLE shiny_adm.contractor_access
> (
>    machine_key text,
>    t4e_contractor_id text,
>    active integer DEFAULT 1,
>    id serial NOT NULL,
>    CONSTRAINT contractor_access_pkey PRIMARY KEY (id),
>    CONSTRAINT contractor_unique UNIQUE (machine_key, t4e_contractor_id),
>    CONSTRAINT co_check_t4e_co_email CHECK 
> (utils.verify_email(t4e_contractor_id))
> )
> 
> 
> CREATE INDEX idx_contractor
>    ON shiny_adm.contractor_access
>    USING btree
>    (t4e_contractor_id COLLATE pg_catalog."default");
> 
> CREATE INDEX idx_contractor_mk
>    ON shiny_adm.contractor_access
>    USING btree
>    (machine_key COLLATE pg_catalog."default");
> 
> 
> I tried to format the below a little better with AS and some more 
> consistent indents.
> 
> 
> I also, in the first query, changed the where clause to filter on 
> machine_key in table contractor _access. Just to illustrate the problem 
> better.
> 
> Both queries filter on the same table which is joined the same way. But 
> in the second example the where clause is not pushed to the subquery l
> 
> 
> Thanks a lot for looking into it
> 
> 
> Nicklas
> 

> 
> Next query, the slow one that calculates the whole dataset:
> 
> 
> 
> EXPLAIN ANALYZE
> SELECT
>      l.machine_key,
>      o.object_name,
>      o.sub_object_name,
>      o.object_user_id,
>      o.sub_object_user_id,
>      o.start_date AS object_start_date,
>      s.species_group_name,
>      p.product_group_name,
>      l.m3_sub AS volume_m3sub,
>      l.number_of_logs,
>      mi.basemachine_manufacturer,
>      mi.basemachine_model
> FROM  shiny_adm.contractor_access ci join
>      (
>          SELECT
>              hl.contractor_id,
>              hl.machine_key,
>              hl.operator_key,
>              hl.object_key,
>              hl.sub_object_key,
>              date(hl.harvest_date) AS harvest_date,
>              hl.species_group_key,
>              hl.product_key,
>              sum(hl.m3_sub) AS m3_sub,
>              count(*) AS number_of_logs
>          FROM
>              version_union_tables_r02.harvester_logs AS hl
>          GROUP BY
>              hl.machine_key, hl.contractor_id, hl.operator_key, 
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
> hl.species_group_key, hl.product_key
>      )  AS l on l.machine_key=ci.machine_key
>      LEFT JOIN version_union_tables_r02.machine_info  AS mi ON 
> l.machine_key::text = mi.machine_key::text
>      LEFT JOIN version_union_tables_r02.objects AS o ON 
> l.machine_key::text = o.machine_key::text AND l.object_key = 
> o.object_key AND l.sub_object_key = o.sub_object_key
>      LEFT JOIN version_union_tables_r02.products  AS p ON 
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>      LEFT JOIN version_union_tables_r02.species  AS s ON 
> l.machine_key::text = s.machine_key::text AND l.species_group_key = 
> s.species_group_key
> WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no';

To make it apples to apples try changing above to be more like first query:

...

  AS l
     LEFT JOIN version_union_tables_r02.machine_info  AS mi ON 
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects AS  o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products AS  p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species  AS s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
     JOIN shiny_adm.contractor_access AS ci ON l.machine_key=ci.machine_key
  WHERE t4e_contractor_id = 'nicklas.aven@jordogskog.no'

;
> 
> results in this query plan:



-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Problems pushing down WHERE-clause to underlying view

From
Tom Lane
Date:
=?UTF-8?Q?Nicklas_Av=c3=a9n?= <nicklas.aven@jordogskog.no> writes:
> I also, in the first query, changed the where clause to filter on
> machine_key in table contractor _access. Just to illustrate the problem
> better.

> Both queries filter on the same table which is joined the same way. But
> in the second example the where clause is not pushed to the subquery

The filters are totally different though.  In one case you provide

    where ci.machine_key = '887655635442600'

and there is also a join condition

    l.machine_key=ci.machine_key

From these two things the planner can deduce

    l.machine_key='887655635442600'

which is a restriction condition that it knows how to push down into the
"l" subquery.  Furthermore, it can also deduce that it can restrict
all of the left-joined tables to consider only that value of their
join keys.

In query #2 you have no constant value for machine_key so none of that
happens.

IIRC, the propagated value doesn't have to be a constant, exactly,
just a fixed expression.  So you might consider something like

<query 1 as written, up to the WHERE>
where ci.machine_key = (select machine_key from contractor_access
                        where t4e_contractor_id = 'nicklas.aven@jordogskog.no');

when you need to drive the lookup from something other than raw
machine_key.  This'll fail, as-is, if there's more than one
contractor_access row with t4e_contractor_id =
'nicklas.aven@jordogskog.no', but you can probably adapt the idea
to make it work.

            regards, tom lane


Re: Problems pushing down WHERE-clause to underlying view

From
Nicklas Avén
Date:


On 16 February 2019 06:02:50 GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nicklas Avén <nicklas.aven@jordogskog.no> writes:
I also, in the first query, changed the where clause to filter on
machine_key in table contractor _access. Just to illustrate the problem
better.

Both queries filter on the same table which is joined the same way. But
in the second example the where clause is not pushed to the subquery

The filters are totally different though. In one case you provide

where ci.machine_key = '887655635442600'

and there is also a join condition

l.machine_key=ci.machine_key

From these two things the planner can deduce

l.machine_key='887655635442600'

which is a restriction condition that it knows how to push down into the
"l" subquery. Furthermore, it can also deduce that it can restrict
all of the left-joined tables to consider only that value of their
join keys.

In query #2 you have no constant value for machine_key so none of that
happens.

IIRC, the propagated value doesn't have to be a constant, exactly,
just a fixed expression. So you might consider something like

<query 1 as written, up to the WHERE>
where ci.machine_key = (select machine_key from contractor_access
where t4e_contractor_id = 'nicklas.aven@jordogskog.no');

when you need to drive the lookup from something other than raw
machine_key. This'll fail, as-is, if there's more than one
contractor_access row with t4e_contractor_id =
'nicklas.aven@jordogskog.no', but you can probably adapt the idea
to make it work.

regards, tom lane


Thanks Tom
This is what I suspected was happening.
What I was hoping though was that the planner could see that the contractor_access table only contains a few hundred rows, and that the logs table with millions of rows with an index on machine_key should be reduced as much as possible before start grouping.
At first I thought this didn't happen because the logs table is hidden in a subquery. But since it works when filtering directly on machine_key I guess that is not the problem.

But I am still a little confused why I cannot trick this with lateral as I showed in the first mail.

I guess I will have to rewrite this into a function and only give one machine_key at a time to this query.

I think I have bumped into this before, and I might even have asked the same question som years aho, I think I recognize you answer. Sorry for not learning.

Thanks

Nicklas


Sent from my Android device with K-9 Mail. Please excuse my brevity.