Re: Problems pushing down WHERE-clause to underlying view - Mailing list pgsql-general

From Nicklas Avén
Subject Re: Problems pushing down WHERE-clause to underlying view
Date
Msg-id 9770a1ec-c7be-7527-66d7-0c95b8277508@jordogskog.no
Whole thread Raw
In response to Re: Problems pushing down WHERE-clause to underlying view  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: Problems pushing down WHERE-clause to underlying view  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
On 2/15/19 5:06 PM, Adrian Klaver wrote:
 > On 2/15/19 7:28 AM, Nicklas Avén wrote:
 >> Hi
 >>
 >> The problem is that it always calculates all those 22000 rows even 
if the user id I use only gives 250 rows.
 >>
 >> So, the query uses 4 seconds instead of under 100 ms.
 >
 > https://www.postgresql.org/docs/10/sql-createview.html
 >
 > "CREATE VIEW defines a view of a query. The view is not physically 
materialized. Instead, the query is run every time the view is 
referenced in a query."

 >



Sorry, I must have expressed what I mean bad. Of course a view is not 
materialized.
I will explain without views what I mean here below



 > Might want to look at materialized view:
 > https://www.postgresql.org/docs/10/sql-creatematerializedview.html
 >
 > "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that 
it also remembers the query used to initialize the view, so that it can 
be refreshed later upon demand. A materialized view has many of the same 
properties as a table, but there is no support for temporary 
materialized views or automatic generation of OIDs."

 >


No, materialized views is not an option. We get some data into those 
tables daily. Recalculating the full dataset on a lot of views like this 
doesn't make sense.
Instead we have tables maintained with processed new data. But I want 
this last part of logic on top as views for flexibility,
to not need cached tables for each possible type of grouping that we need.
Started out with materialized views and it didn't work out well.




 >
 > I would also suggest running the EXPLAIN below with ANALYZE so actual 
timings are returned. Also try:
 >
 > SELECT
 >     *
 > FROM
 >     underlying_view AS b
 > JOIN
 >     contractor_access AS b
 > ON
 >     a.machine_key = b.machine_key
 > WHERE
 >     user_id = 'name@email.address'
 >>

Sorry again, I didn't mention. This I have tried this since this is what 
the top level view do.
So first step when trying to understand this was (of course) to apply 
the where-clause directly to the query



So, let's do that also on the underlying query (view) .


Here I have 2 queries, where I apply the where clause directly to the 
query in the underlying view

(joining the contractor_access table directly on that query).

The first takes 30-40 ms ms and returns the same 250 rows as the second.
In the first I use the machine_key in the where clause.
In the second query that takes about 16 seconds to return the same 250 
rows I use the user_id in the contractor_access table.

I have also cleaned up the contractor_access table. So there is only 1 
row now, with my email as user_id and the same machine_key as used in 
the first query.



Query 1:


EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM ( SELECT hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
            FROM version_union_tables_r02.harvester_logs hl
           GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key) l
      LEFT JOIN version_union_tables_r02.machine_info mi ON 
l.machine_key::text = mi.machine_key::text
      LEFT JOIN version_union_tables_r02.objects o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
      LEFT JOIN version_union_tables_r02.products p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
      LEFT JOIN version_union_tables_r02.species s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where l.machine_key = '887655635442600'
;

which results in this query plan

Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual 
time=25.804..27.134 rows=250 loops=1)
   ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1 
width=19) (actual time=0.009..0.010 rows=1 loops=1)
         Filter: (machine_key = '887655635442600'::text)
   ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122) 
(actual time=25.793..26.959 rows=250 loops=1)
         Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text) 
AND (hl.species_group_key = s.species_group_key))
         ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624 
width=120) (actual time=25.755..26.763 rows=250 loops=1)
               Hash Cond: (((hl.machine_key)::text = 
(p.machine_key)::text) AND (hl.product_key = p.product_key))
               ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624 
width=118) (actual time=25.706..26.543 rows=250 loops=1)
                     Hash Cond: (((hl.machine_key)::text = 
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND 
(hl.sub_object_key = o.sub_object_key))
                     ->  Hash Left Join  (cost=61799.78..62619.90 
rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
                           Hash Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                           ->  HashAggregate (cost=61796.99..62079.79 
rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
                                 Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                 ->  Bitmap Heap Scan on harvester_logs 
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual 
time=0.909..11.573 rows=24151 loops=1)
                                       Recheck Cond: 
((machine_key)::text = '887655635442600'::text)
                                       Heap Blocks: exact=538
                                       ->  Bitmap Index Scan on 
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41 
rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
                                             Index Cond: 
((machine_key)::text = '887655635442600'::text)
                           ->  Hash  (cost=2.77..2.77 rows=1 width=38) 
(actual time=0.023..0.023 rows=1 loops=1)
                                 Buckets: 1024  Batches: 1  Memory 
Usage: 9kB
                                 ->  Seq Scan on machine_info mi 
(cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
                                       Filter: ((machine_key)::text = 
'887655635442600'::text)
                                       Rows Removed by Filter: 61
                     ->  Hash  (cost=16.12..16.12 rows=4 width=84) 
(actual time=0.025..0.026 rows=3 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Bitmap Heap Scan on objects o 
(cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 loops=1)
                                 Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                                 Heap Blocks: exact=1
                                 ->  Bitmap Index Scan on 
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4 
width=0) (actual time=0.015..0.015 rows=3 loops=1)
                                       Index Cond: ((machine_key)::text 
= '887655635442600'::text)
               ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual 
time=0.037..0.037 rows=26 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
                     ->  Bitmap Heap Scan on products p 
(cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26 
loops=1)
                           Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                           Heap Blocks: exact=1
                           ->  Bitmap Index Scan on 
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26 
width=0) (actual time=0.009..0.009 rows=26 loops=1)
                                 Index Cond: ((machine_key)::text = 
'887655635442600'::text)
         ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual 
time=0.028..0.029 rows=12 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Bitmap Heap Scan on species s (cost=4.37..10.52 
rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
                     Recheck Cond: ((machine_key)::text = 
'887655635442600'::text)
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on 
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12 
width=0) (actual time=0.008..0.008 rows=12 loops=1)
                           Index Cond: ((machine_key)::text = 
'887655635442600'::text)
Planning time: 0.434 ms
Execution time: 27.370 ms



Next query, the slow one that calculates the whole dataset:


EXPLAIN ANALYZE
SELECT
     l.machine_key,
     o.object_name,
     o.sub_object_name,
     o.object_user_id,
     o.sub_object_user_id,
     o.start_date AS object_start_date,
     s.species_group_name,
     p.product_group_name,
     l.m3_sub AS volume_m3sub,
     l.number_of_logs,
     mi.basemachine_manufacturer,
     mi.basemachine_model
    FROM ( SELECT hl.contractor_id,
             hl.machine_key,
             hl.operator_key,
             hl.object_key,
             hl.sub_object_key,
             date(hl.harvest_date) AS harvest_date,
             hl.species_group_key,
             hl.product_key,
             sum(hl.m3_sub) AS m3_sub,
             count(*) AS number_of_logs
            FROM version_union_tables_r02.harvester_logs hl
           GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key, 
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)), 
hl.species_group_key, hl.product_key) l
      LEFT JOIN version_union_tables_r02.machine_info mi ON 
l.machine_key::text = mi.machine_key::text
      LEFT JOIN version_union_tables_r02.objects o ON 
l.machine_key::text = o.machine_key::text AND l.object_key = 
o.object_key AND l.sub_object_key = o.sub_object_key
      LEFT JOIN version_union_tables_r02.products p ON 
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
      LEFT JOIN version_union_tables_r02.species s ON 
l.machine_key::text = s.machine_key::text AND l.species_group_key = 
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where t4e_contractor_id = 'nicklas.aven@jordogskog.no';


results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122) 
(actual time=15860.900..15888.766 rows=250 loops=1)
   Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND 
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
   ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69) 
(actual time=15859.604..15887.287 rows=250 loops=1)
         Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
         Join Filter: (hl.species_group_key = s.species_group_key)
         Rows Removed by Join Filter: 2750
         ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044 
width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
               Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
               Join Filter: (hl.product_key = p.product_key)
               Rows Removed by Join Filter: 6250
               ->  Merge Left Join  (cost=1779788.20..2022471.20 
rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
                     Merge Cond: ((hl.machine_key)::text = 
(mi.machine_key)::text)
                     ->  Merge Join  (cost=1779783.74..2022437.81 
rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
                           Merge Cond: ((hl.machine_key)::text = 
ci.machine_key)
                           ->  GroupAggregate 
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual 
time=15088.353..15878.172 rows=2683 loops=1)
                                 Group Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                 ->  Sort (cost=1779782.72..1800000.52 
rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552 
loops=1)
                                       Sort Key: hl.machine_key, 
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key, 
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                       Sort Method: external merge Disk: 
543456kB
                                       ->  Seq Scan on harvester_logs 
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual 
time=0.007..3169.984 rows=8084464 loops=1)
                           ->  Sort  (cost=1.02..1.03 rows=1 width=19) 
(actual time=0.019..0.020 rows=1 loops=1)
                                 Sort Key: ci.machine_key
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Seq Scan on contractor_access ci  
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                       Filter: (t4e_contractor_id = 
'nicklas.aven@jordogskog.no'::text)
                     ->  Sort  (cost=4.47..4.62 rows=62 width=38) 
(actual time=0.102..0.173 rows=266 loops=1)
                           Sort Key: mi.machine_key
                           Sort Method: quicksort  Memory: 30kB
                           ->  Seq Scan on machine_info mi 
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62
loops=1)
               ->  Sort  (cost=127.50..131.23 rows=1491 width=35) 
(actual time=1.205..3.071 rows=7204 loops=1)
                     Sort Key: p.machine_key
                     Sort Method: quicksort  Memory: 175kB
                     ->  Seq Scan on products p  (cost=0.00..48.91 
rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
         ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual 
time=0.385..1.233 rows=3259 loops=1)
               Sort Key: s.machine_key
               Sort Method: quicksort  Memory: 65kB
               ->  Seq Scan on species s  (cost=0.00..10.60 rows=460 
width=37) (actual time=0.004..0.146 rows=460 loops=1)
   ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual 
time=1.286..1.287 rows=1690 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 190kB
         ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725 
width=84) (actual time=0.004..0.600 rows=1725 loops=1)
Planning time: 0.527 ms
Execution time: 15945.641 ms




Thanks

Nicklas


pgsql-general by date:

Previous
From: Philippe Ebersohl
Date:
Subject: Using NOTIFY from a java application.
Next
From: Andres Freund
Date:
Subject: Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2