View with an outer join - is there any way to optimise this? - Mailing list pgsql-general

From Rich Doughty
Subject View with an outer join - is there any way to optimise this?
Date
Msg-id 439DA5CE.8000704@opusvl.com
Whole thread Raw
Responses Re: View with an outer join - is there any way to optimise
Re: View with an outer join - is there any way to optimise this?
List pgsql-general
I have a view vw_tokens defined as

   CREATE VIEW tokens.vw_tokens AS SELECT
       -- too many columns to mention
   FROM
       tokens.ta_tokens      t LEFT JOIN
       tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
                                      i.status   = 'issued'   LEFT JOIN
       tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
                                      s.status   = 'sold'     LEFT JOIN
       tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
                                    r.status   = 'redeemed'
   ;


the ta_tokens table contains approx 4 million records, and ta_tokenhist
approx 10 millions. queries against the view itself on the primary key
execute with no issues at all.

I cannot however perform a meaningful join against this view. when i
execute

   SELECT *
   FROM
       tokens.ta_tokenhist h INNER JOIN
       tokens.vw_tokens    t ON h.token_id = t.token_id
   WHERE
       h.sarreport_id = 9
   ;

PG forms the full output of the view. the query plan is


  Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 width=470)
          Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 width=322)
                Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                ->  Hash Left Join  (cost=559931.55..1093783.71 rows=4052907 width=174)
                      Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 rows=4052907 width=26)
                      ->  Hash  (cost=459239.41..459239.41 rows=4114456 width=152)
                            ->  Seq Scan on ta_tokenhist i  (cost=0.00..459239.41 rows=4114456 width=152)
                                  Filter: ((status)::text = 'issued'::text)
                ->  Hash  (cost=459239.41..459239.41 rows=3905186 width=152)
                      ->  Seq Scan on ta_tokenhist s  (cost=0.00..459239.41 rows=3905186 width=152)
                            Filter: ((status)::text = 'sold'::text)
          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 rows=2617645 width=152)
                      Filter: ((status)::text = 'redeemed'::text)
    ->  Hash  (cost=6.01..6.01 rows=1 width=236)
          ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)
                Index Cond: ((sarreport_id)::integer = 9)


I have also tried explicitly querying token_id in the view, hoping
to force a nested loop:


   EXPLAIN
   SELECT *
   FROM
       tokens.vw_tokens__user
   WHERE
    token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE sarreport_id = 9);


                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
  Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 width=470)
          Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 width=322)
                Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                ->  Hash Left Join  (cost=559931.55..1093783.71 rows=4052907 width=174)
                      Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 rows=4052907 width=26)
                      ->  Hash  (cost=459239.41..459239.41 rows=4114456 width=152)
                            ->  Seq Scan on ta_tokenhist i  (cost=0.00..459239.41 rows=4114456 width=152)
                                  Filter: ((status)::text = 'issued'::text)
                ->  Hash  (cost=459239.41..459239.41 rows=3905186 width=152)
                      ->  Seq Scan on ta_tokenhist s  (cost=0.00..459239.41 rows=3905186 width=152)
                            Filter: ((status)::text = 'sold'::text)
          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 rows=2617645 width=152)
                      Filter: ((status)::text = 'redeemed'::text)
    ->  Hash  (cost=6.01..6.01 rows=1 width=4)
          ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist  (cost=0.00..6.01 rows=1 width=4)
                Index Cond: ((sarreport_id)::integer = 9)


Setting enable_mergejoin and enable_hashjoin to off results in a nested
but still forms the view output.


I can achieve the results i need be eliminating the view and writing
the query manually but for various reasons i'd prefer to query a
view.

Any advice is greatly appreciated. i'm starting to wonder if the using
a view in this instance is futile.

Many thanks

--

   - Rich Doughty

pgsql-general by date:

Previous
From: Frank van Vugt
Date:
Subject: Re: PL/pgSQL : notion of deferred execution
Next
From: Rich Doughty
Date:
Subject: Re: View with an outer join - is there any way to optimise