Thread: View with an outer join - is there any way to optimise this?

View with an outer join - is there any way to optimise this?

From
Rich Doughty
Date:
I have a view vw_tokens defined as

   CREATE VIEW tokens.vw_tokens AS SELECT
       -- too many columns to mention
   FROM
       tokens.ta_tokens      t LEFT JOIN
       tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
                                      i.status   = 'issued'   LEFT JOIN
       tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
                                      s.status   = 'sold'     LEFT JOIN
       tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
                                    r.status   = 'redeemed'
   ;


the ta_tokens table contains approx 4 million records, and ta_tokenhist
approx 10 millions. queries against the view itself on the primary key
execute with no issues at all.

I cannot however perform a meaningful join against this view. when i
execute

   SELECT *
   FROM
       tokens.ta_tokenhist h INNER JOIN
       tokens.vw_tokens    t ON h.token_id = t.token_id
   WHERE
       h.sarreport_id = 9
   ;

PG forms the full output of the view. the query plan is


  Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 width=470)
          Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 width=322)
                Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                ->  Hash Left Join  (cost=559931.55..1093783.71 rows=4052907 width=174)
                      Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 rows=4052907 width=26)
                      ->  Hash  (cost=459239.41..459239.41 rows=4114456 width=152)
                            ->  Seq Scan on ta_tokenhist i  (cost=0.00..459239.41 rows=4114456 width=152)
                                  Filter: ((status)::text = 'issued'::text)
                ->  Hash  (cost=459239.41..459239.41 rows=3905186 width=152)
                      ->  Seq Scan on ta_tokenhist s  (cost=0.00..459239.41 rows=3905186 width=152)
                            Filter: ((status)::text = 'sold'::text)
          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 rows=2617645 width=152)
                      Filter: ((status)::text = 'redeemed'::text)
    ->  Hash  (cost=6.01..6.01 rows=1 width=236)
          ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)
                Index Cond: ((sarreport_id)::integer = 9)


I have also tried explicitly querying token_id in the view, hoping
to force a nested loop:


   EXPLAIN
   SELECT *
   FROM
       tokens.vw_tokens__user
   WHERE
    token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE sarreport_id = 9);


                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
  Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907 width=470)
          Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907 width=322)
                Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                ->  Hash Left Join  (cost=559931.55..1093783.71 rows=4052907 width=174)
                      Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07 rows=4052907 width=26)
                      ->  Hash  (cost=459239.41..459239.41 rows=4114456 width=152)
                            ->  Seq Scan on ta_tokenhist i  (cost=0.00..459239.41 rows=4114456 width=152)
                                  Filter: ((status)::text = 'issued'::text)
                ->  Hash  (cost=459239.41..459239.41 rows=3905186 width=152)
                      ->  Seq Scan on ta_tokenhist s  (cost=0.00..459239.41 rows=3905186 width=152)
                            Filter: ((status)::text = 'sold'::text)
          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41 rows=2617645 width=152)
                      Filter: ((status)::text = 'redeemed'::text)
    ->  Hash  (cost=6.01..6.01 rows=1 width=4)
          ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist  (cost=0.00..6.01 rows=1 width=4)
                Index Cond: ((sarreport_id)::integer = 9)


Setting enable_mergejoin and enable_hashjoin to off results in a nested
but still forms the view output.


I can achieve the results i need be eliminating the view and writing
the query manually but for various reasons i'd prefer to query a
view.

Any advice is greatly appreciated. i'm starting to wonder if the using
a view in this instance is futile.

Many thanks

--

   - Rich Doughty

Re: View with an outer join - is there any way to optimise

From
Rich Doughty
Date:
John McCawley wrote:
> First of all, check out this thread:
>
> http://archives.postgresql.org/pgsql-general/2005-11/msg00734.php
>
> I had a similar performance issue with a view.  Look at my solution and
> it might help you out.

i'm not sure that'll help in my case as ta_tokens has a 1-to-many
relationship with ta_tokenhist.

there are various indexed tokenhist columns that i want to query on (in
addition to salesorder_id). none of them will return more than 100 or so
rows. it'd probably be easier to abandon the view altogether (which isn't
something i'd really like to do)

> Second, you might want to change your token history status from a string
> to an integer that references a status table.  If your view is causing a
> sequential scan, you're going to end up will bazillions of string
> comparisons.  I don't know if Postgres has some form of black magic
> string comparison optimization, but I generally avoid string comparisons
> when I am dealing with a few known values, as would be the case in a
> status table.

interesting thought. of course, i'd rather postgres didn't do a full
sequential scan ;-)

>
>
> Rich Doughty wrote:
>
>>
>> I have a view vw_tokens defined as
>>
>>   CREATE VIEW tokens.vw_tokens AS SELECT
>>       -- too many columns to mention
>>   FROM
>>       tokens.ta_tokens      t LEFT JOIN
>>       tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
>>                                      i.status   = 'issued'   LEFT JOIN
>>       tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
>>                                      s.status   = 'sold'     LEFT JOIN
>>       tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
>>                                    r.status   = 'redeemed'
>>   ;
>>
>>
>> the ta_tokens table contains approx 4 million records, and ta_tokenhist
>> approx 10 millions. queries against the view itself on the primary key
>> execute with no issues at all.
>>
>> I cannot however perform a meaningful join against this view. when i
>> execute
>>
>>   SELECT *
>>   FROM
>>       tokens.ta_tokenhist h INNER JOIN
>>       tokens.vw_tokens    t ON h.token_id = t.token_id
>>   WHERE
>>       h.sarreport_id = 9
>>   ;
>>
>> PG forms the full output of the view. the query plan is
>>
>>
>>  Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
>>    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
>>    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907
>> width=470)
>>          Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907
>> width=322)
>>                Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                ->  Hash Left Join  (cost=559931.55..1093783.71
>> rows=4052907 width=174)
>>                      Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07
>> rows=4052907 width=26)
>>                      ->  Hash  (cost=459239.41..459239.41 rows=4114456
>> width=152)
>>                            ->  Seq Scan on ta_tokenhist i
>> (cost=0.00..459239.41 rows=4114456 width=152)
>>                                  Filter: ((status)::text =
>> 'issued'::text)
>>                ->  Hash  (cost=459239.41..459239.41 rows=3905186
>> width=152)
>>                      ->  Seq Scan on ta_tokenhist s
>> (cost=0.00..459239.41 rows=3905186 width=152)
>>                            Filter: ((status)::text = 'sold'::text)
>>          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
>>                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41
>> rows=2617645 width=152)
>>                      Filter: ((status)::text = 'redeemed'::text)
>>    ->  Hash  (cost=6.01..6.01 rows=1 width=236)
>>          ->  Index Scan using fkx_tokenhist__sarreports on
>> ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)
>>                Index Cond: ((sarreport_id)::integer = 9)
>>
>>
>> I have also tried explicitly querying token_id in the view, hoping
>> to force a nested loop:
>>
>>
>>   EXPLAIN
>>   SELECT *
>>   FROM
>>       tokens.vw_tokens__user
>>   WHERE
>>     token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE
>> sarreport_id = 9);
>>
>>
>>                                                 QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------
>>
>>  Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
>>    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
>>    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907
>> width=470)
>>          Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907
>> width=322)
>>                Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                ->  Hash Left Join  (cost=559931.55..1093783.71
>> rows=4052907 width=174)
>>                      Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07
>> rows=4052907 width=26)
>>                      ->  Hash  (cost=459239.41..459239.41 rows=4114456
>> width=152)
>>                            ->  Seq Scan on ta_tokenhist i
>> (cost=0.00..459239.41 rows=4114456 width=152)
>>                                  Filter: ((status)::text =
>> 'issued'::text)
>>                ->  Hash  (cost=459239.41..459239.41 rows=3905186
>> width=152)
>>                      ->  Seq Scan on ta_tokenhist s
>> (cost=0.00..459239.41 rows=3905186 width=152)
>>                            Filter: ((status)::text = 'sold'::text)
>>          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
>>                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41
>> rows=2617645 width=152)
>>                      Filter: ((status)::text = 'redeemed'::text)
>>    ->  Hash  (cost=6.01..6.01 rows=1 width=4)
>>          ->  Index Scan using fkx_tokenhist__sarreports on
>> ta_tokenhist  (cost=0.00..6.01 rows=1 width=4)
>>                Index Cond: ((sarreport_id)::integer = 9)
>>
>>
>> Setting enable_mergejoin and enable_hashjoin to off results in a nested
>> but still forms the view output.
>>
>>
>> I can achieve the results i need be eliminating the view and writing
>> the query manually but for various reasons i'd prefer to query a
>> view.
>>
>> Any advice is greatly appreciated. i'm starting to wonder if the using
>> a view in this instance is futile.
>>
>> Many thanks
>>


--

   - Rich Doughty

Re: View with an outer join - is there any way to optimise

From
John McCawley
Date:
First of all, check out this thread:

http://archives.postgresql.org/pgsql-general/2005-11/msg00734.php

I had a similar performance issue with a view.  Look at my solution and
it might help you out.

Second, you might want to change your token history status from a string
to an integer that references a status table.  If your view is causing a
sequential scan, you're going to end up will bazillions of string
comparisons.  I don't know if Postgres has some form of black magic
string comparison optimization, but I generally avoid string comparisons
when I am dealing with a few known values, as would be the case in a
status table.


Rich Doughty wrote:

>
> I have a view vw_tokens defined as
>
>   CREATE VIEW tokens.vw_tokens AS SELECT
>       -- too many columns to mention
>   FROM
>       tokens.ta_tokens      t LEFT JOIN
>       tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
>                                      i.status   = 'issued'   LEFT JOIN
>       tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
>                                      s.status   = 'sold'     LEFT JOIN
>       tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
>                                    r.status   = 'redeemed'
>   ;
>
>
> the ta_tokens table contains approx 4 million records, and ta_tokenhist
> approx 10 millions. queries against the view itself on the primary key
> execute with no issues at all.
>
> I cannot however perform a meaningful join against this view. when i
> execute
>
>   SELECT *
>   FROM
>       tokens.ta_tokenhist h INNER JOIN
>       tokens.vw_tokens    t ON h.token_id = t.token_id
>   WHERE
>       h.sarreport_id = 9
>   ;
>
> PG forms the full output of the view. the query plan is
>
>
>  Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
>    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
>    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907
> width=470)
>          Hash Cond: (("outer".token_id)::integer =
> ("inner".token_id)::integer)
>          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907
> width=322)
>                Hash Cond: (("outer".token_id)::integer =
> ("inner".token_id)::integer)
>                ->  Hash Left Join  (cost=559931.55..1093783.71
> rows=4052907 width=174)
>                      Hash Cond: (("outer".token_id)::integer =
> ("inner".token_id)::integer)
>                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07
> rows=4052907 width=26)
>                      ->  Hash  (cost=459239.41..459239.41 rows=4114456
> width=152)
>                            ->  Seq Scan on ta_tokenhist i
> (cost=0.00..459239.41 rows=4114456 width=152)
>                                  Filter: ((status)::text =
> 'issued'::text)
>                ->  Hash  (cost=459239.41..459239.41 rows=3905186
> width=152)
>                      ->  Seq Scan on ta_tokenhist s
> (cost=0.00..459239.41 rows=3905186 width=152)
>                            Filter: ((status)::text = 'sold'::text)
>          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
>                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41
> rows=2617645 width=152)
>                      Filter: ((status)::text = 'redeemed'::text)
>    ->  Hash  (cost=6.01..6.01 rows=1 width=236)
>          ->  Index Scan using fkx_tokenhist__sarreports on
> ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)
>                Index Cond: ((sarreport_id)::integer = 9)
>
>
> I have also tried explicitly querying token_id in the view, hoping
> to force a nested loop:
>
>
>   EXPLAIN
>   SELECT *
>   FROM
>       tokens.vw_tokens__user
>   WHERE
>     token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE
> sarreport_id = 9);
>
>
>                                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
>
>  Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
>    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
>    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907
> width=470)
>          Hash Cond: (("outer".token_id)::integer =
> ("inner".token_id)::integer)
>          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907
> width=322)
>                Hash Cond: (("outer".token_id)::integer =
> ("inner".token_id)::integer)
>                ->  Hash Left Join  (cost=559931.55..1093783.71
> rows=4052907 width=174)
>                      Hash Cond: (("outer".token_id)::integer =
> ("inner".token_id)::integer)
>                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07
> rows=4052907 width=26)
>                      ->  Hash  (cost=459239.41..459239.41 rows=4114456
> width=152)
>                            ->  Seq Scan on ta_tokenhist i
> (cost=0.00..459239.41 rows=4114456 width=152)
>                                  Filter: ((status)::text =
> 'issued'::text)
>                ->  Hash  (cost=459239.41..459239.41 rows=3905186
> width=152)
>                      ->  Seq Scan on ta_tokenhist s
> (cost=0.00..459239.41 rows=3905186 width=152)
>                            Filter: ((status)::text = 'sold'::text)
>          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
>                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41
> rows=2617645 width=152)
>                      Filter: ((status)::text = 'redeemed'::text)
>    ->  Hash  (cost=6.01..6.01 rows=1 width=4)
>          ->  Index Scan using fkx_tokenhist__sarreports on
> ta_tokenhist  (cost=0.00..6.01 rows=1 width=4)
>                Index Cond: ((sarreport_id)::integer = 9)
>
>
> Setting enable_mergejoin and enable_hashjoin to off results in a nested
> but still forms the view output.
>
>
> I can achieve the results i need be eliminating the view and writing
> the query manually but for various reasons i'd prefer to query a
> view.
>
> Any advice is greatly appreciated. i'm starting to wonder if the using
> a view in this instance is futile.
>
> Many thanks
>

Re: View with an outer join - is there any way to optimise

From
John McCawley
Date:
You should be able to use my trick...the join that is giving you the
problem is:

SELECT *
  FROM
      tokens.ta_tokenhist h INNER JOIN
      tokens.vw_tokens    t ON h.token_id = t.token_id
  WHERE
      h.sarreport_id = 9 ;


ta_tokenhist is already part of your view, right?  So you should be able
to include the sarreport_id as part of your view, and then restructure
your query as:


SELECT *
  FROM
      tokens.ta_tokenhist INNER JOIN
      tokens.vw_tokens ON tokens.ta_tokenhist.token_id =
tokens.vw_tokens.token_id
  WHERE
      tokens.vw_tokens.sarreport_id = 9 ;

I removed the aliases because they confuse me ;)



Re: View with an outer join - is there any way to optimise

From
Rich Doughty
Date:
John McCawley wrote:
> You should be able to use my trick...the join that is giving you the
> problem is:
>
> SELECT *
>  FROM
>      tokens.ta_tokenhist h INNER JOIN
>      tokens.vw_tokens    t ON h.token_id = t.token_id
>  WHERE
>      h.sarreport_id = 9 ;
>
>
> ta_tokenhist is already part of your view, right?  So you should be able
> to include the sarreport_id as part of your view, and then restructure
> your query as:
>
>
> SELECT *
>  FROM
>      tokens.ta_tokenhist INNER JOIN
>      tokens.vw_tokens ON tokens.ta_tokenhist.token_id =
> tokens.vw_tokens.token_id
>  WHERE
>      tokens.vw_tokens.sarreport_id = 9 ;
>
> I removed the aliases because they confuse me ;)

i don't think i can do that. basically i want to run a variety of queries
on the vw_tokens view. for example, joins i hope to do may include:

   tokens.ta_tokenhist h INNER JOIN tokens.vw_tokens WHERE h.customer_id = ?
   tokens.ta_tokenhist h INNER JOIN tokens.vw_tokens WHERE h.histdate between 'then' and 'now'
   tokens.vw_tokens WHERE number = ?

i just want vw_tokens to give me a constant resultset. i have a feeling
though that views aren't go to be able to give me what i need.

i suppose i could go for a set returning function, or just write the
queries manually.


--

   - Rich Doughty

Re: View with an outer join - is there any way to optimise this?

From
Tom Lane
Date:
Rich Doughty <rich@opusvl.com> writes:
> I have a view vw_tokens defined as
> ...
> I cannot however perform a meaningful join against this view.
> ...
> PG forms the full output of the view.

You seem to be wishing that PG would push the INNER JOIN down inside the
nested LEFT JOINs.  In general, rearranging inner and outer joins like
that can change the results.  There are limited cases where it can be
done without breaking the query semantics, but the planner doesn't
currently have any logic to analyze whether it's safe or not, so it just
doesn't try.

Improving this situation is (or ought to be) on the TODO list, but I dunno
when it will happen.

            regards, tom lane

Re: View with an outer join - is there any way to optimise

From
Rich Doughty
Date:
Tom Lane wrote:
> Rich Doughty <rich@opusvl.com> writes:
>
>>I have a view vw_tokens defined as
>>...
>>I cannot however perform a meaningful join against this view.
>>...
>>PG forms the full output of the view.
>
>
> You seem to be wishing that PG would push the INNER JOIN down inside the
> nested LEFT JOINs.  In general, rearranging inner and outer joins like
> that can change the results.  There are limited cases where it can be
> done without breaking the query semantics, but the planner doesn't
> currently have any logic to analyze whether it's safe or not, so it just
> doesn't try.
>
> Improving this situation is (or ought to be) on the TODO list, but I dunno
> when it will happen.

ok, thanks. as i suspected, i don't think i'm going to be able to views for
this. when the query is ultimately returning only 100 or so rows, i cannot
afford a full 4 million row table scan to form the full view when a nested
loop might make more sense (anything makes more sense than the full view :-)

i have a workaround (of sorts). instead of

   WHERE token_id IN (SELECT token_id FROM ta_tokenhist WHERE sarreport_id = 9)

if i perform the subquery manually, then create a second query of the form

   WHERE token_id IN (?,?,?,?,?)

i get decent results. it's pretty ugly but it works. i doubt that it will
scale up to 500 or more results (if that), but thankfully in general, neither
do the query results.

cheers anyway

   - Rich Doughty