Re: View with an outer join - is there any way to optimise - Mailing list pgsql-general

From Rich Doughty
Subject Re: View with an outer join - is there any way to optimise
Date
Msg-id 439DACC8.2080609@opusvl.com
Whole thread Raw
In response to View with an outer join - is there any way to optimise this?  (Rich Doughty <rich@opusvl.com>)
Responses Re: View with an outer join - is there any way to optimise
List pgsql-general
John McCawley wrote:
> First of all, check out this thread:
>
> http://archives.postgresql.org/pgsql-general/2005-11/msg00734.php
>
> I had a similar performance issue with a view.  Look at my solution and
> it might help you out.

i'm not sure that'll help in my case as ta_tokens has a 1-to-many
relationship with ta_tokenhist.

there are various indexed tokenhist columns that i want to query on (in
addition to salesorder_id). none of them will return more than 100 or so
rows. it'd probably be easier to abandon the view altogether (which isn't
something i'd really like to do)

> Second, you might want to change your token history status from a string
> to an integer that references a status table.  If your view is causing a
> sequential scan, you're going to end up will bazillions of string
> comparisons.  I don't know if Postgres has some form of black magic
> string comparison optimization, but I generally avoid string comparisons
> when I am dealing with a few known values, as would be the case in a
> status table.

interesting thought. of course, i'd rather postgres didn't do a full
sequential scan ;-)

>
>
> Rich Doughty wrote:
>
>>
>> I have a view vw_tokens defined as
>>
>>   CREATE VIEW tokens.vw_tokens AS SELECT
>>       -- too many columns to mention
>>   FROM
>>       tokens.ta_tokens      t LEFT JOIN
>>       tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
>>                                      i.status   = 'issued'   LEFT JOIN
>>       tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
>>                                      s.status   = 'sold'     LEFT JOIN
>>       tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
>>                                    r.status   = 'redeemed'
>>   ;
>>
>>
>> the ta_tokens table contains approx 4 million records, and ta_tokenhist
>> approx 10 millions. queries against the view itself on the primary key
>> execute with no issues at all.
>>
>> I cannot however perform a meaningful join against this view. when i
>> execute
>>
>>   SELECT *
>>   FROM
>>       tokens.ta_tokenhist h INNER JOIN
>>       tokens.vw_tokens    t ON h.token_id = t.token_id
>>   WHERE
>>       h.sarreport_id = 9
>>   ;
>>
>> PG forms the full output of the view. the query plan is
>>
>>
>>  Hash Join  (cost=1638048.47..3032073.73 rows=1 width=702)
>>    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
>>    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907
>> width=470)
>>          Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907
>> width=322)
>>                Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                ->  Hash Left Join  (cost=559931.55..1093783.71
>> rows=4052907 width=174)
>>                      Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07
>> rows=4052907 width=26)
>>                      ->  Hash  (cost=459239.41..459239.41 rows=4114456
>> width=152)
>>                            ->  Seq Scan on ta_tokenhist i
>> (cost=0.00..459239.41 rows=4114456 width=152)
>>                                  Filter: ((status)::text =
>> 'issued'::text)
>>                ->  Hash  (cost=459239.41..459239.41 rows=3905186
>> width=152)
>>                      ->  Seq Scan on ta_tokenhist s
>> (cost=0.00..459239.41 rows=3905186 width=152)
>>                            Filter: ((status)::text = 'sold'::text)
>>          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
>>                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41
>> rows=2617645 width=152)
>>                      Filter: ((status)::text = 'redeemed'::text)
>>    ->  Hash  (cost=6.01..6.01 rows=1 width=236)
>>          ->  Index Scan using fkx_tokenhist__sarreports on
>> ta_tokenhist h  (cost=0.00..6.01 rows=1 width=236)
>>                Index Cond: ((sarreport_id)::integer = 9)
>>
>>
>> I have also tried explicitly querying token_id in the view, hoping
>> to force a nested loop:
>>
>>
>>   EXPLAIN
>>   SELECT *
>>   FROM
>>       tokens.vw_tokens__user
>>   WHERE
>>     token_id IN (SELECT token_id FROM tokens.ta_tokenhist WHERE
>> sarreport_id = 9);
>>
>>
>>                                                 QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------
>>
>>  Hash IN Join  (cost=1638048.47..3032073.73 rows=1 width=470)
>>    Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
>>    ->  Hash Left Join  (cost=1638042.45..3011803.15 rows=4052907
>> width=470)
>>          Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>          ->  Hash Left Join  (cost=1114741.93..2011923.86 rows=4052907
>> width=322)
>>                Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                ->  Hash Left Join  (cost=559931.55..1093783.71
>> rows=4052907 width=174)
>>                      Hash Cond: (("outer".token_id)::integer =
>> ("inner".token_id)::integer)
>>                      ->  Seq Scan on ta_tokens t  (cost=0.00..73250.07
>> rows=4052907 width=26)
>>                      ->  Hash  (cost=459239.41..459239.41 rows=4114456
>> width=152)
>>                            ->  Seq Scan on ta_tokenhist i
>> (cost=0.00..459239.41 rows=4114456 width=152)
>>                                  Filter: ((status)::text =
>> 'issued'::text)
>>                ->  Hash  (cost=459239.41..459239.41 rows=3905186
>> width=152)
>>                      ->  Seq Scan on ta_tokenhist s
>> (cost=0.00..459239.41 rows=3905186 width=152)
>>                            Filter: ((status)::text = 'sold'::text)
>>          ->  Hash  (cost=459239.41..459239.41 rows=2617645 width=152)
>>                ->  Seq Scan on ta_tokenhist r  (cost=0.00..459239.41
>> rows=2617645 width=152)
>>                      Filter: ((status)::text = 'redeemed'::text)
>>    ->  Hash  (cost=6.01..6.01 rows=1 width=4)
>>          ->  Index Scan using fkx_tokenhist__sarreports on
>> ta_tokenhist  (cost=0.00..6.01 rows=1 width=4)
>>                Index Cond: ((sarreport_id)::integer = 9)
>>
>>
>> Setting enable_mergejoin and enable_hashjoin to off results in a nested
>> but still forms the view output.
>>
>>
>> I can achieve the results i need be eliminating the view and writing
>> the query manually but for various reasons i'd prefer to query a
>> view.
>>
>> Any advice is greatly appreciated. i'm starting to wonder if the using
>> a view in this instance is futile.
>>
>> Many thanks
>>


--

   - Rich Doughty

pgsql-general by date:

Previous
From: Rich Doughty
Date:
Subject: View with an outer join - is there any way to optimise this?
Next
From: John McCawley
Date:
Subject: Re: View with an outer join - is there any way to optimise