Thread: Strange query plan invloving a view

Strange query plan invloving a view

From
Rich Doughty
Date:
i have the following query involving a view that i really need to optimise:

SELECT *
FROM
    tokens.ta_tokenhist h INNER JOIN
    tokens.vw_tokens    t ON h.token_id = t.token_id
WHERE
    h.sarreport_id = 9
;

where vw_tokens is defined as

CREATE VIEW tokens.vw_tokens AS SELECT
    -- too many columns to mention
FROM
    tokens.ta_tokens      t LEFT JOIN
    tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
                                    i.status   = 'issued'   LEFT JOIN
    tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
                                    s.status   = 'sold'     LEFT JOIN
    tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
                                    r.status   = 'redeemed'
;

this gives me the following query plan:

Merge Join  (cost=18276278.45..31793043.16 rows=55727 width=322)
    Merge Cond: (("outer".token_id)::integer = "inner"."?column23?")
    ->  Merge Left Join  (cost=18043163.64..31639175.71 rows=4228018 width=76)
          Merge Cond: (("outer".token_id)::integer = "inner"."?column3?")
          ->  Merge Left Join  (cost=13649584.94..27194793.37 rows=4228018 width=48)
                Merge Cond: (("outer".token_id)::integer = "inner"."?column3?")
                ->  Merge Left Join  (cost=7179372.62..20653326.29 rows=4228018 width=44)
                      Merge Cond: (("outer".token_id)::integer = "inner"."?column3?")
                      ->  Index Scan using ta_tokens_pkey on ta_tokens t  (cost=0.00..13400398.89 rows=4053805
width=27)
                      ->  Sort  (cost=7179372.62..7189942.67 rows=4228018 width=21)
                            Sort Key: (i.token_id)::integer
                            ->  Index Scan using fkx_tokenhist__status on ta_tokenhist i  (cost=0.00..6315961.47
rows=4228018width=21) 
                                  Index Cond: ((status)::text = 'issued'::text)
                ->  Sort  (cost=6470212.32..6479909.69 rows=3878949 width=8)
                      Sort Key: (s.token_id)::integer
                      ->  Index Scan using fkx_tokenhist__status on ta_tokenhist s  (cost=0.00..5794509.99 rows=3878949
width=8)
                            Index Cond: ((status)::text = 'sold'::text)
          ->  Sort  (cost=4393578.70..4400008.00 rows=2571718 width=32)
                Sort Key: (r.token_id)::integer
                ->  Index Scan using fkx_tokenhist__status on ta_tokenhist r  (cost=0.00..3841724.02 rows=2571718
width=32)
                      Index Cond: ((status)::text = 'redeemed'::text)
    ->  Sort  (cost=233114.81..233248.38 rows=53430 width=246)
          Sort Key: (h.token_id)::integer
          ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h  (cost=0.00..213909.12 rows=53430 width=246)
                Index Cond: ((sarreport_id)::integer = 9)


However, the following query (which i believe should be equivalent)

SELECT *
FROM
    tokens.ta_tokenhist   h INNER JOIN
    tokens.ta_tokens      t ON h.token_id = t.token_id LEFT JOIN
    tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
                                    i.status   = 'issued'   LEFT JOIN
    tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
                                    s.status   = 'sold'     LEFT JOIN
    tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
                                    r.status   = 'redeemed'
WHERE
    h.sarreport_id = 9
;

gives the following query plan:

  Nested Loop Left Join  (cost=0.00..3475785.52 rows=55727 width=1011)
    ->  Nested Loop Left Join  (cost=0.00..2474425.17 rows=55727 width=765)
          ->  Nested Loop Left Join  (cost=0.00..1472368.23 rows=55727 width=519)
                ->  Nested Loop  (cost=0.00..511614.87 rows=53430 width=273)
                      ->  Index Scan using fkx_tokenhist__sarreports on ta_tokenhist h  (cost=0.00..213909.12
rows=53430width=246) 
                            Index Cond: ((sarreport_id)::integer = 9)
                      ->  Index Scan using ta_tokens_pkey on ta_tokens t  (cost=0.00..5.56 rows=1 width=27)
                            Index Cond: (("outer".token_id)::integer = (t.token_id)::integer)
                ->  Index Scan using fkx_tokenhist__tokens on ta_tokenhist i  (cost=0.00..17.96 rows=2 width=246)
                      Index Cond: (("outer".token_id)::integer = (i.token_id)::integer)
                      Filter: ((status)::text = 'issued'::text)
          ->  Index Scan using fkx_tokenhist__tokens on ta_tokenhist s  (cost=0.00..17.96 rows=2 width=246)
                Index Cond: (("outer".token_id)::integer = (s.token_id)::integer)
                Filter: ((status)::text = 'sold'::text)
    ->  Index Scan using fkx_tokenhist__tokens on ta_tokenhist r  (cost=0.00..17.96 rows=1 width=246)
          Index Cond: (("outer".token_id)::integer = (r.token_id)::integer)
          Filter: ((status)::text = 'redeemed'::text)

This query returns a lot quicker than the plan would suggest, as the
planner is over-estimating the amount of rows where
((sarreport_id)::integer = 9). it thinks there are 53430 when in fact
there are only 7 (despite a vacuum and analyse).

Can anyone give me any suggestions? are the index stats the cause of
my problem, or is it the rewrite of the query?

Cheers


Version: PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)


--

   - Rich Doughty

Re: Strange query plan invloving a view

From
Tom Lane
Date:
Rich Doughty <rich@opusvl.com> writes:
> However, the following query (which i believe should be equivalent)

> SELECT *
> FROM
>     tokens.ta_tokenhist   h INNER JOIN
>     tokens.ta_tokens      t ON h.token_id = t.token_id LEFT JOIN
>     tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
>                                     i.status   = 'issued'   LEFT JOIN
>     tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
>                                     s.status   = 'sold'     LEFT JOIN
>     tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
>                                     r.status   = 'redeemed'
> WHERE
>     h.sarreport_id = 9
> ;

No, that's not equivalent at all, because the implicit parenthesization
is left-to-right; therefore you've injected the constraint to a few rows
of ta_tokenhist (and therefore only a few rows of ta_tokens) into the
bottom of the LEFT JOIN stack.  In the other case the constraint is at
the wrong end of the join stack, and so the full view output gets formed
before anything gets thrown away.

Some day the Postgres planner will probably be smart enough to rearrange
the join order despite the presence of outer joins ... but today is not
that day.

            regards, tom lane

Re: Strange query plan invloving a view

From
Rich Doughty
Date:
Tom Lane wrote:
> Rich Doughty <rich@opusvl.com> writes:
>
>>However, the following query (which i believe should be equivalent)
>
>
>>SELECT *
>>FROM
>>    tokens.ta_tokenhist   h INNER JOIN
>>    tokens.ta_tokens      t ON h.token_id = t.token_id LEFT JOIN
>>    tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
>>                                    i.status   = 'issued'   LEFT JOIN
>>    tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
>>                                    s.status   = 'sold'     LEFT JOIN
>>    tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
>>                                    r.status   = 'redeemed'
>>WHERE
>>    h.sarreport_id = 9
>>;
>
>
> No, that's not equivalent at all, because the implicit parenthesization
> is left-to-right; therefore you've injected the constraint to a few rows
> of ta_tokenhist (and therefore only a few rows of ta_tokens) into the
> bottom of the LEFT JOIN stack.  In the other case the constraint is at
> the wrong end of the join stack, and so the full view output gets formed
> before anything gets thrown away.
>
> Some day the Postgres planner will probably be smart enough to rearrange
> the join order despite the presence of outer joins ... but today is not
> that day.

thanks for the reply.

is there any way i can achieve what i need to by using views, or should i
just use a normal query? i'd prefer to use a view but i just can't get round
the performance hit.

--

   - Rich Doughty