Thread: bad plan

bad plan

From
Gaetano Mendola
Date:
I posted this on hackers, but I had to post it here.


===================================================================================================================================
Hi all,
running a 7.4.5 engine, I'm facing this bad plan:


empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-#                    FROM v_sc_user_request
empdb-#                    WHERE
empdb-#                         login = 'babinow1'
empdb-#                    LIMIT 10 ;
                                                                                                    QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1716.38..1716.39 rows=1 width=232) (actual time=52847.239..52847.322 rows=10 loops=1)
   ->  Subquery Scan v_sc_user_request  (cost=1716.38..1716.39 rows=1 width=232) (actual time=52847.234..52847.301
rows=10loops=1) 
         ->  Sort  (cost=1716.38..1716.39 rows=1 width=201) (actual time=52847.219..52847.227 rows=10 loops=1)
               Sort Key: sr.id_sat_request
               ->  Nested Loop Left Join  (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064
rows=31loops=1) 
                     Join Filter: ("outer".id_package = "inner".id_package)
                     ->  Nested Loop  (cost=493.09..691.55 rows=1 width=193) (actual time=347.665..940.582 rows=31
loops=1)
                           ->  Nested Loop  (cost=493.09..688.49 rows=1 width=40) (actual time=331.446..505.628 rows=31
loops=1)
                                 Join Filter: ("inner".id_user = "outer".id_user)
                                 ->  Index Scan using user_login_login_key on user_login ul  (cost=0.00..4.00 rows=2
width=16)(actual time=12.065..12.071 rows=1 loops=1) 
                                       Index Cond: ((login)::text = 'babinow1'::text)
                                 ->  Materialize  (cost=493.09..531.37 rows=7656 width=28) (actual
time=167.654..481.813rows=8363 loops=1) 
                                       ->  Seq Scan on sat_request sr  (cost=0.00..493.09 rows=7656 width=28) (actual
time=167.644..467.344rows=8363 loops=1) 
                                             Filter: (request_time > (now() - '1 mon'::interval))
                           ->  Index Scan using url_pkey on url u  (cost=0.00..3.05 rows=1 width=161) (actual
time=13.994..14.000rows=1 loops=31) 
                                 Index Cond: ("outer".id_url = u.id_url)
                     ->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754
rows=493loops=31) 
                           ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666
rows=493loops=31) 
                                 Merge Cond: ("outer".id_program = "inner".id_program)
                                 ->  Sort  (cost=20.74..20.97 rows=93 width=19) (actual time=0.385..0.431 rows=47
loops=31)
                                       Sort Key: programs.id_program
                                       ->  Seq Scan on programs  (cost=0.00..17.70 rows=93 width=19) (actual
time=0.022..11.709rows=48 loops=1) 
                                             Filter: (id_program <> 0)
                                 ->  Sort  (cost=964.99..967.75 rows=1102 width=115) (actual time=14.592..15.218
rows=493loops=31) 
                                       Sort Key: sequences.id_program
                                       ->  Merge Join  (cost=696.16..909.31 rows=1102 width=115) (actual
time=79.717..451.495rows=493 loops=1) 
                                             Merge Cond: ("outer".id_package = "inner".id_package)
                                             ->  Merge Left Join  (cost=0.00..186.59 rows=1229 width=103) (actual
time=0.101..366.854rows=1247 loops=1) 
                                                   Merge Cond: ("outer".id_package = "inner".id_package)
                                                   ->  Index Scan using packages_pkey on packages p  (cost=0.00..131.04
rows=1229width=103) (actual time=0.048..163.503 rows=1247 loops=1) 
                                                   ->  Index Scan using package_security_id_package_key on
package_securityps  (cost=0.00..46.83 rows=855 width=4) (actual time=0.022..178.599 rows=879 loops=1) 
                                             ->  Sort  (cost=696.16..705.69 rows=3812 width=16) (actual
time=79.582..79.968rows=493 loops=1) 
                                                   Sort Key: sequences.id_package
                                                   ->  Seq Scan on sequences  (cost=0.00..469.42 rows=3812 width=16)
(actualtime=0.012..78.863 rows=493 loops=1) 
                                                         Filter: (estimated_start IS NOT NULL)
 Total runtime: 52878.516 ms
(36 rows)


Disabling the nestloop then the execution time become more affordable:

empdb=# set enable_nestloop = false;
SET
empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-#                    FROM v_sc_user_request
empdb-#                    WHERE
empdb-#                         login = 'babinow1'
empdb-#                    LIMIT 10 ;
                                                                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4467.64..4467.65 rows=1 width=232) (actual time=7091.233..7091.289 rows=10 loops=1)
   ->  Subquery Scan v_sc_user_request  (cost=4467.64..4467.65 rows=1 width=232) (actual time=7091.228..7091.272
rows=10loops=1) 
         ->  Sort  (cost=4467.64..4467.64 rows=1 width=201) (actual time=7091.216..7091.221 rows=10 loops=1)
               Sort Key: sr.id_sat_request
               ->  Merge Left Join  (cost=4462.07..4467.63 rows=1 width=201) (actual time=6377.732..7091.067 rows=31
loops=1)
                     Merge Cond: ("outer".id_package = "inner".id_package)
                     ->  Sort  (cost=3389.81..3389.81 rows=1 width=193) (actual time=1338.759..1338.814 rows=31
loops=1)
                           Sort Key: sr.id_package
                           ->  Merge Join  (cost=3285.05..3389.80 rows=1 width=193) (actual time=1318.877..1338.651
rows=31loops=1) 
                                 Merge Cond: ("outer".id_url = "inner".id_url)
                                 ->  Sort  (cost=1029.26..1029.26 rows=1 width=40) (actual time=703.085..703.113
rows=31loops=1) 
                                       Sort Key: sr.id_url
                                       ->  Merge Join  (cost=991.00..1029.25 rows=1 width=40) (actual
time=702.740..702.984rows=31 loops=1) 
                                             Merge Cond: ("outer".id_user = "inner".id_user)
                                             ->  Sort  (cost=986.99..1006.13 rows=7656 width=28) (actual
time=648.559..655.302rows=8041 loops=1) 
                                                   Sort Key: sr.id_user
                                                   ->  Seq Scan on sat_request sr  (cost=0.00..493.09 rows=7656
width=28)(actual time=201.968..614.631 rows=8363 loops=1) 
                                                         Filter: (request_time > (now() - '1 mon'::interval))
                                             ->  Sort  (cost=4.01..4.02 rows=2 width=16) (actual time=35.252..35.282
rows=1loops=1) 
                                                   Sort Key: ul.id_user
                                                   ->  Index Scan using user_login_login_key on user_login ul
(cost=0.00..4.00rows=2 width=16) (actual time=35.214..35.221 rows=1 loops=1) 
                                                         Index Cond: ((login)::text = 'babinow1'::text)
                                 ->  Sort  (cost=2255.79..2308.95 rows=21264 width=161) (actual time=587.664..602.490
rows=21250loops=1) 
                                       Sort Key: u.id_url
                                       ->  Seq Scan on url u  (cost=0.00..727.32 rows=21264 width=161) (actual
time=0.026..418.586rows=21264 loops=1) 
                     ->  Sort  (cost=1072.27..1075.03 rows=1103 width=12) (actual time=5015.761..5016.092 rows=493
loops=1)
                           Sort Key: vsp.id_package
                           ->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual
time=898.876..5014.570rows=494 loops=1) 
                                 ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual
time=898.869..5011.954rows=494 loops=1) 
                                       Merge Cond: ("outer".id_program = "inner".id_program)
                                       ->  Sort  (cost=20.74..20.97 rows=93 width=19) (actual time=29.669..29.708
rows=47loops=1) 
                                             Sort Key: programs.id_program
                                             ->  Seq Scan on programs  (cost=0.00..17.70 rows=93 width=19) (actual
time=0.035..29.525rows=48 loops=1) 
                                                   Filter: (id_program <> 0)
                                       ->  Sort  (cost=964.99..967.75 rows=1102 width=115) (actual
time=868.619..869.286rows=494 loops=1) 
                                             Sort Key: sequences.id_program
                                             ->  Merge Join  (cost=696.16..909.31 rows=1102 width=115) (actual
time=44.820..867.649rows=494 loops=1) 
                                                   Merge Cond: ("outer".id_package = "inner".id_package)
                                                   ->  Merge Left Join  (cost=0.00..186.59 rows=1229 width=103) (actual
time=19.563..835.352rows=1248 loops=1) 
                                                         Merge Cond: ("outer".id_package = "inner".id_package)
                                                         ->  Index Scan using packages_pkey on packages p
(cost=0.00..131.04rows=1229 width=103) (actual time=12.796..457.520 rows=1248 loops=1) 
                                                         ->  Index Scan using package_security_id_package_key on
package_securityps  (cost=0.00..46.83 rows=855 width=4) (actual time=6.703..283.944 rows=879 loops=1) 
                                                   ->  Sort  (cost=696.16..705.69 rows=3812 width=16) (actual
time=25.222..25.705rows=494 loops=1) 
                                                         Sort Key: sequences.id_package
                                                         ->  Seq Scan on sequences  (cost=0.00..469.42 rows=3812
width=16)(actual time=0.017..24.412 rows=494 loops=1) 
                                                               Filter: (estimated_start IS NOT NULL)
 Total runtime: 7104.946 ms
(47 rows)



May I know wich parameter may I tune in order to avoid to "disable" the nested loop ?

===================================================================================================================================

I tried to reduce the runtime cost adding a new column on sat_request ( expired boolean ) in order to
have a better row extimation ( I used a partial index ) but nothing changed.


I finally was able to reduce the cost putting ( just for this query ):
set cpu_tuple_cost = 0.07

is it a resonable value ?


empdb=# set cpu_tuple_cost = 0.07;
SET
empdb=# explain analyze select *
empdb-# from v_sc_user_request
empdb-# where login = 'babinow1';
                                                                                       QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan v_sc_user_request  (cost=1978.23..1978.30 rows=1 width=364) (actual time=1612.719..1613.064 rows=31
loops=1)
   ->  Sort  (cost=1978.23..1978.23 rows=1 width=201) (actual time=1612.700..1612.728 rows=31 loops=1)
         Sort Key: sr.id_sat_request
         ->  Merge Left Join  (cost=1974.05..1978.22 rows=1 width=201) (actual time=1537.343..1612.565 rows=31 loops=1)
               Merge Cond: ("outer".id_package = "inner".id_package)
               ->  Sort  (cost=887.92..887.93 rows=1 width=193) (actual time=475.924..476.020 rows=31 loops=1)
                     Sort Key: sr.id_package
                     ->  Nested Loop  (cost=4.07..887.91 rows=1 width=193) (actual time=145.782..475.851 rows=31
loops=1)
                           ->  Hash Join  (cost=4.07..884.65 rows=1 width=40) (actual time=139.816..464.678 rows=31
loops=1)
                                 Hash Cond: ("outer".id_user = "inner".id_user)
                                 ->  Index Scan using idx_sat_request_expired on sat_request sr  (cost=0.00..838.69
rows=8363width=28) (actual time=19.696..443.702 rows=8460 loops=1) 
                                       Index Cond: (expired = false)
                                 ->  Hash  (cost=4.07..4.07 rows=2 width=16) (actual time=11.779..11.779 rows=0
loops=1)
                                       ->  Index Scan using user_login_login_key on user_login ul  (cost=0.00..4.07
rows=2width=16) (actual time=11.725..11.732 rows=1 loops=1) 
                                             Index Cond: ((login)::text = 'babinow1'::text)
                           ->  Index Scan using url_pkey on url u  (cost=0.00..3.19 rows=1 width=161) (actual
time=0.345..0.347rows=1 loops=31) 
                                 Index Cond: ("outer".id_url = u.id_url)
               ->  Sort  (cost=1086.13..1088.16 rows=813 width=12) (actual time=1060.374..1060.622 rows=390 loops=1)
                     Sort Key: vsp.id_package
                     ->  Subquery Scan vsp  (cost=676.18..1046.83 rows=813 width=12) (actual time=625.645..1059.388
rows=480loops=1) 
                           ->  Hash Join  (cost=676.18..989.92 rows=813 width=131) (actual time=625.637..1057.105
rows=480loops=1) 
                                 Hash Cond: ("outer".id_package = "inner".id_package)
                                 ->  Hash Left Join  (cost=79.67..302.87 rows=1341 width=104) (actual
time=4.336..18.549rows=1342 loops=1) 
                                       Hash Cond: ("outer".id_package = "inner".id_package)
                                       ->  Seq Scan on packages p  (cost=0.00..145.87 rows=1341 width=104) (actual
time=0.007..3.357rows=1342 loops=1) 
                                       ->  Hash  (cost=77.27..77.27 rows=961 width=4) (actual time=3.685..3.685 rows=0
loops=1)
                                             ->  Seq Scan on package_security ps  (cost=0.00..77.27 rows=961 width=4)
(actualtime=0.016..2.175 rows=974 loops=1) 
                                 ->  Hash  (cost=594.48..594.48 rows=813 width=31) (actual time=620.397..620.397 rows=0
loops=1)
                                       ->  Hash Join  (cost=20.60..594.48 rows=813 width=31) (actual
time=38.307..619.406rows=480 loops=1) 
                                             Hash Cond: ("outer".id_program = "inner".id_program)
                                             ->  Seq Scan on sequences  (cost=0.00..512.82 rows=830 width=16) (actual
time=16.858..595.262rows=480 loops=1) 
                                                   Filter: (estimated_start IS NOT NULL)
                                             ->  Hash  (cost=20.48..20.48 rows=47 width=19) (actual time=21.093..21.093
rows=0loops=1) 
                                                   ->  Seq Scan on programs  (cost=0.00..20.48 rows=47 width=19)
(actualtime=9.369..20.980 rows=48 loops=1) 
                                                         Filter: (id_program <> 0)
 Total runtime: 1614.123 ms
(36 rows)



Regards
Gaetano Mendola









Re: bad plan

From
Richard Huxton
Date:
Gaetano Mendola wrote:
> running a 7.4.5 engine, I'm facing this bad plan:
>
> empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
> empdb-#                    FROM v_sc_user_request
> empdb-#                    WHERE
> empdb-#                         login = 'babinow1'
> empdb-#                    LIMIT 10 ;

>                      ->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754
rows=493loops=31) 
>                            ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666
rows=493loops=31) 
>                                  Merge Cond: ("outer".id_program = "inner".id_program)

The problem to address is in this subquery. That's a total of 31 x
(1668.754 - 25.328) = 50seconds (about).

Since your query is so simple, I'm guessing v_sc_user_request is a view.
Can you provide the definition?
--
   Richard Huxton
   Archonet Ltd

Re: bad plan

From
Gaetano Mendola
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Richard Huxton wrote:
> Gaetano Mendola wrote:
>
>> running a 7.4.5 engine, I'm facing this bad plan:
>>
>> empdb=# explain analyze SELECT
>> name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
>>
>> empdb-#                    FROM v_sc_user_request
>> empdb-#                    WHERE
>> empdb-#                         login = 'babinow1'
>> empdb-#                    LIMIT 10 ;
>
>
>>                      ->  Subquery Scan vsp  (cost=985.73..1016.53
>> rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
>>                            ->  Merge Join  (cost=985.73..1011.01
>> rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
>>                                  Merge Cond: ("outer".id_program =
>> "inner".id_program)
>
>
> The problem to address is in this subquery. That's a total of 31 x
> (1668.754 - 25.328) = 50seconds (about).
>
> Since your query is so simple, I'm guessing v_sc_user_request is a view.
> Can you provide the definition?

Of course:



CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT
      *
  FROM
      v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package )
  WHERE
      vsr.request_time > now() - '1 month'::interval AND
      vsr.expired = FALSE
  ORDER BY id_sat_request DESC
;


CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT
     *
  FROM
      v_programs           vpr,
      v_packages           vpk,
      v_sequences          vs

  WHERE
     ------------ JOIN -------------
      vpr.id_program = vs.id_program AND
      vpk.id_package = vs.id_package AND
     -------------------------------
      vs.estimated_start IS NOT NULL
;

CREATE OR REPLACE VIEW v_sat_request AS
  SELECT
     *
  FROM
     sat_request sr,
     url         u,
     user_login  ul
  WHERE
     ---------------- JOIN ---------------------
     sr.id_url  = u.id_url AND
     sr.id_user = ul.id_user
     -------------------------------------------
;


that column expired was added since yesterday

Regards
Gaetano Mendola









-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCLZkD7UpzwH2SGd4RAv8/AKCA5cNfu6vEKZ6m/ke1JsVRdsOTXQCbBMt4
ZPTFjwyb52CrFxdUTD6gejs=
=STzz
-----END PGP SIGNATURE-----


Re: bad plan

From
Richard Huxton
Date:
Gaetano Mendola wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Richard Huxton wrote:
>
>>Gaetano Mendola wrote:
>>
>>
>>>running a 7.4.5 engine, I'm facing this bad plan:
>>>
>>>empdb=# explain analyze SELECT
>>>name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
>>>
>>>empdb-#                    FROM v_sc_user_request
>>>empdb-#                    WHERE
>>>empdb-#                         login = 'babinow1'
>>>empdb-#                    LIMIT 10 ;
>>
>>
>>>                     ->  Subquery Scan vsp  (cost=985.73..1016.53
>>>rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
>>>                           ->  Merge Join  (cost=985.73..1011.01
>>>rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
>>>                                 Merge Cond: ("outer".id_program =
>>>"inner".id_program)
>>
>>
>>The problem to address is in this subquery. That's a total of 31 x
>>(1668.754 - 25.328) = 50seconds (about).
>>
>>Since your query is so simple, I'm guessing v_sc_user_request is a view.
>>Can you provide the definition?
>
>
> Of course:
>
>
>
> CREATE OR REPLACE VIEW v_sc_user_request AS
>   SELECT
>       *
>   FROM
>       v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package )
>   WHERE
>       vsr.request_time > now() - '1 month'::interval AND
>       vsr.expired = FALSE
>   ORDER BY id_sat_request DESC
> ;
>
>
> CREATE OR REPLACE VIEW v_sc_packages AS
>   SELECT
>      *
>   FROM
>       v_programs           vpr,
>       v_packages           vpk,
>       v_sequences          vs
>
>   WHERE
>      ------------ JOIN -------------
>       vpr.id_program = vs.id_program AND
>       vpk.id_package = vs.id_package AND
>      -------------------------------
>       vs.estimated_start IS NOT NULL
> ;
>
> CREATE OR REPLACE VIEW v_sat_request AS
>   SELECT
>      *
>   FROM
>      sat_request sr,
>      url         u,
>      user_login  ul
>   WHERE
>      ---------------- JOIN ---------------------
>      sr.id_url  = u.id_url AND
>      sr.id_user = ul.id_user
>      -------------------------------------------
> ;

OK, so looking at the original EXPLAIN the order of processing seems to be:
1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
This gives us 31 rows
2. The left-join from v_sat_request to v_sc_packages is processed (lines
5..6)
This involves the subquery scan on vsp (from line 16) where it seems to
think the best idea is a merge join of programs to sequences.

So - I think we need to look at the performance of your view
"v_sc_packages" and the views that it depends on. OK - can you reply to
this with just the definitions of v_sc_packages and what it depends on,
and we can have a look at that.

Do you need all these tables involved in this query? I don't think PG is
smart enough to completely discard a join if it's not needed by the
output. Thinking about it, I'm not sure you could safely.
--
   Richard Huxton
   Archonet Ltd

Re: bad plan

From
Gaetano Mendola
Date:
Richard Huxton wrote:

> OK, so looking at the original EXPLAIN the order of processing seems to be:
> 1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
> This gives us 31 rows
> 2. The left-join from v_sat_request to v_sc_packages is processed (lines
> 5..6)
> This involves the subquery scan on vsp (from line 16) where it seems to
> think the best idea is a merge join of programs to sequences.

Whel basically v_sc_packages depends on other 3 views that are just a simple
interface to a plain table.


If I execute a select only on this table I get reasonable executions time:


=== cpu_tuple_cost = 0.07

# explain analyze select * from v_sc_packages where id_package = 19628;
                                                                              QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11.86 rows=1 width=116) (actual time=1.022..1.055 rows=1 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..7.89 rows=1 width=104) (actual time=0.330..0.345 rows=1 loops=1)
               ->  Index Scan using packages_pkey on packages p  (cost=0.00..3.90 rows=1 width=104) (actual
time=0.070..0.075rows=1 loops=1) 
                     Index Cond: (id_package = 19628)
               ->  Index Scan using package_security_id_package_key on package_security ps  (cost=0.00..3.91 rows=1
width=4)(actual time=0.232..0.237 rows=1 loops=1) 
                     Index Cond: ("outer".id_package = ps.id_package)
         ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.90 rows=1 width=16) (actual
time=0.670..0.685rows=1 loops=1) 
               Index Cond: (19628 = id_package)
               Filter: (estimated_start IS NOT NULL)
   ->  Index Scan using programs_pkey on programs  (cost=0.00..4.02 rows=1 width=19) (actual time=0.078..0.086 rows=1
loops=1)
         Index Cond: (programs.id_program = "outer".id_program)
         Filter: (id_program <> 0)
 Total runtime: 42.650 ms
(14 rows)

=== cpu_tuple_cost = 0.01

# explain analyze select * from v_sc_packages where id_package = 19628;
                                                                               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..15.54 rows=1 width=131) (actual time=25.062..69.977 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11.56 rows=1 width=116) (actual time=5.396..50.299 rows=1 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..7.71 rows=1 width=104) (actual time=5.223..32.842 rows=1 loops=1)
               ->  Index Scan using packages_pkey on packages p  (cost=0.00..3.84 rows=1 width=104) (actual
time=0.815..7.235rows=1 loops=1) 
                     Index Cond: (id_package = 19628)
               ->  Index Scan using package_security_id_package_key on package_security ps  (cost=0.00..3.85 rows=1
width=4)(actual time=4.366..25.555 rows=1 loops=1) 
                     Index Cond: ("outer".id_package = ps.id_package)
         ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.84 rows=1 width=16) (actual
time=0.147..17.422rows=1 loops=1) 
               Index Cond: (19628 = id_package)
               Filter: (estimated_start IS NOT NULL)
   ->  Index Scan using programs_pkey on programs  (cost=0.00..3.96 rows=1 width=19) (actual time=0.043..0.049 rows=1
loops=1)
         Index Cond: (programs.id_program = "outer".id_program)
         Filter: (id_program <> 0)
 Total runtime: 70.254 ms
(14 rows)


and I get the best with this:

=== cpu_tuple_cost = 0.001


# explain analyze select * from v_sc_packages where id_package = 19628;
                                                                           QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..15.48 rows=1 width=131) (actual time=2.516..2.553 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..7.78 rows=1 width=31) (actual time=1.439..1.457 rows=1 loops=1)
         ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.83 rows=1 width=16) (actual
time=0.442..0.450rows=1 loops=1) 
               Index Cond: (19628 = id_package)
               Filter: (estimated_start IS NOT NULL)
         ->  Index Scan using programs_pkey on programs  (cost=0.00..3.95 rows=1 width=19) (actual time=0.972..0.978
rows=1loops=1) 
               Index Cond: (programs.id_program = "outer".id_program)
               Filter: (id_program <> 0)
   ->  Nested Loop Left Join  (cost=0.00..7.68 rows=1 width=104) (actual time=0.110..0.125 rows=1 loops=1)
         ->  Index Scan using packages_pkey on packages p  (cost=0.00..3.84 rows=1 width=104) (actual time=0.040..0.046
rows=1loops=1) 
               Index Cond: (id_package = 19628)
         ->  Index Scan using package_security_id_package_key on package_security ps  (cost=0.00..3.84 rows=1 width=4)
(actualtime=0.036..0.042 rows=1 loops=1) 
               Index Cond: ("outer".id_package = ps.id_package)
 Total runtime: 2.878 ms
(14 rows)



but with this last setting for the original query is choosed a very bad plan.


Regards
Gaetano Mendola






Re: bad plan

From
Richard Huxton
Date:
Gaetano Mendola wrote:
> Richard Huxton wrote:
>
>
>>OK, so looking at the original EXPLAIN the order of processing seems to be:
>>1. v_sat_request is evaluated and filtered on login='...' (lines 7..15)
>>This gives us 31 rows
>>2. The left-join from v_sat_request to v_sc_packages is processed (lines
>>5..6)
>>This involves the subquery scan on vsp (from line 16) where it seems to
>>think the best idea is a merge join of programs to sequences.
>
>
> Whel basically v_sc_packages depends on other 3 views that are just a simple
> interface to a plain table.
>
>
> If I execute a select only on this table I get reasonable executions time:
>
>
> === cpu_tuple_cost = 0.07
>
> # explain analyze select * from v_sc_packages where id_package = 19628;
>                                                                               QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..11.86 rows=1 width=116) (actual time=1.022..1.055 rows=1 loops=1)
>          ->  Nested Loop Left Join  (cost=0.00..7.89 rows=1 width=104) (actual time=0.330..0.345 rows=1 loops=1)
>                ->  Index Scan using packages_pkey on packages p  (cost=0.00..3.90 rows=1 width=104) (actual
time=0.070..0.075rows=1 loops=1) 
>                      Index Cond: (id_package = 19628)
>                ->  Index Scan using package_security_id_package_key on package_security ps  (cost=0.00..3.91 rows=1
width=4)(actual time=0.232..0.237 rows=1 loops=1) 
>                      Index Cond: ("outer".id_package = ps.id_package)
>          ->  Index Scan using idx_sequences_id_package on sequences  (cost=0.00..3.90 rows=1 width=16) (actual
time=0.670..0.685rows=1 loops=1) 
>                Index Cond: (19628 = id_package)
>                Filter: (estimated_start IS NOT NULL)
>    ->  Index Scan using programs_pkey on programs  (cost=0.00..4.02 rows=1 width=19) (actual time=0.078..0.086 rows=1
loops=1)
>          Index Cond: (programs.id_program = "outer".id_program)
>          Filter: (id_program <> 0)
>  Total runtime: 42.650 ms
> (14 rows)

> === cpu_tuple_cost = 0.01

> === cpu_tuple_cost = 0.001

I don't know what you think you're measuring, but it's nothing to do
with the plans. If you look at the plans carefully, you'll see they're
all the same. The "cost" numbers change because that's the parameter
you're changing.

I'm not sure it makes sense to vary cpu_tuple_cost from 0.07 down to
0.001 - that's a factor of 70 difference. I might be tempted to halve or
double it, but even then only after some serious testing.

--
   Richard Huxton
   Archonet Ltd

Re: bad plan

From
Tom Lane
Date:
Gaetano Mendola <mendola@bigfoot.com> writes:
>> Since your query is so simple, I'm guessing v_sc_user_request is a view.
>> Can you provide the definition?

> Of course:

I don't think you've told us the whole truth about the v_sc_packages
view.  The definition as given doesn't work at all (it'll have
duplicate column names), but more to the point, if it were that simple
then the planner would fold it into the parent query.  The subquery
scan node indicates that folding did not occur.  The most likely reason
for that is that there's an ORDER BY in the view.

Putting ORDER BYs in views that you intend to use as components of other
views is a bad practice from a performance perspective...

            regards, tom lane

Re: bad plan

From
Richard Huxton
Date:
Tom Lane wrote:
>
> Putting ORDER BYs in views that you intend to use as components of other
> views is a bad practice from a performance perspective...

There are also a lot of views involved here for very few output columns.
Tom - is the planner smart enough to optimise-out unneeded columns from
a SELECT * view if it's part of a join/subquery and you only use one or
two columns?

Secondly, in the original plan we have:
->  Nested Loop Left Join  (cost=1478.82..1716.37 rows=1 width=201)
(actual time=3254.483..52847.064 rows=31 loops=1)

Now, we've got 31 rows instead of 1 here. The one side of the join ends
up as:
->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual
time=25.328..1668.754 rows=493 loops=31)
->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual
time=25.321..1666.666 rows=493 loops=31)

Would I be right in thinking the planner doesn't materialise the
subquery because it's expecting 1 loop not 31? If there were 1 row the
plan would seem OK to me.

Is there any mileage in the idea of a "lazy" planner that keeps some
alternative paths around in case they're needed? Or a reactive one that
can re-plan nodes when assumptions turn out to be wrong?

--
   Richard Huxton
   Archonet Ltd

Re: bad plan

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> There are also a lot of views involved here for very few output columns.
> Tom - is the planner smart enough to optimise-out unneeded columns from
> a SELECT * view if it's part of a join/subquery and you only use one or
> two columns?

If the view gets flattened, yes, but I believe that it's not bright
enough to do so when it can't flatten the view.  You could tell easily
enough by looking at the row-width estimates at various levels of the
plan.  (Let's see ... in Gaetano's plan the SubqueryScan is returning
12-byte rows where its input MergeJoin is returning 130-byte rows,
so sure enough the view is computing a lot of stuff that then gets
thrown away.)

> Would I be right in thinking the planner doesn't materialise the
> subquery because it's expecting 1 loop not 31? If there were 1 row the
> plan would seem OK to me.

Right; it doesn't see any predicted gain from the extra cost of
materializing.  But to me the main problem here is not that, it is that
the entire shape of the plan would likely be different if it weren't for
the "optimization fence" that the Subquery Scan node represents.  I
suspect too that the use of mergejoin as opposed to anything else within
the vsp subplan is driven more by the need to produce sorted output than
by what is the cheapest way to get the rows.

            regards, tom lane

Re: bad plan

From
Gaetano Mendola
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
>
>>>Since your query is so simple, I'm guessing v_sc_user_request is a view.
>>>Can you provide the definition?
>
>
>>Of course:
>
>
> I don't think you've told us the whole truth about the v_sc_packages
> view.  The definition as given doesn't work at all (it'll have
> duplicate column names), but more to the point, if it were that simple
> then the planner would fold it into the parent query.  The subquery
> scan node indicates that folding did not occur.  The most likely reason
> for that is that there's an ORDER BY in the view.

I didn't say the complete truth because the view definition is long so I just omitted
all fields.

explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
FROM v_sc_user_request
WHERE login = 'babinow1'
LIMIT 10 ;

these are the complete  definitions of views involved in the query:



CREATE OR REPLACE VIEW v_sc_user_request AS
  SELECT
      vsr.id_sat_request                                               AS id_sat_request,
      vsr.id_user                                                      AS id_user,
      vsr.login                                                        AS login,
      vsr.url                                                          AS url,
      vsr.name                                                         AS name,
      vsr.descr                                                        AS descr,
      vsr.size                                                         AS size,
      trunc(vsr.size/1024.0/1024.0,2)                                  AS size_mb,
      vsr.id_sat_request_status                                        AS id_sat_request_status,
      sp_lookup_key('sat_request_status',   vsr.id_sat_request_status) AS request_status,
      sp_lookup_descr('sat_request_status', vsr.id_sat_request_status) AS request_status_descr,
      vsr.id_url_status                                                AS id_url_status,
      sp_lookup_key('url_status',           vsr.id_url_status)         AS url_status,
      sp_lookup_descr('url_status',         vsr.id_url_status)         AS url_status_descr,
      vsr.url_time_stamp                                               AS url_time_stamp,
      date_trunc('seconds',vsr.request_time)                           AS request_time_stamp,
      vsr.id_package                                                   AS id_package,
      COALESCE(date_trunc('seconds',vsp.estimated_start)::text,'NA')   AS estimated_start

  FROM
      v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package )
  WHERE
      vsr.request_time > now() - '1 month'::interval AND
      vsr.expired = FALSE
  ORDER BY id_sat_request DESC
;




CREATE OR REPLACE VIEW v_sat_request AS
  SELECT
     sr.id_user               AS id_user,
     ul.login                 AS login,
     sr.id_sat_request        AS id_sat_request,
     u.id_url                 AS id_url,
     u.url                    AS url,
     u.name                   AS name,
     u.descr                  AS descr,
     u.size                   AS size,
     u.storage                AS storage,
     sr.id_package            AS id_package,
     sr.id_sat_request_status AS id_sat_request_status,
     sr.request_time          AS request_time,
     sr.work_time             AS request_work_time,
     u.id_url_status          AS id_url_status,
     u.time_stamp             AS url_time_stamp,
     sr.expired               AS expired
  FROM
     sat_request sr,
     url         u,
     user_login  ul
  WHERE
     ---------------- JOIN ---------------------
     sr.id_url  = u.id_url AND
     sr.id_user = ul.id_user
     -------------------------------------------
;




CREATE OR REPLACE VIEW v_sc_packages AS
  SELECT

     vpr.id_program                            AS id_program,
     vpr.name                                  AS program_name,

     vpk.id_package                            AS id_package,
     date_trunc('seconds', vs.estimated_start) AS estimated_start,

     vpk.name                                  AS package_name,
     vpk.TYPE                                  AS TYPE,
     vpk.description                           AS description,
     vpk.target                                AS target,
     vpk.fec                                   AS fec_alg,
     vpk.output_group - vpk.input_group        AS fec_redundancy,
     vpk.priority                              AS priority,
     vpk.updatable                             AS updatable,
     vpk.auto_listen                           AS auto_listen,
     vpk.start_file                            AS start_file,
     vpk.view_target_group                     AS view_target_group,
     vpk.target_group                          AS target_group

  FROM
      v_programs           vpr,
      v_packages           vpk,
      v_sequences          vs

  WHERE
     ------------ JOIN -------------
      vpr.id_program = vs.id_program AND
      vpk.id_package = vs.id_package AND

     -------------------------------
      vs.estimated_start IS NOT NULL
;



CREATE OR REPLACE VIEW v_programs AS
  SELECT id_program     AS id_program,
         id_publisher   AS id_publisher,
         name           AS name,
         description    AS description,
         sp_lookup_key('program_type',   id_program_type)    AS TYPE,
         sp_lookup_key('program_status', id_program_status)  AS status,
         last_position  AS last_position
  FROM programs
  WHERE id_program<>0
;


CREATE OR REPLACE VIEW v_packages AS
  SELECT p.id_package            AS id_package,
         p.id_publisher          AS id_publisher,
         p.name                  AS name,
         p.information           AS information,
         p.description           AS description,
         sp_lookup_key('package_type', p.id_package_type)
                                 AS TYPE,
         sp_lookup_key('target', p.id_target)
                                 AS target,
         p.port                  AS port,
         p.priority              AS priority,
         sp_lookup_key('fec', p.id_fec)
                                 AS fec,
         p.input_group           AS input_group,
         p.output_group          AS output_group,
         p.updatable             AS updatable,
         p.checksum              AS checksum,
         p.version               AS version,
         p.start_file            AS start_file,
         p.view_target_group     AS view_target_group,
         p.target_group          AS target_group,
         p.auto_listen           AS auto_listen,
         p.public_flag           AS public_flag,
         p.needed_version        AS needed_version,
         p.logic_version         AS logic_version,
         p.package_size          AS package_size,
         ps.id_drm_process       AS id_drm_process,
         ps.id_cas_service       AS id_cas_service,
         ps.id_cas_settings      AS id_cas_settings,
         ps.id_drm_service       AS id_drm_service

  FROM packages p LEFT OUTER JOIN package_security ps USING (id_package)
 ;



CREATE OR REPLACE VIEW v_sequences AS
  SELECT id_package        AS id_package,
         id_program        AS id_program,
         internal_position AS internal_position,
         estimated_start   AS estimated_start
  FROM sequences
;



> Putting ORDER BYs in views that you intend to use as components of other
> views is a bad practice from a performance perspective...

Indeed when a view is involved in a join we do not put "order by" in it ( at
least this is what I try to do ), I have to say also that some time I see that replacing
the view with the tables that it represent the execution time is better
( I have an example to show you if you are interested in it ).


Regards
Gaetano Mendola





-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCLkef7UpzwH2SGd4RAt90AJ9e3qUSx2fxiOO2aA30TbLsOdyV7ACfd0RY
+2A3U6dDfWw/H4eWcmI8mS0=
=t1AD
-----END PGP SIGNATURE-----