Thread: bad plan
I posted this on hackers, but I had to post it here. =================================================================================================================================== Hi all, running a 7.4.5 engine, I'm facing this bad plan: empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp empdb-# FROM v_sc_user_request empdb-# WHERE empdb-# login = 'babinow1' empdb-# LIMIT 10 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=1716.38..1716.39 rows=1 width=232) (actual time=52847.239..52847.322 rows=10 loops=1) -> Subquery Scan v_sc_user_request (cost=1716.38..1716.39 rows=1 width=232) (actual time=52847.234..52847.301 rows=10loops=1) -> Sort (cost=1716.38..1716.39 rows=1 width=201) (actual time=52847.219..52847.227 rows=10 loops=1) Sort Key: sr.id_sat_request -> Nested Loop Left Join (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064 rows=31loops=1) Join Filter: ("outer".id_package = "inner".id_package) -> Nested Loop (cost=493.09..691.55 rows=1 width=193) (actual time=347.665..940.582 rows=31 loops=1) -> Nested Loop (cost=493.09..688.49 rows=1 width=40) (actual time=331.446..505.628 rows=31 loops=1) Join Filter: ("inner".id_user = "outer".id_user) -> Index Scan using user_login_login_key on user_login ul (cost=0.00..4.00 rows=2 width=16)(actual time=12.065..12.071 rows=1 loops=1) Index Cond: ((login)::text = 'babinow1'::text) -> Materialize (cost=493.09..531.37 rows=7656 width=28) (actual time=167.654..481.813rows=8363 loops=1) -> Seq Scan on sat_request sr (cost=0.00..493.09 rows=7656 width=28) (actual time=167.644..467.344rows=8363 loops=1) Filter: (request_time > (now() - '1 mon'::interval)) -> Index Scan using url_pkey on url u (cost=0.00..3.05 rows=1 width=161) (actual time=13.994..14.000rows=1 loops=31) Index Cond: ("outer".id_url = u.id_url) -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493loops=31) -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493loops=31) Merge Cond: ("outer".id_program = "inner".id_program) -> Sort (cost=20.74..20.97 rows=93 width=19) (actual time=0.385..0.431 rows=47 loops=31) Sort Key: programs.id_program -> Seq Scan on programs (cost=0.00..17.70 rows=93 width=19) (actual time=0.022..11.709rows=48 loops=1) Filter: (id_program <> 0) -> Sort (cost=964.99..967.75 rows=1102 width=115) (actual time=14.592..15.218 rows=493loops=31) Sort Key: sequences.id_program -> Merge Join (cost=696.16..909.31 rows=1102 width=115) (actual time=79.717..451.495rows=493 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Merge Left Join (cost=0.00..186.59 rows=1229 width=103) (actual time=0.101..366.854rows=1247 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Index Scan using packages_pkey on packages p (cost=0.00..131.04 rows=1229width=103) (actual time=0.048..163.503 rows=1247 loops=1) -> Index Scan using package_security_id_package_key on package_securityps (cost=0.00..46.83 rows=855 width=4) (actual time=0.022..178.599 rows=879 loops=1) -> Sort (cost=696.16..705.69 rows=3812 width=16) (actual time=79.582..79.968rows=493 loops=1) Sort Key: sequences.id_package -> Seq Scan on sequences (cost=0.00..469.42 rows=3812 width=16) (actualtime=0.012..78.863 rows=493 loops=1) Filter: (estimated_start IS NOT NULL) Total runtime: 52878.516 ms (36 rows) Disabling the nestloop then the execution time become more affordable: empdb=# set enable_nestloop = false; SET empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp empdb-# FROM v_sc_user_request empdb-# WHERE empdb-# login = 'babinow1' empdb-# LIMIT 10 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=4467.64..4467.65 rows=1 width=232) (actual time=7091.233..7091.289 rows=10 loops=1) -> Subquery Scan v_sc_user_request (cost=4467.64..4467.65 rows=1 width=232) (actual time=7091.228..7091.272 rows=10loops=1) -> Sort (cost=4467.64..4467.64 rows=1 width=201) (actual time=7091.216..7091.221 rows=10 loops=1) Sort Key: sr.id_sat_request -> Merge Left Join (cost=4462.07..4467.63 rows=1 width=201) (actual time=6377.732..7091.067 rows=31 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Sort (cost=3389.81..3389.81 rows=1 width=193) (actual time=1338.759..1338.814 rows=31 loops=1) Sort Key: sr.id_package -> Merge Join (cost=3285.05..3389.80 rows=1 width=193) (actual time=1318.877..1338.651 rows=31loops=1) Merge Cond: ("outer".id_url = "inner".id_url) -> Sort (cost=1029.26..1029.26 rows=1 width=40) (actual time=703.085..703.113 rows=31loops=1) Sort Key: sr.id_url -> Merge Join (cost=991.00..1029.25 rows=1 width=40) (actual time=702.740..702.984rows=31 loops=1) Merge Cond: ("outer".id_user = "inner".id_user) -> Sort (cost=986.99..1006.13 rows=7656 width=28) (actual time=648.559..655.302rows=8041 loops=1) Sort Key: sr.id_user -> Seq Scan on sat_request sr (cost=0.00..493.09 rows=7656 width=28)(actual time=201.968..614.631 rows=8363 loops=1) Filter: (request_time > (now() - '1 mon'::interval)) -> Sort (cost=4.01..4.02 rows=2 width=16) (actual time=35.252..35.282 rows=1loops=1) Sort Key: ul.id_user -> Index Scan using user_login_login_key on user_login ul (cost=0.00..4.00rows=2 width=16) (actual time=35.214..35.221 rows=1 loops=1) Index Cond: ((login)::text = 'babinow1'::text) -> Sort (cost=2255.79..2308.95 rows=21264 width=161) (actual time=587.664..602.490 rows=21250loops=1) Sort Key: u.id_url -> Seq Scan on url u (cost=0.00..727.32 rows=21264 width=161) (actual time=0.026..418.586rows=21264 loops=1) -> Sort (cost=1072.27..1075.03 rows=1103 width=12) (actual time=5015.761..5016.092 rows=493 loops=1) Sort Key: vsp.id_package -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=898.876..5014.570rows=494 loops=1) -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=898.869..5011.954rows=494 loops=1) Merge Cond: ("outer".id_program = "inner".id_program) -> Sort (cost=20.74..20.97 rows=93 width=19) (actual time=29.669..29.708 rows=47loops=1) Sort Key: programs.id_program -> Seq Scan on programs (cost=0.00..17.70 rows=93 width=19) (actual time=0.035..29.525rows=48 loops=1) Filter: (id_program <> 0) -> Sort (cost=964.99..967.75 rows=1102 width=115) (actual time=868.619..869.286rows=494 loops=1) Sort Key: sequences.id_program -> Merge Join (cost=696.16..909.31 rows=1102 width=115) (actual time=44.820..867.649rows=494 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Merge Left Join (cost=0.00..186.59 rows=1229 width=103) (actual time=19.563..835.352rows=1248 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Index Scan using packages_pkey on packages p (cost=0.00..131.04rows=1229 width=103) (actual time=12.796..457.520 rows=1248 loops=1) -> Index Scan using package_security_id_package_key on package_securityps (cost=0.00..46.83 rows=855 width=4) (actual time=6.703..283.944 rows=879 loops=1) -> Sort (cost=696.16..705.69 rows=3812 width=16) (actual time=25.222..25.705rows=494 loops=1) Sort Key: sequences.id_package -> Seq Scan on sequences (cost=0.00..469.42 rows=3812 width=16)(actual time=0.017..24.412 rows=494 loops=1) Filter: (estimated_start IS NOT NULL) Total runtime: 7104.946 ms (47 rows) May I know wich parameter may I tune in order to avoid to "disable" the nested loop ? =================================================================================================================================== I tried to reduce the runtime cost adding a new column on sat_request ( expired boolean ) in order to have a better row extimation ( I used a partial index ) but nothing changed. I finally was able to reduce the cost putting ( just for this query ): set cpu_tuple_cost = 0.07 is it a resonable value ? empdb=# set cpu_tuple_cost = 0.07; SET empdb=# explain analyze select * empdb-# from v_sc_user_request empdb-# where login = 'babinow1'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan v_sc_user_request (cost=1978.23..1978.30 rows=1 width=364) (actual time=1612.719..1613.064 rows=31 loops=1) -> Sort (cost=1978.23..1978.23 rows=1 width=201) (actual time=1612.700..1612.728 rows=31 loops=1) Sort Key: sr.id_sat_request -> Merge Left Join (cost=1974.05..1978.22 rows=1 width=201) (actual time=1537.343..1612.565 rows=31 loops=1) Merge Cond: ("outer".id_package = "inner".id_package) -> Sort (cost=887.92..887.93 rows=1 width=193) (actual time=475.924..476.020 rows=31 loops=1) Sort Key: sr.id_package -> Nested Loop (cost=4.07..887.91 rows=1 width=193) (actual time=145.782..475.851 rows=31 loops=1) -> Hash Join (cost=4.07..884.65 rows=1 width=40) (actual time=139.816..464.678 rows=31 loops=1) Hash Cond: ("outer".id_user = "inner".id_user) -> Index Scan using idx_sat_request_expired on sat_request sr (cost=0.00..838.69 rows=8363width=28) (actual time=19.696..443.702 rows=8460 loops=1) Index Cond: (expired = false) -> Hash (cost=4.07..4.07 rows=2 width=16) (actual time=11.779..11.779 rows=0 loops=1) -> Index Scan using user_login_login_key on user_login ul (cost=0.00..4.07 rows=2width=16) (actual time=11.725..11.732 rows=1 loops=1) Index Cond: ((login)::text = 'babinow1'::text) -> Index Scan using url_pkey on url u (cost=0.00..3.19 rows=1 width=161) (actual time=0.345..0.347rows=1 loops=31) Index Cond: ("outer".id_url = u.id_url) -> Sort (cost=1086.13..1088.16 rows=813 width=12) (actual time=1060.374..1060.622 rows=390 loops=1) Sort Key: vsp.id_package -> Subquery Scan vsp (cost=676.18..1046.83 rows=813 width=12) (actual time=625.645..1059.388 rows=480loops=1) -> Hash Join (cost=676.18..989.92 rows=813 width=131) (actual time=625.637..1057.105 rows=480loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Hash Left Join (cost=79.67..302.87 rows=1341 width=104) (actual time=4.336..18.549rows=1342 loops=1) Hash Cond: ("outer".id_package = "inner".id_package) -> Seq Scan on packages p (cost=0.00..145.87 rows=1341 width=104) (actual time=0.007..3.357rows=1342 loops=1) -> Hash (cost=77.27..77.27 rows=961 width=4) (actual time=3.685..3.685 rows=0 loops=1) -> Seq Scan on package_security ps (cost=0.00..77.27 rows=961 width=4) (actualtime=0.016..2.175 rows=974 loops=1) -> Hash (cost=594.48..594.48 rows=813 width=31) (actual time=620.397..620.397 rows=0 loops=1) -> Hash Join (cost=20.60..594.48 rows=813 width=31) (actual time=38.307..619.406rows=480 loops=1) Hash Cond: ("outer".id_program = "inner".id_program) -> Seq Scan on sequences (cost=0.00..512.82 rows=830 width=16) (actual time=16.858..595.262rows=480 loops=1) Filter: (estimated_start IS NOT NULL) -> Hash (cost=20.48..20.48 rows=47 width=19) (actual time=21.093..21.093 rows=0loops=1) -> Seq Scan on programs (cost=0.00..20.48 rows=47 width=19) (actualtime=9.369..20.980 rows=48 loops=1) Filter: (id_program <> 0) Total runtime: 1614.123 ms (36 rows) Regards Gaetano Mendola
Gaetano Mendola wrote: > running a 7.4.5 engine, I'm facing this bad plan: > > empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp > empdb-# FROM v_sc_user_request > empdb-# WHERE > empdb-# login = 'babinow1' > empdb-# LIMIT 10 ; > -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493loops=31) > -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493loops=31) > Merge Cond: ("outer".id_program = "inner".id_program) The problem to address is in this subquery. That's a total of 31 x (1668.754 - 25.328) = 50seconds (about). Since your query is so simple, I'm guessing v_sc_user_request is a view. Can you provide the definition? -- Richard Huxton Archonet Ltd
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Richard Huxton wrote: > Gaetano Mendola wrote: > >> running a 7.4.5 engine, I'm facing this bad plan: >> >> empdb=# explain analyze SELECT >> name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp >> >> empdb-# FROM v_sc_user_request >> empdb-# WHERE >> empdb-# login = 'babinow1' >> empdb-# LIMIT 10 ; > > >> -> Subquery Scan vsp (cost=985.73..1016.53 >> rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31) >> -> Merge Join (cost=985.73..1011.01 >> rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31) >> Merge Cond: ("outer".id_program = >> "inner".id_program) > > > The problem to address is in this subquery. That's a total of 31 x > (1668.754 - 25.328) = 50seconds (about). > > Since your query is so simple, I'm guessing v_sc_user_request is a view. > Can you provide the definition? Of course: CREATE OR REPLACE VIEW v_sc_user_request AS SELECT * FROM v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package ) WHERE vsr.request_time > now() - '1 month'::interval AND vsr.expired = FALSE ORDER BY id_sat_request DESC ; CREATE OR REPLACE VIEW v_sc_packages AS SELECT * FROM v_programs vpr, v_packages vpk, v_sequences vs WHERE ------------ JOIN ------------- vpr.id_program = vs.id_program AND vpk.id_package = vs.id_package AND ------------------------------- vs.estimated_start IS NOT NULL ; CREATE OR REPLACE VIEW v_sat_request AS SELECT * FROM sat_request sr, url u, user_login ul WHERE ---------------- JOIN --------------------- sr.id_url = u.id_url AND sr.id_user = ul.id_user ------------------------------------------- ; that column expired was added since yesterday Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCLZkD7UpzwH2SGd4RAv8/AKCA5cNfu6vEKZ6m/ke1JsVRdsOTXQCbBMt4 ZPTFjwyb52CrFxdUTD6gejs= =STzz -----END PGP SIGNATURE-----
Gaetano Mendola wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Richard Huxton wrote: > >>Gaetano Mendola wrote: >> >> >>>running a 7.4.5 engine, I'm facing this bad plan: >>> >>>empdb=# explain analyze SELECT >>>name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp >>> >>>empdb-# FROM v_sc_user_request >>>empdb-# WHERE >>>empdb-# login = 'babinow1' >>>empdb-# LIMIT 10 ; >> >> >>> -> Subquery Scan vsp (cost=985.73..1016.53 >>>rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31) >>> -> Merge Join (cost=985.73..1011.01 >>>rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31) >>> Merge Cond: ("outer".id_program = >>>"inner".id_program) >> >> >>The problem to address is in this subquery. That's a total of 31 x >>(1668.754 - 25.328) = 50seconds (about). >> >>Since your query is so simple, I'm guessing v_sc_user_request is a view. >>Can you provide the definition? > > > Of course: > > > > CREATE OR REPLACE VIEW v_sc_user_request AS > SELECT > * > FROM > v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package ) > WHERE > vsr.request_time > now() - '1 month'::interval AND > vsr.expired = FALSE > ORDER BY id_sat_request DESC > ; > > > CREATE OR REPLACE VIEW v_sc_packages AS > SELECT > * > FROM > v_programs vpr, > v_packages vpk, > v_sequences vs > > WHERE > ------------ JOIN ------------- > vpr.id_program = vs.id_program AND > vpk.id_package = vs.id_package AND > ------------------------------- > vs.estimated_start IS NOT NULL > ; > > CREATE OR REPLACE VIEW v_sat_request AS > SELECT > * > FROM > sat_request sr, > url u, > user_login ul > WHERE > ---------------- JOIN --------------------- > sr.id_url = u.id_url AND > sr.id_user = ul.id_user > ------------------------------------------- > ; OK, so looking at the original EXPLAIN the order of processing seems to be: 1. v_sat_request is evaluated and filtered on login='...' (lines 7..15) This gives us 31 rows 2. The left-join from v_sat_request to v_sc_packages is processed (lines 5..6) This involves the subquery scan on vsp (from line 16) where it seems to think the best idea is a merge join of programs to sequences. So - I think we need to look at the performance of your view "v_sc_packages" and the views that it depends on. OK - can you reply to this with just the definitions of v_sc_packages and what it depends on, and we can have a look at that. Do you need all these tables involved in this query? I don't think PG is smart enough to completely discard a join if it's not needed by the output. Thinking about it, I'm not sure you could safely. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > OK, so looking at the original EXPLAIN the order of processing seems to be: > 1. v_sat_request is evaluated and filtered on login='...' (lines 7..15) > This gives us 31 rows > 2. The left-join from v_sat_request to v_sc_packages is processed (lines > 5..6) > This involves the subquery scan on vsp (from line 16) where it seems to > think the best idea is a merge join of programs to sequences. Whel basically v_sc_packages depends on other 3 views that are just a simple interface to a plain table. If I execute a select only on this table I get reasonable executions time: === cpu_tuple_cost = 0.07 # explain analyze select * from v_sc_packages where id_package = 19628; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 rows=1 loops=1) -> Nested Loop (cost=0.00..11.86 rows=1 width=116) (actual time=1.022..1.055 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..7.89 rows=1 width=104) (actual time=0.330..0.345 rows=1 loops=1) -> Index Scan using packages_pkey on packages p (cost=0.00..3.90 rows=1 width=104) (actual time=0.070..0.075rows=1 loops=1) Index Cond: (id_package = 19628) -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.91 rows=1 width=4)(actual time=0.232..0.237 rows=1 loops=1) Index Cond: ("outer".id_package = ps.id_package) -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.90 rows=1 width=16) (actual time=0.670..0.685rows=1 loops=1) Index Cond: (19628 = id_package) Filter: (estimated_start IS NOT NULL) -> Index Scan using programs_pkey on programs (cost=0.00..4.02 rows=1 width=19) (actual time=0.078..0.086 rows=1 loops=1) Index Cond: (programs.id_program = "outer".id_program) Filter: (id_program <> 0) Total runtime: 42.650 ms (14 rows) === cpu_tuple_cost = 0.01 # explain analyze select * from v_sc_packages where id_package = 19628; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..15.54 rows=1 width=131) (actual time=25.062..69.977 rows=1 loops=1) -> Nested Loop (cost=0.00..11.56 rows=1 width=116) (actual time=5.396..50.299 rows=1 loops=1) -> Nested Loop Left Join (cost=0.00..7.71 rows=1 width=104) (actual time=5.223..32.842 rows=1 loops=1) -> Index Scan using packages_pkey on packages p (cost=0.00..3.84 rows=1 width=104) (actual time=0.815..7.235rows=1 loops=1) Index Cond: (id_package = 19628) -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.85 rows=1 width=4)(actual time=4.366..25.555 rows=1 loops=1) Index Cond: ("outer".id_package = ps.id_package) -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.84 rows=1 width=16) (actual time=0.147..17.422rows=1 loops=1) Index Cond: (19628 = id_package) Filter: (estimated_start IS NOT NULL) -> Index Scan using programs_pkey on programs (cost=0.00..3.96 rows=1 width=19) (actual time=0.043..0.049 rows=1 loops=1) Index Cond: (programs.id_program = "outer".id_program) Filter: (id_program <> 0) Total runtime: 70.254 ms (14 rows) and I get the best with this: === cpu_tuple_cost = 0.001 # explain analyze select * from v_sc_packages where id_package = 19628; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..15.48 rows=1 width=131) (actual time=2.516..2.553 rows=1 loops=1) -> Nested Loop (cost=0.00..7.78 rows=1 width=31) (actual time=1.439..1.457 rows=1 loops=1) -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.83 rows=1 width=16) (actual time=0.442..0.450rows=1 loops=1) Index Cond: (19628 = id_package) Filter: (estimated_start IS NOT NULL) -> Index Scan using programs_pkey on programs (cost=0.00..3.95 rows=1 width=19) (actual time=0.972..0.978 rows=1loops=1) Index Cond: (programs.id_program = "outer".id_program) Filter: (id_program <> 0) -> Nested Loop Left Join (cost=0.00..7.68 rows=1 width=104) (actual time=0.110..0.125 rows=1 loops=1) -> Index Scan using packages_pkey on packages p (cost=0.00..3.84 rows=1 width=104) (actual time=0.040..0.046 rows=1loops=1) Index Cond: (id_package = 19628) -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.84 rows=1 width=4) (actualtime=0.036..0.042 rows=1 loops=1) Index Cond: ("outer".id_package = ps.id_package) Total runtime: 2.878 ms (14 rows) but with this last setting for the original query is choosed a very bad plan. Regards Gaetano Mendola
Gaetano Mendola wrote: > Richard Huxton wrote: > > >>OK, so looking at the original EXPLAIN the order of processing seems to be: >>1. v_sat_request is evaluated and filtered on login='...' (lines 7..15) >>This gives us 31 rows >>2. The left-join from v_sat_request to v_sc_packages is processed (lines >>5..6) >>This involves the subquery scan on vsp (from line 16) where it seems to >>think the best idea is a merge join of programs to sequences. > > > Whel basically v_sc_packages depends on other 3 views that are just a simple > interface to a plain table. > > > If I execute a select only on this table I get reasonable executions time: > > > === cpu_tuple_cost = 0.07 > > # explain analyze select * from v_sc_packages where id_package = 19628; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..15.96 rows=1 width=131) (actual time=41.450..41.494 rows=1 loops=1) > -> Nested Loop (cost=0.00..11.86 rows=1 width=116) (actual time=1.022..1.055 rows=1 loops=1) > -> Nested Loop Left Join (cost=0.00..7.89 rows=1 width=104) (actual time=0.330..0.345 rows=1 loops=1) > -> Index Scan using packages_pkey on packages p (cost=0.00..3.90 rows=1 width=104) (actual time=0.070..0.075rows=1 loops=1) > Index Cond: (id_package = 19628) > -> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..3.91 rows=1 width=4)(actual time=0.232..0.237 rows=1 loops=1) > Index Cond: ("outer".id_package = ps.id_package) > -> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.90 rows=1 width=16) (actual time=0.670..0.685rows=1 loops=1) > Index Cond: (19628 = id_package) > Filter: (estimated_start IS NOT NULL) > -> Index Scan using programs_pkey on programs (cost=0.00..4.02 rows=1 width=19) (actual time=0.078..0.086 rows=1 loops=1) > Index Cond: (programs.id_program = "outer".id_program) > Filter: (id_program <> 0) > Total runtime: 42.650 ms > (14 rows) > === cpu_tuple_cost = 0.01 > === cpu_tuple_cost = 0.001 I don't know what you think you're measuring, but it's nothing to do with the plans. If you look at the plans carefully, you'll see they're all the same. The "cost" numbers change because that's the parameter you're changing. I'm not sure it makes sense to vary cpu_tuple_cost from 0.07 down to 0.001 - that's a factor of 70 difference. I might be tempted to halve or double it, but even then only after some serious testing. -- Richard Huxton Archonet Ltd
Gaetano Mendola <mendola@bigfoot.com> writes: >> Since your query is so simple, I'm guessing v_sc_user_request is a view. >> Can you provide the definition? > Of course: I don't think you've told us the whole truth about the v_sc_packages view. The definition as given doesn't work at all (it'll have duplicate column names), but more to the point, if it were that simple then the planner would fold it into the parent query. The subquery scan node indicates that folding did not occur. The most likely reason for that is that there's an ORDER BY in the view. Putting ORDER BYs in views that you intend to use as components of other views is a bad practice from a performance perspective... regards, tom lane
Tom Lane wrote: > > Putting ORDER BYs in views that you intend to use as components of other > views is a bad practice from a performance perspective... There are also a lot of views involved here for very few output columns. Tom - is the planner smart enough to optimise-out unneeded columns from a SELECT * view if it's part of a join/subquery and you only use one or two columns? Secondly, in the original plan we have: -> Nested Loop Left Join (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064 rows=31 loops=1) Now, we've got 31 rows instead of 1 here. The one side of the join ends up as: -> Subquery Scan vsp (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31) -> Merge Join (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31) Would I be right in thinking the planner doesn't materialise the subquery because it's expecting 1 loop not 31? If there were 1 row the plan would seem OK to me. Is there any mileage in the idea of a "lazy" planner that keeps some alternative paths around in case they're needed? Or a reactive one that can re-plan nodes when assumptions turn out to be wrong? -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > There are also a lot of views involved here for very few output columns. > Tom - is the planner smart enough to optimise-out unneeded columns from > a SELECT * view if it's part of a join/subquery and you only use one or > two columns? If the view gets flattened, yes, but I believe that it's not bright enough to do so when it can't flatten the view. You could tell easily enough by looking at the row-width estimates at various levels of the plan. (Let's see ... in Gaetano's plan the SubqueryScan is returning 12-byte rows where its input MergeJoin is returning 130-byte rows, so sure enough the view is computing a lot of stuff that then gets thrown away.) > Would I be right in thinking the planner doesn't materialise the > subquery because it's expecting 1 loop not 31? If there were 1 row the > plan would seem OK to me. Right; it doesn't see any predicted gain from the extra cost of materializing. But to me the main problem here is not that, it is that the entire shape of the plan would likely be different if it weren't for the "optimization fence" that the Subquery Scan node represents. I suspect too that the use of mergejoin as opposed to anything else within the vsp subplan is driven more by the need to produce sorted output than by what is the cheapest way to get the rows. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: > Gaetano Mendola <mendola@bigfoot.com> writes: > >>>Since your query is so simple, I'm guessing v_sc_user_request is a view. >>>Can you provide the definition? > > >>Of course: > > > I don't think you've told us the whole truth about the v_sc_packages > view. The definition as given doesn't work at all (it'll have > duplicate column names), but more to the point, if it were that simple > then the planner would fold it into the parent query. The subquery > scan node indicates that folding did not occur. The most likely reason > for that is that there's an ORDER BY in the view. I didn't say the complete truth because the view definition is long so I just omitted all fields. explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp FROM v_sc_user_request WHERE login = 'babinow1' LIMIT 10 ; these are the complete definitions of views involved in the query: CREATE OR REPLACE VIEW v_sc_user_request AS SELECT vsr.id_sat_request AS id_sat_request, vsr.id_user AS id_user, vsr.login AS login, vsr.url AS url, vsr.name AS name, vsr.descr AS descr, vsr.size AS size, trunc(vsr.size/1024.0/1024.0,2) AS size_mb, vsr.id_sat_request_status AS id_sat_request_status, sp_lookup_key('sat_request_status', vsr.id_sat_request_status) AS request_status, sp_lookup_descr('sat_request_status', vsr.id_sat_request_status) AS request_status_descr, vsr.id_url_status AS id_url_status, sp_lookup_key('url_status', vsr.id_url_status) AS url_status, sp_lookup_descr('url_status', vsr.id_url_status) AS url_status_descr, vsr.url_time_stamp AS url_time_stamp, date_trunc('seconds',vsr.request_time) AS request_time_stamp, vsr.id_package AS id_package, COALESCE(date_trunc('seconds',vsp.estimated_start)::text,'NA') AS estimated_start FROM v_sat_request vsr LEFT OUTER JOIN v_sc_packages vsp USING ( id_package ) WHERE vsr.request_time > now() - '1 month'::interval AND vsr.expired = FALSE ORDER BY id_sat_request DESC ; CREATE OR REPLACE VIEW v_sat_request AS SELECT sr.id_user AS id_user, ul.login AS login, sr.id_sat_request AS id_sat_request, u.id_url AS id_url, u.url AS url, u.name AS name, u.descr AS descr, u.size AS size, u.storage AS storage, sr.id_package AS id_package, sr.id_sat_request_status AS id_sat_request_status, sr.request_time AS request_time, sr.work_time AS request_work_time, u.id_url_status AS id_url_status, u.time_stamp AS url_time_stamp, sr.expired AS expired FROM sat_request sr, url u, user_login ul WHERE ---------------- JOIN --------------------- sr.id_url = u.id_url AND sr.id_user = ul.id_user ------------------------------------------- ; CREATE OR REPLACE VIEW v_sc_packages AS SELECT vpr.id_program AS id_program, vpr.name AS program_name, vpk.id_package AS id_package, date_trunc('seconds', vs.estimated_start) AS estimated_start, vpk.name AS package_name, vpk.TYPE AS TYPE, vpk.description AS description, vpk.target AS target, vpk.fec AS fec_alg, vpk.output_group - vpk.input_group AS fec_redundancy, vpk.priority AS priority, vpk.updatable AS updatable, vpk.auto_listen AS auto_listen, vpk.start_file AS start_file, vpk.view_target_group AS view_target_group, vpk.target_group AS target_group FROM v_programs vpr, v_packages vpk, v_sequences vs WHERE ------------ JOIN ------------- vpr.id_program = vs.id_program AND vpk.id_package = vs.id_package AND ------------------------------- vs.estimated_start IS NOT NULL ; CREATE OR REPLACE VIEW v_programs AS SELECT id_program AS id_program, id_publisher AS id_publisher, name AS name, description AS description, sp_lookup_key('program_type', id_program_type) AS TYPE, sp_lookup_key('program_status', id_program_status) AS status, last_position AS last_position FROM programs WHERE id_program<>0 ; CREATE OR REPLACE VIEW v_packages AS SELECT p.id_package AS id_package, p.id_publisher AS id_publisher, p.name AS name, p.information AS information, p.description AS description, sp_lookup_key('package_type', p.id_package_type) AS TYPE, sp_lookup_key('target', p.id_target) AS target, p.port AS port, p.priority AS priority, sp_lookup_key('fec', p.id_fec) AS fec, p.input_group AS input_group, p.output_group AS output_group, p.updatable AS updatable, p.checksum AS checksum, p.version AS version, p.start_file AS start_file, p.view_target_group AS view_target_group, p.target_group AS target_group, p.auto_listen AS auto_listen, p.public_flag AS public_flag, p.needed_version AS needed_version, p.logic_version AS logic_version, p.package_size AS package_size, ps.id_drm_process AS id_drm_process, ps.id_cas_service AS id_cas_service, ps.id_cas_settings AS id_cas_settings, ps.id_drm_service AS id_drm_service FROM packages p LEFT OUTER JOIN package_security ps USING (id_package) ; CREATE OR REPLACE VIEW v_sequences AS SELECT id_package AS id_package, id_program AS id_program, internal_position AS internal_position, estimated_start AS estimated_start FROM sequences ; > Putting ORDER BYs in views that you intend to use as components of other > views is a bad practice from a performance perspective... Indeed when a view is involved in a join we do not put "order by" in it ( at least this is what I try to do ), I have to say also that some time I see that replacing the view with the tables that it represent the execution time is better ( I have an example to show you if you are interested in it ). Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCLkef7UpzwH2SGd4RAt90AJ9e3qUSx2fxiOO2aA30TbLsOdyV7ACfd0RY +2A3U6dDfWw/H4eWcmI8mS0= =t1AD -----END PGP SIGNATURE-----