Thread: Why performance improvement on converting subselect to a function ?
Hi, For each company_id in certain table i have to search the same table get certain rows sort them and pick up the top one , i tried using this subselect: explain analyze SELECT company_id , (SELECT edition FROM ONLY public.branding_master b WHERE old_company_id = a.company_id OR company_id = a.company_id ORDER BY b.company_id DESC LIMIT 1) from public.branding_master a limit 50; QUERY PLAN Limit (cost=0.00..3.52 rows=50 width=4) (actual time=463.97..19429.54 rows=50 loops=1) -> Seq Scan on branding_master a (cost=0.00..6530.79 rows=92679 width=4) (actual time=463.97..19429.28 rows=51 loops=1) SubPlan -> Limit (cost=0.00..168.36 rows=1 width=6) (actual time=66.96..380.94 rows=1 loops=51) -> Index Scan Backward using branding_master_pkey on branding_master b (cost=0.00..23990.26 rows=142 width=6) (actual time=66.95..380.93 rows=1 loops=51) Filter: ((old_company_id = $0) OR (company_id = $0)) Total runtime: 19429.76 msec (7 rows) Very Slow 20 secs. CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1 OR company_id = $1 ORDER BY b.company_id DESC LIMIT 1 ' language 'sql'; tradein_clients=# explain analyze SELECT company_id , most_recent_edition(company_id) from public.branding_master limit 50; QUERY PLAN Limit (cost=0.00..3.52 rows=50 width=4) (actual time=208.23..3969.39 rows=50 loops=1) -> Seq Scan on branding_master (cost=0.00..6530.79 rows=92679 width=4) (actual time=208.22..3969.15 rows=51 loops=1) Total runtime: 3969.52 msec (3 rows) Time: 4568.33 ms 4 times faster. But i feel it can be lot more faster , can anyone suggest me something to try. Indexes exists on company_id(pkey) and old_company_id Most of the chores are already done [ vacuum full analyze , reindex ] Regds mallah.
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > explain analyze SELECT company_id , (SELECT edition FROM ONLY > public.branding_master b WHERE old_company_id = a.company_id OR company_id = > a.company_id ORDER BY b.company_id DESC LIMIT 1) from public.branding_master > a limit 50; > Total runtime: 19429.76 msec > CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT > edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1 > OR company_id = $1 ORDER BY b.company_id DESC LIMIT 1 ' language 'sql'; > tradein_clients=# explain analyze SELECT company_id , > most_recent_edition(company_id) from public.branding_master limit 50; > Total runtime: 3969.52 msec Odd. Apparently the planner is picking a better plan in the function context than in the subselect context --- which is strange since it ought to have less information. AFAIK the only way to see the plan generated for a SQL function's query is like this: regression=# create function foo(int) returns int as regression-# 'select unique1 from tenk1 where unique1 = $1' language sql; CREATE FUNCTION regression=# set debug_print_plan TO 1; SET regression=# set client_min_messages TO debug; SET regression=# select foo(55); DEBUG: plan: DETAIL: {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0 :targetlist ({TARGETENTRY :resdom {RESDOM :resno 1 :restype 23 :restypmod -1 :resname foo :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false} :expr {FUNCEXPR :funcid 706101 :funcresulttype 23 :funcretset false ... (etc etc) Would you do that and send it along? I'm curious ... > But i feel it can be lot more faster , can anyone suggest me something > to try. Create an index on old_company_id, perhaps. regards, tom lane
Tom Lane wrote:
Sorry for the delayed response.
branding_master_old_comapany_id btree (old_company_id),
regds , mallah.
Rajesh Kumar Mallah <mallah@trade-india.com> writes:explain analyze SELECT company_id , (SELECT edition FROM ONLY public.branding_master b WHERE old_company_id = a.company_id OR company_id = a.company_id ORDER BY b.company_id DESC LIMIT 1) from public.branding_master a limit 50; Total runtime: 19429.76 msecCREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1 OR company_id = $1 ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';tradein_clients=# explain analyze SELECT company_id , most_recent_edition(company_id) from public.branding_master limit 50; Total runtime: 3969.52 msecOdd. Apparently the planner is picking a better plan in the function context than in the subselect context --- which is strange since it ought to have less information. AFAIK the only way to see the plan generated for a SQL function's query is like this: regression=# create function foo(int) returns int as regression-# 'select unique1 from tenk1 where unique1 = $1' language sql; CREATE FUNCTION regression=# set debug_print_plan TO 1; SET regression=# set client_min_messages TO debug; SET regression=# select foo(55); DEBUG: plan: DETAIL: {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0 :targetlist ({TARGETENTRY :resdom {RESDOM :resno 1 :restype 23 :restypmod -1 :resname foo :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false} :expr {FUNCEXPR :funcid 706101 :funcresulttype 23 :funcretset false... (etc etc) Would you do that and send it along? I'm curious ...
Sorry for the delayed response.
tradein_clients=# explain analyze SELECT company_id , data_bank.most_recent_edition(company_id) from public.branding_master limit 50;
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.57 rows=50 width=4) (actual time=149.52..2179.49 rows=50 loops=1)
-> Seq Scan on branding_master (cost=0.00..6626.52 rows=92752 width=4) (actual time=149.51..2179.30 rows=51 loops=1)
tradein_clients=#
tradein_clients=#
tradein_clients=# explain analyze SELECT company_id , data_bank.most_recent_edition(company_id) from public.branding_master limit 50;
DEBUG: StartTransactionCommand
LOG: plan:
{ LIMIT :startup_cost 0.00 :total_cost 185.65 :rows 1 :width 6 :qptargetlist
({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname
edition :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR
:typeOid 23 :opType func :oper { FUNC :funcid 313 :funcresulttype 23
:funcretset false :funcformat 1 } :args ({ VAR :varno 1 :varattno 31 :vartype
21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 31})}} { TARGETENTRY
:resdom { RESDOM :resno 2 :restype 23 :restypmod -1 :resname company_id
:reskey 0 :reskeyop 0 :ressortgroupref 1 :resjunk true } :expr { VAR :varno 1
:varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno
1}}) :qpqual <> :lefttree { INDEXSCAN :startup_cost 0.00 :total_cost 24302.69
:rows 131 :width 6 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1
:restype 23 :restypmod -1 :resname edition :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 23 :opType func
:oper { FUNC :funcid 313 :funcresulttype 23 :funcretset false :funcformat 1 }
:args ({ VAR :varno 1 :varattno 31 :vartype 21 :vartypmod -1 :varlevelsup 0
:varnoold 1 :varoattno 31})}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype
23 :restypmod -1 :resname company_id :reskey 0 :reskeyop 0 :ressortgroupref 1
:resjunk true } :expr { VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16
:opType or :oper <> :args ({ EXPR :typeOid 16 :opType op :oper { OPER :opno
96 :opid 65 :opresulttype 16 :opretset false } :args ({ VAR :varno 1 :varattno
19 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 19} {
PARAM :paramkind 12 :paramid 1 :paramname \<unnamed> :paramtype 23 })} { EXPR
:typeOid 16 :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16
:opretset false } :args ({ VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1} { PARAM :paramkind 12 :paramid 1
:paramname \<unnamed> :paramtype 23 })})}) :lefttree <> :righttree <> :extprm
() :locprm () :initplan <> :nprm 0 :scanrelid 1 :indxid ( 310742439)
:indxqual (<>) :indxqualorig (<>) :indxorderdir -1 } :righttree <> :extprm ()
:locprm () :initplan <> :nprm 0 :limitOffset <> :limitCount { CONST
:consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [
1 0 0 0 ] }}
DEBUG: CommitTransactionCommand
But i feel it can be lot more faster , can anyone suggest me something to try.
Its there already..Create an index on old_company_id, perhaps.
branding_master_old_comapany_id btree (old_company_id),
regds , mallah.
regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > Tom Lane wrote: >> Odd. Apparently the planner is picking a better plan in the function >> context than in the subselect context --- which is strange since it >> ought to have less information. > [ verbose plan snipped ] Well, that sure seems to be the same plan. Curious that the runtime wasn't about the same. Perhaps the slow execution of the first query was a caching effect? If you alternate trying the query both ways, does the speed difference persist? regards, tom lane
Re: Why performance improvement on converting subselect to a function ?
From
Rajesh Kumar Mallah
Date:
Dear Tom, the problem was repeatble in the sense repeated execution of queries made no difference on performance. What lead to degradation was the bumping off of effective_cache_size parameter from 1000 to 64K Can any one point me the recent guide done by Sridhar and Josh i want to see what i mis(read|understood) from there ;-) [ it was on GeneralBits' Home Page ] Anyway the performance gain was from 32 secs to less than a sec what i restored cache size from 64K to 1000. I will post again with more details but at the moment i got to load my data_bank :) Regds Mallah. On Wednesday 30 Jul 2003 3:02 am, Tom Lane wrote: > Rajesh Kumar Mallah <mallah@trade-india.com> writes: > > Tom Lane wrote: > >> Odd. Apparently the planner is picking a better plan in the function > >> context than in the subselect context --- which is strange since it > >> ought to have less information. > > > > [ verbose plan snipped ] > > Well, that sure seems to be the same plan. Curious that the runtime > wasn't about the same. Perhaps the slow execution of the first query > was a caching effect? If you alternate trying the query both ways, > does the speed difference persist? > > regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > What lead to degradation was the bumping off of > effective_cache_size parameter from 1000 to 64K Check the plan then; AFAIR the only possible effect of changing effective_cache_size is to influence which plan the planner picks. regards, tom lane
Re: Why performance improvement on converting subselect to a function ?
From
"Shridhar Daithankar"
Date:
On 30 Jul 2003 at 12:54, Rajesh Kumar Mallah wrote: > Can any one point me the recent guide done by > Sridhar and Josh i want to see what i mis(read|understood) > from there ;-) [ it was on GeneralBits' Home Page ] http://www.varlena.com/GeneralBits/Tidbits/perf.html HTH Bye Shridhar -- program, n.: A magic spell cast over a computer allowing it to turn one's input into error messages. tr.v. To engage in a pastime similar to banging one's head against a wall, but with fewer opportunities for reward.
Tom Lane wrote: >Rajesh Kumar Mallah <mallah@trade-india.com> writes: > > >>What lead to degradation was the bumping off of >>effective_cache_size parameter from 1000 to 64K >> >> > >Check the plan then; AFAIR the only possible effect of changing >effective_cache_size is to influence which plan the planner picks. > Dear Tom, Below are the plans for two cases. I dont know how to read them accurately can u please explain them. Also can anyone point to some documentation oriented towards understanding explain analyze output? Regds Mallah. tradein_clients=# SET effective_cache_size = 1000; SET tradein_clients=# explain analyze SELECT pri_key,most_recent_edition(pri_key) from profiles where source='BRANDING' limit 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..25.67 rows=100 width=4) (actual time=141.11..154.71 rows=100 loops=1) -> Seq Scan on profiles (cost=0.00..15754.83 rows=61385 width=4) (actual time=141.11..154.51 rows=101 loops=1) Filter: (source = 'BRANDING'::character varying) Total runtime: 154.84 msec (4 rows) tradein_clients=# SET effective_cache_size = 64000; SET tradein_clients=# explain analyze SELECT pri_key,most_recent_edition(pri_key) from profiles where source='BRANDING' limit 100; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..25.67 rows=100 width=4) (actual time=587.61..22884.75 rows=100 loops=1) -> Seq Scan on profiles (cost=0.00..15754.83 rows=61385 width=4) (actual time=587.60..22884.25 rows=101 loops=1) Filter: (source = 'BRANDING'::character varying) Total runtime: 22884.97 msec (4 rows) tradein_clients=# > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > >
Rajesh Kumar Mallah <mallah@trade-india.com> writes: > Below are the plans for two cases. I dont know how to read them accurately > can u please explain them. Well, they're the same plan, as far as they go. I suppose that the runtime difference must come from choosing a different plan inside the most_recent_edition() function, which we cannot see in the explain output. As before, turning on logging of verbose query plans is the only way to look at what the function is doing. > Also can anyone point to some documentation > oriented towards understanding explain analyze output? http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=performance-tips.html regards, tom lane