Thread: Why performance improvement on converting subselect to a function ?

Why performance improvement on converting subselect to a function ?

From
Rajesh Kumar Mallah
Date:
Hi,

For each company_id in certain table i have to search the same table
get certain rows sort them and pick up the top one , i tried using this
subselect:

explain analyze SELECT company_id , (SELECT edition FROM ONLY
public.branding_master b WHERE old_company_id = a.company_id OR company_id =
a.company_id  ORDER BY b.company_id DESC LIMIT 1) from public.branding_master
a   limit 50;



QUERY PLAN

Limit  (cost=0.00..3.52 rows=50 width=4) (actual time=463.97..19429.54 rows=50
loops=1)
  ->  Seq Scan on branding_master a  (cost=0.00..6530.79 rows=92679 width=4)
(actual time=463.97..19429.28 rows=51 loops=1)
        SubPlan
          ->  Limit  (cost=0.00..168.36 rows=1 width=6) (actual
time=66.96..380.94 rows=1 loops=51)
                ->  Index Scan Backward using branding_master_pkey on
branding_master b  (cost=0.00..23990.26 rows=142 width=6) (actual
time=66.95..380.93 rows=1 loops=51)
                      Filter: ((old_company_id = $0) OR (company_id = $0))
Total runtime: 19429.76 msec
(7 rows)

Very Slow 20 secs.


CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT
edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1
OR company_id = $1  ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';

tradein_clients=# explain analyze SELECT company_id ,
most_recent_edition(company_id) from public.branding_master limit 50;

QUERY PLAN

Limit  (cost=0.00..3.52 rows=50 width=4) (actual time=208.23..3969.39 rows=50
loops=1)
  ->  Seq Scan on branding_master  (cost=0.00..6530.79 rows=92679 width=4)
(actual time=208.22..3969.15 rows=51 loops=1)
Total runtime: 3969.52 msec
(3 rows)

Time: 4568.33 ms

 4 times faster.


But i feel it can be lot more faster , can anyone suggest me something
to try.

Indexes exists on company_id(pkey) and old_company_id Most of the chores
are already done [ vacuum full analyze , reindex ]


Regds
mallah.







Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> explain analyze SELECT company_id , (SELECT edition FROM ONLY
> public.branding_master b WHERE old_company_id = a.company_id OR company_id =
> a.company_id  ORDER BY b.company_id DESC LIMIT 1) from public.branding_master
> a   limit 50;
> Total runtime: 19429.76 msec

> CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT
> edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1
> OR company_id = $1  ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';

> tradein_clients=# explain analyze SELECT company_id ,
> most_recent_edition(company_id) from public.branding_master limit 50;
> Total runtime: 3969.52 msec

Odd.  Apparently the planner is picking a better plan in the function
context than in the subselect context --- which is strange since it
ought to have less information.

AFAIK the only way to see the plan generated for a SQL function's query
is like this:

regression=# create function foo(int) returns int as
regression-# 'select unique1 from tenk1 where unique1 = $1' language sql;
CREATE FUNCTION
regression=# set debug_print_plan TO 1;
SET
regression=# set client_min_messages TO debug;
SET
regression=# select foo(55);
DEBUG:  plan:
DETAIL:  {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0
:targetlist ({TARGETENTRY :resdom {RESDOM :resno 1 :restype 23 :restypmod -1
:resname foo :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false}
:expr {FUNCEXPR :funcid 706101 :funcresulttype 23 :funcretset false
 ... (etc etc)

Would you do that and send it along?  I'm curious ...

> But i feel it can be lot more faster , can anyone suggest me something
> to try.

Create an index on old_company_id, perhaps.

            regards, tom lane

Re: Why performance improvement on converting subselect

From
Rajesh Kumar Mallah
Date:
Tom Lane wrote:
Rajesh Kumar Mallah <mallah@trade-india.com> writes: 
explain analyze SELECT company_id , (SELECT edition FROM ONLY 
public.branding_master b WHERE old_company_id = a.company_id OR company_id = 
a.company_id  ORDER BY b.company_id DESC LIMIT 1) from public.branding_master
a   limit 50;
Total runtime: 19429.76 msec   
 
CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT 
edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1
OR company_id = $1  ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';   
 
tradein_clients=# explain analyze SELECT company_id , 
most_recent_edition(company_id) from public.branding_master limit 50;
Total runtime: 3969.52 msec   
Odd.  Apparently the planner is picking a better plan in the function
context than in the subselect context --- which is strange since it
ought to have less information.

AFAIK the only way to see the plan generated for a SQL function's query
is like this:

regression=# create function foo(int) returns int as
regression-# 'select unique1 from tenk1 where unique1 = $1' language sql;
CREATE FUNCTION
regression=# set debug_print_plan TO 1;
SET
regression=# set client_min_messages TO debug;
SET
regression=# select foo(55);
DEBUG:  plan:
DETAIL:  {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0
:targetlist ({TARGETENTRY :resdom {RESDOM :resno 1 :restype 23 :restypmod -1
:resname foo :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false}
:expr {FUNCEXPR :funcid 706101 :funcresulttype 23 :funcretset false... (etc etc)

Would you do that and send it along?  I'm curious ...

Sorry for the delayed response.
tradein_clients=# explain analyze SELECT company_id ,  data_bank.most_recent_edition(company_id) from public.branding_master limit 50;

--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.57 rows=50 width=4) (actual time=149.52..2179.49 rows=50 loops=1)
   ->  Seq Scan on branding_master  (cost=0.00..6626.52 rows=92752 width=4) (actual time=149.51..2179.30 rows=51 loops=1)
tradein_clients=#
tradein_clients=#
tradein_clients=# explain analyze SELECT company_id ,  data_bank.most_recent_edition(company_id) from public.branding_master limit 50;
DEBUG:  StartTransactionCommand
LOG:  plan:
{ LIMIT :startup_cost 0.00 :total_cost 185.65 :rows 1 :width 6 :qptargetlist
({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname
edition :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR
:typeOid 23  :opType func :oper { FUNC :funcid 313 :funcresulttype 23
:funcretset false :funcformat 1 } :args ({ VAR :varno 1 :varattno 31 :vartype
21 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 31})}} { TARGETENTRY
:resdom { RESDOM :resno 2 :restype 23 :restypmod -1 :resname company_id
:reskey 0 :reskeyop 0 :ressortgroupref 1 :resjunk true } :expr { VAR :varno 1
:varattno 1 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno
1}}) :qpqual <> :lefttree { INDEXSCAN :startup_cost 0.00 :total_cost 24302.69
:rows 131 :width 6 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1
:restype 23 :restypmod -1 :resname edition :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 23  :opType func
:oper { FUNC :funcid 313 :funcresulttype 23 :funcretset false :funcformat 1 }
:args ({ VAR :varno 1 :varattno 31 :vartype 21 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 31})}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype
23 :restypmod -1 :resname company_id :reskey 0 :reskeyop 0 :ressortgroupref 1
:resjunk true } :expr { VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16
:opType or :oper <> :args ({ EXPR :typeOid 16  :opType op :oper { OPER :opno
96 :opid 65 :opresulttype 16 :opretset false } :args ({ VAR :varno 1 :varattno
19 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 19} {
PARAM :paramkind 12 :paramid 1 :paramname \<unnamed> :paramtype 23 })} { EXPR
:typeOid 16  :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16
:opretset false } :args ({ VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1} { PARAM :paramkind 12 :paramid 1
:paramname \<unnamed> :paramtype 23 })})}) :lefttree <> :righttree <> :extprm
() :locprm () :initplan <> :nprm 0  :scanrelid 1 :indxid ( 310742439)
:indxqual (<>) :indxqualorig (<>) :indxorderdir -1 } :righttree <> :extprm ()
:locprm () :initplan <> :nprm 0  :limitOffset <> :limitCount { CONST
:consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue  4 [
1 0 0 0 ] }}

DEBUG:  CommitTransactionCommand


 
But i feel it can be lot more faster , can anyone suggest me something
to try.   
Create an index on old_company_id, perhaps.
Its there already..
  branding_master_old_comapany_id btree (old_company_id),

regds , mallah.
		regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command   (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) 

Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> Tom Lane wrote:
>> Odd.  Apparently the planner is picking a better plan in the function
>> context than in the subselect context --- which is strange since it
>> ought to have less information.

> [ verbose plan snipped ]

Well, that sure seems to be the same plan.  Curious that the runtime
wasn't about the same.  Perhaps the slow execution of the first query
was a caching effect?  If you alternate trying the query both ways,
does the speed difference persist?

            regards, tom lane

Re: Why performance improvement on converting subselect to a function ?

From
Rajesh Kumar Mallah
Date:
Dear Tom,

the problem was repeatble in the sense repeated
execution of queries made no difference on
performance.


What lead to degradation was the bumping off of
effective_cache_size parameter from 1000 to 64K

Can any one point me  the recent guide done by
Sridhar and Josh i want to see what i mis(read|understood)
 from  there ;-) [ it was on GeneralBits' Home Page ]

Anyway the performance gain was from 32 secs to less
than a sec what i restored cache size from 64K to 1000.

I will post again with more details but at the moment
i got to load my data_bank :)



Regds
Mallah.





On Wednesday 30 Jul 2003 3:02 am, Tom Lane wrote:
> Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> > Tom Lane wrote:
> >> Odd.  Apparently the planner is picking a better plan in the function
> >> context than in the subselect context --- which is strange since it
> >> ought to have less information.
> >
> > [ verbose plan snipped ]
>
> Well, that sure seems to be the same plan.  Curious that the runtime
> wasn't about the same.  Perhaps the slow execution of the first query
> was a caching effect?  If you alternate trying the query both ways,
> does the speed difference persist?
>
>             regards, tom lane


Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> What lead to degradation was the bumping off of
> effective_cache_size parameter from 1000 to 64K

Check the plan then; AFAIR the only possible effect of changing
effective_cache_size is to influence which plan the planner picks.

            regards, tom lane

Re: Why performance improvement on converting subselect to a function ?

From
"Shridhar Daithankar"
Date:
On 30 Jul 2003 at 12:54, Rajesh Kumar Mallah wrote:

> Can any one point me  the recent guide done by
> Sridhar and Josh i want to see what i mis(read|understood)
>  from  there ;-) [ it was on GeneralBits' Home Page ]

http://www.varlena.com/GeneralBits/Tidbits/perf.html

HTH

Bye
 Shridhar

--
program, n.:    A magic spell cast over a computer allowing it to turn one's input
into error messages.  tr.v. To engage in a pastime similar to banging    one's
head against a wall, but with fewer opportunities for reward.


Re: Why performance improvement on converting subselect

From
Rajesh Kumar Mallah
Date:
Tom Lane wrote:

>Rajesh Kumar Mallah <mallah@trade-india.com> writes:
>
>
>>What lead to degradation was the bumping off of
>>effective_cache_size parameter from 1000 to 64K
>>
>>
>
>Check the plan then; AFAIR the only possible effect of changing
>effective_cache_size is to influence which plan the planner picks.
>
Dear Tom,

Below are the plans for two cases. I dont know how to read them accurately
can u please explain them. Also can anyone point to some documentation
oriented towards understanding explain analyze output?

Regds
Mallah.

tradein_clients=# SET effective_cache_size = 1000;
SET
tradein_clients=# explain analyze SELECT
pri_key,most_recent_edition(pri_key) from profiles where
source='BRANDING' limit 100;
                                                     QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..25.67 rows=100 width=4) (actual time=141.11..154.71
rows=100 loops=1)
   ->  Seq Scan on profiles  (cost=0.00..15754.83 rows=61385 width=4)
(actual time=141.11..154.51 rows=101 loops=1)
         Filter: (source = 'BRANDING'::character varying)
 Total runtime: 154.84 msec
(4 rows)

tradein_clients=# SET effective_cache_size = 64000;
SET
tradein_clients=# explain analyze SELECT
pri_key,most_recent_edition(pri_key) from profiles where
source='BRANDING' limit 100;
                                                      QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..25.67 rows=100 width=4) (actual
time=587.61..22884.75 rows=100 loops=1)
   ->  Seq Scan on profiles  (cost=0.00..15754.83 rows=61385 width=4)
(actual time=587.60..22884.25 rows=101 loops=1)
         Filter: (source = 'BRANDING'::character varying)
 Total runtime: 22884.97 msec
(4 rows)

tradein_clients=#



>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faqs/FAQ.html
>
>



Re: Why performance improvement on converting subselect

From
Tom Lane
Date:
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> Below are the plans for two cases. I dont know how to read them accurately
> can u please explain them.

Well, they're the same plan, as far as they go.  I suppose that the
runtime difference must come from choosing a different plan inside the
most_recent_edition() function, which we cannot see in the explain
output.  As before, turning on logging of verbose query plans is the
only way to look at what the function is doing.

> Also can anyone point to some documentation
> oriented towards understanding explain analyze output?

http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=performance-tips.html

            regards, tom lane