Thread: Another speedup idea (two, even)

Another speedup idea (two, even)

From
Tom Lane
Date:
printtup() does a SearchSysCache call for each attribute of each tuple
in order to find the appropriate output routine for the attribute's
type.  (Up till yesterday it did *two* such calls per attribute, but
I fixed that.)  This is fairly expensive, amounting to about 10% of
the runtime in the SELECT-a-large-table test case I'm looking at.
It's probably even more than that in cases that don't stress
heap_getattr as badly as this one does.

It occurs to me that there's no good reason to do this lookup more
than once per column --- all the tuples in a relation should have
the same set of column types, no?  So if we could do these lookups
once at the start of an output pass, and cache the results for use
in individual printtup calls, we could drive that 10% down to zero
at essentially no penalty.

There are a couple different ways this could be handled.  The way
that looks good to me at first glance is to extend the notion of
a "tuple destination" (as selected by DestToFunction in dest.c)
to include not just a per-tuple processing routine but also setup and
cleanup routines, and some storage accessible to all three routines.
The setup routine would be passed the TupleDesc info that is expected
to apply to all tuples subsequently sent to that destination, and it can
do nothing or do setup work for use by the per-tuple routine.  What
we'd actually have it do for the printtup destination type is create
and fill in an array of per-column output function info.  The cleanup
routine is for symmetry --- for this immediate issue all it would need
to do is free the data created by the setup routine, but I can imagine
new kinds of destinations that need more setup/cleanup someday.

That gives us a place to precalculate the system cache search that
finds the type-specific output routine's OID.  But as long as we are
precalculating stuff, it would also be worthwhile to precalculate the
info that fmgr.c needs in order to invoke the routine.  For builtin
functions it seems to me that we ought to be able to reduce the
per-tuple call effort to a straight jump through a function pointer,
which would save almost another 10% of SELECT's runtime.  Even for
non-builtins, finding out that it's not a builtin once per select
instead of once per tuple would be helpful.

This last idea could perhaps be combined with the revision of the
function manager interface that some folks have been muttering about
for a while (ie, fix its deficiencies w.r.t. null parameter values).

I think we're too close to 6.5 beta to start hacking on a function
manager refit, but maybe the tuple destination improvement could be
done in time for 6.5?
        regards, tom lane


Re: [HACKERS] Another speedup idea (two, even)

From
Bruce Momjian
Date:
> printtup() does a SearchSysCache call for each attribute of each tuple
> in order to find the appropriate output routine for the attribute's
> type.  (Up till yesterday it did *two* such calls per attribute, but
> I fixed that.)  This is fairly expensive, amounting to about 10% of
> the runtime in the SELECT-a-large-table test case I'm looking at.
> It's probably even more than that in cases that don't stress
> heap_getattr as badly as this one does.

Interesting.  It seems anything that is applied to every row, or every
column of every row is a good candidate for optimization.  That is what
I have done in the past.

> 
> It occurs to me that there's no good reason to do this lookup more
> than once per column --- all the tuples in a relation should have
> the same set of column types, no?  So if we could do these lookups
> once at the start of an output pass, and cache the results for use
> in individual printtup calls, we could drive that 10% down to zero
> at essentially no penalty.

See copy.c.  It does much of what you suggest, I think.

> 
> There are a couple different ways this could be handled.  The way
> that looks good to me at first glance is to extend the notion of
> a "tuple destination" (as selected by DestToFunction in dest.c)
> to include not just a per-tuple processing routine but also setup and
> cleanup routines, and some storage accessible to all three routines.
> The setup routine would be passed the TupleDesc info that is expected
> to apply to all tuples subsequently sent to that destination, and it can
> do nothing or do setup work for use by the per-tuple routine.  What
> we'd actually have it do for the printtup destination type is create
> and fill in an array of per-column output function info.  The cleanup
> routine is for symmetry --- for this immediate issue all it would need
> to do is free the data created by the setup routine, but I can imagine
> new kinds of destinations that need more setup/cleanup someday.
> 
> That gives us a place to precalculate the system cache search that
> finds the type-specific output routine's OID.  But as long as we are
> precalculating stuff, it would also be worthwhile to precalculate the
> info that fmgr.c needs in order to invoke the routine.  For builtin
> functions it seems to me that we ought to be able to reduce the
> per-tuple call effort to a straight jump through a function pointer,
> which would save almost another 10% of SELECT's runtime.  Even for
> non-builtins, finding out that it's not a builtin once per select
> instead of once per tuple would be helpful.
> 
> This last idea could perhaps be combined with the revision of the
> function manager interface that some folks have been muttering about
> for a while (ie, fix its deficiencies w.r.t. null parameter values).
> 
> I think we're too close to 6.5 beta to start hacking on a function
> manager refit, but maybe the tuple destination improvement could be
> done in time for 6.5?

If you think you understand it, I would recommend going for it.  You
can't put it in between 6.5 beta and 6.5, so why not do it now, though
the function manager fixup may be best left for post 6.5.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another speedup idea (two, even)

From
Tom Lane
Date:
I wrote:
>> It occurs to me that there's no good reason to do this lookup more
>> than once per column --- all the tuples in a relation should have
>> the same set of column types, no?  So if we could do these lookups
>> once at the start of an output pass, and cache the results for use
>> in individual printtup calls, we could drive that 10% down to zero
>> at essentially no penalty.
>> [ snip ]
>> ... as long as we are
>> precalculating stuff, it would also be worthwhile to precalculate the
>> info that fmgr.c needs in order to invoke the routine.  For builtin
>> functions it seems to me that we ought to be able to reduce the
>> per-tuple call effort to a straight jump through a function pointer,
>> which would save almost another 10% of SELECT's runtime.

I have implemented and checked in both of these ideas, and gotten the
expected savings in runtime of large SELECTs.

It turns out that someone was way ahead of me concerning optimizing
calls through fmgr.c --- it already is possible to precalculate the
target function address (fmgr_info) and then do a direct jump through
the function pointer (fmgr_faddr).  But printtup.c was using the
combined-lookup-and-call routine fmgr() for each tuple, rather than
precalculating the function info and re-using it.  This was probably
because it didn't have any good place to cache the info --- but it
does now.

There are a number of other places that look like they might profit from
the same kind of optimization --- in particular, GROUP BY and UNIQUE
(SELECT DISTINCT) processing call fmgr() for each tuple.  Also, index
processing uses fmgr() rather than precalculated calls.  I haven't done
anything about this but perhaps someone else would like to.
        regards, tom lane


Re: [HACKERS] Another speedup idea (two, even)

From
Bruce Momjian
Date:
> I have implemented and checked in both of these ideas, and gotten the
> expected savings in runtime of large SELECTs.
> 
> It turns out that someone was way ahead of me concerning optimizing
> calls through fmgr.c --- it already is possible to precalculate the
> target function address (fmgr_info) and then do a direct jump through
> the function pointer (fmgr_faddr).  But printtup.c was using the
> combined-lookup-and-call routine fmgr() for each tuple, rather than
> precalculating the function info and re-using it.  This was probably
> because it didn't have any good place to cache the info --- but it
> does now.
> 
> There are a number of other places that look like they might profit from
> the same kind of optimization --- in particular, GROUP BY and UNIQUE
> (SELECT DISTINCT) processing call fmgr() for each tuple.  Also, index
> processing uses fmgr() rather than precalculated calls.  I haven't done
> anything about this but perhaps someone else would like to.
> 

Certainly sounds like it would be a big win.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another speedup idea (two, even)

From
Bruce Momjian
Date:
Tom, where are we on this?  As I rememeber, you did this already, right?




> printtup() does a SearchSysCache call for each attribute of each tuple
> in order to find the appropriate output routine for the attribute's
> type.  (Up till yesterday it did *two* such calls per attribute, but
> I fixed that.)  This is fairly expensive, amounting to about 10% of
> the runtime in the SELECT-a-large-table test case I'm looking at.
> It's probably even more than that in cases that don't stress
> heap_getattr as badly as this one does.
> 
> It occurs to me that there's no good reason to do this lookup more
> than once per column --- all the tuples in a relation should have
> the same set of column types, no?  So if we could do these lookups
> once at the start of an output pass, and cache the results for use
> in individual printtup calls, we could drive that 10% down to zero
> at essentially no penalty.
> 
> There are a couple different ways this could be handled.  The way
> that looks good to me at first glance is to extend the notion of
> a "tuple destination" (as selected by DestToFunction in dest.c)
> to include not just a per-tuple processing routine but also setup and
> cleanup routines, and some storage accessible to all three routines.
> The setup routine would be passed the TupleDesc info that is expected
> to apply to all tuples subsequently sent to that destination, and it can
> do nothing or do setup work for use by the per-tuple routine.  What
> we'd actually have it do for the printtup destination type is create
> and fill in an array of per-column output function info.  The cleanup
> routine is for symmetry --- for this immediate issue all it would need
> to do is free the data created by the setup routine, but I can imagine
> new kinds of destinations that need more setup/cleanup someday.
> 
> That gives us a place to precalculate the system cache search that
> finds the type-specific output routine's OID.  But as long as we are
> precalculating stuff, it would also be worthwhile to precalculate the
> info that fmgr.c needs in order to invoke the routine.  For builtin
> functions it seems to me that we ought to be able to reduce the
> per-tuple call effort to a straight jump through a function pointer,
> which would save almost another 10% of SELECT's runtime.  Even for
> non-builtins, finding out that it's not a builtin once per select
> instead of once per tuple would be helpful.
> 
> This last idea could perhaps be combined with the revision of the
> function manager interface that some folks have been muttering about
> for a while (ie, fix its deficiencies w.r.t. null parameter values).
> 
> I think we're too close to 6.5 beta to start hacking on a function
> manager refit, but maybe the tuple destination improvement could be
> done in time for 6.5?
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Another speedup idea (two, even)

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Tom, where are we on this?  As I rememeber, you did this already, right?

Yeah, it's done.
        regards, tom lane


Re: [HACKERS] Another speedup idea (two, even)

From
Bruce Momjian
Date:
Added to TODO:

* use fmgr_info()/fmgr_faddr() instead of fmgr() calls in high-traffic places, like GROUP BY, UNIQUE, index processing,
etc.



> I wrote:
> >> It occurs to me that there's no good reason to do this lookup more
> >> than once per column --- all the tuples in a relation should have
> >> the same set of column types, no?  So if we could do these lookups
> >> once at the start of an output pass, and cache the results for use
> >> in individual printtup calls, we could drive that 10% down to zero
> >> at essentially no penalty.
> >> [ snip ]
> >> ... as long as we are
> >> precalculating stuff, it would also be worthwhile to precalculate the
> >> info that fmgr.c needs in order to invoke the routine.  For builtin
> >> functions it seems to me that we ought to be able to reduce the
> >> per-tuple call effort to a straight jump through a function pointer,
> >> which would save almost another 10% of SELECT's runtime.
> 
> I have implemented and checked in both of these ideas, and gotten the
> expected savings in runtime of large SELECTs.
> 
> It turns out that someone was way ahead of me concerning optimizing
> calls through fmgr.c --- it already is possible to precalculate the
> target function address (fmgr_info) and then do a direct jump through
> the function pointer (fmgr_faddr).  But printtup.c was using the
> combined-lookup-and-call routine fmgr() for each tuple, rather than
> precalculating the function info and re-using it.  This was probably
> because it didn't have any good place to cache the info --- but it
> does now.
> 
> There are a number of other places that look like they might profit from
> the same kind of optimization --- in particular, GROUP BY and UNIQUE
> (SELECT DISTINCT) processing call fmgr() for each tuple.  Also, index
> processing uses fmgr() rather than precalculated calls.  I haven't done
> anything about this but perhaps someone else would like to.
> 
>             regards, tom lane
> 
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026