Thread: Another speedup idea (two, even)
printtup() does a SearchSysCache call for each attribute of each tuple in order to find the appropriate output routine for the attribute's type. (Up till yesterday it did *two* such calls per attribute, but I fixed that.) This is fairly expensive, amounting to about 10% of the runtime in the SELECT-a-large-table test case I'm looking at. It's probably even more than that in cases that don't stress heap_getattr as badly as this one does. It occurs to me that there's no good reason to do this lookup more than once per column --- all the tuples in a relation should have the same set of column types, no? So if we could do these lookups once at the start of an output pass, and cache the results for use in individual printtup calls, we could drive that 10% down to zero at essentially no penalty. There are a couple different ways this could be handled. The way that looks good to me at first glance is to extend the notion of a "tuple destination" (as selected by DestToFunction in dest.c) to include not just a per-tuple processing routine but also setup and cleanup routines, and some storage accessible to all three routines. The setup routine would be passed the TupleDesc info that is expected to apply to all tuples subsequently sent to that destination, and it can do nothing or do setup work for use by the per-tuple routine. What we'd actually have it do for the printtup destination type is create and fill in an array of per-column output function info. The cleanup routine is for symmetry --- for this immediate issue all it would need to do is free the data created by the setup routine, but I can imagine new kinds of destinations that need more setup/cleanup someday. That gives us a place to precalculate the system cache search that finds the type-specific output routine's OID. But as long as we are precalculating stuff, it would also be worthwhile to precalculate the info that fmgr.c needs in order to invoke the routine. For builtin functions it seems to me that we ought to be able to reduce the per-tuple call effort to a straight jump through a function pointer, which would save almost another 10% of SELECT's runtime. Even for non-builtins, finding out that it's not a builtin once per select instead of once per tuple would be helpful. This last idea could perhaps be combined with the revision of the function manager interface that some folks have been muttering about for a while (ie, fix its deficiencies w.r.t. null parameter values). I think we're too close to 6.5 beta to start hacking on a function manager refit, but maybe the tuple destination improvement could be done in time for 6.5? regards, tom lane
> printtup() does a SearchSysCache call for each attribute of each tuple > in order to find the appropriate output routine for the attribute's > type. (Up till yesterday it did *two* such calls per attribute, but > I fixed that.) This is fairly expensive, amounting to about 10% of > the runtime in the SELECT-a-large-table test case I'm looking at. > It's probably even more than that in cases that don't stress > heap_getattr as badly as this one does. Interesting. It seems anything that is applied to every row, or every column of every row is a good candidate for optimization. That is what I have done in the past. > > It occurs to me that there's no good reason to do this lookup more > than once per column --- all the tuples in a relation should have > the same set of column types, no? So if we could do these lookups > once at the start of an output pass, and cache the results for use > in individual printtup calls, we could drive that 10% down to zero > at essentially no penalty. See copy.c. It does much of what you suggest, I think. > > There are a couple different ways this could be handled. The way > that looks good to me at first glance is to extend the notion of > a "tuple destination" (as selected by DestToFunction in dest.c) > to include not just a per-tuple processing routine but also setup and > cleanup routines, and some storage accessible to all three routines. > The setup routine would be passed the TupleDesc info that is expected > to apply to all tuples subsequently sent to that destination, and it can > do nothing or do setup work for use by the per-tuple routine. What > we'd actually have it do for the printtup destination type is create > and fill in an array of per-column output function info. The cleanup > routine is for symmetry --- for this immediate issue all it would need > to do is free the data created by the setup routine, but I can imagine > new kinds of destinations that need more setup/cleanup someday. > > That gives us a place to precalculate the system cache search that > finds the type-specific output routine's OID. But as long as we are > precalculating stuff, it would also be worthwhile to precalculate the > info that fmgr.c needs in order to invoke the routine. For builtin > functions it seems to me that we ought to be able to reduce the > per-tuple call effort to a straight jump through a function pointer, > which would save almost another 10% of SELECT's runtime. Even for > non-builtins, finding out that it's not a builtin once per select > instead of once per tuple would be helpful. > > This last idea could perhaps be combined with the revision of the > function manager interface that some folks have been muttering about > for a while (ie, fix its deficiencies w.r.t. null parameter values). > > I think we're too close to 6.5 beta to start hacking on a function > manager refit, but maybe the tuple destination improvement could be > done in time for 6.5? If you think you understand it, I would recommend going for it. You can't put it in between 6.5 beta and 6.5, so why not do it now, though the function manager fixup may be best left for post 6.5. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I wrote: >> It occurs to me that there's no good reason to do this lookup more >> than once per column --- all the tuples in a relation should have >> the same set of column types, no? So if we could do these lookups >> once at the start of an output pass, and cache the results for use >> in individual printtup calls, we could drive that 10% down to zero >> at essentially no penalty. >> [ snip ] >> ... as long as we are >> precalculating stuff, it would also be worthwhile to precalculate the >> info that fmgr.c needs in order to invoke the routine. For builtin >> functions it seems to me that we ought to be able to reduce the >> per-tuple call effort to a straight jump through a function pointer, >> which would save almost another 10% of SELECT's runtime. I have implemented and checked in both of these ideas, and gotten the expected savings in runtime of large SELECTs. It turns out that someone was way ahead of me concerning optimizing calls through fmgr.c --- it already is possible to precalculate the target function address (fmgr_info) and then do a direct jump through the function pointer (fmgr_faddr). But printtup.c was using the combined-lookup-and-call routine fmgr() for each tuple, rather than precalculating the function info and re-using it. This was probably because it didn't have any good place to cache the info --- but it does now. There are a number of other places that look like they might profit from the same kind of optimization --- in particular, GROUP BY and UNIQUE (SELECT DISTINCT) processing call fmgr() for each tuple. Also, index processing uses fmgr() rather than precalculated calls. I haven't done anything about this but perhaps someone else would like to. regards, tom lane
> I have implemented and checked in both of these ideas, and gotten the > expected savings in runtime of large SELECTs. > > It turns out that someone was way ahead of me concerning optimizing > calls through fmgr.c --- it already is possible to precalculate the > target function address (fmgr_info) and then do a direct jump through > the function pointer (fmgr_faddr). But printtup.c was using the > combined-lookup-and-call routine fmgr() for each tuple, rather than > precalculating the function info and re-using it. This was probably > because it didn't have any good place to cache the info --- but it > does now. > > There are a number of other places that look like they might profit from > the same kind of optimization --- in particular, GROUP BY and UNIQUE > (SELECT DISTINCT) processing call fmgr() for each tuple. Also, index > processing uses fmgr() rather than precalculated calls. I haven't done > anything about this but perhaps someone else would like to. > Certainly sounds like it would be a big win. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom, where are we on this? As I rememeber, you did this already, right? > printtup() does a SearchSysCache call for each attribute of each tuple > in order to find the appropriate output routine for the attribute's > type. (Up till yesterday it did *two* such calls per attribute, but > I fixed that.) This is fairly expensive, amounting to about 10% of > the runtime in the SELECT-a-large-table test case I'm looking at. > It's probably even more than that in cases that don't stress > heap_getattr as badly as this one does. > > It occurs to me that there's no good reason to do this lookup more > than once per column --- all the tuples in a relation should have > the same set of column types, no? So if we could do these lookups > once at the start of an output pass, and cache the results for use > in individual printtup calls, we could drive that 10% down to zero > at essentially no penalty. > > There are a couple different ways this could be handled. The way > that looks good to me at first glance is to extend the notion of > a "tuple destination" (as selected by DestToFunction in dest.c) > to include not just a per-tuple processing routine but also setup and > cleanup routines, and some storage accessible to all three routines. > The setup routine would be passed the TupleDesc info that is expected > to apply to all tuples subsequently sent to that destination, and it can > do nothing or do setup work for use by the per-tuple routine. What > we'd actually have it do for the printtup destination type is create > and fill in an array of per-column output function info. The cleanup > routine is for symmetry --- for this immediate issue all it would need > to do is free the data created by the setup routine, but I can imagine > new kinds of destinations that need more setup/cleanup someday. > > That gives us a place to precalculate the system cache search that > finds the type-specific output routine's OID. But as long as we are > precalculating stuff, it would also be worthwhile to precalculate the > info that fmgr.c needs in order to invoke the routine. For builtin > functions it seems to me that we ought to be able to reduce the > per-tuple call effort to a straight jump through a function pointer, > which would save almost another 10% of SELECT's runtime. Even for > non-builtins, finding out that it's not a builtin once per select > instead of once per tuple would be helpful. > > This last idea could perhaps be combined with the revision of the > function manager interface that some folks have been muttering about > for a while (ie, fix its deficiencies w.r.t. null parameter values). > > I think we're too close to 6.5 beta to start hacking on a function > manager refit, but maybe the tuple destination improvement could be > done in time for 6.5? > > regards, tom lane > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > Tom, where are we on this? As I rememeber, you did this already, right? Yeah, it's done. regards, tom lane
Added to TODO: * use fmgr_info()/fmgr_faddr() instead of fmgr() calls in high-traffic places, like GROUP BY, UNIQUE, index processing, etc. > I wrote: > >> It occurs to me that there's no good reason to do this lookup more > >> than once per column --- all the tuples in a relation should have > >> the same set of column types, no? So if we could do these lookups > >> once at the start of an output pass, and cache the results for use > >> in individual printtup calls, we could drive that 10% down to zero > >> at essentially no penalty. > >> [ snip ] > >> ... as long as we are > >> precalculating stuff, it would also be worthwhile to precalculate the > >> info that fmgr.c needs in order to invoke the routine. For builtin > >> functions it seems to me that we ought to be able to reduce the > >> per-tuple call effort to a straight jump through a function pointer, > >> which would save almost another 10% of SELECT's runtime. > > I have implemented and checked in both of these ideas, and gotten the > expected savings in runtime of large SELECTs. > > It turns out that someone was way ahead of me concerning optimizing > calls through fmgr.c --- it already is possible to precalculate the > target function address (fmgr_info) and then do a direct jump through > the function pointer (fmgr_faddr). But printtup.c was using the > combined-lookup-and-call routine fmgr() for each tuple, rather than > precalculating the function info and re-using it. This was probably > because it didn't have any good place to cache the info --- but it > does now. > > There are a number of other places that look like they might profit from > the same kind of optimization --- in particular, GROUP BY and UNIQUE > (SELECT DISTINCT) processing call fmgr() for each tuple. Also, index > processing uses fmgr() rather than precalculated calls. I haven't done > anything about this but perhaps someone else would like to. > > regards, tom lane > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026