Re: [HACKERS] Another speedup idea (two, even) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] Another speedup idea (two, even)
Date
Msg-id 199901250224.VAA07227@candle.pha.pa.us
Whole thread Raw
In response to Another speedup idea (two, even)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Another speedup idea (two, even)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> printtup() does a SearchSysCache call for each attribute of each tuple
> in order to find the appropriate output routine for the attribute's
> type.  (Up till yesterday it did *two* such calls per attribute, but
> I fixed that.)  This is fairly expensive, amounting to about 10% of
> the runtime in the SELECT-a-large-table test case I'm looking at.
> It's probably even more than that in cases that don't stress
> heap_getattr as badly as this one does.

Interesting.  It seems anything that is applied to every row, or every
column of every row is a good candidate for optimization.  That is what
I have done in the past.

> 
> It occurs to me that there's no good reason to do this lookup more
> than once per column --- all the tuples in a relation should have
> the same set of column types, no?  So if we could do these lookups
> once at the start of an output pass, and cache the results for use
> in individual printtup calls, we could drive that 10% down to zero
> at essentially no penalty.

See copy.c.  It does much of what you suggest, I think.

> 
> There are a couple different ways this could be handled.  The way
> that looks good to me at first glance is to extend the notion of
> a "tuple destination" (as selected by DestToFunction in dest.c)
> to include not just a per-tuple processing routine but also setup and
> cleanup routines, and some storage accessible to all three routines.
> The setup routine would be passed the TupleDesc info that is expected
> to apply to all tuples subsequently sent to that destination, and it can
> do nothing or do setup work for use by the per-tuple routine.  What
> we'd actually have it do for the printtup destination type is create
> and fill in an array of per-column output function info.  The cleanup
> routine is for symmetry --- for this immediate issue all it would need
> to do is free the data created by the setup routine, but I can imagine
> new kinds of destinations that need more setup/cleanup someday.
> 
> That gives us a place to precalculate the system cache search that
> finds the type-specific output routine's OID.  But as long as we are
> precalculating stuff, it would also be worthwhile to precalculate the
> info that fmgr.c needs in order to invoke the routine.  For builtin
> functions it seems to me that we ought to be able to reduce the
> per-tuple call effort to a straight jump through a function pointer,
> which would save almost another 10% of SELECT's runtime.  Even for
> non-builtins, finding out that it's not a builtin once per select
> instead of once per tuple would be helpful.
> 
> This last idea could perhaps be combined with the revision of the
> function manager interface that some folks have been muttering about
> for a while (ie, fix its deficiencies w.r.t. null parameter values).
> 
> I think we're too close to 6.5 beta to start hacking on a function
> manager refit, but maybe the tuple destination improvement could be
> done in time for 6.5?

If you think you understand it, I would recommend going for it.  You
can't put it in between 6.5 beta and 6.5, so why not do it now, though
the function manager fixup may be best left for post 6.5.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Another speedup idea (two, even)
Next
From: "Thomas G. Lockhart"
Date:
Subject: Re: Odd behavior of type coercion for datetime