On Fri, Dec 09, 2005 at 11:32:48AM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I can see that being useful for a single-user application that doesn't
> > > have locking or I/O bottlenecks, and doesn't have a multi-stage design
> > > like a database. Do we do enough of such processing that we will _see_
> > > an improvement, or will our code become more complex and it will be
> > > harder to make algorithmic optimizations to our code?
> >
> > The main concern I've got about this is the probable negative effect on
> > code readability. There's a limit to the extent to which I'm willing to
> > uglify the code for processor-specific optimizations, and that limit is
> > not real far off. There are a lot of other design levels we can work at
> > to obtain speedups that won't depend on the assumption we are running
> > on this-year's Intel hardware.
>
> That is my guess too. We have seen speedups by inlining and optimizing
> frequently-called functions and using assembler for spinlocks. Proof of
> the assembler is in /pg/include/storage/s_lock.h and proof of the
> inlining is in /pg/include/access/heapam.h. Those were chosen for
> optimization because they were used a lot.
>
> I think the big question is whether there are other areas that have a
> similar CPU load and can be meaningfully optimized, and does the
> optimization include such things as multi-staging. I think we should
> take a wait-and-see attitude and see what test results people get.
>
I also agree that we should go for the most bang for the buck and count
the coding and maintenance burden as part of the cost. Prefetching is not
limited to x86 processors, though: most modern processors now support
memory prefetch operations. If our designs going forward do not take into
account the stalls that occur while the processor waits on a cache-line
fill, substantial performance gains will stay permanently out of reach.
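
To make the idea concrete, here is a minimal sketch of software prefetch
in portable-ish C. It is not taken from the PostgreSQL tree. Item,
sum_items, PREFETCH_READ and PREFETCH_DISTANCE are hypothetical names
made up for illustration, and the distance of 8 is a guess that would
have to be tuned by measurement. It assumes GCC or Clang, whose
__builtin_prefetch maps to the native prefetch instruction where one
exists and compiles to nothing otherwise; on other compilers the macro
is simply a no-op.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type; stands in for any structure reached via pointer. */
typedef struct Item
{
    int64_t value;
    char    payload[56];    /* pads the struct to 64 bytes (assumed line size) */
} Item;

/*
 * Wrap the compiler builtin so the calling code stays readable and still
 * builds on compilers that lack it.  Arguments to the builtin: address,
 * 0 = prefetch for read, 3 = keep in all cache levels.
 */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void) 0)
#endif

/* How many elements ahead to prefetch; an assumption, tune by measurement. */
#define PREFETCH_DISTANCE 8

/*
 * Sum the value field of n items reached through an array of pointers.
 * Each pointer may lead to a different cache line, so we request the
 * line we will need PREFETCH_DISTANCE iterations from now before doing
 * the work for the current element.
 */
int64_t
sum_items(Item *const *items, size_t n)
{
    int64_t total = 0;
    size_t  i;

    for (i = 0; i < n; i++)
    {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(items[i + PREFETCH_DISTANCE]);
        total += items[i]->value;
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without
it, and the processor-specific part stays behind one macro rather than
being spread through the loop. Whether it helps at all depends on the
access pattern and the hardware, which is why test results matter here.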
Ken