I know people are still reviewing the SDB implementation for PostgreSQL,
but I was thinking about it today.
This is the first time I realized how efficient our current system is.
We have shared buffers that are mapped into the address space of each
backend. When a table is sequentially scanned, buffers are loaded into
that area and the backend accesses that 8k block straight out of memory.
If I remember the optimizations I added correctly, much of that access
uses inlined functions (macros), meaning the buffers are scanned at
amazing speeds. I know inlining a few of those functions gained a 10%
speedup.
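
To make the inlining point concrete, here is a minimal C sketch. It is
not the actual PostgreSQL code: the page layout (DemoPage), the fixed
item size, and every Demo*/demo_* name are made up for illustration, and
only the 8k block size comes from the text above. It just shows a
macro accessor that expands in place, so the loop that scans a block
already sitting in memory makes no function call per item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ     8192  /* the 8k block; the name is hypothetical */
#define DEMO_ITEM_SIZE  16    /* hypothetical fixed item size */
#define DEMO_MAX_ITEMS  ((DEMO_BLCKSZ - sizeof(uint16_t)) / DEMO_ITEM_SIZE)

/* Hypothetical page layout: an item count followed by fixed-size items. */
typedef struct DemoPage {
    uint16_t      nitems;
    unsigned char data[DEMO_BLCKSZ - sizeof(uint16_t)];
} DemoPage;

/* Macro accessor: expanded at the call site, so no call overhead. */
#define DemoPageGetItem(page, n) \
    ((page)->data + (size_t) (n) * DEMO_ITEM_SIZE)

void demo_page_init(DemoPage *page)
{
    memset(page, 0, sizeof(*page));
}

/* Append an item whose first four bytes hold the key. */
void demo_page_add_item(DemoPage *page, uint32_t key)
{
    assert((size_t) page->nitems < DEMO_MAX_ITEMS);
    memcpy(DemoPageGetItem(page, page->nitems), &key, sizeof(key));
    page->nitems++;
}

/* Scan the whole page in memory and count items with the wanted key. */
size_t demo_page_count_key(const DemoPage *page, uint32_t wanted)
{
    size_t nitems = page->nitems;
    size_t matches = 0;

    for (size_t i = 0; i < nitems; i++) {
        uint32_t key;

        memcpy(&key, DemoPageGetItem(page, i), sizeof(key));
        if (key == wanted)
            matches++;
    }
    return matches;
}
```

A caller would fill a page with demo_page_add_item() and then call
demo_page_count_key() on it. Whether a macro like this actually beats a
plain function depends on the compiler and the call pattern, so treat
the 10% figure above as specific to our code.
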
I wonder how SDB performs such file scans. Of course, the real trick is
getting those buffers loaded faster. For sequential scans, the kernel
prefetch does a good job, but index scans that hit many tuples have
problems, I am sure. ISAM helps in this regard, but I don't see that
SDB has it.
There is also the Linux problem of preventing read-ahead after a
seek(), while the BSD/HP kernels prevent prefetch only when prefetched
blocks remain unused.
And there is the problem of cache wiping, where a large sequential scan
pushes all other cached blocks out of the buffer cache. I don't know a
way to prevent that one, though we could have large sequential scans
reuse their own buffer, rather than grabbing the oldest buffer.
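
The buffer-reuse idea could be sketched roughly like this in C. This is
only an illustration under my own assumptions, not how our buffer
manager works: the tiny pool, the logical-clock LRU, and all
Demo*/demo_* names are hypothetical, and there is no pinning, locking,
or real I/O. An ordinary read evicts the oldest buffer on a miss; a
sequential-scan read takes a victim once and then keeps overwriting
that same buffer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NBUFFERS      4      /* tiny pool, for illustration only */
#define DEMO_INVALID_BLOCK (-1L)

typedef struct DemoBuffer {
    long          block;      /* cached block number, or DEMO_INVALID_BLOCK */
    unsigned long last_used;  /* logical clock value of the last access */
} DemoBuffer;

typedef struct DemoPool {
    DemoBuffer    buf[DEMO_NBUFFERS];
    unsigned long clock;
} DemoPool;

void demo_pool_init(DemoPool *pool)
{
    pool->clock = 0;
    for (int i = 0; i < DEMO_NBUFFERS; i++) {
        pool->buf[i].block = DEMO_INVALID_BLOCK;
        pool->buf[i].last_used = 0;
    }
}

/* Return the slot caching the block, or -1 if it is not cached. */
int demo_pool_find(const DemoPool *pool, long block)
{
    for (int i = 0; i < DEMO_NBUFFERS; i++)
        if (pool->buf[i].block == block)
            return i;
    return -1;
}

/* Ordinary read: on a miss, grab the oldest (least recently used) buffer. */
int demo_pool_read(DemoPool *pool, long block)
{
    int slot = demo_pool_find(pool, block);

    if (slot < 0) {
        slot = 0;
        for (int i = 1; i < DEMO_NBUFFERS; i++)
            if (pool->buf[i].last_used < pool->buf[slot].last_used)
                slot = i;
        pool->buf[slot].block = block;
    }
    pool->buf[slot].last_used = ++pool->clock;
    return slot;
}

/*
 * Sequential-scan read: the first miss takes the oldest buffer once;
 * later misses overwrite the scan's own buffer instead of evicting more.
 * *scan_slot must start at -1 for a new scan.
 */
int demo_pool_read_seqscan(DemoPool *pool, long block, int *scan_slot)
{
    int slot = demo_pool_find(pool, block);

    if (slot >= 0)
        return slot;                    /* already cached, nothing to evict */
    if (*scan_slot < 0) {
        *scan_slot = demo_pool_read(pool, block);
    } else {
        pool->buf[*scan_slot].block = block;
        pool->buf[*scan_slot].last_used = ++pool->clock;
    }
    return *scan_slot;
}
```

With three blocks already cached in the four-buffer pool, pushing a
50-block scan through demo_pool_read_seqscan() leaves those three
blocks in place, while the same scan through demo_pool_read() wipes
them out. The cost is that the scan itself gets no caching benefit
beyond its single buffer.
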
--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026