On Tue, 2005-08-23 at 19:31 -0700, Josh Berkus wrote:
> Steve,
>
> > I would assume that dbt2 with STP helps minimize the number of hours
> > someone has to invest to determine performance gains with configurable
> > options?
>
> Actually, these I/O operation issues show up mainly with DW workloads, so the
> STP isn't much use there. If I can ever get some of these machines back
> from the build people, I'd like to start testing some stuff.
>
> One issue with testing this is that currently PostgreSQL doesn't support block
> sizes above 128K. We've already done testing on that (well, Mark has) and
> the performance gains aren't even worth the hassle of remembering you're on a
> different block size (like, +4%).
>
> What the Sun people have done with other DB systems is show that substantial
> performance gains are possible on large databases (>100G) using block sizes
> of 1MB. I believe that's possible (and that it probably makes more of a
> difference on Solaris than on BSD) but we can't test it without some hackery
> first.

To get decent I/O you need 1MB fundamental units all the way down the
stack. You need a filesystem that can take a 1MB write well, an I/O
scheduler that will keep the request together, and a storage
controller that can eat a 1MB request at once. Ideally you'd like an
architecture with a 1MB page (Itanium has this, and AMD64 Linux will
soon have this). The Lustre people have done some work in this area,
opening up the datapaths in the kernel so they can keep the hardware
really working. They even modified the QLogic SCSI/FC driver so it
supports such large transfers. Their work has shown that you can get a
significant performance boost on Linux just by thinking in terms of
larger transfers.
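
To put rough numbers on the "larger transfers" argument, here is a small Python sketch (not PostgreSQL code). It counts how many read requests a sequential scan needs at a given transfer size, using PostgreSQL's default 8 kB block, the 128 kB and 1 MB sizes mentioned in this thread, and the 100 GB database size from Josh's message. The names `requests_needed` and `scan_file` are made up for illustration, and the counts are only what the application asks for: the filesystem, I/O scheduler and controller may still split or merge requests, which is the point about needing large units all the way down the stack.

```python
import os

KB = 1024
MB = 1024 * KB
GB = 1024 * MB  # binary GB (2**30 bytes)


def requests_needed(total_bytes: int, transfer_size: int) -> int:
    """Read requests needed to cover total_bytes at transfer_size each (ceiling division)."""
    return -(-total_bytes // transfer_size)


def scan_file(path: str, transfer_size: int) -> tuple[int, int]:
    """Read `path` sequentially in transfer_size chunks.

    Returns (bytes_read, read_calls), where read_calls counts the
    non-empty os.read() calls. Each call asks for up to transfer_size
    bytes; the kernel may still split or merge it further down the stack.
    """
    fd = os.open(path, os.O_RDONLY)
    total = calls = 0
    try:
        while True:
            chunk = os.read(fd, transfer_size)
            if not chunk:  # EOF
                break
            calls += 1
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls


if __name__ == "__main__":
    for size in (8 * KB, 128 * KB, 1 * MB):
        print(f"{size // KB:>5} kB transfers: "
              f"{requests_needed(100 * GB, size):>10,} requests for 100 GB")
```

Taking GB as 2^30 bytes, a 100 GB scan is 13,107,200 requests at 8 kB versus 102,400 at 1 MB, 128 times fewer. Whether fewer requests turn into real throughput depends on every layer below keeping them intact.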

Unfortunately I'm really afraid that this conversation is about trees
when the forest is the problem. PostgreSQL doesn't even have an
asynchronous reader, which is the sort of thing that could double or
triple its performance. You're talking about block sizes and such, but
the kinds of improvements you can get there are in the tens of percent
at most.
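
A minimal Python sketch of one way an "async reader" could work, under the simplest assumptions: a background thread reads the next chunk of a file while the caller is still processing the current one, so I/O and CPU work overlap instead of taking turns. This is not how PostgreSQL reads data; the `prefetching_reader` name, the 1 MB chunk size and the read-ahead depth of 4 are hypothetical choices. A database engine would more likely build on the operating system's own asynchronous I/O or read-ahead facilities.

```python
import queue
import threading
from typing import Iterator

CHUNK = 1024 * 1024  # hypothetical 1 MB transfer size
DEPTH = 4            # hypothetical number of chunks the reader may run ahead


def prefetching_reader(path: str, chunk_size: int = CHUNK,
                       depth: int = DEPTH) -> Iterator[bytes]:
    """Yield the contents of `path` in chunks, reading ahead on a background thread.

    The producer thread stays up to `depth` chunks ahead of the consumer, so
    the next read is already under way while the caller works on the current
    chunk. Errors raised while reading are re-raised in the caller.
    """
    q: queue.Queue = queue.Queue(maxsize=depth)
    done = object()           # sentinel marking end of data
    stop = threading.Event()  # set when the consumer stops iterating

    def put(item: object) -> None:
        # Wait for queue space, but give up if the consumer has gone away.
        while not stop.is_set():
            try:
                q.put(item, timeout=0.1)
                return
            except queue.Full:
                continue

    def producer() -> None:
        try:
            with open(path, "rb", buffering=0) as f:
                while not stop.is_set():
                    chunk = f.read(chunk_size)
                    if not chunk:  # EOF
                        break
                    put(chunk)
        except Exception as exc:  # hand the error to the consumer
            put(exc)
        finally:
            put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    try:
        while True:
            item = q.get()
            if item is done:
                return
            if isinstance(item, Exception):
                raise item
            yield item
    finally:
        stop.set()
        worker.join()
```

The caller just iterates over `prefetching_reader(path)`. The bounded queue is what limits read-ahead: the producer can get at most `depth` chunks ahead, so memory use stays at roughly `depth + 2` chunks. CPython releases the GIL during blocking file reads, so the background read does overlap with the caller's work.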
-jwb