Re: [HACKERS] Safe/Fast I/O ... - Mailing list pgsql-hackers
From | dg@illustra.com (David Gould) |
---|---|
Subject | Re: [HACKERS] Safe/Fast I/O ... |
Date | |
Msg-id | 9804120747.AA01991@hawk.illustra.com |
In response to | Re: [HACKERS] Safe/Fast I/O ... (The Hermit Hacker <scrappy@hub.org>) |
Responses | Book recommendation, was Re: [HACKERS] Safe/Fast I/O ... |
List | pgsql-hackers |
Marc G. Fournier wrote:
> On Sun, 12 Apr 1998, Matthew N. Dodd wrote:
> > On Sun, 12 Apr 1998, The Hermit Hacker wrote:
> > > I hate to have to ask, but how is MMAP or AIO better then sfio? I
> > > haven't had enough time to research any of this, and am just starting to
> > > look at it...
> >
> > If its simple to compile and works as a drop in replacement AND is faster,
> > I see no reason why PostgreSQL shouldn't try to link with it.
>
> That didn't really answer the question :(
>
> > Keep in mind though that in order to use MMAP or AIO you'd be
> > restructuring the code to be more efficient rather than doing more of the
> > same old thing but optimized.
>
> I don't know anything about AIO, so if you can give me a pointer
> to where I can read up on it, please do...
>
> ...but, with MMAP, unless I'm mistaken, you'd essentially be
> reading the file(s) into memory and then manipulating the file(s) there.
> Which means one helluva large amount of RAM being required...no?
>
> Using stdio vs sfio, to read a 1.2million line file, the time to
> complete goes from 7sec to 5sec ... that makes for a substantial savings
> in time, if its applicable.
>
> the problem, as I see it right now, is the docs for it suck ...
> so, right now, I'm fumbling through figuring it all out :)

One of the options when building perl5 is to use sfio instead of stdio. I
haven't tried it, but they seem to think it works.

That said, the only place I see this helping pgsql is in copyin and copyout,
as these use the stdio interfaces: fread(), fwrite(), etc. Everywhere else we
use the system call IO interfaces: read(), write(), recv(), send(), select(),
etc., and do our own buffering.

My prediction is that sfio vs stdio will have no detectable impact on SQL
performance and only a very minor impact on copyin and copyout (as most of
the overhead is in pgsql, not libc).

As far as IO, the problem we have is fsync().
To get rid of it means doing a real write-ahead log system and (maybe) aio
to the log. As soon as we get real logging we no longer need to force data
pages out, so we can get rid of all the fsync() calls and (given how slow we
are otherwise) completely eliminate IO as a bottleneck.

Pgsql was built for comfort, not for speed. Fine tuning, code tweaking, and
micro-optimization are fine as far as they go. But there is probably a
maximum 2x speedup to be had that way. Total. We need a 10x speedup to play
with serious databases. This will take real architectural changes.

If you are interested in what is necessary, I highly recommend the book
"Transaction Processing" by Jim Gray (and someone whose name escapes me just
now). It is a great big thing and will take a while to get through, but it is
decently written and very well worth the time. It pretty much gives away the
whole candy store as far as building high performance, reliable, and scalable
database and TP systems. I wish it had been available 10 years ago when I got
into the DB game.

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
 - Linux. Not because it is free. Because it is better.
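
For readers who have not seen write-ahead logging, here is a minimal sketch of the idea in the message above, written in Python for brevity (os.write() and os.fsync() are thin wrappers over the system calls of the same name). It is a toy under stated assumptions and not how pgsql does or would do it: the names WriteAheadLog, append, commit and replay are hypothetical, the record format (4-byte length prefix) is invented for illustration, and there is no checksumming or aio. The point it illustrates is that only the sequential log is forced to disk at commit time, while data pages can be written back lazily and rebuilt from the log after a crash.

```python
import os
import struct
import tempfile


class WriteAheadLog:
    """Toy append-only log (hypothetical): a commit is durable once fsync'd."""

    def __init__(self, path):
        # O_APPEND keeps every write at the end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

    def append(self, payload: bytes) -> None:
        # Invented record format: 4-byte little-endian length, then the payload.
        record = struct.pack("<I", len(payload)) + payload
        view = memoryview(record)
        while view:  # os.write() may write fewer bytes than requested
            written = os.write(self.fd, view)
            view = view[written:]

    def commit(self) -> None:
        # The only forced write: one sequential log file, not every data page.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)


def replay(path):
    """Return all complete records; stop at a torn (partially written) tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack("<I", header)
            body = f.read(length)
            if len(body) < length:
                break  # incomplete final record, e.g. crash mid-write
            records.append(body)
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    log = WriteAheadLog(path)
    log.append(b"page 7: balance=100")
    log.append(b"page 9: balance=250")
    log.commit()  # changes are now recoverable even if data pages never hit disk
    log.close()
    print(replay(path))
```

Note that the log itself still has to reach disk before a commit is acknowledged; what goes away is forcing the randomly scattered data pages on every transaction.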