Re: Database Kernels and O_DIRECT - Mailing list pgsql-hackers

From James Rogers
Subject Re: Database Kernels and O_DIRECT
Date
Msg-id 1066161543.20750.95.camel@localhost.localdomain
In response to [Linus Torvalds ] Re: statfs() / statvfs() syscall ballsup...  (Greg Stark <gsstark@mit.edu>)
Responses Re: Database Kernels and O_DIRECT
List pgsql-hackers
On Sun, 2003-10-12 at 15:13, Greg Stark wrote:
> There's an interesting thread on linux-kernel right now about O_DIRECT and the
> kernel i/o APIs databases need. I noticed a connection between what they were
> discussing and the earlier discussions here and the pining for an interface to
> avoid having vacuum preempt other disk i/o.
>
> Someone from Oracle is on there explaining what Oracle's needs are. Perhaps
> someone more knowledgeable than myself could explain what would most help
> postgres in this area.


There is an important difference between Oracle and Postgres that
complicates this discussion: the two start from different assumptions.

Oracle runs on top of a database kernel, whereas Postgres does not.  In
the former case, it is very useful and conducive to better performance
to have O_DIRECT and direct control of the I/O in general -- the more,
the better.  In the latter case (e.g. Postgres), it is more of a
nuisance and difficult to exploit well.

The point of having a database kernel underneath the DBMS is two-fold.  

First, it improves portability by acting as an operating system
abstraction layer, replacing OS kernel services with its own equivalents
(which may map to any number of mechanisms underneath).  It is the
reason Oracle is easily supported on so many operating systems; to port
to a new OS, they only have to modify the database kernel, and they
probably have a highly portable generic version to start with that they
can then optimize for a given platform at their leisure. All the rest of
Oracle's code only has to compile against and run on the virtual
operating system that is their database kernel.
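
To make the abstraction-layer idea concrete, here is a minimal sketch in
Python. It is only an illustration, not how Oracle or Postgres is actually
structured: the names BlockStore, FileBlockStore, MemoryBlockStore,
store_record and load_record, and the 8192-byte BLOCK_SIZE, are all
hypothetical. The point is that the "DBMS" functions at the bottom only
ever see the interface, so porting means writing a new backend.

```python
import os
from abc import ABC, abstractmethod

BLOCK_SIZE = 8192  # assumed block size for this sketch


class BlockStore(ABC):
    """Hypothetical 'database kernel' interface: the only I/O API the DBMS sees."""

    @abstractmethod
    def read_block(self, block_no: int) -> bytes: ...

    @abstractmethod
    def write_block(self, block_no: int, data: bytes) -> None: ...


class FileBlockStore(BlockStore):
    """One possible backend: ordinary buffered file I/O through the OS."""

    def __init__(self, path: str) -> None:
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._f = open(path, mode)

    def read_block(self, block_no: int) -> bytes:
        self._f.seek(block_no * BLOCK_SIZE)
        data = self._f.read(BLOCK_SIZE)
        # Blocks past end-of-file read back as zeros.
        return data.ljust(BLOCK_SIZE, b"\0")

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self._f.seek(block_no * BLOCK_SIZE)
        self._f.write(data)
        self._f.flush()

    def close(self) -> None:
        self._f.close()


class MemoryBlockStore(BlockStore):
    """Another backend with the same interface, kept entirely in memory."""

    def __init__(self) -> None:
        self._blocks: dict[int, bytes] = {}

    def read_block(self, block_no: int) -> bytes:
        return self._blocks.get(block_no, bytes(BLOCK_SIZE))

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self._blocks[block_no] = data


def store_record(store: BlockStore, block_no: int, payload: bytes) -> None:
    """'DBMS' code: written against BlockStore only, unaware of the platform."""
    store.write_block(block_no, payload.ljust(BLOCK_SIZE, b"\0"))


def load_record(store: BlockStore, block_no: int, length: int) -> bytes:
    return store.read_block(block_no)[:length]
```

Swapping FileBlockStore for MemoryBlockStore (or for a backend tuned to a
particular OS) requires no change to store_record or load_record.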

Second, where possible, the database kernel bypasses the OS kernel
internally (e.g. O_DIRECT) and implements its own versions of the OS
kernel services that are highly-tuned for database purposes. This often
has significant performance benefits.  While it kind of looks like an OS
on top of an OS, well-written database kernels tend to run almost
parallel to the system kernel in certain respects, only using the
system kernel where it is convenient or for future capabilities that
have been stubbed out in the database kernel.  Writing DBMS code to a
database kernel almost always produces a more scalable system than
writing to portable OS APIs because it eliminates the "lowest common
denominator" effect.
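
As a rough sketch of what "bypassing the OS kernel" looks like at the
lowest level, the Python below tries to read one block with O_DIRECT and
falls back to ordinary buffered I/O where that is not possible. The
function name read_block_direct and the 4096-byte ALIGN value are
assumptions made for the example; the real alignment requirement depends
on the device and filesystem, and some platforms and filesystems do not
support O_DIRECT at all.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement is device-dependent


def read_block_direct(path, block_no, block_size=ALIGN):
    """Read one block, bypassing the OS page cache where possible.

    Returns (data, used_direct). Falls back to buffered I/O if O_DIRECT is
    unavailable or rejected by the filesystem.
    """
    direct_flag = getattr(os, "O_DIRECT", 0)  # 0 on platforms without it
    attempts = [direct_flag, 0] if direct_flag else [0]

    # O_DIRECT generally needs an aligned buffer, offset and length.
    # An anonymous mmap is page-aligned, which covers the buffer.
    buf = mmap.mmap(-1, block_size)
    try:
        for extra in attempts:
            try:
                fd = os.open(path, os.O_RDONLY | extra)
                with open(fd, "rb", buffering=0) as f:  # f now owns fd
                    f.seek(block_no * block_size)
                    n = f.readinto(buf)
                return buf[:n], bool(extra)
            except OSError:
                if not extra:
                    raise  # the buffered attempt failed too: a real error
                # Otherwise the direct attempt failed; try buffered next.
    finally:
        buf.close()
```

A database kernel would go on to build its own cache and scheduling on top
of reads like this, which is where the database-specific tuning comes from;
the sketch stops at the single read.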

Having a database kernel isn't really important unless you are a
performance junkie or have to address really scalable database systems. 
As a practical matter, some more advanced DBMS features are easier to
implement on a database kernel, because the system model they are written
against is more database-friendly. It lets the database take
advantage of the more advanced features and optimizations of whatever
operating system it is running on without the vast majority of the DBMS
code base being aware of these significant differences.

I'd like to see Postgres move to a database kernel eventually for a lot
of reasons, but it would be a relatively significant change. Maybe v8? :-)

Cheers,

-James Rogers
 jamesr@best.com