Re: [HACKERS] Async IO description - Mailing list pgsql-hackers

From ocie@paracel.com
Subject Re: [HACKERS] Async IO description
Date
Msg-id 9804222101.AA28998@dolomite.paracel.com
In response to [HACKERS] Async IO description  (Zeugswetter Andreas SARZ <Andreas.Zeugswetter@telecom.at>)
List pgsql-hackers
Zeugswetter Andreas SARZ wrote:
>
>
> > When using aio for file or raw device access the following functions
> > have to be used (from sys/aio.h):
> >
>     int     aio_read(int, struct aiocb *);
>     int     aio_write(int, struct aiocb *);
>     int     aio_cancel(int, struct aiocb *);
>     int     aio_suspend(int, struct aiocb *[]);
>
>     The main advantage is not read ahead or the like (read ahead can be
>     accomplished with other means, e.g. separate reader and writer
>     processes).
>     The main advantage is, that a process that calls these for IO will not
>     be suspended by the OPsys, and can therefore do other work
>     until the data is available. On fast disks the data will be available
>     before the process time slice (20 - 50 ms) is over !
>     A process using normal read or write will have to wait until
>     all other processes have consumed their time slice.
>
>     I think the first step should be separate global IO processes,
>     these could then in a second step use aio.
>
>     Andreas
>
>

This will limit us to operating systems that support POSIX aio.  This
means Linux (in the future), Solaris (also in the future) and
presumably FreeBSD.  Developing the support for AIO before we have a
place to test it could lead to trouble.  We should also have an
alternative for those systems that don't (or won't) support POSIX aio.

One solution to this might be to write a group of AIO macros for
postgres.  If done correctly, they could be implemented as calls to
the POSIX AIO functions on systems that support them, and could
call the normal I/O functions on systems without POSIX AIO.
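
A rough sketch of what such a layer might look like, written as small
functions rather than macros.  Everything named here is made up for
illustration: PG_USE_POSIX_AIO, PgAioReq, pg_aio_read and pg_aio_wait
do not exist in postgres.  It also assumes the one-argument POSIX form
aio_read(struct aiocb *), and uses pread() as the fallback:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef PG_USE_POSIX_AIO             /* hypothetical build-time switch */
#include <aio.h>
#endif

/* Hypothetical request handle; not an existing postgres type. */
typedef struct PgAioReq
{
#ifdef PG_USE_POSIX_AIO
    struct aiocb cb;                /* control block used by POSIX AIO */
#endif
    ssize_t     result;             /* bytes transferred, or -1 on error */
    int         error;              /* errno value when result is -1 */
} PgAioReq;

/* Start a read of nbytes at offset.  Returns 0 if the request was accepted. */
static int
pg_aio_read(PgAioReq *req, int fd, void *buf, size_t nbytes, off_t offset)
{
    assert(req != NULL && buf != NULL);
    memset(req, 0, sizeof(*req));
#ifdef PG_USE_POSIX_AIO
    req->cb.aio_fildes = fd;
    req->cb.aio_buf = buf;
    req->cb.aio_nbytes = nbytes;
    req->cb.aio_offset = offset;
    return aio_read(&req->cb);
#else
    /* Fallback: read synchronously, so the request is complete on return. */
    req->result = pread(fd, buf, nbytes, offset);
    req->error = (req->result < 0) ? errno : 0;
    return 0;
#endif
}

/* Wait for the request to finish; returns the byte count, or -1 on error. */
static ssize_t
pg_aio_wait(PgAioReq *req)
{
#ifdef PG_USE_POSIX_AIO
    const struct aiocb *list[1] = { &req->cb };

    /* aio_suspend may return early on a signal, so re-check the status. */
    while ((req->error = aio_error(&req->cb)) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL);
    req->result = aio_return(&req->cb);
#endif
    if (req->result < 0)
        errno = req->error;
    return req->result;
}
```

A caller would issue pg_aio_read(), do other work, and call
pg_aio_wait() only when it needs the data.  On a system without POSIX
AIO the same code still runs; it just blocks inside pg_aio_read().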

Also, in order to make the most effective use of AIO, the program will
have to undergo a major rewrite.  Just a short example to ponder (and
get me flamed :)

Suppose we are doing a search with a btree index.  We read in the
first page and find that we will need to read in four of its "child"
pages.  We issue an aio_read for each page and aio_suspend until one
of them comes in.  Then we have to figure out which one is ready, go
work on that page, and so on until all four have been handled.
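
To make that concrete, here is a minimal sketch of the loop, assuming
the POSIX forms aio_read(struct aiocb *) and
aio_suspend(list, nent, timeout).  PG_PAGE_SIZE, MAX_CHILDREN,
read_children and record_first_byte are invented for the example, and
the handler is only a stand-in for the real per-page work:

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PG_PAGE_SIZE 8192           /* assumed page size */
#define MAX_CHILDREN 16             /* assumed upper bound on fan-out */

typedef void (*PageHandler) (int child, const char *page, ssize_t nbytes,
                             void *arg);

/* Stand-in for "go work on that page": remember each page's first byte. */
static void
record_first_byte(int child, const char *page, ssize_t nbytes, void *arg)
{
    if (nbytes > 0)
        ((char *) arg)[child] = page[0];
}

/*
 * Read nchild pages at the given file offsets and hand each one to
 * handle() as soon as it arrives.  Returns the number of pages handled,
 * or -1 if the page buffer could not be allocated.
 */
static int
read_children(int fd, const off_t *offsets, int nchild,
              PageHandler handle, void *arg)
{
    struct aiocb cbs[MAX_CHILDREN];
    const struct aiocb *pending[MAX_CHILDREN];
    char       *pages;
    int         i, remaining = 0, handled = 0;

    assert(nchild > 0 && nchild <= MAX_CHILDREN);
    pages = malloc((size_t) nchild * PG_PAGE_SIZE);
    if (pages == NULL)
        return -1;

    /* Queue one read per child page. */
    memset(cbs, 0, sizeof(cbs));
    for (i = 0; i < nchild; i++)
    {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = pages + (size_t) i * PG_PAGE_SIZE;
        cbs[i].aio_nbytes = PG_PAGE_SIZE;
        cbs[i].aio_offset = offsets[i];
        pending[i] = (aio_read(&cbs[i]) == 0) ? &cbs[i] : NULL;
        if (pending[i] != NULL)
            remaining++;
    }

    while (remaining > 0)
    {
        /* Sleep until some request finishes; NULL entries are ignored. */
        (void) aio_suspend(pending, nchild, NULL);

        /* Figure out which ones are ready and work on them. */
        for (i = 0; i < nchild; i++)
        {
            ssize_t     n;

            if (pending[i] == NULL || aio_error(&cbs[i]) == EINPROGRESS)
                continue;
            n = aio_return(&cbs[i]);
            pending[i] = NULL;
            remaining--;
            if (n >= 0)
            {
                handle(i, pages + (size_t) i * PG_PAGE_SIZE, n, arg);
                handled++;
            }
        }
    }
    free(pages);
    return handled;
}
```

Note that the pages are handled in completion order, not in the order
they were requested, which is exactly the part that forces the calling
code to be restructured.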

Lastly, aio reads and writes require memory copying, which can slow
things down.  Memory mapping doesn't have this problem -- what you
write goes into the mapped pages and from there directly to the disk,
without being copied to another buffer.
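
For comparison, a small sketch of the memory-mapped route using
mmap() with MAP_SHARED and msync().  The helper name mmap_write is
invented for the example, and it assumes the target range already
lies inside the file (it does not extend the file):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite len bytes at offset in an existing file through a shared
 * mapping.  Returns 0 on success, -1 on failure.
 */
static int
mmap_write(const char *path, off_t offset, const void *data, size_t len)
{
    struct stat st;
    char       *base;
    int         fd, rc;

    assert(path != NULL && data != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || offset < 0 ||
        (size_t) offset + len > (size_t) st.st_size)
    {
        close(fd);
        return -1;
    }

    base = mmap(NULL, (size_t) st.st_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    /* Stores land in the mapped file pages; there is no write() call. */
    memcpy(base + offset, data, len);

    /* Ask the kernel to push the dirty pages out to disk before returning. */
    rc = msync(base, (size_t) st.st_size, MS_SYNC);
    munmap(base, (size_t) st.st_size);
    return rc;
}
```

The memcpy here only stands in for whatever code modifies the page; a
real user of the mapping would work on the mapped memory in place.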

Well, enough rambling for one day.

Ocie
