Thread: [HACKERS] Async IO description

[HACKERS] Async IO description

From
Zeugswetter Andreas SARZ
Date:
> When using aio for file or raw device access the following functions
> have to be used (from sys/aio.h):
>
    int     aio_read(int, struct aiocb *);
    int     aio_write(int, struct aiocb *);
    int     aio_cancel(int, struct aiocb *);
    int     aio_suspend(int, struct aiocb *[]);

    The main advantage is not read-ahead or the like (read-ahead can be
    accomplished by other means, e.g. separate reader and writer processes).
    The main advantage is that a process that calls these for IO will not
    be suspended by the operating system, and can therefore do other work
    until the data is available. On fast disks the data will be available
    before the process time slice (20 - 50 ms) is over!
    A process using normal read or write will have to wait until
    all other processes have consumed their time slice.
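
To make the "do other work until the data is available" point concrete, here is a minimal sketch. It assumes the POSIX <aio.h> form of the calls, where the file descriptor is stored in the aiocb instead of being passed as a separate argument as in the prototypes listed above. The function name read_while_working and the work counter are hypothetical and only stand in for real work.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Start an asynchronous read, keep the process busy while the request
 * is in flight, then collect the result.  Returns the number of bytes
 * read, or -1 with errno set.  *work_done counts how many units of
 * "other work" were done before the read completed.
 */
ssize_t read_while_working(int fd, void *buf, size_t len, off_t off,
                           unsigned long *work_done)
{
    struct aiocb cb;
    int err;
    ssize_t n;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    *work_done = 0;
    if (aio_read(&cb) != 0)          /* queue the request, do not block */
        return -1;

    /* The process is not suspended here; it polls between work units. */
    while (aio_error(&cb) == EINPROGRESS)
        (*work_done)++;              /* placeholder for useful work */

    err = aio_error(&cb);            /* 0 on success, else an errno value */
    n = aio_return(&cb);             /* must be called exactly once */
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```

A real backend would do something useful in the loop, or call aio_suspend once it runs out of work, instead of spinning on aio_error.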

    I think the first step should be separate global IO processes;
    these could then, in a second step, use aio.

    Andreas

Re: [HACKERS] Async IO description

From
ocie@paracel.com
Date:
Zeugswetter Andreas SARZ wrote:
>
> > When using aio for file or raw device access the following functions
> > have to be used (from sys/aio.h):
> >
>     int     aio_read(int, struct aiocb *);
>     int     aio_write(int, struct aiocb *);
>     int     aio_cancel(int, struct aiocb *);
>     int     aio_suspend(int, struct aiocb *[]);
>
>     The main advantage is not read-ahead or the like (read-ahead can be
>     accomplished by other means, e.g. separate reader and writer processes).
>     The main advantage is that a process that calls these for IO will not
>     be suspended by the operating system, and can therefore do other work
>     until the data is available. On fast disks the data will be available
>     before the process time slice (20 - 50 ms) is over!
>     A process using normal read or write will have to wait until
>     all other processes have consumed their time slice.
>
>     I think the first step should be separate global IO processes;
>     these could then, in a second step, use aio.
>
>     Andreas
>

This will limit us to operating systems that support POSIX aio.  This
means Linux (in the future), Solaris (also in the future) and
presumably FreeBSD.  Developing the support for AIO before we have a
place to test it could lead to trouble.  We should also have an
alternative for those systems that don't (or won't) support POSIX aio.

One solution to this might be to write a group of AIO macros for
postgres.  If done correctly, they could be implemented as calls to
the POSIX AIO functions on systems that support them, and could
call the normal I/O functions on systems without POSIX AIO.
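
A rough sketch of what such a wrapper layer might look like follows. Everything named here is hypothetical: the USE_POSIX_AIO build flag, the pg_aiocb type, and the pg_aio_read / pg_aio_wait names are made up for illustration, and small static functions are used in place of macros to keep the example readable. Without the flag, the "asynchronous" read is simply done synchronously with pread() and the result is handed back by the wait call.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* for pread() under strict -std= modes */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef USE_POSIX_AIO            /* hypothetical build flag */
#include <aio.h>
#include <string.h>

typedef struct aiocb pg_aiocb;  /* hypothetical type name */

/* Queue the read and return at once: 0 on success, -1 on error. */
static int pg_aio_read(pg_aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    return aio_read(cb);
}

/* Block until the request is done; return bytes read, or -1 with errno. */
static ssize_t pg_aio_wait(pg_aiocb *cb)
{
    const struct aiocb *list[1] = { cb };
    int err;
    ssize_t n;

    while ((err = aio_error(cb)) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL);  /* may wake early on a signal */
    n = aio_return(cb);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}

#else                           /* no POSIX AIO: fall back to normal I/O */

typedef struct {
    ssize_t result;             /* what pread() returned */
    int     err;                /* errno saved at the time of the read */
} pg_aiocb;

/* Do the read right away and remember the outcome for pg_aio_wait(). */
static int pg_aio_read(pg_aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    cb->result = pread(fd, buf, len, off);
    cb->err = (cb->result < 0) ? errno : 0;
    return 0;
}

/* Nothing to wait for; just report the saved result. */
static ssize_t pg_aio_wait(pg_aiocb *cb)
{
    if (cb->result < 0)
        errno = cb->err;
    return cb->result;
}
#endif
```

Callers would write pg_aio_read(&cb, fd, buf, len, off), do other work, and then call pg_aio_wait(&cb). On a system without POSIX AIO the code still runs; it just gets no overlap between I/O and computation.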

Also, in order to make the most effective use of AIO, the program will
have to undergo a major rewrite.  Just a short example to ponder (get
me flamed :)

Suppose we are doing a search with a btree index.  We read in the
first page and find that we will need to read in four of its "child"
pages.  We issue an aio_read for each page and aio_suspend until one
of them comes in.  Then we have to figure out which one is ready, go
work on that page, etc.
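
The pattern above could be sketched roughly as follows, assuming the POSIX <aio.h> interface. The names read_children, page_fn, record_first_byte, NCHILD and PAGE_BYTES are all hypothetical, the page size of 8192 bytes is an assumption, and nothing here reflects how the postgres btree code is actually organized.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define NCHILD     4            /* assumed number of child pages */
#define PAGE_BYTES 8192         /* assumed page size */

/* Callback invoked once per child page, in completion order. */
typedef void (*page_fn)(int slot, const char *page, ssize_t nbytes, void *arg);

/*
 * Issue one aio_read per child page, then sleep in aio_suspend until at
 * least one request finishes and hand each finished page to visit().
 * Returns 0 if every read succeeded, -1 otherwise.
 */
int read_children(int fd, const off_t offsets[NCHILD], page_fn visit, void *arg)
{
    char pages[NCHILD][PAGE_BYTES];
    struct aiocb cbs[NCHILD];
    const struct aiocb *pending[NCHILD];
    int remaining = 0, rc = 0, i;

    memset(cbs, 0, sizeof cbs);
    for (i = 0; i < NCHILD; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = pages[i];
        cbs[i].aio_nbytes = PAGE_BYTES;
        cbs[i].aio_offset = offsets[i];
        if (aio_read(&cbs[i]) == 0) {
            pending[i] = &cbs[i];
            remaining++;
        } else {
            pending[i] = NULL;          /* could not queue this one */
            rc = -1;
        }
    }

    while (remaining > 0) {
        /* NULL entries are ignored; an early wakeup just rescans. */
        (void) aio_suspend(pending, NCHILD, NULL);

        /* Figure out which requests are ready. */
        for (i = 0; i < NCHILD; i++) {
            int err;
            ssize_t n;

            if (pending[i] == NULL)
                continue;
            err = aio_error(&cbs[i]);
            if (err == EINPROGRESS)
                continue;
            n = aio_return(&cbs[i]);
            if (err != 0)
                rc = -1;
            else
                visit(i, pages[i], n, arg);
            pending[i] = NULL;
            remaining--;
        }
    }
    return rc;
}

/* Example callback: record the first byte of each page in a char array. */
void record_first_byte(int slot, const char *page, ssize_t nbytes, void *arg)
{
    ((char *) arg)[slot] = (nbytes > 0) ? page[0] : '\0';
}
```

The bookkeeping in the inner loop is the "figure out which one is ready" step: aio_suspend only says that something finished, so each outstanding request has to be checked with aio_error.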

Lastly, aio reads and writes require memory copying, which can slow
things down.  Memory mapping doesn't have this problem -- what you
write is copied directly to the disk without being copied to another
buffer.
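
For comparison, a minimal sketch of updating a file through a shared mapping, assuming POSIX mmap and msync. The function name set_byte_via_mmap is hypothetical, the descriptor must be open for reading and writing, and the file must already be at least file_len bytes long. Whether this is actually faster than aio depends on the operating system.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Change one byte of an open file by storing into a MAP_SHARED mapping.
 * The store goes into the pages that back the file, so no separate user
 * buffer is involved.  Returns 0 on success, -1 on error.
 */
int set_byte_via_mmap(int fd, size_t file_len, size_t pos, char value)
{
    char *map;
    int rc;

    if (file_len == 0 || pos >= file_len)
        return -1;

    map = mmap(NULL, file_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    map[pos] = value;                       /* write in place */

    rc = msync(map, file_len, MS_SYNC);     /* push the change to disk */
    if (munmap(map, file_len) != 0)
        rc = -1;
    return rc;
}
```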

Well, enough rambling for one day.

Ocie