asynchronous disk io (was : tuplesort memory usage) - Mailing list pgsql-hackers
| From | johnlumby |
|---|---|
| Subject | asynchronous disk io (was : tuplesort memory usage) |
| Date | |
| Msg-id | 502EC58F.3050003@hotmail.com |
| List | pgsql-hackers |
> Date: Fri, 17 Aug 2012 00:26:37 +0100
> From: Peter Geoghegan <peter@2ndquadrant.com>
> To: Jeff Janes <jeff.janes@gmail.com>
> Cc: pgsql-hackers <pgsql-hackers@postgresql.org>
> Subject: Re: tuplesort memory usage: grow_memtuples
> Message-ID: <CAEYLb_VeZpKDX54VEx3X30oy_UOTh89XoejJW6aucjjiUjskXw@mail.gmail.com>
>
> On 27 July 2012 16:39, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Can you suggest a benchmark that will usefully exercise this patch?
>>
>> I think the given sizes below work on most 64 bit machines.
> [...]
>
> I think this patch (or at least your observation about I/O waits
> within vmstat) may point to a more fundamental issue with our sort
> code: Why are we not using asynchronous I/O in our implementation?
> There are anecdotal reports of other RDBMS implementations doing far
> better than we do here, and I believe asynchronous I/O, pipelining,
> and other such optimisations have a lot to do with that. It's
> something I'd hoped to find the time to look at in detail, but
> probably won't in the 9.3 cycle. One of the more obvious ways of
> optimising an external sort is to use asynchronous I/O so that one run
> of data can be sorted or merged while other runs are being read from
> or written to disk. Our current implementation seems naive about this.
> There are some interesting details about how this is exposed by POSIX
> here:
>
> http://www.gnu.org/software/libc/manual/html_node/Asynchronous-I_002fO.html

I've recently tried extending the PostgreSQL prefetch mechanism on Linux to use the POSIX (i.e. librt) aio_read() and friends where possible. In other words, in PrefetchBuffer(), try getting a buffer and issuing aio_read() before falling back to posix_fadvise(). It gives me about an 8% improvement in throughput relative to the posix_fadvise variety, for a workload of 16 highly disk-read-intensive applications running against 16 backends. For my test, each application runs a query chosen to have plenty of bitmap heap scans.
I can provide more details on my changes if anyone is interested.

On whether this technique might improve sort performance: first, the disk access pattern for sorting is mostly sequential (although I think the sort module does some tricky work with reuse of pages in its "logtape" files, which may be random-like), and there are several claims on the net that Linux buffered file handling already does a pretty good job of read-ahead for a sequential access pattern without any need for the application to help it. I can half-confirm that, in that I tried adding calls to PrefetchBuffer in the regular heap scan and did not see much improvement. But I am still pursuing that area. Second, it would be easy enough to add some posix_fadvise calls to sort and see whether that helps. (Sort can't make use of PrefetchBuffer, since it does not use the regular relation buffer pool.)

> It's already anticipated that we might take advantage of libaio for
> the benefit of FilePrefetch() (see its accompanying comments - it uses
> posix_fadvise itself - effective_io_concurrency must be > 0 for this
> to ever be called). It perhaps could be considered parallel
> "low-hanging fruit" in that it allows us to offer limited though
> useful backend parallelism without first resolving thorny issues
> around what abstraction we might use, or how we might eventually make
> backends thread-safe. AIO supports registering signal callbacks (a
> SIGPOLL handler can be called), which seems relatively
> uncontroversial.

I believe libaio is dead, as it depended on the old Linux kernel asynchronous file I/O, which was problematic and imposed various restrictions on the application. librt AIO has no such restrictions and does a good enough job, but it uses pthreads issuing synchronous I/O, which can make the CPU overhead a bit heavy; I also believe it causes more context switching than plain synchronous I/O would, whereas one of the benefits of kernel async I/O (in theory) is reduced context switching.
From what I've seen, pthreads AIO can give a benefit when there is high I/O wait from mostly-read activity, the disk access pattern is not sequential (so kernel readahead can't predict it) but PostgreSQL can predict it, and there is enough spare idle CPU to run the pthreads. So it does seem that bitmap heap scan is a good choice for prefetching.

> Platform support for AIO might be a bit lacking, but then you can say
> the same about posix_fadvise. We don't assume that poll(2) is
> available, but we already use it where it is within the latch code.
> Besides, in-kernel support can be emulated if POSIX threads is
> available, which I believe would make this broadly useful on unix-like
> platforms.
>
> --
> Peter Geoghegan http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training and Services