Re: Optimize kernel readahead using buffer access strategy - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: Optimize kernel readahead using buffer access strategy
Date
Msg-id CAGTBQpaFC_z=zdWVAXD8wWss3v6jxZ5pNmrrYPsD23LbrqGvgQ@mail.gmail.com
Whole thread Raw
In response to Optimize kernel readahead using buffer access strategy  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Responses Re: Optimize kernel readahead using buffer access strategy  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List pgsql-hackers
On Thu, Nov 14, 2013 at 9:09 AM, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:
> I create a patch that is improvement of disk-read and OS file caches. It can
> optimize kernel readahead parameter using buffer access strategy and
> posix_fadvice() in various disk-read situations.
>
> In general OS, readahead parameter was dynamically decided by disk-read
> situations. If long time disk-read was happened, readahead parameter becomes big.
> However it is based on experienced or heuristic algorithm, it causes waste
> disk-read and throws out useful OS file caches in some case. It is bad for
> disk-read performance a lot.

It would be relevant to know which kernel did you use for those tests.

@@ -677,6 +677,7 @@ mdread(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,                 errmsg("could not seek to block %u in file \"%s\": %m",
blocknum,FilePathName(v->mdfd_vfd))));
 

+    BufferHintIOAdvise(v->mdfd_vfd, buffer, BLCKSZ, strategy);    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ);
    TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,

A while back, I tried to use posix_fadvise to prefetch index pages. I
ended up finding out that interleaving posix_fadvise with I/O like
that severly hinders (ie: completely disables) the kernel's read-ahead
algorithm.

How exactly did you set up those benchmarks? pg_bench defaults?

pg_bench does not exercise heavy sequential access patterns, or long
index scans. It performs many single-page index lookups per
transaction and that's it. You may want to try your patch with more
real workloads, and maybe you'll confirm what I found out last time I
messed with posix_fadvise. If my experience is still relevant, those
patterns will have suffered a severe performance penalty with this
patch, because it will disable kernel read-ahead on sequential index
access. It may still work for sequential heap scans, because the
access strategy will tell the kernel to do read-ahead, but many other
access methods will suffer.

Try OLAP-style queries.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: logical changeset generation v6.7
Next
From: Heikki Linnakangas
Date:
Subject: Re: init_sequence spill to hash table