Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Jan Kara
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date
Msg-id 20140115150320.GD9141@quack.suse.cz
In response to Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Hannu Krosing <hannu@2ndQuadrant.com>)
List pgsql-hackers
On Wed 15-01-14 14:38:44, Hannu Krosing wrote:
> On 01/15/2014 02:01 PM, Jan Kara wrote:
> > On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
> >> On 01/14/2014 06:12 PM, Robert Haas wrote:
> >>> This would be pretty similar to copy-on-write, except
> >>> without the copying. It would just be
> >>> forget-from-the-buffer-pool-on-write. 
> >> +1
> >>
> >> A version of this could probably already be implemented using MADV_DONTNEED
> >> and MADV_WILLNEED.
> >>
> >> That is, just after reading the page in, use MADV_DONTNEED on it. When
> >> evicting a clean page, check that it is still in cache and if it is, then
> >> MADV_WILLNEED it.
> >>
> >> Another nice thing to do would be dynamically adjusting the kernel
> >> dirty_background_ratio and other related knobs in real time based on how
> >> many buffers are dirty inside postgresql. Maybe in background writer.
> >>
> >> Question to LKM folks - will kernel react well to frequent changes to
> >> /proc/sys/vm/dirty_* ?
> >> How frequent can they be (every few seconds? every second? 100Hz ?)
> >   So the question is what do you mean by 'react'. We check whether we
> > should start background writeback every dirty_writeback_centisecs (5s). We
> > will also check whether we have exceeded the background dirty limit (and
> > wake the writeback thread) when dirtying pages. However, this check happens once
> > per several dirtied MB (unless we are close to dirty_bytes).
> >
> > When writeback is running we check roughly once per second (the logic is
> > more complex there but I don't think explaining details would be useful
> > here) whether we are below dirty_background_bytes and stop writeback in
> > that case.
> >
> > So changing dirty_background_bytes every few seconds should work
> > reasonably, once a second is pushing it and 100 Hz - no way. But I'd also
> > note that you have conflicting requirements on the kernel writeback. On one
> > hand you want checkpoint data to steadily trickle to disk (well, trickle
> > isn't exactly the proper word since if you need to checkpoint 16 GB every 5
> > minutes then you need a steady throughput of ~50 MB/s just for
> > checkpointing) so you want to set dirty_background_bytes low, on the other
> > hand you don't want temporary files to get to disk so you want to set
> > dirty_background_bytes high. 
> Is it possible to have more fine-grained control over writeback, like
> configuring dirty_background_bytes per file system / device (or even
> a file or a group of files) ?
  Currently it isn't possible to tune dirty_background_bytes per device
directly. However, see below.

> If not, then how hard would it be to provide this ?
  We do track the amount of dirty pages per device, and the thread doing the
flushing is also per device. The thing is that currently we compute the
per-device background limit as dirty_background_bytes * p, where p is the
proportion of writeback happening on this device to total writeback in the
system (computed as a floating average with exponential time-based backoff).
BTW, the maximum per-device dirty limit is similarly derived from the global
dirty_bytes. And you can also set bounds on the proportion
'p' in /sys/block/sda/bdi/{min,max}_ratio, so in theory you should be able
to set a fixed background limit for a device by setting matching min and max
proportions.
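
As a rough illustration of the arithmetic described above, here is a minimal
Python sketch. It is a simplified model of the description in this message,
not the kernel's actual code; the function name and its parameters are made
up for the example, and the ratios are assumed to be percentages as in the
{min,max}_ratio files.

```python
def per_device_background_limit(dirty_background_bytes, p,
                                min_ratio=0, max_ratio=100):
    """Model the per-device background limit as dirty_background_bytes * p.

    p is the device's share of recent writeback, in the range [0, 1].
    min_ratio and max_ratio are percentages that bound p (hypothetical
    stand-ins for the bdi min_ratio and max_ratio settings).
    """
    if not 0 <= min_ratio <= max_ratio <= 100:
        raise ValueError("need 0 <= min_ratio <= max_ratio <= 100")
    low = min_ratio / 100.0
    high = max_ratio / 100.0
    # Clamp the measured proportion into the configured bounds.
    clamped = min(max(p, low), high)
    return int(dirty_background_bytes * clamped)


if __name__ == "__main__":
    one_gib = 1 << 30
    # With matching bounds, the result no longer depends on p.
    for share in (0.0, 0.5, 0.9):
        limit = per_device_background_limit(one_gib, share,
                                            min_ratio=25, max_ratio=25)
        print(share, limit)
```

With min_ratio equal to max_ratio, the clamped proportion is constant, which
is why matching bounds should give a fixed limit in this model.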
                            Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


