Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Msg-id 16757.1389713975@sss.pgh.pa.us
In response to Re: Linux kernel impact on PostgreSQL performance  (Josh Berkus <josh@agliodbs.com>)
Responses Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Trond Myklebust <trondmy@gmail.com>)
Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (James Bottomley <James.Bottomley@HansenPartnership.com>)
List pgsql-hackers
James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> The current mechanism for coherency between a userspace cache and the
> in-kernel page cache is mmap ... that's the only way you get the same
> page in both currently.

Right.
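
For readers following along, the coherency property being agreed on here can be seen in a minimal sketch, assuming Linux and Python's standard mmap module (the function name shared_mapping_demo is made up for illustration). A store through a shared mapping is visible to a plain read of the same file with no write() call in between, because both go through the same page-cache page.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def shared_mapping_demo():
    """Return (bytes seen via the mapping, bytes seen via pread) after one store."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, PAGE)              # one zero-filled page
        with mmap.mmap(fd, PAGE) as m:      # Unix default: MAP_SHARED, read/write
            m[0:5] = b"dirty"               # plain memory store; no write() issued
            via_map = bytes(m[0:5])
            via_read = os.pread(fd, 5, 0)   # served from the same page-cache page
        return via_map, via_read
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(shared_mapping_demo())
```

From the moment of the store the page is dirty in the page cache, and nothing in this sketch controls when the kernel writes it back, which is the concern raised in the next exchange.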

> glibc used to have an implementation of read/write in terms of mmap, so
> it should be possible to insert it into your current implementation
> without a major rewrite.  The problem I think this brings you is
> uncontrolled writeback: you don't want dirty pages to go to disk until
> you issue a write()

Exactly.

> I think we could fix this with another madvise():
> something like MADV_WILLUPDATE telling the page cache we expect to alter
> the pages again, so don't be aggressive about cleaning them.

"Don't be aggressive" isn't good enough.  The prohibition on early write
has to be absolute, because writing a dirty page before we've done
whatever else we need to do results in a corrupt database.  It has to
be treated like a write barrier.
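
As an illustration of the ordering constraint, here is a rough sketch, not PostgreSQL's actual code. It assumes that "whatever else we need to do" is flushing a write-ahead log record covering the change; that reading, and the names BufferedPage, TinyStore, and demo, are assumptions made for this example. Pages live in private buffers, so the kernel never sees a change until the store itself calls pwrite(), and the store refuses to do so until the log is durable up to the page's last change.

```python
import os
import tempfile


class BufferedPage:
    """A page held in a private (non-mmap) buffer, invisible to the kernel."""

    def __init__(self, offset, size=8192):
        self.offset = offset
        self.data = bytearray(size)
        self.lsn = 0                      # log position of the last change


class TinyStore:
    """Hypothetical store enforcing log-before-data ordering."""

    def __init__(self, data_fd, log_fd):
        self.data_fd, self.log_fd = data_fd, log_fd
        self.log_end = 0                  # bytes appended to the log
        self.flushed = 0                  # bytes of log known to be durable

    def modify(self, page, pos, payload):
        self.log_end += os.write(self.log_fd, payload)   # log record first
        page.data[pos:pos + len(payload)] = payload       # then the private copy
        page.lsn = self.log_end

    def flush_log(self):
        os.fsync(self.log_fd)
        self.flushed = self.log_end

    def write_page(self, page):
        if page.lsn > self.flushed:       # the barrier: absolute, not advisory
            raise RuntimeError("log not flushed up to page LSN")
        os.pwrite(self.data_fd, page.data, page.offset)


def demo():
    data_fd, data_path = tempfile.mkstemp()
    log_fd, log_path = tempfile.mkstemp()
    try:
        store = TinyStore(data_fd, log_fd)
        page = BufferedPage(offset=0)
        store.modify(page, 0, b"tuple")
        before = os.pread(data_fd, 5, 0)  # data file is still empty
        try:
            store.write_page(page)
            refused = False
        except RuntimeError:
            refused = True
        store.flush_log()
        store.write_page(page)            # allowed only after the log flush
        after = os.pread(data_fd, 5, 0)
        return before, refused, after
    finally:
        for fd, path in ((data_fd, data_path), (log_fd, log_path)):
            os.close(fd)
            os.unlink(path)
```

With a shared mapping there is no equivalent of the check in write_page: the kernel may write the dirty page whenever it likes, which is why a hint that merely discourages early writeback does not help.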

> The problem is we can't give you absolute control of when pages are
> written back because that interface can be used to DoS the system: once
> we get too many dirty uncleanable pages, we'll thrash looking for memory
> and the system will livelock.

Understood, but that makes this direction a dead end.  We can't use
it if the kernel might decide to write anyway.
        regards, tom lane


