Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Msg-id CA+TgmoY8a5_SORA3tzojpPMj6zb9pym05N63PP0zuAenoxtNmw@mail.gmail.com
In response to Re: Linux kernel impact on PostgreSQL performance  (Josh Berkus <josh@agliodbs.com>)
Responses Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (Claudio Freire <klaussfreire@gmail.com>)
Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance  (James Bottomley <James.Bottomley@HansenPartnership.com>)
List pgsql-hackers
On Tue, Jan 14, 2014 at 12:20 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2014-01-14 at 15:15 -0200, Claudio Freire wrote:
>> On Tue, Jan 14, 2014 at 2:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > In terms of avoiding double-buffering, here's my thought after reading
>> > what's been written so far.  Suppose we read a page into our buffer
>> > pool.  Until the page is dirtied, it would be ideal for the mapping to
>> > be shared between the buffer cache and our pool, sort of like
>> > copy-on-write.  That way, if we decide to evict the page, it will
>> > still be in the OS cache if we end up needing it again (remember, the
>> > OS cache is typically much larger than our buffer pool).  But if the
>> > page is dirtied, then instead of copying it, just have the buffer pool
>> > forget about it, because at that point we know we're going to write
>> > the page back out anyway before evicting it.
>> >
>> > This would be pretty similar to copy-on-write, except without the
>> > copying.  It would just be forget-from-the-buffer-pool-on-write.
>>
>> But... either copy-on-write or forget-on-write needs a page fault, and
>> thus a page mapping.
>>
>> Is a page fault more expensive than copying 8k?
>>
>> (I really don't know).
>
> A page fault can be expensive, yes ... but perhaps you don't need one.
>
> What you want is a range of memory that's read from a file but treated
> as anonymous for writeout (i.e. written to swap if we need to reclaim
> it). Then at some time later, you want to designate it as written back
> to the file instead so you control the writeout order.  I'm not sure we
> can do this: the separation between file backed and anonymous pages is
> pretty deeply ingrained into the OS, but if it were possible, is that
> what you want?

Doesn't sound exactly like what I had in mind.  What I was suggesting
is an analogue of read() that, if it reads full pages of data to a
page-aligned address, shares the data with the buffer cache until it's
first written instead of actually copying the data.  The pages are
write-protected so that an attempt to write the address range causes a
page fault.  In response to such a fault, the pages become anonymous
memory and the buffer cache no longer holds a reference to the page.
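
For comparison, the closest behaviour available today is probably a
private file mapping, sketched below in Python. This is not the
interface proposed above, which does not exist: a private mapping
shares clean pages with the page cache and faults on the first write,
but the kernel then copies the page for the writer and keeps its own
copy cached, whereas the proposal would hand the page over without a
copy. The helper name demo_private_mapping and the one-page temporary
file are made up for illustration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # map exactly one full page, as the proposal assumes


def demo_private_mapping():
    """Return (mapped bytes before write, mapped bytes after write, file bytes).

    Hypothetical helper: illustrates existing copy-on-write mappings only,
    not the proposed read()-like call.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * PAGE)  # one page of file data
        # ACCESS_COPY requests a private, writable mapping: reads are served
        # from the shared page-cache page until the first write to it.
        m = mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY)
        try:
            before = m[:4]
            m[:4] = b"BBBB"  # first write: the page becomes private to us
            after = m[:4]
        finally:
            m.close()
        # The write never reaches the file; the cached page is unchanged.
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 4)
        return before, after, on_disk
    finally:
        os.close(fd)
        os.unlink(path)
```

After the write, the mapping sees the new bytes while the file still
holds the old ones, which is the "becomes anonymous memory" half of the
idea. The missing half is that the kernel would drop its reference
instead of making a copy.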

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


