Re: WAL & SHM principles - Mailing list pgsql-hackers

From Martin Devera
Subject Re: WAL & SHM principles
Msg-id Pine.LNX.4.10.10103091513430.12401-100000@luxik.cdi.cz
In response to Re: WAL & SHM principles  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: WAL & SHM principles  (Giles Lean <giles@nemeton.com.au>)
List pgsql-hackers
> > BTW, what means "bummer" ?
> 
> Sorry, it means, "Oh, I am disappointed."

thanks :)

> > But for many OSes you CAN control when to write data - you can mlock
> > individual pages.
> 
> mlock() controls locking in physical memory.  I don't see it controlling
> write().

When you mmap, you don't use write()!
mlock actually locks the page in memory, and as long as the page is locked
the OS doesn't attempt to write out the dirty page. It is also intended
for security applications, to ensure that sensitive data is not written to
insecure storage (hdd). That is how mlock is defined, so you can probably
rely on it.
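
To make the idea concrete, here is a minimal sketch, assuming a POSIX
system. The helper name update_page, its parameters and the one-page file
layout are made up for illustration; nothing here is existing pg code.
One caveat to check on each target OS: POSIX only promises that mlock
keeps the pages resident in memory, so whether write-back of a dirty
file-backed page is really deferred while it is locked is an assumption of
this sketch, not a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first page of `path`, try to pin it,
 * modify it in place, then force it to disk.
 * *pinned is set to 1 if mlock succeeded, 0 if the caller would need
 * a fallback. Returns 0 on success, -1 on error. */
int update_page(const char *path, const char *payload, int *pinned)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* make sure the page exists */
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return -1;

    /* Pin the page; this can fail, e.g. when RLIMIT_MEMLOCK is too low. */
    *pinned = (mlock(p, len) == 0);

    /* Modify the page directly in the mapping: no write() involved. */
    size_t n = strlen(payload);
    if (n >= len)
        n = len - 1;
    memcpy(p, payload, n);
    p[n] = '\0';

    /* The matching log record would be flushed at this point. */

    int rc = msync(p, len, MS_SYNC);        /* now the page may go to disk */
    if (*pinned)
        munlock(p, len);
    munmap(p, len);
    return rc;
}
```

A caller would do something like update_page("/tmp/demo.dat", "hello",
&pinned) and fall back to another scheme when pinned comes back 0.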

There is a way to do it without mlock (fallback):
You definitely need some kind of page header. The header should say
whether the page can be mmapped or is in the "dirty pool". Pages in the
dirty pool are pages which are dirty but not yet written, and which are
waiting for the appropriate log record to be flushed. Once the log is
flushed, the data in the dirty pool can be copied to its regular mmap
location and discarded.

If the dirty pool grows too large, simply sync the log; then the whole
pool can be copied out and discarded.
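
The dirty pool fallback could be sketched roughly as below. This is only
an illustration under loud assumptions: the in-memory array `mapped`
stands in for the mmapped data file, log_flush stands in for a real
fsync of the log, and all names, sizes and the LSN counter are invented
for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  64   /* tiny pages keep the sketch readable */
#define NUM_PAGES  8
#define POOL_SLOTS 4

typedef struct {
    uint64_t lsn;       /* log record that must be durable before copy-out */
    uint32_t page_no;   /* home location of the page */
    int      in_use;    /* header flag: 1 while the page is in the dirty pool */
    char     data[PAGE_SIZE];
} PoolSlot;

static char     mapped[NUM_PAGES][PAGE_SIZE]; /* stand-in for the mmapped file */
static PoolSlot pool[POOL_SLOTS];
static uint64_t flushed_lsn;                  /* highest LSN known to be on disk */
static uint64_t next_lsn = 1;

/* Stand-in for syncing the log up to `upto`. */
void log_flush(uint64_t upto) { if (upto > flushed_lsn) flushed_lsn = upto; }

/* Copy every pool page whose log record is durable to its home location
 * and discard it from the pool. Returns the number of pages released. */
int pool_release(void)
{
    int released = 0;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (pool[i].in_use && pool[i].lsn <= flushed_lsn) {
            memcpy(mapped[pool[i].page_no], pool[i].data, PAGE_SIZE);
            pool[i].in_use = 0;
            released++;
        }
    }
    return released;
}

/* Return the pool slot already holding page_no, else a free slot, else NULL. */
static PoolSlot *find_slot(uint32_t page_no)
{
    PoolSlot *free_slot = NULL;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (pool[i].in_use && pool[i].page_no == page_no)
            return &pool[i];
        if (!pool[i].in_use && free_slot == NULL)
            free_slot = &pool[i];
    }
    return free_slot;
}

/* Modify a page: the change goes to the dirty pool, never straight to the
 * mapped location. Returns the LSN of the (imaginary) log record. */
uint64_t page_write(uint32_t page_no, const char *text)
{
    assert(page_no < NUM_PAGES);
    PoolSlot *slot = find_slot(page_no);
    if (slot == NULL) {                 /* pool too large: sync log, empty it */
        log_flush(next_lsn - 1);
        pool_release();
        slot = find_slot(page_no);
    }
    assert(slot != NULL);
    if (!slot->in_use) {                /* start from the current page image */
        memcpy(slot->data, mapped[page_no], PAGE_SIZE);
        slot->page_no = page_no;
        slot->in_use = 1;
    }
    strncpy(slot->data, text, PAGE_SIZE - 1);
    slot->data[PAGE_SIZE - 1] = '\0';
    slot->lsn = next_lsn++;
    return slot->lsn;
}

/* Readers must see the newest version, so check the pool first. */
const char *page_read(uint32_t page_no)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        if (pool[i].in_use && pool[i].page_no == page_no)
            return pool[i].data;
    return mapped[page_no];
}

/* What is currently at the page's regular (mapped) location. */
const char *page_home(uint32_t page_no) { return mapped[page_no]; }
```

The point of the design is that the mapped location never holds a change
whose log record is not yet on disk, so the OS is free to write mapped
pages whenever it likes.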

The mmap version could be faster when loading data from hdd and would
result in better utilization of memory (because you work directly with the
data in the OS's page cache instead of keeping duplicates in pg's buffer
cache). Page cache expiration is also handled by the OS, which allows pg to
use as much memory as is available (no need to specify the buffer cache
size).

devik


