Re: WAL & SHM principles - Mailing list pgsql-hackers

From Martin Devera
Subject Re: WAL & SHM principles
Msg-id Pine.LNX.4.10.10103071628380.18899-100000@luxik.cdi.cz
In response to Re: WAL & SHM principles  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
> This was brought up a week ago, and I consider it an interesting idea. 
> The only problem is that we would no longer have control over which
> pages made it to disk.  The OS would perhaps write pages as we modified
> them.  Not sure how important that is.

Yes. As I work on the Linux kernel, I know something about this. When a
page is accessed, the CPU sets the accessed bit in its PTE (and the dirty
bit on a write). The OS writes the page out when it needs the page frame.
It also tries to launder pages periodically, but the actual algorithm
changes too often in recent kernels ;-)
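If the buffer pool were an mmap()ed file, the only portable lever left for controlling when a dirty page reaches disk would be msync(), and it works in one direction only: it can force a write at a point we choose, but it cannot stop the kernel from writing the page earlier, as described above. A minimal sketch, assuming an 8 kB database page (PostgreSQL's default BLCKSZ) and a temporary file standing in for a real relation file; `write_page_and_sync` and `msync_demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DB_PAGE_SIZE 8192  /* assumption: PostgreSQL's default BLCKSZ */

/* Hypothetical helper: copy new contents into one database page of a
 * MAP_SHARED file mapping, then force that page to disk.  The kernel may
 * already have written the page at any moment after memcpy(); msync()
 * only bounds how LATE the write can happen, never how early. */
int write_page_and_sync(unsigned char *base, size_t page_no,
                        const unsigned char *data)
{
    unsigned char *page = base + page_no * DB_PAGE_SIZE;
    memcpy(page, data, DB_PAGE_SIZE);

    /* msync() requires an address aligned to the OS page size. */
    uintptr_t mask = (uintptr_t)sysconf(_SC_PAGESIZE) - 1;
    uintptr_t start = (uintptr_t)page & ~mask;
    size_t len = (uintptr_t)page + DB_PAGE_SIZE - start;
    return msync((void *)start, len, MS_SYNC);
}

/* Demo: map a two-page temporary file, rewrite page 1, sync it, and
 * check through pread() that the file view holds the new bytes. */
int msync_demo(void)
{
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    int fd = fileno(f);
    if (ftruncate(fd, 2 * DB_PAGE_SIZE) != 0) { fclose(f); return -1; }

    unsigned char *base = mmap(NULL, 2 * DB_PAGE_SIZE,
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { fclose(f); return -1; }

    unsigned char page[DB_PAGE_SIZE];
    memset(page, 0xAB, sizeof page);
    int rc = write_page_and_sync(base, 1, page);

    unsigned char check[DB_PAGE_SIZE];
    if (rc == 0 && pread(fd, check, sizeof check, DB_PAGE_SIZE) != DB_PAGE_SIZE)
        rc = -1;
    if (rc == 0 && memcmp(check, page, sizeof page) != 0)
        rc = -1;

    munmap(base, 2 * DB_PAGE_SIZE);
    fclose(f);
    return rc;
}
```

msync_demo() returns 0 when the sync succeeds and the file shows the new page contents.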
Also, a page write is not atomic: several buffer heads are filled for the
page and posted asynchronously for writing. The elevator then sorts and
coalesces these buffer heads and creates the actual SCSI/IDE write
requests, but there is no guarantee that the buffer heads from one page
will be coalesced into one write request ...
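Why a non-atomic page write matters can be shown with a toy simulation (not kernel code): a database page is written as several independent sector-sized requests, and if only some of them complete before a crash, the page on disk is a mix of the old and new versions. The 8 kB page, the 512-byte sector, and the front-to-back completion order are simplifying assumptions; all function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define DB_PAGE_SIZE 8192  /* assumption: PostgreSQL's default BLCKSZ */
#define SECTOR_SIZE 512    /* assumption: traditional disk sector size */
#define SECTORS_PER_PAGE (DB_PAGE_SIZE / SECTOR_SIZE)

/* Toy model of a crash during a page write: the page goes out as several
 * independent sector-sized requests, and only the first `sectors_done` of
 * them reached the disk (completion order simplified to front-to-back). */
void torn_write(unsigned char *disk, const unsigned char *new_page,
                int sectors_done)
{
    memcpy(disk, new_page, (size_t)sectors_done * SECTOR_SIZE);
}

/* Classify what is on disk after the crash:
 * 0 = old version, 1 = new version, -1 = torn mix of both. */
int classify_page(const unsigned char *disk, const unsigned char *old_page,
                  const unsigned char *new_page)
{
    if (memcmp(disk, old_page, DB_PAGE_SIZE) == 0) return 0;
    if (memcmp(disk, new_page, DB_PAGE_SIZE) == 0) return 1;
    return -1;
}

/* Demo: old page all 0x00, new page all 0xFF, crash after k sectors. */
int crash_after_sectors(int k)
{
    unsigned char old_page[DB_PAGE_SIZE], new_page[DB_PAGE_SIZE];
    unsigned char disk[DB_PAGE_SIZE];
    memset(old_page, 0x00, sizeof old_page);
    memset(new_page, 0xFF, sizeof new_page);
    memcpy(disk, old_page, sizeof disk);   /* disk starts with old version */
    torn_write(disk, new_page, k);
    return classify_page(disk, old_page, new_page);
}
```

crash_after_sectors(0) returns 0, crash_after_sectors(SECTORS_PER_PAGE) returns 1, and any count in between returns -1: a page that matches neither version.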
You can call mlock (VirtualLock on Win32) to lock a page in memory, and
use it to postpone the write. That works under Win32 and many Unices, but
under Linux only root or a process with CAP_IPC_LOCK can mlock.
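A minimal sketch of pinning one page with mlock() and telling the failure modes apart. It assumes Linux/glibc (MAP_ANONYMOUS via _DEFAULT_SOURCE); `try_lock_one_page` and `enum lock_result` are hypothetical. Since Linux 2.6.9, well after this message, unprivileged processes may also mlock up to their RLIMIT_MEMLOCK limit, so ENOMEM is a possible outcome too. Note that mlock() guarantees residency; its documentation does not promise that dirty pages of a shared file mapping will be kept from writeback.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Outcome of trying to pin one page. */
enum lock_result { LOCKED = 0, NOT_PERMITTED = 1, OVER_LIMIT = 2, OTHER_ERROR = 3 };

/* Hypothetical helper: map one anonymous page, try to pin it in RAM with
 * mlock(), and report what happened.  The page is unlocked (if locked)
 * and unmapped before returning. */
enum lock_result try_lock_one_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return OTHER_ERROR;

    enum lock_result r;
    if (mlock(p, len) == 0) {
        r = LOCKED;
        munlock(p, len);
    } else if (errno == EPERM) {
        r = NOT_PERMITTED;   /* no CAP_IPC_LOCK and a zero RLIMIT_MEMLOCK */
    } else if (errno == ENOMEM) {
        r = OVER_LIMIT;      /* would exceed RLIMIT_MEMLOCK */
    } else {
        r = OTHER_ERROR;
    }
    munmap(p, len);
    return r;
}
```

On a typical modern system the default RLIMIT_MEMLOCK is large enough for a single page, so the call usually reports LOCKED even without privileges.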

> The good news is that most/all OS's are smart enough that if two
> processes mmap() the same file, they see each other's changes, so in a

Yes; when using the MAP_SHARED flag to mmap, IMHO this is mandatory for an OS.

> sense it is shared memory, but a much larger, smarter pool of shared
> memory than what we have now.  We would still need buffer headers and
> stuff because we need to synchronize access to the buffers.
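The two points above (MAP_SHARED makes one process's stores visible to another process that maps the same file, and buffer headers are still needed to serialize access) can be sketched together. Assumptions: Linux/glibc, a temporary file standing in for a relation file, and a C11 `atomic_flag` test-and-set spinlock standing in for whatever lock a real buffer header would use (lock-free atomics in shared memory work across processes on mainstream platforms). `struct buffer_header` and its helpers are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DB_PAGE_SIZE 8192  /* assumption: PostgreSQL's default BLCKSZ */

/* Hypothetical buffer header: still needed even when the page lives in a
 * MAP_SHARED file mapping, because two backends must not modify the same
 * page at once.  A test-and-set spinlock stands in for the real lock. */
struct buffer_header {
    atomic_flag lock;
    int dirty;
};

static void header_lock(struct buffer_header *h)
{
    while (atomic_flag_test_and_set(&h->lock))
        ; /* spin; a real implementation would back off */
}

static void header_unlock(struct buffer_header *h)
{
    atomic_flag_clear(&h->lock);
}

/* The child writes a byte into the mapped page; the parent, which mapped
 * the same file with MAP_SHARED, reads it back without read()/write().
 * Returns the byte seen by the parent, or -1 on error. */
int shared_mapping_demo(void)
{
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    int fd = fileno(f);
    if (ftruncate(fd, DB_PAGE_SIZE) != 0) { fclose(f); return -1; }

    unsigned char *page = mmap(NULL, DB_PAGE_SIZE, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    if (page == MAP_FAILED) { fclose(f); return -1; }
    struct buffer_header *hdr = mmap(NULL, sizeof *hdr, PROT_READ | PROT_WRITE,
                                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (hdr == MAP_FAILED) { munmap(page, DB_PAGE_SIZE); fclose(f); return -1; }
    atomic_flag_clear(&hdr->lock);
    hdr->dirty = 0;

    int seen = -1;
    pid_t pid = fork();
    if (pid == 0) {                 /* child: modify the page under the lock */
        header_lock(hdr);
        page[0] = 42;
        hdr->dirty = 1;
        header_unlock(hdr);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid) {
        header_lock(hdr);           /* parent: read under the lock */
        if (hdr->dirty) seen = page[0];
        header_unlock(hdr);
    }

    munmap(hdr, sizeof *hdr);
    munmap(page, DB_PAGE_SIZE);
    fclose(f);
    return seen;
}
```

shared_mapping_demo() returns 42: the parent sees the child's store through the shared mapping.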

You would also need some smart algorithm that tries to mmap several pages
as one contiguous block. You can mmap each page on its own, but OSes store
mmap information per page range, so you need to minimize the number of
such ranges.
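One simple form of that "smart algorithm" is sketched below: given the sorted block numbers a backend wants mapped, group consecutive ones into runs so each run needs a single mmap() call (and a single mapping range in the kernel) instead of one per page. It assumes the database page size is a multiple of the OS page size, since mmap() needs page-aligned file offsets; `coalesce_pages` and `struct page_run` are hypothetical, and the per-run mmap() call itself is left out.

```c
#include <stddef.h>

/* One contiguous range of database pages: [first, first + count). */
struct page_run {
    size_t first;
    size_t count;
};

/* Coalesce sorted, duplicate-free page numbers into maximal contiguous
 * runs.  Writes at most n runs into `runs` and returns how many were
 * produced; each run could then be mapped with one mmap() of
 * count * page_size bytes at file offset first * page_size. */
size_t coalesce_pages(const size_t *pages, size_t n, struct page_run *runs)
{
    size_t nruns = 0;
    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 &&
            runs[nruns - 1].first + runs[nruns - 1].count == pages[i]) {
            runs[nruns - 1].count++;        /* extends the current run */
        } else {
            runs[nruns].first = pages[i];   /* starts a new run */
            runs[nruns].count = 1;
            nruns++;
        }
    }
    return nruns;
}
```

For pages {3, 4, 5, 9, 10, 20} this yields three runs (3 to 5, 9 to 10, and 20), so three mappings instead of six.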

devik
