Re: Re: [PATCHES] A patch for xlog.c - Mailing list pgsql-hackers

From ncm@zembu.com (Nathan Myers)
Subject Re: Re: [PATCHES] A patch for xlog.c
Date
Msg-id 20010226002125.A2430@store.zembu.com
In response to Re: [PATCHES] A patch for xlog.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: [PATCHES] A patch for xlog.c  (The Hermit Hacker <scrappy@hub.org>)
Re: Re: [PATCHES] A patch for xlog.c  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: Re: [PATCHES] A patch for xlog.c  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Feb 25, 2001 at 11:28:46PM -0500, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It allows no backing store on disk.  

I.e. it allows you to map memory without an associated inode; the memory
may still be swapped.  Of course, there is no problem with mapping an 
inode too, so that unrelated processes can join in.  Solaris has a flag
to pin the shared pages in RAM so they can't be swapped out.

> > It is the BSD solution to SysV
> > share memory.  Here are all the BSDi flags:
> 
> >      MAP_ANON    Map anonymous memory not associated with any specific
> >                  file.  The file descriptor used for creating MAP_ANON
> >                  must be -1.  The offset parameter is ignored.
> 
> Hmm.  Now that I read down to the "nonstandard extensions" part of the
> HPUX man page for mmap(), I find
> 
>      If MAP_ANONYMOUS is set in flags:
> 
>           o    A new memory region is created and initialized to all zeros.
>                This memory region can be shared only with descendants of
>                the current process.

This is supported on Linux and BSD, but not on Solaris 7.  MAP_ANON isn't
strictly necessary, though; on SysV systems that don't have it you can
just map /dev/zero.
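
A minimal sketch of the two alternatives just described, assuming a POSIX
system.  The helper names map_shared_region() and
child_write_visible_to_parent() are hypothetical and not taken from
PostgreSQL; the region is created before fork(), so only descendants of
the creating process can see it.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Some systems spell the flag MAP_ANONYMOUS rather than MAP_ANON. */
#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: return a zero-filled region that is shared with
 * any children forked later, or NULL on failure.
 */
static void *
map_shared_region(size_t size)
{
    void *p;

#ifdef MAP_ANON
    /* Anonymous mapping: no file behind it, descriptor must be -1. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANON, -1, 0);
#else
    /* Fallback for systems without MAP_ANON: map /dev/zero instead. */
    int fd = open("/dev/zero", O_RDWR);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
#endif
    return (p == MAP_FAILED) ? NULL : p;
}

/*
 * Demonstration: a child writes into the region and the parent reads
 * the value back.  Returns what the parent sees, or -1 on error.
 */
static int
child_write_visible_to_parent(void)
{
    int   *shared = map_shared_region(sizeof(int));
    pid_t  pid;
    int    seen;

    if (shared == NULL)
        return -1;
    *shared = 0;

    pid = fork();
    if (pid < 0)
    {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0)
    {
        *shared = 42;           /* child stores into the shared page */
        _exit(0);
    }
    waitpid(pid, NULL, 0);      /* parent waits, then reads the store */
    seen = *shared;
    munmap(shared, sizeof(int));
    return seen;
}
```

Calling child_write_visible_to_parent() from a test program should show
the child's store in the parent, which is the postmaster-and-backends
case under discussion.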

> While I've said before that I don't think it's really necessary for
> processes that aren't children of the postmaster to access the shared
> memory, I'm not sure that I want to go over to a mechanism that makes it
> *impossible* for that to be done.  Especially not if the only motivation
> is to avoid having to configure the kernel's shared memory settings.

There are enormous advantages to avoiding the need to configure kernel 
settings.  It makes PG a better citizen.  PG is much easier to drop in 
and use if you don't need attention from the IT department.

But I don't know of any reason to avoid mapping an actual inode,
so using mmap doesn't necessarily mean giving up sharing among
unrelated processes.
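
As a rough illustration of mapping an actual inode, assuming a POSIX
system: any process that can open the file can attach to the same
memory.  attach_shared_file() is a hypothetical helper, and the path
passed to it is whatever the cooperating processes agree on.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of the file at `path` as shared
 * memory, creating and extending the file if needed.  Every process
 * that calls this with the same path sees the same pages.  Returns
 * NULL on failure.
 */
static void *
attach_shared_file(const char *path, size_t size)
{
    void *p;
    int   fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return NULL;

    /* Make sure the file is large enough to back the whole mapping. */
    if (ftruncate(fd, (off_t) size) < 0)
    {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return (p == MAP_FAILED) ? NULL : p;
}
```

Two processes with no common ancestor can each call
attach_shared_file() on the same path; a store made through one mapping
is then visible through the other.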

> Besides, what makes you think there's not a limit on the size of shmem
> allocatable via mmap()?

I've never seen any mmap limit documented.  Since mmap() is how 
everybody implements shared libraries, such a limit would be equivalent 
to a limit on the number and total size of shared libraries in use.  mmap() with 
MAP_ANONYMOUS (or its SysV /dev/zero equivalent) is a common, modern 
way to get raw storage for malloc(), so such a limit would be a limit
on malloc() too.
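
To make the malloc() point concrete, here is a toy sketch of an
allocator drawing its raw storage from an anonymous private mapping.
It assumes a POSIX system; the Arena type and the arena_* functions are
hypothetical and far simpler than any real malloc().

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Some systems spell the flag MAP_ANONYMOUS rather than MAP_ANON. */
#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical bump allocator backed by one mmap()ed region. */
typedef struct Arena
{
    char   *base;               /* start of the mapped region */
    size_t  size;               /* total bytes mapped */
    size_t  used;               /* bytes handed out so far */
} Arena;

/* Map `size` bytes of zero-filled private memory.  Returns 0 or -1. */
static int
arena_init(Arena *a, size_t size)
{
    void *p;

#ifdef MAP_ANON
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANON, -1, 0);
#else
    /* SysV-style equivalent: a private mapping of /dev/zero. */
    int fd = open("/dev/zero", O_RDWR);

    if (fd < 0)
        return -1;
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);
#endif
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Hand out `n` bytes, rounded up to 16-byte units; NULL when full. */
static void *
arena_alloc(Arena *a, size_t n)
{
    size_t  aligned = (n + 15) & ~(size_t) 15;
    void   *p;

    if (aligned < n || aligned > a->size - a->used)
        return NULL;            /* overflow, or not enough room left */
    p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Return the whole region to the kernel. */
static void
arena_release(Arena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = a->used = 0;
}
```

The only memory source here is mmap(), so any cap on mmap() would cap
this allocator as well, which is the argument made above.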

The mmap architecture comes to us from the Mach microkernel memory
manager, backported into BSD and then copied widely.  Since it was
the fundamental mechanism for all memory operations in Mach, arbitrary
limits would make no sense.  That it worked so well is the reason it 
was copied everywhere else, so adding arbitrary limits while copying 
it would be silly.  I don't think we'll see any systems like that.

Nathan Myers
ncm@zembu.com

