Providing anonymous mmap as an option of sharing memory - Mailing list pgsql-hackers

From Shridhar Daithankar
Subject Providing anonymous mmap as an option of sharing memory
Msg-id 3FC35F0D.6050302@myrealbox.com
List pgsql-hackers
Hello All,

I was looking through the source and thought it would be worth seeking opinions
on this proposal.
From what I understand so far, the core shared memory handling is done in
pgsql/src/backend/port/sysv_shmem.c, which configure selects and links in
according to the build environment.

So I would need to write another source file that exports the same API as the
above (i.e. all non-static functions in that file) but implements it with mmap.
That alone would be enough to use anonymous mmap instead of SysV shared memory.
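A minimal sketch of the core mechanism, assuming a POSIX system with MAP_ANONYMOUS (older BSDs spell it MAP_ANON). The names anon_shmem_create, anon_shmem_release and anon_shmem_demo are hypothetical and are not the actual exports of sysv_shmem.c. What it shows is that a MAP_SHARED | MAP_ANONYMOUS mapping created before fork() is visible to every child, which is how the postmaster hands shared memory to its backends.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD spelling */
#endif

/* Hypothetical replacement for the SysV create call: returns a zero-filled
 * region shared with every child forked afterwards, or NULL on failure.
 * No key, shmget() or shmat() is involved. */
static void *anon_shmem_create(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical counterpart that detaches the region. */
static int anon_shmem_release(void *p, size_t size)
{
    return munmap(p, size);
}

/* Demo: each forked "backend" marks its own slot; the parent
 * ("postmaster") sums what it sees. Returns the sum, or -1 on error. */
static int anon_shmem_demo(int nchildren)
{
    if (nchildren <= 0)
        return 0;

    size_t size = (size_t) nchildren * sizeof(int);
    int *slots = anon_shmem_create(size);
    if (slots == NULL)
        return -1;

    int started = 0;
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {          /* child: write into the shared region */
            slots[i] = 1;
            _exit(0);
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        wait(NULL);              /* reap children before reading */

    int sum = 0;
    for (int i = 0; i < nchildren; i++)
        sum += slots[i];
    anon_shmem_release(slots, size);
    return started == nchildren ? sum : -1;
}
```

One thing anonymous mmap does not give you that SysV does: there is no key, so a restarted postmaster cannot look up the old segment and check (via shmctl(IPC_STAT) and its attach count) whether old backends are still attached.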

Providing mmap-based shared memory might seem unnecessary, but it is just the
first step of what I have in mind.

All shared memory allocations are done in pgsql/src/backend/storage/ipc/shmem.c.
I was thinking of collecting all the global variables in that file into a
structure. The global variables themselves would stay in place so that existing
code does not break, but the structure would hold database-specific buffering
information. Let's call that structure a database context.
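A rough sketch of the database context idea, assuming the relevant globals in shmem.c can be reduced to a base pointer, a region size and a free offset. All names here (DatabaseContext, shmem_base, shmem_size, shmem_free, db_context_switch, shmem_alloc_current) are hypothetical stand-ins, not the real variables in shmem.c. Code that only reads the globals keeps working unchanged; switching contexts saves the globals into the old context and loads them from the new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical globals standing in for those in shmem.c. Existing code
 * keeps reading these, so it does not need to change. */
static char  *shmem_base = NULL;   /* start of the current region */
static size_t shmem_size = 0;      /* size of the current region */
static size_t shmem_free = 0;      /* offset of the first free byte */

/* Hypothetical "database context": a saved copy of the globals above. */
typedef struct DatabaseContext {
    char  *base;
    size_t size;
    size_t free_offset;
} DatabaseContext;

static DatabaseContext *current_ctx = NULL;

/* Save the globals into the current context, then load them from ctx. */
static void db_context_switch(DatabaseContext *ctx)
{
    assert(ctx != NULL);
    if (current_ctx != NULL) {
        current_ctx->base = shmem_base;
        current_ctx->size = shmem_size;
        current_ctx->free_offset = shmem_free;
    }
    shmem_base = ctx->base;
    shmem_size = ctx->size;
    shmem_free = ctx->free_offset;
    current_ctx = ctx;
}

/* ShmemAlloc-like bump allocator that only looks at the globals, so it
 * allocates from whichever context is currently active. */
static void *shmem_alloc_current(size_t n)
{
    n = (n + 7) & ~(size_t) 7;            /* keep 8-byte alignment */
    if (shmem_base == NULL || n > shmem_size - shmem_free)
        return NULL;                      /* out of space in this region */
    void *p = shmem_base + shmem_free;
    shmem_free += n;
    return p;
}
```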

That way we can assign a different (anonymous, of course) mmap'ed region to each
database. A backend could then simply switch database contexts, i.e. load the
global variables from the database context, and write to the appropriate shared
memory region. Every database would need at least two shared memory regions:
one for operating on its own buffers and a system region where it can write to
shared catalogs etc. A backend can unmap the shared memory regions belonging to
other databases at startup.
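A minimal sketch of that layout, assuming the postmaster maps one system region plus one region per database before it forks any backend, and that a backend knows its database index right after fork(). ShmemLayout, layout_create, backend_attach and MAX_DBS are hypothetical names used only for illustration.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

#define MAX_DBS 8 /* hypothetical limit for the sketch */

/* Hypothetical layout: one system region (shared catalogs etc.) plus one
 * buffer region per database, all mapped before any backend is forked. */
typedef struct ShmemLayout {
    void  *system;            /* kept by every backend */
    void  *db[MAX_DBS];       /* db[i] = buffers of database i */
    size_t region_size;
    int    ndbs;
} ShmemLayout;

static void *map_anon(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Postmaster side: map everything up front. Returns 0 on success and
 * unmaps anything already mapped on failure. */
static int layout_create(ShmemLayout *l, int ndbs, size_t region_size)
{
    assert(ndbs > 0 && ndbs <= MAX_DBS);
    l->ndbs = ndbs;
    l->region_size = region_size;
    l->system = map_anon(region_size);
    if (l->system == NULL)
        return -1;
    for (int i = 0; i < ndbs; i++) {
        l->db[i] = map_anon(region_size);
        if (l->db[i] == NULL) {
            while (--i >= 0)
                munmap(l->db[i], region_size);
            munmap(l->system, region_size);
            return -1;
        }
    }
    return 0;
}

/* Backend side, right after fork(): keep the system region and our own
 * database region, unmap every other database's region. Returns the
 * number of regions still mapped. */
static int backend_attach(ShmemLayout *l, int mydb)
{
    assert(mydb >= 0 && mydb < l->ndbs);
    int kept = 1;                          /* the system region */
    for (int i = 0; i < l->ndbs; i++) {
        if (i == mydb) {
            kept++;
            continue;
        }
        munmap(l->db[i], l->region_size);
        l->db[i] = NULL;
    }
    return kept;
}
```

Unmapping the other databases' regions means a stray pointer in one backend cannot scribble on another database's buffers.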

Of course, buffer management alone would not cover database contexts
completely. WAL needs to be lumped in as well (though not necessarily: if all
WAL buffering goes through the system shared region, everything will still
work). I don't know whether clog and data file handling are affected by this.
If WAL goes into the database context, we could probably provide per-database
WAL, which would also fit well with tablespaces.

With per-database WAL, a transaction that operates on a shared catalog would
need to flush both the system WAL and the database WAL in order to commit.
Otherwise, flushing only the database WAL would suffice.
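A small sketch of that commit rule. The flush ordering (system WAL before database WAL, so the database's commit record never becomes durable ahead of the shared-catalog changes it depends on) is my assumption, not something worked out above; commit_flush, wal_flush_fn and record_flush are hypothetical names, with record_flush acting as a test double.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical hook that flushes one WAL stream ("system" or "database"). */
typedef void (*wal_flush_fn)(const char *stream);

/* Hypothetical commit path under per-database WAL. Assumption: flush the
 * system WAL first when shared catalogs were touched. Returns the number
 * of WAL streams flushed. */
static int commit_flush(bool touched_shared_catalog, wal_flush_fn flush)
{
    assert(flush != NULL);
    int n = 0;
    if (touched_shared_catalog) {
        flush("system");     /* shared-catalog changes */
        n++;
    }
    flush("database");       /* always needed to make the commit durable */
    n++;
    return n;
}

/* Test double: records the order in which streams were flushed. */
static char flush_log[64];

static void record_flush(const char *stream)
{
    if (flush_log[0] != '\0')
        strncat(flush_log, ",", sizeof flush_log - strlen(flush_log) - 1);
    strncat(flush_log, stream, sizeof flush_log - strlen(flush_log) - 1);
}
```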

This way we could provide a background writer process per database and a
separate buffer pool per database, significantly reducing the impact of
cross-database load. For example, a VACUUM FULL on one database would not hurt
another database through buffer cache pollution (I/O can still saturate,
though). This would let us push hardware to its limits in cases where that may
not be possible right now.
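A toy simulation of the cache-pollution argument, assuming a plain LRU replacement policy (not PostgreSQL's actual buffer replacement strategy) and the same total number of buffers in both setups. Pool, pool_access, hot_hits and POOL_MAX are hypothetical names. Database B warms a small hot set, database A then sweeps through many blocks (think VACUUM FULL), and we count how many of B's hot blocks are still cached.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_MAX 16 /* hypothetical upper bound on buffers per pool */

/* Toy LRU buffer pool; each buffer is tagged with (db, block). */
typedef struct Pool {
    int  cap, n;
    int  db[POOL_MAX], blk[POOL_MAX];
    long last_use[POOL_MAX];
    long clock;
} Pool;

static void pool_init(Pool *p, int cap)
{
    assert(cap > 0 && cap <= POOL_MAX);
    p->cap = cap;
    p->n = 0;
    p->clock = 0;
}

/* Access (db, blk): true on a hit; on a miss, evict the least recently
 * used buffer if the pool is full. */
static bool pool_access(Pool *p, int db, int blk)
{
    p->clock++;
    for (int i = 0; i < p->n; i++)
        if (p->db[i] == db && p->blk[i] == blk) {
            p->last_use[i] = p->clock;
            return true;
        }
    int slot = p->n;
    if (p->n < p->cap) {
        p->n++;
    } else {
        slot = 0;
        for (int i = 1; i < p->n; i++)
            if (p->last_use[i] < p->last_use[slot])
                slot = i;
    }
    p->db[slot] = db;
    p->blk[slot] = blk;
    p->last_use[slot] = p->clock;
    return false;
}

/* One shared pool of 2*cap buffers vs. one pool of cap buffers per
 * database. Returns how many of B's hot blocks hit after A's sweep. */
static int hot_hits(bool per_db_pools, int cap, int hot, int scan)
{
    Pool shared, pa, pb;
    Pool *a = &shared, *b = &shared;
    if (per_db_pools) {
        pool_init(&pa, cap);
        pool_init(&pb, cap);
        a = &pa;
        b = &pb;
    } else {
        pool_init(&shared, 2 * cap);
    }
    for (int i = 0; i < hot; i++)      /* B warms its cache */
        pool_access(b, 1, i);
    for (int i = 0; i < scan; i++)     /* A sweeps through many blocks */
        pool_access(a, 0, i);
    int hits = 0;
    for (int i = 0; i < hot; i++)      /* B returns to its hot set */
        hits += pool_access(b, 1, i);
    return hits;
}
```

With 4 hot blocks, 4 buffers per database and a 100-block sweep, the per-database pools keep all 4 of B's blocks cached, while the shared 8-buffer pool has evicted all of them.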

I was looking for the reason a large number of buffers degrades performance,
and browsing the source code spiralled into this idea. So far I haven't found
any reason why a large number of buffers should degrade performance. Still
looking.

Comments?
 Shridhar


