Hello,
maybe I missed something, but over the last few days I have been thinking
about how I would write my own SQL server. I came up with several ideas,
and because they are not used in PG they are probably bad - but I can't
figure out why.
1) WAL
We have a buffer manager, OK. So why not make WAL a part of it, and
instead of logging INSERT/UPDATE/DELETE xlog records, log the changes to
buffer pages directly? Whoever dirties a page has to inform bmgr about the
dirty region, and bmgr would formulate the xlog record. The record could
be, for example, a fixed-size bitmap where each bit corresponds to one
part of the page (of size pgsize/no-of-bits) that was changed, followed by
the changed regions themselves.
Multiple writes (by multiple backends) can be coalesced together as long
as their transactions overlap and there is enough memory to keep the
changed buffer pages in memory.
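To make the record format concrete, here is a minimal sketch of what I
have in mind. The page size, the bitmap width and the names mark_dirty,
make_record and redo are only my assumptions for illustration; nothing
like this exists in PG.

```python
PAGE_SIZE = 8192              # assumed page size
NBITS = 64                    # assumed bitmap width (bits per page)
CHUNK = PAGE_SIZE // NBITS    # bytes covered by one bit (128 here)


def mark_dirty(bitmap: int, offset: int, length: int) -> int:
    """Set the bits of all chunks touched by [offset, offset + length)."""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    for i in range(first, last + 1):
        bitmap |= 1 << i
    return bitmap


def make_record(page: bytes, bitmap: int) -> bytes:
    """Record = bitmap (little endian) followed by the changed chunks,
    in ascending chunk order."""
    parts = [bitmap.to_bytes(NBITS // 8, "little")]
    for i in range(NBITS):
        if (bitmap >> i) & 1:
            parts.append(page[i * CHUNK:(i + 1) * CHUNK])
    return b"".join(parts)


def redo(page: bytearray, record: bytes) -> None:
    """Replay a record by copying each logged chunk back into the page."""
    pos = NBITS // 8
    bitmap = int.from_bytes(record[:pos], "little")
    for i in range(NBITS):
        if (bitmap >> i) & 1:
            page[i * CHUNK:(i + 1) * CHUNK] = record[pos:pos + CHUNK]
            pos += CHUNK


# Two writes to the same page are coalesced by OR-ing into one bitmap.
old = bytearray(PAGE_SIZE)
new = bytearray(old)
bitmap = 0
new[100:104] = b"abcd"
bitmap = mark_dirty(bitmap, 100, 4)     # touches chunk 0
new[120:140] = b"x" * 20
bitmap = mark_dirty(bitmap, 120, 20)    # touches chunks 0 and 1
record = make_record(bytes(new), bitmap)
redo(old, record)                       # old now equals new
```

Redo here is just a copy loop, which is what I mean by simple/fast. The
price is that a small change always logs a whole chunk.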
Pros: upper layers can assume that buffers are always safe/logged, and
there is no special handling for indices; very simple/fast redo
Cons: can't implement undo - but with non-overwriting storage it is not
needed (?)
2) SHM vs. MMAP
Why not use mmap to share pages (instead of shm)? There would be no
problem with tuning PG's buffer cache size - it would be balanced by the
OS.
When using SHM there are often two copies of a page: one in the OS's page
cache and one in SHM (a waste of memory).
When using mmap the data goes (almost) directly from the HDD into your
memory page, whereas now you need to copy it from the OS's page to PG's
page.
There is one problem: how to ensure that a dirtied page is not flushed
before its xlog. One can use mlock, but you often need root privileges to
use it. Another way is to implement our own COW (copy on write) to create
intermediate buffers that are used only until the xlog is flushed.
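A rough sketch of the COW variant, only to show the ordering rule. The
class CowPagePool, its methods and the LSN bookkeeping are all my
assumptions; the shadow copies here are private to one process, so sharing
them between backends is left unsolved.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size


class CowPagePool:
    """Hypothetical pool: pages live in an mmap'ed file, but writes go to
    private shadow copies until the xlog covering them is flushed."""

    def __init__(self, path: str, npages: int):
        self.f = open(path, "w+b")
        self.f.truncate(npages * PAGE_SIZE)   # mmap needs a non-empty file
        self.map = mmap.mmap(self.f.fileno(), npages * PAGE_SIZE)
        self.shadow = {}       # page_no -> (bytearray copy, last change LSN)
        self.flushed_lsn = 0   # highest xlog position known to be on disk

    def read(self, page_no: int) -> bytes:
        """Readers see the shadow copy if one exists."""
        if page_no in self.shadow:
            return bytes(self.shadow[page_no][0])
        start = page_no * PAGE_SIZE
        return self.map[start:start + PAGE_SIZE]

    def write(self, page_no: int, offset: int, data: bytes, lsn: int) -> None:
        """Copy on first write; the mapped page itself stays untouched."""
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write outside of page")
        if page_no in self.shadow:
            copy = self.shadow[page_no][0]
        else:
            copy = bytearray(self.read(page_no))
        copy[offset:offset + len(data)] = data
        self.shadow[page_no] = (copy, lsn)

    def wal_flushed(self, lsn: int) -> None:
        """Once xlog up to lsn is on disk, move covered pages into the map,
        where the OS is free to write them out."""
        self.flushed_lsn = max(self.flushed_lsn, lsn)
        ready = [p for p, (_, l) in self.shadow.items()
                 if l <= self.flushed_lsn]
        for page_no in ready:
            copy, _ = self.shadow.pop(page_no)
            start = page_no * PAGE_SIZE
            self.map[start:start + PAGE_SIZE] = bytes(copy)


pool = CowPagePool(os.path.join(tempfile.mkdtemp(), "pages"), 2)
pool.write(1, 10, b"hello", lsn=5)   # mapped page 1 is still unchanged
pool.wal_flushed(4)                  # xlog not far enough, shadow is kept
pool.wal_flushed(5)                  # now the change reaches the mapping
```

The extra copy is paid only for pages that are dirty and not yet logged,
so clean pages still come straight from the OS page cache.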
Are these considerations correct?
regards, devik