Here is an updated version of the earlier work.
This version:
1) Tracks memory as requested by the backend.
2) Includes allocations made during program startup.
3) Optimizes the "fast path" to only update two local variables (sketched below).
4) Places a cluster-wide limit on total memory allocated.
The cluster-wide limit is useful for multi-hosting: it keeps one greedy cluster
from starving the other clusters of memory.
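As a rough sketch of how that check might look (the GUC
max_total_backend_memory and the shared counter total_backend_mem are
illustrative names I'm using here, not necessarily the patch's actual
identifiers):

    #include "postgres.h"
    #include "port/atomics.h"

    /* Hypothetical GUC: cluster-wide cap in MB; 0 disables the limit. */
    extern int max_total_backend_memory;

    /* Hypothetical shared-memory counter of bytes requested cluster-wide. */
    extern pg_atomic_uint64 *total_backend_mem;

    /*
     * Would granting 'request' more bytes push the cluster over the cap?
     * Called only from the periodic slow path described below, never on
     * every palloc, so it adds nothing to the per-allocation cost.
     */
    static bool
    exceeds_cluster_limit(Size request)
    {
        uint64  limit;

        if (max_total_backend_memory <= 0)
            return false;       /* limit disabled */

        limit = (uint64) max_total_backend_memory * 1024 * 1024;
        return pg_atomic_read_u64(total_backend_mem) + request > limit;
    }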
Note there isn't a good way to track actual memory used by a cluster.
Ideally, we would like to get the working set size of each memory segment,
along with the size of the associated kernel data structures.
Gathering that info in a portable way is a "can of worms".
Instead, we're managing memory as requested by the application.
While not identical, the two measures are strongly correlated.
The memory model used is:
1) Each process is assumed to use a certain amount of memory
simply by existing.
2) All pg memory allocations are counted, including those before
the process is fully initialized.
3) Each process maintains its own local counters. These are the "truth".
4) Periodically,
- local counters are added into the global, shared memory counters,
- pgstats is updated, and
- total memory is checked against the cluster-wide limit.
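A minimal sketch of that model (account_mem, total_backend_mem, and the
threshold name are assumptions for illustration, not the patch's actual
identifiers; the 1 MB value matches the error bound discussed next):

    #include "postgres.h"
    #include "port/atomics.h"

    /* 1 MB reporting threshold; this is what bounds the drift below. */
    #define MEM_REPORT_THRESHOLD    (1024 * 1024)

    /* Hypothetical shared-memory counter of bytes requested cluster-wide. */
    extern pg_atomic_uint64 *total_backend_mem;

    static int64 my_total_allocated = 0;    /* this process's "truth" */
    static int64 my_unreported = 0;         /* delta not yet in shared memory */

    /*
     * Fast path: called on every allocation (positive delta) and free
     * (negative delta).  Normally just two local-variable updates.
     */
    static inline void
    account_mem(int64 delta)
    {
        my_total_allocated += delta;
        my_unreported += delta;

        /* Slow path: fold the local delta into the shared counter. */
        if (my_unreported > MEM_REPORT_THRESHOLD ||
            my_unreported < -MEM_REPORT_THRESHOLD)
        {
            pg_atomic_fetch_add_u64(total_backend_mem, my_unreported);
            my_unreported = 0;
            /* pgstats would be refreshed and the limit checked here too. */
        }
    }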
For efficiency, the global total is an approximation, not a precise number.
Because each process only folds its local counters into shared memory once
the unreported delta crosses the reporting threshold, the total can be off
by as much as 1 MB per process. Memory limiting doesn't need precision,
just a consistent and reasonable approximation.
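To make the bound concrete, assuming the 1 MB reporting threshold from the
sketch above and, say, 100 backends:

    worst-case drift = threshold * processes
                     = 1 MB * 100
                     = 100 MB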
Repeating the earlier benchmark shows no measurable loss of performance.