"Glen Parker" <glenebob@nwlink.com> writes:
> But I'm curious why we *can't* do 40GB on a 64 bit machine?
Well, the "can't" part is simple: our code to calculate the size of our
shared-memory request uses int32 arithmetic. So we definitely can't go
higher than 4GB shared memory, and probably not higher than 2GB.
This could be fixed if anyone was sufficiently motivated, although you'd
have to be careful about maintaining portability to machines that don't
think size_t is 8 bytes, or don't have any 8-byte integer type at all.
However, given the complete lack of evidence that it's useful to boost
shared_buffers above the few-hundred-meg range, I can't see anyone
spending time on doing it...
The more interesting part of this discussion is why there isn't any such
evidence. You can find this question beat to death in the PG list
archives, but a quick sketch is:
1. PG's algorithms for managing its buffers are not obviously better
than those commonly found in kernels. In particular there are several
situations in which PG does linear scans of all available buffers.
This is OK for NBuffers counts up to some thousands, but we would need
some serious work to make NBuffers in the millions work well.
2. The whole argument for buffering disk pages in memory is very
dependent on the assumption that your buffers are actually in memory.
However on most Unixen there is *no* guarantee that the shared memory
we request from the kernel will not get swapped out --- and in fact the
larger the shmem request we make, the more likely this will happen.
A disk buffer that gets swapped out to swap space is completely
counterproductive: the page has to be written to swap and later read
back in, at least doubling the I/O work compared to just re-reading
the original file. So in practice it's
better to keep the shared-buffer arena small enough that all of it is
"hot" (heavily used) and not likely to get seen as a swap candidate by
the kernel's VM manager.
3. A large fixed-size shared-buffer arena is the worst of all possible
worlds in terms of dynamic memory management. The real-world situation
is that RAM has to be shared among PG shared buffers, private memory of
PG backend processes, and (usually) workspace of other non-Postgres
processes that are running on the server machine. The kernel is in a
far better position than we are to evaluate these competing demands and
make the right adjustments to changing situations. The kernel can
easily drop cached disk pages from its buffers to allocate more RAM to
process workspace, or the reverse when process demands drop; but we
can't change the size of the shared-buffer arena on the fly.
Bottom line: let the kernel manage as much memory as possible.
regards, tom lane