I have a Linux PostgreSQL server in the field. Its version is:
PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
(aka postgresql-8.2.4-1PGDG)
A few days ago, its log started showing this:
May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.
May 31 03:02:40 sfmelwss postgres[31490]: [1-1] ERROR: out of memory
May 31 03:02:40 sfmelwss postgres[31490]: [1-2] DETAIL: Failed on request of size 16777212.
May 31 03:05:40 sfmelwss postgres[31913]: [1-1] ERROR: out of memory
May 31 03:05:40 sfmelwss postgres[31913]: [1-2] DETAIL: Failed on request of size 16777212.
That seems pretty self-explanatory. But I'm not so sure, because sar
says:
02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32
02:50:01 AM     17116    999532     98.32    128708    196384   1034596     13972      1.33        44
03:00:01 AM     16372   1000276     98.39    129128    196388   1034596     13972      1.33        44
03:10:01 AM     17220    999428     98.31    128268    196828   1034736     13832      1.32       132
03:20:01 AM     14416   1002232     98.58    130464    197348   1035224     13344      1.27       152
03:30:01 AM     16292   1000356     98.40    127604    196684   1035700     12868      1.23       168
...which indicates there was still plenty of swap space left. Now, I
realize I don't actually want to be using my swap, but I'm wondering
whether the out of memory messages are a red herring. Should I be looking
at something else, like the number of processes, open files, or shared
memory segments?
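To rule out per-process limits, something like the following could be run as the postgres user. It is only a sketch: it reports the limits of the process that runs it, which I am assuming match what the postmaster inherited, and the helper names (describe, current_limits) are my own.

```python
import resource

def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

def current_limits():
    """Return (soft, hard) limits for the resources I suspect.

    These are the limits of the calling process only; the postmaster's
    limits may differ if it was started from another environment.
    """
    wanted = {
        "address space (bytes)": resource.RLIMIT_AS,
        "open files": resource.RLIMIT_NOFILE,
        "processes": resource.RLIMIT_NPROC,
    }
    return {
        label: tuple(describe(value) for value in resource.getrlimit(res))
        for label, res in wanted.items()
    }

for label, (soft, hard) in current_limits().items():
    print(f"{label}: soft={soft} hard={hard}")
```

A finite address space limit would be one way to get allocation failures while the machine as a whole still has free memory.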
FWIW, I have disabled the OOM killer (but not, as I understand it, my
swap space) by setting:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
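For reference, here is a rough sketch of how I understand those two settings to translate into a commit limit. It assumes the formula from the kernel's overcommit documentation (swap plus overcommit_ratio percent of RAM, huge pages ignored) and takes the totals from the 02:40:01 sar sample above; the function names are mine, not anything from the kernel or PostgreSQL.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """Approximate CommitLimit under vm.overcommit_memory = 2.

    Assumption: limit = swap + RAM * overcommit_ratio / 100, with
    huge pages ignored.
    """
    return swap_kb + ram_kb * overcommit_ratio // 100

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values

# Totals derived from the 02:40:01 sar sample: free + used.
ram_kb = 13332 + 1003316    # kbmemfree + kbmemused
swap_kb = 1034572 + 13996   # kbswpfree + kbswpused

print(commit_limit_kb(ram_kb, swap_kb, 100))  # 2065216
```

If that reading is right, the number to compare against this limit is Committed_AS from /proc/meminfo rather than the free swap that sar reports, since strict overcommit counts address space that has been promised, not pages actually in use. parse_meminfo(open("/proc/meminfo").read()) should give both CommitLimit and Committed_AS on the server.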