Re: Out of memory error on automatic vacuum - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Out of memory error on automatic vacuum
Msg-id 20191118141605.4mmynff64c5prxgf@development
In response to Re: Out of memory error on automatic vacuum  (Ekaterina Amez <ekaterina.amez@zunibal.com>)
Responses Re: Out of memory error on automatic vacuum  (Ekaterina Amez <ekaterina.amez@zunibal.com>)
List pgsql-performance
On Mon, Nov 18, 2019 at 03:02:16PM +0100, Ekaterina Amez wrote:
>
>On 18/11/19 at 13:25, Tomas Vondra wrote:
>>On Mon, Nov 18, 2019 at 12:41:27PM +0100, Ekaterina Amez wrote:
>>>Hi all,
>>>
>>>This morning I was checking our postgres servers' logs, looking for
>>>errors (we've recently upgraded them and changed the default config)
>>>and long-running queries, when I found that one of the servers had
>>>really big logs since yesterday. It was logging the error in this
>>>mail's subject: out of memory, failed on request of size XXX on
>>>automatic vacuum of table YYY. A quick search turned up some
>>>messages on the postgresql lists about the work_mem and
>>>shared_buffers configuration options, and some kernel config options
>>>too. Although all of them were messages from several years ago, I
>>>decided to reduce my configured shared_buffers value and restart the
>>>server: now it looks like the error is gone. But I'd like to
>>>understand what's behind the logged error (it's really long and
>>>refers to internals I'm not familiar with), how to detect which
>>>config options may be conflicting and, most importantly, whether
>>>I've solved it the right way.
>>>
>>
>>Unfortunately that's hard to say, without further data. The "out of
>>memory" errors simply mean we called malloc() and it returned NULL,
>>because the kernel was unable to provide the memory.
>This (kernel unable to provide memory) was because no more RAM was
>available to allocate? Or was it because the PG process did not have
>more memory assigned and ready to use? Or is it unknown, because it
>depends on the situation in which the error is thrown?

Not sure I understand. Whenever a PostgreSQL process needs memory, it
requests it from the kernel by calling malloc(), and the amount of
available RAM is limited. So when the kernel can't provide more memory,
malloc() returns NULL.
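
On Linux, whether that malloc() call fails also depends on the kernel
overcommit settings. A quick sketch of what I'd check (assuming a
Linux box; these are plain shell commands):

  # overcommit policy: 0 = heuristic, 1 = always allow, 2 = strict
  sysctl vm.overcommit_memory vm.overcommit_ratio

  # how much memory is committed vs. the commit limit
  grep -E 'MemTotal|SwapTotal|CommitLimit|Committed_AS' /proc/meminfo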

>>
>>There probably were other processes using all the available RAM (the
>>limit depends on various config values, e.g. overcommit). What these
>>processes were doing, we don't know :-(
>>
>>For example, there might be multiple complex queries, each allocating
>>several times work_mem, using quite a bit of memory. Or there might be
>>a runaway query doing a HashAgg allocating much more memory than
>>predicted. Or maybe a completely separate job (say, a backup) was
>>running, allocating a lot of memory or dirtying data in the page cache.
>
>I've looked at cron and I've seen a scheduled process that finished a
>bit before the errors began to be logged (a couple of minutes or so).
>The errors began on Sunday morning, and this machine doesn't have much
>workload on weekdays, and even less on weekends. I'll keep an eye on
>this log, and if the problem appears again I'll try to track database
>activity and machine activity.
>

If it finished a couple of minutes before, it's unlikely to be related.
But it's hard to say without knowing the details.
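
If it comes back, it's worth correlating the timestamps of the errors
with the system memory history. A rough sketch, assuming the sysstat
(sar) service is enabled and with the log path adjusted to your system:

  # when were the errors logged?
  grep 'out of memory' /var/log/postgresql/postgresql.log

  # what did memory usage look like around that time?
  # (sar keeps one history file per day, e.g. sa17 for the 17th)
  sar -r -f /var/log/sa/sa17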

>>
>>There are countless possibilities for what might have happened. The
>>memory context stats are nice, but they're just a snapshot from one
>>particular process, and it does not seem very interesting (the total
>>is just ~1MB, so nothing extreme). We still don't know what else was
>>running.
>
>When you talk about ~1MB, are you getting that size from log lines
>like this one?
>
><index>: *1024* total in 1 blocks; 476 free (0 chunks); 548 used
>

No, that's just one of the memory contexts (they form a tree), and it
uses only 1kB of memory. What matters is the "Grand total" line:

Grand total: 1009356 bytes in 130 blocks; 436888 free (72 chunks);
572468 used

which is ~1MB.
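
If you want to cross-check that, you can sum the per-context lines from
the dump and compare the result with the "Grand total" line. A rough
shell sketch (the log path is just an example):

  grep ' total in ' /path/to/postgresql.log \
    | sed -nE 's/.*: ([0-9]+) total in.*/\1/p' \
    | awk '{ sum += $1 } END { print sum " bytes" }'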

>
>>
>>Lowering shared_buffers certainly does reduce the memory pressure in
>>general, i.e. there is 1GB more memory available for use by processes.
>>It may be sufficient, but that's hard to guess.
>>
>>I don't know if work_mem 64MB is too low, because it depends on what
>>queries you're running etc. But you probably don't want to increase it,
>>as that allows processes to use more memory when executing queries,
>>i.e. it increases memory pressure and makes OOM more likely.
>>
>>So you need to watch the system monitoring, see how much memory is
>>being used (excluding the page cache) and consider reducing work_mem
>>and/or max_connections if it gets too close to the limit.
>
>I'll do, thanks for your suggestions.
>
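
Regarding the monitoring: the "available" column of free(1) is the
number to watch (it excludes the reclaimable page cache), and a rough
worst case for query memory is on the order of max_connections times a
few multiples of work_mem. Something like this (assuming Linux and
local psql access):

  # memory usable by applications without evicting the page cache
  free -m

  # settings that drive the worst-case estimate
  psql -c "SELECT name, setting, unit FROM pg_settings
           WHERE name IN ('work_mem', 'max_connections', 'shared_buffers');"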

Another thing you might do is add some swap (if you don't have any
already), as a safety net.
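
A minimal sketch of adding a swap file on Linux (run as root; the size,
path and swappiness value are just examples):

  fallocate -l 4G /swapfile          # on ext4; use dd if fallocate fails
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile

  # make it persistent across reboots
  echo '/swapfile none swap sw 0 0' >> /etc/fstab

  # keep swapping conservative so it only acts as a safety valve
  sysctl -w vm.swappiness=10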

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


