Thread: Out of memory error on automatic vacuum
Hi all,
This morning I was checking our postgres servers' logs, looking for errors (we've recently upgraded them and changed the default config) and long-running queries, when I found that one of the servers had produced really big logs since yesterday. It was logging the error in this mail's subject: out of memory, failed on request of size XXX on automatic vacuum of table YYY. A quick search turned up some postgresql-lists messages talking about the work_mem and shared_buffers configuration options, and some kernel config options too. Although all of them were messages from several years ago, I decided to cut my configured shared_buffers value and restart the server: now it looks like the error is gone. But I'd like to understand what's behind the logged error (it's really long and refers to internals that I'm missing), how to detect which config options are possibly conflicting and, most important, I want to know if I've actually solved it.
-- Logged error (one of them) ==> The original log is in Spanish and I've translated it myself. I've also replaced my database object names.
TopMemoryContext: 77568 total in 5 blocks; 15612 free (5 chunks); 61956 used
TopTransactionContext: 8192 total in 1 blocks; 7604 free (40 chunks); 588 used
TOAST to main relid map: 24576 total in 2 blocks; 13608 free (3 chunks); 10968 used
AV worker: 8192 total in 1 blocks; 3228 free (6 chunks); 4964 used
Autovacuum Portal: 0 total in 0 blocks; 0 free (0 chunks); 0 used => Is this line the one that points to the task that has run out of memory?
Vacuum: 8192 total in 1 blocks; 8132 free (0 chunks); 60 used
Operator class cache: 8192 total in 1 blocks; 4420 free (0 chunks); 3772 used
smgr relation table: 8192 total in 1 blocks; 316 free (0 chunks); 7876 used
TransactionAbortContext: 32768 total in 1 blocks; 32748 free (0 chunks); 20 used
Portal hash: 8192 total in 1 blocks; 3460 free (0 chunks); 4732 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 1884 free (0 chunks); 6308 used
CacheMemoryContext: 516096 total in 6 blocks; 234136 free (5 chunks); 281960 used
<index>: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
[... 26 more <index> lines with identical figures snipped ...]
pg_index_indrelid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 604 free (0 chunks); 420 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_enum_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_statistic_relid_att_inh_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_cast_source_target_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_language_name_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_transform_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_collation_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_ts_config_map_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_opclass_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_conversion_default_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_type_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_constraint_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_operator_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_amop_opr_fam_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_default_acl_role_nsp_obj_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_collation_name_enc_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_class_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_transform_type_lang_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 604 free (0 chunks); 420 used
pg_proc_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_language_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_namespace_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_amproc_fam_proc_index: 1024 total in 1 blocks; 436 free (0 chunks); 588 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_conversion_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_authid_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_shseclabel_object_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
pg_replication_origin_roname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_database_datname_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_replication_origin_roiident_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
pg_database_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
pg_authid_rolname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
WAL record construction: 49520 total in 2 blocks; 6876 free (0 chunks); 42644 used
PrivateRefCount: 8192 total in 1 blocks; 5516 free (0 chunks); 2676 used
MdSmgr: 8192 total in 1 blocks; 8004 free (0 chunks); 188 used
LOCALLOCK hash: 8192 total in 1 blocks; 2428 free (0 chunks); 5764 used
Timezones: 104028 total in 2 blocks; 5516 free (0 chunks); 98512 used
Postmaster: 8192 total in 1 blocks; 7796 free (8 chunks); 396 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 7168 total in 3 blocks; 3148 free (1 chunks); 4020 used
ErrorContext: 8192 total in 1 blocks; 8172 free (4 chunks); 20 used
Grand total: 1009356 bytes in 130 blocks; 436888 free (72 chunks); 572468 used
[2019-11-17 09:27:37.425 CET] – 11899 ERROR: out of memory
[2019-11-17 09:27:37.425 CET] – 11899 DETAIL: Failed on request of size 700378218.
[2019-11-17 09:27:37.425 CET] – 11899 CONTEXT: automatic vacuum on table «my_db.public.my_table»
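For reference, converting the failed request size from the DETAIL line into MB (I note it's not far below my configured maintenance_work_mem of 704MB, shown further down):

```shell
# 700378218 bytes from the DETAIL line, converted to MB:
echo "$((700378218 / 1024 / 1024)) MB"   # → 667 MB
```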
-- System specs:
Centos 6.5 i686.
12 GB RAM.
[root@myserver pg_log]# lscpu
Architecture: i686
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: GenuineIntel
[root@myserver pg_log]# psql postgres -c "select version();"
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.15 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 32-bit
(1 row)
-- Postgres configured values
[root@myserver data]# cat postgresql.conf | grep CHANGED
listen_addresses = '*' # what IP address(es) to listen on; - CHANGED
port = 5432 # (change requires restart) - CHANGED
shared_buffers = 1024MB # min 128kB - Default 128MB - CHANGED ==> This value was 2048MB when I was getting the out of memory error.
work_mem = 64MB # min 64kB - Default 4MB - CHANGED
maintenance_work_mem = 704MB # min 1MB - Default 64MB - CHANGED
effective_io_concurrency = 2 # 1-1000; 0 disables prefetching - Df 1 - CHANGED
wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers - Df -1 - CHANGED
max_wal_size = 1GB # Default 1GB - CHANGED
min_wal_size = 512MB # Default 80MB - CHANGED
checkpoint_completion_target = 0.8 # checkpoint target duration, 0.0 - 1.0 - Df 0.5 - CHANGED
effective_cache_size = 8448MB #Default 4GB - CHANGED
log_min_duration_statement = 30000 # -1 is disabled, 0 logs all statements - Df -1 - CHANGED
log_line_prefix = '[%m] – %p %q- %u@%h:%d – %a ' # special values: - CHANGED
I've now realized that work_mem is only 64MB; isn't that a bit low?
Please note that the server has a 32-bit OS installed (I don't know why). I'm aware of some limitations of memory configuration on 32-bit systems [1].
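To illustrate why [1] worries me: on a 32-bit build each backend has only around 3GB of usable address space (the usual 32-bit Linux userspace split; an assumption about this kernel), and shared_buffers is mapped into every backend. A rough sketch with my old values:

```shell
# Back-of-the-envelope: address space already committed in one autovacuum
# worker before it even allocates its vacuum work area (old settings):
SHARED_BUFFERS_MB=2048     # old shared_buffers, mapped into every process
MAINT_WORK_MEM_MB=704      # maintenance_work_mem, used by vacuum
echo "$((SHARED_BUFFERS_MB + MAINT_WORK_MEM_MB)) MB of a ~3072 MB address space"
```

With the binary, libraries, stack and other allocations on top, a single ~700MB malloc() could plausibly fail even with free RAM on the machine; cutting shared_buffers to 1024MB leaves much more headroom.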
Summary: how should I interpret the log message, and did the change to shared_buffers (from 2GB to 1GB) really fix the error?
Kind regards,
Ekaterina
[1] this post written by Robert Haas: http://rhaas.blogspot.com/2011/05/sharedbuffers-on-32-bit-systems.html
On Mon, Nov 18, 2019 at 12:41:27PM +0100, Ekaterina Amez wrote:
>This morning I was checking postgres servers logs [...] It was giving
>the error of this mail's subject: out of memory, failed on request of
>size XXX on automatic vacuum of table YYY. [...] I decided to cut my
>shared_buffers configured value and restart server: now it looks like
>error is gone. But I'd like to understand what's beyond the logged
>error [...] and, most important, I want to know if I've solved it
>right.

Unfortunately that's hard to say without further data. The "out of
memory" errors simply mean we called malloc() and it returned NULL,
because the kernel was unable to provide the memory. There probably
were other processes using all the available RAM (the limit depends on
various config values, e.g. overcommit). What those processes were
doing, we don't know :-(

For example, there might have been multiple complex queries running,
each allocating several work_mem worth of memory. Or there might have
been a runaway query doing a HashAgg and allocating much more memory
than predicted. Or maybe a completely separate job (say, a backup) was
running, allocating a lot of memory or dirtying data in the page cache.
There are countless possibilities.

The memory context stats are nice, but they're just a snapshot from one
particular process, and this one does not seem very interesting (the
grand total is just ~1MB, so nothing extreme). We still don't know what
else was running.

Lowering shared_buffers certainly does reduce memory pressure in
general, i.e. there is 1GB more for use by processes. It may be
sufficient; it's hard to guess.

I don't know if work_mem = 64MB is too low, because that depends on
what queries you're running etc. But you probably don't want to
increase it, as that allows processes to use more memory when executing
queries, i.e. it increases memory pressure and makes OOM more likely.

So you need to watch your system monitoring, see how much memory is
being used (excluding the page cache), and consider reducing work_mem
and/or max_connections if it's too close to the limit.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
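The "watch how much memory is being used (excluding page cache)" advice above can be done with standard tools; a minimal sketch (newer kernels expose MemAvailable directly, but that needs kernel >= 3.14, so on an old CentOS 6 kernel the free+cached sum is a rough stand-in):

```shell
# Approximate memory available to processes, ignoring the page cache:
awk '/^MemFree:|^Cached:/ {sum += $2} END {printf "%d MB roughly available\n", sum/1024}' /proc/meminfo
free -m   # same picture; on old procps look at the "-/+ buffers/cache" row
```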
El 18/11/19 a las 13:25, Tomas Vondra escribió:
>Unfortunately that's hard to say without further data. The "out of
>memory" errors simply mean we called malloc() and it returned NULL,
>because the kernel was unable to provide the memory.

This (the kernel being unable to provide memory) was because no more
RAM was available to allocate? Because the PG process did not have more
memory assigned and ready to use? Or is it unknown, because it depends
on the situation where the error is thrown?
>There probably were other processes using all the available RAM (the
>limit depends on various config values, e.g. overcommit). What were
>these processes doing we don't know :-(
>
>For example, there might be multiple complex queries, allocating
>several work_mem each, using quite a bit of memory. Or there might be a
>runaway query doing HashAgg allocating much more memory than predicted.
>Or maybe there was running a completely separate job (say, backup)
>allocating a lot of memory or dirtying data in page cache.
I've looked at cron and I've seen a scheduled process that finished a bit before the errors began to be logged (a couple of minutes or so). The errors began on Sunday morning, and this machine doesn't have much workload on work days, and even less on weekends. I'll keep an eye on this log, and if the problem appears again I'll try to track database activity and machine activity.
>There are countless options what might have happened. The memory context
>stats are nice, but it's just a snapshot from one particular process,
>and it does not seem very interesting (the total is just ~1MB, so
>nothing extreme). We still don't know what else was running.
When you talk about ~1MB, are you getting this size from log lines like this one?
<index>: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
>Lowering shared_buffers certainly does reduce the memory pressure in
>general, i.e. there is 1GB more for use by processes. It may be
>sufficient, hard to guess.
>
>I don't know if work_mem 64MB is too low, because it depends on what
>queries you're running etc. But you probably don't want to increase
>that, as it allows processes to use more memory when executing queries,
>i.e. it increases memory pressure and makes OOM more likely.
>
>So you need to watch system monitoring, see how much memory is being
>used (excluding page cache) and consider reducing work_mem and/or
>max_connections if it's too close.
I'll do, thanks for your suggestions.
Regards,
Ekaterina
On Mon, Nov 18, 2019 at 03:02:16PM +0100, Ekaterina Amez wrote:
>This (kernel unable to provide memory) was because no more RAM was
>available to allocate? It was because PG process did not have more
>memory assigned ready to use? Or is something unknown because it
>depends on the situations where the error is thrown?

Not sure I understand. Whenever a PostgreSQL process needs memory it
requests it from the kernel by calling malloc(), and the amount of
available RAM is limited. So when the kernel can't provide more memory,
it returns NULL.

>I've looked at cron and I've seen a scheduled process that finished a
>bit before the error began to log (a couple of minutes or so). [...]

If it finished a couple of minutes before, it's unlikely to be related.
But it's hard to say without knowing the details.

>When you talk about ~1MB are you getting this size from log lines like
>this?
>
><index>: 1024 total in 1 blocks; 476 free (0 chunks); 548 used

No, that's just one of the memory contexts (they form a tree), using
only 1kB of memory. What matters is the "grand total":

Grand total: 1009356 bytes in 130 blocks; 436888 free (72 chunks);
572468 used

which is ~1MB.

>I'll do, thanks for your suggestions.

Another thing you might do is add swap (if you don't have any already),
as a safety net.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
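As a sanity check on the "grand total" figure: since every per-context line ends in "N used", the used byte counts can be summed mechanically (a small sketch over two sample lines from the dump above; on a real dump you would exclude the "Grand total" line itself):

```shell
# Sum the trailing "used" byte counts of memory context lines:
awk '{sum += $(NF-1)} END {print sum " bytes used"}' <<'EOF'
TopMemoryContext: 77568 total in 5 blocks; 15612 free (5 chunks); 61956 used
ErrorContext: 8192 total in 1 blocks; 8172 free (4 chunks); 20 used
EOF
```

For these two sample lines it prints "61976 bytes used".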
El 18/11/19 a las 15:16, Tomas Vondra escribió:
>Not sure I understand. Whenever PostgreSQL process needs memory it
>requests it from the kernel by calling malloc(), and the amount of
>available RAM is limited. So when kernel can't provide more memory,
>it returns NULL.

Understood.

>If it finished a couple of minutes before, it's unlikely to be
>related. But hard to say, without knowing the details.

Yeah, I thought the same, but to me it's too much of a coincidence and
is suspicious (or at least a thing to keep in mind).

>No, that's just one of the memory contexts (they form a tree), using
>only 1kB of memory. What matters is the "grand total"
>
>Grand total: 1009356 bytes in 130 blocks; 436888 free (72 chunks);
>572468 used
>
>which is ~1MB.

OK; in my lack of knowledge, I was understanding "memory context" to
mean the whole message.

>Another thing you might do is adding a swap (if you don't have one
>already), as a safety.

Excuse my ignorance, but... swap? Do you mean some mechanism that keeps
the server from being unavailable, by having a second instance running
but not accessible and switching from one to the other when the main
one fails? (It's the best way I can find to describe it, as I don't
usually speak/write English.)

Thanks for your time.
On Mon, Nov 18, 2019 at 03:46:03PM +0100, Ekaterina Amez wrote:
>Excuse my ignorance but... swap? You mean some mechanism that prevents
>server to be unavailable by having a second instance running but not
>accesible and changing from one to the other when the main fails?

No: swap = space on disk, so that the OS can page out unused data from
RAM when there's memory pressure.

It's basic sysadmin knowledge, I think. Search for mkswap.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
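To make the mkswap pointer concrete, a minimal sketch of creating a swap file (the file name and the tiny 16MB size here are just for demonstration; real swap files live somewhere like /swapfile, are sized in GB, and the swapon step needs root):

```shell
dd if=/dev/zero of=./swapfile.demo bs=1M count=16 status=none   # allocate the file
chmod 600 ./swapfile.demo      # swap must not be readable by other users
mkswap ./swapfile.demo         # write the swap signature (works on a plain file)
# swapon ./swapfile.demo       # activate; requires root
# swapon --show                # verify it is in use
```

To make it survive reboots you would also add an entry for it to /etc/fstab.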