Thread: Autovacuum running out of memory

Autovacuum running out of memory

From: Alexis Lê-Quôc
Greetings,

I've been hitting an "out of memory" error during autovacuum of
relatively large tables (compared to the amount of RAM available). I'm
trying to trace the cause of the issue; the answer is probably
somewhere in the data below, but I don't know how to interpret it. I
could work around the issue right now by using more memory, but that
wouldn't help me understand what is going on.
I've pasted the data I thought relevant, but feel free to ask for more
(I've read http://wiki.postgresql.org/wiki/Guide_to_reporting_problems).
I've searched through older answers to the same question back in 2007,
but they did not really get me anywhere.

The error message is:
[10236]: [1-1] user=,db=,remote= ERROR:  out of memory
[10236]: [2-1] user=,db=,remote= DETAIL:  Failed on request of size 395973594.
[10236]: [3-1] user=,db=,remote= CONTEXT:  automatic vacuum of table
"***.public.serialized_series"

I can recreate the error by running vacuum verbose serialized_series
straight from psql on the box itself.

Thanks!

Now on to the data themselves.

--- table sizes ---

SELECT
  nspname,
  relname,
  pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND relname IN ('serialized_series', 'pg_toast_16573');
nspname  |      relname      |  size
----------+-------------------+---------
pg_toast | pg_toast_16573    | 2200 MB
public   | serialized_series | 1772 MB

SELECT
  nspname,
  relname,
  relkind as "type",
  pg_size_pretty(pg_table_size(C.oid)) AS size,
  pg_size_pretty(pg_indexes_size(C.oid)) AS idxsize,
  pg_size_pretty(pg_total_relation_size(C.oid)) as "total"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND
  nspname !~ '^pg_toast' AND
  relkind IN ('r','i') AND
  relname IN ('serialized_series');
 nspname |      relname      | type |  size   | idxsize |  total
---------+-------------------+------+---------+---------+---------
 public  | serialized_series | r    | 4008 MB | 844 MB  | 4853 MB

--- table structure ---
serialized_series has a blob field (which ends up in pg_toast_16573) and a
bunch of integer ids and timestamps.

--- postgresql.conf (subset) ----
max_connections = 200
shared_buffers = 1971421kB
temp_buffers = 8MB
work_mem = 9857kB
maintenance_work_mem = 752MB
max_files_per_process = 1000
effective_cache_size = 3942842kB
autovacuum = on
log_autovacuum_min_duration = -1
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 200000000
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = -1

--- os ---
Linux ******* 2.6.32-314-ec2 #27-Ubuntu SMP Wed Mar 2 22:53:38 UTC
2011 x86_64 GNU/Linux

            total       used       free     shared    buffers     cached
Mem:          7700       5521       2179          0         20       5049
-/+ buffers/cache:        451       7249
Swap:            0          0          0

--- versions ---
select version();
                                                   version
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

ii  postgresql                        9.0.4-1~lucid1
 object-relational SQL database (supported ve
ii  postgresql-9.0                    9.0.4-1~lucid1
 object-relational SQL database, version 9.0
ii  postgresql-client                 9.0.4-1~lucid1
 front-end programs for PostgreSQL (supported
ii  postgresql-client-9.0             9.0.4-1~lucid1
 front-end programs for PostgreSQL 9.0
ii  postgresql-client-common          119~lucid1
 manager for multiple PostgreSQL client versi
ii  postgresql-common                 119~lucid1
 PostgreSQL database-cluster manager
ii  postgresql-contrib                9.0.4-1~lucid1
 additional facilities for PostgreSQL (suppor
ii  postgresql-contrib-9.0            9.0.4-1~lucid1
 additional facilities for PostgreSQL

--- kernel params ---
kernel.shmmax = 8074940416
kernel.shmall = 1971421
kernel.shmmni = 4096

--- ulimit for postgres user ---
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


--- ipcs ---
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e6a9 1441792    postgres   600        2087182336 17

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0052e6a9 18743296   postgres   600        17
0x0052e6aa 18776065   postgres   600        17
0x0052e6ab 18808834   postgres   600        17
0x0052e6ac 18841603   postgres   600        17
0x0052e6ad 18874372   postgres   600        17
0x0052e6ae 18907141   postgres   600        17
0x0052e6af 18939910   postgres   600        17
0x0052e6b0 18972679   postgres   600        17
0x0052e6b1 19005448   postgres   600        17
0x0052e6b2 19038217   postgres   600        17
0x0052e6b3 19070986   postgres   600        17
0x0052e6b4 19103755   postgres   600        17
0x0052e6b5 19136524   postgres   600        17


--- postgresql log ---
TopMemoryContext: 89936 total in 10 blocks; 8576 free (8 chunks); 81360 used
 TopTransactionContext: 24576 total in 2 blocks; 22448 free (26
chunks); 2128 used
 TOAST to main relid map: 24576 total in 2 blocks; 15984 free (5
chunks); 8592 used
 AV worker: 24576 total in 2 blocks; 19832 free (8 chunks); 4744 used
   Autovacuum Portal: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
     Vacuum: 8192 total in 1 blocks; 8080 free (0 chunks); 112 used
 Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
 smgr relation table: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used
 TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0
chunks); 32 used
 Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
 PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
 Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
 CacheMemoryContext: 817840 total in 20 blocks; 167624 free (2
chunks); 650216 used
   serialized_series_dim_context_key_dim_metric_key_resolution_idx:
2048 total in 1 blocks; 128 free (0 chunks); 1920 used
   serialized_series_rolledup_resolution_end_date_start_date_idx: 2048
total in 1 blocks; 440 free (0 chunks); 1608 used
   ix_serialized_series_end_date: 2048 total in 1 blocks; 752 free (0
chunks); 1296 used
   ix_serialized_series_start_date: 2048 total in 1 blocks; 752 free
(0 chunks); 1296 used
   ix_serialized_series_dim_metric_key: 2048 total in 1 blocks; 752
free (0 chunks); 1296 used
   ix_serialized_series_source_id: 2048 total in 1 blocks; 752 free (0
chunks); 1296 used
   ix_serialized_series_dim_context_key: 2048 total in 1 blocks; 752
free (0 chunks); 1296 used
   ix_serialized_series_rolledup: 2048 total in 1 blocks; 752 free (0
chunks); 1296 used
   serialized_series_pkey: 2048 total in 1 blocks; 752 free (0
chunks); 1296 used
   pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0
chunks); 1344 used
   pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0
chunks); 1440 used
   pg_db_role_setting_databaseid_rol_index: 2048 total in 1 blocks;
656 free (0 chunks); 1392 used
   pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1496 free (4
chunks); 1576 used
   pg_foreign_data_wrapper_name_index: 3072 total in 2 blocks; 1744
free (3 chunks); 1328 used
   pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
   pg_class_relname_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_foreign_server_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1496 free
(4 chunks); 1576 used
   pg_cast_source_target_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_language_name_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2
chunks); 1688 used
   pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
   pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_ts_config_map_index: 3072 total in 2 blocks; 1496 free (4
chunks); 1576 used
   pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
   pg_foreign_data_wrapper_oid_index: 3072 total in 2 blocks; 1744
free (3 chunks); 1328 used
   pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
   pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3
chunks); 1640 used
   pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free
(3 chunks); 1640 used
   pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free
(2 chunks); 1472 used
   pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_user_mapping_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1496 free (4
chunks); 1576 used
   pg_type_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
   pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_operator_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_namespace_nspname_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_amop_opr_fam_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1496
free (4 chunks); 1576 used
   pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
   pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1496 free
(4 chunks); 1576 used
   pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free
(2 chunks); 1472 used
   pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
   pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_namespace_oid_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
   pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2
chunks); 1688 used
   pg_foreign_server_name_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free
(2 chunks); 1424 used
   pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_user_mapping_user_server_index: 3072 total in 2 blocks; 1648
free (2 chunks); 1424 used
   pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
   pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
   pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648
free (2 chunks); 1424 used
   pg_tablespace_oid_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
   pg_database_datname_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
   pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648
free (2 chunks); 1424 used
   pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
   pg_authid_rolname_index: 3072 total in 2 blocks; 1744 free (3
chunks); 1328 used
 MdSmgr: 8192 total in 1 blocks; 8000 free (0 chunks); 192 used
 LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
 Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
 Postmaster: 57344 total in 3 blocks; 49184 free (323 chunks); 8160 used
 ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
[10236]: [1-1] user=,db=,remote= ERROR:  out of memory
[10236]: [2-1] user=,db=,remote= DETAIL:  Failed on request of size 395973594.
[10236]: [3-1] user=,db=,remote= CONTEXT:  automatic vacuum of table
"******.public.serialized_series"

Thanks,

--
Alexis Lê-Quôc

Re: Autovacuum running out of memory

From: Tom Lane
Alexis Lê-Quôc <alq@datadoghq.com> writes:
> I've been hitting a "out of memory error" during autovacuum of
> relatively large tables (compared to the amount of RAM available).

> The error message is:
> [10236]: [1-1] user=,db=,remote= ERROR:  out of memory
> [10236]: [2-1] user=,db=,remote= DETAIL:  Failed on request of size 395973594.
> [10236]: [3-1] user=,db=,remote= CONTEXT:  automatic vacuum of table
> "***.public.serialized_series"

> --- postgresql.conf (subset) ----
> shared_buffers = 1971421kB
> work_mem = 9857kB
> maintenance_work_mem = 752MB

Since the memory map shows that not very much memory has been allocated
by VACUUM yet, I suspect it's failing while trying to create the work
array for remembering dead tuple TIDs.  It will assume that it can use
up to maintenance_work_mem for that.  (The fact that it didn't ask for
the whole 752MB probably means this is a relatively small table in
which there couldn't possibly be that many TIDs.)  So the short answer
is "reduce maintenance_work_mem to something under 300MB".

However, I find it a bit odd that you're getting this failure in what
appears to be a 64-bit build.  That means you're not running out of
address space, so you must actually be out of RAM+swap.  Does the
machine have only 4GB or so of RAM?  If so, that value for
shared_buffers is unrealistically large; it's not leaving enough RAM for
other purposes such as this.

Where did you get the above-quoted parameter settings, anyway?  They
seem a bit weird, as in written to many more decimal places than anyone
could really expect to mean anything.

            regards, tom lane

Re: Autovacuum running out of memory

From: Alexis Lê-Quôc
On Wed, Aug 10, 2011 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexis Le-Quoc <alq@datadoghq.com> writes:
>> I've been hitting a "out of memory error" during autovacuum of
>> relatively large tables (compared to the amount of RAM available).
>
>> The error message is:
>> [10236]: [1-1] user=,db=,remote= ERROR:  out of memory
>> [10236]: [2-1] user=,db=,remote= DETAIL:  Failed on request of size 395973594.
>> [10236]: [3-1] user=,db=,remote= CONTEXT:  automatic vacuum of table
>> "***.public.serialized_series"
>
>> --- postgresql.conf (subset) ----
>> shared_buffers = 1971421kB
>> work_mem = 9857kB
>> maintenance_work_mem = 752MB
>
> Since the memory map shows that not very much memory has been allocated
> by VACUUM yet, I suspect it's failing while trying to create the work
> array for remembering dead tuple TIDs.  It will assume that it can use
> up to maintenance_work_mem for that.  (The fact that it didn't ask for
> the whole 752MB probably means this is a relatively small table in
> which there couldn't possibly be that many TIDs.)  So the short answer
> is "reduce maintenance_work_mem to something under 300MB".
>
> However, I find it a bit odd that you're getting this failure in what
> appears to be a 64-bit build.  That means you're not running out of
> address space, so you must actually be out of RAM+swap.  Does the
> machine have only 4GB or so of RAM?  If so, that value for
> shared_buffers is unrealistically large; it's not leaving enough RAM for
> other purposes such as this.

The box has a little under 8GB (it's an EC2 "m1.large" instance):

             total       used       free     shared    buffers     cached
Mem:          7700       6662       1038          0         25       6078
-/+ buffers/cache:        558       7142
Swap:            0          0          0

There is no swap.

> Where did you get the above-quoted parameter settings, anyway?  They
> seem a bit weird, as in written to many more decimal places than anyone
> could really expect to mean anything.

I have them computed by our configuration management system. Here's
the logic behind it (edited from ruby):

# Compute shared memory kernel settings (sysctl)
page_size  = `getconf PAGE_SIZE`.to_i
phys_pages = `getconf _PHYS_PAGES`.to_i
shmall = phys_pages               # in pages
shmmax = shmall * page_size       # in bytes

# kb_memory_total and max_connections come from the node's configuration
shared_buffers       = kb_memory_total / 4                     # kB
work_mem             = kb_memory_total / max_connections / 4   # kB
maintenance_work_mem = kb_memory_total * 100 / (1024 * 1024)   # MB

The formulas in turn come from Greg Smith's PostgreSQL 9.0 High Performance
(http://www.postgresql.org/about/news.1249).
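
(Plugging this box's numbers back into those formulas does reproduce
the posted settings; a quick check, assuming kb_memory_total is
MemTotal in kB and working backwards from shared_buffers:)

kb_memory_total = 1_971_421 * 4   # roughly 7.7 GB
max_connections = 200
puts kb_memory_total / 4                      # => 1971421 kB (shared_buffers)
puts kb_memory_total / max_connections / 4    # => 9857 kB    (work_mem)
puts kb_memory_total * 100 / (1024 * 1024)    # => 752 MB     (maintenance_work_mem)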

Thanks,

--
Alexis Lê-Quôc

Re: Autovacuum running out of memory

From: Tom Lane
Alexis Lê-Quôc <alq@datadoghq.com> writes:
> On Wed, Aug 10, 2011 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, I find it a bit odd that you're getting this failure in what
>> appears to be a 64-bit build.  That means you're not running out of
>> address space, so you must actually be out of RAM+swap.  Does the
>> machine have only 4GB or so of RAM?  If so, that value for
>> shared_buffers is unrealistically large; it's not leaving enough RAM for
>> other purposes such as this.

> The box has little under 8GB (it's on EC2, a "m1.large" instance)
> There is no swap.

Hmph.  Is there other stuff being run on the same instance?  Are there a
whole lot of active PG processes?  Maybe Amazon isn't really giving you
a whole 8GB, or there are weird address space restrictions in the EC2
environment.  Anyway I think I'd suggest reducing shared_buffers to 1GB
or so.

>> Where did you get the above-quoted parameter settings, anyway?

> In turn they come from High-Performance Postgresql 9.0
> (http://www.postgresql.org/about/news.1249)

I'm sure even Greg wouldn't claim his methods are good to more than one
or two significant digits.

            regards, tom lane

Re: Autovacuum running out of memory

From: Alexis Lê-Quôc
On Wed, Aug 10, 2011 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexis Le-Quoc <alq@datadoghq.com> writes:
>> On Wed, Aug 10, 2011 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> However, I find it a bit odd that you're getting this failure in what
>>> appears to be a 64-bit build.  That means you're not running out of
>>> address space, so you must actually be out of RAM+swap.  Does the
>>> machine have only 4GB or so of RAM?  If so, that value for
>>> shared_buffers is unrealistically large; it's not leaving enough RAM for
>>> other purposes such as this.
>
>> The box has little under 8GB (it's on EC2, a "m1.large" instance)
>> There is no swap.
>
> Hmph.  Is there other stuff being run on the same instance?  Are there a
> whole lot of active PG processes?  Maybe Amazon isn't really giving you
> a whole 8GB, or there are weird address space restrictions in the EC2
> environment.  Anyway I think I'd suggest reducing shared_buffers to 1GB
> or so.
>

Done and that fixed it. Thanks.

Now this is counter-intuitive (so much for intuition).
Any pointers to educate myself on why more shared buffers can be
detrimental? I thought they would only compete with the OS page cache.
Could it be caused by the "no-overcommit" policy that I told the
kernel to enforce?
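
(If that policy means vm.overcommit_memory = 2 with no swap, the
kernel's commit limit is only a fraction of RAM; a hypothetical sketch,
since the actual overcommit_ratio isn't shown above:)

# Hypothetical arithmetic, assuming vm.overcommit_memory = 2 and the
# default vm.overcommit_ratio of 50, with the RAM/swap figures from
# `free` above.
ram_mb, swap_mb, overcommit_ratio = 7700, 0, 50
commit_limit_mb = swap_mb + ram_mb * overcommit_ratio / 100
puts commit_limit_mb                               # => 3850 MB committable
puts commit_limit_mb - 2_087_182_336 / (1024 * 1024)
# => about 1860 MB left, if the shared memory segment counts against the limit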

As far as other things running on the same instance go, nothing stands
out. It is a "dedicated" db instance.

>>> Where did you get the above-quoted parameter settings, anyway?
>
>> In turn they come from High-Performance Postgresql 9.0
>> (http://www.postgresql.org/about/news.1249)
>
> I'm sure even Greg wouldn't claim his methods are good to more than one
> or two significant digits.

Agreed, the extra digits are meaningless. I just did not make the
effort to round the values in my ruby code.

--
Alexis Lê-Quôc

Re: Autovacuum running out of memory

From: Tom Lane
Alexis Lê-Quôc <alq@datadoghq.com> writes:
> On Wed, Aug 10, 2011 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmph.  Is there other stuff being run on the same instance?  Are there a
>> whole lot of active PG processes?  Maybe Amazon isn't really giving you
>> a whole 8GB, or there are weird address space restrictions in the EC2
>> environment.  Anyway I think I'd suggest reducing shared_buffers to 1GB
>> or so.

> Done and that fixed it. Thanks.

> Now this is counter-intuitive (so much for intuition).
> Any pointers to educate myself on why more shared buffers is
> detrimental?

My guess is that it's an address space restriction at bottom.  Postgres
starts (on typical systems) with program text at the beginning of its
address space, static data after that, a large hole in the middle, and
stack up at the top.  Then the shared memory block gets allocated
somewhere in the hole, at a spot that's more or less at the whim of the
OS.  If a Postgres process subsequently asks for more private memory via
malloc, it can only possibly get as much as the distance from the
original static area to the shared memory block's position in the
process's address space.

So I'm thinking that the EC2 environment is giving you some lowball
address for the shared memory block that's constraining the process's
private memory space to just a few hundred meg, even though in a 64-bit
build there's room for umpteen gigabytes.  Possibly it's worth filing a
bug with Amazon about how they should pick a more sensible address ...
but first you should confirm that theory by looking at the process
memory map (should be visible in /proc someplace).
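
(For example, something along these lines, assuming the SysV segment
shows up as a /SYSV... pseudo-file in /proc/<pid>/maps; the pid
argument is whatever backend you want to inspect:)

# Hypothetical helper: print the heap, stack and SysV shared-memory lines
# from a backend's memory map to see where the shared segment landed.
pid = Integer(ARGV[0])   # a postgres backend pid
File.foreach("/proc/#{pid}/maps") do |line|
  puts line if line =~ /SYSV|\[heap\]|\[stack\]/
end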

It may also be that the problem is not process-address-space related but
reflects some inefficiency in EC2's overall use of RAM, possibly
exacerbated by PG's request for a large shared memory block.  But you'd
need to find an EC2 expert to investigate that idea.

            regards, tom lane