Thread: Memory Leak executing small queries without closing the connection

Memory Leak executing small queries without closing the connection

From
Gerhard Wiesinger
Date:
Hello,

We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple worker
processes connected via persistent connections to PostgreSQL; they
perform just simple queries with SELECTs on primary keys and simple
INSERTs/UPDATEs. Normally nearly all the workers are idle, but they still
consume the maximum configured work_mem on the PostgreSQL server, and the
memory is also resident. If some other queries come in, we get into
out-of-memory situations. So it looks like PostgreSQL has memory leaks.

I found a test scenario that reproduces it on a newer FreeBSD 10.2 VM
as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):

Executions in psql with one persistent connection:
-- Create the table
CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM
generate_Series(1,100000000) s;
-- Create the index
CREATE INDEX ON t_random(s);

-- Restart psql with a new connection:

-- Memory goes slightly up after each execution, even after canceling:
-- Memory leak on FreeBSD 10.2/Fedora 23 with PostgreSQL 9.4.5 when canceling
-- the query or executing it multiple times
SELECT * FROM t_random ORDER BY md5 LIMIT 100000;

-- Therefore I created a function:
CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
RETURNS void AS $$
BEGIN
   -- RAISE NOTICE 'num=%', num;
   FOR i IN 1..num LOOP
     PERFORM * FROM t_random WHERE s = i;
   END LOOP;
END;
$$  LANGUAGE plpgsql;

-- Test it several times
SELECT execmultiplei(10000000);

-- Linux testing (FreeBSD is similar), relevant part is RES (resident
memory):
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
-- after startup of psql
26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00
postgres: postgres postgres [local] idle
-- Memory goes up, ok so far
26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81
postgres: postgres postgres [local] SELECT
26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90
postgres: postgres postgres [local] SELECT
-- Function execmultiplei and transaction terminated, but memory still
allocated!!!
26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 0:41.40
postgres: postgres postgres [local] idle
-- Calling it again
26851 postgres  20   0 2365732 920668 918748 R  99.0 22.7 0:46.51
postgres: postgres postgres [local] SELECT
-- idle again, memory still allocated
26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 1:22.54
postgres: postgres postgres [local] idle

Memory is only released when psql exits. According to the
PostgreSQL design, memory should be freed once the transaction has completed.

top command on FreeBSD: top -SaPz -o res -s 1
top command on Linux: top -o RES -d 1
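
To follow a single backend's resident memory more precisely, a small shell
loop along these lines could be used (a sketch; 26851 is the backend PID from
the top output above, and the ps options work on both Linux and FreeBSD):

#!/usr/bin/env bash
# Log the resident (RSS) and virtual (VSZ) size of one backend once per second
PID=26851
while true; do
   date +%Y.%m.%d.%H.%M.%S
   ps -o pid,rss,vsz,comm -p "$PID"
   sleep 1
done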

Config: VMs with 4GB of RAM, 2 vCPUs
shared_buffers = 2048MB                 # min 128kB
effective_cache_size = 2GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16

Any ideas?

Thank you.

Ciao,
Gerhard



Re: Memory Leak executing small queries without closing the connection - FreeBSD

From
Gerhard Wiesinger
Date:
Hello,

some further details from the original FreeBSD 10.1 machine:

Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME WCPU COMMAND
77941 pgsql         5  20    0  7925M  7296M usem    2 352:34 6.98%
postgres: username dbnamee 127.0.0.1(43367)  (postgres)
75862 pgsql         1  20    0  7321M  7266M select  4   7:04 0.00%
postgres: checkpointer process    (postgres)
75863 pgsql         1  20    0  7321M  7260M select  4   3:34 0.00%
postgres: writer process    (postgres)
75867 pgsql         1  20    0  7323M  7237M sbwait  0   3:19 0.00%
postgres: username dbnamee ipaddress(58977)  (postgres)
76178 pgsql         1  20    0  7323M  7237M sbwait  2   3:18 0.00%
postgres: username dbnamee ipaddress(35807)  (postgres)
75868 pgsql         1  20    0  7323M  7237M sbwait  0   3:20 0.00%
postgres: username dbnamee ipaddress(58978)  (postgres)
75884 pgsql         1  20    0  7323M  7236M sbwait  3   3:20 0.00%
postgres: username dbnamee ipaddress(59908)  (postgres)
75869 pgsql         1  20    0  7323M  7236M sbwait  3   3:20 0.00%
postgres: username dbnamee ipaddress(58982)  (postgres)
76181 pgsql         1  20    0  7323M  7236M sbwait  2   3:18 0.00%
postgres: username dbnamee ipaddress(35813)  (postgres)
75883 pgsql         1  20    0  7323M  7236M sbwait  0   3:20 0.00%
postgres: username dbnamee ipaddress(59907)  (postgres)
76180 pgsql         1  20    0  7323M  7236M sbwait  1   3:19 0.00%
postgres: username dbnamee ipaddress(35811)  (postgres)
76177 pgsql         1  20    0  7323M  7236M sbwait  1   3:18 0.00%
postgres: username dbnamee ipaddress(35712)  (postgres)
76179 pgsql         1  20    0  7323M  7236M sbwait  5   3:18 0.00%
postgres: username dbnamee ipaddress(35810)  (postgres)
64951 pgsql         1  75    0  7375M   662M CPU1    1   0:01 11.47%
postgres: username dbnamee 127.0.0.1(32073)  (postgres)
64950 pgsql         1  77    0  7325M   598M CPU6    6   0:02 16.55%
postgres: username dbnamee 127.0.0.1(44251)  (postgres)

Out of memory:
kernel: swap_pager_getswapspace(4): failed
kernel: swap_pager_getswapspace(8): failed
kernel: swap_pager_getswapspace(3): failed

The main issue is IMHO (as far as I understand the FreeBSD memory system)
that 20G are INACTIVE. When I subtract the shared memory, ~13GB should
still be available, but it remains allocated, just inactive (INACTIVE).
INACTIVE memory might be clean or dirty. Since we get into out-of-memory
situations, it is likely that the memory is dirty (otherwise it would have
been reused).

Config:
Memory: 32GB, Swap: 512MB

maintenance_work_mem = 512MB
effective_cache_size = 10GB
work_mem = 892MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 7080MB
max_connections = 80
autovacuum_max_workers = 3

Thnx.

Ciao,
Gerhard





On Sun, 13 Dec 2015 09:57:21 +0100
Gerhard Wiesinger <lists@wiesinger.com> wrote:
>
> some further details from the original FreeBSD 10.1 machine:
>
> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
>
>    PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME WCPU COMMAND
> 77941 pgsql         5  20    0  7925M  7296M usem    2 352:34 6.98%
> postgres: username dbnamee 127.0.0.1(43367)  (postgres)

<snip>

I see no evidence of an actual leak here. Each process is basically using
the 7G of shared_buffers you have allocated in the config (which is only
7G _total_ for all processes, since it's shared memory).

> Out of memory:
> kernel: swap_pager_getswapspace(4): failed
> kernel: swap_pager_getswapspace(8): failed
> kernel: swap_pager_getswapspace(3): failed
>
> Main issue is IMHO (as far as I understood the FreeBSD Memory system)
> that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
> should be available, but they are still allocated but inactive
> (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
> of memory situations it is likely that the memory is dirty (otherwise it
> would have been reused).

Not quite correct. Inactive memory is _always_ available for re-use.

> Config:
> Memory: 32GB, Swap: 512MB

Probably not the cause of this problem, but this is a non-optimal layout.
If you're going to use swap at all, it should generally be 2x the size of
your RAM for optimal performance under memory pressure.

> maintenance_work_mem = 512MB
> effective_cache_size = 10GB
> work_mem = 892MB

I expect that this value is the cause of the problem. The scenario you
describe below is sorting a large table on an unindexed column, meaning
it will have to use all that work_mem. I'd be interested to see the
output of:

EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 100000;

But even without that information, I'd recommend you reduce work_mem
to about 16M or so.
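
As a minimal sketch (assuming the t_random test table from above), the effect
of a lower work_mem on that sort can be checked per session, without touching
postgresql.conf:

#!/usr/bin/env bash
# Run the test query with the current work_mem and again with 16MB,
# to compare the sort method (in-memory quicksort vs. external/disk sort)
psql postgres <<'EOF'
EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 100000;
SET work_mem = '16MB';
EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 100000;
EOF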

> wal_buffers = 8MB
> checkpoint_segments = 16
> shared_buffers = 7080MB
> max_connections = 80
> autovacuum_max_workers = 3

[snip]

> > We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple
> > worker processes connected via persistent connections to PostgreSQL,
> > they perform just simple queries with SELECT on primary keys and
> > simple INSERTS/UPDATES.

That's not at all the test scenario you show below. The scenario below
is a large sort operation on a non-indexed column, which is vastly
different than a single-row fetch based on an index.
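
For contrast, a quick sketch of what that indexed single-row fetch looks like
to the planner (assuming the t_random table and its index on s created above):

#!/usr/bin/env bash
# Show the plan for a single indexed row lookup
psql postgres <<'EOF'
EXPLAIN ANALYZE SELECT * FROM t_random WHERE s = 42;
EOF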

> > Normally nearly all the workers are idle but
> > they still consume the maximum configured work mem on the PostgreSQL
> > server and the memory is also resident.

I see no evidence of that in your top output. Each process has a
reference to the 7G of shared_buffers you allocated, which is memory
shared by all processes, and is expected. I'm not as familiar with
Linux top, but the results appear to be the same.

> > If some other queries get in
> > we get into out of memory situations. So it looks like PostgreSQL has
> > memory leaks.
> >
> > I found a test scenario to reproduce it also on a newer FreeBSD 10.2
> > VM as well as in a Fedora 23 VM (both with PostgreSQL 9.4.5):
> >
> > Executions in psql with one persistent connection:
> > -- Create the table
> > CREATE TABLE t_random AS SELECT s, md5(random()::text) FROM
> > generate_Series(1,100000000) s;
> > -- Create the index
> > CREATE INDEX ON t_random(s);
> >
> > -- Restart psql with a new connection:
> >
> > -- Memory goes slightly up after each execution, even after canceling:
> > -- Memory leak on FreeBSD 10.2/Fedora 23 with PostgreSQL 9.4.5 when
> > -- canceling the query or executing it multiple times
> > SELECT * FROM t_random ORDER BY md5 LIMIT 100000;
> >
> > -- Therefore I created a function:
> > CREATE OR REPLACE FUNCTION execmultiplei(IN num int8)
> > RETURNS void AS $$
> > BEGIN
> >   -- RAISE NOTICE 'num=%', num;
> >   FOR i IN 1..num LOOP
> >     PERFORM * FROM t_random WHERE s = i;
> >   END LOOP;
> > END;
> > $$  LANGUAGE plpgsql;
> >
> > -- Test it several times
> > SELECT execmultiplei(10000000);
> >
> > -- Linux testing (FreeBSD is similar), relevant part is RES (resident
> > memory):
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+ COMMAND
> > -- after startup of psql
> > 26851 postgres  20   0 2363276   7432   6292 S   0.0  0.2 0:00.00
> > postgres: postgres postgres [local] idle
> > -- Memory goes up, ok so far
> > 26851 postgres  20   0 2365732 255152 253548 R  99.0  6.3 0:10.77
> > postgres: postgres postgres [local] SELECT
> > 26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81
> > postgres: postgres postgres [local] SELECT
> > 26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90
> > postgres: postgres postgres [local] SELECT
> > -- Function execmultiplei and transaction terminated, but memory still
> > allocated!!!
> > 26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 0:41.40
> > postgres: postgres postgres [local] idle
> > -- Calling it again
> > 26851 postgres  20   0 2365732 920668 918748 R  99.0 22.7 0:46.51
> > postgres: postgres postgres [local] SELECT
> > -- idle again, memory still allocated
> > 26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 1:22.54
> > postgres: postgres postgres [local] idle
> >
> > Memory will only be released if psql is exited. According to the
> > PostgreSQL design memory should be freed when the transaction completed.
> >
> > top commands on FreeBSD: top -SaPz -o res -s 1
> > top commands on Linux: top -o RES d1
> >
> > Config: VMs with 4GB of RAM, 2 vCPUs
> > shared_buffers = 2048MB                 # min 128kB
> > effective_cache_size = 2GB
> > work_mem = 892MB
> > wal_buffers = 8MB
> > checkpoint_segments = 16

--
Bill Moran


Re: Memory Leak executing small queries without closing the connection - FreeBSD

From
Gerhard Wiesinger
Date:
Hello Bill,

Thank you for your response, comments inline:

On 13.12.2015 16:05, Bill Moran wrote:
> On Sun, 13 Dec 2015 09:57:21 +0100
> Gerhard Wiesinger <lists@wiesinger.com> wrote:
>> some further details from the original FreeBSD 10.1 machine:
>>
>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
>>
>>     PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME WCPU COMMAND
>> 77941 pgsql         5  20    0  7925M  7296M usem    2 352:34 6.98%
>> postgres: username dbnamee 127.0.0.1(43367)  (postgres)
> <snip>
>
> I see no evidence of an actual leak here. Each process is basically using
> the 7G of shared_buffers you have allocated in the config (which is only
> 7G _total_ for all processes, since it's shared memory)

OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

>> Out of memory:
>> kernel: swap_pager_getswapspace(4): failed
>> kernel: swap_pager_getswapspace(8): failed
>> kernel: swap_pager_getswapspace(3): failed
>>
>> Main issue is IMHO (as far as I understood the FreeBSD Memory system)
>> that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
>> should be available, but they are still allocated but inactive
>> (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
>> of memory situations it is likely that the memory is dirty (otherwise it
>> would have been reused).
> Not quite correct. Inactive memory is _always_ available for re-use.

Are you sure that's true?

Monitoring inactive memory:
cat vm_stat.sh
#!/usr/bin/env bash

while [ 1 ]; do
   date +%Y.%m.%d.%H.%M.%S
   sysctl -a | grep vm.stats.vm.
   sleep 1
done
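
A slightly narrower variant (just a sketch) that reads only the inactive
counter and converts pages to GB, assuming the page size reported by
hw.pagesize (normally 4 kB):

#!/usr/bin/env bash
# Print the inactive page count converted to GB, once per second
pagesize=$(sysctl -n hw.pagesize)
while true; do
   pages=$(sysctl -n vm.stats.vm.v_inactive_count)
   echo "$(date +%Y.%m.%d.%H.%M.%S) inactive: $(( pages * pagesize / 1024 / 1024 / 1024 )) GB"
   sleep 1
done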

And even when we run out of memory with swap_pager_getswapspace, inactive
memory (from the log file) stays at around 20GB (it doesn't go down or up):
vm.stats.vm.v_inactive_count: 5193530 (* 4k pages is around 20GB)

So we have 20GB of inactive memory, but we still get out of memory with
kernel: swap_pager_getswapspace(4): failed. Any ideas why?

>
>> Config:
>> Memory: 32GB, Swap: 512MB
> Probably not the cause of this problem, but this is a non-optimal layout.
> If you're going to use swap at all, it should generally be 2x the size of
> your RAM for optimal performance under memory pressure.

Yes, we will try to disable it completely. Nevertheless, why do we run out
of memory/swap?

>
>> maintenance_work_mem = 512MB
>> effective_cache_size = 10GB
>> work_mem = 892MB
> I expect that this value is the cause of the problem. The scenario you
> describe below is sorting a large table on an unindexed column, meaning
> it will have to use all that work_mem. I'd be interested to see the
> output of:
>
> EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 100000;

That was only a test query; it has nothing to do with the production
queries. Those are mostly SELECTs/INSERTs/UPDATEs on primary keys.

> But even without that information, I'd recommend you reduce work_mem
> to about 16M or so.

Why so low? Sorting in reporting queries or other long-running queries would
then be done on disk and not in memory.

>> wal_buffers = 8MB
>> checkpoint_segments = 16
>> shared_buffers = 7080MB
>> max_connections = 80
>> autovacuum_max_workers = 3
> [snip]
>
>>> We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple
>>> worker processes connected via persistent connections to PostgreSQL,
>>> they perform just simple queries with SELECT on primary keys and
>>> simple INSERTS/UPDATES.
> That's not at all the test scenario you show below. The scenario below
> is a large sort operation on a non-indexed column, which is vastly
> different than a single-row fetch based on an index.

Yes, that non-indexed SELECT was just for testing purposes.

>
>>> Normally nearly all the workers are idle but
>>> they still consume the maximum configured work mem on the PostgreSQL
>>> server and the memory is also resident.
> I see no evidence of that in your top output. Each process has a
> reference to the 7G of shared_buffers you allocated, which is memory
> shared by all processes, and is expected. I'm not as familiar with
> Linux top, but the results appear to be the same.

Yes, that might be the case, but if it is n times the same 7G of shared
memory, then we have ~20GB of inactive memory available, so plenty of memory.
And why are we getting: kernel: swap_pager_getswapspace(4): failed?

Thnx.

Ciao,
Gerhard



On Sun, 13 Dec 2015 16:35:08 +0100
Gerhard Wiesinger <lists@wiesinger.com> wrote:

> Hello Bill,
>
> Thank you for your response, comments inline:
>
> On 13.12.2015 16:05, Bill Moran wrote:
> > On Sun, 13 Dec 2015 09:57:21 +0100
> > Gerhard Wiesinger <lists@wiesinger.com> wrote:
> >> some further details from the original FreeBSD 10.1 machine:
> >>
> >> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> >> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> >>
> >>     PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME WCPU COMMAND
> >> 77941 pgsql         5  20    0  7925M  7296M usem    2 352:34 6.98%
> >> postgres: username dbnamee 127.0.0.1(43367)  (postgres)
> > <snip>
> >
> > I see no evidence of an actual leak here. Each process is basically using
> > the 7G of shared_buffers you have allocated in the config (which is only
> > 7G _total_ for all processes, since it's shared memory)
>
> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

You haven't provided enough information to isolate that cause yet. What's
in the Postgres log? Surely it will have logged something when its request
for RAM was denied, and it should be more informational than the OS'
generic message.

> >> Out of memory:
> >> kernel: swap_pager_getswapspace(4): failed
> >> kernel: swap_pager_getswapspace(8): failed
> >> kernel: swap_pager_getswapspace(3): failed
> >>
> >> Main issue is IMHO (as far as I understood the FreeBSD Memory system)
> >> that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
> >> should be available, but they are still allocated but inactive
> >> (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
> >> of memory situations it is likely that the memory is dirty (otherwise it
> >> would have been reused).
> > Not quite correct. Inactive memory is _always_ available for re-use.
>
> Are you sure that's true?

Yes. Read The Design and Implementation of FreeBSD for the details.

> Monitoring inactive memory:
> cat vm_stat.sh
> #!/usr/bin/env bash
>
> while [ 1 ]; do
>    date +%Y.%m.%d.%H.%M.%S
>    sysctl -a | grep vm.stats.vm.
>    sleep 1
> done
>
> And even when we run out of memory with swap_pager_getswapspace, inactive
> memory (from the log file) stays at around 20GB (it doesn't go down or up):
> vm.stats.vm.v_inactive_count: 5193530 (* 4k pages is around 20GB)
>
> So we have 20GB of inactive memory, but we still get out of memory with
> kernel: swap_pager_getswapspace(4): failed. Any ideas why?

Theory: If the planner decides it needs to do 30 sort operations for a
query, it will try to allocate 30 x work_mem, i.e. roughly 27G of RAM, which
exceeds what's available, and therefore never gets allocated. So you get the
"out of space" message, but the actual memory usage doesn't change.

> >> maintenance_work_mem = 512MB
> >> effective_cache_size = 10GB
> >> work_mem = 892MB
> > I expect that this value is the cause of the problem. The scenario you
> > describe below is sorting a large table on an unindexed column, meaning
> > it will have to use all that work_mem. I'd be interested to see the
> > output of:
> >
> > EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 100000;
>
> That was only a test query, has nothing to do with production based
> query. They are mostly SELECT/INSERTS/UPDATES on primary keys.

Then provide the _actual_ queries and the EXPLAIN ANALYZE and the table
layouts and basic data distribution of the actual cause. If your test
case is completely non-representative of what's happening, then you're
not going to get useful answers.

> > But even without that information, I'd recommend you reduce work_mem
> > to about 16M or so.
>
> Why so low? E.g. sorting on reporting or some long running queries are
> then done on disk and not in memory.

Even a simple query could involve multiple sorts, and you're allowing
each sort to use up to 890M of RAM (which is _not_ shared). As noted
earlier, even a moderately complex query could exceed the available
RAM on the system. But since you don't provide the actual queries and
tables causing problems, I can only guess. And since you appear to
have already decided what the cause of the problem is, then crafted
completely non-relevant queries that you think prove your point, I'm
not sure there's anything I can do to help you.

> >> wal_buffers = 8MB
> >> checkpoint_segments = 16
> >> shared_buffers = 7080MB
> >> max_connections = 80
> >> autovacuum_max_workers = 3
> > [snip]
> >
> >>> We are running PostgreSQL 9.4.5 on FreeBSD 10.1 and have multiple
> >>> worker processes connected via persistent connections to PostgreSQL,
> >>> they perform just simple queries with SELECT on primary keys and
> >>> simple INSERTS/UPDATES.
> > That's not at all the test scenario you show below. The scenario below
> > is a large sort operation on a non-indexed column, which is vastly
> > different than a single-row fetch based on an index.
>
> Yes, that non-indexed SELECT was just for testing purposes.

Completely useless. I can always craft some sort of nonsense case that
will show that something that no sane person would ever do doesn't work.
Especially if I set configuration limits (such as work_mem) to
unreasonably high values.

> >>> Normally nearly all the workers are idle but
> >>> they still consume the maximum configured work mem on the PostgreSQL
> >>> server and the memory is also resident.
> > I see no evidence of that in your top output. Each process has a
> > reference to the 7G of shared_buffers you allocated, which is memory
> > shared by all processes, and is expected. I'm not as familiar with
> > Linux top, but the results appear to be the same.
>
> Yes, might be the case, but if it is n times 7G shared memory then we
> have ~20GB Inactive Memory available, so plenty of memory. And why are
> we getting: kernel: swap_pager_getswapspace(4): failed?

My advice:

First, set work_mem to 16M and see if the problem goes away. If you
still want to investigate exactly what was causing the problem after
that, then post the _actual_ scenario that's causing it. I find it
_extremely_ unlikely that it's being caused by SELECT * FROM table
WHERE indexed_column = ?. And as a result, you should gather more
information on which queries are actually failing to execute. If you've
got a lot of data and are doing even moderately complex JOINs, you
could easily be allocating many gigs of sort space for each query.
With only 32G of RAM, a few dozen connections could quickly exhaust
that.
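
A sketch of how that could be applied, and how the statements that actually
spill to disk or run long could be captured (ALTER SYSTEM and these logging
parameters are available in 9.4; the values are only suggestions):

#!/usr/bin/env bash
# Lower work_mem and log temp-file (sort spill) usage and slow statements
psql postgres <<'EOF'
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET log_temp_files = 0;                  -- log every temporary file created
ALTER SYSTEM SET log_min_duration_statement = '1s';   -- log statements running longer than 1s
SELECT pg_reload_conf();
EOF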

--
Bill Moran


Gerhard Wiesinger <lists@wiesinger.com> writes:
>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse

> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

Just judging from the name of the function, I would bet this is a direct
result of having only 512M of swap configured.  As Bill already pointed
out, that's a pretty useless choice on a system with 32G of RAM.  As soon
as the kernel tries to push out any significant amount of idle processes,
it's gonna be out of swap space.  The numbers you show above prove that
it is almost out of free swap already.

Also, while that 20G of "inactive" pages may be candidates for reuse,
they probably can't actually be reused without swapping them out ...
and there's noplace for that data to go.

            regards, tom lane


Re: Memory Leak executing small queries without closing the connection - FreeBSD

From
Gerhard Wiesinger
Date:
On 13.12.2015 18:17, Tom Lane wrote:
> Gerhard Wiesinger <lists@wiesinger.com> writes:
>>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
>>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
>> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?
> Just judging from the name of the function, I would bet this is a direct
> result of having only 512M of swap configured.  As Bill already pointed
> out, that's a pretty useless choice on a system with 32G of RAM.  As soon
> as the kernel tries to push out any significant amount of idle processes,
> it's gonna be out of swap space.  The numbers you show above prove that
> it is almost out of free swap already.

The system wasn't designed by me; I wouldn't do it that way either. Does
swapoff help?

>
> Also, while that 20G of "inactive" pages may be candidates for reuse,
> they probably can't actually be reused without swapping them out ...
> and there's noplace for that data to go.

There is no Postgres entry in syslog (where Postgres logs to) at the time
swap_pager_getswapspace is logged.

But why do we have 20G of inactive pages? They are still allocated by the
kernel or by user space. As you can see below (top output), non-Postgres
processes are around 9G in virtual size, resident even lower. The system
is nearly idle, and the queries are typically no longer active after one
second. So where does the rest of the ~11G of inactive pages come from
(if it isn't a Postgres/FreeBSD memory leak)?
I read that Postgres has its own memory allocator:
https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
Might that be an issue with double allocation/freeing and the "cheese
hole" problem of memory fragmentation?

https://www.opennet.ru/base/dev/fbsdvm.txt.html
         inactive        pages not actively used by programs which are
                         dirty and (at some point) need to be written
                         to their backing store (typically disk).
                         These pages are still associated with objects and
                         can be reclaimed if a program references them.
                         Pages can be moved from the active to the inactive
                         queue at any time with little adverse effect.
                         Moving pages to the cache queue has bigger
                         consequences (note 1)

https://unix.stackexchange.com/questions/134862/what-do-the-different-memory-counters-in-freebsd-mean
     Active: Memory currently being used by a process
     Inactive: Memory that has been freed but is still cached since it
may be used again. If more Free memory is required, this memory can be
cleared and become free. This memory is not cleared before it is needed,
because "free memory is wasted memory", it doesn't cost anything to keep
the old data around in case it is needed again.
     Wired: Memory in use by the Kernel. This memory cannot be swapped
out (GW: including ZFS cache!!!)
     Cache: Memory being used to cache data, can be freed immediately if
required
     Buffers: Disk cache
     Free: Memory that is completely free and ready to use. Inactive,
Cache and Buffers can become free if they are cleaned up.

Thnx.

Ciao,
Gerhard


================================================================================================================================================================
last pid:  7277;  load averages:  0.91,  0.96,  1.02  up 18+06:22:31
18:57:54
135 processes: 2 running, 132 sleeping, 1 waiting

Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
Swap: 512M Total, 501M Used, 12M Free, 97% Inuse

   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C TIME    WCPU COMMAND
77941 pgsql         5  20    0  7921M  7295M usem    7 404:32  10.25%
postgres
79570 pgsql         1  20    0  7367M  6968M sbwait  6 4:24   0.59% postgres
73449 pgsql         1  27    0  7367M  6908M sbwait  4 8:08   5.08% postgres
74209 pgsql         1  27    0  7367M  6803M sbwait  0 6:37   1.46% postgres
74207 pgsql         1  25    0  7367M  6748M sbwait  6 6:34   1.76% postgres
74206 pgsql         1  22    0  7367M  6548M sbwait  5 6:44   1.86% postgres
73380 pgsql         1  23    0  7367M  7265M sbwait  4 8:15   1.17% postgres
74208 pgsql         1  24    0  7367M  7223M sbwait  1 6:30   4.59% postgres
79569 pgsql         1  24    0  7367M  7105M sbwait  3 4:36   1.17% postgres
74210 pgsql         1  29    0  7363M  7182M sbwait  5 6:41   5.47% postgres
73479 pgsql         1  22    0  7363M  6560M sbwait  6 7:14   3.56% postgres
83030 pgsql         1  20    0  7329M   193M sbwait  5 0:00   0.00% postgres
76178 pgsql         1  20    0  7323M  7245M sbwait  2 3:44   0.00% postgres
75867 pgsql         1  20    0  7323M  7245M sbwait  2 3:45   0.00% postgres
75869 pgsql         1  20    0  7323M  7245M sbwait  2 3:46   0.00% postgres
75883 pgsql         1  20    0  7323M  7245M sbwait  7 3:46   0.00% postgres
76180 pgsql         1  20    0  7323M  7245M sbwait  7 3:45   0.00% postgres
75868 pgsql         1  20    0  7323M  7245M sbwait  6 3:46   0.00% postgres
76177 pgsql         1  20    0  7323M  7245M sbwait  3 3:45   0.00% postgres
76179 pgsql         1  20    0  7323M  7245M sbwait  3 3:44   0.00% postgres
75884 pgsql         1  20    0  7323M  7245M sbwait  7 3:46   0.00% postgres
76181 pgsql         1  20    0  7323M  7245M sbwait  7 3:44   0.00% postgres
75862 pgsql         1  20    0  7321M  7266M select  0 8:00   0.00% postgres
75863 pgsql         1  20    0  7321M  7261M select  4 4:04   0.00% postgres
  4361 pgsql         1  20    0  7321M   206M sbwait  2 0:00   0.00%
postgres
  7021 pgsql         1  20    0  7321M   181M sbwait  0 0:00   0.00%
postgres
83028 pgsql         1  20    0  7321M   180M sbwait  3 0:00   0.00% postgres
97964 pgsql         1  20    0  7321M   175M sbwait  4 0:00   0.00% postgres
97963 pgsql         1  20    0  7321M   175M sbwait  0 0:01   0.00% postgres
75865 pgsql         1  20    0  7321M   173M select  4 0:05   0.00% postgres
75864 pgsql         1  20    0  7321M   173M select  2 0:54   0.00% postgres
   593 pgsql         1  21    0  7317M   173M select  4 1:50   0.00%
postgres
32387 myusername    9  20    0   980M   375M uwait   5 69:03   1.27% node
71419 myusername    9  20    0   798M   146M uwait   1 0:40   0.00% node
77938 myusername    9  20    0   795M   115M uwait   7 691:07  15.58% node
   831 myusername    9  20    0   628M 30076K uwait   7 6:28   0.00% node
77937 myusername    5  20    0   618M 21352K usem    4 0:00   0.00% node
71418 myusername    5  20    0   618M 18000K usem    0 0:00   0.00% node
   622 myusername    1  20    0   261M  3388K kqread  3 41:01   0.00% nginx
   623 myusername    1  20    0   261M  2528K kqread  3 0:04   0.00% nginx
   621 root          1  20    0   261M  2448K pause   0 0:00   0.00% nginx
   604 myusername    1  20    0   235M 12144K kqread  1 1:27   0.00% uwsgi
   602 myusername    1  20    0   232M 11592K kqread  0 1:11   0.00% uwsgi
   600 myusername    1  20    0   231M  8736K kqread  6 0:58   0.00% uwsgi
   598 myusername    1  20    0   230M 13196K kqread  0 0:17   0.00% uwsgi
   608 myusername    1  20    0   227M 11204K kqread  7 0:46   0.00% uwsgi
   606 myusername    1  20    0   227M 10328K kqread  1 0:59   0.00% uwsgi
   610 myusername    1  20    0   227M 11656K kqread  0 1:09   0.00% uwsgi
   603 myusername    1  20    0   215M 21048K kqread  4 4:35   0.00% uwsgi
   599 myusername    1  20    0   215M 19756K kqread  5 0:52   0.00% uwsgi
   605 myusername    1  20    0   215M 19496K kqread  4 4:01   0.00% uwsgi
   597 myusername    1  20    0   215M 19284K kqread  0 3:21   0.00% uwsgi
   609 myusername    1  20    0   215M 19304K kqread  1 3:10   0.00% uwsgi
   601 myusername    1  20    0   215M 17208K kqread  2 4:47   0.00% uwsgi
   607 myusername    1  20    0   214M 17912K kqread  5 1:44   0.00% uwsgi
   595 myusername    1  20    0   213M 12224K kqread  4 0:01   0.00% uwsgi
   596 myusername    1  20    0   211M  4112K kqread  6 0:00   0.00% uwsgi
   532 myusername    1  20    0   100M  4036K kqread  0 0:51   0.00% uwsgi
   538 myusername    1  20    0   100M  3556K kqread  1 0:51   0.00% uwsgi
92090 myusername    1  20    0 86480K  5576K select  2 0:00   0.00% sshd
84665 myusername    1  20    0 86480K  5460K select  4 0:00   0.00% sshd
84249 myusername    1  20    0 86480K  5456K select  1 0:00   0.00% sshd
72867 myusername    1  20    0 86480K  5348K select  2 0:00   0.00% sshd
75956 myusername    1  20    0 86480K  5328K select  7 0:00   0.00% sshd
92075 root          1  21    0 86480K  5220K select  7 0:00   0.00% sshd
84653 root          1  21    0 86480K  5132K select  7 0:00   0.00% sshd
84175 root          1  20    0 86480K  5124K select  7 0:00   0.00% sshd
72863 root          1  20    0 86480K  4980K select  4 0:00   0.00% sshd
75950 root          1  20    0 86480K  4952K select  5 0:00   0.00% sshd
   653 root          1  20    0 61212K  1756K select  7 0:03   0.00% sshd
92133 myusername    1  20    0 47720K  2100K wait    4 0:00   0.00% su
84828 myusername    1  20    0 47720K  2068K wait    4 0:00   0.00% su
84275 myusername    1  20    0 47720K  2068K wait    3 0:00   0.00% su
83806 myusername    1  20    0 47720K  2052K wait    0 0:00   0.00% su
72875 myusername    1  20    0 47720K  1968K wait    2 0:00   0.00% su
83852 root          1  20    0 47708K  2032K wait    2 0:00   0.00% su
84923 root          1  20    0 47708K  2032K wait    7 0:00   0.00% su
  6080 myusername    1  20    0 46496K 10540K select  7 11:47   0.00% tmux
75866 pgsql         1  20    0 37304K  4964K select  1 1:56   0.00% postgres
   360 unbound       1  20    0 33016K  1652K select  6 0:00   0.00% unbound
68658 myusername    1  20    0 28408K  2300K ttyin   0 0:00   0.00% psql
   656 root          1  20    0 24112K  2104K select  2 0:13   0.00%
sendmail
   659 smmsp         1  20    0 24112K  1932K pause   1 0:00   0.00%
sendmail
92160 root          1  20    0 23572K  3560K ttyin   2 0:00   0.00% csh
84847 root          1  20    0 23572K  3540K pause   4 0:00   0.00% csh
83809 root          1  20    0 23572K  3516K pause   7 0:00   0.00% csh
84294 root          1  20    0 23572K  3424K pause   6 0:00   0.00% csh
72876 root          1  43    0 23572K  3316K ttyin   0 0:00   0.00% csh
  7276 root          1  20    0 21920K  2264K CPU4    4 0:00   0.00% top
83853 pgsql         1  47    0 17068K  2404K ttyin   6 0:00   0.00% sh
84924 pgsql         1  20    0 17068K  2384K wait    5 0:00   0.00% sh
   580 _ntp          1  20  -20 16740K  1056K select  2 0:00   0.00% ntpd
   579 root          1  20  -20 16732K  1036K select  6 0:00   0.00% ntpd
   582 _ntp          1  52    0 16732K   904K select  4 0:00   0.00% ntpd
   663 root          1  20    0 16596K  1064K nanslp  5 0:06   0.00% cron
92095 myusername    1  20    0 15668K  2940K wait    7 0:00   0.00% bash
84255 myusername    1  20    0 15668K  2932K wait    6 0:00   0.00% bash
84675 myusername    1  20    0 15668K  2932K wait    3 0:00   0.00% bash
75957 myusername    1  20    0 15668K  2820K wait    6 0:00   0.00% bash
72869 myusername    1  20    0 15668K  2772K wait    6 0:00   0.00% bash
68644 myusername    1  20    0 15668K  1620K wait    2 0:00   0.00% bash
  7109 myusername    1  20    0 15668K  1000K wait    3 0:00   0.00% bash
  7317 myusername    1  31    0 15668K   956K ttyin   2 0:00   0.00% bash
  6120 myusername    1  20    0 15668K   868K wait    4 0:00   0.00% bash
32681 myusername    1  20    0 15668K   780K ttyin   2 0:00   0.00% bash
84934 pgsql         1  52    0 14716K  2556K nanslp  3 0:02   0.00% cmdwatch
   405 root          1  20    0 14504K   984K select  5 68:40   0.00%
syslogd
   712 root          1  52    0 14492K   808K ttyin   1 0:00   0.00% getty
   711 root          1  52    0 14492K   808K ttyin   5 0:00   0.00% getty
   716 root          1  52    0 14492K   808K ttyin   2 0:00   0.00% getty
   710 root          1  52    0 14492K   808K ttyin   7 0:00   0.00% getty
   714 root          1  52    0 14492K   808K ttyin   5 0:00   0.00% getty
   713 root          1  52    0 14492K   808K ttyin   0 0:00   0.00% getty
   715 root          1  52    0 14492K   808K ttyin   3 0:00   0.00% getty
   709 root          1  52    0 14492K   808K ttyin   2 0:00   0.00% getty
   316 root          1  20    0 13164K   508K select  4 0:00   0.00% devd
  7277 root          1  20    0 12372K  1760K piperd  1 0:00   0.00% sed
     1 root          1  20    0  9472K   116K wait    4 0:00   0.00% init
    12 root         18 -84    -     0K   288K WAIT    0 91:43   0.00% intr
     0 root         12  -8    0     0K   192K -       0 131:24   0.00%
kernel
    15 root         12 -68    -     0K   192K -       7 1:16   0.00% usb
    13 root          3  -8    -     0K    48K -       1 64:25   0.00% geom
     2 root          3 -16    -     0K    48K -       0 49:12   0.00% cam
     9 root          2  20    -     0K    32K sdflus  7 1:43   0.00%
bufdaemon
     6 root          1 -16    -     0K    16K psleep  1 149:47   0.00%
pagedaemon
     5 root          1  -8    -     0K    16K m:w1    5 45:03   0.00%
g_mirror gm0
    14 root          1 -16    -     0K    16K -       3 31:12   0.00%
rand_harvestq
    17 root          1  16    -     0K    16K syncer  5 10:25   0.00% syncer
    16 root          1 -16    -     0K    16K vlruwt  5 0:03   0.00% vnlru
     4 root          1 -16    -     0K    16K idle    7 0:00   0.00%
enc_daemon0
     8 root          1 155 ki31     0K    16K pgzero  7 0:00   0.00%
pagezero
     3 root          1 -16    -     0K    16K waitin  7 0:00   0.00%
sctp_iterator
     7 root          1 -16    -     0K    16K psleep  2 0:00   0.00%
vmdaemon
    10 root          1 -16    -     0K    16K audit_  0 0:00   0.00% audit

================================================================================================================================================================

#!/usr/bin/env bash

while [ 1 ]; do
   date +%Y.%m.%d.%H.%M.%S
   echo "SELECT pid,state,now()-xact_start AS time FROM pg_stat_activity
WHERE state <> 'idle' AND pg_backend_pid()<>pid ORDER BY pid" | psql
postgres
   sleep 1
done

2015.12.13.19.54.34
   pid  | state  |      time
-------+--------+-----------------
  76179 | active | 00:00:00.001182
(1 row)

2015.12.13.19.54.35
  pid | state | time
-----+-------+------
(0 rows)

2015.12.13.19.54.36
   pid  | state  |       time
-------+--------+------------------
  73479 | active | 00:00:00.115812
  75868 | active | -00:00:00.000506
(2 rows)

2015.12.13.19.54.37
  pid | state | time
-----+-------+------
(0 rows)

2015.12.13.19.54.38
   pid  | state  |      time
-------+--------+-----------------
  74206 | active | 00:00:00.750855
(1 row)

2015.12.13.19.54.39
   pid  | state  |      time
-------+--------+-----------------
  77941 | active | 00:00:00.003457
  79569 | active | 00:00:00.061027
(2 rows)

2015.12.13.19.54.40
   pid  | state  |     time
-------+--------+---------------
  74207 | active | 00:00:00.2614
(1 row)

================================================================================================================================================================



On Sun, 13 Dec 2015 20:09:04 +0100
Gerhard Wiesinger <lists@wiesinger.com> wrote:

> On 13.12.2015 18:17, Tom Lane wrote:
> > Gerhard Wiesinger <lists@wiesinger.com> writes:
> >>> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> >>> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> >> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?
> > Just judging from the name of the function, I would bet this is a direct
> > result of having only 512M of swap configured.  As Bill already pointed
> > out, that's a pretty useless choice on a system with 32G of RAM.  As soon
> > as the kernel tries to push out any significant amount of idle processes,
> > it's gonna be out of swap space.  The numbers you show above prove that
> > it is almost out of free swap already.
>
> The system wasn't designed by me, I wouldn't do it either that way. Does
> swapoff help?

FreeBSD and Linux (and most modern OS) are designed to have swap,
and usually more swap than RAM. I have never heard a good reason for
not using swap, and the reasons I _have_ heard have always been by
people misinformed about how the OS works.

If someone has a _good_ explanation for why you wouldn't want any
swap on a DB server, I'd love to hear it; but everything I've heard
up till now has been speculation based on misinformation.

IOW: no, you should not turn swap off, you should instead allocate
the appropriate amount of swap space.
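
On FreeBSD, additional swap can also be added at runtime with a file-backed
md device, roughly like this (a sketch; the file path, size and md unit number
are arbitrary examples):

#!/usr/bin/env bash
# Create and activate a 32 GB file-backed swap device
dd if=/dev/zero of=/usr/swap0 bs=1m count=32768
chmod 0600 /usr/swap0
mdconfig -a -t vnode -f /usr/swap0 -u 99
swapon /dev/md99
swapinfo -h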

> > Also, while that 20G of "inactive" pages may be candidates for reuse,
> > they probably can't actually be reused without swapping them out ...
> > and there's noplace for that data to go.
>
> There is no log in syslog (where postgres log) when
> swap_pager_getswapspace is logged.
>
> But why do we have 20G of Inactive pages? They are still allocated by
> kernel or user space. As you can see below (top output) NON Postgres
> processes are around 9G in virtual size, resident even lower. The system
> is nearly idle, and the queries typically aren't active after one second
> again. Therefore where does the rest of the 11G of Inactive pages come
> from (if it isn't a Postgres/FreeBSD memory leak)?
> I read that Postgres has its own memory allocator:
> https://www.reddit.com/r/programming/comments/18zija/github_got_30_better_performance_using_tcmalloc/
> Might that be an issue with double allocation/freeing and the "cheese
> hole" topic with memory fragmentation?

If there were a memory leak in either FreeBSD or Postgres of the
seriousness you're describing that were as easy to trigger as you
claim, I would expect the mailing lists and other support forums
to be exploding in panic. Notice that they are not. Also, I still
don't see _ANY_ evidence of a leak. I see evidence that something
is happening that is trying to allocate a LOT of RAM, that isn't
available on your system; but that's not the same as a leak.

> https://www.opennet.ru/base/dev/fbsdvm.txt.html
>          inactive        pages not actively used by programs which are
>                          dirty and (at some point) need to be written
>                          to their backing store (typically disk).
>                          These pages are still associated with objects and
>                          can be reclaimed if a program references them.
>                          Pages can be moved from the active to the inactive
>                          queue at any time with little adverse effect.
>                          Moving pages to the cache queue has bigger
>                          consequences (note 1)

Correct, but, when under pressure, the system _will_ recycle those
pages to be available.

Tom might be correct in that the system thinks they are inactive
because it could easily push them out to swap, but then it can't
_actually_ do that because you haven't allocated enough swap, but
that doesn't match my understanding of how inactive is used. A
question of that detail would be better asked on a FreeBSD forum,
as the differences between different VM implementations can be
pretty specific and technical.

[snip]

> Mem: 8020M Active, 19G Inact, 3537M Wired, 299M Cache, 1679M Buf, 38M Free
> Swap: 512M Total, 501M Used, 12M Free, 97% Inuse
>
>    PID USERNAME    THR PRI NICE   SIZE    RES STATE   C TIME    WCPU COMMAND
> 77941 pgsql         5  20    0  7921M  7295M usem    7 404:32  10.25%
> postgres
> 79570 pgsql         1  20    0  7367M  6968M sbwait  6 4:24   0.59% postgres

[snip about 30 identical PG processes]

> 32387 myusername    9  20    0   980M   375M uwait   5 69:03   1.27% node

[snip similar processes]

>    622 myusername    1  20    0   261M  3388K kqread  3 41:01   0.00% nginx

[snip similar processes]

Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important.

What evidence do you have that Postgres is actually the part of
this system running out of memory? I don't see any such evidence in any of
your emails, and (based on experience) I find it pretty likely that whatever
is running under node is doing something in a horrifically memory-inefficient
manner. Since you mention that you see nothing in the PG logs, that makes it
even more likely (to me) that you're looking entirely in the wrong place.

I'd be willing to bet a steak dinner that if you put the web server on a
different server than the DB, that the memory problems would follow the
web server and not the DB server.
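
One rough way to see which program family is actually holding the resident
memory (a sketch; note that RSS double-counts the shared buffers across the
Postgres backends, so the postgres total will look inflated):

#!/usr/bin/env bash
# Sum resident memory (RSS, in MB) per command name
ps -axo rss,comm | awk 'NR > 1 { sum[$2] += $1 } END { for (c in sum) printf "%8.0f MB  %s\n", sum[c]/1024, c }' | sort -rn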

--
Bill Moran


Re: Memory Leak executing small queries without closing the connection - FreeBSD

From
Gerhard Wiesinger
Date:
On 13.12.2015 21:14, Bill Moran wrote:
> Wait ... this is a combined HTTP/Postgres server? You didn't mention that
> earlier, and it's kind of important.
>
> What evidence do you have that Postgres is actually the part of
> this system running out of memory?

For me the complete picture doesn't look consistent.

> I don't see any such evidence in any of
> your emails, and (based on experience) I find it pretty likely that whatever
> is running under node is doing something in a horrifically memory-inefficient
> manner. Since you mention that you see nothing in the PG logs, that makes it
> even more likely (to me) that you're looking entirely in the wrong place.
>
> I'd be willing to bet a steak dinner that if you put the web server on a
> different server than the DB, that the memory problems would follow the
> web server and not the DB server.

Changes in config:
track_activity_query_size = 102400
work_mem = 100MB

OK, we restarted PostgreSQL (it was stopped for a few seconds) and logged
top every second:

When PostgreSQL was down, nearly all memory was freed, which looks good to me.
So it is likely that node and the other processes are not the cause.
Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
Swap: 512M Total, 477M Used, 35M Free, 93% Inuse

After PostgreSQL was restarted, inactive memory grew quickly (within ~1 min):
Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M
Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

After a few minutes we are back at the same situation:
Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
Swap: 512M Total, 472M Used, 41M Free, 92% Inuse

The steak dinner is mine :-) Donating to the PostgreSQL community :-)

Any further ideas? I don't think this is normal system behaviour.

Ciao,
Gerhard



On Sun, 13 Dec 2015 22:23:19 +0100
Gerhard Wiesinger <lists@wiesinger.com> wrote:

> On 13.12.2015 21:14, Bill Moran wrote:
> > Wait ... this is a combined HTTP/Postgres server? You didn't mention that
> > earlier, and it's kind of important.
> >
> > What evidence do you have that Postgres is actually the part of
> > this system running out of memory?
>
> For me the complete picture doesn't look consistent.

That's because you haven't gathered enough of the right type of information.

> > I don't see any such evidence in any of
> > your emails, and (based on experience) I find it pretty likely that whatever
> > is running under node is doing something in a horrifically memory-inefficient
> > manner. Since you mention that you see nothing in the PG logs, that makes it
> > even more likely (to me) that you're looking entirely in the wrong place.
> >
> > I'd be willing to bet a steak dinner that if you put the web server on a
> > different server than the DB, that the memory problems would follow the
> > web server and not the DB server.
>
> Changes in config:
> track_activity_query_size = 102400
> work_mem = 100MB
>
> Ok, we restarted PostgreSQL and had it stopped for seconds, and logged
> top every second:
>
> When PostgreSQL was down nearly all memory was freed, looks good to me.
> So it is likely that node and other processes are not the cause.
> Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
> Swap: 512M Total, 477M Used, 35M Free, 93% Inuse
>
> When PostgreSQL restarted, Inactive was growing fast (~1min):
> Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M
> Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
>
> After some few minutes we are back again at the same situation:
> Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
>
> The steak dinner is mine :-) Donating to the PostgreSQL community :-)

Based on the fact that the inactive memory increased? Your understanding
of inactive memory in FreeBSD is incorrect. Those pages are probably DB
pages that the OS is keeping in inactive memory because Postgres requests
them over and over, which is what the OS is supposed to do to ensure the
best performance. Are you seeing any out of swap space errors? Even if
you are, you still haven't determined if the problem is the result of
Postgres or the node.js stuff you have running. I don't know what node.js
might be caching on the client side ... do you?

No. Until you can actually report back something other than wild
speculation, I'll keep that steak dinner for myself. Besides, that bet
was based on you putting the PG server on separate hardware from the
web server, which you didn't do.

--
Bill Moran


Re: Memory Leak executing small queries without closing the connection - FreeBSD

From
Adrian Klaver
Date:
On 12/13/2015 01:23 PM, Gerhard Wiesinger wrote:
> On 13.12.2015 21:14, Bill Moran wrote:
>> Wait ... this is a combined HTTP/Postgres server? You didn't mention that
>> earlier, and it's kind of important.
>>
>> What evidence do you have that Postgres is actually the part of
>> this system running out of memory?
>
> For me the complete picture doesn't look consistent.
>
>> I don't see any such evidence in any of
>> your emails, and (based on experience) I find it pretty likely that
>> whatever
>> is running under node is doing something in a horrifically
>> memory-inefficient
>> manner. Since you mention that you see nothing in the PG logs, that
>> makes it
>> even more likely (to me) that you're looking entirely in the wrong place.
>>
>> I'd be willing to bet a steak dinner that if you put the web server on a
>> different server than the DB, that the memory problems would follow the
>> web server and not the DB server.
>
> Changes in config:
> track_activity_query_size = 102400
> work_mem = 100MB
>
> Ok, we restarted PostgreSQL and had it stopped for seconds, and logged
> top every second:
>
> When PostgreSQL was down nearly all memory was freed, looks good to me.
> So it is likely that node and other processes are not the cause.
> Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
> Swap: 512M Total, 477M Used, 35M Free, 93% Inuse
>
> When PostgreSQL restarted, Inactive was growing fast (~1min):
> Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M
> Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
>
> After some few minutes we are back again at the same situation:
> Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
>
> The steak dinner is mine :-) Donating to the PostgreSQL community :-)

To me, all the above proves is that this is a whole-system issue and that you
only get a problem with all the parts running. It still does not indicate
which part, or which interaction of parts, is the issue. This is further
muddied by the lack of any description of what, if anything, you were doing
in the above scenarios.

>
> Any further ideas, I don't think this is normal system behaviour.
>
> Ciao,
> Gerhard
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com