Thread: PostgreSQL backend process high memory usage issue

PostgreSQL backend process high memory usage issue

From: Shianmiin
Hi there,

We are evaluating PostgreSQL for implementing a multitenant database.
Currently we are running some tests on a single-database-multiple-schema model
(basically, all tenants have the same set of database objects, each under their
own schema within the same database).
The application will maintain a connection pool that will be shared among
all tenants/schemas.

e.g. if the database has 500 tenants/schemas and each tenant has 200
tables/views, the total number of tables/views will be 500 * 200 = 100,000.

Since the connection pool will be used by all tenants, eventually each
connection will hit all the tables/views.

In our tests, we found that as a connection hits more views, the memory usage
of the backend process increases quite fast, and most of it is private memory.
That memory is held until the connection is closed.

We have a test case in which one backend process uses more than 30 GB of
memory and eventually gets an out-of-memory error.

To help understand the issue, I wrote code to create a simplified test case:
  - MTDB_destroy: used to clear tenant schemas
  - MTDB_Initialize: used to create a multitenant DB
  - MTDB_RunTests: simplified test case, basically select from all tenant
views one by one.

The tests I've done were on PostgreSQL 9.0.3 on CentOS 5.4.
To make sure I had a clean environment, I re-created the database cluster and
left the majority of the configuration at its defaults
(the only thing I HAD to change was to increase "max_locks_per_transaction",
since MTDB_destroy needs to drop many objects.)
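
For reference, that one non-default setting would look like this in
postgresql.conf (256 is the value reported later in this thread; the default
is 64):

   # postgresql.conf -- total lock table capacity is roughly
   # max_locks_per_transaction * (max_connections + max_prepared_transactions)
   max_locks_per_transaction = 256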

This is what I do to reproduce the issue:
1. create a new database
2. create the three functions using the code attached
3. connect to the newly created db and run the initialization scripts

-- Initialize
select MTDB_Initialize('tenant', 100, 100, true);
-- not sure if vacuum analyze is useful here, I just run it
vacuum analyze;
-- check the tables/views created
select table_schema, table_type, count(*) from information_schema.tables
where table_schema like 'tenant%' group by table_schema, table_type order by
table_schema, table_type;

4. open another connection to the newly created db and run the test scripts

-- get backend process id for current connection
SELECT pg_backend_pid();

-- open a Linux console, run top -p <backend_pid>, and watch VIRT, RES and SHR

-- run tests
select MTDB_RunTests('tenant', 1);

Observations:
1. when the connection for running the tests was first created:
    VIRT = 182MB, RES = 6240K, SHR = 4648K
2. after running the tests once (took 175 seconds):
    VIRT = 1661MB, RES = 1.5GB, SHR = 55MB
3. re-running the test again (took 167 seconds):
    VIRT = 1661MB, RES = 1.5GB, SHR = 55MB
4. re-running the test again (took 165 seconds):
    VIRT = 1661MB, RES = 1.5GB, SHR = 55MB

As we scale up the number of tables, the memory usage goes up correspondingly
as the tests are run.

Can anyone help explain what's happening here?
Is there a way we can control the memory usage of a PostgreSQL backend process?

Thanks.
Samuel

=============================================================================================

-- MTDB_destroy
create or replace function MTDB_destroy (schemaNamePrefix varchar(100))
returns int as $$
declare
   curs1 cursor(prefix varchar) is
      select schema_name from information_schema.schemata
      where schema_name like prefix || '%';
   schemaName varchar(100);
   count integer;
begin
   count := 0;
   open curs1(schemaNamePrefix);
   loop
      fetch curs1 into schemaName;
      if not found then exit; end if;
      count := count + 1;
      execute 'drop schema ' || schemaName || ' cascade;';
   end loop;
   close curs1;
   return count;
end $$ language plpgsql;

-- MTDB_Initialize
create or replace function MTDB_Initialize (schemaNamePrefix varchar(100),
      numberOfSchemas integer, numberOfTablesPerSchema integer,
      createViewForEachTable boolean)
returns integer as $$
declare
   currentSchemaId integer;
   currentTableId integer;
   currentSchemaName varchar(100);
   currentTableName varchar(100);
   currentViewName varchar(100);
   count integer;
begin
   -- clear any existing tenant schemas first
   perform MTDB_Destroy(schemaNamePrefix);

   count := 0;
   currentSchemaId := 1;
   loop
      currentSchemaName := schemaNamePrefix || ltrim(currentSchemaId::varchar(10));
      execute 'create schema ' || currentSchemaName;

      currentTableId := 1;
      loop
         currentTableName := currentSchemaName || '.' || 'table' ||
                             ltrim(currentTableId::varchar(10));
         execute 'create table ' || currentTableName ||
                 ' (f1 integer, f2 integer, f3 varchar(100), f4 varchar(100),' ||
                 ' f5 varchar(100), f6 varchar(100), f7 boolean, f8 boolean,' ||
                 ' f9 integer, f10 integer)';
         if (createViewForEachTable = true) then
            -- each view is a deliberately complex 10-way self-join of its table
            currentViewName := currentSchemaName || '.' || 'view' ||
                               ltrim(currentTableId::varchar(10));
            execute 'create view ' || currentViewName || ' as ' ||
                    'select t1.* from ' || currentTableName || ' t1 ' ||
                    ' inner join ' || currentTableName || ' t2 on (t1.f1 = t2.f1) ' ||
                    ' inner join ' || currentTableName || ' t3 on (t2.f2 = t3.f2) ' ||
                    ' inner join ' || currentTableName || ' t4 on (t3.f3 = t4.f3) ' ||
                    ' inner join ' || currentTableName || ' t5 on (t4.f4 = t5.f4) ' ||
                    ' inner join ' || currentTableName || ' t6 on (t5.f5 = t6.f5) ' ||
                    ' inner join ' || currentTableName || ' t7 on (t6.f6 = t7.f6) ' ||
                    ' inner join ' || currentTableName || ' t8 on (t7.f7 = t8.f7) ' ||
                    ' inner join ' || currentTableName || ' t9 on (t8.f8 = t9.f8) ' ||
                    ' inner join ' || currentTableName || ' t10 on (t9.f9 = t10.f9)';
         end if;
         currentTableId := currentTableId + 1;
         count := count + 1;
         if (currentTableId > numberOfTablesPerSchema) then exit; end if;
      end loop;

      currentSchemaId := currentSchemaId + 1;
      if (currentSchemaId > numberOfSchemas) then exit; end if;
   end loop;
   return count;
end $$ language plpgsql;

-- MTDB_RunTests
create or replace function MTDB_RunTests(schemaNamePrefix varchar(100), rounds integer)
returns integer as $$
declare
   curs1 cursor(prefix varchar) is
      select table_schema || '.' || table_name
      from information_schema.tables
      where table_schema like prefix || '%' and table_type = 'VIEW';
   currentViewName varchar(100);
   count integer;
begin
   count := 0;
   loop
      rounds := rounds - 1;
      if (rounds < 0) then exit; end if;

      open curs1(schemaNamePrefix);
      loop
         fetch curs1 into currentViewName;
         if not found then exit; end if;
         execute 'select * from ' || currentViewName;
         count := count + 1;
      end loop;
      close curs1;
   end loop;
   return count;
end $$ language plpgsql;


Re: PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Thu, Apr 7, 2011 at 3:42 PM, Shianmiin <Shianmiin@gmail.com> wrote:
> [the full problem description and test scripts are quoted; trimmed -- see
> the original post above]

don't write your test environment this way.  postgresql functions !=
stored procedures. Your entire suite of tests is running in a single
transaction context, which is very problematic and not a real test of
any reasonably written application.  Convert your test suite into a
sql/perl/bash/etc script that is run against the database through libpq or
psql.

merlin
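
A minimal sketch of what Merlin suggests, assuming the database name
memoryusagetest used later in the thread and the tenant schemas created above
(names illustrative): each SELECT goes through psql as a separate statement,
so each runs in its own transaction rather than inside one big function call:

   #!/bin/bash
   # list every tenant view, then query each one as its own statement
   DB=memoryusagetest    # assumed database name
   psql -At -d "$DB" -c "select table_schema || '.' || table_name
                           from information_schema.tables
                          where table_schema like 'tenant%'
                            and table_type = 'VIEW'" |
   while read -r view; do
       psql -d "$DB" -c "select * from $view" > /dev/null
   done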

Re: PostgreSQL backend process high memory usage issue

From: John R Pierce
On 04/07/11 1:42 PM, Shianmiin wrote:
> Since the connection pool will be used by all tenants, eventually each
> connection will hit all the tables/views.


don't all connections in a given pool have to use the same user
credentials?  won't that be problematic for this architecture?





Re: PostgreSQL backend process high memory usage issue

From: Rob Sargent

On 04/07/2011 03:46 PM, John R Pierce wrote:
> On 04/07/11 1:42 PM, Shianmiin wrote:
>> Since the connection pool will be used by all tenants, eventually each
>> connection will hit all the tables/views.
>
>
> don't all connections in a given pool have to use the same user
> credentials? won't that be problematic for this architecture?
>
>
>
typically one munges the connection to set an effective user inside the
connection established by "common" credentials
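
A common form of the "munging" Rob describes, assuming per-tenant roles that
have been granted to the pool's login role (role name illustrative):

   -- on checkout from the pool, act as the tenant's role
   SET ROLE tenant42_role;
   -- ... run the tenant's queries ...
   -- on check-in, return the session to the shared login role
   RESET ROLE;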

Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Hi Merlin,

I revised the test code (files attached below) and used pgbench to send the
test queries.

http://postgresql.1045698.n5.nabble.com/file/n4290723/dotest dotest
http://postgresql.1045698.n5.nabble.com/file/n4290723/initialize.sql
initialize.sql
http://postgresql.1045698.n5.nabble.com/file/n4290723/test.sql test.sql

Samuel


Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
If we go with the single-db-multiple-schema model, either our data access
layer will need to qualify all database objects with the proper schema name,
or, with PostgreSQL, we can simply change the search path when a connection is
passed from the pool to the app code (see the sketch below). Another model
under evaluation is the single-db-single-schema model, which would need to do
what you described.
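
A sketch of that search-path approach, assuming schema names as created by
MTDB_Initialize (schema name illustrative):

   -- run when a pooled connection is handed to app code for this tenant;
   -- unqualified object names then resolve in the tenant's schema first
   SET search_path TO tenant42, public;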


Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Good point. Thanks.

The tests we did in house were all from the client side and definitely not in
a single transaction. I just found that this simplified test case reproduces
the same memory usage issue, and didn't pay much attention to that detail.

If we repeatedly run smaller batches, we can still see the backend memory
usage growing (and stabilizing once all the views have been selected).

I hope this is enough for looking into the issue. It would take me a while
since I am relatively new to PostgreSQL/Linux. That said, I could try to
rewrite the testing code in a more convincing way if it's really needed. :)


Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Thanks. Probably, but that's not the point here.
The issue is how the PostgreSQL backend process uses memory, and I wonder if
there is any way to configure it.


Re: PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Fri, Apr 8, 2011 at 7:43 AM, Shianmiin <Shianmiin@gmail.com> wrote:
> Hi Merlin,
>
> I revised the test code with attached files and use pgbench to send the test
> queries.
>
> http://postgresql.1045698.n5.nabble.com/file/n4290723/dotest dotest
> http://postgresql.1045698.n5.nabble.com/file/n4290723/initialize.sql
> initialize.sql
> http://postgresql.1045698.n5.nabble.com/file/n4290723/test.sql test.sql

Not to beat on your test some more, but spamming count(*) is probably
not a good real world indicator of application performance (but it
might be).  I'm assuming your memory usage problem has gone away?

merlin

Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Shianmiin wrote:
>
> Hi Merlin,
>
> I revised the test code with attached files and use pgbench to send the
> test queries.
>
>  http://postgresql.1045698.n5.nabble.com/file/n4290723/dotest dotest
>  http://postgresql.1045698.n5.nabble.com/file/n4290723/initialize.sql
> initialize.sql
>  http://postgresql.1045698.n5.nabble.com/file/n4290723/test.sql test.sql
>
> Samuel
>


Just a few notes on the revised test code:

1. This test is not real-life testing; it is specifically designed to
emphasize the high memory usage issue of the PostgreSQL backend process that
we encountered.
2. The tests in the dotest file use pgbench to simulate 20 concurrent clients
sending random queries, which makes the host memory usage climb even faster;
eventually all 20 concurrent PostgreSQL processes will each use 1.5 GB (30 GB
in total). To do the same test as in the original post, we can run:

pgbench memoryusagetest -c 1 -j 1 -T180 -f test.sql

Samuel
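
The attached test.sql is not reproduced in the thread; judging from the
pgbench invocation above and Merlin's later mention of "\setrandom tidx 1 10",
it plausibly looked something like this (a hypothetical reconstruction, not
the actual attachment; ranges assumed to match the 100 x 100 initialization):

   -- test.sql (reconstruction; pgbench substitutes :sidx and :tidx)
   \setrandom sidx 1 100
   \setrandom tidx 1 100
   select count(*) from tenant:sidx.view:tidx;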


Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
No, the high memory usage issue is still there.

We could change select count(*) to select * or select 1 if you like. There is
no data in the tables anyway.

Sent from my iPad

On 2011-04-08, at 8:25 AM, "Merlin Moncure-2 [via PostgreSQL]" wrote:

> On Fri, Apr 8, 2011 at 7:43 AM, Shianmiin <Shianmiin@gmail.com> wrote:
> > I revised the test code with attached files and use pgbench to send the
> > test queries.
> > [attachment links trimmed]
>
> Not to beat on your test some more, but spamming count(*) is probably
> not a good real world indicator of application performance (but it
> might be).  I'm assuming your memory usage problem has gone away?
>
> merlin


Re: PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Fri, Apr 8, 2011 at 10:30 AM, Shianmiin <Shianmiin@gmail.com> wrote:
> [...]
> 2. The tests in the dotest file use pgbench to simulate 20 concurrent
> clients sending random queries; eventually all 20 concurrent PostgreSQL
> processes will each use 1.5 GB (30 GB in total). To do the same test as in
> the original post, we can run:
>
> pgbench memoryusagetest -c 1 -j 1 -T180 -f test.sql

are you sure you're actually using that memory? I bet you have
configured around 1.5gb shared buffers.  What does 'top' say when this
is happening?

merlin
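
Standard ways to check both of the things Merlin asks about, for anyone
reproducing this:

   -- confirm the server's shared memory setting
   SHOW shared_buffers;
   -- identify the backend serving this session, then watch it with top -p <pid>
   SELECT pg_backend_pid();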

Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
No, I didn't configure 1.5 GB of shared buffers. For this test I recreated the
database cluster and left everything in the configuration at its defaults.

As in the original post, when the connection was first established, the memory
usage of the backend process shown in top was

VIRT = 182MB, RES = 6240K, SHR=4648K

while running

pgbench memoryusagetest -c 1 -j 1 -T180 -f test.sql

I saw the memory climb up rapidly to

VIRT = 1661MB  RES = 1.5GB SHR = 55MB

once the test is done, the process terminates and the memory is returned to the OS

Samuel




Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Further clarification,

if I run two concurrent threads

pgbench memoryusagetest -c 2 -j 2 -T180 -f test.sql

both backend processes use 1.5 GB, resulting in 3 GB in total.

Samuel


Re: PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Fri, Apr 8, 2011 at 2:00 PM, Shianmiin <Shianmiin@gmail.com> wrote:
> Further clarification,
>
> if I run two concurrent threads
>
> pgbench memoryusagetest -c 2 -j 2 -T180 -f test.sql
>
> both backend processes uses 1.5GB and result in 3GB in total.

yes. could you please post a capture of top after running the modified
test that does not run in a single transaction?

merlin


Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Merlin Moncure-2 wrote:
>
> On Fri, Apr 8, 2011 at 2:00 PM, Shianmiin <Shianmiin@gmail.com>
> wrote:
>> Further clarification,
>>
>> if I run two concurrent threads
>>
>> pgbench memoryusagetest -c 2 -j 2 -T180 -f test.sql
>>
>> both backend processes uses 1.5GB and result in 3GB in total.
>
> yes. could you please post a capture of top after running the modified
> test that does not run in a single transaction?
>
> merlin

Here you are.

http://postgresql.1045698.n5.nabble.com/file/n4292010/PostgreSQLBackendMemoryUsage.gif
PostgreSQLBackendMemoryUsage.gif

The screenshot was taken while running the following command:

pgbench memoryusagetest -c 2 -j 2 -T1800 -f test.sql

I had to set the test time longer so that I would have a chance to take the
snapshot, because when pgbench finishes, the connection is gone.

The snapshot was taken after I waited for the memory usage to stabilize at 1.5 GB.

Please note that in this test there are two PostgreSQL backend processes, each
using 1.5 GB of memory.

Samuel


Re: PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Fri, Apr 8, 2011 at 5:07 PM, Shianmiin <Shianmiin@gmail.com> wrote:
> [screenshot link and test description quoted; trimmed -- see the previous
> message]

I am not seeing your results.  I was able to run your test on a stock
config (cut down to 50 schemas though) on a vm with 512mb of memory.
What is your shared buffers set to?

merlin

Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Merlin Moncure-2 wrote:
>
>
> I am not seeing your results.  I was able to run your test on a stock
> config (cut down to 50 schemas though) on a vm with 512mb of memory.
> What is your shared buffers set to?
>
>

Shared buffers was set to 32 MB, the default in postgresql.conf.

To save you some time and move the conversation along faster, I redid a series
of tests on a VM with 512 MB of memory so that we have a common baseline. Here
are the test results and observations:

1. Setup: VM with 512MB of memory, CentOS 5.5 Final, PostgreSQL 9.0.3, a
fresh db cluster with everything default in all config files. For your
reference, I uploaded my postgresql.conf and top output here
http://postgresql.1045698.n5.nabble.com/file/n4298807/postgresql.conf
postgresql.conf
http://postgresql.1045698.n5.nabble.com/file/n4298807/01_Top_-_VM_with_512MB.gif
01_Top_-_VM_with_512MB.gif

2. using psql to connect to the instance, here is the top output
http://postgresql.1045698.n5.nabble.com/file/n4298807/02_Top_-_VM_with_512MB_-_fresh_connection_.gif
02_Top_-_VM_with_512MB_-_fresh_connection_.gif

3. The following tests are on a db initialized with 50 schemas, each with
50 tables/views

   3a. single thread test with command "pgbench memoryusagetest -c 1 -j 1 -T
6000 -f test.sql"
http://postgresql.1045698.n5.nabble.com/file/n4298807/03_Top_-_VM_with_512MB_-_single_thread_test.gif
03_Top_-_VM_with_512MB_-_single_thread_test.gif

observations:
   (backend process 1) VIRT 478 MB, RES 401 MB, SHR 32 MB.
   0% waiting
==> The single backend process pretty much uses up all the physical memory
and maybe some swap space.
==> In the original test (100 schemas with 100 tables/views per schema), the
process used 1.5 GB; in this test (50 schemas with 50 tables/views per schema,
a quarter of the objects), the process uses 478 MB. The memory used is roughly
proportional to the number of objects in the database.

   3b. two threads test with command "pgbench memoryusagetest -c 2 -j 2 -T
6000 -f test.sql"
http://postgresql.1045698.n5.nabble.com/file/n4298807/04_Top_-_VM_with_512MB_-_two_threads_test.gif
04_Top_-_VM_with_512MB_-_two_threads_test.gif

observations:
   (backend process 1) VIRT 476 MB, RES 320 MB, SHR 9724 KB.
   (backend process 2) VIRT 478 MB, RES 82 MB, SHR 6308 KB.
   37.4% waiting
==> physical memory was all used up by the two backend processes, plus 775 MB
of swap space. The virtual size (physical + swap) of each process is the same
as in the single-thread test, i.e. ~470 MB.
==> please note that there is considerable %waiting here, and kswapd0 is
starting to work a little harder

   3c. three threads test with command "pgbench memoryusagetest -c 3 -j 3 -T
6000 -f test.sql"
http://postgresql.1045698.n5.nabble.com/file/n4298807/05_Top_-_VM_with_512MB_-_three_threads_test.gif
05_Top_-_VM_with_512MB_-_three_threads_test.gif

observations:
   (backend process 1) VIRT 468 MB, RES 299 MB, SHR 18 MB.
   (backend process 2) VIRT 418 MB, RES 61 MB, SHR 13 MB.
   (backend process 3) VIRT 421 MB, RES 61 MB, SHR 13 MB.
   42.8% waiting
==> all physical memory is used and more swap space is used. I didn't let it
run long enough to see if VIRT would go all the way up to 470 MB, since when
swapping is considerable the tests slow down and the VIRT growth slows down
too. VIRT stays in the same range, i.e. 400-ish MB.
==> the %waiting gets higher, kswapd0 works harder, and the tests run slower

   3d. four threads test with command "pgbench memoryusagetest -c 4 -j 4 -T
6000 -f test.sql"
http://postgresql.1045698.n5.nabble.com/file/n4298807/06_Top_-_VM_with_512MB_-_four_threads_test.gif
06_Top_-_VM_with_512MB_-_four_threads_test.gif
observations:
   (backend process 1) VIRT 424 MB, RES 196 MB, SHR 21 MB.
   (backend process 2) VIRT 416 MB, RES 83 MB, SHR 15 MB.
   (backend process 3) VIRT 418 MB, RES 86 MB, SHR 16 MB.
   (backend process 4) VIRT 466 MB, RES 66 MB, SHR 16 MB.
   47.8% waiting
==> all physical memory is used and more swap space is used; as in 3c, I
didn't let it run long enough to see if VIRT would reach 470 MB. VIRT stays in
the same range, i.e. 400-ish MB.
==> the %waiting gets higher, kswapd0 works even harder, and the tests run
even slower

   3e. A crash test: 60 threads test with command "pgbench memoryusagetest
-c 60 -j 60 -T 6000 -f test.sql"
http://postgresql.1045698.n5.nabble.com/file/n4298807/07_Top_-_VM_with_512MB_-_60_threads_crashing_test.gif
07_Top_-_VM_with_512MB_-_60_threads_crashing_test.gif
Observations:
   (backend process 1) VIRT 148 MB, RES 14 MB, SHR 7852 KB.
   (backend process 2) VIRT 149 MB, RES 13 MB, SHR 7636 KB.
   ...
   63.9% waiting
==> as expected, all physical memory is used, and here the swap space is used
up too. Each backend process just gets as much virtual memory as it can. After
running for a while, the system reached a point where everything nearly froze;
then the pgbench process got killed and the system came back to a working
state. The PostgreSQL server itself kept running fine, though.

Here are my questions and concerns:

1. While running the single-thread test, the PostgreSQL backend process
allocates 478 MB, the majority of which is private memory. What is that memory
used for? It looks like some kind of cache, probably related to query plans?

2. Is there a way to configure the maximum memory a PostgreSQL backend process
can allocate? The concern is that, as the test results show, when there are a
considerable number of objects in the database (not unusual in a
single-db-multi-schema multitenant data model) combined with a pool of
long-lived connections, the memory usage of the corresponding backend
processes will grow over time to fill all available memory (physical + swap),
resulting in considerable swapping that makes the system really slow. The only
way I can find to release that allocated (esp. private) memory from a backend
process is to disconnect. Those private caches may contain rarely used data
(assuming they serve query-plan-related purposes), so it might be more
efficient to release that memory for other backend processes to use. As it
stands, if we go this way we would have to build some kind of
connection-recycling logic to keep the system from running into serious memory
thrashing. It would be nice to have some way to configure how a backend
process uses memory, or is there another way to look at this issue?

Not knowing PostgreSQL well, I apologize for the assumptions I made while
trying to make my concerns clear. Any insights, suggestions, or corrections
are welcome.

Samuel
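
The "connection recycling logic" raised in question 2 is usually delegated to
an external pooler rather than built by hand; a sketch using PgBouncer
(an external connection pooler; the settings exist in PgBouncer, the values
here are illustrative):

   ; pgbouncer.ini (sketch)
   [pgbouncer]
   pool_mode = transaction
   ; close server connections after this many seconds, which releases
   ; the backend's private relcache/plan memory back to the OS
   server_lifetime = 3600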





Re: PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Tue, Apr 12, 2011 at 12:48 PM, Shianmiin <Shianmiin@gmail.com> wrote:
> [the full test report and observations are quoted; trimmed -- see the
> previous message]

I think you may have uncovered a leak (I stand corrected).

The number of schemas in your test is irrelevant -- the leak is
happening in proportion to the number of views (set via \setrandom
tidx 1 10).  At 1 I don't think it exists at all -- at 100 memory use
grows very fast.

Postgresql memory architecture is such that process-local permanent memory is
extremely cautiously allocated, typically for caching purposes (like a tiny
tablespace cache).  Temporarily, you can see some per-process memory
allocations for hashing and sorting, which you can control with work_mem, and
for maintenance purposes (create index) with maintenance_work_mem.  This
memory is always given back at transaction end, however.  In a normal state of
affairs, it's almost impossible to run postgres out of memory unless you
oversubscribe work_mem and/or shared_buffers.
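
The knobs Merlin names are ordinary settings; a sketch of setting them per
session (values illustrative):

   -- memory budget per sort/hash operation, returned at transaction end
   SET work_mem = '64MB';
   -- memory budget for maintenance operations such as CREATE INDEX
   SET maintenance_work_mem = '256MB';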

There are some operations in postgres which are notorious for
exhausting *shared* memory, like creating a lot of schemas and tables
in a single transaction.   However long term memory growth in resident
memory is a serious issue and needs to be tracked down and fixed.

merlin

Re: PostgreSQL backend process high memory usage issue

From: Shianmiin
Merlin Moncure-2 wrote:
>
> I think you may have uncovered a leak (I stand corrected).
> [...]

I am not sure I agree that this is a leak. Although the memory usage grows
fast as more views are selected, it doesn't grow infinitely. If we let the
tests keep running, the memory usage stabilizes (still as a function of the
number of views hit): in the 100 schemas * 100 tables/views per schema
scenario it stabilizes at 1.5 GB, while in the 50 schemas * 50 tables/views
per schema scenario it stabilizes at 478 MB.

Creating a large number of schemas/tables/views within a single transaction is
not a real-world case, so that is not a concern. However, in the test case, to
create 100 schemas with 100 tables/views each on a stock configuration, the
only parameter that needs to be relaxed is max_locks_per_transaction
(effectively, the upper bound on lock slots is max_locks_per_transaction *
(max_connections + max_prepared_transactions)); e.g. in my experiment, raising
max_locks_per_transaction from its default of 64 to 256 worked.
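
Worked out for the 100 x 100 test with stock max_connections = 100 and
max_prepared_transactions = 0, and assuming roughly one lock slot per dropped
object: the default gives 64 * 100 = 6,400 slots, too few for the
approximately 100 * (100 tables + 100 views + 1 schema) = 20,100 objects that
MTDB_destroy drops inside one function call (one transaction), whereas
256 * 100 = 25,600 is enough.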

In databases like Microsoft SQL Server or Oracle, the instance allocates a
region of memory for a global plan cache. I have been looking for a similar
mechanism in PostgreSQL, but so far I haven't found one. I suspect that in
PostgreSQL query plans are cached per backend process and are not shared among
backend processes. That is fine if the number of query plans is small or the
connections are not long-lived; however, it is a real concern for a usage
scenario that implements a single-db-multiple-schema multitenant model with a
long-lived connection pool. I think it would be good if PostgreSQL had some
kind of shared plan cache mechanism, or at least a way to configure the
maximum memory a backend process can use.

Could you refer me to someone, or tell me what I should do, to track down the
issue?

Samuel


Re: [BUGS] PostgreSQL backend process high memory usage issue

From: Tom Lane
Merlin Moncure <mmoncure@gmail.com> writes:
> I think you may have uncovered a leak (I stand corrected).

> The number of schemas in your test is irrelevant -- the leak is
> happening in proportion to the number of views (set via \setrandom
> tidx 1 10).  At 1 I don't think it exists at all -- at 100 memory use
> grows very fast.

I don't think it's a leak, exactly: it's just that the "relcache" entry
for each one of these views occupies about 100K.  A backend that touches
N of the views is going to need about N*100K in relcache space.  I can't
get terribly excited about that.  Trying to reduce the size of the
relcache would be a net loss for most usage patterns (ie, we'd end up
increasing the amount of re-fetching from the system catalogs that
backends would have to do).  And I don't think that this test case has
much of anything to do with sane application design, anyway.  Do you
really need that many complex views?  Do you really need to have most
sessions touching all of them?

            regards, tom lane
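
A back-of-envelope check of Tom's ~100K-per-relcache-entry figure against the
numbers reported earlier in the thread: 100 schemas * 100 views = 10,000
views, and 10,000 * ~100 KB is roughly 1 GB of relcache, which together with
catalog and plan caches is consistent with the ~1.5 GB RES observed; likewise
50 * 50 = 2,500 views gives roughly 250 MB, in the neighborhood of the ~478 MB
observed.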

Re: [BUGS] PostgreSQL backend process high memory usage issue

From: Merlin Moncure
On Wed, Apr 13, 2011 at 12:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I think you may have uncovered a leak (I stand corrected).
>
>> The number of schemas in your test is irrelevant -- the leak is
>> happening in proportion to the number of views (set via \setrandom
>> tidx 1 10).  At 1 I don't think it exists at all -- at 100 memory use
>> grows very fast.
>
> I don't think it's a leak, exactly: it's just that the "relcache" entry
> for each one of these views occupies about 100K.  A backend that touches
> N of the views is going to need about N*100K in relcache space.  I can't
> get terribly excited about that.  Trying to reduce the size of the
> relcache would be a net loss for most usage patterns (ie, we'd end up
> increasing the amount of re-fetching from the system catalogs that
> backends would have to do).  And I don't think that this test case has
> much of anything to do with sane application design, anyway.  Do you
> really need that many complex views?  Do you really need to have most
> sessions touching all of them?

Ya, my mistake -- it *felt* like a leak when of course it was not.
100k does seem like an awful lot though -- perhaps this could be
organized better? -- but that's not really the point.  I've coded a
lot of multi schema designs and they tend to either go the one
session/schema route or the connection pooling route.  Either way,
cache memory usage tends to work itself out pretty well (it's never
been a problem for me before at least).  I can't recall anyone ever
even complaining about it in a non synthetic test.

merlin

Re: [BUGS] PostgreSQL backend process high memory usage issue

From: Shianmiin
Merlin Moncure-2 wrote:
>
>
>  ... I've coded a
> lot of multi schema designs and they tend to either go the one
> session/schema route or the connection pooling route.  Either way,
> cache memory usage tends to work itself out pretty well (it's never
> been a problem for me before at least).  I can't recall anyone ever
> even complaining about it in a non synthetic test.
>
>

It's good to know that. Many thanks for your help. :)


Re: [BUGS] PostgreSQL backend process high memory usage issue

From: Shianmiin
Tom Lane-2 wrote:
>
>
> I don't think it's a leak, exactly: it's just that the "relcache" entry
> for each one of these views occupies about 100K.  A backend that touches
> N of the views is going to need about N*100K in relcache space.  I can't
> get terribly excited about that.  Trying to reduce the size of the
> relcache would be a net loss for most usage patterns (ie, we'd end up
> increasing the amount of re-fetching from the system catalogs that
> backends would have to do).  And I don't think that this test case has
> much of anything to do with sane application design, anyway.  Do you
> really need that many complex views?  Do you really need to have most
> sessions touching all of them?
>
>

Thanks for the clarification; that answers our question and gives us a good
direction on where to look for further information. We have gained more
confidence in moving toward PostgreSQL as our multitenant database backend.

The tests were designed to show the high memory usage findings we made while
evaluating PostgreSQL, and yes, they are far from a real-world scenario.
However, the concern doesn't come from nothing: our current system runs on
Microsoft SQL Server with a one-db-per-tenant multitenancy model. We have one
db server hosting 5000 tenant databases, each with 200 tables and 500 views,
and quite a few of the views are much more complex than the one shown in the
test. When a request comes in, the application servers randomly pick a
connection from the pool to query the db, so theoretically every connection
could eventually hit all the views; in the real world it may take quite a
while for the memory to fill up to an unacceptable size. Still, it feels a
little odd that there isn't a convenient way to control the cache memory usage
of a PostgreSQL backend process. :)

We are still at an early stage of moving to a different multitenant db model,
and there are plenty of options for going this way or getting around issues
like this. Thanks again.

