Thread: Lots of memory allocated when reassigning Large Objects

Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Hi,

One of our customers had a very bad issue while trying to reassign objects from user A to user B. He had a lot of them, and the backend got very hungry for memory. It finally all went down when the linux kernel decided to kill the backend (-9 of course).

I attach three shell scripts showing the issue. create.sh creates a database and two users. Then it imports a million Large Objects in this new database. There's no drop.sh as it is a simple "dropdb foodb".

session1_monitor.sh will start logging memory usage in the server log file every second. So it needs v14, but our customer is in v11. While this script runs, you can start session2_reindex.sh. This script will only run a reassign from one user to another.

Here is what I get in the server log file:

$ grep "Grand total" 14.log
LOG:  Grand total: 15560832 bytes...
LOG:  Grand total: 68710528 bytes...
LOG:  Grand total: 119976064 bytes..
LOG:  Grand total: 171626624 bytes...
LOG:  Grand total: 224211072 bytes...
LOG:  Grand total: 276615296 bytes...
LOG:  Grand total: 325611648 bytes...
LOG:  Grand total: 378196096 bytes...
LOG:  Grand total: 429838464 bytes...
LOG:  Grand total: 481104000 bytes...

IOW, it's asking for at least 481MB to reassign 1 million empty LO. It strikes me as odd.

FWIW, the biggest memory context is this one:

LOG:  level: 2; PortalContext: 479963904 total in 58590 blocks; 2662328 free (32567 chunks); 477301576 used: <unnamed>

Memory is released at the end of the reassignment. So it's definitely not leaked forever, but only during the operation, which looks like a missing pfree (or something related). I've tried to find something like that in the code somewhere, but to no avail. I'm pretty sure I missed something, which is the reason for this email :)

Thanks.

Regards.

PS : we've found a workaround to make it work for our customer (executing all the required ALTER LARGE OBJECT ... OWNER TO ...), but I'm still amazed by this weird behaviour.


--
Guillaume.
Attachment

Re: Lots of memory allocated when reassigning Large Objects

From
Justin Pryzby
Date:
On Mon, Nov 29, 2021 at 01:49:24PM +0100, Guillaume Lelarge wrote:
> One of our customers had a very bad issue while trying to reassign objects
> from user A to user B. He had a lot of them, and the backend got very
> hungry for memory. It finally all went down when the linux kernel decided
> to kill the backend (-9 of course).

> Memory is released at the end of the reassignment. So it's definitely not
> leaked forever, but only during the operation, which looks like a missing
> pfree (or something related). I've tried to find something like that in the
> code somewhere, but to no avail. I'm pretty sure I missed something, which
> is the reason for this email :)

I reproduced the issue like this.

psql postgres -c 'CREATE ROLE two WITH login superuser'
psql postgres two -c "SELECT lo_import('/dev/null') FROM generate_series(1,22111)" >/dev/null
psql postgres -c 'SET client_min_messages=debug; SET log_statement_stats=on;' -c 'begin; REASSIGN OWNED BY two TO
pryzbyj;rollback;'
 

I didn't find the root problem, but was able to avoid the issue by creating a
new mem context.  I wonder if there are a bunch more issues like this.

unpatched:
!       33356 kB max resident size

patched:
!       21352 kB max resident size

diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 9ea42f805f..cbe7f04983 100644
--- a/src/backend/catalog/pg_shdepend.c
+++ b/src/backend/catalog/pg_shdepend.c
@@ -65,6 +65,7 @@
 #include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 
 typedef enum
@@ -1497,6 +1498,11 @@ shdepReassignOwned(List *roleids, Oid newrole)
         while ((tuple = systable_getnext(scan)) != NULL)
         {
             Form_pg_shdepend sdepForm = (Form_pg_shdepend) GETSTRUCT(tuple);
+            MemoryContext cxt, oldcxt;
+
+            cxt = AllocSetContextCreate(CurrentMemoryContext, "shdepReassignOwned cxt",
+                    ALLOCSET_DEFAULT_SIZES);
+            oldcxt = MemoryContextSwitchTo(cxt);
 
             /*
              * We only operate on shared objects and objects in the current
@@ -1598,8 +1604,12 @@ shdepReassignOwned(List *roleids, Oid newrole)
                     elog(ERROR, "unexpected classid %u", sdepForm->classid);
                     break;
             }
+
             /* Make sure the next iteration will see my changes */
             CommandCounterIncrement();
+
+            MemoryContextSwitchTo(oldcxt);
+            MemoryContextDelete(cxt);
         }
 
         systable_endscan(scan);



Re: Lots of memory allocated when reassigning Large Objects

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> I reproduced the issue like this.

> psql postgres -c 'CREATE ROLE two WITH login superuser'
> psql postgres two -c "SELECT lo_import('/dev/null') FROM generate_series(1,22111)" >/dev/null
> psql postgres -c 'SET client_min_messages=debug; SET log_statement_stats=on;' -c 'begin; REASSIGN OWNED BY two TO
pryzbyj;rollback;' 

Confirmed here, although I needed to use a lot more than 22K large objects
to see a big leak.

> I didn't find the root problem, but was able to avoid the issue by creating a
> new mem context.  I wonder if there are a bunch more issues like this.

I poked into it with valgrind, and identified the major leak as being
stuff that is allocated by ExecOpenIndices and not freed by
ExecCloseIndices.  The latter knows it's leaking:

    /*
     * XXX should free indexInfo array here too?  Currently we assume that
     * such stuff will be cleaned up automatically in FreeExecutorState.
     */

On the whole, I'd characterize this as DDL code using pieces of the
executor without satisfying the executor's expectations as to environment
--- specifically, that it'll be run in a memory context that doesn't
outlive executor shutdown.  Normally, any one DDL operation does a limited
number of catalog updates so that small per-update leaks don't cost that
much ... but REASSIGN OWNED creates a loop that can invoke ALTER OWNER
many times.

I think your idea of creating a short-lived context is about right.
Another idea we could consider is to do that within CatalogTupleUpdate;
but I'm not sure that the cost/benefit ratio would be good for most
operations.  Anyway I think ALTER OWNER has other leaks outside the
index-update operations, so we'd still need to do this within
REASSIGN OWNED's loop.

DROP OWNED BY likely has similar issues.

            regards, tom lane



Re: Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Le lun. 29 nov. 2021 à 19:40, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Justin Pryzby <pryzby@telsasoft.com> writes:
> I reproduced the issue like this.

> psql postgres -c 'CREATE ROLE two WITH login superuser'
> psql postgres two -c "SELECT lo_import('/dev/null') FROM generate_series(1,22111)" >/dev/null
> psql postgres -c 'SET client_min_messages=debug; SET log_statement_stats=on;' -c 'begin; REASSIGN OWNED BY two TO pryzbyj; rollback;'

Confirmed here, although I needed to use a lot more than 22K large objects
to see a big leak.


So do I.

> I didn't find the root problem, but was able to avoid the issue by creating a
> new mem context.  I wonder if there are a bunch more issues like this.

I poked into it with valgrind, and identified the major leak as being
stuff that is allocated by ExecOpenIndices and not freed by
ExecCloseIndices.  The latter knows it's leaking:

        /*
         * XXX should free indexInfo array here too?  Currently we assume that
         * such stuff will be cleaned up automatically in FreeExecutorState.
         */

On the whole, I'd characterize this as DDL code using pieces of the
executor without satisfying the executor's expectations as to environment
--- specifically, that it'll be run in a memory context that doesn't
outlive executor shutdown.  Normally, any one DDL operation does a limited
number of catalog updates so that small per-update leaks don't cost that
much ... but REASSIGN OWNED creates a loop that can invoke ALTER OWNER
many times.

I think your idea of creating a short-lived context is about right.
Another idea we could consider is to do that within CatalogTupleUpdate;
but I'm not sure that the cost/benefit ratio would be good for most
operations.  Anyway I think ALTER OWNER has other leaks outside the
index-update operations, so we'd still need to do this within
REASSIGN OWNED's loop.


I've tried Justin's patch but it didn't help with my memory allocation issue. FWIW, I attach the patch I used in v14.

DROP OWNED BY likely has similar issues.


Didn't try it, but it wouldn't be a surprise.


--
Guillaume.
Attachment

Re: Lots of memory allocated when reassigning Large Objects

From
Tom Lane
Date:
Guillaume Lelarge <guillaume@lelarge.info> writes:
> I've tried Justin's patch but it didn't help with my memory allocation
> issue. FWIW, I attach the patch I used in v14.

[ looks closer ... ]  Ah, that patch is a bit buggy: it fails to do the
right thing in the cases where the loop does a "continue".  The attached
revision seems to behave properly.

I still see a small leakage, which I think is due to accumulation of
pending sinval messages for the catalog updates.  I'm curious whether
that's big enough to be a problem for Guillaume's use case.  (We've
speculated before about bounding the memory used for pending sinval
in favor of just issuing a cache reset when the list would be too
big.  But nobody's done anything about it, suggesting that people
seldom have a problem in practice.)

>> DROP OWNED BY likely has similar issues.

> Didn't try it, but it wouldn't be a surprise.

I tried just changing the REASSIGN to a DROP in Justin's example,
and immediately hit

ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

thanks to the per-object locks we try to acquire.  So I'm not
sure that the DROP case can reach an interesting amount of
local memory leaked before it runs out of lock-table space.

            regards, tom lane

diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 9ea42f805f..d7f0708396 100644
--- a/src/backend/catalog/pg_shdepend.c
+++ b/src/backend/catalog/pg_shdepend.c
@@ -65,6 +65,7 @@
 #include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/fmgroids.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"

 typedef enum
@@ -1497,6 +1498,7 @@ shdepReassignOwned(List *roleids, Oid newrole)
         while ((tuple = systable_getnext(scan)) != NULL)
         {
             Form_pg_shdepend sdepForm = (Form_pg_shdepend) GETSTRUCT(tuple);
+            MemoryContext cxt, oldcxt;

             /*
              * We only operate on shared objects and objects in the current
@@ -1510,6 +1512,11 @@ shdepReassignOwned(List *roleids, Oid newrole)
             if (sdepForm->deptype != SHARED_DEPENDENCY_OWNER)
                 continue;

+            cxt = AllocSetContextCreate(CurrentMemoryContext,
+                                        "shdepReassignOwned",
+                                        ALLOCSET_DEFAULT_SIZES);
+            oldcxt = MemoryContextSwitchTo(cxt);
+
             /* Issue the appropriate ALTER OWNER call */
             switch (sdepForm->classid)
             {
@@ -1598,6 +1605,10 @@ shdepReassignOwned(List *roleids, Oid newrole)
                     elog(ERROR, "unexpected classid %u", sdepForm->classid);
                     break;
             }
+
+            MemoryContextSwitchTo(oldcxt);
+            MemoryContextDelete(cxt);
+
             /* Make sure the next iteration will see my changes */
             CommandCounterIncrement();
         }

Re: Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Le lun. 29 nov. 2021 à 20:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> I've tried Justin's patch but it didn't help with my memory allocation
> issue. FWIW, I attach the patch I used in v14.

[ looks closer ... ]  Ah, that patch is a bit buggy: it fails to do the
right thing in the cases where the loop does a "continue".  The attached
revision seems to behave properly.

I still see a small leakage, which I think is due to accumulation of
pending sinval messages for the catalog updates.  I'm curious whether
that's big enough to be a problem for Guillaume's use case.  (We've
speculated before about bounding the memory used for pending sinval
in favor of just issuing a cache reset when the list would be too
big.  But nobody's done anything about it, suggesting that people
seldom have a problem in practice.)


I've tried your patch with my test case. It still uses a lot of memory. Actually even more.

I have this with the log_statement_stats:

1185072 kB max resident size

And I have this with the log-memory-contexts function:

LOG:  Grand total: 1007796352 bytes in 320 blocks; 3453512 free (627 chunks); 1004342840 used

Contrary to Justin's patch, the shdepReassignOwned doesn't seem to be used. I don't get any shdepReassignOwned line in the log file. I tried multiple times to avoid any mistake on my part, but got same result.

>> DROP OWNED BY likely has similar issues.

> Didn't try it, but it wouldn't be a surprise.

I tried just changing the REASSIGN to a DROP in Justin's example,
and immediately hit

ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

thanks to the per-object locks we try to acquire.  So I'm not
sure that the DROP case can reach an interesting amount of
local memory leaked before it runs out of lock-table space.


I've hit the same issue when I tried my ALTER LARGE OBJECT workaround in one transaction.


--
Guillaume.

Re: Lots of memory allocated when reassigning Large Objects

From
Tom Lane
Date:
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le lun. 29 nov. 2021 à 20:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> [ looks closer ... ]  Ah, that patch is a bit buggy: it fails to do the
>> right thing in the cases where the loop does a "continue".  The attached
>> revision seems to behave properly.

> I've tried your patch with my test case. It still uses a lot of memory.
> Actually even more.

Hmm ... I tried it with your test case, and I see the backend completing
the query without going beyond 190MB used (which is mostly shared memory).
Without the patch it blows past that point very quickly indeed.

I'm checking it in HEAD though; perhaps there's something else wrong
in the back branches?

            regards, tom lane



Re: Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Le lun. 29 nov. 2021 à 22:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le lun. 29 nov. 2021 à 20:39, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> [ looks closer ... ]  Ah, that patch is a bit buggy: it fails to do the
>> right thing in the cases where the loop does a "continue".  The attached
>> revision seems to behave properly.

> I've tried your patch with my test case. It still uses a lot of memory.
> Actually even more.

Hmm ... I tried it with your test case, and I see the backend completing
the query without going beyond 190MB used (which is mostly shared memory).
Without the patch it blows past that point very quickly indeed.

I'm checking it in HEAD though; perhaps there's something else wrong
in the back branches?


That's also what I was thinking. I was only trying with v14. I just checked with v15devel, and your patch works alright. So there must be something else with back branches.


--
Guillaume.

Re: Lots of memory allocated when reassigning Large Objects

From
Tom Lane
Date:
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le lun. 29 nov. 2021 à 22:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> I'm checking it in HEAD though; perhaps there's something else wrong
>> in the back branches?

> That's also what I was thinking. I was only trying with v14. I just checked
> with v15devel, and your patch works alright. So there must be something
> else with back branches.

Thanks for confirming, I'll dig into it later.

            regards, tom lane



Re: Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Le lun. 29 nov. 2021 à 22:47, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le lun. 29 nov. 2021 à 22:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> I'm checking it in HEAD though; perhaps there's something else wrong
>> in the back branches?

> That's also what I was thinking. I was only trying with v14. I just checked
> with v15devel, and your patch works alright. So there must be something
> else with back branches.

Thanks for confirming, I'll dig into it later.


Thanks a lot.


--
Guillaume.

Re: Lots of memory allocated when reassigning Large Objects

From
Tom Lane
Date:
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le lun. 29 nov. 2021 à 22:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> I'm checking it in HEAD though; perhaps there's something else wrong
>> in the back branches?

> That's also what I was thinking. I was only trying with v14. I just checked
> with v15devel, and your patch works alright. So there must be something
> else with back branches.

AFAICT the patch fixes what it intends to fix in v14 too.  The reason the
residual leak is worse in v14 is that the sinval message queue is bulkier.
We improved that in HEAD in commit 3aafc030a.  I'm not sure if I want to
take the risk of back-patching that, even now that it's aged a couple
months in the tree.  It is a pretty localized fix, but it makes some
assumptions about usage patterns that might not hold up.

            regards, tom lane



Re: Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Le mar. 30 nov. 2021 à 00:25, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Guillaume Lelarge <guillaume@lelarge.info> writes:
> Le lun. 29 nov. 2021 à 22:27, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> I'm checking it in HEAD though; perhaps there's something else wrong
>> in the back branches?

> That's also what I was thinking. I was only trying with v14. I just checked
> with v15devel, and your patch works alright. So there must be something
> else with back branches.

AFAICT the patch fixes what it intends to fix in v14 too.  The reason the
residual leak is worse in v14 is that the sinval message queue is bulkier.
We improved that in HEAD in commit 3aafc030a.

I wanted to make sure that commit 3aafc030a fixed this issue. So I did a few tests:

without 3aafc030a, without your latest patch
  1182148 kB max resident size
with 3aafc030a, without your latest patch
  1306812 kB max resident size

without 3aafc030a, with your latest patch
  1182128 kB max resident size
with 3aafc030a, with your latest patch
  180996 kB max resident size

Definitely, 3aafc030a and your latest patch allow PostgreSQL to use much less memory. Going from 1GB to 180MB is awesome.

I tried to cherry-pick 3aafc030a on v14, but it didn't apply cleanly, and the work was a bit overwhelming for me, at least in the morning. I'll try again today, but I don't have much hope.

I'm not sure if I want to
take the risk of back-patching that, even now that it's aged a couple
months in the tree.  It is a pretty localized fix, but it makes some
assumptions about usage patterns that might not hold up.


I understand. Of course, it would be better if it could be fixed for each supported version but I'm already happy that it could be fixed in the next release.


--
Guillaume.

Re: Lots of memory allocated when reassigning Large Objects

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> @Guillaume: Even if memory use with the patch isn't constant, I imagine it's
> enough to have avoided OOM.

I think it's good enough in HEAD.  In the back branches, the sinval
queue growth is bad enough that there's still an issue.  Still,
this is a useful improvement, so I added some comments and pushed it.

            regards, tom lane



Re: Lots of memory allocated when reassigning Large Objects

From
Guillaume Lelarge
Date:
Le mer. 1 déc. 2021 à 19:48, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
Justin Pryzby <pryzby@telsasoft.com> writes:
> @Guillaume: Even if memory use with the patch isn't constant, I imagine it's
> enough to have avoided OOM.

I think it's good enough in HEAD.  In the back branches, the sinval
queue growth is bad enough that there's still an issue.  Still,
this is a useful improvement, so I added some comments and pushed it.


Thanks.


--
Guillaume.