Thread: possible memory leak in VACUUM ANALYZE

possible memory leak in VACUUM ANALYZE

From

Pavel Stehule

Date:

10 February 2023, 23:09:06

Just a small note - I executed VACUUM ANALYZE on one customer's database, and I had to cancel it after a few hours, because it had more than 20GB RAM (almost all physical RAM). The memory leak is probably not too big. This database is a little bit unusual. This one database has more than 1 800 000 tables. and the same number of indexes.

Regards

Pavel

Re: possible memory leak in VACUUM ANALYZE

From

Andres Freund

Date:

10 February 2023, 23:18:39

Hi,

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> Just a small note - I executed VACUUM ANALYZE on one customer's database,
> and I had to cancel it after a few hours, because it had more than 20GB RAM
> (almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers?  Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory.  Of course that couldn't explain rising above shared_buffers +
overhead.

> The memory leak is probably not too big. This database is a little bit
> unusual.  This one database has more than 1 800 000 tables. and the same
> number of indexes.

If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.

Greetings,

Andres Freund

Re: possible memory leak in VACUUM ANALYZE

From

Pavel Stehule

Date:

10 February 2023, 23:23:11

pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:

Hi,

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> Just a small note - I executed VACUUM ANALYZE on one customer's database,
> and I had to cancel it after a few hours, because it had more than 20GB RAM
> (almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above shared_buffers +
overhead.

> The memory leak is probably not too big. This database is a little bit
> unusual. This one database has more than 1 800 000 tables. and the same
> number of indexes.

If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.

can be

Regards

Pavel

Greetings,

Andres Freund

Re: possible memory leak in VACUUM ANALYZE

From

Justin Pryzby

Date:

11 February 2023, 01:01:31

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> >
> > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > (almost all physical RAM).
> >
> > Just to make sure: You're certain this was an actual memory leak, not just
> > vacuum ending up having referenced all of shared_buffers?  Unless you use huge
> > pages, RSS increases over time, as a process touched more and more pages in
> > shared memory.  Of course that couldn't explain rising above
> > shared_buffers + overhead.
> >
> > > The memory leak is probably not too big. This database is a little bit
> > > unusual.  This one database has more than 1 800 000 tables. and the same
> > > number of indexes.
> >
> > If you have 1.8 million tables in a single database, what you saw might just
> > have been the size of the relation and catalog caches.
> 
> can be

Well, how big was shared_buffers on that instance ?

-- 
Justin

Re: possible memory leak in VACUUM ANALYZE

From

Pavel Stehule

Date:

11 February 2023, 09:06:45

pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> >
> > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > (almost all physical RAM).
> >
> > Just to make sure: You're certain this was an actual memory leak, not just
> > vacuum ending up having referenced all of shared_buffers? Unless you use huge
> > pages, RSS increases over time, as a process touched more and more pages in
> > shared memory. Of course that couldn't explain rising above
> > shared_buffers + overhead.
> >
> > > The memory leak is probably not too big. This database is a little bit
> > > unusual. This one database has more than 1 800 000 tables. and the same
> > > number of indexes.
> >
> > If you have 1.8 million tables in a single database, what you saw might just
> > have been the size of the relation and catalog caches.
>
> can be

Well, how big was shared_buffers on that instance ?

20GB RAM

20GB swap

2GB shared buffers

--
Justin

Re: possible memory leak in VACUUM ANALYZE

From

Justin Pryzby

Date:

11 February 2023, 09:53:48

On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
> pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> > > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > > > (almost all physical RAM).
> > > >
> > > > Just to make sure: You're certain this was an actual memory leak, not just
> > > > vacuum ending up having referenced all of shared_buffers?  Unless you use huge
> > > > pages, RSS increases over time, as a process touched more and more pages in
> > > > shared memory.  Of course that couldn't explain rising above
> > > > shared_buffers + overhead.
> > > >
> > > > > The memory leak is probably not too big. This database is a little bit
> > > > > unusual.  This one database has more than 1 800 000 tables. and the same
> > > > > number of indexes.
> > > >
> > > > If you have 1.8 million tables in a single database, what you saw might just
> > > > have been the size of the relation and catalog caches.
> > >
> > > can be
> >
> > Well, how big was shared_buffers on that instance ?
> 
> 20GB RAM
> 20GB swap
> 2GB shared buffers

Thanks; so that can't explain using more than 2GB + a bit of overhead.

Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ?  What about ANALYZE ?

Was parallel vacuum happening (are there more than one index per table) ?

Do you have any extended stats objects or non-default stats targets ?
What server version is it?  What OS?  Extensions?  Non-btree indexes?

BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why.  But autovacuum/analyze could
explain it.  On this one particular instance, we don't have many
relations, though...

-- 
Justin

Re: possible memory leak in VACUUM ANALYZE

From

Andres Freund

Date:

11 February 2023, 10:18:26

Hi,

On 2023-02-11 00:53:48 -0600, Justin Pryzby wrote:
> On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
> > pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > > On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> > > > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> > > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > > > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > > > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > > > > (almost all physical RAM).
> > > > >
> > > > > Just to make sure: You're certain this was an actual memory leak, not just
> > > > > vacuum ending up having referenced all of shared_buffers?  Unless you use huge
> > > > > pages, RSS increases over time, as a process touched more and more pages in
> > > > > shared memory.  Of course that couldn't explain rising above
> > > > > shared_buffers + overhead.
> > > > >
> > > > > > The memory leak is probably not too big. This database is a little bit
> > > > > > unusual.  This one database has more than 1 800 000 tables. and the same
> > > > > > number of indexes.
> > > > >
> > > > > If you have 1.8 million tables in a single database, what you saw might just
> > > > > have been the size of the relation and catalog caches.
> > > >
> > > > can be
> > >
> > > Well, how big was shared_buffers on that instance ?
> > 
> > 20GB RAM
> > 20GB swap
> > 2GB shared buffers
> 
> Thanks; so that can't explain using more than 2GB + a bit of overhead.

I think my theory of 1.8 million relcache / catcache entries is pretty good...

I'd do the vacuum analyze again, interrupt once memory usage is high, and
check
  SELECT * FROM pg_backend_memory_contexts ORDER BY total_bytes DESC


> BTW I'm interested about this because I have an VM instance running v15
> which has been killed more than a couple times in the last 6 months, and
> I haven't been able to diagnose why.  But autovacuum/analyze could
> explain it.  On this one particular instance, we don't have many
> relations, though...

Killed in what way? OOM?

If you'd set up strict overcommit you'd get a nice memory dump in the log...

Greetings,

Andres Freund

Re: possible memory leak in VACUUM ANALYZE

From

Pavel Stehule

Date:

11 February 2023, 10:20:37

so 11. 2. 2023 v 7:53 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:

On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
> pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> > > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > > > (almost all physical RAM).
> > > >
> > > > Just to make sure: You're certain this was an actual memory leak, not just
> > > > vacuum ending up having referenced all of shared_buffers? Unless you use huge
> > > > pages, RSS increases over time, as a process touched more and more pages in
> > > > shared memory. Of course that couldn't explain rising above
> > > > shared_buffers + overhead.
> > > >
> > > > > The memory leak is probably not too big. This database is a little bit
> > > > > unusual. This one database has more than 1 800 000 tables. and the same
> > > > > number of indexes.
> > > >
> > > > If you have 1.8 million tables in a single database, what you saw might just
> > > > have been the size of the relation and catalog caches.
> > >
> > > can be
> >
> > Well, how big was shared_buffers on that instance ?
>
> 20GB RAM
> 20GB swap
> 2GB shared buffers

Thanks; so that can't explain using more than 2GB + a bit of overhead.

Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ? What about ANALYZE ?

I executed VACUUM and ANALYZE separately, and memory grew up in both cases with similar speed.

almost all tables has less than 120 pages, and less than thousand tuples

and almost all tables has twenty fields + one geometry field + gist index

the size of pg_attribute has 6GB and pg_class has about 2GB

Unfortunately, that is customer's production server, and I have not full access there so I cannot to do deeper investigation

Was parallel vacuum happening (are there more than one index per table) ?

probably not - almost all tables are small

Do you have any extended stats objects or non-default stats targets ?
What server version is it? What OS? Extensions? Non-btree indexes?

PostgreSQL 14 on linux with installed PostGIS

BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...

--
Justin