Thread: possible memory leak in VACUUM ANALYZE
Hi
Just a small note - I executed VACUUM ANALYZE on one customer's database, and I had to cancel it after a few hours, because it had more than 20GB RAM (almost all physical RAM). The memory leak is probably not too big. This database is a little bit unusual. This one database has more than 1 800 000 tables. and the same number of indexes.
Regards
Pavel
Hi, On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote: > Just a small note - I executed VACUUM ANALYZE on one customer's database, > and I had to cancel it after a few hours, because it had more than 20GB RAM > (almost all physical RAM). Just to make sure: You're certain this was an actual memory leak, not just vacuum ending up having referenced all of shared_buffers? Unless you use huge pages, RSS increases over time, as a process touched more and more pages in shared memory. Of course that couldn't explain rising above shared_buffers + overhead. > The memory leak is probably not too big. This database is a little bit > unusual. This one database has more than 1 800 000 tables. and the same > number of indexes. If you have 1.8 million tables in a single database, what you saw might just have been the size of the relation and catalog caches. Greetings, Andres Freund
pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
Hi,
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> Just a small note - I executed VACUUM ANALYZE on one customer's database,
> and I had to cancel it after a few hours, because it had more than 20GB RAM
> (almost all physical RAM).
Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above shared_buffers +
overhead.
> The memory leak is probably not too big. This database is a little bit
> unusual. This one database has more than 1 800 000 tables. and the same
> number of indexes.
If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.
can be
Regards
Pavel
Greetings,
Andres Freund
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote: > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal: > > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote: > > > Just a small note - I executed VACUUM ANALYZE on one customer's database, > > > and I had to cancel it after a few hours, because it had more than 20GB RAM > > > (almost all physical RAM). > > > > Just to make sure: You're certain this was an actual memory leak, not just > > vacuum ending up having referenced all of shared_buffers? Unless you use huge > > pages, RSS increases over time, as a process touched more and more pages in > > shared memory. Of course that couldn't explain rising above > > shared_buffers + overhead. > > > > > The memory leak is probably not too big. This database is a little bit > > > unusual. This one database has more than 1 800 000 tables. and the same > > > number of indexes. > > > > If you have 1.8 million tables in a single database, what you saw might just > > have been the size of the relation and catalog caches. > > can be Well, how big was shared_buffers on that instance ? -- Justin
pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> >
> > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > (almost all physical RAM).
> >
> > Just to make sure: You're certain this was an actual memory leak, not just
> > vacuum ending up having referenced all of shared_buffers? Unless you use huge
> > pages, RSS increases over time, as a process touched more and more pages in
> > shared memory. Of course that couldn't explain rising above
> > shared_buffers + overhead.
> >
> > > The memory leak is probably not too big. This database is a little bit
> > > unusual. This one database has more than 1 800 000 tables. and the same
> > > number of indexes.
> >
> > If you have 1.8 million tables in a single database, what you saw might just
> > have been the size of the relation and catalog caches.
>
> can be
Well, how big was shared_buffers on that instance ?
20GB RAM
20GB swap
2GB shared buffers
--
Justin
On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote: > pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal: > > On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote: > > > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal: > > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote: > > > > > Just a small note - I executed VACUUM ANALYZE on one customer's database, > > > > > and I had to cancel it after a few hours, because it had more than 20GB RAM > > > > > (almost all physical RAM). > > > > > > > > Just to make sure: You're certain this was an actual memory leak, not just > > > > vacuum ending up having referenced all of shared_buffers? Unless you use huge > > > > pages, RSS increases over time, as a process touched more and more pages in > > > > shared memory. Of course that couldn't explain rising above > > > > shared_buffers + overhead. > > > > > > > > > The memory leak is probably not too big. This database is a little bit > > > > > unusual. This one database has more than 1 800 000 tables. and the same > > > > > number of indexes. > > > > > > > > If you have 1.8 million tables in a single database, what you saw might just > > > > have been the size of the relation and catalog caches. > > > > > > can be > > > > Well, how big was shared_buffers on that instance ? > > 20GB RAM > 20GB swap > 2GB shared buffers Thanks; so that can't explain using more than 2GB + a bit of overhead. Can you reproduce the problem and figure out which relation was being processed, or if the memory use is growing across relations? pg_stat_progress_analyze/vacuum would be one thing to check. Does VACUUM alone trigger the issue ? What about ANALYZE ? Was parallel vacuum happening (are there more than one index per table) ? Do you have any extended stats objects or non-default stats targets ? What server version is it? What OS? Extensions? Non-btree indexes? BTW I'm interested about this because I have an VM instance running v15 which has been killed more than a couple times in the last 6 months, and I haven't been able to diagnose why. But autovacuum/analyze could explain it. On this one particular instance, we don't have many relations, though... -- Justin
Hi, On 2023-02-11 00:53:48 -0600, Justin Pryzby wrote: > On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote: > > pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal: > > > On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote: > > > > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal: > > > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote: > > > > > > Just a small note - I executed VACUUM ANALYZE on one customer's database, > > > > > > and I had to cancel it after a few hours, because it had more than 20GB RAM > > > > > > (almost all physical RAM). > > > > > > > > > > Just to make sure: You're certain this was an actual memory leak, not just > > > > > vacuum ending up having referenced all of shared_buffers? Unless you use huge > > > > > pages, RSS increases over time, as a process touched more and more pages in > > > > > shared memory. Of course that couldn't explain rising above > > > > > shared_buffers + overhead. > > > > > > > > > > > The memory leak is probably not too big. This database is a little bit > > > > > > unusual. This one database has more than 1 800 000 tables. and the same > > > > > > number of indexes. > > > > > > > > > > If you have 1.8 million tables in a single database, what you saw might just > > > > > have been the size of the relation and catalog caches. > > > > > > > > can be > > > > > > Well, how big was shared_buffers on that instance ? > > > > 20GB RAM > > 20GB swap > > 2GB shared buffers > > Thanks; so that can't explain using more than 2GB + a bit of overhead. I think my theory of 1.8 million relcache / catcache entries is pretty good... I'd do the vacuum analyze again, interrupt once memory usage is high, and check SELECT * FROM pg_backend_memory_contexts ORDER BY total_bytes DESC > BTW I'm interested about this because I have an VM instance running v15 > which has been killed more than a couple times in the last 6 months, and > I haven't been able to diagnose why. But autovacuum/analyze could > explain it. On this one particular instance, we don't have many > relations, though... Killed in what way? OOM? If you'd set up strict overcommit you'd get a nice memory dump in the log... Greetings, Andres Freund
so 11. 2. 2023 v 7:53 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
> pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
> > On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
> > > pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
> > > > On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
> > > > > Just a small note - I executed VACUUM ANALYZE on one customer's database,
> > > > > and I had to cancel it after a few hours, because it had more than 20GB RAM
> > > > > (almost all physical RAM).
> > > >
> > > > Just to make sure: You're certain this was an actual memory leak, not just
> > > > vacuum ending up having referenced all of shared_buffers? Unless you use huge
> > > > pages, RSS increases over time, as a process touched more and more pages in
> > > > shared memory. Of course that couldn't explain rising above
> > > > shared_buffers + overhead.
> > > >
> > > > > The memory leak is probably not too big. This database is a little bit
> > > > > unusual. This one database has more than 1 800 000 tables. and the same
> > > > > number of indexes.
> > > >
> > > > If you have 1.8 million tables in a single database, what you saw might just
> > > > have been the size of the relation and catalog caches.
> > >
> > > can be
> >
> > Well, how big was shared_buffers on that instance ?
>
> 20GB RAM
> 20GB swap
> 2GB shared buffers
Thanks; so that can't explain using more than 2GB + a bit of overhead.
Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ? What about ANALYZE ?
I executed VACUUM and ANALYZE separately, and memory grew up in both cases with similar speed.
almost all tables has less than 120 pages, and less than thousand tuples
and almost all tables has twenty fields + one geometry field + gist index
the size of pg_attribute has 6GB and pg_class has about 2GB
Unfortunately, that is customer's production server, and I have not full access there so I cannot to do deeper investigation
Was parallel vacuum happening (are there more than one index per table) ?
probably not - almost all tables are small
Do you have any extended stats objects or non-default stats targets ?
What server version is it? What OS? Extensions? Non-btree indexes?
PostgreSQL 14 on linux with installed PostGIS
BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...
--
Justin