Thread: Vacuuming the free space map considered harmful?
We're tracking down an issue that we've seen in two separate installations so far, which is that, at the very end of a vacuum, the vacuum operation starts using *very* high levels of CPU and (sometimes) I/O, often to the point that the system becomes unable to service other requests. We've seen this on versions 15, 16, and 17 so far.

The common data points are:

1. The table being vacuumed is large (>250 million rows, often at the >10 billion row level).
2. The table has a relatively high churn rate.
3. The number of updated / deleted rows before that particular vacuum cycle is very high.

Everything seems to point to the vacuum free space map operation, since it would have a lot of work to do in that particular situation, it happens at just the right place in the vacuum cycle, and its resource consumption is not throttled the way the regular vacuum operation is.

Assuming this analysis is correct, our current proposal as a temporary fix is to increase the frequency of autovacuum on those tables (a sketch of the settings we have in mind is below), so that the free space map vacuum operation has less to do and is less likely to overwhelm the system. In the longer run, is it worth considering implementing a cost delay inside of the free space map update operations?
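For concreteness, the per-table tuning we have in mind is just lowering the scale factor (and raising the fixed floor) on the affected tables so autovacuum fires more often with less accumulated work per cycle. The table name is a placeholder and the numbers are only illustrative, not something we've validated:

    ALTER TABLE big_churn_table SET (
        autovacuum_vacuum_scale_factor = 0.01,   -- trigger at ~1% dead tuples instead of the 20% default
        autovacuum_vacuum_threshold    = 100000   -- plus a fixed floor of dead tuples
    );

These are standard per-table storage parameters; the hope is simply that each vacuum, and therefore each FSM vacuum pass, has less to chew through.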
Hi,

On Wed, Mar 19, 2025 at 09:53:37AM +0100, Christophe Pettus wrote:
> We're tracking down an issue that we've seen in two separate
> installations so far, which is that, at the very end of a vacuum, the
> vacuum operation starts using *very* high levels of CPU and
> (sometimes) I/O, often to the point that the system becomes unable to
> service other requests. We've seen this on versions 15, 16, and 17 so
> far.

Ouch.

> The common data points are:
>
> 1. The table being vacuumed is large (>250 million rows, often at the
> >10 billion row level).
> 2. The table has a relatively high churn rate.
> 3. The number of updated / deleted rows before that particular vacuum
> cycle is very high.
>
> Everything seems to point to the vacuum free space map operation,
> since it would have a lot of work to do in that particular situation,
> it happens at just the right place in the vacuum cycle, and its
> resource consumption is not throttled the way the regular vacuum
> operation is.

Independent of throttling, if it turns out free space map vacuum is
indeed the culprit, I think it would make sense to add that one as a
dedicated phase so it can be more easily tracked in
pg_stat_progress_vacuum etc.

Michael
On 2025-Mar-19, Christophe Pettus wrote:

> Everything seems to point to the vacuum free space map operation,
> since it would have a lot of work to do in that particular situation,
> it happens at just the right place in the vacuum cycle, and its
> resource consumption is not throttled the way the regular vacuum
> operation is.
>
> Assuming this analysis is correct, our current proposal as a temporary
> fix is to increase the frequency of autovacuum on those tables, so
> that the free space map vacuum operation has less to do and is less
> likely to overwhelm the system. In the longer run, is it worth
> considering implementing a cost delay inside of the free space map
> update operations?

Assuming this analysis is correct, I agree that the FSM vacuuming should
also be throttled, as long as that can be done without blocking
concurrent operations (insertions) on the table.

Another idea to consider might be to split out the vacuuming of the FSM
from the vacuuming of the table itself; given that HOT pruning can free
significant amounts of space prior to actual vacuuming, it's possible
that there are wins in scheduling it separately. However, having a way
to track stats for it might be tricky.

--
Álvaro Herrera         Breisgau, Deutschland — https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas. Most of them
will be wrong, and what you have to learn is which ones to throw away."
(Linus Pauling)
On Wed, Mar 19, 2025 at 4:54 AM Christophe Pettus <xof@thebuild.com> wrote:
>
> Everything seems to point to the vacuum free space map operation,
> since it would have a lot of work to do in that particular situation,
> it happens at just the right place in the vacuum cycle, and its
> resource consumption is not throttled the way the regular vacuum
> operation is.

Do you know how big the FSM was?

As others have said, it could be worth adding a phase to
pg_stat_progress_vacuum.

I know you said you saw this all the way back to 15, but it made me
wonder if this would actually get worse after 17 and the tidstore
optimization -- since a large table with indexes requiring multiple
vacuum passes would actually end up having its FSM vacuumed sooner.

- Melanie
> On Mar 19, 2025, at 14:12, Melanie Plageman <melanieplageman@gmail.com> wrote:
>
> Do you know how big the FSM was?

Unfortunately, no; both installations are locked-down hosted environments.

> As others have said, it could be worth adding a phase to
> pg_stat_progress_vacuum.

+1.
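In the meantime, the best we can do is watch the existing phases and block
counters while one of these vacuums runs; something along these lines (just
an illustrative query against the stock view, nothing new):

    SELECT p.pid,
           c.relname,
           p.phase,
           p.heap_blks_total,
           p.heap_blks_scanned,
           p.heap_blks_vacuumed
    FROM pg_stat_progress_vacuum p
    JOIN pg_class c ON c.oid = p.relid;

If the FSM theory is right, we'd expect to see the vacuum apparently stalled
at the very end, with the counters no longer advancing while CPU stays high,
precisely because there's no phase that covers the FSM pass.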
> On Mar 19, 2025, at 12:12, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Assuming this analysis is correct, I agree that the FSM vacuuming should
> also be throttled, as long as that can be done without blocking
> concurrent operations (insertions) on the table.

From an (admittedly somewhat naïve) look at the code, it appears that having
it honor the cost delay wouldn't introduce excessive blocking, as long as the
delay wasn't implemented at a really dumb place.
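To be clear about what "honor the cost delay" means here: the idea would be
for the FSM pass to respect the same cost-based settings the rest of vacuum
already uses, rather than inventing a new knob. Roughly, the existing knobs
(shown with their stock defaults; nothing new proposed here):

    -- existing cost-based vacuum delay settings; values shown are the defaults
    SHOW vacuum_cost_delay;              -- 0 (throttling disabled for manual VACUUM)
    SHOW vacuum_cost_limit;              -- 200
    SHOW autovacuum_vacuum_cost_delay;   -- 2ms
    SHOW autovacuum_vacuum_cost_limit;   -- -1 (falls back to vacuum_cost_limit)

The premise of this thread is that the FSM pass never reaches a delay point,
which is why it runs unthrottled regardless of those settings.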
Hi,

On 2025-03-19 15:05:47 +0100, Christophe Pettus wrote:
> > On Mar 19, 2025, at 14:12, Melanie Plageman <melanieplageman@gmail.com> wrote:
> >
> > Do you know how big the FSM was?
>
> Unfortunately, no; both installations are locked-down hosted environments.

That can be checked with pg_relation_size(), e.g.

    SELECT pg_relation_size('pg_class', 'fsm');

nothing special required.

Greetings,

Andres Freund
> On Mar 19, 2025, at 15:06, Christophe Pettus <xof@thebuild.com> wrote:
>
> From an (admittedly somewhat naïve) look at the code, it appears that having
> it honor the cost delay wouldn't introduce excessive blocking, as long as the
> delay wasn't implemented at a really dumb place.

Specifically, it looks like doing the cost-based wait immediately after
backend/storage/freespace/freespace.c:898 (in HEAD) wouldn't be in a critical
section, although I could be totally in the weeds on this.
> On Mar 19, 2025, at 15:12, Andres Freund <andres@anarazel.de> wrote:
>
> That can be checked with pg_relation_size(), e.g.
>
>     SELECT pg_relation_size('pg_class', 'fsm');
>
> nothing special required.

Thanks!
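For anyone else hitting this, a quick way to survey the FSM fork sizes of the
larger tables (just an illustrative query; the 1 GB cutoff is arbitrary):

    SELECT c.oid::regclass AS table_name,
           pg_size_pretty(pg_relation_size(c.oid, 'main')) AS heap_size,
           pg_size_pretty(pg_relation_size(c.oid, 'fsm'))  AS fsm_size
    FROM pg_class c
    WHERE c.relkind = 'r'
      AND pg_relation_size(c.oid, 'main') > 1024^3
    ORDER BY pg_relation_size(c.oid, 'fsm') DESC;

The 'fsm' fork argument is exactly what Andres showed above; this just applies
it across the catalog.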