Thread: maintenance_work_mem = 64kB doesn't work for vacuum
Hi,

Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum cases since the minimum DSA segment size (DSA_MIN_SEGMENT_SIZE) is 256kB. As soon as the radix tree allocates its control object and the root node, the memory usage exceeds the maintenance_work_mem limit and vacuum ends up with the following assertion failure:

TRAP: failed Assert("vacrel->lpdead_item_pages > 0"), File: "vacuumlazy.c", Line: 2447, PID: 3208575

On builds without --enable-cassert, vacuum never ends.

I tried lowering DSA_MIN_SEGMENT_SIZE to 64kB, but no luck. Investigating further, DSA creates a 64kB superblock for each size class, and when creating a new shared radix tree we need to create two superblocks: one for the radix tree control data (64 bytes) and another one for the root node (40 bytes). Also, each superblock requires span data, which uses one page (4kB). Therefore, we need at least 136kB for a shared radix tree even when it's empty.

A simple fix is to bump the minimum maintenance_work_mem to 256kB. We would break compatibility for a back branch (i.e. v17), but I guess it's unlikely that existing v17 users are using less than 1MB of maintenance_work_mem (the release notes don't mention the fact that we lowered the minimum value).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
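For readers following the arithmetic, here is a minimal standalone C sketch of the accounting described above; the sizes are the figures quoted in the message, not values read from the PostgreSQL headers.

/*
 * Back-of-the-envelope check of the numbers quoted above: an empty
 * shared radix tree needs two 64kB DSA superblocks (control data and
 * root node) plus one 4kB span page per superblock, so its footprint
 * already exceeds a 64kB maintenance_work_mem.  The constants below
 * are taken from the message, not from the PostgreSQL sources.
 */
#include <stdio.h>

int
main(void)
{
	const long	superblock_size = 64 * 1024;	/* one DSA superblock per size class */
	const long	span_size = 4 * 1024;			/* one page of span data per superblock */
	const long	nsuperblocks = 2;				/* control object + root node */
	const long	limit = 64 * 1024;				/* maintenance_work_mem = 64kB */

	long		footprint = nsuperblocks * (superblock_size + span_size);

	printf("empty shared radix tree: %ld kB (limit %ld kB)\n",
		   footprint / 1024, limit / 1024);
	printf("%s\n", footprint > limit ? "over budget before storing any TID" : "fits");
	return 0;
}

Running this prints 136 kB against the 64 kB limit, which is why the assertion fires before a single dead TID has been stored.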
On Mon, 10 Mar 2025 at 07:46, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> A simple fix is to bump the minimum maintenance_work_mem to 256kB. We would break compatibility for a back branch (i.e. v17), but I guess it's unlikely that existing v17 users are using less than 1MB of maintenance_work_mem (the release notes don't mention the fact that we lowered the minimum value).

Could you do something similar to what's in hash_agg_check_limits(), where we check that we've got at least 1 item before bailing out because we've used up all the prescribed memory? That seems like a safer coding practice: if in the future the minimum usage for a DSM segment goes above 256KB, the bug comes back again.

David
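To make the suggested pattern concrete, here is a small self-contained sketch of that style of check; the struct and function names are invented for the example and are not the actual hash_agg_check_limits() or vacuum code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct collector
{
	size_t		mem_used;		/* includes fixed allocator overhead */
	size_t		mem_limit;		/* e.g. maintenance_work_mem in bytes */
	long		nitems;			/* items collected since the last flush */
} collector;

/* Only report "full" once we actually hold at least one item to flush. */
static bool
collector_should_flush(const collector *c)
{
	return c->nitems > 0 && c->mem_used >= c->mem_limit;
}

int
main(void)
{
	/* Empty collector whose fixed overhead already exceeds the limit. */
	collector	c = {.mem_used = 136 * 1024, .mem_limit = 64 * 1024, .nitems = 0};

	printf("flush with 0 items: %d\n", collector_should_flush(&c));
	c.nitems = 1;
	printf("flush with 1 item:  %d\n", collector_should_flush(&c));
	return 0;
}

With a guard of this shape, a fixed allocation overhead larger than the configured limit cannot by itself trigger the "memory is full" path, regardless of how large the minimum DSM segment becomes.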
On Mon, Mar 10, 2025 at 1:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum

That was done in the first place to make a regression test for a bug fix easier, but that test never got committed. In any case, I found it worked back in July:

https://www.postgresql.org/message-id/CANWCAZZb7wd403wHQQUJZjkF%2BRWKAAa%2BWARP0Rj0EyMcfcdN9Q%40mail.gmail.com

--
John Naylor
Amazon Web Services
On Sun, Mar 9, 2025 at 9:24 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Mon, Mar 10, 2025 at 1:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Commit bbf668d66fbf6 (back-patched to v17) lowered the minimum maintenance_work_mem to 64kB, but it doesn't work for parallel vacuum
>
> That was done in the first place to make a regression test for a bug fix easier, but that test never got committed. In any case, I found it worked back in July:

Yes, I would like to keep the lower minimum. I really do have every intention of committing that test; apologies for taking so long.

Raising the limit to 256 kB might make the test take too long. And I think it's nice to have that coverage (not just of the vacuum bug, but of a vacuum that needs multiple index vacuuming passes in a natural setting [as opposed to the tidstore test module]). I don't recall if we have that elsewhere.

- Melanie
On Mon, 10 Mar 2025 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote:
> Could you do something similar to what's in hash_agg_check_limits(), where we check that we've got at least 1 item before bailing out because we've used up all the prescribed memory? That seems like a safer coding practice: if in the future the minimum usage for a DSM segment goes above 256KB, the bug comes back again.

FWIW, I had something like the attached in mind.

David
Attachment
On Sun, Mar 9, 2025 at 7:03 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 10 Mar 2025 at 10:30, David Rowley <dgrowleyml@gmail.com> wrote:
> > Could you do something similar to what's in hash_agg_check_limits(), where we check that we've got at least 1 item before bailing out because we've used up all the prescribed memory? That seems like a safer coding practice: if in the future the minimum usage for a DSM segment goes above 256KB, the bug comes back again.
>
> FWIW, I had something like the attached in mind.

Thank you for the patch! I like your idea. It means that even if we set maintenance_work_mem to 64kB the memory usage would not actually be limited to 64kB, but that's probably fine since such a low setting is primarily for testing purposes.

Regarding that patch, we need to note that lpdead_items is a counter that is not reset during the entire vacuum. Therefore, with maintenance_work_mem = 64kB, once we collect at least one lpdead item, we perform a cycle of index vacuuming and heap vacuuming for every subsequent block even if it doesn't have any lpdead items. I think we should use vacrel->dead_items_info->num_items instead.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
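As a toy illustration of the difference between the two counters: the variables below merely stand in for lpdead_items and vacrel->dead_items_info->num_items, and the loop does not reflect the real vacuumlazy.c structure.

/*
 * "lpdead_total" models a counter that only ever grows during the
 * vacuum, while "num_items" models a count that is reset after each
 * index-vacuuming cycle.  Gating the cycle on the cumulative counter
 * would fire on every block once a single dead item has ever been
 * seen; gating on the per-cycle count does not.
 */
#include <stdio.h>

int
main(void)
{
	long		lpdead_total = 0;	/* never reset */
	long		num_items = 0;		/* reset after each cycle */
	int			cycles_total = 0;
	int			cycles_percycle = 0;

	/* 100 blocks, only the first one has a dead item */
	for (int blk = 0; blk < 100; blk++)
	{
		int			dead_on_block = (blk == 0) ? 1 : 0;

		lpdead_total += dead_on_block;
		num_items += dead_on_block;

		/* pretend memory is always "over the limit", as in the 64kB case */
		if (lpdead_total > 0)
			cycles_total++;			/* cumulative gate: fires on every block */
		if (num_items > 0)
		{
			cycles_percycle++;		/* per-cycle gate: fires once */
			num_items = 0;			/* index vacuuming empties the store */
		}
	}

	printf("gating on cumulative counter: %d cycles\n", cycles_total);
	printf("gating on per-cycle count:    %d cycle(s)\n", cycles_percycle);
	return 0;
}

Under these assumptions the cumulative gate triggers 100 index-vacuuming cycles for a single dead item, while the per-cycle gate triggers only one.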
On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Regarding that patch, we need to note that lpdead_items is a counter that is not reset during the entire vacuum. Therefore, with maintenance_work_mem = 64kB, once we collect at least one lpdead item, we perform a cycle of index vacuuming and heap vacuuming for every subsequent block even if it doesn't have any lpdead items. I think we should use vacrel->dead_items_info->num_items instead.

OK, I didn't study the code enough to realise that. My patch was only intended as an indication of what I thought. Please feel free to proceed with your own patch using the correct field.

When playing with parallel vacuum, I also wondered if there should be some heuristic that avoids parallel vacuum, unless the user specifically asked for it in the command, when maintenance_work_mem is set to something far too low.

Take the following case as an example:

set maintenance_work_mem=64;
create table aa(a int primary key, b int unique);
insert into aa select a,a from generate_Series(1,1000000) a;
delete from aa;

-- try a vacuum with no parallelism
vacuum (verbose, parallel 0) aa;

system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s

If I did the following instead:

vacuum (verbose) aa;

the vacuum goes parallel and takes a very long time due to launching a parallel worker to do one page's worth of tuples. I see the following message 4425 times:

INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1)

and it takes about 30 seconds to complete:

system usage: CPU: user: 14.00 s, system: 0.81 s, elapsed: 30.86 s

Shouldn't the code in parallel_vacuum_compute_workers() try to pick a good value for the number of workers based on the available memory and table size when the user does not explicitly specify how many workers they want?

David
On Mon, Mar 10, 2025 at 2:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Regarding that patch, we need to note that lpdead_items is a counter that is not reset during the entire vacuum. Therefore, with maintenance_work_mem = 64kB, once we collect at least one lpdead item, we perform a cycle of index vacuuming and heap vacuuming for every subsequent block even if it doesn't have any lpdead items. I think we should use vacrel->dead_items_info->num_items instead.
>
> OK, I didn't study the code enough to realise that. My patch was only intended as an indication of what I thought. Please feel free to proceed with your own patch using the correct field.
>
> When playing with parallel vacuum, I also wondered if there should be some heuristic that avoids parallel vacuum, unless the user specifically asked for it in the command, when maintenance_work_mem is set to something far too low.
>
> Take the following case as an example:
>
> set maintenance_work_mem=64;
> create table aa(a int primary key, b int unique);
> insert into aa select a,a from generate_Series(1,1000000) a;
> delete from aa;
>
> -- try a vacuum with no parallelism
> vacuum (verbose, parallel 0) aa;
>
> system usage: CPU: user: 0.53 s, system: 0.00 s, elapsed: 0.57 s
>
> If I did the following instead:
>
> vacuum (verbose) aa;
>
> the vacuum goes parallel and takes a very long time due to launching a parallel worker to do one page's worth of tuples. I see the following message 4425 times:
>
> INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1)
>
> and it takes about 30 seconds to complete:
>
> system usage: CPU: user: 14.00 s, system: 0.81 s, elapsed: 30.86 s
>
> Shouldn't the code in parallel_vacuum_compute_workers() try to pick a good value for the number of workers based on the available memory and table size when the user does not explicitly specify how many workers they want?

I think in your case the threshold of min_parallel_index_scan_size didn't work well. Given that one worker is assigned to one index and index vacuum time mostly depends on the index size, the index size would be a good criterion for deciding the parallel degree. For example, even if the table has only one dead item, index vacuuming would take a long time if the indexes are large, since in common cases (e.g., btree indexes) we scan the whole index; there we would like to use parallel index vacuuming. Also, even if the table has many dead items but its indexes are small (e.g., expression indexes), it would be better not to use parallel index vacuuming.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
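For what it's worth, here is a purely illustrative sketch of the kind of size-based heuristic being discussed; the names, threshold, and cap are assumptions made for the example and do not come from parallel_vacuum_compute_workers().

/*
 * Sketch of a size-based worker heuristic: request one worker per
 * index that is large enough to be worth a dedicated process, capped
 * by a maximum.  Threshold and cap values are placeholders.
 */
#include <stdio.h>

#define MIN_PARALLEL_INDEX_PAGES	512		/* assumed threshold, in pages */
#define MAX_PARALLEL_WORKERS		8		/* assumed cap */

static int
compute_parallel_workers(const long *index_pages, int nindexes)
{
	int			nlarge = 0;

	for (int i = 0; i < nindexes; i++)
	{
		if (index_pages[i] >= MIN_PARALLEL_INDEX_PAGES)
			nlarge++;
	}

	/* One worker per large index; the leader handles the small ones. */
	return nlarge > MAX_PARALLEL_WORKERS ? MAX_PARALLEL_WORKERS : nlarge;
}

int
main(void)
{
	long		small_indexes[] = {40, 60};			/* both below the threshold */
	long		large_indexes[] = {9000, 12000};	/* both above the threshold */

	printf("small indexes -> %d worker(s)\n",
		   compute_parallel_workers(small_indexes, 2));
	printf("large indexes -> %d worker(s)\n",
		   compute_parallel_workers(large_indexes, 2));
	return 0;
}

Under these assumptions the small-index case would run without workers, while the large-index case would still get one worker per index even if only a handful of dead items exist.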
On Mon, Mar 10, 2025 at 2:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 10 Mar 2025 at 17:22, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Regarding that patch, we need to note that lpdead_items is a counter that is not reset during the entire vacuum. Therefore, with maintenance_work_mem = 64kB, once we collect at least one lpdead item, we perform a cycle of index vacuuming and heap vacuuming for every subsequent block even if it doesn't have any lpdead items. I think we should use vacrel->dead_items_info->num_items instead.
>
> OK, I didn't study the code enough to realise that. My patch was only intended as an indication of what I thought. Please feel free to proceed with your own patch using the correct field.

I've attached the patch. I added minimal regression tests for that.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, 18 Mar 2025 at 05:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached the patch. I added minimal regression tests for that.

I think the change to vacuumlazy.c is ok. The new test you've added creates a table called pvactst2 but then adds a test that uses the pvactst table.

Did you mean to skip the DROP TABLE pvactst2;?

Is there a reason to keep the maintenance_work_mem=64 for the subsequent existing test?

David
On Mon, Mar 17, 2025 at 7:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 18 Mar 2025 at 05:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I've attached the patch. I added minimal regression tests for that.
>
> I think the change to vacuumlazy.c is ok. The new test you've added creates a table called pvactst2 but then adds a test that uses the pvactst table.

Fixed.

> Did you mean to skip the DROP TABLE pvactst2;?

Yes, added DROP TABLE pvactst2.

> Is there a reason to keep the maintenance_work_mem=64 for the subsequent existing test?

No, I reset it immediately after the tests for pvactst2.

I've attached the updated patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, 18 Mar 2025 at 19:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached the updated patch.

Looks good to me.

David
On Mon, Mar 17, 2025 at 11:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 18 Mar 2025 at 19:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I've attached the updated patch.
>
> Looks good to me.

Thank you for reviewing the patch. Pushed (backpatched to v17).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com