Thread: Re: [HACKERS] Block level parallel vacuum
On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Yeah, I was thinking the commit is relevant to this issue, but as
>> Amit mentioned this error is emitted by DROP SCHEMA CASCADE.
>> I haven't found the cause of this issue yet. With the previous
>> version of the patch, autovacuum workers were working with one
>> parallel worker but never dropped relations. So it's possible that
>> the error was not related to the patch, but anyway I'll continue to
>> work on that.
>
> This depends on the extension lock patch from
> https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com/
> if I am following correctly. So I propose to mark this patch as
> returned with feedback for now, and come back to it once the root
> problems are addressed. Feel free to correct me if you think that's
> not adapted.

I've re-designed the parallel vacuum patch. Attached is the latest
version. As discussed so far, this patch depends on the extension lock
patch[1]. However, I think we can discuss the design of parallel
vacuum independently of that patch, which is why I'm proposing the new
patch now. In this patch I restructured and refined lazy_scan_heap(),
because it is a single big function and, as it stands, not suitable
for parallelization.

The parallel vacuum worker processes keep waiting for commands from
the parallel vacuum leader process. Before entering each phase of lazy
vacuum, such as scanning the heap, vacuuming indexes, and vacuuming
the heap, the leader process advances all workers to the next state.
The vacuum worker processes do the job according to their state and
wait for the next command once finished. Also, before entering the
next phase, the leader process does some preparation work while the
vacuum workers are sleeping; for example, it clears the shared
dead-tuple space before entering the 'scanning heap' phase. The status
of the vacuum workers is stored in a DSM area pointed to by
WorkerState variables and is controlled by the leader process. For the
basic design and performance improvements, please refer to my
presentation at PGCon 2018[2].

The number of parallel vacuum workers is determined according to
either the table size or the PARALLEL option of the VACUUM command.
The maximum number of parallel workers is
max_parallel_maintenance_workers.

I've separated the code for the vacuum worker process into
backends/commands/vacuumworker.c, and created
includes/commands/vacuum_internal.h to declare the definitions for
lazy vacuum.

For autovacuum, this patch allows an autovacuum worker process to use
the parallel option according to the relation size or the reloption.
Regarding the autovacuum delay, however, since there are no slots for
parallel autovacuum workers in AutoVacuumShmem, this patch doesn't
support changing the autovacuum delay configuration while vacuum is
running.

Please apply this patch together with the extension lock patch[1] when
testing, as this patch can try to extend visibility map pages
concurrently.

[1] https://www.postgresql.org/message-id/CAD21AoBn8WbOt21MFfj1mQmL2ZD8KVgMHYrOe1F5ozsQC4Z_hw%40mail.gmail.com
[2] https://www.pgcon.org/2018/schedule/events/1202.en.html

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
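As a rough illustration of the leader/worker handshake described above, a worker's main loop might look like the sketch below. All names (LVWorkerPhase, LVWorkerState, do_phase_work) are placeholders rather than the patch's actual symbols, and the exact headers and wait-event constants vary a little between server versions.

#include "postgres.h"

#include "miscadmin.h"
#include "pgstat.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "storage/spin.h"

/* Illustrative names only; the patch's own structures will differ. */
typedef enum LVWorkerPhase
{
    LV_PHASE_IDLE,
    LV_PHASE_SCAN_HEAP,
    LV_PHASE_VACUUM_INDEX,
    LV_PHASE_VACUUM_HEAP,
    LV_PHASE_DONE
} LVWorkerPhase;

typedef struct LVWorkerState            /* lives in the DSM segment */
{
    slock_t         mutex;
    LVWorkerPhase   phase;              /* written by the leader */
} LVWorkerState;

/* do_phase_work() is a hypothetical helper standing in for the real work */
extern void do_phase_work(LVWorkerPhase phase);

static void
parallel_vacuum_worker_loop(LVWorkerState *mystate)
{
    LVWorkerPhase   last = LV_PHASE_IDLE;

    for (;;)
    {
        LVWorkerPhase   phase;
        int             rc;

        CHECK_FOR_INTERRUPTS();

        /* read the phase the leader has published */
        SpinLockAcquire(&mystate->mutex);
        phase = mystate->phase;
        SpinLockRelease(&mystate->mutex);

        if (phase == LV_PHASE_DONE)
            break;

        if (phase != last)
        {
            /* scan heap, vacuum indexes, or vacuum heap, as commanded */
            do_phase_work(phase);
            last = phase;
        }

        /* sleep until the leader advances the phase and sets our latch */
        rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH,
                       -1L, PG_WAIT_IPC);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
        ResetLatch(MyLatch);
    }
}

The point of the sketch is only the shape of the protocol: the leader owns the shared phase variable, does its per-phase preparation while the workers sleep on their latches, and then wakes them all at once.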
On Tue, Aug 14, 2018 at 9:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> [...]
> For autovacuum, this patch allows an autovacuum worker process to use
> the parallel option according to the relation size or the reloption.
>

Attached is a rebased version of the patch against current HEAD.

> Please apply this patch together with the extension lock patch[1] when
> testing, as this patch can try to extend visibility map pages
> concurrently.
>

Because the patch leads to a performance degradation in the case of
bulk-loading into a partitioned table, I think the original proposal,
which makes relation extension locks conflict under group locking, is
the more realistic approach. So I worked on this with that simple
patch instead of [1]. Attached are three patches:

* The 0001 patch publishes some static functions such as
  heap_parallelscan_startblock_init so that the parallel vacuum code
  can use them.
* The 0002 patch makes relation extension locks conflict under group
  locking.
* The 0003 patch adds the parallel option to lazy vacuum.

Please review them.

[1] https://www.postgresql.org/message-id/CAD21AoBn8WbOt21MFfj1mQmL2ZD8KVgMHYrOe1F5ozsQC4Z_hw%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
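For readers unfamiliar with the group-locking issue, a minimal sketch of the idea behind the 0002 patch might look like this; it is illustrative only and not the actual patch.

#include "postgres.h"

#include "storage/lock.h"
#include "storage/proc.h"

/*
 * Hypothetical helper: should this lock tag conflict even between members
 * of the same parallel (group-locking) group?  The real 0002 patch may put
 * this test directly inside the lock manager's conflict check.
 */
static bool
LockTagConflictsWithinGroup(const LOCKTAG *tag)
{
    /* Relation extension locks must conflict even within a lock group. */
    return tag->locktag_type == LOCKTAG_RELATION_EXTEND;
}

/*
 * Inside LockCheckConflicts(), the group-locking shortcut ("locks held by
 * members of my own lock group never conflict with mine") would then be
 * bypassed for such tags, roughly:
 *
 *     if (proclock->groupLeader == MyProc->lockGroupLeader &&
 *         !LockTagConflictsWithinGroup(&lock->tag))
 *         ... treat as non-conflicting ...
 */

The effect is that the leader and its parallel workers can still block each other on relation extension, which is what makes block-level parallel heap vacuum safe without the full extension lock patch.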
On Tue, Oct 30, 2018 at 5:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> [...]
> * The 0001 patch publishes some static functions such as
>   heap_parallelscan_startblock_init so that the parallel vacuum code
>   can use them.
> * The 0002 patch makes relation extension locks conflict under group
>   locking.
> * The 0003 patch adds the parallel option to lazy vacuum.
>
> Please review them.

Oops, forgot to attach the patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
Excuse me for being noisy.

Increasing vacuum's ring buffer improves vacuum by up to 6 times:
https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
That is a one-line change.

How much improvement does parallel vacuum give?

On 31.10.2018 3:23, Masahiko Sawada wrote:
> [...]
> Oops, forgot to attach the patches.
Hi,

On Thu, Nov 1, 2018 at 2:28 PM Yura Sokolov <funny.falcon@gmail.com> wrote:
>
> Excuse me for being noisy.
>
> Increasing vacuum's ring buffer improves vacuum by up to 6 times:
> https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
> That is a one-line change.
>
> How much improvement does parallel vacuum give?

It depends on the hardware resources you can use. In the current
design, scanning the heap and vacuuming the heap are processed by
parallel workers at the block level (using parallel sequential scan),
and vacuuming indexes is processed by parallel workers at the index
level. So even if a table is not particularly large, the more indexes
it has, the better performance you can get.

The performance test result I did before (attached) shows that
parallel vacuum is up to almost 10 times faster than single-process
vacuum in one case. The test used a not-so-large table (a 4GB table)
with many indexes, but it would be interesting to test with a large
table.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
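As a rough sketch of the block-level split described above (illustrative names, not the patch's actual structures), workers could claim heap blocks from a shared atomic counter much like a parallel sequential scan does:

#include "postgres.h"

#include "port/atomics.h"
#include "storage/block.h"

/* Hypothetical shared state in the DSM segment. */
typedef struct LVParallelScan
{
    BlockNumber         nblocks;        /* size of the heap */
    pg_atomic_uint64    next_block;     /* next block to hand out */
} LVParallelScan;

/*
 * Each worker (and the leader) claims heap blocks from the shared counter,
 * so the heap is scanned exactly once in total regardless of how many
 * processes participate.
 */
static BlockNumber
lv_next_block(LVParallelScan *pscan)
{
    uint64      blkno = pg_atomic_fetch_add_u64(&pscan->next_block, 1);

    if (blkno >= pscan->nblocks)
        return InvalidBlockNumber;      /* no more blocks to scan */
    return (BlockNumber) blkno;
}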
On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Attached is a rebased version of the patch against current HEAD.
>
> [...]
>
> * The 0001 patch publishes some static functions such as
>   heap_parallelscan_startblock_init so that the parallel vacuum code
>   can use them.
> * The 0002 patch makes relation extension locks conflict under group
>   locking.
> * The 0003 patch adds the parallel option to lazy vacuum.
>
> Please review them.
>

I can see that you have put a lot of effort into this patch and still
we are not able to make much progress, mainly, I guess, because of the
relation extension lock problem. I think we can park that problem for
some time (as we have already invested quite some time in it), discuss
the actual parallel vacuum patch a bit, and then come back to it. I
don't know if that is right or not. I am not sure we can make this
ready for the PG12 timeframe, but I feel this patch deserves some
attention.

I have started reading the main parallel vacuum patch and below are
some assorted comments.

+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="parameter">N
+      </replaceable>a background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. The maximum nunber of parallel workers
+      is <xref linkend="guc-max-parallel-workers-maintenance"/>. This option can not
+      use with <literal>FULL</literal> option.
+     </para>

There are a couple of mistakes in the above para:
(a) "..a background workers." — "a" seems redundant.
(b) "Collecting garbage on table is processed in block-level
parallel." / "Collecting garbage on table is processed at block-level
in parallel."
(c) "For tables with indexes, parallel vacuum assigns each index to
each parallel vacuum worker and all garbages on a index are processed
by particular parallel vacuum worker." We can rephrase it as: "For
tables with indexes, parallel vacuum assigns a worker to each index
and all garbages on a index are processed by that parallel vacuum
worker."
(d) Typo: nunber/number
(e) Typo: can not/cannot

I have glanced through part of the patch, but didn't find any README
or doc containing the design of this patch. I think without having the
design in place, it is difficult to review a patch of this size and
complexity. To start with, at least explain how the work is
distributed among workers: say there are two workers which need to
vacuum a table with four indexes, how does it work? How does the
leader participate in and coordinate the work? The other part you can
explain is how the state is maintained during parallel vacuum,
something like you are trying to do in the below function:

+ * lazy_prepare_next_state
+ *
+ * Before enter the next state prepare the next state. In parallel lazy vacuum,
+ * we must wait for the all vacuum workers to finish the previous state before
+ * preparation. Also, after prepared we change the state ot all vacuum workers
+ * and wake up them.
+ */
+static void
+lazy_prepare_next_state(LVState *lvstate, LVLeader *lvleader, int next_state)

Still other things to explain are how the stats are shared between the
leader and workers. I can understand a few things in bits and pieces
while glancing through the patch, but it would be easier to understand
if you document it in one place. It can help reviewers to understand
it.

Can you consider splitting the patch so that the refactoring you have
done in the current code to make it usable by parallel vacuum is a
separate patch?

+/*
+ * Vacuum all indexes. In parallel vacuum, each workers take indexes
+ * one by one. Also after vacuumed index they mark it as done. This marking
+ * is necessary to guarantee that all indexes are vacuumed based on
+ * the current collected dead tuples. The leader process continues to
+ * vacuum even if any indexes is not vacuumed completely due to failure of
+ * parallel worker for whatever reason. The mark will be checked before entering
+ * the next state.
+ */
+void
+lazy_vacuum_all_indexes(LVState *lvstate)

I didn't understand what you want to say here. Do you mean that the
leader can continue collecting more dead tuple TIDs while workers are
vacuuming the indexes? How does it deal with errors, if any, during
index vacuum?

+ * plan_lazy_vacuum_workers_index_workers
+ * Use the planner to decide how many parallel worker processes
+ * VACUUM and autovacuum should request for use
+ *
+ * tableOid is the table begin vacuumed which must not be non-tables or
+ * special system tables.
..
+ plan_lazy_vacuum_workers(Oid tableOid, int nworkers_requested)

The comment starting from tableOid is not clear. The actual function
name (plan_lazy_vacuum_workers) and the name in the comment
(plan_lazy_vacuum_workers_index_workers) don't match. Can you take a
relation as the input parameter instead of tableOid, as that can save
a lot of code in this function?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> I can see that you have put a lot of effort into this patch and still
> we are not able to make much progress, mainly, I guess, because of the
> relation extension lock problem. I think we can park that problem for
> some time (as we have already invested quite some time in it), discuss
> the actual parallel vacuum patch a bit, and then come back to it.
>

Today I was reading this and the previous related thread [1], and it
seems to me that multiple people (Andres [2], Simon [3]) have pointed
out that parallelization of the index portion is more valuable. Some
of the results [4] indicate the same. Now, when there are no indexes,
parallelizing heap scans also has a benefit, but I think in practice
we will see more cases where the user wants to vacuum tables with
indexes. So how about if we break this problem up in the following
way, where each piece gives a benefit of its own:
(a) Parallelize index scans wherein the workers will be launched only
to vacuum indexes. Only one worker per index will be spawned.
(b) Parallelize per-index vacuum. Each index can be vacuumed by
multiple workers.
(c) Parallelize heap scans where multiple workers will scan the heap,
collect dead TIDs and then launch multiple workers for indexes.

I think if we break this problem into multiple patches, it will reduce
the scope of each patch and help us make progress. It's now been more
than two years that we have been trying to solve this problem, but we
still haven't made much progress. I understand there are various
genuine reasons, and all of that work will help us in solving all the
problems in this area. How about if we first target problem (a), and
once we are done with that we can see which of (b) or (c) we want to
do first?

[1] - https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/20160823164836.naody2ht6cutioiz%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CANP8%2BjKWOw6AAorFOjdynxUKqs6XRReOcNy-VXRFFU_4bBT8ww%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAGTBQpbU3R_VgyWk6jaD%3D6v-Wwrm8%2B6CbrzQxQocH0fmedWRkw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> [...]
> So how about if we break this problem up in the following way, where
> each piece gives a benefit of its own:
> (a) Parallelize index scans wherein the workers will be launched only
> to vacuum indexes. Only one worker per index will be spawned.
> (b) Parallelize per-index vacuum. Each index can be vacuumed by
> multiple workers.
> (c) Parallelize heap scans where multiple workers will scan the heap,
> collect dead TIDs and then launch multiple workers for indexes.
> [...]
> How about if we first target problem (a), and once we are done with
> that we can see which of (b) or (c) we want to do first?

Thank you for the suggestion. It seems good to me. We would get nice
performance scalability even with only (a), and vacuum will get more
powerful with (b) or (c). Also, (a) would not require resolving the
relation extension lock issue, IIUC.

I'll change the patch and submit it to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Nov 26, 2018 at 2:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> [...]
> Thank you for the suggestion. It seems good to me. We would get nice
> performance scalability even with only (a), and vacuum will get more
> powerful with (b) or (c). Also, (a) would not require resolving the
> relation extension lock issue, IIUC.
>

Yes, I also think so. We do acquire the relation extension lock during
index vacuum, but as part of (a) we are talking about one worker per
index, so there shouldn't be a problem with respect to deadlocks.

> I'll change the patch and submit it to the next CF.
>

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 27, 2018 at 11:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> [...]
> Yes, I also think so. We do acquire the relation extension lock during
> index vacuum, but as part of (a) we are talking about one worker per
> index, so there shouldn't be a problem with respect to deadlocks.
>
> > I'll change the patch and submit it to the next CF.
> >
> Okay.

Attached are the updated patches. I scaled back the scope of this
patch. The patch now includes only feature (a); that is, it executes
both index vacuum and index cleanup in parallel. It also doesn't
include autovacuum support for now.

The PARALLEL option works almost the same as in the previous patch. In
the VACUUM command, we can specify the 'PARALLEL n' option, where n is
the number of parallel workers to request. If n is omitted, the number
of parallel workers will be (# of indexes - 1). We can also specify
the parallel degree with the parallel_workers reloption. The number of
parallel workers is capped by Min(# of indexes - 1,
max_parallel_maintenance_workers). That is, parallel vacuum can be
executed for a table if it has more than one index.

The details of the internal design are written in the comment at the
top of vacuumlazy.c. In parallel vacuum mode, we allocate DSM at the
beginning of lazy vacuum, which stores shared information as well as
the dead tuples. When starting either index vacuum or index cleanup,
we launch parallel workers. The parallel workers perform either index
vacuum or index cleanup on the indexes and exit after all indexes have
been processed. The leader process then re-initializes the DSM and
re-launches the workers the next time, rather than destroying the
parallel context. After lazy vacuum is done, the leader process exits
parallel mode and updates the index statistics, since no writes are
allowed while in parallel mode.

I've also attached a 0002 patch that adds support for parallel lazy
vacuum to the vacuumdb command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
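To make that flow concrete, a minimal sketch of the leader side using the existing parallel-context machinery could look like the following. The TOC key, the shared-struct contents, and the worker entry point name are illustrative, and the exact signatures of the parallel-context functions differ slightly between releases.

#include "postgres.h"

#include "access/parallel.h"
#include "storage/shm_toc.h"

#define KEY_LV_SHARED        1       /* illustrative TOC key */

static void
lazy_parallel_vacuum_indexes_sketch(int nworkers, Size shared_size)
{
    ParallelContext *pcxt;
    void       *lvshared;

    EnterParallelMode();

    /* "lazy_parallel_vacuum_main" is a hypothetical worker entry point */
    pcxt = CreateParallelContext("postgres", "lazy_parallel_vacuum_main",
                                 nworkers);

    shm_toc_estimate_chunk(&pcxt->estimator, shared_size);
    shm_toc_estimate_keys(&pcxt->estimator, 1);
    InitializeParallelDSM(pcxt);

    /*
     * If pcxt->nworkers is 0 here (the DSM could not be set up), a real
     * implementation would fall back to serial index vacuum instead.
     */
    lvshared = shm_toc_allocate(pcxt->toc, shared_size);
    shm_toc_insert(pcxt->toc, KEY_LV_SHARED, lvshared);
    /* ... fill lvshared with index descriptors, dead tuples, etc. ... */

    /* Phase 1: index vacuum (bulk delete) */
    LaunchParallelWorkers(pcxt);
    /* the leader also processes indexes here, then waits */
    WaitForParallelWorkersToFinish(pcxt);

    /* Phase 2: index cleanup, reusing the same DSM segment */
    ReinitializeParallelDSM(pcxt);
    LaunchParallelWorkers(pcxt);
    WaitForParallelWorkersToFinish(pcxt);

    DestroyParallelContext(pcxt);
    ExitParallelMode();

    /* index statistics are updated here, after leaving parallel mode */
}

Reusing the same DSM segment between the two launches is what lets the bulk-delete results collected in the first phase feed into the cleanup pass.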
On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Attached are the updated patches. I scaled back the scope of this
> patch. The patch now includes only feature (a); that is, it executes
> both index vacuum and index cleanup in parallel. It also doesn't
> include autovacuum support for now.
>
> The PARALLEL option works almost the same as in the previous patch. In
> the VACUUM command, we can specify the 'PARALLEL n' option, where n is
> the number of parallel workers to request. If n is omitted, the number
> of parallel workers will be (# of indexes - 1).
>

I think for now this is okay, but I guess in the future, when we make
heap scans parallel as well or maybe allow more than one worker per
index vacuum, this won't hold good. So, I am not sure if the below
text in the docs is the most appropriate.

+ <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
+ <listitem>
+ <para>
+ Execute index vacuum and cleanup index in parallel with
+ <replaceable class="parameter">N</replaceable> background workers. If the parallel
+ degree <replaceable class="parameter">N</replaceable> is omitted,
+ <command>VACUUM</command> requests the number of indexes - 1 processes, which is the
+ maximum number of parallel vacuum workers since individual indexes is processed by
+ one process. The actual number of parallel vacuum workers may be less due to the
+ setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
+ This option can not use with <literal>FULL</literal> option.

It might be better to use some generic language in the docs, something
like "If the parallel degree N is omitted, then vacuum decides the
number of workers based on number of indexes on the relation which is
further limited by max-parallel-workers-maintenance". I think you also
need to mention in some way that you consider the storage option
parallel_workers.

A few assorted comments:

1.
+lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
{
..
+
+ LaunchParallelWorkers(lvstate->pcxt);
+
+ /*
+ * if no workers launched, we vacuum all indexes by the leader process
+ * alone. Since there is hope that we can launch workers in the next
+ * execution time we don't want to end the parallel mode yet.
+ */
+ if (lvstate->pcxt->nworkers_launched == 0)
+ return;

It is quite possible that the workers are not launched because we fail
to allocate memory, basically when pcxt->nworkers is zero. I think in
such cases there is no use in being in parallel mode. You can even
detect that before calling lazy_begin_parallel_vacuum_index.

2.
static void
+lazy_vacuum_all_indexes_for_leader(LVState *lvstate, IndexBulkDeleteResult **stats,
+ LVTidMap *dead_tuples, bool do_parallel,
+ bool for_cleanup)
{
..
+ if (do_parallel)
+ lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
+
+ for (;;)
+ {
+ IndexBulkDeleteResult *r = NULL;
+
+ /*
+ * Get the next index number to vacuum and set index statistics. In parallel
+ * lazy vacuum, index bulk-deletion results are stored in the shared memory
+ * segment. If it's already updated we use it rather than setting it to NULL.
+ * In single vacuum, we can always use an element of the 'stats'.
+ */
+ if (do_parallel)
+ {
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ if (lvshared->indstats[idx].updated)
+ r = &(lvshared->indstats[idx].stats);
+ }

It is quite possible that we are not able to launch any workers in
lazy_begin_parallel_vacuum_index; in such cases, we should not use the
parallel mode path. Basically, as written, we can't rely on the
'do_parallel' flag.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
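For reference, the claim-an-index pattern being reviewed here, taken in isolation, might look roughly like this; LVShared, nprocessed, and the indstats fields follow the quoted hunks, while the rest is an illustrative reconstruction rather than the patch's code.

#include "postgres.h"

#include "access/genam.h"
#include "port/atomics.h"
#include "utils/rel.h"

/* Shapes inferred from the quoted hunks; details are illustrative. */
typedef struct LVSharedIndStats
{
    bool        updated;                /* is 'stats' valid? */
    IndexBulkDeleteResult stats;
} LVSharedIndStats;

typedef struct LVShared
{
    int                 nindexes;
    pg_atomic_uint32    nprocessed;     /* next index to hand out */
    LVSharedIndStats    indstats[FLEXIBLE_ARRAY_MEMBER];
} LVShared;

/*
 * Leader and workers all run this loop; the atomic fetch-add guarantees
 * each index is handed to exactly one process.
 */
static void
claim_and_vacuum_indexes(LVShared *lvshared, Relation *Irel,
                         IndexBulkDeleteCallback callback, void *cbstate)
{
    for (;;)
    {
        uint32      idx;
        IndexBulkDeleteResult *r;
        IndexVacuumInfo ivinfo;

        idx = pg_atomic_fetch_add_u32(&lvshared->nprocessed, 1);
        if (idx >= (uint32) lvshared->nindexes)
            break;                      /* all indexes are taken */

        /* reuse a previously published result, if any */
        r = lvshared->indstats[idx].updated
            ? &lvshared->indstats[idx].stats
            : NULL;

        memset(&ivinfo, 0, sizeof(ivinfo));
        ivinfo.index = Irel[idx];
        /* ... set num_heap_tuples, strategy, message_level, etc. ... */

        r = index_bulk_delete(&ivinfo, r, callback, cbstate);

        /* publish the result so a later cleanup pass can reuse it */
        if (r)
        {
            lvshared->indstats[idx].stats = *r;
            lvshared->indstats[idx].updated = true;
        }
    }
}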
On Thu, Dec 20, 2018 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> [...]
> It might be better to use some generic language in the docs, something
> like "If the parallel degree N is omitted, then vacuum decides the
> number of workers based on number of indexes on the relation which is
> further limited by max-parallel-workers-maintenance".

Thank you for the review.

I agree with your concern and with the text you suggested.

> I think you also need to mention in some way that you consider the
> storage option parallel_workers.

Added.

> 1.
> +lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
> [...]
> It is quite possible that the workers are not launched because we fail
> to allocate memory, basically when pcxt->nworkers is zero. I think in
> such cases there is no use in being in parallel mode. You can even
> detect that before calling lazy_begin_parallel_vacuum_index.

Agreed. We can stop the preparation and exit parallel mode if
pcxt->nworkers is 0 after InitializeParallelDSM().

> 2.
> static void
> +lazy_vacuum_all_indexes_for_leader(LVState *lvstate, IndexBulkDeleteResult **stats,
> [...]
> It is quite possible that we are not able to launch any workers in
> lazy_begin_parallel_vacuum_index; in such cases, we should not use the
> parallel mode path. Basically, as written, we can't rely on the
> 'do_parallel' flag.

Fixed.

Attached is a new version of the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
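The worker-count policy discussed in the doc text above (an explicit PARALLEL n, otherwise the parallel_workers storage option, otherwise one worker per index minus the leader, always capped by max_parallel_maintenance_workers) could be sketched as follows; the helper name is hypothetical.

#include "postgres.h"

#include "utils/rel.h"

/* GUC, declared in the server headers */
extern int  max_parallel_maintenance_workers;

static int
choose_parallel_vacuum_workers(Relation rel, int nindexes, int nrequested)
{
    StdRdOptions *opts = (StdRdOptions *) rel->rd_options;
    int         nworkers;

    if (nrequested > 0)
        nworkers = nrequested;                  /* PARALLEL n was given */
    else if (opts != NULL && opts->parallel_workers > 0)
        nworkers = opts->parallel_workers;      /* reloption */
    else
        nworkers = nindexes - 1;                /* one per index, minus leader */

    nworkers = Min(nworkers, nindexes - 1);
    nworkers = Min(nworkers, max_parallel_maintenance_workers);

    return Max(nworkers, 0);
}

Under this policy a table with a single index never runs in parallel, which matches the "more than one index" condition stated earlier in the thread.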
On Fri, Dec 28, 2018 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> [...]
> Fixed.
>
> Attached is a new version of the patch.

Rebased.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Rebased.
I started reviewing the patch; I haven't finished my review yet.
Following are some of the comments.
+ <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
+ <listitem>
+ <para>
+ Execute index vacuum and cleanup index in parallel with
I doubt that users can understand the terms "index vacuum" and "cleanup index".
Maybe it needs some more detailed information.
- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
+} VacuumOptionFlag;
Any specific reason for not adding it as the last member of the enum?
-typedef enum VacuumOption
+typedef enum VacuumOptionFlag
{
I don't find the new name quite right; how about VacuumFlags?
+typedef struct VacuumOption
+{
How about VacuumOptions? This structure can contain all the
options provided to the vacuum operation.
+ vacopt1->flags |= vacopt2->flags;
+ if (vacopt2->flags == VACOPT_PARALLEL)
+ vacopt1->nworkers = vacopt2->nworkers;
+ pfree(vacopt2);
+ $$ = vacopt1;
+ }
As the above statement indicates, the last specified number of parallel
workers is the one taken into account; can we explain that in the docs?
postgres=# vacuum (parallel 2, verbose) tbl;
With verbose, no information related to parallel workers is available.
I feel that information should be provided even when it is not a
parallel vacuum.
Regards,
Haribabu Kommi
Fujitsu Australia
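For reference, the shape being proposed in this review might end up roughly like the following sketch (names follow the suggestions above; the actual patch may differ):

typedef enum VacuumFlag
{
    VACOPT_VACUUM = 1 << 0,
    /* ... other existing flags elided ... */
    VACOPT_PARALLEL = 1 << 7,                 /* do lazy VACUUM in parallel */
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8     /* don't skip any pages */
} VacuumFlag;

typedef struct VacuumOptions
{
    int     flags;          /* OR of VacuumFlag values */
    int     nworkers;       /* parallel workers requested with PARALLEL n;
                             * 0 lets VACUUM choose based on the indexes */
} VacuumOptions;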
On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> I started reviewing the patch; I haven't finished my review yet.
> Following are some of the comments.

Thank you for reviewing the patch.

> + <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
> + <listitem>
> + <para>
> + Execute index vacuum and cleanup index in parallel with
>
> I doubt that users can understand the terms "index vacuum" and
> "cleanup index". Maybe it needs some more detailed information.

Agreed. Table 27.22 "Vacuum phases" has a good description of the
vacuum phases, so maybe adding a reference to it would work.

> - VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
> + VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
> + VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
> +} VacuumOptionFlag;
>
> Any specific reason for not adding it as the last member of the enum?

My mistake; fixed it.

> -typedef enum VacuumOption
> +typedef enum VacuumOptionFlag
> {
>
> I don't find the new name quite right; how about VacuumFlags?

Agreed with removing "Option" from the name, but I think VacuumFlag
would be better because this enum represents only one flag. Thoughts?

> +typedef struct VacuumOption
> +{
>
> How about VacuumOptions? This structure can contain all the
> options provided to the vacuum operation.

Agreed.

> + vacopt1->flags |= vacopt2->flags;
> + if (vacopt2->flags == VACOPT_PARALLEL)
> + vacopt1->nworkers = vacopt2->nworkers;
> + pfree(vacopt2);
> + $$ = vacopt1;
> + }
>
> As the above statement indicates, the last specified number of parallel
> workers is the one taken into account; can we explain that in the docs?

Agreed.

> postgres=# vacuum (parallel 2, verbose) tbl;
>
> With verbose, no information related to parallel workers is available.
> I feel that information should be provided even when it is not a
> parallel vacuum.

Agreed. How about the following verbose output? I've added the number
of launched, planned and requested vacuum workers and the purpose
(vacuum or cleanup).

postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table 'test' has 3 indexes
INFO: vacuuming "public.test"
INFO: launched 2 parallel vacuum workers for index vacuum (planned: 2, requested: 30)
INFO: scanned index "test_idx1" to remove 2000 row versions
DETAIL: CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
INFO: scanned index "test_idx2" to remove 2000 row versions by parallel vacuum worker
DETAIL: CPU: user: 0.07 s, system: 0.05 s, elapsed: 0.12 s
INFO: scanned index "test_idx3" to remove 2000 row versions by parallel vacuum worker
DETAIL: CPU: user: 0.09 s, system: 0.05 s, elapsed: 0.14 s
INFO: "test": removed 2000 row versions in 10 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: launched 2 parallel vacuum workers for index cleanup (planned: 2, requested: 30)
INFO: index "test_idx1" now contains 991151 row versions in 2745 pages
DETAIL: 2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "test_idx2" now contains 991151 row versions in 2745 pages
DETAIL: 2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "test_idx3" now contains 991151 row versions in 2745 pages
DETAIL: 2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "test": found 2000 removable, 367 nonremovable row versions in 41 out of 4425 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 500
There were 6849 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.12 s, system: 0.01 s, elapsed: 0.17 s.
VACUUM

Since the previous patch conflicts with 285d8e12, I've attached the
latest version of the patch, which incorporates the review comments I
got.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
On Fri, Jan 18, 2019 at 11:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> [...]
> Agreed. Table 27.22 "Vacuum phases" has a good description of the
> vacuum phases, so maybe adding a reference to it would work.

OK.

> Agreed with removing "Option" from the name, but I think VacuumFlag
> would be better because this enum represents only one flag. Thoughts?

OK.

> Agreed. How about the following verbose output? I've added the number
> of launched, planned and requested vacuum workers and the purpose
> (vacuum or cleanup).
> [...]

The verbose output is good.

> Since the previous patch conflicts with 285d8e12, I've attached the
> latest version of the patch, which incorporates the review comments I
> got.
Thanks for the latest patch. I have some more minor comments.
+ Execute index vacuum and cleanup index in parallel with
Better to use "vacuum index" and "cleanup index"? This is the same as
the description of the vacuum phases. It is better to follow the same
notation throughout the patch.
+ dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);
With this change, lazy_space_alloc also takes care of initializing the
parallel vacuum; can we mention that in the comments?
+ initprog_val[2] = dead_tuples->max_dead_tuples;
The dead_tuples variable may need a rename for better readability?
+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);
+ else
+ copy_result = true;
I don't see a need for the copy_result variable; how about directly using
the updated flag to decide whether to copy or not? Once the result is
copied, update the flag.
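Roughly what I have in mind is the following (just a sketch against the
quoted code, with the shared struct fields assumed from the patch):

    IndexBulkDeleteResult *result = NULL;

    if (lvshared->indstats[idx].updated)
        result = &(lvshared->indstats[idx].stats);

    /* ... call ambulkdelete / amvacuumcleanup with 'result' ... */

    if (!lvshared->indstats[idx].updated && result != NULL)
    {
        /* first call returned a locally palloc'd struct; copy it into the DSM slot */
        memcpy(&(lvshared->indstats[idx].stats), result,
               sizeof(IndexBulkDeleteResult));
        lvshared->indstats[idx].updated = true;
    }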
+use Test::More tests => 34;
I don't find any new tests added in this patch.
I am also wondering about the performance penalty if we use the parallel
option of vacuum on a small table.
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Jan 22, 2019 at 9:59 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > > Thanks for the latest patch. I have some more minor comments. Thank you for reviewing the patch. > > + Execute index vacuum and cleanup index in parallel with > > Better to use vacuum index and cleanup index? This is in same with > the description of vacuum phases. It is better to follow same notation > in the patch. Agreed. I've changed it to "Vacuum index and cleanup index in parallel with ...". > > > + dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers); > > With the change, the lazy_space_alloc takes care of initializing the > parallel vacuum, can we write something related to that in the comments. > Agreed. > > + initprog_val[2] = dead_tuples->max_dead_tuples; > > dead_tuples variable may need rename for better reading? > I might not get your comment correctly but I've tried to fix it. Please review it. > > > + if (lvshared->indstats[idx].updated) > + result = &(lvshared->indstats[idx].stats); > + else > + copy_result = true; > > > I don't see a need for copy_result variable, how about directly using > the updated flag to decide whether to copy or not? Once the result is > copied update the flag. > You're right. Fixed. > > +use Test::More tests => 34; > > I don't find any new tetst are added in this patch. Fixed. > > I am thinking of performance penalty if we use the parallel option of > vacuum on a small sized table? Hm, unlike other parallel operations other than ParallelAppend the parallel vacuum executes multiple index vacuum simultaneously. Therefore this can avoid contension. I think there is a performance penalty but it would not be big. Attached the latest patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Attached the latest patches.
Thanks for the updated patches.
Some more code review comments.
+ started by a single utility command. Currently, the parallel
+ utility commands that support the use of parallel workers are
+ <command>CREATE INDEX</command> and <command>VACUUM</command>
+ without <literal>FULL</literal> option, and only when building
+ a B-tree index. Parallel workers are taken from the pool of
I feel the above sentence may not give the proper picture; how about
the following modification?
<command>CREATE INDEX</command> only when building a B-tree index
and <command>VACUUM</command> without <literal>FULL</literal> option.
+ * parallel vacuum, we perform both index vacuum and index cleanup in parallel.
+ * Individual indexes is processed by one vacuum process. At beginning of
How about "vacuum index" and "cleanup index", similar to other places?
+ * memory space for dead tuples. When starting either index vacuum or cleanup
+ * vacuum, we launch parallel worker processes. Once all indexes are processed
same here as well?
+ * Before starting parallel index vacuum and parallel cleanup index we launch
+ * parallel workers. All parallel workers will exit after processed all indexes
parallel vacuum index and parallel cleanup index?
+ /*
+ * If there is already-updated result in the shared memory we
+ * use it. Otherwise we pass NULL to index AMs and copy the
+ * result to the shared memory segment.
+ */
+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);
I didn't really find a need for the flag to differentiate the stats pointer
between the first run and the second run. I don't see any problem in passing
the stats directly, and the same stats are updated on the worker side and the
leader side; anyway, no two processes will do the index vacuum at the same
time. Am I missing something?
Even if this flag is to identify whether the stats have been updated before
writing them, I don't see a need for it compared to normal vacuum.
+ * Enter the parallel mode, allocate and initialize a DSM segment. Return
+ * the memory space for storing dead tuples or NULL if no workers are prepared.
+ */
+ pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
+ request, true);
But we are passing the serializable_okay flag as true, which means it doesn't
return NULL. Is that expected?
+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+ "launched %d parallel vacuum workers %s (planned: %d",
+ lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
What is the difference between planned workers and requested workers?
Aren't both the same?
- COMPARE_SCALAR_FIELD(options);
- COMPARE_NODE_FIELD(rels);
+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;
The options comparison is changed from COMPARE_SCALAR_FIELD to this check, but
why is the rels check removed? Since options is changed from an int to a
structure, using SCALAR may not work in other functions like _copyVacuumStmt etc.?
+typedef struct VacuumOptions
+{
+ VacuumFlag flags; /* OR of VacuumFlag */
+ int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
Do we need to add a NodeTag to the above structure, given that it is
part of the VacuumStmt structure?
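For example, something like the following sketch (the T_VacuumOptions tag
is hypothetical, only to illustrate the question):

typedef struct VacuumOptions
{
    NodeTag     type;       /* would need a new T_VacuumOptions tag */
    VacuumFlag  flags;      /* OR of VacuumFlag */
    int         nworkers;   /* # of parallel vacuum workers */
} VacuumOptions;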
+ <application>vacuumdb</application> will require background workers,
+ so make sure your <xref linkend="guc-max-parallel-workers-maintenance"/>
+ setting is more than one.
How about removing "vacuumdb" and changing it to "This option will ..."?
I will continue the testing of this patch and share the details.
Regards,
Haribabu Kommi
Fujitsu Australia
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > > On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> Attached the latest patches. > > > Thanks for the updated patches. > Some more code review comments. > Thank you! > + started by a single utility command. Currently, the parallel > + utility commands that support the use of parallel workers are > + <command>CREATE INDEX</command> and <command>VACUUM</command> > + without <literal>FULL</literal> option, and only when building > + a B-tree index. Parallel workers are taken from the pool of > > > I feel the above sentence may not give the proper picture, how about the > adding following modification? > > <command>CREATE INDEX</command> only when building a B-tree index > and <command>VACUUM</command> without <literal>FULL</literal> option. > > Agreed. > > + * parallel vacuum, we perform both index vacuum and index cleanup in parallel. > + * Individual indexes is processed by one vacuum process. At beginning of > > How about vacuum index and cleanup index similar like other places? > > > + * memory space for dead tuples. When starting either index vacuum or cleanup > + * vacuum, we launch parallel worker processes. Once all indexes are processed > > same here as well? > > > + * Before starting parallel index vacuum and parallel cleanup index we launch > + * parallel workers. All parallel workers will exit after processed all indexes > > parallel vacuum index and parallel cleanup index? > > ISTM we're using like "index vacuuming", "index cleanup" and "FSM vacuming" in vacuumlazy.c so maybe "parallel index vacuuming" and "parallel index cleanup" would be better? > + /* > + * If there is already-updated result in the shared memory we > + * use it. Otherwise we pass NULL to index AMs and copy the > + * result to the shared memory segment. > + */ > + if (lvshared->indstats[idx].updated) > + result = &(lvshared->indstats[idx].stats); > > I didn't really find a need of the flag to differentiate the stats pointer from > first run to second run? I don't see any problem in passing directing the stats > and the same stats are updated in the worker side and leader side. Anyway no two > processes will do the index vacuum at same time. Am I missing something? > > Even if this flag is to identify whether the stats are updated or not before > writing them, I don't see a need of it compared to normal vacuum. > The passing stats = NULL to amvacuumcleanup and ambulkdelete means the first time execution. For example, btvacuumcleanup skips cleanup if it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or amvacuumcleanup when the first time calling. And they store the result stats to the memory allocated int the local memory. Therefore in the parallel vacuum I think that both worker and leader need to move it to the shared memory and mark it as updated as different worker could vacuum different indexes at the next time. > > + * Enter the parallel mode, allocate and initialize a DSM segment. Return > + * the memory space for storing dead tuples or NULL if no workers are prepared. > + */ > > + pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main", > + request, true); > > But we are passing as serializable_okay flag as true, means it doesn't return > NULL. Is it expected? > > I think you're right. Since the request never be 0 and serializable_okey is true it should not return NULL. Will fix. 
> + initStringInfo(&buf); > + appendStringInfo(&buf, > + ngettext("launched %d parallel vacuum worker %s (planned: %d", > + "launched %d parallel vacuum workers %s (planned: %d", > + lvstate->pcxt->nworkers_launched), > + lvstate->pcxt->nworkers_launched, > + for_cleanup ? "for index cleanup" : "for index vacuum", > + lvstate->pcxt->nworkers); > + if (lvstate->options.nworkers > 0) > + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers); > > what is the difference between planned workers and requested workers, aren't both > are same? The request is the parallel degree that is specified explicitly by user whereas the planned is the actual number we planned based on the number of indexes the table has. For example, if we do like 'VACUUM (PARALLEL 3000) tbl' where the tbl has 4 indexes, the request is 3000 and the planned is 4. Also if max_parallel_maintenance_workers is 2 the planned is 2. > > > - COMPARE_SCALAR_FIELD(options); > - COMPARE_NODE_FIELD(rels); > + if (a->options.flags != b->options.flags) > + return false; > + if (a->options.nworkers != b->options.nworkers) > + return false; > > Options is changed from SCALAR to check, but why the rels check is removed? > The options is changed from int to a structure so using SCALAR may not work > in other function like _copyVacuumStmt and etc? Agreed and will fix. > > +typedef struct VacuumOptions > +{ > + VacuumFlag flags; /* OR of VacuumFlag */ > + int nworkers; /* # of parallel vacuum workers */ > +} VacuumOptions; > > > Do we need to add NodeTag for the above structure? Because this structure is > part of VacuumStmt structure. Yes, I will add it. > > > + <application>vacuumdb</application> will require background workers, > + so make sure your <xref linkend="guc-max-parallel-workers-maintenance"/> > + setting is more than one. > > removing vacuumdb and changing it as "This option will ..."? > Agreed. > I will continue the testing of this patch and share the details. > Thank you. I'll submit the updated patch set. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> Thank you. I'll submit the updated patch set.
>

I don't see any chance of getting this committed in the next few days,
so, moved to next CF. Thanks for working on this and I hope you will
continue work on this project.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Feb 2, 2019 at 4:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> >
> > Thank you. I'll submit the updated patch set.
> >
>
> I don't see any chance of getting this committed in the next few days,
> so, moved to next CF. Thanks for working on this and I hope you will
> continue work on this project.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Jan 31, 2019 at 10:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you. I'll submit the updated patch set.
>

Attached the latest patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
>
>
> + * Before starting parallel index vacuum and parallel cleanup index we launch
> + * parallel workers. All parallel workers will exit after processed all indexes
>
> parallel vacuum index and parallel cleanup index?
>
>
ISTM we're using terms like "index vacuuming", "index cleanup" and "FSM
vacuuming" in vacuumlazy.c, so maybe "parallel index vacuuming" and
"parallel index cleanup" would be better?
OK.
> + /*
> + * If there is already-updated result in the shared memory we
> + * use it. Otherwise we pass NULL to index AMs and copy the
> + * result to the shared memory segment.
> + */
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
>
> I didn't really find a need of the flag to differentiate the stats pointer from
> first run to second run? I don't see any problem in passing directing the stats
> and the same stats are updated in the worker side and leader side. Anyway no two
> processes will do the index vacuum at same time. Am I missing something?
>
> Even if this flag is to identify whether the stats are updated or not before
> writing them, I don't see a need of it compared to normal vacuum.
>
Passing stats = NULL to amvacuumcleanup and ambulkdelete means it is the
first-time execution. For example, btvacuumcleanup skips cleanup if it's
not NULL. In the normal vacuum we pass NULL to ambulkdelete or
amvacuumcleanup on the first call, and they store the result stats in
memory allocated in local memory. Therefore, in the parallel vacuum I
think that both worker and leader need to move it to the shared memory
and mark it as updated, as a different worker could vacuum different
indexes the next time.
OK, understood the point. But btbulkdelete allocates the memory itself
whenever the stats are NULL, so I don't see a problem with it.
The only problem is with btvacuumcleanup: when there are no dead tuples
present in the table, btbulkdelete is not called and btvacuumcleanup is
called directly at the end of vacuum, and in that scenario the code flow
differs based on the stats. So why can't we use the dead tuples number to
differentiate instead of adding another flag? Also, this scenario is not
very frequent, so avoiding the memcpy for normal operations would be
better. It may be a small gain, I just thought of it.
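For reference, the per-index convention we are discussing is roughly the
following (a sketch based on the description above, not the patch's actual
code; the ivinfo, callback and state arguments are only illustrative):

    IndexBulkDeleteResult *stats = NULL;

    /* zero or more bulk-delete passes; stats is NULL only on the very first call */
    stats = index_bulk_delete(&ivinfo, stats, lazy_tid_reaped, (void *) dead_tuples);

    /* one cleanup pass at the end; e.g. btvacuumcleanup skips its scan if stats is non-NULL */
    stats = index_vacuum_cleanup(&ivinfo, stats);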
> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> + "launched %d parallel vacuum workers %s (planned: %d",
> + lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
>
> what is the difference between planned workers and requested workers, aren't both
> are same?
The request is the parallel degree that is specified explicitly by
user whereas the planned is the actual number we planned based on the
number of indexes the table has. For example, if we do like 'VACUUM
(PARALLEL 3000) tbl' where the tbl has 4 indexes, the request is 3000
and the planned is 4. Also if max_parallel_maintenance_workers is 2
the planned is 2.
OK.
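To spell that out, the planned number of workers could be computed along
these lines (a simplified sketch of the logic described above, not
necessarily the patch's exact code):

static int
compute_parallel_workers(Relation rel, int nrequested, int nindexes)
{
    int     parallel_workers = nindexes;    /* at most one worker per index */

    /* rel is unused in this simplified sketch */
    if (nrequested > 0)
        parallel_workers = Min(parallel_workers, nrequested);

    /* never exceed max_parallel_maintenance_workers */
    return Min(parallel_workers, max_parallel_maintenance_workers);
}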
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> first time execution. For example, btvacuumcleanup skips cleanup if
>> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup when the first time calling. And they store the result
>> stats to the memory allocated int the local memory. Therefore in the
>> parallel vacuum I think that both worker and leader need to move it to
>> the shared memory and mark it as updated as different worker could
>> vacuum different indexes at the next time.
>
>
> OK, understood the point. But for btbulkdelete whenever the stats are NULL,
> it allocates the memory. So I don't see a problem with it.
>
> The only problem is with btvacuumcleanup, when there are no dead tuples
> present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
> is called at the end of vacuum, in that scenario, there is code flow difference
> based on the stats. so why can't we use the deadtuples number to differentiate
> instead of adding another flag?

I don't understand your suggestion. What do we compare deadtuples
number to? Could you elaborate on that please?

> And also this scenario is not very often, so avoiding
> memcpy for normal operations would be better. It may be a small gain, just
> thought of it.
>

This scenario could happen periodically on an insert-only table. The
additional memcpy is executed once per index in a vacuum, but I agree
that avoiding the memcpy would be good.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> first time execution. For example, btvacuumcleanup skips cleanup if
>> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup when the first time calling. And they store the result
>> stats to the memory allocated int the local memory. Therefore in the
>> parallel vacuum I think that both worker and leader need to move it to
>> the shared memory and mark it as updated as different worker could
>> vacuum different indexes at the next time.
>
>
> OK, understood the point. But for btbulkdelete whenever the stats are NULL,
> it allocates the memory. So I don't see a problem with it.
>
> The only problem is with btvacuumcleanup, when there are no dead tuples
> present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
> is called at the end of vacuum, in that scenario, there is code flow difference
> based on the stats. so why can't we use the deadtuples number to differentiate
> instead of adding another flag?
I don't understand your suggestion. What do we compare deadtuples
number to? Could you elaborate on that please?
The scenario where we should pass NULL stats to the btvacuumcleanup function
is when there are no dead tuples. I just think that we may use the deadtuples
structure to find out whether the stats should be NULL or not, while avoiding
the extra memcpy.
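In other words, something like the following rough sketch (the dead_tuples
and indstats fields are assumed from the patch):

    /*
     * Sketch: ambulkdelete is called only when dead tuples were collected,
     * so the dead-tuple count can tell us whether this cleanup call is the
     * first call for the index, without a separate 'updated' flag.
     */
    if (dead_tuples->num_tuples == 0)
        stats = index_vacuum_cleanup(&ivinfo, NULL);
    else
        stats = index_vacuum_cleanup(&ivinfo, &(lvshared->indstats[idx].stats));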
> And also this scenario is not very often, so avoiding
> memcpy for normal operations would be better. It may be a small gain, just
> thought of it.
>
This scenario could happen periodically on an insert-only table. The
additional memcpy is executed once per index in a vacuum, but I agree
that avoiding the memcpy would be good.
Yes, understood. If possible, removing the need for the memcpy would be good.
The latest patch doesn't apply anymore. Needs a rebase.
Regards,
Haribabu Kommi
Fujitsu Australia
On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > > On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: >> > >> > >> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> >> >> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the >> >> first time execution. For example, btvacuumcleanup skips cleanup if >> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or >> >> amvacuumcleanup when the first time calling. And they store the result >> >> stats to the memory allocated int the local memory. Therefore in the >> >> parallel vacuum I think that both worker and leader need to move it to >> >> the shared memory and mark it as updated as different worker could >> >> vacuum different indexes at the next time. >> > >> > >> > OK, understood the point. But for btbulkdelete whenever the stats are NULL, >> > it allocates the memory. So I don't see a problem with it. >> > >> > The only problem is with btvacuumcleanup, when there are no dead tuples >> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup >> > is called at the end of vacuum, in that scenario, there is code flow difference >> > based on the stats. so why can't we use the deadtuples number to differentiate >> > instead of adding another flag? >> >> I don't understand your suggestion. What do we compare deadtuples >> number to? Could you elaborate on that please? > > > The scenario where the stats should pass NULL to btvacuumcleanup function is > when there no dead tuples, I just think that we may use that deadtuples structure > to find out whether stats should pass NULL or not while avoiding the extra > memcpy. > Thank you for your explanation. I understood. Maybe I'm worrying too much but I'm concernced compatibility; currently we handle indexes individually. So if there is an index access method whose ambulkdelete returns NULL at the first call but returns a palloc'd struct at the second time or other, that doesn't work fine. The documentation says that passed-in 'stats' is NULL at the first time call of ambulkdelete but doesn't say about the second time or more. Index access methods may expect that the passed-in 'stats' is the same as what they has returned last time. So I think to add an extra flag for keeping comptibility. >> >> > And also this scenario is not very often, so avoiding >> > memcpy for normal operations would be better. It may be a small gain, just >> > thought of it. >> > >> >> This scenario could happen periodically on an insert-only table. >> Additional memcpy is executed once per indexes in a vacuuming but I >> agree that the avoiding memcpy would be good. > > > Yes, understood. If possible removing the need of memcpy would be good. > The latest patch doesn't apply anymore. Needs a rebase. > Thank you. Attached the rebased patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
>
> On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> >
>> >
>> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>
>> >>
>> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> amvacuumcleanup when the first time calling. And they store the result
>> >> stats to the memory allocated int the local memory. Therefore in the
>> >> parallel vacuum I think that both worker and leader need to move it to
>> >> the shared memory and mark it as updated as different worker could
>> >> vacuum different indexes at the next time.
>> >
>> >
>> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> > it allocates the memory. So I don't see a problem with it.
>> >
>> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> > is called at the end of vacuum, in that scenario, there is code flow difference
>> > based on the stats. so why can't we use the deadtuples number to differentiate
>> > instead of adding another flag?
>>
>> I don't understand your suggestion. What do we compare deadtuples
>> number to? Could you elaborate on that please?
>
>
> The scenario where the stats should pass NULL to btvacuumcleanup function is
> when there no dead tuples, I just think that we may use that deadtuples structure
> to find out whether stats should pass NULL or not while avoiding the extra
> memcpy.
>
Thank you for your explanation. I understood. Maybe I'm worrying too
much, but I'm concerned about compatibility; currently we handle indexes
individually. So if there is an index access method whose ambulkdelete
returns NULL at the first call but returns a palloc'd struct at the
second call or later, that doesn't work fine.
The documentation says that the passed-in 'stats' is NULL on the first
call of ambulkdelete but doesn't say anything about the second call or
later. Index access methods may expect that the passed-in 'stats' is
the same as what they returned last time. So I think we should add an
extra flag to keep compatibility.
I checked some of the ambulkdelete functions, and they do not return
NULL whenever they are called, although the palloc'd structure doesn't
get filled with the details.
IMO, there is no need for any extra code in parallel vacuum
compared to normal vacuum.
Regards,
Haribabu Kommi
Fujitsu Australia
On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you. Attached the rebased patch.
I ran some performance tests to compare the parallelism benefits,
but I got some strange results showing performance overhead; maybe that
is because I tested it on my laptop.
FYI,
Table schema:
create table tbl(f1 int, f2 char(100), f3 float4, f4 char(100), f5 float8, f6 char(100), f7 bigint);
Tbl with 3 indexes
1000 record deletion
master - 22ms
patch - 25ms with 0 parallel workers
patch - 43ms with 1 parallel worker
patch - 72ms with 2 parallel workers
10000 record deletion
master - 52ms
patch - 56ms with 0 parallel workers
patch - 79ms with 1 parallel worker
patch - 86ms with 2 parallel workers
100000 record deletion
master - 410ms
patch - 379ms with 0 parallel workers
patch - 330ms with 1 parallel worker
patch - 289ms with 2 parallel workers
Tbl with 5 indexes
1000 record deletion
master - 28ms
patch - 34ms with 0 parallel workers
patch - 86ms with 2 parallel workers
patch - 106ms with 4 parallel workers
10000 record deletion
master - 58ms
patch - 63ms with 0 parallel workers
patch - 101ms with 2 parallel workers
patch - 118ms with 4 parallel workers
100000 record deletion
master - 632ms
patch - 490ms with 0 parallel workers
patch - 455ms with 2 parallel workers
patch - 403ms with 4 parallel workers
Tbl with 7 indexes
1000 record deletion
master - 35ms
patch - 44ms with 0 parallel workers
patch - 93ms with 2 parallel workers
patch - 110ms with 4 parallel workers
patch - 123ms with 6 parallel workers
10000 record deletion
master - 76ms
patch - 78ms with 0 parallel workers
patch - 135ms with 2 parallel workers
patch - 143ms with 4 parallel workers
patch - 139ms with 6 parallel workers
100000 record deletion
master - 641ms
patch - 656ms with 0 parallel workers
patch - 613ms with 2 parallel workers
patch - 735ms with 4 parallel workers
patch - 679ms with 6 parallel workers
Regards,
Haribabu Kommi
Fujitsu Australia
On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>
> On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> Thank you. Attached the rebased patch.
>
>
> I ran some performance tests to compare the parallelism benefits,

Thank you for testing!

> but I got some strange results showing performance overhead; maybe that
> is because I tested it on my laptop.

Hmm, I think the parallel vacuum would help for heavy workloads such as
a big table with multiple indexes. In your test results, all executions
complete within 1 second, which seems to be a case where the parallel
vacuum wouldn't help. I suspect that the table is small, right? Anyway
I'll also do performance tests.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Sat, Feb 23, 2019 at 10:28 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: >> > >> > >> > On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> >> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: >> >> > >> >> > >> >> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> >> >> >> >> >> >> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the >> >> >> first time execution. For example, btvacuumcleanup skips cleanup if >> >> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or >> >> >> amvacuumcleanup when the first time calling. And they store the result >> >> >> stats to the memory allocated int the local memory. Therefore in the >> >> >> parallel vacuum I think that both worker and leader need to move it to >> >> >> the shared memory and mark it as updated as different worker could >> >> >> vacuum different indexes at the next time. >> >> > >> >> > >> >> > OK, understood the point. But for btbulkdelete whenever the stats are NULL, >> >> > it allocates the memory. So I don't see a problem with it. >> >> > >> >> > The only problem is with btvacuumcleanup, when there are no dead tuples >> >> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup >> >> > is called at the end of vacuum, in that scenario, there is code flow difference >> >> > based on the stats. so why can't we use the deadtuples number to differentiate >> >> > instead of adding another flag? >> >> >> >> I don't understand your suggestion. What do we compare deadtuples >> >> number to? Could you elaborate on that please? >> > >> > >> > The scenario where the stats should pass NULL to btvacuumcleanup function is >> > when there no dead tuples, I just think that we may use that deadtuples structure >> > to find out whether stats should pass NULL or not while avoiding the extra >> > memcpy. >> > >> >> Thank you for your explanation. I understood. Maybe I'm worrying too >> much but I'm concernced compatibility; currently we handle indexes >> individually. So if there is an index access method whose ambulkdelete >> returns NULL at the first call but returns a palloc'd struct at the >> second time or other, that doesn't work fine. >> >> The documentation says that passed-in 'stats' is NULL at the first >> time call of ambulkdelete but doesn't say about the second time or >> more. Index access methods may expect that the passed-in 'stats' is >> the same as what they has returned last time. So I think to add an >> extra flag for keeping comptibility. > > > I checked some of the ambulkdelete functions, and they are not returning > a NULL data whenever those functions are called. But the palloc'd structure > doesn't get filled with the details. > > IMO, there is no need of any extra code that is required for parallel vacuum > compared to normal vacuum. > Hmm, I think that this code is necessary to faithfully keep the same index vacuum behavior, especially for communication between lazy vacuum and IAMs, as it is. The IAMs in postgres don't worry about that but other third party AMs might not, and it might be developed in the future. On the other hand, I can understand your concerns; if such IAM is quite rare we might not need to make the code complicated needlessly. 
I'd like to hear more opinions also from other hackers. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you. Attached the rebased patch. Here are some review comments. + started by a single utility command. Currently, the parallel + utility commands that support the use of parallel workers are + <command>CREATE INDEX</command> and <command>VACUUM</command> + without <literal>FULL</literal> option, and only when building + a B-tree index. Parallel workers are taken from the pool of That sentence is garbled. The end part about b-tree indexes applies only to CREATE INDEX, not to VACUUM, since VACUUM does build indexes. + Vacuum index and cleanup index in parallel + <replaceable class="parameter">N</replaceable> background workers (for the detail + of each vacuum phases, please refer to <xref linkend="vacuum-phases"/>. If the I have two problems with this. One is that I can't understand the English very well. I think you mean something like: "Perform the 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using N background workers," but I'm not entirely sure. The other is that if that is what you mean, I don't think it's a sufficient description. Users need to understand whether, for example, only one worker can be used per index, or whether the work for a single index can be split across workers. + parallel degree <replaceable class="parameter">N</replaceable> is omitted, + then <command>VACUUM</command> decides the number of workers based on + number of indexes on the relation which further limited by + <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if this option Now this makes it sound like it's one worker per index, but you could be more explicit about it. + is specified multile times, the last parallel degree + <replaceable class="parameter">N</replaceable> is considered into the account. Typo, but I'd just delete this sentence altogether; the behavior if the option is multiply specified seems like a triviality that need not be documented. + Setting a value for <literal>parallel_workers</literal> via + <xref linkend="sql-altertable"/> also controls how many parallel + worker processes will be requested by a <command>VACUUM</command> + against the table. This setting is overwritten by setting + <replaceable class="parameter">N</replaceable> of <literal>PARALLEL</literal> + option. I wonder if we really want this behavior. Should a setting that controls the degree of parallelism when scanning the table also affect VACUUM? I tend to think that we probably don't ever want VACUUM of a table to be parallel by default, but rather something that the user must explicitly request. Happy to hear other opinions. If we do want this behavior, I think this should be written differently, something like this: The PARALLEL N option to VACUUM takes precedence over this option. + * parallel mode nor destories the parallel context. For updating the index Spelling. +/* DSM keys for parallel lazy vacuum */ +#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001) +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002) +#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003) Any special reason not to use just 1, 2, 3 here? The general infrastructure stuff uses high numbers to avoid conflicting with plan_node_id values, but end clients of the parallel infrastructure can generally just use small integers. + bool updated; /* is the stats updated? */ is -> are + * LVDeadTuples controls the dead tuple TIDs collected during heap scan. 
what do you mean by "controls", exactly? stores? + * This is allocated in a dynamic shared memory segment when parallel + * lazy vacuum mode, or allocated in a local memory. If this is in DSM, then max_tuples is a wart, I think. We can't grow the segment at that point. I'm suspicious that we need a better design here. It looks like you gather all of the dead tuples in backend-local memory and then allocate an equal amount of DSM to copy them. But that means that we are using twice as much memory, which seems pretty bad. You'd have to do that at least momentarily no matter what, but it's not obvious that the backend-local copy is ever freed. There's another patch kicking around to allocate memory for vacuum in chunks rather than preallocating the whole slab of memory at once; we might want to think about getting that committed first and then having this build on top of it. At least we need something smarter than this. -heap_vacuum_rel(Relation onerel, int options, VacuumParams *params, +heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params, We generally avoid passing a struct by value; copying the struct can be expensive and having multiple shallow copies of the same data sometimes leads to surprising results. I think it might be a good idea to propose a preliminary refactoring patch that invents VacuumOptions and gives it just a single 'int' member and refactors everything to use it, and then that can be committed first. It should pass a pointer, though, not the actual struct. + LVState *lvstate; It's not clear to me why we need this new LVState thing. What's the motivation for that? If it's a good idea, could it be done as a separate, preparatory patch? It seems to be responsible for a lot of code churn in this patch. It also leads to strange stuff like this: ereport(elevel, - (errmsg("scanned index \"%s\" to remove %d row versions", + (errmsg("scanned index \"%s\" to remove %d row versions %s", RelationGetRelationName(indrel), - vacrelstats->num_dead_tuples), + dead_tuples->num_tuples, + IsParallelWorker() ? "by parallel vacuum worker" : ""), This doesn't seem to be great grammar, and translation guidelines generally discourage this sort of incremental message construction quite strongly. Since the user can probably infer what happened by a suitable choice of log_line_prefix, I'm not totally sure this is worth doing in the first place, but if we're going to do it, it should probably have two completely separate message strings and pick between them using IsParallelWorker(), rather than building it up incrementally like this. +compute_parallel_workers(Relation rel, int nrequests, int nindexes) I think 'nrequets' is meant to be 'nrequested'. It isn't the number of requests; it's the number of workers that were requested. + /* quick exit if no workers are prepared, e.g. under serializable isolation */ That comment makes very little sense in this context. + /* Report parallel vacuum worker information */ + initStringInfo(&buf); + appendStringInfo(&buf, + ngettext("launched %d parallel vacuum worker %s (planned: %d", + "launched %d parallel vacuum workers %s (planned: %d", + lvstate->pcxt->nworkers_launched), + lvstate->pcxt->nworkers_launched, + for_cleanup ? 
"for index cleanup" : "for index vacuum", + lvstate->pcxt->nworkers); + if (lvstate->options.nworkers > 0) + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers); + + appendStringInfo(&buf, ")"); + ereport(elevel, (errmsg("%s", buf.data))); This is another example of incremental message construction, again violating translation guidelines. + WaitForParallelWorkersToAttach(lvstate->pcxt); Why? + /* + * If there is already-updated result in the shared memory we use it. + * Otherwise we pass NULL to index AMs, meaning it's first time call, + * and copy the result to the shared memory segment. + */ I'm probably missing something here, but isn't the intention that we only do each index once? If so, how would there be anything there already? Once from for_cleanup = false and once for for_cleanup = true? + if (a->options.flags != b->options.flags) + return false; + if (a->options.nworkers != b->options.nworkers) + return false; You could just do COMPARE_SCALAR_FIELD(options.flags); COMPARE_SCALAR_FIELD(options.nworkers); -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 28, 2019 at 2:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Thank you. Attached the rebased patch. > > Here are some review comments. Thank you for reviewing the patches! > > + started by a single utility command. Currently, the parallel > + utility commands that support the use of parallel workers are > + <command>CREATE INDEX</command> and <command>VACUUM</command> > + without <literal>FULL</literal> option, and only when building > + a B-tree index. Parallel workers are taken from the pool of > > That sentence is garbled. The end part about b-tree indexes applies > only to CREATE INDEX, not to VACUUM, since VACUUM does build indexes. Fixed. > > + Vacuum index and cleanup index in parallel > + <replaceable class="parameter">N</replaceable> background > workers (for the detail > + of each vacuum phases, please refer to <xref > linkend="vacuum-phases"/>. If the > > I have two problems with this. One is that I can't understand the > English very well. I think you mean something like: "Perform the > 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using > N background workers," but I'm not entirely sure. The other is that > if that is what you mean, I don't think it's a sufficient description. > Users need to understand whether, for example, only one worker can be > used per index, or whether the work for a single index can be split > across workers. > > + parallel degree <replaceable class="parameter">N</replaceable> > is omitted, > + then <command>VACUUM</command> decides the number of workers based on > + number of indexes on the relation which further limited by > + <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if > this option > > Now this makes it sound like it's one worker per index, but you could > be more explicit about it. Fixed. > > + is specified multile times, the last parallel degree > + <replaceable class="parameter">N</replaceable> is considered > into the account. > > Typo, but I'd just delete this sentence altogether; the behavior if > the option is multiply specified seems like a triviality that need not > be documented. Understood, removed. > > + Setting a value for <literal>parallel_workers</literal> via > + <xref linkend="sql-altertable"/> also controls how many parallel > + worker processes will be requested by a <command>VACUUM</command> > + against the table. This setting is overwritten by setting > + <replaceable class="parameter">N</replaceable> of > <literal>PARALLEL</literal> > + option. > > I wonder if we really want this behavior. Should a setting that > controls the degree of parallelism when scanning the table also affect > VACUUM? I tend to think that we probably don't ever want VACUUM of a > table to be parallel by default, but rather something that the user > must explicitly request. Happy to hear other opinions. If we do want > this behavior, I think this should be written differently, something > like this: The PARALLEL N option to VACUUM takes precedence over this > option. For example, I can imagine a use case where a batch job does parallel vacuum to some tables in a maintenance window. The batch operation will need to compute and specify the degree of parallelism every time according to for instance the number of indexes, which would be troublesome. But if we can set the degree of parallelism for each tables it can just to do 'VACUUM (PARALLEL)'. > > + * parallel mode nor destories the parallel context. 
For updating the index > > Spelling. Fixed. > > +/* DSM keys for parallel lazy vacuum */ > +#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001) > +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002) > +#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003) > > Any special reason not to use just 1, 2, 3 here? The general > infrastructure stuff uses high numbers to avoid conflicting with > plan_node_id values, but end clients of the parallel infrastructure > can generally just use small integers. It seems that I was worrying unnecessarily, changed to 1, 2, 3. > > + bool updated; /* is the stats updated? */ > > is -> are > > + * LVDeadTuples controls the dead tuple TIDs collected during heap scan. > > what do you mean by "controls", exactly? stores? Fixed. > > + * This is allocated in a dynamic shared memory segment when parallel > + * lazy vacuum mode, or allocated in a local memory. > > If this is in DSM, then max_tuples is a wart, I think. We can't grow > the segment at that point. I'm suspicious that we need a better > design here. It looks like you gather all of the dead tuples in > backend-local memory and then allocate an equal amount of DSM to copy > them. But that means that we are using twice as much memory, which > seems pretty bad. You'd have to do that at least momentarily no > matter what, but it's not obvious that the backend-local copy is ever > freed. Hmm, the current design is more simple; only the leader process scans heap and save dead tuples TID to DSM. The DSM is allocated at once when starting lazy vacuum and we never need to enlarge DSM . Also we can use the same code around heap vacuum and collecting dead tuples for both single process vacuum and parallel vacuum. Once index vacuum is completed, the leader process reinitializes DSM and reuse it in the next time. > There's another patch kicking around to allocate memory for > vacuum in chunks rather than preallocating the whole slab of memory at > once; we might want to think about getting that committed first and > then having this build on top of it. At least we need something > smarter than this. Since the parallel vacuum uses memory in the same manner as the single process vacuum it's not deteriorated. I'd agree that that patch is more smarter and this patch can be built on top of it but I'm concerned that there two proposals on that thread and the discussion has not been active for 8 months. I wonder if it would be worth to think of improving the memory allocating based on that patch after the parallel vacuum get committed. > > -heap_vacuum_rel(Relation onerel, int options, VacuumParams *params, > +heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params, > > We generally avoid passing a struct by value; copying the struct can > be expensive and having multiple shallow copies of the same data > sometimes leads to surprising results. I think it might be a good > idea to propose a preliminary refactoring patch that invents > VacuumOptions and gives it just a single 'int' member and refactors > everything to use it, and then that can be committed first. It should > pass a pointer, though, not the actual struct. Agreed. I'll separate patches and propose it. > > + LVState *lvstate; > > It's not clear to me why we need this new LVState thing. What's the > motivation for that? If it's a good idea, could it be done as a > separate, preparatory patch? It seems to be responsible for a lot of > code churn in this patch. 
It also leads to strange stuff like this: The main motivations are refactoring and improving readability but it's mainly for the previous version patch which implements parallel heap vacuum. It might no longer need here. I'll try to implement without LVState. Thank you. > > ereport(elevel, > - (errmsg("scanned index \"%s\" to remove %d row versions", > + (errmsg("scanned index \"%s\" to remove %d row versions %s", > RelationGetRelationName(indrel), > - vacrelstats->num_dead_tuples), > + dead_tuples->num_tuples, > + IsParallelWorker() ? "by parallel vacuum worker" : ""), > > This doesn't seem to be great grammar, and translation guidelines > generally discourage this sort of incremental message construction > quite strongly. Since the user can probably infer what happened by a > suitable choice of log_line_prefix, I'm not totally sure this is worth > doing in the first place, but if we're going to do it, it should > probably have two completely separate message strings and pick between > them using IsParallelWorker(), rather than building it up > incrementally like this. Fixed. > > +compute_parallel_workers(Relation rel, int nrequests, int nindexes) > > I think 'nrequets' is meant to be 'nrequested'. It isn't the number > of requests; it's the number of workers that were requested. Fixed. > > + /* quick exit if no workers are prepared, e.g. under serializable isolation */ > > That comment makes very little sense in this context. Fixed. > > + /* Report parallel vacuum worker information */ > + initStringInfo(&buf); > + appendStringInfo(&buf, > + ngettext("launched %d parallel vacuum worker %s (planned: %d", > + "launched %d parallel vacuum workers %s (planned: %d", > + lvstate->pcxt->nworkers_launched), > + lvstate->pcxt->nworkers_launched, > + for_cleanup ? "for index cleanup" : "for index vacuum", > + lvstate->pcxt->nworkers); > + if (lvstate->options.nworkers > 0) > + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers); > + > + appendStringInfo(&buf, ")"); > + ereport(elevel, (errmsg("%s", buf.data))); > > This is another example of incremental message construction, again > violating translation guidelines. Fixed. > > + WaitForParallelWorkersToAttach(lvstate->pcxt); > > Why? Oh not necessary, removed. > > + /* > + * If there is already-updated result in the shared memory we use it. > + * Otherwise we pass NULL to index AMs, meaning it's first time call, > + * and copy the result to the shared memory segment. > + */ > > I'm probably missing something here, but isn't the intention that we > only do each index once? If so, how would there be anything there > already? Once from for_cleanup = false and once for for_cleanup = > true? We call ambulkdelete (for_cleanup = false) 0 or more times for each index and call amvacuumcleanup (for_cleanup = true) at the end. In the first time calling either ambulkdelete or amvacuumcleanup the lazy vacuum must pass NULL to them. They return either palloc'd IndexBulkDeleteResult or NULL. If they returns the former the lazy vacuum must pass it to them again at the next time. In current design, since there is no guarantee that an index is always processed by the same vacuum process each vacuum processes save the result to DSM in order to share those results among vacuum processes. The 'updated' flags indicates that its slot is used. So we can pass the address of DSM if 'updated' is true, otherwise pass NULL. 
> > + if (a->options.flags != b->options.flags) > + return false; > + if (a->options.nworkers != b->options.nworkers) > + return false; > > You could just do COMPARE_SCALAR_FIELD(options.flags); > COMPARE_SCALAR_FIELD(options.nworkers); Fixed. Almost comments I got have been incorporated to the local branch but a few comments need discussion. I'll submit the updated version patch once I addressed all of comments. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I wonder if we really want this behavior. Should a setting that > > controls the degree of parallelism when scanning the table also affect > > VACUUM? I tend to think that we probably don't ever want VACUUM of a > > table to be parallel by default, but rather something that the user > > must explicitly request. Happy to hear other opinions. If we do want > > this behavior, I think this should be written differently, something > > like this: The PARALLEL N option to VACUUM takes precedence over this > > option. > > For example, I can imagine a use case where a batch job does parallel > vacuum to some tables in a maintenance window. The batch operation > will need to compute and specify the degree of parallelism every time > according to for instance the number of indexes, which would be > troublesome. But if we can set the degree of parallelism for each > tables it can just to do 'VACUUM (PARALLEL)'. True, but the setting in question would also affect the behavior of sequential scans and index scans. TBH, I'm not sure that the parallel_workers reloption is really a great design as it is: is hard-coding the number of workers really what people want? Do they really want the same degree of parallelism for sequential scans and index scans? Why should they want the same degree of parallelism also for VACUUM? Maybe they do, and maybe somebody explain why they do, but as of now, it's not obvious to me why that should be true. > Since the parallel vacuum uses memory in the same manner as the single > process vacuum it's not deteriorated. I'd agree that that patch is > more smarter and this patch can be built on top of it but I'm > concerned that there two proposals on that thread and the discussion > has not been active for 8 months. I wonder if it would be worth to > think of improving the memory allocating based on that patch after the > parallel vacuum get committed. Well, I think we can't just say "oh, this patch is going to use twice as much memory as before," which is what it looks like it's doing right now. If you think it's not doing that, can you explain further? > Agreed. I'll separate patches and propose it. Cool. Probably best to keep that on this thread. > The main motivations are refactoring and improving readability but > it's mainly for the previous version patch which implements parallel > heap vacuum. It might no longer need here. I'll try to implement > without LVState. Thank you. Oh, OK. > > + /* > > + * If there is already-updated result in the shared memory we use it. > > + * Otherwise we pass NULL to index AMs, meaning it's first time call, > > + * and copy the result to the shared memory segment. > > + */ > > > > I'm probably missing something here, but isn't the intention that we > > only do each index once? If so, how would there be anything there > > already? Once from for_cleanup = false and once for for_cleanup = > > true? > > We call ambulkdelete (for_cleanup = false) 0 or more times for each > index and call amvacuumcleanup (for_cleanup = true) at the end. In the > first time calling either ambulkdelete or amvacuumcleanup the lazy > vacuum must pass NULL to them. They return either palloc'd > IndexBulkDeleteResult or NULL. If they returns the former the lazy > vacuum must pass it to them again at the next time. 
In current design, > since there is no guarantee that an index is always processed by the > same vacuum process each vacuum processes save the result to DSM in > order to share those results among vacuum processes. The 'updated' > flags indicates that its slot is used. So we can pass the address of > DSM if 'updated' is true, otherwise pass NULL. Ah, OK. Thanks for explaining. > Almost comments I got have been incorporated to the local branch but a > few comments need discussion. I'll submit the updated version patch > once I addressed all of comments. Cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I wonder if we really want this behavior. Should a setting that > > > controls the degree of parallelism when scanning the table also affect > > > VACUUM? I tend to think that we probably don't ever want VACUUM of a > > > table to be parallel by default, but rather something that the user > > > must explicitly request. Happy to hear other opinions. If we do want > > > this behavior, I think this should be written differently, something > > > like this: The PARALLEL N option to VACUUM takes precedence over this > > > option. > > > > For example, I can imagine a use case where a batch job does parallel > > vacuum to some tables in a maintenance window. The batch operation > > will need to compute and specify the degree of parallelism every time > > according to for instance the number of indexes, which would be > > troublesome. But if we can set the degree of parallelism for each > > tables it can just to do 'VACUUM (PARALLEL)'. > > True, but the setting in question would also affect the behavior of > sequential scans and index scans. TBH, I'm not sure that the > parallel_workers reloption is really a great design as it is: is > hard-coding the number of workers really what people want? Do they > really want the same degree of parallelism for sequential scans and > index scans? Why should they want the same degree of parallelism also > for VACUUM? Maybe they do, and maybe somebody explain why they do, > but as of now, it's not obvious to me why that should be true. I think that there are users who want to specify the degree of parallelism. I think that hard-coding the number of workers would be good design for something like VACUUM which is a simple operation for single object; since there are no joins, aggregations it'd be relatively easy to compute it. That's why the patch introduces PARALLEL N option as well. I think that a reloption for parallel vacuum would be just a way to save the degree of parallelism. And I agree that users don't want to use same degree of parallelism for VACUUM, so maybe it'd better to add new reloption like parallel_vacuum_workers. On the other hand, it can be a separate patch, I can remove the reloption part from this patch and will propose it when there are requests. > > > Since the parallel vacuum uses memory in the same manner as the single > > process vacuum it's not deteriorated. I'd agree that that patch is > > more smarter and this patch can be built on top of it but I'm > > concerned that there two proposals on that thread and the discussion > > has not been active for 8 months. I wonder if it would be worth to > > think of improving the memory allocating based on that patch after the > > parallel vacuum get committed. > > Well, I think we can't just say "oh, this patch is going to use twice > as much memory as before," which is what it looks like it's doing > right now. If you think it's not doing that, can you explain further? In the current design, the leader process allocates the whole DSM at once when starting and records dead tuple's TIDs to the DSM. This is the same behaviour as before except for it's recording dead tuples TID to the shared memory segment. Once index vacuuming finished the leader process re-initialize DSM for the next time. So parallel vacuum uses the same amount of memory as before during execution. > > > Agreed. I'll separate patches and propose it. > > Cool. 
Probably best to keep that on this thread. Understood. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
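A rough C sketch of the memory behaviour described above: the shared dead-tuple array is sized and allocated once, then simply rewound between index-vacuum rounds. The struct name and helpers are assumptions for illustration, not the patch's code; only add_size/mul_size and ItemPointerData are existing infrastructure.

    #include "postgres.h"
    #include "storage/itemptr.h"     /* ItemPointerData */
    #include "storage/shmem.h"       /* add_size, mul_size */

    typedef struct DeadTuplesSketch
    {
        int             max_tuples;  /* capacity, fixed at allocation time */
        int             num_tuples;  /* TIDs currently stored */
        ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];
    } DeadTuplesSketch;

    /* Size of the single shared allocation holding 'maxtuples' TIDs. */
    static Size
    deadtuples_space(long maxtuples)
    {
        return add_size(offsetof(DeadTuplesSketch, itemptrs),
                        mul_size(sizeof(ItemPointerData), maxtuples));
    }

    /*
     * After each index-vacuum cycle the leader rewinds the counter and reuses
     * the same allocation, so peak memory matches the single-process vacuum.
     */
    static void
    deadtuples_reset(DeadTuplesSketch *dt)
    {
        dt->num_tuples = 0;
    }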
On Mon, Mar 4, 2019 at 10:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I wonder if we really want this behavior. Should a setting that > > > > controls the degree of parallelism when scanning the table also affect > > > > VACUUM? I tend to think that we probably don't ever want VACUUM of a > > > > table to be parallel by default, but rather something that the user > > > > must explicitly request. Happy to hear other opinions. If we do want > > > > this behavior, I think this should be written differently, something > > > > like this: The PARALLEL N option to VACUUM takes precedence over this > > > > option. > > > > > > For example, I can imagine a use case where a batch job does parallel > > > vacuum to some tables in a maintenance window. The batch operation > > > will need to compute and specify the degree of parallelism every time > > > according to for instance the number of indexes, which would be > > > troublesome. But if we can set the degree of parallelism for each > > > tables it can just to do 'VACUUM (PARALLEL)'. > > > > True, but the setting in question would also affect the behavior of > > sequential scans and index scans. TBH, I'm not sure that the > > parallel_workers reloption is really a great design as it is: is > > hard-coding the number of workers really what people want? Do they > > really want the same degree of parallelism for sequential scans and > > index scans? Why should they want the same degree of parallelism also > > for VACUUM? Maybe they do, and maybe somebody explain why they do, > > but as of now, it's not obvious to me why that should be true. > > I think that there are users who want to specify the degree of > parallelism. I think that hard-coding the number of workers would be > good design for something like VACUUM which is a simple operation for > single object; since there are no joins, aggregations it'd be > relatively easy to compute it. That's why the patch introduces > PARALLEL N option as well. I think that a reloption for parallel > vacuum would be just a way to save the degree of parallelism. And I > agree that users don't want to use same degree of parallelism for > VACUUM, so maybe it'd better to add new reloption like > parallel_vacuum_workers. On the other hand, it can be a separate > patch, I can remove the reloption part from this patch and will > propose it when there are requests. > Okay, attached the latest version of patch set. I've incorporated all comments I got and separated the patch for making vacuum options a Node (0001 patch). And the patch doesn't use parallel_workers. It might be proposed in the another form again in the future if requested. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Wed, Mar 6, 2019 at 1:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Okay, attached the latest version of patch set. I've incorporated all > comments I got and separated the patch for making vacuum options a > Node (0001 patch). And the patch doesn't use parallel_workers. It > might be proposed in the another form again in the future if > requested. Why make it a Node? I mean I think a struct makes sense, but what's the point of giving it a NodeTag? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 7, 2019 at 2:54 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 6, 2019 at 1:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Okay, attached the latest version of patch set. I've incorporated all > > comments I got and separated the patch for making vacuum options a > > Node (0001 patch). And the patch doesn't use parallel_workers. It > > might be proposed in the another form again in the future if > > requested. > > Why make it a Node? I mean I think a struct makes sense, but what's > the point of giving it a NodeTag? > Well, the main point is consistency with other nodes and keep the code clean. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 6, 2019 at 10:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Why make it a Node? I mean I think a struct makes sense, but what's > > the point of giving it a NodeTag? > > Well, the main point is consistency with other nodes and keep the code clean. It looks to me like if we made it a plain struct rather than a node, and embedded that struct (not a pointer) in VacuumStmt, then what would happen is that _copyVacuumStmt and _equalVacuumStmt would have clauses for each vacuum option individually, with a dot, like COPY_SCALAR_FIELD(options.flags). Also, the grammar production for VacuumStmt would need to be jiggered around a bit; the way that options consolidation is done there would have to be changed. Neither of those things sound terribly hard or terribly messy, but on the other hand I guess there's nothing really wrong with the way you did it, either ... anybody else have an opinion? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 8, 2019 at 12:22 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 6, 2019 at 10:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Why make it a Node? I mean I think a struct makes sense, but what's > > > the point of giving it a NodeTag? > > > > Well, the main point is consistency with other nodes and keep the code clean. > > It looks to me like if we made it a plain struct rather than a node, > and embedded that struct (not a pointer) in VacuumStmt, then what > would happen is that _copyVacuumStmt and _equalVacuumStmt would have > clauses for each vacuum option individually, with a dot, like > COPY_SCALAR_FIELD(options.flags). > > Also, the grammar production for VacuumStmt would need to be jiggered > around a bit; the way that options consolidation is done there would > have to be changed. > > Neither of those things sound terribly hard or terribly messy, but on > the other hand I guess there's nothing really wrong with the way you > did it, either ... anybody else have an opinion? > I don't have a strong opinion but the using a Node would be more suitable in the future when we add more options to vacuum. And it seems to me that it's unlikely to change a Node to a plain struct. So there is an idea of doing it now anyway if we might need to do it someday. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I don't have a strong opinion but the using a Node would be more > suitable in the future when we add more options to vacuum. And it > seems to me that it's unlikely to change a Node to a plain struct. So > there is an idea of doing it now anyway if we might need to do it > someday. I just tried to apply 0001 again and noticed a conflict in the autovac_table structure in postmaster.c. That conflict got me thinking: aren't parameters and options an awful lot alike? Why do we need to pass around a VacuumOptions structure *and* a VacuumParams structure to all of these functions? Couldn't we just have one? That led to the attached patch, which just gets rid of the separate options flag and folds it into VacuumParams. If we took this approach, the degree of parallelism would just be another thing that would get added to VacuumParams, and VacuumOptions wouldn't end up existing at all. This patch does not address the question of what the *parse tree* representation of the PARALLEL option should look like; the idea would be that ExecVacuum() would need to extra the value for that option and put it into VacuumParams just as it already does for various other things in VacuumParams. Maybe the most natural approach would be to convert the grammar productions for the VACUUM options list so that they just build a list of DefElems, and then have ExecVacuum() iterate over that list and make sense of it, as for example ExplainQuery() already does. I kinda like the idea of doing it that way, but then I came up with it, so maybe you or others will think it's terrible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
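To make the suggestion concrete, here is a hedged sketch of ExecVacuum-style option handling over a List of DefElems, in the spirit of what ExplainQuery already does. The option set, error wording, and function name are assumptions; only defGetBoolean/defGetInt32 and the DefElem list idiom are existing infrastructure.

    #include "postgres.h"
    #include "commands/defrem.h"     /* defGetBoolean, defGetInt32 */
    #include "nodes/parsenodes.h"    /* DefElem, VACOPT_* */
    #include "nodes/pg_list.h"

    static void
    parse_vacuum_options_sketch(List *options, int *flags, int *nworkers)
    {
        ListCell   *lc;

        *flags = 0;
        *nworkers = 0;

        foreach(lc, options)
        {
            DefElem    *opt = (DefElem *) lfirst(lc);

            if (strcmp(opt->defname, "verbose") == 0 && defGetBoolean(opt))
                *flags |= VACOPT_VERBOSE;
            else if (strcmp(opt->defname, "freeze") == 0 && defGetBoolean(opt))
                *flags |= VACOPT_FREEZE;
            else if (strcmp(opt->defname, "parallel") == 0)
                *nworkers = defGetInt32(opt);    /* PARALLEL N */
            else
                ereport(ERROR,
                        (errcode(ERRCODE_SYNTAX_ERROR),
                         errmsg("unrecognized VACUUM option \"%s\"",
                                opt->defname)));
        }
    }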
On Thu, Mar 14, 2019 at 6:41 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I don't have a strong opinion but the using a Node would be more > > suitable in the future when we add more options to vacuum. And it > > seems to me that it's unlikely to change a Node to a plain struct. So > > there is an idea of doing it now anyway if we might need to do it > > someday. > > I just tried to apply 0001 again and noticed a conflict in the > autovac_table structure in postmaster.c. > > That conflict got me thinking: aren't parameters and options an awful > lot alike? Why do we need to pass around a VacuumOptions structure > *and* a VacuumParams structure to all of these functions? Couldn't we > just have one? That led to the attached patch, which just gets rid of > the separate options flag and folds it into VacuumParams. Indeed. I like this approach. The comment of vacuum() says, * options is a bitmask of VacuumOption flags, indicating what to do. * (snip) * params contains a set of parameters that can be used to customize the * behavior. It seems to me that the purpose of both variables are different. But it would be acceptable even if we merge them. BTW your patch seems to not apply to the current HEAD cleanly and to need to update the comment of vacuum(). > If we took > this approach, the degree of parallelism would just be another thing > that would get added to VacuumParams, and VacuumOptions wouldn't end > up existing at all. > Agreed. > This patch does not address the question of what the *parse tree* > representation of the PARALLEL option should look like; the idea would > be that ExecVacuum() would need to extra the value for that option and > put it into VacuumParams just as it already does for various other > things in VacuumParams. Maybe the most natural approach would be to > convert the grammar productions for the VACUUM options list so that > they just build a list of DefElems, and then have ExecVacuum() iterate > over that list and make sense of it, as for example ExplainQuery() > already does. > Agreed. That change would help for the discussion changing VACUUM option syntax to field-and-value style. Attached the updated patch you proposed and the patch that converts the grammer productions for the VACUUM option on top of the former patch. The latter patch moves VacuumOption to vacuum.h since the parser no longer needs such information. If we take this direction I will change the parallel vacuum patch so that it adds new PARALLEL option and adds 'nworkers' to VacuumParams. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
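For concreteness, a sketch of the direction being discussed: carrying the parallel degree in VacuumParams rather than in a separate options node. The field list here is abbreviated and illustrative; only the idea of an added 'nworkers' member comes from the discussion.

    /* Abbreviated, illustrative layout -- not the real VacuumParams. */
    typedef struct VacuumParamsSketch
    {
        int         options;          /* bitmask of VACOPT_* flags */
        int         freeze_min_age;   /* min freeze age, -1 to use default */
        int         log_min_duration; /* threshold for logging, in ms */
        int         nworkers;         /* degree requested by PARALLEL N;
                                       * 0 would mean "not requested" (assumed) */
    } VacuumParamsSketch;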
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > > > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > >> Thank you. Attached the rebased patch. > > > > > > I ran some performance tests to compare the parallelism benefits, > > Thank you for testing! > > > but I got some strange results of performance overhead, may be it is > > because, I tested it on my laptop. > > Hmm, I think the parallel vacuum would help for heavy workloads like a > big table with multiple indexes. In your test result, all executions > are completed within 1 sec, which seems to be one use case that the > parallel vacuum wouldn't help. I suspect that the table is small, > right? Anyway I'll also do performance tests. > Here is the performance test results. I've setup a 500MB table with several indexes and made 10% of table dirty before each vacuum. Compared execution time of the patched postgrse with the current HEAD (at 'speed_up' column). In my environment, indexes | parallel_degree | patched | head | speed_up ---------+-----------------+------------+------------+---------- 0 | 0 | 238.2085 | 244.7625 | 1.0275 0 | 1 | 237.7050 | 244.7625 | 1.0297 0 | 2 | 238.0390 | 244.7625 | 1.0282 0 | 4 | 238.1045 | 244.7625 | 1.0280 0 | 8 | 237.8995 | 244.7625 | 1.0288 0 | 16 | 237.7775 | 244.7625 | 1.0294 1 | 0 | 1328.8590 | 1334.9125 | 1.0046 1 | 1 | 1325.9140 | 1334.9125 | 1.0068 1 | 2 | 1333.3665 | 1334.9125 | 1.0012 1 | 4 | 1329.5205 | 1334.9125 | 1.0041 1 | 8 | 1334.2255 | 1334.9125 | 1.0005 1 | 16 | 1335.1510 | 1334.9125 | 0.9998 2 | 0 | 2426.2905 | 2427.5165 | 1.0005 2 | 1 | 1416.0595 | 2427.5165 | 1.7143 2 | 2 | 1411.6270 | 2427.5165 | 1.7197 2 | 4 | 1411.6490 | 2427.5165 | 1.7196 2 | 8 | 1410.1750 | 2427.5165 | 1.7214 2 | 16 | 1413.4985 | 2427.5165 | 1.7174 4 | 0 | 4622.5060 | 4619.0340 | 0.9992 4 | 1 | 2536.8435 | 4619.0340 | 1.8208 4 | 2 | 2548.3615 | 4619.0340 | 1.8126 4 | 4 | 1467.9655 | 4619.0340 | 3.1466 4 | 8 | 1486.3155 | 4619.0340 | 3.1077 4 | 16 | 1481.7150 | 4619.0340 | 3.1174 8 | 0 | 9039.3810 | 8990.4735 | 0.9946 8 | 1 | 4807.5880 | 8990.4735 | 1.8701 8 | 2 | 3786.7620 | 8990.4735 | 2.3742 8 | 4 | 2924.2205 | 8990.4735 | 3.0745 8 | 8 | 2684.2545 | 8990.4735 | 3.3493 8 | 16 | 2672.9800 | 8990.4735 | 3.3635 16 | 0 | 17821.4715 | 17740.1300 | 0.9954 16 | 1 | 9318.3810 | 17740.1300 | 1.9038 16 | 2 | 7260.6315 | 17740.1300 | 2.4433 16 | 4 | 5538.5225 | 17740.1300 | 3.2030 16 | 8 | 5368.5255 | 17740.1300 | 3.3045 16 | 16 | 5291.8510 | 17740.1300 | 3.3523 (36 rows) Attached the updated version patches. The patches apply to the current HEAD cleanly but the 0001 patch still changes the vacuum option to a Node since it's under the discussion. After the direction has been decided, I'll update the patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hello. At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC6bsM0FfePgzSV40uXofbFSPe-Ax095TOnu5GOZ790uA@mail.gmail.com> > Here is the performance test results. I've setup a 500MB table with > several indexes and made 10% of table dirty before each vacuum. > Compared execution time of the patched postgrse with the current HEAD > (at 'speed_up' column). In my environment, > > indexes | parallel_degree | patched | head | speed_up > ---------+-----------------+------------+------------+---------- > 0 | 0 | 238.2085 | 244.7625 | 1.0275 > 0 | 1 | 237.7050 | 244.7625 | 1.0297 > 0 | 2 | 238.0390 | 244.7625 | 1.0282 > 0 | 4 | 238.1045 | 244.7625 | 1.0280 > 0 | 8 | 237.8995 | 244.7625 | 1.0288 > 0 | 16 | 237.7775 | 244.7625 | 1.0294 > 1 | 0 | 1328.8590 | 1334.9125 | 1.0046 > 1 | 1 | 1325.9140 | 1334.9125 | 1.0068 > 1 | 2 | 1333.3665 | 1334.9125 | 1.0012 > 1 | 4 | 1329.5205 | 1334.9125 | 1.0041 > 1 | 8 | 1334.2255 | 1334.9125 | 1.0005 > 1 | 16 | 1335.1510 | 1334.9125 | 0.9998 > 2 | 0 | 2426.2905 | 2427.5165 | 1.0005 > 2 | 1 | 1416.0595 | 2427.5165 | 1.7143 > 2 | 2 | 1411.6270 | 2427.5165 | 1.7197 > 2 | 4 | 1411.6490 | 2427.5165 | 1.7196 > 2 | 8 | 1410.1750 | 2427.5165 | 1.7214 > 2 | 16 | 1413.4985 | 2427.5165 | 1.7174 > 4 | 0 | 4622.5060 | 4619.0340 | 0.9992 > 4 | 1 | 2536.8435 | 4619.0340 | 1.8208 > 4 | 2 | 2548.3615 | 4619.0340 | 1.8126 > 4 | 4 | 1467.9655 | 4619.0340 | 3.1466 > 4 | 8 | 1486.3155 | 4619.0340 | 3.1077 > 4 | 16 | 1481.7150 | 4619.0340 | 3.1174 > 8 | 0 | 9039.3810 | 8990.4735 | 0.9946 > 8 | 1 | 4807.5880 | 8990.4735 | 1.8701 > 8 | 2 | 3786.7620 | 8990.4735 | 2.3742 > 8 | 4 | 2924.2205 | 8990.4735 | 3.0745 > 8 | 8 | 2684.2545 | 8990.4735 | 3.3493 > 8 | 16 | 2672.9800 | 8990.4735 | 3.3635 > 16 | 0 | 17821.4715 | 17740.1300 | 0.9954 > 16 | 1 | 9318.3810 | 17740.1300 | 1.9038 > 16 | 2 | 7260.6315 | 17740.1300 | 2.4433 > 16 | 4 | 5538.5225 | 17740.1300 | 3.2030 > 16 | 8 | 5368.5255 | 17740.1300 | 3.3045 > 16 | 16 | 5291.8510 | 17740.1300 | 3.3523 > (36 rows) For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave almost the same. I suspect that the indexes are too-small and all the index pages were on memory and CPU is saturated. Maybe you had four cores and parallel workers more than the number had no effect. Other normal backends should have been able do almost nothing meanwhile. Usually the number of parallel workers is determined so that IO capacity is filled up but this feature intermittently saturates CPU capacity very under such a situation. I'm not sure, but what if we do index vacuum in one-tuple-by-one manner? That is, heap vacuum passes dead tuple one-by-one (or buffering few tuples) to workers and workers process it not by bulkdelete, but just tuple_delete (we don't have one). That could avoid the sleep time of heap-scan while index bulkdelete. > Attached the updated version patches. The patches apply to the current > HEAD cleanly but the 0001 patch still changes the vacuum option to a > Node since it's under the discussion. After the direction has been > decided, I'll update the patches. As for the to-be-or-not-to-be a node problem, I don't think it is needed but from the point of consistency, it seems reasonable and it is seen in other nodes that *Stmt Node holds option Node. But makeVacOpt and it's usage, and subsequent operations on the node look somewhat strange.. Why don't you just do "makeNode(VacuumOptions)"? 
>+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);

If I understand this correctly, nindexes is always > 1 there. At least it
should be asserted to be > 0 there.

>+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),

I don't think the name is good. (dt meant detach at the first look for me..)

>+ if (lps->nworkers_requested > 0)
>+ appendStringInfo(&buf,
>+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",

"planned"?

>+ /* Get the next index to vacuum */
>+ if (do_parallel)
>+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
>+ else
>+ idx = nprocessed++;

It seems that both of the two cases can be handled using LVParallelState,
and most of the branches on lps or do_parallel can be removed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
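The quoted hunk relies on an atomic fetch-and-add to hand out index numbers to the cooperating processes. A self-contained sketch of that claim idiom, for readers unfamiliar with it (names other than pg_atomic_fetch_add_u32/pg_atomic_uint32 are illustrative, not the patch's):

    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct SharedCounterSketch
    {
        pg_atomic_uint32 nprocessed; /* lives in DSM, shared by leader and workers */
    } SharedCounterSketch;

    /*
     * Each process atomically claims the next unprocessed index number, so no
     * two processes ever vacuum the same index and no locking is needed.
     * Returns -1 when all indexes have been handed out.
     */
    static int
    claim_next_index(SharedCounterSketch *shared, int nindexes)
    {
        uint32      idx = pg_atomic_fetch_add_u32(&shared->nprocessed, 1);

        return (idx < (uint32) nindexes) ? (int) idx : -1;
    }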
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > BTW your patch seems to not apply to the current HEAD cleanly and to > need to update the comment of vacuum(). Yeah, I omitted some hunks by being stupid with 'git'. Since you seem to like the approach, I put back the hunks I intended to have there, pulled in one change from your v2 that looked good, made one other tweak, and committed this. I think I like what I did with vacuum_open_relation a bit better than what you did; actually, I think it cannot be right to just pass 'params' when the current code is passing params->options & ~(VACOPT_VACUUM). My approach avoids that particular pitfall. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Attached the updated patch you proposed and the patch that converts > the grammer productions for the VACUUM option on top of the former > patch. The latter patch moves VacuumOption to vacuum.h since the > parser no longer needs such information. Committed. > If we take this direction I will change the parallel vacuum patch so > that it adds new PARALLEL option and adds 'nworkers' to VacuumParams. Sounds good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Mar 19, 2019 at 3:05 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > BTW your patch seems to not apply to the current HEAD cleanly and to > > need to update the comment of vacuum(). > > Yeah, I omitted some hunks by being stupid with 'git'. > > Since you seem to like the approach, I put back the hunks I intended > to have there, pulled in one change from your v2 that looked good, > made one other tweak, and committed this. Thank you! > I think I like what I did > with vacuum_open_relation a bit better than what you did; actually, I > think it cannot be right to just pass 'params' when the current code > is passing params->options & ~(VACOPT_VACUUM). My approach avoids > that particular pitfall. Agreed. Thanks. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>
Here are the performance test results. I've set up a 500MB table with
several indexes and made 10% of the table dirty before each vacuum.
Compared execution time of the patched postgres with the current HEAD
(the 'speed_up' column). In my environment,
indexes | parallel_degree | patched | head | speed_up
---------+-----------------+------------+------------+----------
0 | 0 | 238.2085 | 244.7625 | 1.0275
0 | 1 | 237.7050 | 244.7625 | 1.0297
0 | 2 | 238.0390 | 244.7625 | 1.0282
0 | 4 | 238.1045 | 244.7625 | 1.0280
0 | 8 | 237.8995 | 244.7625 | 1.0288
0 | 16 | 237.7775 | 244.7625 | 1.0294
1 | 0 | 1328.8590 | 1334.9125 | 1.0046
1 | 1 | 1325.9140 | 1334.9125 | 1.0068
1 | 2 | 1333.3665 | 1334.9125 | 1.0012
1 | 4 | 1329.5205 | 1334.9125 | 1.0041
1 | 8 | 1334.2255 | 1334.9125 | 1.0005
1 | 16 | 1335.1510 | 1334.9125 | 0.9998
2 | 0 | 2426.2905 | 2427.5165 | 1.0005
2 | 1 | 1416.0595 | 2427.5165 | 1.7143
2 | 2 | 1411.6270 | 2427.5165 | 1.7197
2 | 4 | 1411.6490 | 2427.5165 | 1.7196
2 | 8 | 1410.1750 | 2427.5165 | 1.7214
2 | 16 | 1413.4985 | 2427.5165 | 1.7174
4 | 0 | 4622.5060 | 4619.0340 | 0.9992
4 | 1 | 2536.8435 | 4619.0340 | 1.8208
4 | 2 | 2548.3615 | 4619.0340 | 1.8126
4 | 4 | 1467.9655 | 4619.0340 | 3.1466
4 | 8 | 1486.3155 | 4619.0340 | 3.1077
4 | 16 | 1481.7150 | 4619.0340 | 3.1174
8 | 0 | 9039.3810 | 8990.4735 | 0.9946
8 | 1 | 4807.5880 | 8990.4735 | 1.8701
8 | 2 | 3786.7620 | 8990.4735 | 2.3742
8 | 4 | 2924.2205 | 8990.4735 | 3.0745
8 | 8 | 2684.2545 | 8990.4735 | 3.3493
8 | 16 | 2672.9800 | 8990.4735 | 3.3635
16 | 0 | 17821.4715 | 17740.1300 | 0.9954
16 | 1 | 9318.3810 | 17740.1300 | 1.9038
16 | 2 | 7260.6315 | 17740.1300 | 2.4433
16 | 4 | 5538.5225 | 17740.1300 | 3.2030
16 | 8 | 5368.5255 | 17740.1300 | 3.3045
16 | 16 | 5291.8510 | 17740.1300 | 3.3523
(36 rows)
The performance results are good. Do we want to add a recommended
size to the documentation for the parallel option? The parallel option for smaller
tables can lead to performance overhead.
Regards,
Haribabu Kommi
Fujitsu Australia
On Mon, Mar 18, 2019 at 7:06 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Hello. > > At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC6bsM0FfePgzSV40uXofbFSPe-Ax095TOnu5GOZ790uA@mail.gmail.com> > > Here is the performance test results. I've setup a 500MB table with > > several indexes and made 10% of table dirty before each vacuum. > > Compared execution time of the patched postgrse with the current HEAD > > (at 'speed_up' column). In my environment, > > > > indexes | parallel_degree | patched | head | speed_up > > ---------+-----------------+------------+------------+---------- > > 0 | 0 | 238.2085 | 244.7625 | 1.0275 > > 0 | 1 | 237.7050 | 244.7625 | 1.0297 > > 0 | 2 | 238.0390 | 244.7625 | 1.0282 > > 0 | 4 | 238.1045 | 244.7625 | 1.0280 > > 0 | 8 | 237.8995 | 244.7625 | 1.0288 > > 0 | 16 | 237.7775 | 244.7625 | 1.0294 > > 1 | 0 | 1328.8590 | 1334.9125 | 1.0046 > > 1 | 1 | 1325.9140 | 1334.9125 | 1.0068 > > 1 | 2 | 1333.3665 | 1334.9125 | 1.0012 > > 1 | 4 | 1329.5205 | 1334.9125 | 1.0041 > > 1 | 8 | 1334.2255 | 1334.9125 | 1.0005 > > 1 | 16 | 1335.1510 | 1334.9125 | 0.9998 > > 2 | 0 | 2426.2905 | 2427.5165 | 1.0005 > > 2 | 1 | 1416.0595 | 2427.5165 | 1.7143 > > 2 | 2 | 1411.6270 | 2427.5165 | 1.7197 > > 2 | 4 | 1411.6490 | 2427.5165 | 1.7196 > > 2 | 8 | 1410.1750 | 2427.5165 | 1.7214 > > 2 | 16 | 1413.4985 | 2427.5165 | 1.7174 > > 4 | 0 | 4622.5060 | 4619.0340 | 0.9992 > > 4 | 1 | 2536.8435 | 4619.0340 | 1.8208 > > 4 | 2 | 2548.3615 | 4619.0340 | 1.8126 > > 4 | 4 | 1467.9655 | 4619.0340 | 3.1466 > > 4 | 8 | 1486.3155 | 4619.0340 | 3.1077 > > 4 | 16 | 1481.7150 | 4619.0340 | 3.1174 > > 8 | 0 | 9039.3810 | 8990.4735 | 0.9946 > > 8 | 1 | 4807.5880 | 8990.4735 | 1.8701 > > 8 | 2 | 3786.7620 | 8990.4735 | 2.3742 > > 8 | 4 | 2924.2205 | 8990.4735 | 3.0745 > > 8 | 8 | 2684.2545 | 8990.4735 | 3.3493 > > 8 | 16 | 2672.9800 | 8990.4735 | 3.3635 > > 16 | 0 | 17821.4715 | 17740.1300 | 0.9954 > > 16 | 1 | 9318.3810 | 17740.1300 | 1.9038 > > 16 | 2 | 7260.6315 | 17740.1300 | 2.4433 > > 16 | 4 | 5538.5225 | 17740.1300 | 3.2030 > > 16 | 8 | 5368.5255 | 17740.1300 | 3.3045 > > 16 | 16 | 5291.8510 | 17740.1300 | 3.3523 > > (36 rows) > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave > almost the same. I suspect that the indexes are too-small and all > the index pages were on memory and CPU is saturated. Maybe you > had four cores and parallel workers more than the number had no > effect. Other normal backends should have been able do almost > nothing meanwhile. Usually the number of parallel workers is > determined so that IO capacity is filled up but this feature > intermittently saturates CPU capacity very under such a > situation. > I'm sorry I didn't make it clear enough. If the parallel degree is higher than 'the number of indexes - 1' redundant workers are not launched. So for indexes=4, 8, 16 the number of actually launched parallel workers is up to 3, 7, 15 respectively. That's why the result shows almost the same execution time in the cases where nindexes <= parallel_degree. I'll share the performance test result of more larger tables and indexes. > I'm not sure, but what if we do index vacuum in one-tuple-by-one > manner? That is, heap vacuum passes dead tuple one-by-one (or > buffering few tuples) to workers and workers process it not by > bulkdelete, but just tuple_delete (we don't have one). That could > avoid the sleep time of heap-scan while index bulkdelete. 
> Just to be clear, in parallel lazy vacuum all parallel vacuum processes including the leader process do index vacuuming, no one doesn't sleep during index vacuuming. The leader process does heap scan and launches parallel workers before index vacuuming. Each processes exclusively processes indexes one by one. Such index deletion method could be an optimization but I'm not sure that the calling tuple_delete many times would be faster than one bulkdelete. If there are many dead tuples vacuum has to call tuple_delete as much as dead tuples. In general one seqscan is faster than tons of indexscan. There is the proposal for such one by one index deletions[1] but it's not a replacement of bulkdelete. > > > Attached the updated version patches. The patches apply to the current > > HEAD cleanly but the 0001 patch still changes the vacuum option to a > > Node since it's under the discussion. After the direction has been > > decided, I'll update the patches. > > As for the to-be-or-not-to-be a node problem, I don't think it is > needed but from the point of consistency, it seems reasonable and > it is seen in other nodes that *Stmt Node holds option Node. But > makeVacOpt and it's usage, and subsequent operations on the node > look somewhat strange.. Why don't you just do > "makeNode(VacuumOptions)"? Thank you for the comment but this part has gone away as the recent commit changed the grammar production of vacuum command. > > > >+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ > >+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0); > > If I understand this correctly, nindexes is always > 1 there. At > lesat asserted that > 0 there. > > >+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples), > > I don't think the name is good. (dt menant detach by the first look for me..) Fixed. > > >+ if (lps->nworkers_requested > 0) > >+ appendStringInfo(&buf, > >+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)", > > "planned"? The 'planned' shows how many parallel workers we planned to launch. The degree of parallelism is determined based on either user request or the number of indexes that the table has. > > > >+ /* Get the next index to vacuum */ > >+ if (do_parallel) > >+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1); > >+ else > >+ idx = nprocessed++; > > It seems that both of the two cases can be handled using > LVParallelState and most of the branches by lps or do_parallel > can be removed. > Sorry I couldn't get your comment. You meant to move nprocessed to LVParallelState? [1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com> > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave > > almost the same. I suspect that the indexes are too-small and all > > the index pages were on memory and CPU is saturated. Maybe you > > had four cores and parallel workers more than the number had no > > effect. Other normal backends should have been able do almost > > nothing meanwhile. Usually the number of parallel workers is > > determined so that IO capacity is filled up but this feature > > intermittently saturates CPU capacity very under such a > > situation. > > > > I'm sorry I didn't make it clear enough. If the parallel degree is > higher than 'the number of indexes - 1' redundant workers are not > launched. So for indexes=4, 8, 16 the number of actually launched > parallel workers is up to 3, 7, 15 respectively. That's why the result > shows almost the same execution time in the cases where nindexes <= > parallel_degree. In the 16 indexes case, the performance saturated at 4 workers which contradicts to your explanation. > I'll share the performance test result of more larger tables and indexes. > > > I'm not sure, but what if we do index vacuum in one-tuple-by-one > > manner? That is, heap vacuum passes dead tuple one-by-one (or > > buffering few tuples) to workers and workers process it not by > > bulkdelete, but just tuple_delete (we don't have one). That could > > avoid the sleep time of heap-scan while index bulkdelete. > > > > Just to be clear, in parallel lazy vacuum all parallel vacuum > processes including the leader process do index vacuuming, no one > doesn't sleep during index vacuuming. The leader process does heap > scan and launches parallel workers before index vacuuming. Each > processes exclusively processes indexes one by one. The leader doesn't continue heap-scan while index vacuuming is running. And the index-page-scan seems eat up CPU easily. If index vacuum can run simultaneously with the next heap scan phase, we can make index scan finishes almost the same time with the next round of heap scan. It would reduce the (possible) CPU contention. But this requires as the twice size of shared memoryas the current implement. > Such index deletion method could be an optimization but I'm not sure > that the calling tuple_delete many times would be faster than one > bulkdelete. If there are many dead tuples vacuum has to call > tuple_delete as much as dead tuples. In general one seqscan is faster > than tons of indexscan. There is the proposal for such one by one > index deletions[1] but it's not a replacement of bulkdelete. I'm not sure what you mean by 'replacement' but it depends on how large part of a table is removed at once. As mentioned in the thread. But unfortunately it doesn't seem easy to do.. > > > Attached the updated version patches. The patches apply to the current > > > HEAD cleanly but the 0001 patch still changes the vacuum option to a > > > Node since it's under the discussion. After the direction has been > > > decided, I'll update the patches. > > > > As for the to-be-or-not-to-be a node problem, I don't think it is > > needed but from the point of consistency, it seems reasonable and > > it is seen in other nodes that *Stmt Node holds option Node. But > > makeVacOpt and it's usage, and subsequent operations on the node > > look somewhat strange.. Why don't you just do > > "makeNode(VacuumOptions)"? 
> > Thank you for the comment but this part has gone away as the recent > commit changed the grammar production of vacuum command. Oops! > > >+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ > > >+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0); > > > > If I understand this correctly, nindexes is always > 1 there. At > > lesat asserted that > 0 there. > > > > >+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples), > > > > I don't think the name is good. (dt menant detach by the first look for me..) > > Fixed. > > > > > >+ if (lps->nworkers_requested > 0) > > >+ appendStringInfo(&buf, > > >+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested%d)", > > > > "planned"? > > The 'planned' shows how many parallel workers we planned to launch. > The degree of parallelism is determined based on either user request > or the number of indexes that the table has. > > > > > > > >+ /* Get the next index to vacuum */ > > >+ if (do_parallel) > > >+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1); > > >+ else > > >+ idx = nprocessed++; > > > > It seems that both of the two cases can be handled using > > LVParallelState and most of the branches by lps or do_parallel > > can be removed. > > > > Sorry I couldn't get your comment. You meant to move nprocessed to > LVParallelState? Exactly. I meant letting lvshared points to private memory, but it might introduce confusion. > [1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > > > On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> > >> > On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote: >> > > >> > > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> > >> >> > >> Thank you. Attached the rebased patch. >> > > >> > > >> > > I ran some performance tests to compare the parallelism benefits, >> > >> > Thank you for testing! >> > >> > > but I got some strange results of performance overhead, may be it is >> > > because, I tested it on my laptop. >> > >> > Hmm, I think the parallel vacuum would help for heavy workloads like a >> > big table with multiple indexes. In your test result, all executions >> > are completed within 1 sec, which seems to be one use case that the >> > parallel vacuum wouldn't help. I suspect that the table is small, >> > right? Anyway I'll also do performance tests. >> > >> >> Here is the performance test results. I've setup a 500MB table with >> several indexes and made 10% of table dirty before each vacuum. >> Compared execution time of the patched postgrse with the current HEAD >> (at 'speed_up' column). In my environment, >> >> indexes | parallel_degree | patched | head | speed_up >> ---------+-----------------+------------+------------+---------- >> 0 | 0 | 238.2085 | 244.7625 | 1.0275 >> 0 | 1 | 237.7050 | 244.7625 | 1.0297 >> 0 | 2 | 238.0390 | 244.7625 | 1.0282 >> 0 | 4 | 238.1045 | 244.7625 | 1.0280 >> 0 | 8 | 237.8995 | 244.7625 | 1.0288 >> 0 | 16 | 237.7775 | 244.7625 | 1.0294 >> 1 | 0 | 1328.8590 | 1334.9125 | 1.0046 >> 1 | 1 | 1325.9140 | 1334.9125 | 1.0068 >> 1 | 2 | 1333.3665 | 1334.9125 | 1.0012 >> 1 | 4 | 1329.5205 | 1334.9125 | 1.0041 >> 1 | 8 | 1334.2255 | 1334.9125 | 1.0005 >> 1 | 16 | 1335.1510 | 1334.9125 | 0.9998 >> 2 | 0 | 2426.2905 | 2427.5165 | 1.0005 >> 2 | 1 | 1416.0595 | 2427.5165 | 1.7143 >> 2 | 2 | 1411.6270 | 2427.5165 | 1.7197 >> 2 | 4 | 1411.6490 | 2427.5165 | 1.7196 >> 2 | 8 | 1410.1750 | 2427.5165 | 1.7214 >> 2 | 16 | 1413.4985 | 2427.5165 | 1.7174 >> 4 | 0 | 4622.5060 | 4619.0340 | 0.9992 >> 4 | 1 | 2536.8435 | 4619.0340 | 1.8208 >> 4 | 2 | 2548.3615 | 4619.0340 | 1.8126 >> 4 | 4 | 1467.9655 | 4619.0340 | 3.1466 >> 4 | 8 | 1486.3155 | 4619.0340 | 3.1077 >> 4 | 16 | 1481.7150 | 4619.0340 | 3.1174 >> 8 | 0 | 9039.3810 | 8990.4735 | 0.9946 >> 8 | 1 | 4807.5880 | 8990.4735 | 1.8701 >> 8 | 2 | 3786.7620 | 8990.4735 | 2.3742 >> 8 | 4 | 2924.2205 | 8990.4735 | 3.0745 >> 8 | 8 | 2684.2545 | 8990.4735 | 3.3493 >> 8 | 16 | 2672.9800 | 8990.4735 | 3.3635 >> 16 | 0 | 17821.4715 | 17740.1300 | 0.9954 >> 16 | 1 | 9318.3810 | 17740.1300 | 1.9038 >> 16 | 2 | 7260.6315 | 17740.1300 | 2.4433 >> 16 | 4 | 5538.5225 | 17740.1300 | 3.2030 >> 16 | 8 | 5368.5255 | 17740.1300 | 3.3045 >> 16 | 16 | 5291.8510 | 17740.1300 | 3.3523 >> (36 rows) > > > The performance results are good. Do we want to add the recommended > size in the document for the parallel option? the parallel option for smaller > tables can lead to performance overhead. > Hmm, I don't think we can add the specific recommended size because the performance gain by parallel lazy vacuum depends on various things such as CPU cores, the number of indexes, shared buffer size, index types, HDD or SSD. 
I suppose that users who want to use this option have some sort of performance problem, such as vacuum taking a very long time. They would use it for relatively large tables. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com> > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave > > > almost the same. I suspect that the indexes are too-small and all > > > the index pages were on memory and CPU is saturated. Maybe you > > > had four cores and parallel workers more than the number had no > > > effect. Other normal backends should have been able do almost > > > nothing meanwhile. Usually the number of parallel workers is > > > determined so that IO capacity is filled up but this feature > > > intermittently saturates CPU capacity very under such a > > > situation. > > > > > > > I'm sorry I didn't make it clear enough. If the parallel degree is > > higher than 'the number of indexes - 1' redundant workers are not > > launched. So for indexes=4, 8, 16 the number of actually launched > > parallel workers is up to 3, 7, 15 respectively. That's why the result > > shows almost the same execution time in the cases where nindexes <= > > parallel_degree. > > In the 16 indexes case, the performance saturated at 4 workers > which contradicts to your explanation. Because the machine I used has 4 cores the performance doesn't get improved even if more than 4 parallel workers are launched. > > > I'll share the performance test result of more larger tables and indexes. > > > > > I'm not sure, but what if we do index vacuum in one-tuple-by-one > > > manner? That is, heap vacuum passes dead tuple one-by-one (or > > > buffering few tuples) to workers and workers process it not by > > > bulkdelete, but just tuple_delete (we don't have one). That could > > > avoid the sleep time of heap-scan while index bulkdelete. > > > > > > > Just to be clear, in parallel lazy vacuum all parallel vacuum > > processes including the leader process do index vacuuming, no one > > doesn't sleep during index vacuuming. The leader process does heap > > scan and launches parallel workers before index vacuuming. Each > > processes exclusively processes indexes one by one. > > The leader doesn't continue heap-scan while index vacuuming is > running. And the index-page-scan seems eat up CPU easily. If > index vacuum can run simultaneously with the next heap scan > phase, we can make index scan finishes almost the same time with > the next round of heap scan. It would reduce the (possible) CPU > contention. But this requires as the twice size of shared > memoryas the current implement. Yeah, I've considered that something like pipe-lining approach that one process continue to queue the dead tuples and other process fetches and processes them during index vacuuming but the current version patch employed the most simple approach as the first step. Once we had the retail index deletion approach we might be able to use it for parallel vacuum. > > > Such index deletion method could be an optimization but I'm not sure > > that the calling tuple_delete many times would be faster than one > > bulkdelete. If there are many dead tuples vacuum has to call > > tuple_delete as much as dead tuples. In general one seqscan is faster > > than tons of indexscan. There is the proposal for such one by one > > index deletions[1] but it's not a replacement of bulkdelete. > > I'm not sure what you mean by 'replacement' but it depends on how > large part of a table is removed at once. As mentioned in the > thread. 
But unfortunately it doesn't seem easy to do.. > > > > > Attached the updated version patches. The patches apply to the current > > > > HEAD cleanly but the 0001 patch still changes the vacuum option to a > > > > Node since it's under the discussion. After the direction has been > > > > decided, I'll update the patches. > > > > > > As for the to-be-or-not-to-be a node problem, I don't think it is > > > needed but from the point of consistency, it seems reasonable and > > > it is seen in other nodes that *Stmt Node holds option Node. But > > > makeVacOpt and it's usage, and subsequent operations on the node > > > look somewhat strange.. Why don't you just do > > > "makeNode(VacuumOptions)"? > > > > Thank you for the comment but this part has gone away as the recent > > commit changed the grammar production of vacuum command. > > Oops! > > > > > >+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ > > > >+ maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0); > > > > > > If I understand this correctly, nindexes is always > 1 there. At > > > lesat asserted that > 0 there. > > > > > > >+ estdt = MAXALIGN(add_size(sizeof(LVDeadTuples), > > > > > > I don't think the name is good. (dt menant detach by the first look for me..) > > > > Fixed. > > > > > > > > >+ if (lps->nworkers_requested > 0) > > > >+ appendStringInfo(&buf, > > > >+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested%d)", > > > > > > "planned"? > > > > The 'planned' shows how many parallel workers we planned to launch. > > The degree of parallelism is determined based on either user request > > or the number of indexes that the table has. > > > > > > > > > > > >+ /* Get the next index to vacuum */ > > > >+ if (do_parallel) > > > >+ idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1); > > > >+ else > > > >+ idx = nprocessed++; > > > > > > It seems that both of the two cases can be handled using > > > LVParallelState and most of the branches by lps or do_parallel > > > can be removed. > > > > > > > Sorry I couldn't get your comment. You meant to move nprocessed to > > LVParallelState? > > Exactly. I meant letting lvshared points to private memory, but > it might introduce confusion. Hmm, I'm not sure it would be a good idea. It would introduce confusion as you mentioned. And since 'nprocessed' have to be pg_atomic_uint32 in parallel mode we will end up with having an another branch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=kRagFLjUiBg@mail.gmail.com> > On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com> > > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave > > > > almost the same. I suspect that the indexes are too-small and all > > > > the index pages were on memory and CPU is saturated. Maybe you > > > > had four cores and parallel workers more than the number had no > > > > effect. Other normal backends should have been able do almost > > > > nothing meanwhile. Usually the number of parallel workers is > > > > determined so that IO capacity is filled up but this feature > > > > intermittently saturates CPU capacity very under such a > > > > situation. > > > > > > > > > > I'm sorry I didn't make it clear enough. If the parallel degree is > > > higher than 'the number of indexes - 1' redundant workers are not > > > launched. So for indexes=4, 8, 16 the number of actually launched > > > parallel workers is up to 3, 7, 15 respectively. That's why the result > > > shows almost the same execution time in the cases where nindexes <= > > > parallel_degree. > > > > In the 16 indexes case, the performance saturated at 4 workers > > which contradicts to your explanation. > > Because the machine I used has 4 cores the performance doesn't get > improved even if more than 4 parallel workers are launched. That is what I mentioned in the cited phrases. Sorry for perhaps hard-to-read phrases.. > > > > > I'll share the performance test result of more larger tables and indexes. > > > > > > > I'm not sure, but what if we do index vacuum in one-tuple-by-one > > > > manner? That is, heap vacuum passes dead tuple one-by-one (or > > > > buffering few tuples) to workers and workers process it not by > > > > bulkdelete, but just tuple_delete (we don't have one). That could > > > > avoid the sleep time of heap-scan while index bulkdelete. > > > > > > > > > > Just to be clear, in parallel lazy vacuum all parallel vacuum > > > processes including the leader process do index vacuuming, no one > > > doesn't sleep during index vacuuming. The leader process does heap > > > scan and launches parallel workers before index vacuuming. Each > > > processes exclusively processes indexes one by one. > > > > The leader doesn't continue heap-scan while index vacuuming is > > running. And the index-page-scan seems eat up CPU easily. If > > index vacuum can run simultaneously with the next heap scan > > phase, we can make index scan finishes almost the same time with > > the next round of heap scan. It would reduce the (possible) CPU > > contention. But this requires as the twice size of shared > > memoryas the current implement. > > Yeah, I've considered that something like pipe-lining approach that > one process continue to queue the dead tuples and other process > fetches and processes them during index vacuuming but the current > version patch employed the most simple approach as the first step. > Once we had the retail index deletion approach we might be able to use > it for parallel vacuum. Ok, I understood the direction. ... > > > Sorry I couldn't get your comment. You meant to move nprocessed to > > > LVParallelState? > > > > Exactly. 
I meant letting lvshared points to private memory, but > > it might introduce confusion. > > Hmm, I'm not sure it would be a good idea. It would introduce > confusion as you mentioned. And since 'nprocessed' have to be > pg_atomic_uint32 in parallel mode we will end up with having an > another branch. Ok. Agreed. Thank you for the pacience. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCUZQmyXrwDw57ejoR-j1QrGqm_vrQKOkif_aJK4Gih6Q@mail.gmail.com>
> On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
> > The performance results are good. Do we want to add the recommended
> > size in the document for the parallel option? the parallel option for smaller
> > tables can lead to performance overhead.
>
> Hmm, I don't think we can add the specific recommended size because
> the performance gain by parallel lazy vacuum depends on various things
> such as CPU cores, the number of indexes, shared buffer size, index
> types, HDD or SSD. I suppose that users who want to use this option
> have some sort of performance problem such as that vacuum takes a very
> long time. They would use it for relatively larger tables.

I agree that we have no recommended setting, but I strongly think that
documentation on the downside or possible side effects of this feature
is required for those who are going to use it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Mar 19, 2019 at 7:15 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=kRagFLjUiBg@mail.gmail.com> > > On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > > > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoD4ivrYqg5tau460zEEcgR0t9cV-UagjJ997OfvP3gsNQ@mail.gmail.com> > > > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave > > > > > almost the same. I suspect that the indexes are too-small and all > > > > > the index pages were on memory and CPU is saturated. Maybe you > > > > > had four cores and parallel workers more than the number had no > > > > > effect. Other normal backends should have been able do almost > > > > > nothing meanwhile. Usually the number of parallel workers is > > > > > determined so that IO capacity is filled up but this feature > > > > > intermittently saturates CPU capacity very under such a > > > > > situation. > > > > > > > > > > > > > I'm sorry I didn't make it clear enough. If the parallel degree is > > > > higher than 'the number of indexes - 1' redundant workers are not > > > > launched. So for indexes=4, 8, 16 the number of actually launched > > > > parallel workers is up to 3, 7, 15 respectively. That's why the result > > > > shows almost the same execution time in the cases where nindexes <= > > > > parallel_degree. > > > > > > In the 16 indexes case, the performance saturated at 4 workers > > > which contradicts to your explanation. > > > > Because the machine I used has 4 cores the performance doesn't get > > improved even if more than 4 parallel workers are launched. > > That is what I mentioned in the cited phrases. Sorry for perhaps > hard-to-read phrases.. I understood now. Thank you! Attached the updated version patches incorporated all review comments. Commit 6776142 changed the grammar production of vacuum command. This patch adds PARALLEL option on top of the commit. I realized that the commit 6776142 breaks indents in ExecVacuum() and the including nodes/parsenodes.h is no longer needed. Sorry that's my wrong. Attached the patch (vacuum_fix.patch) fixes them, although the indent issue will be resolved by pgindent before releasing. In parsing vacuum command, since only PARALLEL option can have an argument I've added the check in ExecVacuum to erroring out when other options have an argument. But it might be good to make other vacuum options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an argument just like EXPLAIN command. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Tue, Mar 19, 2019 at 7:29 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoCUZQmyXrwDw57ejoR-j1QrGqm_vrQKOkif_aJK4Gih6Q@mail.gmail.com> > > On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi > > <kommi.haribabu@gmail.com> wrote: > > > The performance results are good. Do we want to add the recommended > > > size in the document for the parallel option? the parallel option for smaller > > > tables can lead to performance overhead. > > > > > > > Hmm, I don't think we can add the specific recommended size because > > the performance gain by parallel lazy vacuum depends on various things > > such as CPU cores, the number of indexes, shared buffer size, index > > types, HDD or SSD. I suppose that users who want to use this option > > have some sort of performance problem such as that vacuum takes a very > > long time. They would use it for relatively larger tables. > > Agree that we have no recommended setting, but I strongly think that documentation on the downside or possible side effectof this feature is required for those who are to use the feature. > I think that the side effect of parallel lazy vacuum would be to consume more CPUs and I/O bandwidth, but which is also true for the other utility command (i.e. parallel create index). The description of max_parallel_maintenance_worker documents such things[1]. Anything else to document? [1] https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hello

> * in_parallel is true if we're performing parallel lazy vacuum. Since any
> * updates are not allowed during parallel mode we don't update statistics
> * but set the index bulk-deletion result to *stats. Otherwise we update it
> * and set NULL.

lazy_cleanup_index has the in_parallel argument only for this purpose, but
the caller still has to check in_parallel after the lazy_cleanup_index call
and do something else with stats for parallel execution. Would it be better
to always return stats and update the statistics in the caller? Is it
possible to update all index stats in lazy_vacuum_all_indexes, for example?
This routine is always the parallel leader and has the comment
/* Do post-vacuum cleanup and statistics update for each index */ on the
for_cleanup=true call.

I think we need a note in the documentation that the parallel leader is not
counted in the PARALLEL N option, so with the PARALLEL 2 option we would use
3 processes. Or should we even change the behavior? Default with PARALLEL 1 -
only the current backend runs in a single process; PARALLEL 2 - leader + one
parallel worker, two processes working in parallel.

regards, Sergei
On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > The leader doesn't continue heap-scan while index vacuuming is > running. And the index-page-scan seems eat up CPU easily. If > index vacuum can run simultaneously with the next heap scan > phase, we can make index scan finishes almost the same time with > the next round of heap scan. It would reduce the (possible) CPU > contention. But this requires as the twice size of shared > memoryas the current implement. I think you're approaching this from the wrong point of view. If we have a certain amount of memory available, is it better to (a) fill the entire thing with dead tuples once, or (b) better to fill half of it with dead tuples, start index vacuuming, and then fill the other half of it with dead tuples for the next index-vacuum cycle while the current one is running? I think the answer is that (a) is clearly better, because it results in half as many index vacuum cycles. We can't really ask the user how much memory it's OK to use and then use twice as much. But if we could, what you're proposing here is probably still not the right way to use it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
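A tiny worked example of the arithmetic behind this argument, with the numbers chosen arbitrarily for illustration (a minimal sketch, not taken from the patch or any benchmark):

    #include <stdio.h>

    /* Number of index-vacuum passes needed to drain D dead tuples when the
     * dead-tuple buffer holds 'slots' TIDs at a time (i.e. ceil(D / slots)). */
    static long
    index_passes(long dead_tuples, long slots)
    {
        return (dead_tuples + slots - 1) / slots;
    }

    int
    main(void)
    {
        long        M = 10000000;   /* TIDs that fit in the memory budget (assumed) */
        long        D = 95000000;   /* dead tuples to be removed (assumed) */

        printf("whole buffer: %ld index-vacuum cycles\n", index_passes(D, M));     /* 10 */
        printf("half buffer : %ld index-vacuum cycles\n", index_passes(D, M / 2)); /* 19 */
        return 0;
    }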
On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > In parsing vacuum command, since only PARALLEL option can have an > argument I've added the check in ExecVacuum to erroring out when other > options have an argument. But it might be good to make other vacuum > options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an > argument just like EXPLAIN command. I think all of the existing options, including DISABLE_PAGE_SKIPPING, should permit an argument that is passed to defGetBoolean(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 22, 2019 at 4:53 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > In parsing vacuum command, since only PARALLEL option can have an > > argument I've added the check in ExecVacuum to erroring out when other > > options have an argument. But it might be good to make other vacuum > > options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an > > argument just like EXPLAIN command. > > I think all of the existing options, including DISABLE_PAGE_SKIPPING, > should permit an argument that is passed to defGetBoolean(). > Agreed. The attached 0001 patch changes it accordingly. On Thu, Mar 21, 2019 at 8:05 PM Sergei Kornilov <sk@zsrv.org> wrote: > > Hello > Thank you for reviewing the patch! > > * in_parallel is true if we're performing parallel lazy vacuum. Since any > > * updates are not allowed during parallel mode we don't update statistics > > * but set the index bulk-deletion result to *stats. Otherwise we update it > > * and set NULL. > > lazy_cleanup_index has the in_parallel argument only for this purpose, but the caller still should check in_parallel after the lazy_cleanup_index call and do something else with stats for parallel execution. > Would it be better to always return stats and update the statistics in the caller? Is it possible to update all index stats in lazy_vacuum_all_indexes, for example? This routine is always the parallel leader and has the comment /* Do post-vacuum cleanup and statistics update for each index */ on the for_cleanup=true call. Agreed. I've changed the patch so that we update index statistics in lazy_vacuum_all_indexes(). > > I think we need a note in the documentation that the parallel leader is not counted in the PARALLEL N option, so with the PARALLEL 2 option we would use 3 processes. Or should we even change the behavior? Default with PARALLEL 1 - only the current backend runs in a single process; PARALLEL 2 - leader + one parallel worker, two processes working in parallel. > Hmm, the documentation says "Perform vacuum index and cleanup index phases of VACUUM in parallel using N background workers". Doesn't it already explain that? Attached the updated version patch. 0001 patch allows all existing vacuum options a boolean argument. 0002 patch introduces parallel lazy vacuum. 0003 patch adds the -P (--parallel) option to the vacuumdb command. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Fri, Mar 22, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Attached the updated version patch. 0001 patch allows all existing
vacuum options an boolean argument. 0002 patch introduces parallel
lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
command.
0001 patch:
+ PARALLEL [ <replaceable class="parameter">N</replaceable> ]
But this patch contains the syntax for PARALLEL without an explanation; I saw that
it is explained in 0002. It is not a problem, just mentioning it.
+ Specifies parallel degree for <literal>PARALLEL</literal> option. The
+ value must be at least 1. If the parallel degree
+ <replaceable class="parameter">integer</replaceable> is omitted, then
+ <command>VACUUM</command> decides the number of workers based on number of
+ indexes on the relation which further limited by
+ <xref linkend="guc-max-parallel-workers-maintenance"/>.
Can we add some more details about backend participation as well? Parallel workers will
come into the picture only when there are at least 2 indexes on the table.
+ /*
+ * Do post-vacuum cleanup and statistics update for each index if
+ * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
+ * only post-vacum cleanup and then update statistics after exited
+ * from parallel mode.
+ */
+ lazy_vacuum_all_indexes(vacrelstats, Irel, nindexes, indstats,
+ lps, true);
How about renaming the above function, as it does the cleanup also?
lazy_vacuum_or_cleanup_all_indexes?
+ if (!IsInParallelVacuum(lps))
+ {
+ /*
+ * Update index statistics. If in parallel lazy vacuum, we will
+ * update them after exited from parallel mode.
+ */
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }
The above check in lazy_vacuum_all_indexes can be combined with the outer
if check where the memcpy is happening. I still feel that the logic around the stats
makes it a little bit complex.
+ if (IsParallelWorker())
+ msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
+ else
+ msg = "scanned index \"%s\" to remove %d row versions";
I feel this way of building the error message may not be picked up for translation.
Is there any problem if we duplicate the entire ereport call with the changed message?
+ for (i = 0; i < nindexes; i++)
+ {
+ LVIndStats *s = &(copied_indstats[i]);
+
+ if (s->updated)
+ lazy_update_index_statistics(Irel[i], &(s->stats));
+ }
+
+ pfree(copied_indstats);
Why can't we use the shared memory directly to update the stats once all the workers
are finished, instead of copying them to local memory?
+ tab->at_params.nworkers = 0; /* parallel lazy autovacuum is not supported */
The user is not compulsorily required to provide the number of workers for parallel vacuum to
work, so just setting the above parameter doesn't stop the parallel workers; the user must
also pass the PARALLEL option. So mentioning that will be helpful later when we
start supporting it, or for someone reading the code to understand.
Regards,
Haribabu Kommi
Fujitsu Australia
Hello. At Thu, 21 Mar 2019 15:51:40 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmobkRtLb5frmEF5t9U=d+iV9c5emtN+NrRS_xrHaH1Z20A@mail.gmail.com> > On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > The leader doesn't continue heap-scan while index vacuuming is > > running. And the index-page-scan seems eat up CPU easily. If > > index vacuum can run simultaneously with the next heap scan > > phase, we can make index scan finishes almost the same time with > > the next round of heap scan. It would reduce the (possible) CPU > > contention. But this requires as the twice size of shared > > memory as the current implement. > > I think you're approaching this from the wrong point of view. If we > have a certain amount of memory available, is it better to (a) fill > the entire thing with dead tuples once, or (b) better to fill half of > it with dead tuples, start index vacuuming, and then fill the other > half of it with dead tuples for the next index-vacuum cycle while the > current one is running? I think the answer is that (a) is clearly Sure. > better, because it results in half as many index vacuum cycles. The "problem" I see there is that it stops heap scanning on the leader process. The leader cannot start the heap scan until the index scans on the workers end. The heap scan is expected not to stop with the half-and-half strategy, especially when all the index pages are in memory. But that is not always the case, of course. > We can't really ask the user how much memory it's OK to use and then > use twice as much. But if we could, what you're proposing here is > probably still not the right way to use it. Yes. I thought that I had written it with that implication. "requires as the twice size" has negative implications, as you wrote above. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hello. I forgot to mention a point. At Fri, 22 Mar 2019 14:02:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoD7rqZPPyV7z4bku8Mn8AE2_kRdW1hTO4Lrsp+vn_U1kQ@mail.gmail.com> > Attached the updated version patch. 0001 patch allows all existing > vacuum options an boolean argument. 0002 patch introduces parallel > lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb > command. > + if (IsParallelWorker()) > + msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker"; > + else > + msg = "scanned index \"%s\" to remove %d row versions"; > ereport(elevel, > - (errmsg("scanned index \"%s\" to remove %d row versions", > + (errmsg(msg, > RelationGetRelationName(indrel), > - vacrelstats->num_dead_tuples), > + dead_tuples->num_tuples), The msg prevents NLS from working. Please enclose the right-hand literals by gettext_noop(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for reviewing the patch. I don't think the approach in v20-0001 is quite right. if (strcmp(opt->defname, "verbose") == 0) - params.options |= VACOPT_VERBOSE; + params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0; It seems to me that it would be better to declare a separate boolean for each flag at the top; e.g. bool verbose. Then here do verbose = defGetBoolean(opt). And then after the loop do params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for other options. The thing I don't like about the way you have it here is that it's not going to work well for options that are true by default but can optionally be set to false. In that case, you would need to start with the bit set and then clear it, but |= can only set bits, not clear them. I went and looked at the VACUUM (INDEX_CLEANUP) patch on the other thread and it doesn't have any special handling for that case, which makes me suspect that if you use that patch, the reloption works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually succeed in disabling index cleanup. The structure I suggested above would fix that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
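To illustrate the structure Robert describes, here is a hypothetical, stripped-down sketch (the helper name and the exact option set are placeholders, not the patch itself): one boolean per flag inside the loop, and the bitmask assembled only after the loop, which also lets a default-on option be switched off by an explicit "OPTION false".

#include "postgres.h"

#include "commands/defrem.h"
#include "commands/vacuum.h"
#include "nodes/parsenodes.h"

/* Hypothetical helper, not from the patch. */
static int
parse_vacuum_options(List *defs)
{
	bool		analyze = false;
	bool		verbose = false;
	bool		freeze = false;
	bool		full = false;
	bool		disable_page_skipping = false;
	ListCell   *lc;

	foreach(lc, defs)
	{
		DefElem    *opt = (DefElem *) lfirst(lc);

		/* Each flag simply remembers the last value given for it. */
		if (strcmp(opt->defname, "analyze") == 0)
			analyze = defGetBoolean(opt);
		else if (strcmp(opt->defname, "verbose") == 0)
			verbose = defGetBoolean(opt);
		else if (strcmp(opt->defname, "freeze") == 0)
			freeze = defGetBoolean(opt);
		else if (strcmp(opt->defname, "full") == 0)
			full = defGetBoolean(opt);
		else if (strcmp(opt->defname, "disable_page_skipping") == 0)
			disable_page_skipping = defGetBoolean(opt);
		else
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("unrecognized VACUUM option \"%s\"", opt->defname)));
	}

	/*
	 * Assemble the bitmask only after the loop; a flag that defaults to true
	 * would just be initialized to true above and can still be cleared here.
	 */
	return VACOPT_VACUUM |
		(analyze ? VACOPT_ANALYZE : 0) |
		(verbose ? VACOPT_VERBOSE : 0) |
		(freeze ? VACOPT_FREEZE : 0) |
		(full ? VACOPT_FULL : 0) |
		(disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0);
}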
On Fri, Mar 29, 2019 at 4:53 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Thank you for reviewing the patch. > > I don't think the approach in v20-0001 is quite right. > > if (strcmp(opt->defname, "verbose") == 0) > - params.options |= VACOPT_VERBOSE; > + params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0; > > It seems to me that it would be better to do declare a separate > boolean for each flag at the top; e.g. bool verbose. Then here do > verbose = defGetBoolean(opt). And then after the loop do > params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for > other options. > > The thing I don't like about the way you have it here is that it's not > going to work well for options that are true by default but can > optionally be set to false. In that case, you would need to start > with the bit set and then clear it, but |= can only set bits, not > clear them. I went and looked at the VACUUM (INDEX_CLEANUP) patch on > the other thread and it doesn't have any special handling for that > case, which makes me suspect that if you use that patch, the reloption > works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually > succeed in disabling index cleanup. The structure I suggested above > would fix that. > You're right, the previous patches are wrong. Attached the updated version patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > You're right, the previous patches are wrong. Attached the updated > version patches. 0001 looks good now. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 29, 2019 at 9:28 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > You're right, the previous patches are wrong. Attached the updated > > version patches. > > 0001 looks good now. Committed. > Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Mar 29, 2019 at 11:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Mar 29, 2019 at 4:53 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Thank you for reviewing the patch. > > > > I don't think the approach in v20-0001 is quite right. > > > > if (strcmp(opt->defname, "verbose") == 0) > > - params.options |= VACOPT_VERBOSE; > > + params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0; > > > > It seems to me that it would be better to do declare a separate > > boolean for each flag at the top; e.g. bool verbose. Then here do > > verbose = defGetBoolean(opt). And then after the loop do > > params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for > > other options. > > > > The thing I don't like about the way you have it here is that it's not > > going to work well for options that are true by default but can > > optionally be set to false. In that case, you would need to start > > with the bit set and then clear it, but |= can only set bits, not > > clear them. I went and looked at the VACUUM (INDEX_CLEANUP) patch on > > the other thread and it doesn't have any special handling for that > > case, which makes me suspect that if you use that patch, the reloption > > works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually > > succeed in disabling index cleanup. The structure I suggested above > > would fix that. > > > > You're right, the previous patches are wrong. Attached the updated > version patches. > These patches conflict with the current HEAD. Attached the updated patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > These patches conflict with the current HEAD. Attached the updated patches. They'll need another rebase. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Apr 5, 2019 at 4:51 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > These patches conflict with the current HEAD. Attached the updated patches. > > They'll need another rebase. > Thank you for the notice. Rebased. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Thank you for the rebased version. At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC_s0H0x-dDPhVJEqMYcnKYOMjESXd6r_9bbc3ZZegg1A@mail.gmail.com> > Thank you for the notice. Rebased. + <term><replaceable class="parameter">integer</replaceable></term> + <listitem> + <para> + Specifies parallel degree for <literal>PARALLEL</literal> option. The + value must be at least 1. If the parallel degree + <replaceable class="parameter">integer</replaceable> is omitted, then + <command>VACUUM</command> decides the number of workers based on number of + indexes on the relation which further limited by + <xref linkend="guc-max-parallel-workers-maintenance"/>. + </para> + </listitem> + </varlistentry> I'm quite confused to see this. I suppose the <para> should be a description about <integer> parameters. Actually the existing <boolean> entry is describing the boolean itself. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 5, 2019 at 3:47 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Thank you for the rebased version. > > At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC_s0H0x-dDPhVJEqMYcnKYOMjESXd6r_9bbc3ZZegg1A@mail.gmail.com> > > Thank you for the notice. Rebased. > > + <term><replaceable class="parameter">integer</replaceable></term> > + <listitem> > + <para> > + Specifies parallel degree for <literal>PARALLEL</literal> option. The > + value must be at least 1. If the parallel degree > + <replaceable class="parameter">integer</replaceable> is omitted, then > + <command>VACUUM</command> decides the number of workers based on number of > + indexes on the relation which further limited by > + <xref linkend="guc-max-parallel-workers-maintenance"/>. > + </para> > + </listitem> > + </varlistentry> > Thank you for reviewing the patch. > I'm quite confused to see this. I suppose the <para> should be a > description about <integer> parameters. Actually the existing > <boolean> entry is describing the boolean itself. > Indeed. How about the following description? PARALLEL Perform vacuum index and cleanup index phases of VACUUM in parallel using integer background workers (for the details of each vacuum phase, please refer to Table 27.25). If the parallel degree integer is omitted, then VACUUM decides the number of workers based on the number of indexes on the relation, which is further limited by max_parallel_maintenance_workers. Only one worker can be used per index. So parallel workers are launched only when there are at least 2 indexes in the table. Workers for vacuum are launched before starting each phase and exit at the end of the phase. These behaviors might change in a future release. This option cannot be used with the FULL option. integer Specifies a positive integer value passed to the selected option. The integer value can also be omitted, in which case the default value of the selected option is used. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Apr 5, 2019 at 4:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Apr 5, 2019 at 3:47 PM Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > Thank you for the rebased version. > > > > At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoC_s0H0x-dDPhVJEqMYcnKYOMjESXd6r_9bbc3ZZegg1A@mail.gmail.com> > > > Thank you for the notice. Rebased. > > > > + <term><replaceable class="parameter">integer</replaceable></term> > > + <listitem> > > + <para> > > + Specifies parallel degree for <literal>PARALLEL</literal> option. The > > + value must be at least 1. If the parallel degree > > + <replaceable class="parameter">integer</replaceable> is omitted, then > > + <command>VACUUM</command> decides the number of workers based on number of > > + indexes on the relation which further limited by > > + <xref linkend="guc-max-parallel-workers-maintenance"/>. > > + </para> > > + </listitem> > > + </varlistentry> > > > > Thank you for reviewing the patch. > > > I'm quite confused to see this. I suppose the <para> should be a > > description about <integer> parameters. Actually the existing > > <boolean> entry is describing the boolean itself. > > > > Indeed. How about the following description? > Attached the updated version patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hello. # Is this still living? I changed the status to "needs review" At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAuD3txrxucnVtM6NGo=JGSjs3VDkoCzN0jGz_egc_82g@mail.gmail.com> > > Indeed. How about the following description? > > > > Attached the updated version patches. Thanks. heapam.h is including access/parallel.h but the file doesn't use parallel.h stuff and storage/shm_toc.h and storage/dsm.h are enough. + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM + * keys conflicting with plan_node_id we can use small integers. Yeah, this is right, but "plan_node_id" seems abrupt there. Please prepend "differently from parallel execution code" or .. I think no excuse is needed to use that numbers. The executor code is already making an excuse for the large numbers as unusual instead. + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel + * mode and prepared the DSM segments. + */ +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL) we *are* in? The name "IsInParallleVacuum()" looks (to me) like suggesting "this process is a parallel vacuum worker". How about ParallelVacuumIsActive? +typedef struct LVIndStats +typedef struct LVDeadTuples +typedef struct LVShared +typedef struct LVParallelState The names are confusing, and the name LVShared is too generic. Shared-only structs are better to be marked in the name. That is, maybe it would be better that LVIndStats were LVSharedIndStats and LVShared were LVSharedRelStats. It might be better that LVIndStats were moved out from LVShared, but I'm not confident. +static void +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel ... + lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup); ... + do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats, + lps->lvshared, vacrelstats->dead_tuples); ... + lazy_end_parallel_index_vacuum(lps, !for_cleanup); The function takes the parameter for_cleanup, but the flag is used by the three subfunctions in utterly ununified way. It seems to me useless to store for_cleanup in lvshared and lazy_end is rather confusing. There's no explanation why "reinitialization" == "!for_cleanup". In the first place, lazy_begin_parallel_index_vacuum and lazy_end_parallel_index_vacuum are called only from the function and rather short so it doesn't seem reasonable that the are independend functions. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Apr 8, 2019 at 7:25 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Hello. > > # Is this still living? I changed the status to "needs review" > > At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAuD3txrxucnVtM6NGo=JGSjs3VDkoCzN0jGz_egc_82g@mail.gmail.com> > > > Indeed. How about the following description? > > > > > > > Attached the updated version patches. > > Thanks. > Thank you for reviewing the patch! > heapam.h is including access/parallel.h but the file doesn't use > parallel.h stuff and storage/shm_toc.h and storage/dsm.h are > enough. Fixed. > > + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM > + * keys conflicting with plan_node_id we can use small integers. > > Yeah, this is right, but "plan_node_id" seems abrupt > there. Please prepend "differently from parallel execution code" > or .. I think no excuse is needed to use that numbers. The > executor code is already making an excuse for the large numbers > as unusual instead. Fixed. > > + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel > + * mode and prepared the DSM segments. > + */ > +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL) > > we *are* in? Fixed. > > The name "IsInParallleVacuum()" looks (to me) like suggesting > "this process is a parallel vacuum worker". How about > ParallelVacuumIsActive? Fixed. > > > +typedef struct LVIndStats > +typedef struct LVDeadTuples > +typedef struct LVShared > +typedef struct LVParallelState > > The names are confusing, and the name LVShared is too > generic. Shared-only structs are better to be marked in the name. > That is, maybe it would be better that LVIndStats were > LVSharedIndStats and LVShared were LVSharedRelStats. Hmm, LVShared actually stores also various things that are not relevant with the relation. I'm not sure that's a good idea to rename it to LVSharedRelStats. When we support parallel vacuum for other vacuum steps the adding a struct for storing only relation statistics might work well. > > It might be better that LVIndStats were moved out from LVShared, > but I'm not confident. > > +static void > +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel > ... > + lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup); > ... > + do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats, > + lps->lvshared, vacrelstats->dead_tuples); > ... > + lazy_end_parallel_index_vacuum(lps, !for_cleanup); > > The function takes the parameter for_cleanup, but the flag is > used by the three subfunctions in utterly ununified way. It seems > to me useless to store for_cleanup in lvshared I think that we need to store for_cleanup or a something telling vacuum workers to do either index vacuuming or index cleanup in lvshared. Or can we use another thing instead? > and lazy_end is > rather confusing. Ah, I used "lazy" as prefix of function in vacuumlazy.c. Fixed. > There's no explanation why "reinitialization" > == "!for_cleanup". In the first place, > lazy_begin_parallel_index_vacuum and > lazy_end_parallel_index_vacuum are called only from the function > and rather short so it doesn't seem reasonable that the are > independend functions. Okay agreed, fixed. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Apr 10, 2019 at 2:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Apr 8, 2019 at 7:25 PM Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > > Hello. > > > > # Is this still living? I changed the status to "needs review" > > > > At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAuD3txrxucnVtM6NGo=JGSjs3VDkoCzN0jGz_egc_82g@mail.gmail.com> > > > > Indeed. How about the following description? > > > > > > > > > > Attached the updated version patches. > > > > Thanks. > > > > Thank you for reviewing the patch! > > > heapam.h is including access/parallel.h but the file doesn't use > > parallel.h stuff and storage/shm_toc.h and storage/dsm.h are > > enough. > > Fixed. > > > > > + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM > > + * keys conflicting with plan_node_id we can use small integers. > > > > Yeah, this is right, but "plan_node_id" seems abrupt > > there. Please prepend "differently from parallel execution code" > > or .. I think no excuse is needed to use that numbers. The > > executor code is already making an excuse for the large numbers > > as unusual instead. > > Fixed. > > > > > + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel > > + * mode and prepared the DSM segments. > > + */ > > +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL) > > > > we *are* in? > > Fixed. > > > > > The name "IsInParallleVacuum()" looks (to me) like suggesting > > "this process is a parallel vacuum worker". How about > > ParallelVacuumIsActive? > > Fixed. > > > > > > > +typedef struct LVIndStats > > +typedef struct LVDeadTuples > > +typedef struct LVShared > > +typedef struct LVParallelState > > > > The names are confusing, and the name LVShared is too > > generic. Shared-only structs are better to be marked in the name. > > That is, maybe it would be better that LVIndStats were > > LVSharedIndStats and LVShared were LVSharedRelStats. > > Hmm, LVShared actually stores also various things that are not > relevant with the relation. I'm not sure that's a good idea to rename > it to LVSharedRelStats. When we support parallel vacuum for other > vacuum steps the adding a struct for storing only relation statistics > might work well. > > > > > It might be better that LVIndStats were moved out from LVShared, > > but I'm not confident. > > > > +static void > > +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel > > ... > > + lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup); > > ... > > + do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats, > > + lps->lvshared, vacrelstats->dead_tuples); > > ... > > + lazy_end_parallel_index_vacuum(lps, !for_cleanup); > > > > The function takes the parameter for_cleanup, but the flag is > > used by the three subfunctions in utterly ununified way. It seems > > to me useless to store for_cleanup in lvshared > > I think that we need to store for_cleanup or a something telling > vacuum workers to do either index vacuuming or index cleanup in > lvshared. Or can we use another thing instead? > > > and lazy_end is > > rather confusing. > > Ah, I used "lazy" as prefix of function in vacuumlazy.c. Fixed. > > > There's no explanation why "reinitialization" > > == "!for_cleanup". 
In the first place, > > lazy_begin_parallel_index_vacuum and > > lazy_end_parallel_index_vacuum are called only from the function > > and rather short so it doesn't seem reasonable that the are > > independend functions. > > Okay agreed, fixed. > Since the previous version patch conflicts with current HEAD, I've attached the updated version patches. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested Hello I reviewed the v25 patches and have just a few notes. Missing synopsis for the "PARALLEL" option (<synopsis> block in doc/src/sgml/ref/vacuum.sgml ) Missing prototype for vacuum_log_cleanup_info in "non-export function prototypes" > /* > * Do post-vacuum cleanup, and statistics update for each index if > * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do > * only post-vacum cleanup and update statistics at the end of parallel > * lazy vacuum. > */ > if (vacrelstats->useindex) > lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > indstats, lps, true); > > if (ParallelVacuumIsActive(lps)) > { > /* End parallel mode and update index statistics */ > end_parallel_vacuum(lps, Irel, nindexes); > } I personally do not like updating statistics in different places. Can we change lazy_vacuum_or_cleanup_indexes to write stats for both the parallel and non-parallel cases? I mean something like this: > if (ParallelVacuumIsActive(lps)) > { > /* Do parallel index vacuuming or index cleanup */ > lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, > nindexes, stats, > lps, for_cleanup); > if (for_cleanup) > { > ... > for (i = 0; i < nindexes; i++) > lazy_update_index_statistics(...); > } > return; > } So all lazy_update_index_statistics calls would be in one place. lazy_parallel_vacuum_or_cleanup_indexes is called only from the parallel leader and waits for all workers. Possibly we can update stats in lazy_parallel_vacuum_or_cleanup_indexes after the WaitForParallelWorkersToFinish call. Also, a discussion question: will the vacuumdb parameters --parallel= and --jobs= confuse users? Do we need more description for these options? regards, Sergei
On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Since the previous version patch conflicts with current HEAD, I've > attached the updated version patches. > Review comments: ------------------------------ * indexes on the relation which further limited by + <xref linkend="guc-max-parallel-workers-maintenance"/>. /which further/which is further * + * index vacuuming or index cleanup, we launch parallel worker processes. Once + * all indexes are processed the parallel worker processes exit and the leader + * process re-initializes the DSM segment while keeping recorded dead tuples. It is not clear for this comment why it re-initializes the DSM segment instead of destroying it once the index work is done by workers. Can you elaborate a bit more in the comment? * + * Note that all parallel workers live during one either index vacuuming or It seems usage of 'one' is not required in the above sentence. * + +/* + * Compute the number of parallel worker process to request. /process/processes * +static int +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) +{ + int parallel_workers = 0; + + Assert(nrequested >= 0); + + if (nindexes <= 1) + return 0; I think here, in the beginning, you can also check if max_parallel_maintenance_workers are 0, then return. * In function compute_parallel_workers, don't we want to cap the number of workers based on maintenance_work_mem as we do in plan_create_index_workers? The basic point is how do we want to treat maintenance_work_mem for this feature. Do we want all workers to use at max the maintenance_work_mem or each worker is allowed to use maintenance_work_mem? I would prefer earlier unless we have good reason to follow the later strategy. Accordingly, we might need to update the below paragraph in docs: "Note that parallel utility commands should not consume substantially more memory than equivalent non-parallel operations. This strategy differs from that of parallel query, where resource limits generally apply per worker process. Parallel utility commands treat the resource limit <varname>maintenance_work_mem</varname> as a limit to be applied to the entire utility command, regardless of the number of parallel worker processes." * +static int +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) +{ + int parallel_workers = 0; + + Assert(nrequested >= 0); + + if (nindexes <= 1) + return 0; + + if (nrequested > 0) + { + /* At least one index is taken by the leader process */ + parallel_workers = Min(nrequested, nindexes - 1); + } I think here we always allow the leader to participate. It seems to me we have some way to disable leader participation. During the development of previous parallel operations, we find it quite handy to catch bugs. We might want to mimic what has been done for index with DISABLE_LEADER_PARTICIPATION. * +/* + * DSM keys for parallel lazy vacuum. Unlike other parallel execution code, + * since we don't need to worry about DSM keys conflicting with plan_node_id + * we can use small integers. + */ +#define PARALLEL_VACUUM_KEY_SHARED 1 +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2 +#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3 I think it would be better if these keys should be assigned numbers in a way we do for other similar operation like create index. See below defines in code: /* Magic numbers for parallel state sharing */ #define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001) This will make the code consistent with other parallel operations. 
* +begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks, + int nindexes, int nrequested) { .. + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples), .. } I think here you should use SizeOfLVDeadTuples as defined by patch. * + keys++; + + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ + maxtuples = compute_max_dead_tuples(nblocks, true); + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples), + mul_size(sizeof(ItemPointerData), maxtuples))); + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples); + keys++; + + shm_toc_estimate_keys(&pcxt->estimator, keys); + + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */ + querylen = strlen(debug_query_string); + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1); + shm_toc_estimate_keys(&pcxt->estimator, 1); The code style looks inconsistent here. In some cases, you are calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk and in other cases, you are accumulating keys. I think it is better to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk in all cases. * +void +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) { .. + /* Set debug_query_string for individual workers */ + sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true); .. } I think the last parameter in shm_toc_lookup should be false. Is there a reason for passing it as true? * +void +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) +{ .. + /* Open table */ + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock); .. } I don't think it is a good idea to assume the lock mode as ShareUpdateExclusiveLock here. Tomorrow, if due to some reason there is a change in lock level for the vacuum process, we might forget to update it here. I think it is better if we can get this information from the master backend. * +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes) { .. + /* Shutdown worker processes and destroy the parallel context */ + WaitForParallelWorkersToFinish(lps->pcxt); .. } Do we really need to call WaitForParallelWorkersToFinish here as it must have been called in lazy_parallel_vacuum_or_cleanup_indexes before this time? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
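For the shm_toc estimation point above, the pattern being asked for would look roughly like the following fragment (PARALLEL_VACUUM_KEY_* names, SizeOfLVDeadTuples, compute_max_dead_tuples and the est_* variables are the patch's own and are assumed to be declared nearby): every shm_toc_estimate_chunk() is immediately paired with a shm_toc_estimate_keys() instead of accumulating a key counter.

/* Sketch of the estimation sequence in begin_parallel_vacuum(). */

/* Estimate space for the shared state -- PARALLEL_VACUUM_KEY_SHARED */
shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
shm_toc_estimate_keys(&pcxt->estimator, 1);

/* Estimate space for the dead tuple array -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
maxtuples = compute_max_dead_tuples(nblocks, true);
est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
								   mul_size(sizeof(ItemPointerData), maxtuples)));
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);

/* Estimate space for the query text -- PARALLEL_VACUUM_KEY_QUERY_TEXT */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
shm_toc_estimate_keys(&pcxt->estimator, 1);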
On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Since the previous version patch conflicts with current HEAD, I've
> attached the updated version patches.
>
Review comments:
------------------------------
Sawada-San, are you planning to work on the review comments? I can take care of this and then proceed with further review if you are tied up with something else.
*
+/*
+ * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
+ * since we don't need to worry about DSM keys conflicting with plan_node_id
+ * we can use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
I think it would be better if these keys should be assigned numbers in
a way we do for other similar operation like create index. See below
defines
in code:
/* Magic numbers for parallel state sharing */
#define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)
This will make the code consistent with other parallel operations.
I think we don't need to handle this comment. Today, I read the other emails in the thread and noticed that you have done this based on a comment by Robert, and that decision seems wise to me.
On Tue, Oct 1, 2019 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> > >> > Since the previous version patch conflicts with current HEAD, I've >> > attached the updated version patches. >> > >> >> Review comments: >> ------------------------------ > > > Sawada-San, are you planning to work on the review comments? I can take care of this and then proceed with further reviewif you are tied up with something else. > Thank you for reviewing this patch. Yes I'm addressing your comments and will submit the updated patch soon. > I think we don't need to handle this comment. Today, I read the other emails in the thread and noticed that you have donethis based on comment by Robert and that decision seems wise to me. Understood. Regards, -- Masahiko Sawada
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Since the previous version patch conflicts with current HEAD, I've > > attached the updated version patches. > > > Thank you for reviewing this patch! > Review comments: > ------------------------------ > * > indexes on the relation which further limited by > + <xref linkend="guc-max-parallel-workers-maintenance"/>. > > /which further/which is further > Fixed. > * > + * index vacuuming or index cleanup, we launch parallel worker processes. Once > + * all indexes are processed the parallel worker processes exit and the leader > + * process re-initializes the DSM segment while keeping recorded dead tuples. > > It is not clear for this comment why it re-initializes the DSM segment > instead of destroying it once the index work is done by workers. Can > you elaborate a bit more in the comment? Added more explanation. > > * > + * Note that all parallel workers live during one either index vacuuming or > > It seems usage of 'one' is not required in the above sentence. Removed. > > * > + > +/* > + * Compute the number of parallel worker process to request. > > /process/processes Fixed. > > * > +static int > +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) > +{ > + int parallel_workers = 0; > + > + Assert(nrequested >= 0); > + > + if (nindexes <= 1) > + return 0; > > I think here, in the beginning, you can also check if > max_parallel_maintenance_workers are 0, then return. > Agreed, fixed. > * > In function compute_parallel_workers, don't we want to cap the number > of workers based on maintenance_work_mem as we do in > plan_create_index_workers? > > The basic point is how do we want to treat maintenance_work_mem for > this feature. Do we want all workers to use at max the > maintenance_work_mem or each worker is allowed to use > maintenance_work_mem? I would prefer earlier unless we have good > reason to follow the later strategy. > > Accordingly, we might need to update the below paragraph in docs: > "Note that parallel utility commands should not consume substantially > more memory than equivalent non-parallel operations. This strategy > differs from that of parallel query, where resource limits generally > apply per worker process. Parallel utility commands treat the > resource limit <varname>maintenance_work_mem</varname> as a limit to > be applied to the entire utility command, regardless of the number of > parallel worker processes." I'd also prefer to use maintenance_work_mem at max during parallel vacuum regardless of the number of parallel workers. This is current implementation. In lazy vacuum the maintenance_work_mem is used to record itempointer of dead tuples. This is done by leader process and worker processes just refers them for vacuuming dead index tuples. Even if user sets a small amount of maintenance_work_mem the parallel vacuum would be helpful as it still would take a time for index vacuuming. So I thought we should cap the number of parallel workers by the number of indexes rather than maintenance_work_mem. 
> > * > +static int > +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) > +{ > + int parallel_workers = 0; > + > + Assert(nrequested >= 0); > + > + if (nindexes <= 1) > + return 0; > + > + if (nrequested > 0) > + { > + /* At least one index is taken by the leader process */ > + parallel_workers = Min(nrequested, nindexes - 1); > + } > > I think here we always allow the leader to participate. It seems to > me we have some way to disable leader participation. During the > development of previous parallel operations, we find it quite handy to > catch bugs. We might want to mimic what has been done for index with > DISABLE_LEADER_PARTICIPATION. Added the way to disable leader participation. > > * > +/* > + * DSM keys for parallel lazy vacuum. Unlike other parallel execution code, > + * since we don't need to worry about DSM keys conflicting with plan_node_id > + * we can use small integers. > + */ > +#define PARALLEL_VACUUM_KEY_SHARED 1 > +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2 > +#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3 > > I think it would be better if these keys should be assigned numbers in > a way we do for other similar operation like create index. See below > defines > in code: > /* Magic numbers for parallel state sharing */ > #define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001) > > This will make the code consistent with other parallel operations. I skipped this comment according to the previous your mail. > > * > +begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks, > + int nindexes, int nrequested) > { > .. > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples), > .. > } > > I think here you should use SizeOfLVDeadTuples as defined by patch. Fixed. > > * > + keys++; > + > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ > + maxtuples = compute_max_dead_tuples(nblocks, true); > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples), > + mul_size(sizeof(ItemPointerData), maxtuples))); > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples); > + keys++; > + > + shm_toc_estimate_keys(&pcxt->estimator, keys); > + > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */ > + querylen = strlen(debug_query_string); > + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1); > + shm_toc_estimate_keys(&pcxt->estimator, 1); > > The code style looks inconsistent here. In some cases, you are > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk > and in other cases, you are accumulating keys. I think it is better > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk > in all cases. Fixed. But there are some code that call shm_toc_estimate_keys for multiple keys in for example nbtsort.c and parallel.c. What is the difference? > > * > +void > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) > { > .. > + /* Set debug_query_string for individual workers */ > + sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true); > .. > } > > I think the last parameter in shm_toc_lookup should be false. Is > there a reason for passing it as true? My bad, fixed. > > * > +void > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) > +{ > .. > + /* Open table */ > + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock); > .. > } > > I don't think it is a good idea to assume the lock mode as > ShareUpdateExclusiveLock here. 
Tomorrow, if due to some reason there > is a change in lock level for the vacuum process, we might forget to > update it here. I think it is better if we can get this information > from the master backend. So did you mean to declare the lock mode for lazy vacuum somewhere as a global variable and use it in both try_relation_open in the leader process and relation_open in the worker process? Otherwise we would end up with adding something like shared->lmode = ShareUpdateExclusiveLock during parallel context initialization, which seems not to resolve your concern. > > * > +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes) > { > .. > + /* Shutdown worker processes and destroy the parallel context */ > + WaitForParallelWorkersToFinish(lps->pcxt); > .. > } > > Do we really need to call WaitForParallelWorkersToFinish here as it > must have been called in lazy_parallel_vacuum_or_cleanup_indexes > before this time? No, removed. I've attached the updated version patch that incorporated your comments excluding some comments that needs more discussion. After discussion I'll update it again. Regards, -- Masahiko Sawada
Attachment
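Putting the points from the preceding exchange together, a sketch of how compute_parallel_workers() might look after the review: bail out early when there is nothing to parallelize or the GUC is zero, reserve one index for the leader, and cap the result by max_parallel_maintenance_workers. This mirrors the behaviour discussed in the thread rather than the patch verbatim; the leader always participates here, and the patch additionally has a DISABLE_LEADER_PARTICIPATION-style switch for debugging.

/* Sketch only -- intended to live in vacuumlazy.c with the patch applied. */
static int
compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
{
	int			parallel_workers;

	Assert(nrequested >= 0);

	/* A single index (or none) leaves nothing to hand out to workers. */
	if (nindexes <= 1)
		return 0;

	/* Quick exit if parallel maintenance workers are disabled. */
	if (max_parallel_maintenance_workers == 0)
		return 0;

	/* The leader process takes one index itself. */
	if (nrequested > 0)
		parallel_workers = Min(nrequested, nindexes - 1);
	else
		parallel_workers = nindexes - 1;

	/* onerel would be consulted for a reloption in the real patch. */
	(void) onerel;

	return Min(parallel_workers, max_parallel_maintenance_workers);
}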
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I have started reviewing this patch and I have some cosmetic comments. I will continue the review tomorrow. +This change adds PARALLEL option to VACUUM command that enable us to +perform index vacuuming and index cleanup with background +workers. Indivisual /s/Indivisual/Individual/ + * parallel worker processes. Individual indexes is processed by one vacuum + * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the /s/Individual indexes is processed/Individual indexes are processed/ /s/At beginning/ At the beginning + * parallel workers. In parallel lazy vacuum, we enter parallel mode and + * create the parallel context and the DSM segment before starting heap + * scan. Can we extend the comment to explain why we do that before starting the heap scan? + else + { + if (for_cleanup) + { + if (lps->nworkers_requested > 0) + appendStringInfo(&buf, + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)", + "launched %d parallel vacuum workers for index cleanup (planned: %d, requsted %d)", + lps->pcxt->nworkers_launched), + lps->pcxt->nworkers_launched, + lps->pcxt->nworkers, + lps->nworkers_requested); + else + appendStringInfo(&buf, + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)", + "launched %d parallel vacuum workers for index cleanup (planned: %d)", + lps->pcxt->nworkers_launched), + lps->pcxt->nworkers_launched, + lps->pcxt->nworkers); + } + else + { + if (lps->nworkers_requested > 0) + appendStringInfo(&buf, + ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d, requested %d)", + "launched %d parallel vacuum workers for index vacuuming (planned: %d, requested %d)", + lps->pcxt->nworkers_launched), + lps->pcxt->nworkers_launched, + lps->pcxt->nworkers, + lps->nworkers_requested); + else + appendStringInfo(&buf, + ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)", + "launched %d parallel vacuum workers for index vacuuming (planned: %d)", + lps->pcxt->nworkers_launched), + lps->pcxt->nworkers_launched, + lps->pcxt->nworkers); + } Multiple places I see a lot of duplicate code for for_cleanup is true or false. The only difference is in the error message whether we give index cleanup or index vacuuming otherwise complete code is the same for both the cases. Can't we create some string and based on the value of the for_cleanup and append it in the error message that way we can avoid duplicating this at many places? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I have started reviewing this patch and I have some cosmetic comments. > I will continue the review tomorrow. > Thank you for reviewing the patch! > +This change adds PARALLEL option to VACUUM command that enable us to > +perform index vacuuming and index cleanup with background > +workers. Indivisual > > /s/Indivisual/Individual/ Fixed. > > + * parallel worker processes. Individual indexes is processed by one vacuum > + * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the > > /s/Individual indexes is processed/Individual indexes are processed/ > /s/At beginning/ At the beginning Fixed. > > + * parallel workers. In parallel lazy vacuum, we enter parallel mode and > + * create the parallel context and the DSM segment before starting heap > + * scan. > > Can we extend the comment to explain why we do that before starting > the heap scan? Added more comment. > > + else > + { > + if (for_cleanup) > + { > + if (lps->nworkers_requested > 0) > + appendStringInfo(&buf, > + ngettext("launched %d parallel vacuum worker for index cleanup > (planned: %d, requested %d)", > + "launched %d parallel vacuum workers for index cleanup (planned: > %d, requsted %d)", > + lps->pcxt->nworkers_launched), > + lps->pcxt->nworkers_launched, > + lps->pcxt->nworkers, > + lps->nworkers_requested); > + else > + appendStringInfo(&buf, > + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)", > + "launched %d parallel vacuum workers for index cleanup (planned: %d)", > + lps->pcxt->nworkers_launched), > + lps->pcxt->nworkers_launched, > + lps->pcxt->nworkers); > + } > + else > + { > + if (lps->nworkers_requested > 0) > + appendStringInfo(&buf, > + ngettext("launched %d parallel vacuum worker for index vacuuming > (planned: %d, requested %d)", > + "launched %d parallel vacuum workers for index vacuuming (planned: > %d, requested %d)", > + lps->pcxt->nworkers_launched), > + lps->pcxt->nworkers_launched, > + lps->pcxt->nworkers, > + lps->nworkers_requested); > + else > + appendStringInfo(&buf, > + ngettext("launched %d parallel vacuum worker for index vacuuming > (planned: %d)", > + "launched %d parallel vacuum workers for index vacuuming (planned: %d)", > + lps->pcxt->nworkers_launched), > + lps->pcxt->nworkers_launched, > + lps->pcxt->nworkers); > + } > > Multiple places I see a lot of duplicate code for for_cleanup is true > or false. The only difference is in the error message whether we give > index cleanup or index vacuuming otherwise complete code is the same > for > both the cases. Can't we create some string and based on the value of > the for_cleanup and append it in the error message that way we can > avoid duplicating this at many places? I think it's necessary for translation. IIUC if we construct the message it cannot be translated. Attached the updated patch. Regards, -- Masahiko Sawada
Attachment
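As an illustration of the translation constraint mentioned above: gettext extracts messages statically, so each complete sentence has to appear verbatim in the source, and a format string assembled at run time never makes it into the message catalog. A hypothetical helper (name and signature are not from the patch) spelling out both variants, mirroring the appendStringInfo/ngettext pattern quoted earlier in the thread:

#include "postgres.h"

#include "lib/stringinfo.h"

/* Hypothetical illustration, not patch code. */
static void
report_launched_workers(int elevel, int nlaunched, int nplanned, bool for_cleanup)
{
	StringInfoData buf;

	initStringInfo(&buf);

	/*
	 * Each full sentence appears as a string literal so that xgettext can
	 * pick it up; only the numbers are substituted at run time.
	 */
	if (for_cleanup)
		appendStringInfo(&buf,
						 ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
								  "launched %d parallel vacuum workers for index cleanup (planned: %d)",
								  nlaunched),
						 nlaunched, nplanned);
	else
		appendStringInfo(&buf,
						 ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
								  "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
								  nlaunched),
						 nlaunched, nplanned);

	ereport(elevel, (errmsg("%s", buf.data)));

	pfree(buf.data);
}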
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> *
> In function compute_parallel_workers, don't we want to cap the number
> of workers based on maintenance_work_mem as we do in
> plan_create_index_workers?
>
> The basic point is how do we want to treat maintenance_work_mem for
> this feature. Do we want all workers to use at max the
> maintenance_work_mem or each worker is allowed to use
> maintenance_work_mem? I would prefer earlier unless we have good
> reason to follow the later strategy.
>
> Accordingly, we might need to update the below paragraph in docs:
> "Note that parallel utility commands should not consume substantially
> more memory than equivalent non-parallel operations. This strategy
> differs from that of parallel query, where resource limits generally
> apply per worker process. Parallel utility commands treat the
> resource limit <varname>maintenance_work_mem</varname> as a limit to
> be applied to the entire utility command, regardless of the number of
> parallel worker processes."
I'd also prefer to use maintenance_work_mem at max during parallel
vacuum regardless of the number of parallel workers. This is current
implementation. In lazy vacuum the maintenance_work_mem is used to
record itempointer of dead tuples. This is done by leader process and
worker processes just refers them for vacuuming dead index tuples.
Even if user sets a small amount of maintenance_work_mem the parallel
vacuum would be helpful as it still would take a time for index
vacuuming. So I thought we should cap the number of parallel workers
by the number of indexes rather than maintenance_work_mem.
Isn't that true only if we never use maintenance_work_mem during index cleanup? However, I think we are using it during index cleanup, see for example ginInsertCleanup. I think before reaching any conclusion about what to do about this, first we need to establish whether this is a problem. If I am correct, then only some of the index cleanups (like gin indexes) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
> *
> + keys++;
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> + mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + keys++;
> +
> + shm_toc_estimate_keys(&pcxt->estimator, keys);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
> + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
>
> The code style looks inconsistent here. In some cases, you are
> calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> and in other cases, you are accumulating keys. I think it is better
> to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> in all cases.
Fixed. But there are some code that call shm_toc_estimate_keys for
multiple keys in for example nbtsort.c and parallel.c. What is the
difference?
We can do it either way, depending on the situation. For example, in nbtsort.c, there is an if check based on which the 'number of keys' can vary. I think here we should try to write it in a way that does not confuse the reader about why it is done in a particular way. This is the reason I suggested being consistent.
>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> +{
> ..
> + /* Open table */
> + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> ..
> }
>
> I don't think it is a good idea to assume the lock mode as
> ShareUpdateExclusiveLock here. Tomorrow, if due to some reason there
> is a change in lock level for the vacuum process, we might forget to
> update it here. I think it is better if we can get this information
> from the master backend.
So did you mean to declare the lock mode for lazy vacuum somewhere as
a global variable and use it in both try_relation_open in the leader
process and relation_open in the worker process? Otherwise we would
end up with adding something like shared->lmode =
ShareUpdateExclusiveLock during parallel context initialization, which
seems not to resolve your concern.
I was thinking we could find a way to pass the lockmode we used in vacuum_rel, but I guess we would need to pass it through multiple functions, which will be a bit inconvenient. OTOH, today, I checked nbtsort.c (_bt_parallel_build_main) and found that there, too, we are using it directly instead of passing it from the master backend. I think we can leave it as you have it in the patch, but add a comment on why it is okay to use that lock mode?
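The kind of comment being asked for might read like the following sketch around the patch's heap_open() call in heap_parallel_vacuum_main() (lvshared is the patch's shared state; the wording is only a suggestion):

/*
 * Open the target relation.  The leader, which is already running VACUUM
 * on this relation, holds ShareUpdateExclusiveLock on it, and this worker
 * is a member of the leader's lock group, so acquiring the same lock here
 * can neither block on the leader nor let a concurrent VACUUM in.
 */
onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);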
On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> + else
> + {
> + if (for_cleanup)
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d, requested %d)",
> + "launched %d parallel vacuum workers for index cleanup (planned:
> %d, requsted %d)",
> + lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
> + "launched %d parallel vacuum workers for index cleanup (planned: %d)",
> + lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
> + else
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d, requested %d)",
> + "launched %d parallel vacuum workers for index vacuuming (planned:
> %d, requested %d)",
> + lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d)",
> + "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
> + lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
>
> Multiple places I see a lot of duplicate code for for_cleanup is true
> or false. The only difference is in the error message whether we give
> index cleanup or index vacuuming otherwise complete code is the same
> for
> both the cases. Can't we create some string and based on the value of
> the for_cleanup and append it in the error message that way we can
> avoid duplicating this at many places?
I think it's necessary for translation. IIUC, if we construct the
message from fragments, it cannot be translated properly.
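To spell that out, here is an illustrative sketch of the splicing approach being avoided (not patch code; for_cleanup stands in for the patch's flag): the translator would only ever see the fragments and the template separately, so plural forms and word order could not be adapted per language, which is why each complete sentence gets its own ngettext() call in the patch.

/* The problematic "constructed" form: %s splices in a translated fragment. */
const char *phase = for_cleanup ? _("index cleanup") : _("index vacuuming");

appendStringInfo(&buf,
                 ngettext("launched %d parallel vacuum worker for %s (planned: %d)",
                          "launched %d parallel vacuum workers for %s (planned: %d)",
                          lps->pcxt->nworkers_launched),
                 lps->pcxt->nworkers_launched,
                 phase,
                 lps->pcxt->nworkers);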
Do we really need to log all those messages? The other places where we launch parallel workers don't seem to use such messages. Why do you think it is important to log the messages here when other cases don't?
On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Some more comments.. 1. + for (idx = 0; idx < nindexes; idx++) + { + if (!for_cleanup) + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, + vacrelstats->old_live_tuples); + else + { + /* Cleanup one index and update index statistics */ + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples, + vacrelstats->tupcount_pages < vacrelstats->rel_pages); + + lazy_update_index_statistics(Irel[idx], stats[idx]); + + if (stats[idx]) + pfree(stats[idx]); + } I think instead of checking for_cleanup variable for every index of the loop we better move loop inside, like shown below? if (!for_cleanup) for (idx = 0; idx < nindexes; idx++) lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, else for (idx = 0; idx < nindexes; idx++) { lazy_cleanup_index lazy_update_index_statistics ... } 2. +static void +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel, + int nindexes, IndexBulkDeleteResult **stats, + LVParallelState *lps, bool for_cleanup) +{ + int idx; + + Assert(!IsParallelWorker()); + + /* no job if the table has no index */ + if (nindexes <= 0) + return; Wouldn't it be good idea to call this function only if nindexes > 0? 3. +/* + * Vacuum or cleanup indexes with parallel workers. This function must be used + * by the parallel vacuum leader process. + */ +static void +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel, + int nindexes, IndexBulkDeleteResult **stats, + LVParallelState *lps, bool for_cleanup) If you see this function there is no much common code between for_cleanup and without for_cleanup except these 3-4 statement. LaunchParallelWorkers(lps->pcxt); /* Create the log message to report */ initStringInfo(&buf); ... /* Wait for all vacuum workers to finish */ WaitForParallelWorkersToFinish(lps->pcxt); Other than that you have got a lot of checks like this + if (!for_cleanup) + { + } + else + { } I think code would be much redable if we have 2 functions one for vaccum (lazy_parallel_vacuum_indexes) and another for cleanup(lazy_parallel_cleanup_indexes). 4. * of index scans performed. So we don't use maintenance_work_mem memory for * the TID array, just enough to hold as many heap tuples as fit on one page. * + * Lazy vacuum supports parallel execution with parallel worker processes. In + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with + * parallel worker processes. Individual indexes are processed by one vacuum Spacing after the "." is not uniform, previous comment is using 2 space and newly added is using 1 space. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> *
> +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> {
> ..
> + /* Shutdown worker processes and destroy the parallel context */
> + WaitForParallelWorkersToFinish(lps->pcxt);
> ..
> }
>
> Do we really need to call WaitForParallelWorkersToFinish here as it
> must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> before this time?
No, removed.
+ /* Shutdown worker processes and destroy the parallel context */
+ DestroyParallelContext(lps->pcxt);
But you forgot to update the comment.
Few more comments:
--------------------------------
1.
+/*
+ * Parallel Index vacuuming and index cleanup routine used by both the leader
+ * process and worker processes. Unlike single process vacuum, we don't update
+ * index statistics after cleanup index since it is not allowed during
+ * parallel mode, instead copy index bulk-deletion results from the local
+ * memory to the DSM segment and update them at the end of parallel lazy
+ * vacuum.
+ */
+static void
+do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes,
+ IndexBulkDeleteResult **stats,
+ LVShared *lvshared,
+ LVDeadTuples *dead_tuples)
+{
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ /* Done for all indexes? */
+ if (idx >= nindexes)
+ break;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result
+ * if someone has already updated it.
+ */
+ if (lvshared->indstats[idx].updated &&
+ stats[idx] == NULL)
+ stats[idx] = &(lvshared->indstats[idx].stats);
+
+ /* Do vacuum or cleanup one index */
+ if (!lvshared->for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
+ lvshared->reltuples);
+ else
+ lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
+ lvshared->estimated_count);
It seems we always run index cleanup via a parallel worker, which seems overkill because index cleanup generally scans the index only when bulkdelete was not performed. In some cases, like for hash indexes, it doesn't do anything even if bulkdelete is not called. OTOH, for brin indexes, it does the main job during cleanup, but we might be able to always allow index cleanup by a parallel worker for brin indexes if we remove the allocation in brinbulkdelete, which I am not sure is of any use.
I think we should call cleanup via a parallel worker only when bulkdelete has not been performed on the index (see the sketch after these comments).
2.
- for (i = 0; i < nindexes; i++)
- lazy_vacuum_index(Irel[i],
- &indstats[i],
- vacrelstats);
+ lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+ indstats, lps, false);
Indentation is not proper. You might want to run pgindent.
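To make the suggestion in comment 1 concrete, here is a rough leader-side sketch (my illustration, not code from the patch): parallelize the cleanup phase only when no bulk-delete pass has run. lazy_cleanup_all_indexes() is a hypothetical serial helper, and treating a non-NULL lps as "parallel vacuum is active" follows the patch's usage; num_index_scans is the existing counter in LVRelStats.

/*
 * Sketch only: use parallel workers for index cleanup just when no
 * bulk-delete pass has happened in this vacuum (num_index_scans == 0),
 * since amvacuumcleanup typically has little to do otherwise.
 */
if (lps != NULL && vacrelstats->num_index_scans == 0)
    lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
                                            stats, lps, true);
else
    lazy_cleanup_all_indexes(vacrelstats, Irel, nindexes, stats); /* hypothetical serial path */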
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
One comment:
We can check if parallel_workers is within range something within
MAX_PARALLEL_WORKER_LIMIT.
+ int parallel_workers = 0;
+
+ if (optarg != NULL)
+ {
+ parallel_workers = atoi(optarg);
+ if (parallel_workers <= 0)
+ {
+ pg_log_error("number of parallel workers must be at least 1");
+ exit(1);
+ }
+ }
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> > * >> > In function compute_parallel_workers, don't we want to cap the number >> > of workers based on maintenance_work_mem as we do in >> > plan_create_index_workers? >> > >> > The basic point is how do we want to treat maintenance_work_mem for >> > this feature. Do we want all workers to use at max the >> > maintenance_work_mem or each worker is allowed to use >> > maintenance_work_mem? I would prefer earlier unless we have good >> > reason to follow the later strategy. >> > >> > Accordingly, we might need to update the below paragraph in docs: >> > "Note that parallel utility commands should not consume substantially >> > more memory than equivalent non-parallel operations. This strategy >> > differs from that of parallel query, where resource limits generally >> > apply per worker process. Parallel utility commands treat the >> > resource limit <varname>maintenance_work_mem</varname> as a limit to >> > be applied to the entire utility command, regardless of the number of >> > parallel worker processes." >> >> I'd also prefer to use maintenance_work_mem at max during parallel >> vacuum regardless of the number of parallel workers. This is current >> implementation. In lazy vacuum the maintenance_work_mem is used to >> record itempointer of dead tuples. This is done by leader process and >> worker processes just refers them for vacuuming dead index tuples. >> Even if user sets a small amount of maintenance_work_mem the parallel >> vacuum would be helpful as it still would take a time for index >> vacuuming. So I thought we should cap the number of parallel workers >> by the number of indexes rather than maintenance_work_mem. >> > > Isn't that true only if we never use maintenance_work_mem during index cleanup? However, I think we are using during indexcleanup, see forex. ginInsertCleanup. I think before reaching any conclusion about what to do about this, first weneed to establish whether this is a problem. If I am correct, then only some of the index cleanups (like gin index) usemaintenance_work_mem, so we need to consider that point while designing a solution for this. > I got your point. Currently the single process lazy vacuum could consume the amount of (maintenance_work_mem * 2) memory at max because we do index cleanup during holding the dead tuple space as you mentioned. And ginInsertCleanup is also be called at the beginning of ginbulkdelete. In current parallel lazy vacuum, each parallel vacuum worker could consume other memory apart from the memory used by heap scan depending on the implementation of target index AM. Given that the current single and parallel vacuum implementation it would be better to control the amount memory in total rather than the number of parallel workers. So one approach I came up with is that we make all vacuum workers use the amount of (maintenance_work_mem / # of participants) as new maintenance_work_mem. It might be too small in some cases but it doesn't consume more memory than single lazy vacuum as long as index AM doesn't consume more memory regardless of maintenance_work_mem. I think it really depends on the implementation of index AM. 
>> >> > * >> > + keys++; >> > + >> > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ >> > + maxtuples = compute_max_dead_tuples(nblocks, true); >> > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples), >> > + mul_size(sizeof(ItemPointerData), maxtuples))); >> > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples); >> > + keys++; >> > + >> > + shm_toc_estimate_keys(&pcxt->estimator, keys); >> > + >> > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */ >> > + querylen = strlen(debug_query_string); >> > + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1); >> > + shm_toc_estimate_keys(&pcxt->estimator, 1); >> > >> > The code style looks inconsistent here. In some cases, you are >> > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk >> > and in other cases, you are accumulating keys. I think it is better >> > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk >> > in all cases. >> >> Fixed. But there are some code that call shm_toc_estimate_keys for >> multiple keys in for example nbtsort.c and parallel.c. What is the >> difference? >> > > We can do it, either way, depending on the situation. For example, in nbtsort.c, there is an if check based on which 'numberof keys' can vary. I think here we should try to write in a way that it should not confuse the reader why it is donein a particular way. This is the reason I told you to be consistent. Understood. Thank you for explanation! > >> >> > >> > * >> > +void >> > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) >> > +{ >> > .. >> > + /* Open table */ >> > + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock); >> > .. >> > } >> > >> > I don't think it is a good idea to assume the lock mode as >> > ShareUpdateExclusiveLock here. Tomorrow, if due to some reason there >> > is a change in lock level for the vacuum process, we might forget to >> > update it here. I think it is better if we can get this information >> > from the master backend. >> >> So did you mean to declare the lock mode for lazy vacuum somewhere as >> a global variable and use it in both try_relation_open in the leader >> process and relation_open in the worker process? Otherwise we would >> end up with adding something like shared->lmode = >> ShareUpdateExclusiveLock during parallel context initialization, which >> seems not to resolve your concern. >> > > I was thinking that if we can find a way to pass the lockmode we used in vacuum_rel, but I guess we need to pass it throughmultiple functions which will be a bit inconvenient. OTOH, today, I checked nbtsort.c (_bt_parallel_build_main) andfound that there also we are using it directly instead of passing it from the master backend. I think we can leave itas you have in the patch, but add a comment on why it is okay to use that lock mode? Yeah agreed. Regards, -- Masahiko Sawada
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: >> > >> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> > >> > + else >> > + { >> > + if (for_cleanup) >> > + { >> > + if (lps->nworkers_requested > 0) >> > + appendStringInfo(&buf, >> > + ngettext("launched %d parallel vacuum worker for index cleanup >> > (planned: %d, requested %d)", >> > + "launched %d parallel vacuum workers for index cleanup (planned: >> > %d, requsted %d)", >> > + lps->pcxt->nworkers_launched), >> > + lps->pcxt->nworkers_launched, >> > + lps->pcxt->nworkers, >> > + lps->nworkers_requested); >> > + else >> > + appendStringInfo(&buf, >> > + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)", >> > + "launched %d parallel vacuum workers for index cleanup (planned: %d)", >> > + lps->pcxt->nworkers_launched), >> > + lps->pcxt->nworkers_launched, >> > + lps->pcxt->nworkers); >> > + } >> > + else >> > + { >> > + if (lps->nworkers_requested > 0) >> > + appendStringInfo(&buf, >> > + ngettext("launched %d parallel vacuum worker for index vacuuming >> > (planned: %d, requested %d)", >> > + "launched %d parallel vacuum workers for index vacuuming (planned: >> > %d, requested %d)", >> > + lps->pcxt->nworkers_launched), >> > + lps->pcxt->nworkers_launched, >> > + lps->pcxt->nworkers, >> > + lps->nworkers_requested); >> > + else >> > + appendStringInfo(&buf, >> > + ngettext("launched %d parallel vacuum worker for index vacuuming >> > (planned: %d)", >> > + "launched %d parallel vacuum workers for index vacuuming (planned: %d)", >> > + lps->pcxt->nworkers_launched), >> > + lps->pcxt->nworkers_launched, >> > + lps->pcxt->nworkers); >> > + } >> > >> > Multiple places I see a lot of duplicate code for for_cleanup is true >> > or false. The only difference is in the error message whether we give >> > index cleanup or index vacuuming otherwise complete code is the same >> > for >> > both the cases. Can't we create some string and based on the value of >> > the for_cleanup and append it in the error message that way we can >> > avoid duplicating this at many places? >> >> I think it's necessary for translation. IIUC if we construct the >> message it cannot be translated. >> > > Do we really need to log all those messages? The other places where we launch parallel workers doesn't seem to be usingsuch messages. Why do you think it is important to log the messages here when other cases don't use it? Well I would rather think that parallel create index doesn't log enough messages. Parallel maintenance operation is invoked manually by user. I can imagine that DBA wants to cancel and try the operation again later if enough workers are not launched. But there is not a convenient way to confirm how many parallel workers planned and actually launched. We need to see ps command or pg_stat_activity. That's why I think that log message would be helpful for users. Regards, -- Masahiko Sawada
On Fri, Oct 4, 2019 at 3:35 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > Some more comments.. > 1. > + for (idx = 0; idx < nindexes; idx++) > + { > + if (!for_cleanup) > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > + vacrelstats->old_live_tuples); > + else > + { > + /* Cleanup one index and update index statistics */ > + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples, > + vacrelstats->tupcount_pages < vacrelstats->rel_pages); > + > + lazy_update_index_statistics(Irel[idx], stats[idx]); > + > + if (stats[idx]) > + pfree(stats[idx]); > + } > > I think instead of checking for_cleanup variable for every index of > the loop we better move loop inside, like shown below? > > if (!for_cleanup) > for (idx = 0; idx < nindexes; idx++) > lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > else > for (idx = 0; idx < nindexes; idx++) > { > lazy_cleanup_index > lazy_update_index_statistics > ... > } > > 2. > +static void > +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps, bool for_cleanup) > +{ > + int idx; > + > + Assert(!IsParallelWorker()); > + > + /* no job if the table has no index */ > + if (nindexes <= 0) > + return; > > Wouldn't it be good idea to call this function only if nindexes > 0? > > 3. > +/* > + * Vacuum or cleanup indexes with parallel workers. This function must be used > + * by the parallel vacuum leader process. > + */ > +static void > +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, > Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps, bool for_cleanup) > > If you see this function there is no much common code between > for_cleanup and without for_cleanup except these 3-4 statement. > LaunchParallelWorkers(lps->pcxt); > /* Create the log message to report */ > initStringInfo(&buf); > ... > /* Wait for all vacuum workers to finish */ > WaitForParallelWorkersToFinish(lps->pcxt); > > Other than that you have got a lot of checks like this > + if (!for_cleanup) > + { > + } > + else > + { > } > > I think code would be much redable if we have 2 functions one for > vaccum (lazy_parallel_vacuum_indexes) and another for > cleanup(lazy_parallel_cleanup_indexes). > > 4. > * of index scans performed. So we don't use maintenance_work_mem memory for > * the TID array, just enough to hold as many heap tuples as fit on one page. > * > + * Lazy vacuum supports parallel execution with parallel worker processes. In > + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with > + * parallel worker processes. Individual indexes are processed by one vacuum > > Spacing after the "." is not uniform, previous comment is using 2 > space and newly > added is using 1 space. Few more comments ---------------------------- 1. +static int +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) +{ + int parallel_workers; + bool leaderparticipates = true; Seems like this function is not using onerel parameter so we can remove this. 2. 
+ + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ + maxtuples = compute_max_dead_tuples(nblocks, true); + est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples, + mul_size(sizeof(ItemPointerData), maxtuples))); + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples); + shm_toc_estimate_keys(&pcxt->estimator, 1); + + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */ + querylen = strlen(debug_query_string); for consistency with other comments change VACUUM_KEY_QUERY_TEXT to PARALLEL_VACUUM_KEY_QUERY_TEXT 3. @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, (!wraparound ? VACOPT_SKIP_LOCKED : 0); tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT; tab->at_params.truncate = VACOPT_TERNARY_DEFAULT; + /* parallel lazy vacuum is not supported for autovacuum */ + tab->at_params.nworkers = -1; What is the reason for the same? Can we explain in the comments? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>
> Do we really need to log all those messages? The other places where we launch parallel workers doesn't seem to be using such messages. Why do you think it is important to log the messages here when other cases don't use it?
Well I would rather think that parallel create index doesn't log
enough messages. Parallel maintenance operation is invoked manually by
user. I can imagine that DBA wants to cancel and try the operation
again later if enough workers are not launched. But there is not a
convenient way to confirm how many parallel workers planned and
actually launched. We need to see ps command or pg_stat_activity.
That's why I think that log message would be helpful for users.
Hmm, what is the guarantee that at a later time the user will get the required number of workers? I think if the user decides to vacuum, then she would want it to start sooner. Also, to cancel the vacuum for this reason, the user needs to monitor the logs, which doesn't seem to be an easy thing considering this information will be logged at DEBUG2 level. I think it is better to add to the docs that we don't guarantee that the number of workers the user has asked for or expected for a parallel vacuum will be available during execution. Even if there is a compelling reason (which I don't see) to log this information, I think we shouldn't use more than one message to log it (there is no need for separate messages for cleanup and vacuuming).
On Fri, Oct 4, 2019 at 7:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I'd also prefer to use maintenance_work_mem at max during parallel
>> vacuum regardless of the number of parallel workers. This is current
>> implementation. In lazy vacuum the maintenance_work_mem is used to
>> record itempointer of dead tuples. This is done by leader process and
>> worker processes just refers them for vacuuming dead index tuples.
>> Even if user sets a small amount of maintenance_work_mem the parallel
>> vacuum would be helpful as it still would take a time for index
>> vacuuming. So I thought we should cap the number of parallel workers
>> by the number of indexes rather than maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index cleanup? However, I think we are using during index cleanup, see forex. ginInsertCleanup. I think before reaching any conclusion about what to do about this, first we need to establish whether this is a problem. If I am correct, then only some of the index cleanups (like gin index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>
I got your point. Currently, single-process lazy vacuum could
consume up to (maintenance_work_mem * 2) memory, because
we do index cleanup while holding the dead tuple space, as you
mentioned. And ginInsertCleanup is also called at the beginning of
ginbulkdelete. In the current parallel lazy vacuum, each parallel vacuum
worker could consume additional memory apart from the memory used by the
heap scan, depending on the implementation of the target index AM. Given
the current single and parallel vacuum implementations, it would be
better to control the total amount of memory rather than the number of
parallel workers. So one approach I came up with is to make all
vacuum workers use (maintenance_work_mem / # of
participants) as their new maintenance_work_mem.
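A minimal sketch of that idea, assuming the leader also participates (the variable names are illustrative, not from the patch):

/*
 * Divide the maintenance_work_mem budget (a GUC value in kilobytes)
 * evenly among the leader and the launched workers so the parallel
 * vacuum as a whole stays within the single-process budget.  The 64 kB
 * floor is an arbitrary illustrative minimum.
 */
int     nparticipants = nworkers_launched + 1;  /* workers plus the leader */
int     per_participant_work_mem;

per_participant_work_mem = Max(maintenance_work_mem / nparticipants, 64);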
Yeah, we can do something like that, but I am not clear whether the current memory usage for Gin indexes is correct. I have started a new thread, let's discuss there.
On Sun, Oct 6, 2019 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 7:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> >> >> I'd also prefer to use maintenance_work_mem at max during parallel >> >> vacuum regardless of the number of parallel workers. This is current >> >> implementation. In lazy vacuum the maintenance_work_mem is used to >> >> record itempointer of dead tuples. This is done by leader process and >> >> worker processes just refers them for vacuuming dead index tuples. >> >> Even if user sets a small amount of maintenance_work_mem the parallel >> >> vacuum would be helpful as it still would take a time for index >> >> vacuuming. So I thought we should cap the number of parallel workers >> >> by the number of indexes rather than maintenance_work_mem. >> >> >> > >> > Isn't that true only if we never use maintenance_work_mem during index cleanup? However, I think we are using duringindex cleanup, see forex. ginInsertCleanup. I think before reaching any conclusion about what to do about this, firstwe need to establish whether this is a problem. If I am correct, then only some of the index cleanups (like gin index)use maintenance_work_mem, so we need to consider that point while designing a solution for this. >> > >> >> I got your point. Currently the single process lazy vacuum could >> consume the amount of (maintenance_work_mem * 2) memory at max because >> we do index cleanup during holding the dead tuple space as you >> mentioned. And ginInsertCleanup is also be called at the beginning of >> ginbulkdelete. In current parallel lazy vacuum, each parallel vacuum >> worker could consume other memory apart from the memory used by heap >> scan depending on the implementation of target index AM. Given that >> the current single and parallel vacuum implementation it would be >> better to control the amount memory in total rather than the number of >> parallel workers. So one approach I came up with is that we make all >> vacuum workers use the amount of (maintenance_work_mem / # of >> participants) as new maintenance_work_mem. > > > Yeah, we can do something like that, but I am not clear whether the current memory usage for Gin indexes is correct. Ihave started a new thread, let's discuss there. > Thank you for starting that discussion! Regards, -- Masahiko Sawada
On Sat, Oct 5, 2019 at 8:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> >> > >> > Do we really need to log all those messages? The other places where we launch parallel workers doesn't seem to be usingsuch messages. Why do you think it is important to log the messages here when other cases don't use it? >> >> Well I would rather think that parallel create index doesn't log >> enough messages. Parallel maintenance operation is invoked manually by >> user. I can imagine that DBA wants to cancel and try the operation >> again later if enough workers are not launched. But there is not a >> convenient way to confirm how many parallel workers planned and >> actually launched. We need to see ps command or pg_stat_activity. >> That's why I think that log message would be helpful for users. > > > Hmm, what is a guarantee at a later time the user will get the required number of workers? I think if the user decidesto vacuum, then she would want it to start sooner. Also, to cancel the vacuum, for this reason, the user needs tomonitor logs which don't seem to be an easy thing considering this information will be logged at DEBUG2 level. I thinkit is better to add in docs that we don't guarantee that the number of workers the user has asked or expected to usefor a parallel vacuum will be available during execution. Even if there is a compelling reason (which I don't see) tolog this information, I think we shouldn't use more than one message to log (like there is no need for a separate messagefor cleanup and vacuuming) this information. > I think that there is use case where user wants to cancel a long-running analytic query using parallel workers to use parallel workers for parallel vacuum instead. That way the lazy vacuum will eventually complete soon. Or user would want to see the vacuum log to check if lazy vacuum has been done with how many parallel workers for diagnostic when the vacuum took a long time. This log information appears when VERBOSE option is specified. When executing VACUUM command it's quite common to specify VERBOSE option to see the vacuum execution more details and VACUUM VERBOSE already emits very detailed information such as how many frozen pages are skipped and OldestXmin. So I think this information would not be too odd for that. Are you concerned that this information takes many lines of code? or it's not worth to be logged? I agreed to add in docs that we don't guarantee that the number of workers user requested will be available. -- Regards, -- Masahiko Sawada
On Mon, Oct 7, 2019 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Oct 5, 2019 at 8:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >
>> > Do we really need to log all those messages? The other places where we launch parallel workers doesn't seem to be using such messages. Why do you think it is important to log the messages here when other cases don't use it?
>>
>> Well I would rather think that parallel create index doesn't log
>> enough messages. Parallel maintenance operation is invoked manually by
>> user. I can imagine that DBA wants to cancel and try the operation
>> again later if enough workers are not launched. But there is not a
>> convenient way to confirm how many parallel workers planned and
>> actually launched. We need to see ps command or pg_stat_activity.
>> That's why I think that log message would be helpful for users.
>
>
> Hmm, what is a guarantee at a later time the user will get the required number of workers? I think if the user decides to vacuum, then she would want it to start sooner. Also, to cancel the vacuum, for this reason, the user needs to monitor logs which don't seem to be an easy thing considering this information will be logged at DEBUG2 level. I think it is better to add in docs that we don't guarantee that the number of workers the user has asked or expected to use for a parallel vacuum will be available during execution. Even if there is a compelling reason (which I don't see) to log this information, I think we shouldn't use more than one message to log (like there is no need for a separate message for cleanup and vacuuming) this information.
>
I think there is a use case where the user wants to cancel a
long-running analytic query that is using parallel workers so that
those workers can be used for parallel vacuum instead. That way the
lazy vacuum will complete sooner. Or the user might want to check the
vacuum log to see how many parallel workers the lazy vacuum used, for
diagnostics, when the vacuum took a long time. This log information
appears when the VERBOSE option is specified. When executing the
VACUUM command it's quite common to specify the VERBOSE option to see
more details of the vacuum execution, and VACUUM VERBOSE already emits
very detailed information such as how many frozen pages were skipped
and the OldestXmin. So I don't think this information would be out of
place there. Are you concerned that this information takes many lines
of code, or that it's not worth logging?
To an extent both, but I see the point you are making. So, we should try to minimize the number of lines used to log this message. If we can use just one message to log this information, that would be ideal.
I agreed to add to the docs that we don't guarantee that the number of
workers the user requested will be available.
Okay.
On Fri, Oct 4, 2019 at 7:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > Some more comments.. Thank you! > 1. > + for (idx = 0; idx < nindexes; idx++) > + { > + if (!for_cleanup) > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > + vacrelstats->old_live_tuples); > + else > + { > + /* Cleanup one index and update index statistics */ > + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples, > + vacrelstats->tupcount_pages < vacrelstats->rel_pages); > + > + lazy_update_index_statistics(Irel[idx], stats[idx]); > + > + if (stats[idx]) > + pfree(stats[idx]); > + } > > I think instead of checking for_cleanup variable for every index of > the loop we better move loop inside, like shown below? Fixed. > > if (!for_cleanup) > for (idx = 0; idx < nindexes; idx++) > lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > else > for (idx = 0; idx < nindexes; idx++) > { > lazy_cleanup_index > lazy_update_index_statistics > ... > } > > 2. > +static void > +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps, bool for_cleanup) > +{ > + int idx; > + > + Assert(!IsParallelWorker()); > + > + /* no job if the table has no index */ > + if (nindexes <= 0) > + return; > > Wouldn't it be good idea to call this function only if nindexes > 0? > I realized the callers of this function should pass nindexes > 0 because they attempt to do index vacuuming or index cleanup. So it should be an assertion rather than returning. Thoughts? > 3. > +/* > + * Vacuum or cleanup indexes with parallel workers. This function must be used > + * by the parallel vacuum leader process. > + */ > +static void > +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, > Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps, bool for_cleanup) > > If you see this function there is no much common code between > for_cleanup and without for_cleanup except these 3-4 statement. > LaunchParallelWorkers(lps->pcxt); > /* Create the log message to report */ > initStringInfo(&buf); > ... > /* Wait for all vacuum workers to finish */ > WaitForParallelWorkersToFinish(lps->pcxt); > > Other than that you have got a lot of checks like this > + if (!for_cleanup) > + { > + } > + else > + { > } > > I think code would be much redable if we have 2 functions one for > vaccum (lazy_parallel_vacuum_indexes) and another for > cleanup(lazy_parallel_cleanup_indexes). Seems good idea. Fixed. > > 4. > * of index scans performed. So we don't use maintenance_work_mem memory for > * the TID array, just enough to hold as many heap tuples as fit on one page. > * > + * Lazy vacuum supports parallel execution with parallel worker processes. In > + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with > + * parallel worker processes. Individual indexes are processed by one vacuum > > Spacing after the "." is not uniform, previous comment is using 2 > space and newly > added is using 1 space. > FIxed. The code has been fixed in my local repository. After incorporated the all comments I got so far I'll submit the updated version patch. Regards, -- Masahiko Sawada
On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > Few more comments > ---------------------------- > > 1. > +static int > +compute_parallel_workers(Relation onerel, int nrequested, int nindexes) > +{ > + int parallel_workers; > + bool leaderparticipates = true; > > Seems like this function is not using onerel parameter so we can remove this. > Fixed. > > 2. > + > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */ > + maxtuples = compute_max_dead_tuples(nblocks, true); > + est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples, > + mul_size(sizeof(ItemPointerData), maxtuples))); > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples); > + shm_toc_estimate_keys(&pcxt->estimator, 1); > + > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */ > + querylen = strlen(debug_query_string); > > for consistency with other comments change > VACUUM_KEY_QUERY_TEXT to PARALLEL_VACUUM_KEY_QUERY_TEXT > Fixed. > > 3. > @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, > (!wraparound ? VACOPT_SKIP_LOCKED : 0); > tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT; > tab->at_params.truncate = VACOPT_TERNARY_DEFAULT; > + /* parallel lazy vacuum is not supported for autovacuum */ > + tab->at_params.nworkers = -1; > > What is the reason for the same? Can we explain in the comments? I think it's just that we don't want to support parallel auto vacuum because it can consume more CPU resources in spite of background job, which might be an unexpected behavior of autovacuum. I've changed the comment. Regards, -- Masahiko Sawada
On Fri, Oct 4, 2019 at 8:55 PM vignesh C <vignesh21@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > >> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > One comment: Thank you for reviewing this patch. > We can check if parallel_workers is within range something within > MAX_PARALLEL_WORKER_LIMIT. > + int parallel_workers = 0; > + > + if (optarg != NULL) > + { > + parallel_workers = atoi(optarg); > + if (parallel_workers <= 0) > + { > + pg_log_error("number of parallel workers must be at least 1"); > + exit(1); > + } > + } > Fixed. Regards, -- Masahiko Sawada
On Wed, Oct 9, 2019 at 6:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > 3. > > @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map, > > (!wraparound ? VACOPT_SKIP_LOCKED : 0); > > tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT; > > tab->at_params.truncate = VACOPT_TERNARY_DEFAULT; > > + /* parallel lazy vacuum is not supported for autovacuum */ > > + tab->at_params.nworkers = -1; > > > > What is the reason for the same? Can we explain in the comments? > > I think it's just that we don't want to support parallel auto vacuum > because it can consume more CPU resources in spite of background job, > which might be an unexpected behavior of autovacuum. > I think the other reason is it can generate a lot of I/O which might choke other operations. I think if we want we can provide Guc(s) to control such behavior, but initially providing it via command should be a good start so that users can knowingly use it in appropriate cases. We can later extend it for autovacuum if required. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> Few more comments: --------------------------------- 1. Caurrently parallel vacuum is allowed for temporary relations which is wrong. It leads to below error: postgres=# create temporary table tmp_t1(c1 int, c2 char(10)); CREATE TABLE postgres=# create index idx_tmp_t1 on tmp_t1(c1); CREATE INDEX postgres=# create index idx1_tmp_t1 on tmp_t1(c2); CREATE INDEX postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa'); INSERT 0 10000 postgres=# delete from tmp_t1 where c1 > 5000; DELETE 5000 postgres=# vacuum (parallel 2) tmp_t1; ERROR: cannot access temporary tables during a parallel operation CONTEXT: parallel worker The parallel vacuum shouldn't be allowed for temporary relations. 2. --- a/doc/src/sgml/ref/vacuum.sgml +++ b/doc/src/sgml/ref/vacuum.sgml @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ] INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ] TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ] + PARALLEL [ <replaceable class="parameter">integer</replaceable> ] Now, if the user gives a command like Vacuum (analyze, parallel) <table_name>; it is not very obvious that a parallel option will be only used for vacuum purposes but not for analyze. I think we can add a note in the docs to mention this explicitly. This can avoid any confusion. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 4, 2019 at 7:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> >> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > * >> > +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes) >> > { >> > .. >> > + /* Shutdown worker processes and destroy the parallel context */ >> > + WaitForParallelWorkersToFinish(lps->pcxt); >> > .. >> > } >> > >> > Do we really need to call WaitForParallelWorkersToFinish here as it >> > must have been called in lazy_parallel_vacuum_or_cleanup_indexes >> > before this time? >> >> No, removed. > > > + /* Shutdown worker processes and destroy the parallel context */ > + DestroyParallelContext(lps->pcxt); > > But you forget to update the comment. Fixed. > > Few more comments: > -------------------------------- > 1. > +/* > + * Parallel Index vacuuming and index cleanup routine used by both the leader > + * process and worker processes. Unlike single process vacuum, we don't update > + * index statistics after cleanup index since it is not allowed during > + * parallel mode, instead copy index bulk-deletion results from the local > + * memory to the DSM segment and update them at the end of parallel lazy > + * vacuum. > + */ > +static void > +do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes, > + IndexBulkDeleteResult **stats, > + LVShared *lvshared, > + LVDeadTuples *dead_tuples) > +{ > + /* Loop until all indexes are vacuumed */ > + for (;;) > + { > + int idx; > + > + /* Get an index number to process */ > + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1); > + > + /* Done for all indexes? */ > + if (idx >= nindexes) > + break; > + > + /* > + * Update the pointer to the corresponding bulk-deletion result > + * if someone has already updated it. > + */ > + if (lvshared->indstats[idx].updated && > + stats[idx] == NULL) > + stats[idx] = &(lvshared->indstats[idx].stats); > + > + /* Do vacuum or cleanup one index */ > + if (!lvshared->for_cleanup) > + lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples, > + lvshared->reltuples); > + else > + lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples, > + lvshared->estimated_count); > > It seems we always run index cleanup via parallel worker which seems overkill because the cleanup index generally scansthe index only when bulkdelete was not performed. In some cases like for hash index, it doesn't do anything even bulkdelete is not called. OTOH, for brin index, it does the main job during cleanup but we might be able to always allowindex cleanup by parallel worker for brin indexes if we remove the allocation in brinbulkdelete which I am not sureis of any use. > > I think we shouldn't call cleanup via parallel worker unless bulkdelete hasn't been performed on the index. > Agreed. Fixed. > 2. > - for (i = 0; i < nindexes; i++) > - lazy_vacuum_index(Irel[i], > - &indstats[i], > - vacrelstats); > + lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > + indstats, lps, false); > > Indentation is not proper. You might want to run pgindent. Fixed. Regards, -- Masahiko Sawada
On Thu, Oct 10, 2019 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >> > > Few more comments: Thank you for reviewing the patch! > --------------------------------- > 1. Caurrently parallel vacuum is allowed for temporary relations > which is wrong. It leads to below error: > > postgres=# create temporary table tmp_t1(c1 int, c2 char(10)); > CREATE TABLE > postgres=# create index idx_tmp_t1 on tmp_t1(c1); > CREATE INDEX > postgres=# create index idx1_tmp_t1 on tmp_t1(c2); > CREATE INDEX > postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa'); > INSERT 0 10000 > postgres=# delete from tmp_t1 where c1 > 5000; > DELETE 5000 > postgres=# vacuum (parallel 2) tmp_t1; > ERROR: cannot access temporary tables during a parallel operation > CONTEXT: parallel worker > > The parallel vacuum shouldn't be allowed for temporary relations. Fixed. > > 2. > --- a/doc/src/sgml/ref/vacuum.sgml > +++ b/doc/src/sgml/ref/vacuum.sgml > @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ > <replaceable class="paramet > SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ] > INDEX_CLEANUP [ <replaceable > class="parameter">boolean</replaceable> ] > TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ] > + PARALLEL [ <replaceable > class="parameter">integer</replaceable> ] > > Now, if the user gives a command like Vacuum (analyze, parallel) > <table_name>; it is not very obvious that a parallel option will be > only used for vacuum purposes but not for analyze. I think we can add > a note in the docs to mention this explicitly. This can avoid any > confusion. Agreed. Attached the latest version patch although the memory usage problem is under discussion. I'll update the patches according to the result of that discussion. Regards, -- Masahiko Sawada
Attachment
Hi
On Thu, 10 Oct 2019 at 13:18, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Oct 10, 2019 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >>
>
> Few more comments:
Thank you for reviewing the patch!
> ---------------------------------
> 1. Caurrently parallel vacuum is allowed for temporary relations
> which is wrong. It leads to below error:
>
> postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
> CREATE TABLE
> postgres=# create index idx_tmp_t1 on tmp_t1(c1);
> CREATE INDEX
> postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
> CREATE INDEX
> postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
> INSERT 0 10000
> postgres=# delete from tmp_t1 where c1 > 5000;
> DELETE 5000
> postgres=# vacuum (parallel 2) tmp_t1;
> ERROR: cannot access temporary tables during a parallel operation
> CONTEXT: parallel worker
>
> The parallel vacuum shouldn't be allowed for temporary relations.
Fixed.
>
> 2.
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
> <replaceable class="paramet
> SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
> INDEX_CLEANUP [ <replaceable
> class="parameter">boolean</replaceable> ]
> TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
> + PARALLEL [ <replaceable
> class="parameter">integer</replaceable> ]
>
> Now, if the user gives a command like Vacuum (analyze, parallel)
> <table_name>; it is not very obvious that a parallel option will be
> only used for vacuum purposes but not for analyze. I think we can add
> a note in the docs to mention this explicitly. This can avoid any
> confusion.
Agreed.
Attached the latest version patch although the memory usage problem is
under discussion. I'll update the patches according to the result of
that discussion.
Steps to reproduce:
Step 1) Apply both the patches and configure with below command.
./configure --with-zlib --enable-debug --prefix=$PWD/inst/ --with-openssl CFLAGS="-ggdb3" > war && make -j 8 install > war
Step 2) Now start the server.
Step 3) Fire below commands:
create table tmp_t1(c1 int, c2 char(10));
create index idx_tmp_t1 on tmp_t1(c1);
create index idx1_tmp_t1 on tmp_t1(c2);
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
delete from tmp_t1 where c1 > 5000;
vacuum (parallel 2) tmp_t1;
Call stack:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: mahendra postgres [local] VACUUM '.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000a4f97a in pfree (pointer=0x10baa68) at mcxt.c:1060
1060 context->methods->free_p(context, pointer);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x0000000000a4f97a in pfree (pointer=0x10baa68) at mcxt.c:1060
#1 0x00000000004e7d13 in update_index_statistics (Irel=0x10b9808, stats=0x10b9828, nindexes=2) at vacuumlazy.c:2277
#2 0x00000000004e693f in lazy_scan_heap (onerel=0x7f8d99610d08, params=0x7ffeeaddb7f0, vacrelstats=0x10b9728, Irel=0x10b9808, nindexes=2, aggressive=false) at vacuumlazy.c:1659
#3 0x00000000004e4d25 in heap_vacuum_rel (onerel=0x7f8d99610d08, params=0x7ffeeaddb7f0, bstrategy=0x1117528) at vacuumlazy.c:431
#4 0x00000000006a71a7 in table_relation_vacuum (rel=0x7f8d99610d08, params=0x7ffeeaddb7f0, bstrategy=0x1117528) at ../../../src/include/access/tableam.h:1432
#5 0x00000000006a9899 in vacuum_rel (relid=16384, relation=0x103b308, params=0x7ffeeaddb7f0) at vacuum.c:1870
#6 0x00000000006a7c22 in vacuum (relations=0x11176b8, params=0x7ffeeaddb7f0, bstrategy=0x1117528, isTopLevel=true) at vacuum.c:425
#7 0x00000000006a77e6 in ExecVacuum (pstate=0x105f578, vacstmt=0x103b3d8, isTopLevel=true) at vacuum.c:228
#8 0x00000000008af401 in standard_ProcessUtility (pstmt=0x103b6f8, queryString=0x103a808 "vacuum (parallel 2) tmp_t1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at utility.c:670
#9 0x00000000008aec40 in ProcessUtility (pstmt=0x103b6f8, queryString=0x103a808 "vacuum (parallel 2) tmp_t1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at utility.c:360
#10 0x00000000008addbb in PortalRunUtility (portal=0x10a1a28, pstmt=0x103b6f8, isTopLevel=true, setHoldSnapshot=false, dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at pquery.c:1175
#11 0x00000000008adf9f in PortalRunMulti (portal=0x10a1a28, isTopLevel=true, setHoldSnapshot=false, dest=0x103b7d8, altdest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at pquery.c:1321
#12 0x00000000008ad55d in PortalRun (portal=0x10a1a28, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x103b7d8, altdest=0x103b7d8, completionTag=0x7ffeeaddbc50 "")
at pquery.c:796
#13 0x00000000008a7789 in exec_simple_query (query_string=0x103a808 "vacuum (parallel 2) tmp_t1;") at postgres.c:1231
#14 0x00000000008ab8f2 in PostgresMain (argc=1, argv=0x1065b00, dbname=0x1065a28 "postgres", username=0x1065a08 "mahendra") at postgres.c:4256
#15 0x0000000000811a42 in BackendRun (port=0x105d9c0) at postmaster.c:4465
#16 0x0000000000811241 in BackendStartup (port=0x105d9c0) at postmaster.c:4156
#17 0x000000000080d7d6 in ServerLoop () at postmaster.c:1718
#18 0x000000000080d096 in PostmasterMain (argc=3, argv=0x1035270) at postmaster.c:1391
#19 0x000000000072accb in main (argc=3, argv=0x1035270) at main.c:210
I did some analysis and found that we are trying to free some already-freed memory, or we are freeing palloc'd memory inside vac_update_relstats.
for (i = 0; i < nindexes; i++)
{
if (stats[i] == NULL || stats[i]->estimated_count)
continue;
/* Update index statistics */
vac_update_relstats(Irel[i],
stats[i]->num_pages,
stats[i]->num_index_tuples,
0,
false,
InvalidTransactionId,
InvalidMultiXactId,
false);
pfree(stats[i]);
}
As my table has 2 indexes, we have to free both stats. When i = 0, it frees properly, but when i = 1, vac_update_relstats appears to be freeing the memory.
(gdb) p *stats[i]
$1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted = 102, pages_free = 0}
(gdb) p *stats[i]
$2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
(gdb)
From the above data, it looks like somewhere inside vac_update_relstats we are freeing all the palloc'd memory. I don't know why that is.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory invac_update_relstats. > for (i = 0; i < nindexes; i++) > { > if (stats[i] == NULL || stats[i]->estimated_count) > continue; > > /* Update index statistics */ > vac_update_relstats(Irel[i], > stats[i]->num_pages, > stats[i]->num_index_tuples, > 0, > false, > InvalidTransactionId, > InvalidMultiXactId, > false); > pfree(stats[i]); > } > > As my table have 2 indexes, so we have to free both stats. When i = 0, it is freeing propery but when i = 1, then vac_update_relstats is freeing memory. >> >> (gdb) p *stats[i] >> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted= 102, pages_free = 0} >> (gdb) p *stats[i] >> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted= 0, pages_free = 0} >> (gdb) > > > From above data, it looks like, somewhere inside vac_update_relstats, we are freeing all palloced memory. I don't know,why is it. > I don't think the problem is in vac_update_relstats as we are not even passing stats to it, so it won't be able to free it. I think the real problem is in the way we copy the stats from shared memory to local memory in the function end_parallel_vacuum(). Basically, it allocates the memory for all the index stats together and then in function update_index_statistics, it is trying to free memory of individual array elements, that won't work. I have tried to fix the allocation in end_parallel_vacuum, see if this fixes the problem for you. You need to apply the attached patch atop v28-0001-Add-parallel-option-to-VACUUM-command posted above by Sawada-San. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
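For illustration, a minimal sketch of the allocation pattern Amit describes above and of the direction of the fix (variable names such as shared_stats are made up here; the actual v28 patch and the attached delta differ in detail):

/*
 * Broken pattern: the stats for all indexes are copied out of shared
 * memory into one contiguous palloc'd block, and each stats[i] points
 * into the middle of that block.
 */
all_stats = (IndexBulkDeleteResult *)
    palloc0(sizeof(IndexBulkDeleteResult) * nindexes);
for (i = 0; i < nindexes; i++)
    stats[i] = &all_stats[i];           /* interior pointers */

/*
 * update_index_statistics() later calls pfree(stats[i]) for every index.
 * Only stats[0] is the start of a palloc chunk; pfree() on stats[1] gets
 * an interior pointer, which corrupts memory or crashes as seen above.
 *
 * Fix direction: give each index its own chunk when copying back.
 */
for (i = 0; i < nindexes; i++)
{
    stats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
    memcpy(stats[i], &shared_stats[i], sizeof(IndexBulkDeleteResult));
}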
On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > > I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memoryin vac_update_relstats. > > for (i = 0; i < nindexes; i++) > > { > > if (stats[i] == NULL || stats[i]->estimated_count) > > continue; > > > > /* Update index statistics */ > > vac_update_relstats(Irel[i], > > stats[i]->num_pages, > > stats[i]->num_index_tuples, > > 0, > > false, > > InvalidTransactionId, > > InvalidMultiXactId, > > false); > > pfree(stats[i]); > > } > > > > As my table have 2 indexes, so we have to free both stats. When i = 0, it is freeing propery but when i = 1, then vac_update_relstats is freeing memory. > >> > >> (gdb) p *stats[i] > >> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000,pages_deleted = 102, pages_free = 0} > >> (gdb) p *stats[i] > >> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted= 0, pages_free = 0} > >> (gdb) > > > > > > From above data, it looks like, somewhere inside vac_update_relstats, we are freeing all palloced memory. I don't know,why is it. > > > > I don't think the problem is in vac_update_relstats as we are not even > passing stats to it, so it won't be able to free it. I think the real > problem is in the way we copy the stats from shared memory to local > memory in the function end_parallel_vacuum(). Basically, it allocates > the memory for all the index stats together and then in function > update_index_statistics, it is trying to free memory of individual > array elements, that won't work. I have tried to fix the allocation > in end_parallel_vacuum, see if this fixes the problem for you. You > need to apply the attached patch atop > v28-0001-Add-parallel-option-to-VACUUM-command posted above by > Sawada-San. Thank you for reviewing and creating the patch! I think the patch fixes this issue correctly. Attached the updated version patch. Regards, -- Masahiko Sawada
Attachment
Thanks, Amit, for the patch.
The crash is fixed by this patch.
Thanks and Regards
Mahendra Thalor
On Sat, Oct 12, 2019, 09:03 Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory in vac_update_relstats.
> for (i = 0; i < nindexes; i++)
> {
> if (stats[i] == NULL || stats[i]->estimated_count)
> continue;
>
> /* Update index statistics */
> vac_update_relstats(Irel[i],
> stats[i]->num_pages,
> stats[i]->num_index_tuples,
> 0,
> false,
> InvalidTransactionId,
> InvalidMultiXactId,
> false);
> pfree(stats[i]);
> }
>
> As my table have 2 indexes, so we have to free both stats. When i = 0, it is freeing propery but when i = 1, then vac_update_relstats is freeing memory.
>>
>> (gdb) p *stats[i]
>> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted = 102, pages_free = 0}
>> (gdb) p *stats[i]
>> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
>> (gdb)
>
>
> From above data, it looks like, somewhere inside vac_update_relstats, we are freeing all palloced memory. I don't know, why is it.
>
I don't think the problem is in vac_update_relstats as we are not even
passing stats to it, so it won't be able to free it. I think the real
problem is in the way we copy the stats from shared memory to local
memory in the function end_parallel_vacuum(). Basically, it allocates
the memory for all the index stats together and then in function
update_index_statistics, it is trying to free memory of individual
array elements, that won't work. I have tried to fix the allocation
in end_parallel_vacuum, see if this fixes the problem for you. You
need to apply the attached patch atop
v28-0001-Add-parallel-option-to-VACUUM-command posted above by
Sawada-San.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > Thank you for reviewing and creating the patch! > > I think the patch fixes this issue correctly. Attached the updated > version patch. > I see a much bigger problem with the way this patch collects the index stats in shared memory. IIUC, it allocates the shared memory (DSM) for all the index stats, in the same way, considering its size as IndexBulkDeleteResult. For the first time, it gets the stats from local memory as returned by ambulkdelete/amvacuumcleanup call and then copies it in shared memory space. There onwards, it always updates the stats in shared memory by pointing each index stats to that memory. In this scheme, you overlooked the point that an index AM could choose to return a larger structure of which IndexBulkDeleteResult is just the first field. This generally provides a way for ambulkdelete to communicate additional private data to amvacuumcleanup. We use this idea in the gist index, see how gistbulkdelete and gistvacuumcleanup works. The current design won't work for such cases. One idea is to change the design such that each index method provides a method to estimate/allocate the shared memory required for stats of ambulkdelete/amvacuumscan and then later we also need to use index method-specific function which copies the stats from local memory to shared memory. I think this needs further investigation. I have also made a few other changes in the attached delta patch. The main point that fixed by attached patch is that even if we don't allow a parallel vacuum on temporary tables, the analyze should be able to work if the user has asked for it. I have changed an error message and few other cosmetic changes related to comments. Kindly include this in the next version if you don't find any problem with the changes. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
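For reference, the kind of structure Amit is pointing at looks roughly like this (a paraphrase of GistBulkDeleteResult from gistvacuum.c of that era; the exact field names may differ from the source):

typedef struct GistBulkDeleteResult
{
    IndexBulkDeleteResult stats;        /* public part; must be first */

    /* private state handed from gistbulkdelete to gistvacuumcleanup */
    IntegerSet     *internal_page_set;  /* internal pages seen during the scan */
    IntegerSet     *empty_leaf_set;     /* empty leaf pages to delete later */
    MemoryContext   page_set_context;   /* memory context holding the sets */
} GistBulkDeleteResult;

Copying only sizeof(IndexBulkDeleteResult) bytes of such a result into DSM silently drops the private fields, which is why a per-AM estimate/copy hook, or letting an AM opt out entirely, is being discussed.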
On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I see a much bigger problem with the way this patch collects the index > stats in shared memory. IIUC, it allocates the shared memory (DSM) > for all the index stats, in the same way, considering its size as > IndexBulkDeleteResult. For the first time, it gets the stats from > local memory as returned by ambulkdelete/amvacuumcleanup call and then > copies it in shared memory space. There onwards, it always updates > the stats in shared memory by pointing each index stats to that > memory. In this scheme, you overlooked the point that an index AM > could choose to return a larger structure of which > IndexBulkDeleteResult is just the first field. This generally > provides a way for ambulkdelete to communicate additional private data > to amvacuumcleanup. We use this idea in the gist index, see how > gistbulkdelete and gistvacuumcleanup works. The current design won't > work for such cases. > Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I have a few observations about those which might help us to solve this problem for gist indexes: 1. Are we using memory context GistBulkDeleteResult->page_set_context? It seems to me it is not being used. 2. Each time we perform gistbulkdelete, we always seem to reset the GistBulkDeleteResult stats, see gistvacuumscan. So, how will it accumulate it for the cleanup phase when the vacuum needs to call gistbulkdelete multiple times because the available space for dead-tuple is filled. It seems to me like we only use the stats from the very last call to gistbulkdelete. 3. Do we really need to give the responsibility of deleting empty pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we do it in gistbulkdelte? I see one advantage of postponing it till the cleanup phase which is if somehow we can accumulate stats over multiple calls of gistbulkdelete, but I am not sure if it is feasible. At least, the way current code works, it seems that there is no advantage to postpone deleting empty pages till the cleanup phase. If we avoid postponing deleting empty pages till the cleanup phase, then we don't have the problem for gist indexes. This is not directly related to this patch, so we can discuss these observations in a separate thread as well, but before that, I wanted to check your opinion to see if this makes sense to you as this will help us in moving this patch forward. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 14, 2019 at 3:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > I see a much bigger problem with the way this patch collects the index > > stats in shared memory. IIUC, it allocates the shared memory (DSM) > > for all the index stats, in the same way, considering its size as > > IndexBulkDeleteResult. For the first time, it gets the stats from > > local memory as returned by ambulkdelete/amvacuumcleanup call and then > > copies it in shared memory space. There onwards, it always updates > > the stats in shared memory by pointing each index stats to that > > memory. In this scheme, you overlooked the point that an index AM > > could choose to return a larger structure of which > > IndexBulkDeleteResult is just the first field. This generally > > provides a way for ambulkdelete to communicate additional private data > > to amvacuumcleanup. We use this idea in the gist index, see how > > gistbulkdelete and gistvacuumcleanup works. The current design won't > > work for such cases. > > > > Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I > have a few observations about those which might help us to solve this > problem for gist indexes: > 1. Are we using memory context GistBulkDeleteResult->page_set_context? > It seems to me it is not being used. To me also it appears that it's not being used. > 2. Each time we perform gistbulkdelete, we always seem to reset the > GistBulkDeleteResult stats, see gistvacuumscan. So, how will it > accumulate it for the cleanup phase when the vacuum needs to call > gistbulkdelete multiple times because the available space for > dead-tuple is filled. It seems to me like we only use the stats from > the very last call to gistbulkdelete. IIUC, it is fine to use the stats from the latest gistbulkdelete call because we are trying to collect the information of the empty pages while scanning the tree. So I think it would be fine to just use the information collected from the latest scan otherwise we will get duplicate information. > 3. Do we really need to give the responsibility of deleting empty > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we > do it in gistbulkdelte? I see one advantage of postponing it till the > cleanup phase which is if somehow we can accumulate stats over > multiple calls of gistbulkdelete, but I am not sure if it is feasible. It seems that we want to use the latest result. That might be the reason for postponing to the cleanup phase. > At least, the way current code works, it seems that there is no > advantage to postpone deleting empty pages till the cleanup phase. > > If we avoid postponing deleting empty pages till the cleanup phase, > then we don't have the problem for gist indexes. > > This is not directly related to this patch, so we can discuss these > observations in a separate thread as well, but before that, I wanted > to check your opinion to see if this makes sense to you as this will > help us in moving this patch forward. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > I see a much bigger problem with the way this patch collects the index > > stats in shared memory. IIUC, it allocates the shared memory (DSM) > > for all the index stats, in the same way, considering its size as > > IndexBulkDeleteResult. For the first time, it gets the stats from > > local memory as returned by ambulkdelete/amvacuumcleanup call and then > > copies it in shared memory space. There onwards, it always updates > > the stats in shared memory by pointing each index stats to that > > memory. In this scheme, you overlooked the point that an index AM > > could choose to return a larger structure of which > > IndexBulkDeleteResult is just the first field. This generally > > provides a way for ambulkdelete to communicate additional private data > > to amvacuumcleanup. We use this idea in the gist index, see how > > gistbulkdelete and gistvacuumcleanup works. The current design won't > > work for such cases. Indeed. That's a very good point. Thank you for pointing out. > > > > Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I > have a few observations about those which might help us to solve this > problem for gist indexes: > 1. Are we using memory context GistBulkDeleteResult->page_set_context? > It seems to me it is not being used. Yes I also think this memory context is not being used. > 2. Each time we perform gistbulkdelete, we always seem to reset the > GistBulkDeleteResult stats, see gistvacuumscan. So, how will it > accumulate it for the cleanup phase when the vacuum needs to call > gistbulkdelete multiple times because the available space for > dead-tuple is filled. It seems to me like we only use the stats from > the very last call to gistbulkdelete. I think you're right. gistbulkdelete scans all pages and collects all internal pages and all empty pages. And then in gistvacuumcleanup it uses them to unlink all empty pages. Currently it accumulates such information over multiple gistbulkdelete calls due to missing switching the memory context but I guess this code intends to use them only from the very last call to gistbulkdelete. > 3. Do we really need to give the responsibility of deleting empty > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we > do it in gistbulkdelte? I see one advantage of postponing it till the > cleanup phase which is if somehow we can accumulate stats over > multiple calls of gistbulkdelete, but I am not sure if it is feasible. > At least, the way current code works, it seems that there is no > advantage to postpone deleting empty pages till the cleanup phase. > Considering the current strategy of page deletion of gist index the advantage of postponing the page deletion till the cleanup phase is that we can do the bulk deletion in cleanup phase which is called at most once. But I wonder if we can do the page deletion in the similar way to btree index. Or even we use the current strategy I think we can do that while not passing the pages information from bulkdelete to vacuumcleanup using by GistBulkDeleteResult. > If we avoid postponing deleting empty pages till the cleanup phase, > then we don't have the problem for gist indexes. Yes. 
But considering the point you raised, I guess there might be other index AMs that use the stats returned from bulkdelete in a similar way to the gist index (i.e. using a larger structure of which IndexBulkDeleteResult is just the first field). If the same concern applies there, the parallel vacuum still needs to deal with it, as you mentioned. Regards, -- Masahiko Sawada
On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > 3. Do we really need to give the responsibility of deleting empty > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we > > do it in gistbulkdelte? I see one advantage of postponing it till the > > cleanup phase which is if somehow we can accumulate stats over > > multiple calls of gistbulkdelete, but I am not sure if it is feasible. > > At least, the way current code works, it seems that there is no > > advantage to postpone deleting empty pages till the cleanup phase. > > > > Considering the current strategy of page deletion of gist index the > advantage of postponing the page deletion till the cleanup phase is > that we can do the bulk deletion in cleanup phase which is called at > most once. But I wonder if we can do the page deletion in the similar > way to btree index. > I think there might be some advantage of the current strategy due to which it has been chosen. I was going through the development thread and noticed some old email which points something related to this. See [1]. > Or even we use the current strategy I think we can > do that while not passing the pages information from bulkdelete to > vacuumcleanup using by GistBulkDeleteResult. > Yeah, I also think so. I have started a new thread [2] to know the opinion of others on this matter. > > If we avoid postponing deleting empty pages till the cleanup phase, > > then we don't have the problem for gist indexes. > > Yes. But considering your pointing out I guess that there might be > other index AMs use the stats returned from bulkdelete in the similar > way to gist index (i.e. using more larger structure of which > IndexBulkDeleteResult is just the first field). If we have the same > concern the parallel vacuum still needs to deal with that as you > mentioned. > Right, apart from some functions for memory allocation/estimation and stats copy, we might need something like amcanparallelvacuum, so that index methods can have the option to not participate in parallel vacuum due to reasons similar to gist or something else. I think we can work towards this direction as this anyway seems to be required and till we reach any conclusion for gist indexes, you can mark amcanparallelvacuum for gist indexes as false. [1] - https://www.postgresql.org/message-id/8548498B-6EC6-4C89-8313-107BEC437489%40yandex-team.ru [2] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 15, 2019 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > 3. Do we really need to give the responsibility of deleting empty > > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we > > > do it in gistbulkdelte? I see one advantage of postponing it till the > > > cleanup phase which is if somehow we can accumulate stats over > > > multiple calls of gistbulkdelete, but I am not sure if it is feasible. > > > At least, the way current code works, it seems that there is no > > > advantage to postpone deleting empty pages till the cleanup phase. > > > > > > > Considering the current strategy of page deletion of gist index the > > advantage of postponing the page deletion till the cleanup phase is > > that we can do the bulk deletion in cleanup phase which is called at > > most once. But I wonder if we can do the page deletion in the similar > > way to btree index. > > > > I think there might be some advantage of the current strategy due to > which it has been chosen. I was going through the development thread > and noticed some old email which points something related to this. > See [1]. Thanks. > > > Or even we use the current strategy I think we can > > do that while not passing the pages information from bulkdelete to > > vacuumcleanup using by GistBulkDeleteResult. > > > > Yeah, I also think so. I have started a new thread [2] to know the > opinion of others on this matter. > Thank you. > > > If we avoid postponing deleting empty pages till the cleanup phase, > > > then we don't have the problem for gist indexes. > > > > Yes. But considering your pointing out I guess that there might be > > other index AMs use the stats returned from bulkdelete in the similar > > way to gist index (i.e. using more larger structure of which > > IndexBulkDeleteResult is just the first field). If we have the same > > concern the parallel vacuum still needs to deal with that as you > > mentioned. > > > > Right, apart from some functions for memory allocation/estimation and > stats copy, we might need something like amcanparallelvacuum, so that > index methods can have the option to not participate in parallel > vacuum due to reasons similar to gist or something else. I think we > can work towards this direction as this anyway seems to be required > and till we reach any conclusion for gist indexes, you can mark > amcanparallelvacuum for gist indexes as false. Agreed. I'll create a separate patch to add this callback and change parallel vacuum patch so that it checks the participation of indexes and then vacuums on un-participated indexes after parallel vacuum. Regards, -- Masahiko Sawada
On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 15, 2019 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > 3. Do we really need to give the responsibility of deleting empty > > > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we > > > > do it in gistbulkdelte? I see one advantage of postponing it till the > > > > cleanup phase which is if somehow we can accumulate stats over > > > > multiple calls of gistbulkdelete, but I am not sure if it is feasible. > > > > At least, the way current code works, it seems that there is no > > > > advantage to postpone deleting empty pages till the cleanup phase. > > > > > > > > > > Considering the current strategy of page deletion of gist index the > > > advantage of postponing the page deletion till the cleanup phase is > > > that we can do the bulk deletion in cleanup phase which is called at > > > most once. But I wonder if we can do the page deletion in the similar > > > way to btree index. > > > > > > > I think there might be some advantage of the current strategy due to > > which it has been chosen. I was going through the development thread > > and noticed some old email which points something related to this. > > See [1]. > > Thanks. > > > > > > Or even we use the current strategy I think we can > > > do that while not passing the pages information from bulkdelete to > > > vacuumcleanup using by GistBulkDeleteResult. > > > > > > > Yeah, I also think so. I have started a new thread [2] to know the > > opinion of others on this matter. > > > > Thank you. > > > > > If we avoid postponing deleting empty pages till the cleanup phase, > > > > then we don't have the problem for gist indexes. > > > > > > Yes. But considering your pointing out I guess that there might be > > > other index AMs use the stats returned from bulkdelete in the similar > > > way to gist index (i.e. using more larger structure of which > > > IndexBulkDeleteResult is just the first field). If we have the same > > > concern the parallel vacuum still needs to deal with that as you > > > mentioned. > > > > > > > Right, apart from some functions for memory allocation/estimation and > > stats copy, we might need something like amcanparallelvacuum, so that > > index methods can have the option to not participate in parallel > > vacuum due to reasons similar to gist or something else. I think we > > can work towards this direction as this anyway seems to be required > > and till we reach any conclusion for gist indexes, you can mark > > amcanparallelvacuum for gist indexes as false. > > Agreed. I'll create a separate patch to add this callback and change > parallel vacuum patch so that it checks the participation of indexes > and then vacuums on un-participated indexes after parallel vacuum. amcanparallelvacuum is not necessary to be a callback, it can be a boolean field of IndexAmRoutine. Regards, -- Masahiko Sawada
On Tue, Oct 15, 2019 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Right, apart from some functions for memory allocation/estimation and > stats copy, we might need something like amcanparallelvacuum, so that > index methods can have the option to not participate in parallel > vacuum due to reasons similar to gist or something else. I think we > can work towards this direction as this anyway seems to be required > and till we reach any conclusion for gist indexes, you can mark > amcanparallelvacuum for gist indexes as false. > I think for estimating the size of the stat I suggest "amestimatestat" or "amstatsize" and for copy stat data we can add "amcopystat"? It would be helpful to extend the parallel vacuum for the indexes which has extended stats. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
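If such callbacks were added, their shape might be roughly the following (purely illustrative signatures built from the names Dilip suggests; nothing like this exists in any posted patch):

/* Hypothetical IndexAmRoutine additions, following the existing *_function style */
typedef Size (*amestimatestat_function) (Relation indexRelation);
typedef void (*amcopystat_function) (Relation indexRelation,
                                     IndexBulkDeleteResult *dest,  /* slot in DSM */
                                     IndexBulkDeleteResult *src);  /* AM's local result */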
On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > If we avoid postponing deleting empty pages till the cleanup phase, > > > > > then we don't have the problem for gist indexes. > > > > > > > > Yes. But considering your pointing out I guess that there might be > > > > other index AMs use the stats returned from bulkdelete in the similar > > > > way to gist index (i.e. using more larger structure of which > > > > IndexBulkDeleteResult is just the first field). If we have the same > > > > concern the parallel vacuum still needs to deal with that as you > > > > mentioned. > > > > > > > > > > Right, apart from some functions for memory allocation/estimation and > > > stats copy, we might need something like amcanparallelvacuum, so that > > > index methods can have the option to not participate in parallel > > > vacuum due to reasons similar to gist or something else. I think we > > > can work towards this direction as this anyway seems to be required > > > and till we reach any conclusion for gist indexes, you can mark > > > amcanparallelvacuum for gist indexes as false. > > > > Agreed. I'll create a separate patch to add this callback and change > > parallel vacuum patch so that it checks the participation of indexes > > and then vacuums on un-participated indexes after parallel vacuum. > > amcanparallelvacuum is not necessary to be a callback, it can be a > boolean field of IndexAmRoutine. > Yes, it will be a boolean. Note that for parallel-index scans, we already have amcanparallel. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > If we avoid postponing deleting empty pages till the cleanup phase, > > > > > > then we don't have the problem for gist indexes. > > > > > > > > > > Yes. But considering your pointing out I guess that there might be > > > > > other index AMs use the stats returned from bulkdelete in the similar > > > > > way to gist index (i.e. using more larger structure of which > > > > > IndexBulkDeleteResult is just the first field). If we have the same > > > > > concern the parallel vacuum still needs to deal with that as you > > > > > mentioned. > > > > > > > > > > > > > Right, apart from some functions for memory allocation/estimation and > > > > stats copy, we might need something like amcanparallelvacuum, so that > > > > index methods can have the option to not participate in parallel > > > > vacuum due to reasons similar to gist or something else. I think we > > > > can work towards this direction as this anyway seems to be required > > > > and till we reach any conclusion for gist indexes, you can mark > > > > amcanparallelvacuum for gist indexes as false. > > > > > > Agreed. I'll create a separate patch to add this callback and change > > > parallel vacuum patch so that it checks the participation of indexes > > > and then vacuums on un-participated indexes after parallel vacuum. > > > > amcanparallelvacuum is not necessary to be a callback, it can be a > > boolean field of IndexAmRoutine. > > > > Yes, it will be a boolean. Note that for parallel-index scans, we > already have amcanparallel. > Attached updated patch set. 0001 patch introduces new index AM field amcanparallelvacuum. All index AMs except for gist sets true for now. 0002 patch incorporated the all comments I got so far. Regards, -- Masahiko Sawada
Attachment
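The patches themselves are not reproduced here, but the shape of 0001 is presumably along these lines (a sketch; amcanparallel is an existing IndexAmRoutine field, the rest is inferred from the description above and may not match the patch exactly):

/* IndexAmRoutine (src/include/access/amapi.h): new capability flag */
    bool        amcanparallel;          /* existing: supports parallel index scan */
    bool        amcanparallelvacuum;    /* new: can take part in parallel vacuum */

/* gisthandler.c: gist opts out for now, per the discussion above */
    amroutine->amcanparallelvacuum = false;

/* leader side, sketch: skip indexes whose AM opts out of the parallel phase */
    if (!Irel[i]->rd_indam->amcanparallelvacuum)
        continue;           /* the leader vacuums this index serially afterwards */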
On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Attached updated patch set. 0001 patch introduces new index AM field > amcanparallelvacuum. All index AMs except for gist sets true for now. > 0002 patch incorporated the all comments I got so far. > I haven't studied the latest patch in detail, but it seems you are still assuming that all indexes will have the same amount of shared memory for index stats and copying it in the same way. I thought we agreed that each index AM should do this on its own. The basic problem is as of now we see this problem only with the Gist index, but some other index AM's could also have a similar problem. Another major problem with previous and this patch version is that the cost-based vacuum concept seems to be entirely broken. Basically, each parallel vacuum worker operates independently w.r.t vacuum delay and cost. Assume that the overall I/O allowed for vacuum operation is X after which it will sleep for some time, reset the balance and continue. In the patch, each worker will be allowed to perform X before which it can sleep and also there is no coordination for the same with master backend. This is somewhat similar to memory usage problem, but a bit more tricky because here we can't easily split the I/O for each of the worker. One idea could be that we somehow map vacuum costing related parameters to the shared memory (dsm) which the vacuum operation is using and then allow workers to coordinate. This way master and worker processes will have the same view of balance cost and can act accordingly. The other idea could be that we come up with some smart way to split the I/O among workers. Initially, I thought we could try something as we do for autovacuum workers (see autovac_balance_cost), but I think that will require much more math. Before launching workers, we need to compute the remaining I/O (heap operation would have used something) after which we need to sleep and continue the operation and then somehow split it equally across workers. Once the workers are finished, then need to let master backend know how much I/O they have consumed and then master backend can add it to it's current I/O consumed. I think this problem matters because the vacuum delay is useful for large vacuums and this patch is trying to exactly solve that problem, so we can't ignore this problem. I am not yet sure what is the best solution to this problem, but I think we need to do something for it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 16, 2019 at 3:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > Attached updated patch set. 0001 patch introduces new index AM field > > amcanparallelvacuum. All index AMs except for gist sets true for now. > > 0002 patch incorporated the all comments I got so far. > > > > I haven't studied the latest patch in detail, but it seems you are > still assuming that all indexes will have the same amount of shared > memory for index stats and copying it in the same way. Yeah I thought we agreed at least to have canparallelvacuum and if an index AM cannot support parallel index vacuuming like gist, it returns false. > I thought we > agreed that each index AM should do this on its own. The basic > problem is as of now we see this problem only with the Gist index, but > some other index AM's could also have a similar problem. Okay. I'm thinking we're going to have a new callback to ack index AMs the size of the structure using within both ambulkdelete and amvacuumcleanup. But copying it to DSM can be done by the core because it knows how many bytes need to be copied to DSM. Is that okay? > > Another major problem with previous and this patch version is that the > cost-based vacuum concept seems to be entirely broken. Basically, > each parallel vacuum worker operates independently w.r.t vacuum delay > and cost. Assume that the overall I/O allowed for vacuum operation is > X after which it will sleep for some time, reset the balance and > continue. In the patch, each worker will be allowed to perform X > before which it can sleep and also there is no coordination for the > same with master backend. This is somewhat similar to memory usage > problem, but a bit more tricky because here we can't easily split the > I/O for each of the worker. > > One idea could be that we somehow map vacuum costing related > parameters to the shared memory (dsm) which the vacuum operation is > using and then allow workers to coordinate. This way master and > worker processes will have the same view of balance cost and can act > accordingly. > > The other idea could be that we come up with some smart way to split > the I/O among workers. Initially, I thought we could try something as > we do for autovacuum workers (see autovac_balance_cost), but I think > that will require much more math. Before launching workers, we need > to compute the remaining I/O (heap operation would have used > something) after which we need to sleep and continue the operation and > then somehow split it equally across workers. Once the workers are > finished, then need to let master backend know how much I/O they have > consumed and then master backend can add it to it's current I/O > consumed. > > I think this problem matters because the vacuum delay is useful for > large vacuums and this patch is trying to exactly solve that problem, > so we can't ignore this problem. I am not yet sure what is the best > solution to this problem, but I think we need to do something for it. > I guess that the concepts of vacuum delay contradicts the concepts of parallel vacuum. The concepts of parallel vacuum would be to use more resource to make vacuum faster. Vacuum delays balances I/O during vacuum in order to avoid I/O spikes by vacuum but parallel vacuum rather concentrates I/O in shorter duration. 
Since memory is shared across the entire system we need to deal with the memory issue, but disks are a different matter. If we do need to deal with this problem, how about just dividing vacuum_cost_limit by the parallel degree and setting that as each worker's vacuum_cost_limit? Regards, -- Masahiko Sawada
Hi
I applied all 3 patches and ran the regression tests. I got one regression failure.
diff -U3 /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out
--- /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out 2019-10-17 10:01:58.138863802 +0530
+++ /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out 2019-10-17 11:41:20.930699926 +0530
@@ -105,7 +105,7 @@
CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
CREATE INDEX tmp_idx1 ON tmp (a);
VACUUM (PARALLEL 1) tmp; -- error, cannot parallel vacuum temporary tables
-WARNING: skipping "tmp" --- cannot parallel vacuum temporary tables
+WARNING: skipping vacuum on "tmp" --- cannot vacuum temporary tables in parallel
-- INDEX_CLEANUP option
CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
-- Use uncompressed data stored in toast.
It looks like you changed the warning message for temp tables but haven't updated the expected output file.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Wed, 16 Oct 2019 at 06:50, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > > then we don't have the problem for gist indexes.
> > > > >
> > > > > Yes. But considering your pointing out I guess that there might be
> > > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > > way to gist index (i.e. using more larger structure of which
> > > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > > concern the parallel vacuum still needs to deal with that as you
> > > > > mentioned.
> > > > >
> > > >
> > > > Right, apart from some functions for memory allocation/estimation and
> > > > stats copy, we might need something like amcanparallelvacuum, so that
> > > > index methods can have the option to not participate in parallel
> > > > vacuum due to reasons similar to gist or something else. I think we
> > > > can work towards this direction as this anyway seems to be required
> > > > and till we reach any conclusion for gist indexes, you can mark
> > > > amcanparallelvacuum for gist indexes as false.
> > >
> > > Agreed. I'll create a separate patch to add this callback and change
> > > parallel vacuum patch so that it checks the participation of indexes
> > > and then vacuums on un-participated indexes after parallel vacuum.
> >
> > amcanparallelvacuum is not necessary to be a callback, it can be a
> > boolean field of IndexAmRoutine.
> >
>
> Yes, it will be a boolean. Note that for parallel-index scans, we
> already have amcanparallel.
>
Attached updated patch set. 0001 patch introduces new index AM field
amcanparallelvacuum. All index AMs except for gist sets true for now.
0002 patch incorporated the all comments I got so far.
Regards,
--
Masahiko Sawada
On Thu, Oct 17, 2019 at 3:18 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > Hi > I applied all 3 patches and ran regression test. I was getting one regression failure. > >> diff -U3 /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out >> --- /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out 2019-10-17 10:01:58.138863802 +0530 >> +++ /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out 2019-10-17 11:41:20.930699926 +0530 >> @@ -105,7 +105,7 @@ >> CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY); >> CREATE INDEX tmp_idx1 ON tmp (a); >> VACUUM (PARALLEL 1) tmp; -- error, cannot parallel vacuum temporary tables >> -WARNING: skipping "tmp" --- cannot parallel vacuum temporary tables >> +WARNING: skipping vacuum on "tmp" --- cannot vacuum temporary tables in parallel >> -- INDEX_CLEANUP option >> CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT); >> -- Use uncompressed data stored in toast. > > > It look likes that you changed warning message for temp table, but haven't updated expected out file. > Thank you! I forgot to change the expected file. I'll fix it in the next version patch. Regards, -- Masahiko Sawada
On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 3:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > Attached updated patch set. 0001 patch introduces new index AM field > > > amcanparallelvacuum. All index AMs except for gist sets true for now. > > > 0002 patch incorporated the all comments I got so far. > > > > > > > I haven't studied the latest patch in detail, but it seems you are > > still assuming that all indexes will have the same amount of shared > > memory for index stats and copying it in the same way. > > Yeah I thought we agreed at least to have canparallelvacuum and if an > index AM cannot support parallel index vacuuming like gist, it returns > false. > > > I thought we > > agreed that each index AM should do this on its own. The basic > > problem is as of now we see this problem only with the Gist index, but > > some other index AM's could also have a similar problem. > > Okay. I'm thinking we're going to have a new callback to ack index AMs > the size of the structure using within both ambulkdelete and > amvacuumcleanup. But copying it to DSM can be done by the core because > it knows how many bytes need to be copied to DSM. Is that okay? > That sounds okay. > > > > Another major problem with previous and this patch version is that the > > cost-based vacuum concept seems to be entirely broken. Basically, > > each parallel vacuum worker operates independently w.r.t vacuum delay > > and cost. Assume that the overall I/O allowed for vacuum operation is > > X after which it will sleep for some time, reset the balance and > > continue. In the patch, each worker will be allowed to perform X > > before which it can sleep and also there is no coordination for the > > same with master backend. This is somewhat similar to memory usage > > problem, but a bit more tricky because here we can't easily split the > > I/O for each of the worker. > > > > One idea could be that we somehow map vacuum costing related > > parameters to the shared memory (dsm) which the vacuum operation is > > using and then allow workers to coordinate. This way master and > > worker processes will have the same view of balance cost and can act > > accordingly. > > > > The other idea could be that we come up with some smart way to split > > the I/O among workers. Initially, I thought we could try something as > > we do for autovacuum workers (see autovac_balance_cost), but I think > > that will require much more math. Before launching workers, we need > > to compute the remaining I/O (heap operation would have used > > something) after which we need to sleep and continue the operation and > > then somehow split it equally across workers. Once the workers are > > finished, then need to let master backend know how much I/O they have > > consumed and then master backend can add it to it's current I/O > > consumed. > > > > I think this problem matters because the vacuum delay is useful for > > large vacuums and this patch is trying to exactly solve that problem, > > so we can't ignore this problem. I am not yet sure what is the best > > solution to this problem, but I think we need to do something for it. > > > > I guess that the concepts of vacuum delay contradicts the concepts of > parallel vacuum. 
The concepts of parallel vacuum would be to use more > resource to make vacuum faster. Vacuum delays balances I/O during > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum > rather concentrates I/O in shorter duration. > You have a point, but the way it is currently working in the patch doesn't make much sense. Basically, each of the parallel workers will be allowed to use a complete I/O limit which is actually a limit for the entire vacuum operation. It doesn't give any consideration to the work done for the heap. > Since we need to share > the memory in entire system we need to deal with the memory issue but > disks are different. > > If we need to deal with this problem how about just dividing > vacuum_cost_limit by the parallel degree and setting it to worker's > vacuum_cost_limit? > How will we take the I/O done by heap into consideration? The vacuum_cost_limit is the cost for the entire vacuum operation not separately for heap and indexes. What makes you think that considering the limit for heap and index separately is not problematic? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I guess that the concepts of vacuum delay contradicts the concepts of > > parallel vacuum. The concepts of parallel vacuum would be to use more > > resource to make vacuum faster. Vacuum delays balances I/O during > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum > > rather concentrates I/O in shorter duration. > > > > You have a point, but the way it is currently working in the patch > doesn't make much sense. > Another point in this regard is that the user anyway has an option to turn off the cost-based vacuum. By default, it is anyway disabled. So, if the user enables it we have to provide some sensible behavior. If we can't come up with anything, then, in the end, we might want to turn it off for a parallel vacuum and mention the same in docs, but I think we should try to come up with a solution for it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I guess that the concepts of vacuum delay contradicts the concepts of > > > parallel vacuum. The concepts of parallel vacuum would be to use more > > > resource to make vacuum faster. Vacuum delays balances I/O during > > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum > > > rather concentrates I/O in shorter duration. > > > > > > > You have a point, but the way it is currently working in the patch > > doesn't make much sense. > > > > Another point in this regard is that the user anyway has an option to > turn off the cost-based vacuum. By default, it is anyway disabled. > So, if the user enables it we have to provide some sensible behavior. > If we can't come up with anything, then, in the end, we might want to > turn it off for a parallel vacuum and mention the same in docs, but I > think we should try to come up with a solution for it. I finally got your point and now understood the need. And the idea I proposed doesn't work fine. So you meant that all workers share the cost count and if a parallel vacuum worker increase the cost and it reaches the limit, does the only one worker sleep? Is that okay even though other parallel workers are still running and then the sleep might not help? Regards, -- Masahiko Sawada
On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I guess that the concepts of vacuum delay contradicts the concepts of > > > > parallel vacuum. The concepts of parallel vacuum would be to use more > > > > resource to make vacuum faster. Vacuum delays balances I/O during > > > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum > > > > rather concentrates I/O in shorter duration. > > > > > > > > > > You have a point, but the way it is currently working in the patch > > > doesn't make much sense. > > > > > > > Another point in this regard is that the user anyway has an option to > > turn off the cost-based vacuum. By default, it is anyway disabled. > > So, if the user enables it we have to provide some sensible behavior. > > If we can't come up with anything, then, in the end, we might want to > > turn it off for a parallel vacuum and mention the same in docs, but I > > think we should try to come up with a solution for it. > > I finally got your point and now understood the need. And the idea I > proposed doesn't work fine. > > So you meant that all workers share the cost count and if a parallel > vacuum worker increase the cost and it reaches the limit, does the > only one worker sleep? Is that okay even though other parallel workers > are still running and then the sleep might not help? > I agree with this point. There is a possibility that some of the workers who are doing heavy I/O continue to work and OTOH other workers who are doing very less I/O might become the victim and unnecessarily delay its operation. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Another point in this regard is that the user anyway has an option to > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > So, if the user enables it we have to provide some sensible behavior. > > > If we can't come up with anything, then, in the end, we might want to > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > think we should try to come up with a solution for it. > > > > I finally got your point and now understood the need. And the idea I > > proposed doesn't work fine. > > > > So you meant that all workers share the cost count and if a parallel > > vacuum worker increase the cost and it reaches the limit, does the > > only one worker sleep? Is that okay even though other parallel workers > > are still running and then the sleep might not help? > > Remember that the other running workers will also increase VacuumCostBalance and whichever worker finds that it becomes greater than VacuumCostLimit will reset its value and sleep. So, won't this make sure that overall throttling works the same? > I agree with this point. There is a possibility that some of the > workers who are doing heavy I/O continue to work and OTOH other > workers who are doing very less I/O might become the victim and > unnecessarily delay its operation. > Sure, but will it impact the overall I/O? I mean to say the rate limit we want to provide for overall vacuum operation will still be the same. Also, isn't a similar thing happens now also where heap might have done a major portion of I/O but soon after we start vacuuming the index, we will hit the limit and will sleep. I think this might not be the perfect solution and we should try to come up with something else if this doesn't seem to be working. Have you guys thought about the second solution I mentioned in email [1] (Before launching workers, we need to compute the remaining I/O ....)? Any other better ideas? [1] - https://www.postgresql.org/message-id/CAA4eK1%2BySETHCaCnAsEC-dC4GSXaE2sNGMOgD6J%3DX%2BN43bBqJQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
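A minimal self-contained sketch of the shared-balance behaviour Amit describes, written as plain C11 rather than backend code (SharedVacuumCost and vacuum_delay_point_shared are invented names; in the real patch the structure would live in the vacuum's DSM segment):

#include <stdatomic.h>
#include <unistd.h>

/* Shared between the leader and all parallel vacuum workers. */
typedef struct SharedVacuumCost
{
    atomic_int  balance;        /* cost accumulated since the last sleep */
    int         limit;          /* vacuum_cost_limit for the whole operation */
    int         delay_ms;       /* vacuum_cost_delay */
} SharedVacuumCost;

/*
 * Called by each process after it accrues 'cost' units of work.  Whichever
 * process pushes the shared balance over the limit resets it and sleeps,
 * so the vacuum is throttled as a whole rather than per process.
 */
static void
vacuum_delay_point_shared(SharedVacuumCost *vc, int cost)
{
    int     newbal = atomic_fetch_add(&vc->balance, cost) + cost;

    if (newbal >= vc->limit)
    {
        atomic_store(&vc->balance, 0);
        usleep((useconds_t) vc->delay_ms * 1000);
    }
}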
On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > So, if the user enables it we have to provide some sensible behavior. > > > > If we can't come up with anything, then, in the end, we might want to > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > think we should try to come up with a solution for it. > > > > > > I finally got your point and now understood the need. And the idea I > > > proposed doesn't work fine. > > > > > > So you meant that all workers share the cost count and if a parallel > > > vacuum worker increase the cost and it reaches the limit, does the > > > only one worker sleep? Is that okay even though other parallel workers > > > are still running and then the sleep might not help? > > > > > Remember that the other running workers will also increase > VacuumCostBalance and whichever worker finds that it becomes greater > than VacuumCostLimit will reset its value and sleep. So, won't this > make sure that overall throttling works the same? > > > I agree with this point. There is a possibility that some of the > > workers who are doing heavy I/O continue to work and OTOH other > > workers who are doing very less I/O might become the victim and > > unnecessarily delay its operation. > > > > Sure, but will it impact the overall I/O? I mean to say the rate > limit we want to provide for overall vacuum operation will still be > the same. Also, isn't a similar thing happens now also where heap > might have done a major portion of I/O but soon after we start > vacuuming the index, we will hit the limit and will sleep. Actually, What I meant is that the worker who performing actual I/O might not go for the delay and another worker which has done only CPU operation might pay the penalty? So basically the worker who is doing CPU intensive operation might go for the delay and pay the penalty and the worker who is performing actual I/O continues to work and do further I/O. Do you think this is not a practical problem? Stepping back a bit, OTOH, I think that we can not guarantee that the one worker who has done more I/O will continue to do further I/O and the one which has not done much I/O will not perform more I/O in future. So it might not be too bad if we compute shared costs as you suggested above. > > I think this might not be the perfect solution and we should try to > come up with something else if this doesn't seem to be working. Have > you guys thought about the second solution I mentioned in email [1] > (Before launching workers, we need to compute the remaining I/O ....)? > Any other better ideas? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > think we should try to come up with a solution for it. > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > proposed doesn't work fine. > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > only one worker sleep? Is that okay even though other parallel workers > > > > are still running and then the sleep might not help? > > > > > > > > Remember that the other running workers will also increase > > VacuumCostBalance and whichever worker finds that it becomes greater > > than VacuumCostLimit will reset its value and sleep. So, won't this > > make sure that overall throttling works the same? > > > > > I agree with this point. There is a possibility that some of the > > > workers who are doing heavy I/O continue to work and OTOH other > > > workers who are doing very less I/O might become the victim and > > > unnecessarily delay its operation. > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > limit we want to provide for overall vacuum operation will still be > > the same. Also, isn't a similar thing happens now also where heap > > might have done a major portion of I/O but soon after we start > > vacuuming the index, we will hit the limit and will sleep. > > Actually, What I meant is that the worker who performing actual I/O > might not go for the delay and another worker which has done only CPU > operation might pay the penalty? So basically the worker who is doing > CPU intensive operation might go for the delay and pay the penalty and > the worker who is performing actual I/O continues to work and do > further I/O. Do you think this is not a practical problem? > I don't know. Generally, we try to delay (if required) before processing (read/write) one page which means it will happen for I/O intensive operations, so I am not sure if the point you are making is completely correct. > Stepping back a bit, OTOH, I think that we can not guarantee that the > one worker who has done more I/O will continue to do further I/O and > the one which has not done much I/O will not perform more I/O in > future. So it might not be too bad if we compute shared costs as you > suggested above. > I am thinking if we can write the patch for both the approaches (a. compute shared costs and try to delay based on that, b. try to divide the I/O cost among workers as described in the email above[1]) and do some tests to see the behavior of throttling, that might help us in deciding what is the best strategy to solve this problem, if any. What do you think? 
[1] - https://www.postgresql.org/message-id/CAA4eK1%2BySETHCaCnAsEC-dC4GSXaE2sNGMOgD6J%3DX%2BN43bBqJQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > proposed doesn't work fine. > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > are still running and then the sleep might not help? > > > > > > > > > > > Remember that the other running workers will also increase > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > make sure that overall throttling works the same? > > > > > > > I agree with this point. There is a possibility that some of the > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > workers who are doing very less I/O might become the victim and > > > > unnecessarily delay its operation. > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > limit we want to provide for overall vacuum operation will still be > > > the same. Also, isn't a similar thing happens now also where heap > > > might have done a major portion of I/O but soon after we start > > > vacuuming the index, we will hit the limit and will sleep. > > > > Actually, What I meant is that the worker who performing actual I/O > > might not go for the delay and another worker which has done only CPU > > operation might pay the penalty? So basically the worker who is doing > > CPU intensive operation might go for the delay and pay the penalty and > > the worker who is performing actual I/O continues to work and do > > further I/O. Do you think this is not a practical problem? > > > > I don't know. Generally, we try to delay (if required) before > processing (read/write) one page which means it will happen for I/O > intensive operations, so I am not sure if the point you are making is > completely correct. Ok, I agree with the point that we are checking it only when we are doing the I/O operation. But, we also need to consider that each I/O operations have a different weightage. So even if we have a delay point at I/O operation there is a possibility that we might delay the worker which is just performing read buffer with page hit(VacuumCostPageHit). But, the other worker who is actually dirtying the page(VacuumCostPageDirty = 20) continue the work and do more I/O. 
> > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > one worker who has done more I/O will continue to do further I/O and > > the one which has not done much I/O will not perform more I/O in > > future. So it might not be too bad if we compute shared costs as you > > suggested above. > > > > I am thinking if we can write the patch for both the approaches (a. > compute shared costs and try to delay based on that, b. try to divide > the I/O cost among workers as described in the email above[1]) and do > some tests to see the behavior of throttling, that might help us in > deciding what is the best strategy to solve this problem, if any. > What do you think? I agree with this idea. I can come up with a POC patch for approach (b). Meanwhile, if someone is interested to quickly hack with the approach (a) then we can do some testing and compare. Sawada-san, by any chance will you be interested to write POC with approach (a)? Otherwise, I will try to write it after finishing the first one (approach b). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
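For context on the cost weights being discussed, the existing single-backend throttling works roughly as sketched below. This is a paraphrase of vacuum_delay_point() and of the default page costs (about 1 for a buffer hit, 10 for a miss, 20 for dirtying a page), not the literal vacuum.c source:

#include "postgres.h"
#include "miscadmin.h"   /* VacuumCostActive, VacuumCostBalance, VacuumCostLimit, VacuumCostDelay */

/*
 * Paraphrased sketch of the single-backend throttling: each page access
 * adds a weighted cost to VacuumCostBalance, and once the balance crosses
 * VacuumCostLimit the backend naps and resets the balance.
 */
static void
vacuum_delay_point_sketch(void)
{
    if (VacuumCostActive && VacuumCostBalance >= VacuumCostLimit)
    {
        double  msec;

        msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
        if (msec > VacuumCostDelay * 4)
            msec = VacuumCostDelay * 4;

        pg_usleep((long) (msec * 1000));
        VacuumCostBalance = 0;
    }
}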
On Fri, Oct 18, 2019 at 3:48 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > > proposed doesn't work fine. > > > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > > are still running and then the sleep might not help? > > > > > > > > > > > > > > Remember that the other running workers will also increase > > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > > make sure that overall throttling works the same? > > > > > > > > > I agree with this point. There is a possibility that some of the > > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > > workers who are doing very less I/O might become the victim and > > > > > unnecessarily delay its operation. > > > > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > > limit we want to provide for overall vacuum operation will still be > > > > the same. Also, isn't a similar thing happens now also where heap > > > > might have done a major portion of I/O but soon after we start > > > > vacuuming the index, we will hit the limit and will sleep. > > > > > > Actually, What I meant is that the worker who performing actual I/O > > > might not go for the delay and another worker which has done only CPU > > > operation might pay the penalty? So basically the worker who is doing > > > CPU intensive operation might go for the delay and pay the penalty and > > > the worker who is performing actual I/O continues to work and do > > > further I/O. Do you think this is not a practical problem? > > > > > > > I don't know. Generally, we try to delay (if required) before > > processing (read/write) one page which means it will happen for I/O > > intensive operations, so I am not sure if the point you are making is > > completely correct. > > Ok, I agree with the point that we are checking it only when we are > doing the I/O operation. But, we also need to consider that each I/O > operations have a different weightage. 
So even if we have a delay > point at I/O operation there is a possibility that we might delay the > worker which is just performing read buffer with page > hit(VacuumCostPageHit). But, the other worker who is actually > dirtying the page(VacuumCostPageDirty = 20) continue the work and do > more I/O. > > > > > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > > one worker who has done more I/O will continue to do further I/O and > > > the one which has not done much I/O will not perform more I/O in > > > future. So it might not be too bad if we compute shared costs as you > > > suggested above. > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > compute shared costs and try to delay based on that, b. try to divide > > the I/O cost among workers as described in the email above[1]) and do > > some tests to see the behavior of throttling, that might help us in > > deciding what is the best strategy to solve this problem, if any. > > What do you think? > > I agree with this idea. I can come up with a POC patch for approach > (b). Meanwhile, if someone is interested to quickly hack with the > approach (a) then we can do some testing and compare. Sawada-san, > by any chance will you be interested to write POC with approach (a)? Yes, I will try to write the PoC patch with approach (a). Regards, -- Masahiko Sawada
On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > > proposed doesn't work fine. > > > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > > are still running and then the sleep might not help? > > > > > > > > > > > > > > Remember that the other running workers will also increase > > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > > make sure that overall throttling works the same? > > > > > > > > > I agree with this point. There is a possibility that some of the > > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > > workers who are doing very less I/O might become the victim and > > > > > unnecessarily delay its operation. > > > > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > > limit we want to provide for overall vacuum operation will still be > > > > the same. Also, isn't a similar thing happens now also where heap > > > > might have done a major portion of I/O but soon after we start > > > > vacuuming the index, we will hit the limit and will sleep. > > > > > > Actually, What I meant is that the worker who performing actual I/O > > > might not go for the delay and another worker which has done only CPU > > > operation might pay the penalty? So basically the worker who is doing > > > CPU intensive operation might go for the delay and pay the penalty and > > > the worker who is performing actual I/O continues to work and do > > > further I/O. Do you think this is not a practical problem? > > > > > > > I don't know. Generally, we try to delay (if required) before > > processing (read/write) one page which means it will happen for I/O > > intensive operations, so I am not sure if the point you are making is > > completely correct. > > Ok, I agree with the point that we are checking it only when we are > doing the I/O operation. But, we also need to consider that each I/O > operations have a different weightage. 
So even if we have a delay > point at I/O operation there is a possibility that we might delay the > worker which is just performing read buffer with page > hit(VacuumCostPageHit). But, the other worker who is actually > dirtying the page(VacuumCostPageDirty = 20) continue the work and do > more I/O. > > > > > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > > one worker who has done more I/O will continue to do further I/O and > > > the one which has not done much I/O will not perform more I/O in > > > future. So it might not be too bad if we compute shared costs as you > > > suggested above. > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > compute shared costs and try to delay based on that, b. try to divide > > the I/O cost among workers as described in the email above[1]) and do > > some tests to see the behavior of throttling, that might help us in > > deciding what is the best strategy to solve this problem, if any. > > What do you think? > > I agree with this idea. I can come up with a POC patch for approach > (b). Meanwhile, if someone is interested to quickly hack with the > approach (a) then we can do some testing and compare. Sawada-san, > by any chance will you be interested to write POC with approach (a)? > Otherwise, I will try to write it after finishing the first one > (approach b). > I have come up with the POC for approach (a). The idea is 1) Before launching the worker divide the current VacuumCostBalance among workers so that workers start accumulating the balance from that point. 2) Also, divide the VacuumCostLimit among the workers. 3) Once the worker are done with the index vacuum, send back the remaining balance with the leader. 4) The leader will sum all the balances and add that to its current VacuumCostBalance. And start accumulating its balance from this point. I was trying to test how is the behaviour of the vacuum I/O limit, but I could not find an easy way to test that so I just put the tracepoint in the code and just checked that at what point we are giving the delay. I also printed the cost balance at various point to see that after how much I/O accumulation we are hitting the delay. Please feel free to suggest a better way to test this. I have printed these logs for parallel vacuum patch (v30) vs v(30) + patch for dividing i/o limit (attached with the mail) Note: Patch and the test results are attached. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
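For readers skimming the thread, here is a minimal sketch of the "divide the I/O limit" idea (approach (b)) described in the message above. The struct and function names are hypothetical, synchronization is omitted, and the assumption that the leader keeps a share for itself may differ from the attached POC:

#include "postgres.h"
#include "miscadmin.h"   /* VacuumCostBalance, VacuumCostLimit */

/* Hypothetical shared area; the real POC keeps its own DSM layout. */
typedef struct VacuumCostShare
{
    int     worker_cost_limit;      /* per-participant share of the limit */
    int     worker_init_balance;    /* per-participant share of the balance */
    int     returned_balance;       /* leftovers handed back by finished workers */
} VacuumCostShare;

/* Leader, before launching nworkers: split the limit and the current balance. */
static void
divide_vacuum_cost(VacuumCostShare *share, int nworkers)
{
    int     nparticipants = nworkers + 1;   /* assume the leader also vacuums */

    share->worker_cost_limit = VacuumCostLimit / nparticipants;
    share->worker_init_balance = VacuumCostBalance / nparticipants;
    share->returned_balance = 0;

    VacuumCostLimit = share->worker_cost_limit;
    VacuumCostBalance = share->worker_init_balance;
}

/* Worker, when its index vacuuming is done: hand the leftover back. */
static void
return_vacuum_cost(VacuumCostShare *share)
{
    share->returned_balance += VacuumCostBalance;   /* would need a lock in reality */
}

/* Leader, after all workers exit: resume with the summed-up balance. */
static void
gather_vacuum_cost(VacuumCostShare *share, int saved_cost_limit)
{
    VacuumCostLimit = saved_cost_limit;
    VacuumCostBalance += share->returned_balance;
}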
On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > compute shared costs and try to delay based on that, b. try to divide > > > the I/O cost among workers as described in the email above[1]) and do > > > some tests to see the behavior of throttling, that might help us in > > > deciding what is the best strategy to solve this problem, if any. > > > What do you think? > > > > I agree with this idea. I can come up with a POC patch for approach > > (b). Meanwhile, if someone is interested to quickly hack with the > > approach (a) then we can do some testing and compare. Sawada-san, > > by any chance will you be interested to write POC with approach (a)? > > Otherwise, I will try to write it after finishing the first one > > (approach b). > > > I have come up with the POC for approach (a). > I think you mean to say approach (b). > The idea is > 1) Before launching the worker divide the current VacuumCostBalance > among workers so that workers start accumulating the balance from that > point. > 2) Also, divide the VacuumCostLimit among the workers. > 3) Once the worker are done with the index vacuum, send back the > remaining balance with the leader. > 4) The leader will sum all the balances and add that to its current > VacuumCostBalance. And start accumulating its balance from this > point. > > I was trying to test how is the behaviour of the vacuum I/O limit, but > I could not find an easy way to test that so I just put the tracepoint > in the code and just checked that at what point we are giving the > delay. > I also printed the cost balance at various point to see that after how > much I/O accumulation we are hitting the delay. Please feel free to > suggest a better way to test this. > Can we compute the overall throttling (sleep time) in the operation separately for heap and index, then divide the index's sleep_time with a number of workers and add it to heap's sleep time? Then, it will be a bit easier to compare the data between parallel and non-parallel case. > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > patch for dividing i/o limit (attached with the mail) > > Note: Patch and the test results are attached. > I think it is always a good idea to summarize the results and tell your conclusion about it. AFAICT, it seems to me this technique as done in patch might not work for the cases when there is an uneven amount of work done by parallel workers (say the index sizes vary (maybe due partial indexes or index column width or some other reasons)). The reason for it is that when the worker finishes it's work we don't rebalance the cost among other workers. Can we generate such a test and see how it behaves? I think it might be possible to address this if it turns out to be a problem. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
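One way to express the comparison metric suggested above; the helper name and the normalization are only an illustration of the suggestion, not code from any patch:

/*
 * Sketch: attribute the index phase's sleep time evenly to the workers so
 * that a parallel run can be compared against a serial run.
 */
static double
normalized_sleep_msec(double heap_sleep_msec, double index_sleep_msec, int nworkers)
{
    return heap_sleep_msec + index_sleep_msec / (nworkers > 0 ? nworkers : 1);
}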
On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > compute shared costs and try to delay based on that, b. try to divide > > > > the I/O cost among workers as described in the email above[1]) and do > > > > some tests to see the behavior of throttling, that might help us in > > > > deciding what is the best strategy to solve this problem, if any. > > > > What do you think? > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > approach (a) then we can do some testing and compare. Sawada-san, > > > by any chance will you be interested to write POC with approach (a)? > > > Otherwise, I will try to write it after finishing the first one > > > (approach b). > > > > > I have come up with the POC for approach (a). > > > > I think you mean to say approach (b). Yeah, sorry for the confusion. It's approach (b). > > > The idea is > > 1) Before launching the worker divide the current VacuumCostBalance > > among workers so that workers start accumulating the balance from that > > point. > > 2) Also, divide the VacuumCostLimit among the workers. > > 3) Once the worker are done with the index vacuum, send back the > > remaining balance with the leader. > > 4) The leader will sum all the balances and add that to its current > > VacuumCostBalance. And start accumulating its balance from this > > point. > > > > I was trying to test how is the behaviour of the vacuum I/O limit, but > > I could not find an easy way to test that so I just put the tracepoint > > in the code and just checked that at what point we are giving the > > delay. > > I also printed the cost balance at various point to see that after how > > much I/O accumulation we are hitting the delay. Please feel free to > > suggest a better way to test this. > > > > Can we compute the overall throttling (sleep time) in the operation > separately for heap and index, then divide the index's sleep_time with > a number of workers and add it to heap's sleep time? Then, it will be > a bit easier to compare the data between parallel and non-parallel > case. Okay, I will try to do that. > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > > patch for dividing i/o limit (attached with the mail) > > > > Note: Patch and the test results are attached. > > > > I think it is always a good idea to summarize the results and tell > your conclusion about it. AFAICT, it seems to me this technique as > done in patch might not work for the cases when there is an uneven > amount of work done by parallel workers (say the index sizes vary > (maybe due partial indexes or index column width or some other > reasons)). The reason for it is that when the worker finishes it's > work we don't rebalance the cost among other workers. Right, thats one problem I observed. Can we generate > such a test and see how it behaves? I think it might be possible to > address this if it turns out to be a problem. Yeah, we can address this by rebalancing the cost. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > > > proposed doesn't work fine. > > > > > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > > > are still running and then the sleep might not help? > > > > > > > > > > > > > > > > > Remember that the other running workers will also increase > > > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > > > make sure that overall throttling works the same? > > > > > > > > > > > I agree with this point. There is a possibility that some of the > > > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > > > workers who are doing very less I/O might become the victim and > > > > > > unnecessarily delay its operation. > > > > > > > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > > > limit we want to provide for overall vacuum operation will still be > > > > > the same. Also, isn't a similar thing happens now also where heap > > > > > might have done a major portion of I/O but soon after we start > > > > > vacuuming the index, we will hit the limit and will sleep. > > > > > > > > Actually, What I meant is that the worker who performing actual I/O > > > > might not go for the delay and another worker which has done only CPU > > > > operation might pay the penalty? So basically the worker who is doing > > > > CPU intensive operation might go for the delay and pay the penalty and > > > > the worker who is performing actual I/O continues to work and do > > > > further I/O. Do you think this is not a practical problem? > > > > > > > > > > I don't know. Generally, we try to delay (if required) before > > > processing (read/write) one page which means it will happen for I/O > > > intensive operations, so I am not sure if the point you are making is > > > completely correct. > > > > Ok, I agree with the point that we are checking it only when we are > > doing the I/O operation. 
But, we also need to consider that each I/O > > operations have a different weightage. So even if we have a delay > > point at I/O operation there is a possibility that we might delay the > > worker which is just performing read buffer with page > > hit(VacuumCostPageHit). But, the other worker who is actually > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do > > more I/O. > > > > > > > > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > > > one worker who has done more I/O will continue to do further I/O and > > > > the one which has not done much I/O will not perform more I/O in > > > > future. So it might not be too bad if we compute shared costs as you > > > > suggested above. > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > compute shared costs and try to delay based on that, b. try to divide > > > the I/O cost among workers as described in the email above[1]) and do > > > some tests to see the behavior of throttling, that might help us in > > > deciding what is the best strategy to solve this problem, if any. > > > What do you think? > > > > I agree with this idea. I can come up with a POC patch for approach > > (b). Meanwhile, if someone is interested to quickly hack with the > > approach (a) then we can do some testing and compare. Sawada-san, > > by any chance will you be interested to write POC with approach (a)? > > Otherwise, I will try to write it after finishing the first one > > (approach b). > > > I have come up with the POC for approach (a). > > The idea is > 1) Before launching the worker divide the current VacuumCostBalance > among workers so that workers start accumulating the balance from that > point. > 2) Also, divide the VacuumCostLimit among the workers. > 3) Once the worker are done with the index vacuum, send back the > remaining balance with the leader. > 4) The leader will sum all the balances and add that to its current > VacuumCostBalance. And start accumulating its balance from this > point. > > I was trying to test how is the behaviour of the vacuum I/O limit, but > I could not find an easy way to test that so I just put the tracepoint > in the code and just checked that at what point we are giving the > delay. > I also printed the cost balance at various point to see that after how > much I/O accumulation we are hitting the delay. Please feel free to > suggest a better way to test this. > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > patch for dividing i/o limit (attached with the mail) > > Note: Patch and the test results are attached. > Thank you! For approach (a) the basic idea I've come up with is that we have a shared balance value on DSM and each workers including the leader process add its local balance value to it in vacuum_delay_point, and then based on the shared value workers sleep. I'll submit that patch with other updates. Regards, -- Masahiko Sawada
On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have come up with the POC for approach (a). > > > > The idea is > > 1) Before launching the worker divide the current VacuumCostBalance > > among workers so that workers start accumulating the balance from that > > point. > > 2) Also, divide the VacuumCostLimit among the workers. > > 3) Once the worker are done with the index vacuum, send back the > > remaining balance with the leader. > > 4) The leader will sum all the balances and add that to its current > > VacuumCostBalance. And start accumulating its balance from this > > point. > > > > I was trying to test how is the behaviour of the vacuum I/O limit, but > > I could not find an easy way to test that so I just put the tracepoint > > in the code and just checked that at what point we are giving the > > delay. > > I also printed the cost balance at various point to see that after how > > much I/O accumulation we are hitting the delay. Please feel free to > > suggest a better way to test this. > > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > > patch for dividing i/o limit (attached with the mail) > > > > Note: Patch and the test results are attached. > > > > Thank you! > > For approach (a) the basic idea I've come up with is that we have a > shared balance value on DSM and each workers including the leader > process add its local balance value to it in vacuum_delay_point, and > then based on the shared value workers sleep. I'll submit that patch > with other updates. > I think it would be better if we can prepare the I/O balance patches on top of main patch and evaluate both approaches. We can test both the approaches and integrate the one which turned out to be good. Note that, I will be away next week, so I won't be able to review your latest patch unless you are planning to post today or tomorrow. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 25, 2019 at 7:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I have come up with the POC for approach (a). > > > > > > The idea is > > > 1) Before launching the worker divide the current VacuumCostBalance > > > among workers so that workers start accumulating the balance from that > > > point. > > > 2) Also, divide the VacuumCostLimit among the workers. > > > 3) Once the worker are done with the index vacuum, send back the > > > remaining balance with the leader. > > > 4) The leader will sum all the balances and add that to its current > > > VacuumCostBalance. And start accumulating its balance from this > > > point. > > > > > > I was trying to test how is the behaviour of the vacuum I/O limit, but > > > I could not find an easy way to test that so I just put the tracepoint > > > in the code and just checked that at what point we are giving the > > > delay. > > > I also printed the cost balance at various point to see that after how > > > much I/O accumulation we are hitting the delay. Please feel free to > > > suggest a better way to test this. > > > > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > > > patch for dividing i/o limit (attached with the mail) > > > > > > Note: Patch and the test results are attached. > > > > > > > Thank you! > > > > For approach (a) the basic idea I've come up with is that we have a > > shared balance value on DSM and each workers including the leader > > process add its local balance value to it in vacuum_delay_point, and > > then based on the shared value workers sleep. I'll submit that patch > > with other updates. > > > > I think it would be better if we can prepare the I/O balance patches > on top of main patch and evaluate both approaches. We can test both > the approaches and integrate the one which turned out to be good. > Just to add something to testing both approaches. I think we can first come up with a way to compute the throttling vacuum does as mentioned by me in one of the emails above [1] or in some other way. I think Dilip is planning to give it a try and once we have that we can evaluate both the patches. Some of the tests I have in mind are: a. All indexes have an equal amount of deleted data. b. indexes have an uneven amount of deleted data. c. try with mix of indexes (btree, gin, gist, hash, etc..) on a table. Feel free to add more tests. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > > > > proposed doesn't work fine. > > > > > > > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > > > > are still running and then the sleep might not help? > > > > > > > > > > > > > > > > > > > > Remember that the other running workers will also increase > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > > > > make sure that overall throttling works the same? > > > > > > > > > > > > > I agree with this point. There is a possibility that some of the > > > > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > > > > workers who are doing very less I/O might become the victim and > > > > > > > unnecessarily delay its operation. > > > > > > > > > > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > > > > limit we want to provide for overall vacuum operation will still be > > > > > > the same. Also, isn't a similar thing happens now also where heap > > > > > > might have done a major portion of I/O but soon after we start > > > > > > vacuuming the index, we will hit the limit and will sleep. > > > > > > > > > > Actually, What I meant is that the worker who performing actual I/O > > > > > might not go for the delay and another worker which has done only CPU > > > > > operation might pay the penalty? So basically the worker who is doing > > > > > CPU intensive operation might go for the delay and pay the penalty and > > > > > the worker who is performing actual I/O continues to work and do > > > > > further I/O. Do you think this is not a practical problem? > > > > > > > > > > > > > I don't know. 
Generally, we try to delay (if required) before > > > > processing (read/write) one page which means it will happen for I/O > > > > intensive operations, so I am not sure if the point you are making is > > > > completely correct. > > > > > > Ok, I agree with the point that we are checking it only when we are > > > doing the I/O operation. But, we also need to consider that each I/O > > > operations have a different weightage. So even if we have a delay > > > point at I/O operation there is a possibility that we might delay the > > > worker which is just performing read buffer with page > > > hit(VacuumCostPageHit). But, the other worker who is actually > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do > > > more I/O. > > > > > > > > > > > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > > > > one worker who has done more I/O will continue to do further I/O and > > > > > the one which has not done much I/O will not perform more I/O in > > > > > future. So it might not be too bad if we compute shared costs as you > > > > > suggested above. > > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > compute shared costs and try to delay based on that, b. try to divide > > > > the I/O cost among workers as described in the email above[1]) and do > > > > some tests to see the behavior of throttling, that might help us in > > > > deciding what is the best strategy to solve this problem, if any. > > > > What do you think? > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > approach (a) then we can do some testing and compare. Sawada-san, > > > by any chance will you be interested to write POC with approach (a)? > > > Otherwise, I will try to write it after finishing the first one > > > (approach b). > > > > > I have come up with the POC for approach (a). > > > > The idea is > > 1) Before launching the worker divide the current VacuumCostBalance > > among workers so that workers start accumulating the balance from that > > point. > > 2) Also, divide the VacuumCostLimit among the workers. > > 3) Once the worker are done with the index vacuum, send back the > > remaining balance with the leader. > > 4) The leader will sum all the balances and add that to its current > > VacuumCostBalance. And start accumulating its balance from this > > point. > > > > I was trying to test how is the behaviour of the vacuum I/O limit, but > > I could not find an easy way to test that so I just put the tracepoint > > in the code and just checked that at what point we are giving the > > delay. > > I also printed the cost balance at various point to see that after how > > much I/O accumulation we are hitting the delay. Please feel free to > > suggest a better way to test this. > > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > > patch for dividing i/o limit (attached with the mail) > > > > Note: Patch and the test results are attached. > > > > Thank you! > > For approach (a) the basic idea I've come up with is that we have a > shared balance value on DSM and each workers including the leader > process add its local balance value to it in vacuum_delay_point, and > then based on the shared value workers sleep. I'll submit that patch > with other updates. 
IMHO, if we add the local balance to the shared balance in vacuum_delay_point while each worker is working with the full limit, then there will be a problem, right? Suppose VacuumCostLimit is 2000: each worker first hits the delay in vacuum_delay_point when its local balance reaches 2000, so in most cases the first delay will only be hit when the gross I/O is 6000 (if there are 3 workers). I think if we want shared accounting then we must always accumulate the balance in a shared variable, so that as soon as the gross balance hits VacuumCostLimit we reach the delay point. Maybe we can do this:

1. Change VacuumCostBalance from an integer to a pg_atomic_uint32 *.
2. In the heap_parallel_vacuum_main function, make it point into a shared memory location. Basically, for the non-parallel case it will point to the process-specific global variable, whereas in the parallel case it will point to a shared memory variable.
3. Wherever VacuumCostBalance is used in the code (I think 5-6 occurrences), change it to use atomic operations.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
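A rough sketch of what steps 1-3 above could look like; the names VacuumSharedCostBalance and parallel_vacuum_attach_cost_balance are hypothetical and the real patch may structure this differently:

#include "postgres.h"
#include "port/atomics.h"

/*
 * Step 1: make the balance a pointer to an atomic counter.  By default it
 * points at a process-local counter (which must be set up with
 * pg_atomic_init_u32() at startup); a parallel worker repoints it at the
 * counter stored in DSM (step 2).
 */
static pg_atomic_uint32 local_cost_balance;
static pg_atomic_uint32 *VacuumSharedCostBalance = &local_cost_balance;

/* Called from the (hypothetical) parallel vacuum worker entry point. */
static void
parallel_vacuum_attach_cost_balance(pg_atomic_uint32 *dsm_balance)
{
    VacuumSharedCostBalance = dsm_balance;
}

/* Step 3: every former "VacuumCostBalance += cost" becomes an atomic add. */
static void
add_vacuum_cost(uint32 cost)
{
    pg_atomic_add_fetch_u32(VacuumSharedCostBalance, cost);
}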
On Fri, Oct 25, 2019 at 12:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > > > > > proposed doesn't work fine. > > > > > > > > > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > > > > > are still running and then the sleep might not help? > > > > > > > > > > > > > > > > > > > > > > > Remember that the other running workers will also increase > > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > > > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > > > > > make sure that overall throttling works the same? > > > > > > > > > > > > > > > I agree with this point. There is a possibility that some of the > > > > > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > > > > > workers who are doing very less I/O might become the victim and > > > > > > > > unnecessarily delay its operation. > > > > > > > > > > > > > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > > > > > limit we want to provide for overall vacuum operation will still be > > > > > > > the same. Also, isn't a similar thing happens now also where heap > > > > > > > might have done a major portion of I/O but soon after we start > > > > > > > vacuuming the index, we will hit the limit and will sleep. > > > > > > > > > > > > Actually, What I meant is that the worker who performing actual I/O > > > > > > might not go for the delay and another worker which has done only CPU > > > > > > operation might pay the penalty? So basically the worker who is doing > > > > > > CPU intensive operation might go for the delay and pay the penalty and > > > > > > the worker who is performing actual I/O continues to work and do > > > > > > further I/O. Do you think this is not a practical problem? 
> > > > > > > > > > > > > > > > I don't know. Generally, we try to delay (if required) before > > > > > processing (read/write) one page which means it will happen for I/O > > > > > intensive operations, so I am not sure if the point you are making is > > > > > completely correct. > > > > > > > > Ok, I agree with the point that we are checking it only when we are > > > > doing the I/O operation. But, we also need to consider that each I/O > > > > operations have a different weightage. So even if we have a delay > > > > point at I/O operation there is a possibility that we might delay the > > > > worker which is just performing read buffer with page > > > > hit(VacuumCostPageHit). But, the other worker who is actually > > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do > > > > more I/O. > > > > > > > > > > > > > > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > > > > > one worker who has done more I/O will continue to do further I/O and > > > > > > the one which has not done much I/O will not perform more I/O in > > > > > > future. So it might not be too bad if we compute shared costs as you > > > > > > suggested above. > > > > > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > some tests to see the behavior of throttling, that might help us in > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > What do you think? > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > by any chance will you be interested to write POC with approach (a)? > > > > Otherwise, I will try to write it after finishing the first one > > > > (approach b). > > > > > > > I have come up with the POC for approach (a). > > > > > > The idea is > > > 1) Before launching the worker divide the current VacuumCostBalance > > > among workers so that workers start accumulating the balance from that > > > point. > > > 2) Also, divide the VacuumCostLimit among the workers. > > > 3) Once the worker are done with the index vacuum, send back the > > > remaining balance with the leader. > > > 4) The leader will sum all the balances and add that to its current > > > VacuumCostBalance. And start accumulating its balance from this > > > point. > > > > > > I was trying to test how is the behaviour of the vacuum I/O limit, but > > > I could not find an easy way to test that so I just put the tracepoint > > > in the code and just checked that at what point we are giving the > > > delay. > > > I also printed the cost balance at various point to see that after how > > > much I/O accumulation we are hitting the delay. Please feel free to > > > suggest a better way to test this. > > > > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > > > patch for dividing i/o limit (attached with the mail) > > > > > > Note: Patch and the test results are attached. > > > > > > > Thank you! > > > > For approach (a) the basic idea I've come up with is that we have a > > shared balance value on DSM and each workers including the leader > > process add its local balance value to it in vacuum_delay_point, and > > then based on the shared value workers sleep. 
I'll submit that patch > > with other updates. > IMHO, if we add the local balance to the shared balance in > vacuum_delay_point and each worker is working with full limit then > there will be a problem right? because suppose VacuumCostLimit is 2000 > then the first time each worker hit the vacuum_delay_point when their > local limit will be 2000 so in most cases, the first delay will be hit > when there gross I/O is 6000 (if there are 3 workers). To explain my idea in more detail: the first worker that enters vacuum_delay_point adds its local value to the shared value and resets its local value to 0. The worker then sleeps if the shared value exceeds VacuumCostLimit, but before sleeping it subtracts VacuumCostLimit from the shared value. Since vacuum_delay_point is typically called once per page processed, I don't expect such a problem. Thoughts? Regards, -- Masahiko Sawada
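A minimal sketch of the shared-balance delay described above (approach (a)). VacuumSharedCostBalance is an assumed pg_atomic_uint32 living in DSM, and the compare-and-swap loop is one possible way to "pay" one limit's worth of budget before sleeping, not necessarily what the forthcoming patch does:

#include "postgres.h"
#include "miscadmin.h"       /* VacuumCostBalance, VacuumCostLimit, VacuumCostDelay */
#include "port/atomics.h"

extern pg_atomic_uint32 *VacuumSharedCostBalance;   /* assumed to live in DSM */

static void
shared_vacuum_delay_point_sketch(void)
{
    uint32      shared;

    /* fold the locally accumulated cost into the shared counter */
    pg_atomic_add_fetch_u32(VacuumSharedCostBalance, (uint32) VacuumCostBalance);
    VacuumCostBalance = 0;

    shared = pg_atomic_read_u32(VacuumSharedCostBalance);
    while (shared >= (uint32) VacuumCostLimit)
    {
        /* subtract one VacuumCostLimit worth of budget, then sleep */
        if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
                                           &shared,
                                           shared - (uint32) VacuumCostLimit))
        {
            pg_usleep((long) (VacuumCostDelay * 1000));
            break;
        }
        /* CAS failed: 'shared' now holds the current value; re-check it */
    }
}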
On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Oct 25, 2019 at 12:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > Another point in this regard is that the user anyway has an option to > > > > > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled. > > > > > > > > > > > So, if the user enables it we have to provide some sensible behavior. > > > > > > > > > > > If we can't come up with anything, then, in the end, we might want to > > > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I > > > > > > > > > > > think we should try to come up with a solution for it. > > > > > > > > > > > > > > > > > > > > I finally got your point and now understood the need. And the idea I > > > > > > > > > > proposed doesn't work fine. > > > > > > > > > > > > > > > > > > > > So you meant that all workers share the cost count and if a parallel > > > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the > > > > > > > > > > only one worker sleep? Is that okay even though other parallel workers > > > > > > > > > > are still running and then the sleep might not help? > > > > > > > > > > > > > > > > > > > > > > > > > > Remember that the other running workers will also increase > > > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater > > > > > > > > than VacuumCostLimit will reset its value and sleep. So, won't this > > > > > > > > make sure that overall throttling works the same? > > > > > > > > > > > > > > > > > I agree with this point. There is a possibility that some of the > > > > > > > > > workers who are doing heavy I/O continue to work and OTOH other > > > > > > > > > workers who are doing very less I/O might become the victim and > > > > > > > > > unnecessarily delay its operation. > > > > > > > > > > > > > > > > > > > > > > > > > Sure, but will it impact the overall I/O? I mean to say the rate > > > > > > > > limit we want to provide for overall vacuum operation will still be > > > > > > > > the same. Also, isn't a similar thing happens now also where heap > > > > > > > > might have done a major portion of I/O but soon after we start > > > > > > > > vacuuming the index, we will hit the limit and will sleep. > > > > > > > > > > > > > > Actually, What I meant is that the worker who performing actual I/O > > > > > > > might not go for the delay and another worker which has done only CPU > > > > > > > operation might pay the penalty? 
So basically the worker who is doing > > > > > > > CPU intensive operation might go for the delay and pay the penalty and > > > > > > > the worker who is performing actual I/O continues to work and do > > > > > > > further I/O. Do you think this is not a practical problem? > > > > > > > > > > > > > > > > > > > I don't know. Generally, we try to delay (if required) before > > > > > > processing (read/write) one page which means it will happen for I/O > > > > > > intensive operations, so I am not sure if the point you are making is > > > > > > completely correct. > > > > > > > > > > Ok, I agree with the point that we are checking it only when we are > > > > > doing the I/O operation. But, we also need to consider that each I/O > > > > > operations have a different weightage. So even if we have a delay > > > > > point at I/O operation there is a possibility that we might delay the > > > > > worker which is just performing read buffer with page > > > > > hit(VacuumCostPageHit). But, the other worker who is actually > > > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do > > > > > more I/O. > > > > > > > > > > > > > > > > > > Stepping back a bit, OTOH, I think that we can not guarantee that the > > > > > > > one worker who has done more I/O will continue to do further I/O and > > > > > > > the one which has not done much I/O will not perform more I/O in > > > > > > > future. So it might not be too bad if we compute shared costs as you > > > > > > > suggested above. > > > > > > > > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > > some tests to see the behavior of throttling, that might help us in > > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > > What do you think? > > > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > > by any chance will you be interested to write POC with approach (a)? > > > > > Otherwise, I will try to write it after finishing the first one > > > > > (approach b). > > > > > > > > > I have come up with the POC for approach (a). > > > > > > > > The idea is > > > > 1) Before launching the worker divide the current VacuumCostBalance > > > > among workers so that workers start accumulating the balance from that > > > > point. > > > > 2) Also, divide the VacuumCostLimit among the workers. > > > > 3) Once the worker are done with the index vacuum, send back the > > > > remaining balance with the leader. > > > > 4) The leader will sum all the balances and add that to its current > > > > VacuumCostBalance. And start accumulating its balance from this > > > > point. > > > > > > > > I was trying to test how is the behaviour of the vacuum I/O limit, but > > > > I could not find an easy way to test that so I just put the tracepoint > > > > in the code and just checked that at what point we are giving the > > > > delay. > > > > I also printed the cost balance at various point to see that after how > > > > much I/O accumulation we are hitting the delay. Please feel free to > > > > suggest a better way to test this. 
> > > > > > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) + > > > > patch for dividing i/o limit (attached with the mail) > > > > > > > > Note: Patch and the test results are attached. > > > > > > > > > > Thank you! > > > > > > For approach (a) the basic idea I've come up with is that we have a > > > shared balance value on DSM and each workers including the leader > > > process add its local balance value to it in vacuum_delay_point, and > > > then based on the shared value workers sleep. I'll submit that patch > > > with other updates. > > IMHO, if we add the local balance to the shared balance in > > vacuum_delay_point and each worker is working with full limit then > > there will be a problem right? because suppose VacuumCostLimit is 2000 > > then the first time each worker hit the vacuum_delay_point when their > > local limit will be 2000 so in most cases, the first delay will be hit > > when there gross I/O is 6000 (if there are 3 workers). > > For more detail of my idea it is that the first worker who entered to > vacuum_delay_point adds its local value to shared value and reset the > local value to 0. And then the worker sleeps if it exceeds > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit > from the shared value. Since vacuum_delay_point are typically called > per page processed I expect there will not such problem. Thoughts? Oh right, I assumed that when the local balance is exceeding the VacuumCostLimit that time you are adding it to the shared value but you are adding it to to shared value every time in vacuum_delay_point. So I think your idea is correct. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
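A minimal sketch of the shared-cost-balance idea described above (approach (a)), for readers following the thread. This is not the patch's code: the VacuumSharedCostBalance pointer, the function name and the use of pg_atomic_* are illustrative assumptions, and the real vacuum_delay_point() also deals with cost-based autovacuum settings, config reload and so on.

#include "postgres.h"
#include "miscadmin.h"      /* VacuumCostActive, VacuumCostBalance, ... */
#include "port/atomics.h"

/* Assumed to point into the DSM area set up by the leader; NULL if not parallel. */
extern pg_atomic_uint32 *VacuumSharedCostBalance;

static void
parallel_vacuum_delay_point(void)
{
	CHECK_FOR_INTERRUPTS();

	if (VacuumCostActive && VacuumCostBalance > 0)
	{
		uint32		shared_balance;

		/* Push the local balance into the shared counter and reset it. */
		shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
												 (uint32) VacuumCostBalance);
		VacuumCostBalance = 0;

		if (shared_balance >= (uint32) VacuumCostLimit)
		{
			/* Take credit for one limit's worth of work, then sleep. */
			pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
									(uint32) VacuumCostLimit);
			pg_usleep((long) (VacuumCostDelay * 1000));
		}
	}
}

In other words, whichever worker happens to push the shared value over VacuumCostLimit pays the sleep, which is what keeps the overall throttling roughly the same as for a single-process vacuum.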
On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > For more detail of my idea it is that the first worker who entered to > > vacuum_delay_point adds its local value to shared value and reset the > > local value to 0. And then the worker sleeps if it exceeds > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit > > from the shared value. Since vacuum_delay_point are typically called > > per page processed I expect there will not such problem. Thoughts? > > Oh right, I assumed that when the local balance is exceeding the > VacuumCostLimit that time you are adding it to the shared value but > you are adding it to to shared value every time in vacuum_delay_point. > So I think your idea is correct. I've attached the updated patch set. The first three patches add new variables and a callback to index AM. The next two patches are the main part to support parallel vacuum. I've incorporated all review comments I got so far. The memory layout of the variable-length index statistics might be a bit complex. It's similar to the format of the heap tuple header, having a null bitmap, and both the size of the index statistics and the actual data for each index follow it. The last patch is a PoC patch that implements the shared vacuum cost balance. For now it's separate, but after testing both approaches it will be merged into the 0004 patch. I'll test both next week. This patch set can be applied on top of the patch[1] that improves gist index bulk-deletion, so canparallelvacuum of gist index is true. [1] https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com Regards, -- Masahiko Sawada
Attachment
- v31-0002-Add-an-index-AM-callback-to-estimate-DSM-for-par.patch
- v31-0003-Add-an-index-AM-field-to-check-if-use-maintenanc.patch
- v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch
- v31-0004-Add-parallel-option-to-VACUUM-command.patch
- v31-0006-PoC-shared-vacuum-cost-balance.patch
- v31-0001-Add-an-index-AM-field-to-check-parallel-index-pa.patch
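As a rough illustration of the "null bitmap"-style layout mentioned in the previous message for the variable-length index statistics in DSM, the shared area could be organized along the following lines. The struct and macro names are assumptions made for illustration, not necessarily what the v31 patches define.

#include "postgres.h"

/* Per-index slot in the DSM area (illustrative, not the patch's definition). */
typedef struct LVSharedIndStats
{
	bool		updated;	/* true once a worker has stored a result here */
	Size		size;		/* size of the AM-specific statistics that follow */
	/* MAXALIGN'd IndexBulkDeleteResult (or AM-specific stats) follows */
} LVSharedIndStats;

/* Header of the shared area, followed by a bitmap and then the slots. */
typedef struct LVShared
{
	int			nindexes;	/* number of indexes on the table */
	bits8		bitmap[FLEXIBLE_ARRAY_MEMBER];	/* 1 bit per index: has a slot? */
	/* LVSharedIndStats entries follow, one per bit that is set */
} LVShared;

/* Like the heap tuple header's att_isnull(): no statistics slot for index i? */
#define IndStatsIsNull(s, i) \
	(!(((s)->bitmap[(i) >> 3] >> ((i) & 0x07)) & 0x01))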
On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > For more detail of my idea it is that the first worker who entered to > > > vacuum_delay_point adds its local value to shared value and reset the > > > local value to 0. And then the worker sleeps if it exceeds > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit > > > from the shared value. Since vacuum_delay_point are typically called > > > per page processed I expect there will not such problem. Thoughts? > > > > Oh right, I assumed that when the local balance is exceeding the > > VacuumCostLimit that time you are adding it to the shared value but > > you are adding it to to shared value every time in vacuum_delay_point. > > So I think your idea is correct. > > I've attached the updated patch set. > > First three patches add new variables and a callback to index AM. > > Next two patches are the main part to support parallel vacuum. I've > incorporated all review comments I got so far. The memory layout of > variable-length index statistics might be complex a bit. It's similar > to the format of heap tuple header, having a null bitmap. And both the > size of index statistics and actual data for each indexes follows. > > Last patch is a PoC patch that implements the shared vacuum cost > balance. For now it's separated but after testing both approaches it > will be merged to 0004 patch. I'll test both next week. > > This patch set can be applied on top of the patch[1] that improves > gist index bulk-deletion. So canparallelvacuum of gist index is true. > > [1] https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com > I haven't yet read the new set of the patch. But, I have noticed one thing. That we are getting the size of the statistics using the AM routine. But, we are copying those statistics from local memory to the shared memory directly using the memcpy. Wouldn't it be a good idea to have an AM specific routine to get it copied from the local memory to the shared memory? I am not sure it is worth it or not but my thought behind this point is that it will give AM to have local stats in any form ( like they can store a pointer in that ) but they can serialize that while copying to shared stats. And, later when shared stats are passed back to the Am then it can deserialize in its local form and use it. + * Since all vacuum workers write the bulk-deletion result at + * different slots we can write them without locking. + */ + if (!shared_indstats->updated && stats[idx] != NULL) + { + memcpy(bulkdelete_res, stats[idx], shared_indstats->size); + shared_indstats->updated = true; + + /* + * no longer need the locally allocated result and now + * stats[idx] points to the DSM segment. + */ + pfree(stats[idx]); + stats[idx] = bulkdelete_res; + } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
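To make the suggestion above concrete, the kind of AM hooks being proposed might look roughly like this. None of these callbacks exist in core; the names and signatures are purely hypothetical and are shown only to illustrate letting an index AM decide how its bulk-delete statistics are marshalled into and out of the shared area.

#include "postgres.h"
#include "access/genam.h"		/* IndexBulkDeleteResult */
#include "utils/relcache.h"		/* Relation */

/* How many bytes does this AM need in DSM for its bulk-delete statistics? */
typedef Size (*amestimatestats_function) (Relation indexRelation);

/* Marshal the local (possibly pointer-containing) stats into shared memory. */
typedef void (*amserializestats_function) (Relation indexRelation,
										   IndexBulkDeleteResult *local_stats,
										   void *shared_dest);

/* Rebuild a locally usable stats object from the shared copy. */
typedef IndexBulkDeleteResult *(*amdeserializestats_function) (Relation indexRelation,
															   void *shared_src);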
On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > some tests to see the behavior of throttling, that might help us in > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > What do you think? > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > by any chance will you be interested to write POC with approach (a)? > > > > Otherwise, I will try to write it after finishing the first one > > > > (approach b). > > > > > > > I have come up with the POC for approach (a). > > Can we compute the overall throttling (sleep time) in the operation > > separately for heap and index, then divide the index's sleep_time with > > a number of workers and add it to heap's sleep time? Then, it will be > > a bit easier to compare the data between parallel and non-parallel > > case. I have come up with a patch to compute the total delay during the vacuum. So the idea of computing the total cost delay is Total cost delay = Total dealy of heap scan + Total dealy of index/worker; Patch is attached for the same. I have prepared this patch on the latest patch of the parallel vacuum[1]. I have also rebased the patch for the approach [b] for dividing the vacuum cost limit and done some testing for computing the I/O throttling. Attached patches 0001-POC-compute-total-cost-delay and 0002-POC-divide-vacuum-cost-limit can be applied on top of v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't rebased on top of v31-0006, because v31-0006 is implementing the I/O throttling with one approach and 0002-POC-divide-vacuum-cost-limit is doing the same with another approach. But, 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as well (just 1-2 lines conflict). Testing: I have performed 2 tests, one with the same size indexes and second with the different size indexes and measured total I/O delay with the attached patch. 
Setup:
VacuumCostDelay=10ms
VacuumCostLimit=2000

Test1 (same-size indexes):
create table test(a int, b varchar, c varchar);
create index idx1 on test(a);
create index idx2 on test(b);
create index idx3 on test(c);
insert into test select i, repeat('a',30)||i, repeat('a',20)||i from generate_series(1,500000) as i;
delete from test where a < 200000;

              Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
Total Delay   1784 (ms)       1398 (ms)         1938 (ms)

Test2 (variable number of dead tuples per index):
create table test(a int, b varchar, c varchar);
create index idx1 on test(a);
create index idx2 on test(b) where a > 100000;
create index idx3 on test(c) where a > 150000;
insert into test select i, repeat('a',30)||i, repeat('a',20)||i from generate_series(1,500000) as i;
delete from test where a < 200000;

              Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
Total Delay   1438 (ms)       1029 (ms)         1529 (ms)

Conclusion:
1. The tests show that the total I/O delay is significantly less with the parallel vacuum.
2. With the vacuum cost divide patch the problem is solved, but the delay is a bit more than in the non-parallel version. The reason could be the problem discussed at [2]; it needs further investigation.

Next, I will test with the v31-0006 (shared vacuum cost) patch. I will also try to test different types of indexes.

[1] https://www.postgresql.org/message-id/CAD21AoBMo9dr_QmhT%3DdKh7fmiq7tpx%2ByLHR8nw9i5NZ-SgtaVg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
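As a side note, the "total delay of index scan / nworkers" accounting used for the comparison in the previous message can be summarized by a small helper like the one below. The function name and parameters are illustrative, not code taken from the attached POC patches.

#include "postgres.h"

/*
 * Illustrative helper: combine the leader's heap-scan sleep time with the
 * summed index-phase sleep time of all workers.  Dividing by the number of
 * workers reflects that the workers sleep concurrently, so their wall-clock
 * contributions overlap.
 */
static double
compute_total_cost_delay(double heap_delay_ms,
						 double index_delay_ms_sum,
						 int nworkers)
{
	int			divisor = Max(nworkers, 1); /* non-parallel vacuum: divide by 1 */

	return heap_delay_ms + index_delay_ms_sum / divisor;
}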
On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > I haven't yet read the new set of the patch. But, I have noticed one > thing. That we are getting the size of the statistics using the AM > routine. But, we are copying those statistics from local memory to > the shared memory directly using the memcpy. Wouldn't it be a good > idea to have an AM specific routine to get it copied from the local > memory to the shared memory? I am not sure it is worth it or not but > my thought behind this point is that it will give AM to have local > stats in any form ( like they can store a pointer in that ) but they > can serialize that while copying to shared stats. And, later when > shared stats are passed back to the Am then it can deserialize in its > local form and use it. > You have a point, but after changing the gist index, we don't have any current usage for indexes that need something like that. So, on one side there is some value in having an API to copy the stats, but on the other side without having clear usage of an API, it might not be good to expose a new API for the same. I think we can expose such an API in the future if there is a need for the same. Do you or anyone know of any external IndexAM that has such a need? Few minor comments while glancing through the latest patchset. 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as all three expose new variable/function from IndexAmRoutine. 2. +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) +{ + char *p = (char *) GetSharedIndStats(lvshared); + int vac_work_mem = IsAutoVacuumWorkerProcess() && + autovacuum_work_mem != -1 ? + autovacuum_work_mem : maintenance_work_mem; I think this function won't be called from AutoVacuumWorkerProcess at least not as of now, so isn't it a better idea to have an Assert for it? 3. +void +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) This function is for performing a parallel operation on the index, so why to start with heap? It is better to name it as index_parallel_vacuum_main or simply parallel_vacuum_main. 4. /* useindex = true means two-pass strategy; false means one-pass */ @@ -128,17 +280,12 @@ typedef struct LVRelStats BlockNumber pages_removed; double tuples_deleted; BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */ - /* List of TIDs of tuples we intend to delete */ - /* NB: this list is ordered by TID address */ - int num_dead_tuples; /* current # of entries */ - int max_dead_tuples; /* # slots allocated in array */ - ItemPointer dead_tuples; /* array of ItemPointerData */ + LVDeadTuples *dead_tuples; int num_index_scans; TransactionId latestRemovedXid; bool lock_waiter_detected; } LVRelStats; - /* A few variables that don't seem worth passing around as parameters */ static int elevel = -1; It seems like a spurious line removal. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I haven't yet read the new set of the patch. But, I have noticed one > > thing. That we are getting the size of the statistics using the AM > > routine. But, we are copying those statistics from local memory to > > the shared memory directly using the memcpy. Wouldn't it be a good > > idea to have an AM specific routine to get it copied from the local > > memory to the shared memory? I am not sure it is worth it or not but > > my thought behind this point is that it will give AM to have local > > stats in any form ( like they can store a pointer in that ) but they > > can serialize that while copying to shared stats. And, later when > > shared stats are passed back to the Am then it can deserialize in its > > local form and use it. > > > > You have a point, but after changing the gist index, we don't have any > current usage for indexes that need something like that. So, on one > side there is some value in having an API to copy the stats, but on > the other side without having clear usage of an API, it might not be > good to expose a new API for the same. I think we can expose such an > API in the future if there is a need for the same. I agree with the point. But, the current patch exposes an API for estimating the size for the statistics. So IMHO, either we expose both APIs for estimating the size of the stats and copy the stats or none. Am I missing something here? Do you or anyone > know of any external IndexAM that has such a need? > > Few minor comments while glancing through the latest patchset. > > 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as > all three expose new variable/function from IndexAmRoutine. > > 2. > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > +{ > + char *p = (char *) GetSharedIndStats(lvshared); > + int vac_work_mem = IsAutoVacuumWorkerProcess() && > + autovacuum_work_mem != -1 ? > + autovacuum_work_mem : maintenance_work_mem; > > I think this function won't be called from AutoVacuumWorkerProcess at > least not as of now, so isn't it a better idea to have an Assert for > it? > > 3. > +void > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) > > This function is for performing a parallel operation on the index, so > why to start with heap? It is better to name it as > index_parallel_vacuum_main or simply parallel_vacuum_main. > > 4. > /* useindex = true means two-pass strategy; false means one-pass */ > @@ -128,17 +280,12 @@ typedef struct LVRelStats > BlockNumber pages_removed; > double tuples_deleted; > BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */ > - /* List of TIDs of tuples we intend to delete */ > - /* NB: this list is ordered by TID address */ > - int num_dead_tuples; /* current # of entries */ > - int max_dead_tuples; /* # slots allocated in array */ > - ItemPointer dead_tuples; /* array of ItemPointerData */ > + LVDeadTuples *dead_tuples; > int num_index_scans; > TransactionId latestRemovedXid; > bool lock_waiter_detected; > } LVRelStats; > > - > /* A few variables that don't seem worth passing around as parameters */ > static int elevel = -1; > > It seems like a spurious line removal. > > -- > With Regards, > Amit Kapila. 
> EnterpriseDB: http://www.enterprisedb.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > For more detail of my idea it is that the first worker who entered to > > > vacuum_delay_point adds its local value to shared value and reset the > > > local value to 0. And then the worker sleeps if it exceeds > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit > > > from the shared value. Since vacuum_delay_point are typically called > > > per page processed I expect there will not such problem. Thoughts? > > > > Oh right, I assumed that when the local balance is exceeding the > > VacuumCostLimit that time you are adding it to the shared value but > > you are adding it to to shared value every time in vacuum_delay_point. > > So I think your idea is correct. > > I've attached the updated patch set. > > First three patches add new variables and a callback to index AM. > > Next two patches are the main part to support parallel vacuum. I've > incorporated all review comments I got so far. The memory layout of > variable-length index statistics might be complex a bit. It's similar > to the format of heap tuple header, having a null bitmap. And both the > size of index statistics and actual data for each indexes follows. > > Last patch is a PoC patch that implements the shared vacuum cost > balance. For now it's separated but after testing both approaches it > will be merged to 0004 patch. I'll test both next week. > > This patch set can be applied on top of the patch[1] that improves > gist index bulk-deletion. So canparallelvacuum of gist index is true. > + /* Get the space for IndexBulkDeleteResult */ + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats); + + /* + * Update the pointer to the corresponding bulk-deletion result + * if someone has already updated it. + */ + if (shared_indstats->updated && stats[idx] == NULL) + stats[idx] = bulkdelete_res; + I have a doubt in this hunk, I do not understand when this condition will be hit? Because whenever we are setting shared_indstats->updated to true at the same time we are setting stats[idx] to shared stat. So I am not sure in what case the shared_indstats->updated will be true but stats[idx] is still pointing to NULL? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 28, 2019 at 6:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > For more detail of my idea it is that the first worker who entered to > > > > vacuum_delay_point adds its local value to shared value and reset the > > > > local value to 0. And then the worker sleeps if it exceeds > > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit > > > > from the shared value. Since vacuum_delay_point are typically called > > > > per page processed I expect there will not such problem. Thoughts? > > > > > > Oh right, I assumed that when the local balance is exceeding the > > > VacuumCostLimit that time you are adding it to the shared value but > > > you are adding it to to shared value every time in vacuum_delay_point. > > > So I think your idea is correct. > > > > I've attached the updated patch set. > > > > First three patches add new variables and a callback to index AM. > > > > Next two patches are the main part to support parallel vacuum. I've > > incorporated all review comments I got so far. The memory layout of > > variable-length index statistics might be complex a bit. It's similar > > to the format of heap tuple header, having a null bitmap. And both the > > size of index statistics and actual data for each indexes follows. > > > > Last patch is a PoC patch that implements the shared vacuum cost > > balance. For now it's separated but after testing both approaches it > > will be merged to 0004 patch. I'll test both next week. > > > > This patch set can be applied on top of the patch[1] that improves > > gist index bulk-deletion. So canparallelvacuum of gist index is true. > > > > + /* Get the space for IndexBulkDeleteResult */ > + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats); > + > + /* > + * Update the pointer to the corresponding bulk-deletion result > + * if someone has already updated it. > + */ > + if (shared_indstats->updated && stats[idx] == NULL) > + stats[idx] = bulkdelete_res; > + > > I have a doubt in this hunk, I do not understand when this condition > will be hit? Because whenever we are setting shared_indstats->updated > to true at the same time we are setting stats[idx] to shared stat. So > I am not sure in what case the shared_indstats->updated will be true > but stats[idx] is still pointing to NULL? > I think it can be true in the case where one parallel vacuum worker vacuums the index that was vacuumed by other workers in previous index vacuum cycle. Suppose that worker-A and worker-B vacuumed index-A and index-B respectively. After that worker-A vacuum index-B in the next index vacuum cycle. In this case, shared_indstats->updated is true because worker-B already vacuumed in the previous vacuum cycle. On the other hand stats[idx] on worker-A is NULL because it's first time for worker-A to vacuum index-B. Therefore worker-A updates its stats[idx] to the bulk-deletion result on DSM in order to pass it to the index AM. Regards, -- Masahiko Sawada
On Tue, Oct 29, 2019 at 10:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 28, 2019 at 6:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > For more detail of my idea it is that the first worker who entered to > > > > > vacuum_delay_point adds its local value to shared value and reset the > > > > > local value to 0. And then the worker sleeps if it exceeds > > > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit > > > > > from the shared value. Since vacuum_delay_point are typically called > > > > > per page processed I expect there will not such problem. Thoughts? > > > > > > > > Oh right, I assumed that when the local balance is exceeding the > > > > VacuumCostLimit that time you are adding it to the shared value but > > > > you are adding it to to shared value every time in vacuum_delay_point. > > > > So I think your idea is correct. > > > > > > I've attached the updated patch set. > > > > > > First three patches add new variables and a callback to index AM. > > > > > > Next two patches are the main part to support parallel vacuum. I've > > > incorporated all review comments I got so far. The memory layout of > > > variable-length index statistics might be complex a bit. It's similar > > > to the format of heap tuple header, having a null bitmap. And both the > > > size of index statistics and actual data for each indexes follows. > > > > > > Last patch is a PoC patch that implements the shared vacuum cost > > > balance. For now it's separated but after testing both approaches it > > > will be merged to 0004 patch. I'll test both next week. > > > > > > This patch set can be applied on top of the patch[1] that improves > > > gist index bulk-deletion. So canparallelvacuum of gist index is true. > > > > > > > + /* Get the space for IndexBulkDeleteResult */ > > + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats); > > + > > + /* > > + * Update the pointer to the corresponding bulk-deletion result > > + * if someone has already updated it. > > + */ > > + if (shared_indstats->updated && stats[idx] == NULL) > > + stats[idx] = bulkdelete_res; > > + > > > > I have a doubt in this hunk, I do not understand when this condition > > will be hit? Because whenever we are setting shared_indstats->updated > > to true at the same time we are setting stats[idx] to shared stat. So > > I am not sure in what case the shared_indstats->updated will be true > > but stats[idx] is still pointing to NULL? > > > > I think it can be true in the case where one parallel vacuum worker > vacuums the index that was vacuumed by other workers in previous index > vacuum cycle. Suppose that worker-A and worker-B vacuumed index-A and > index-B respectively. After that worker-A vacuum index-B in the next > index vacuum cycle. In this case, shared_indstats->updated is true > because worker-B already vacuumed in the previous vacuum cycle. On the > other hand stats[idx] on worker-A is NULL because it's first time for > worker-A to vacuum index-B. Therefore worker-A updates its stats[idx] > to the bulk-deletion result on DSM in order to pass it to the index > AM. Okay, that makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > > some tests to see the behavior of throttling, that might help us in > > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > > What do you think? > > > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > > by any chance will you be interested to write POC with approach (a)? > > > > > Otherwise, I will try to write it after finishing the first one > > > > > (approach b). > > > > > > > > > I have come up with the POC for approach (a). > > > > Can we compute the overall throttling (sleep time) in the operation > > > separately for heap and index, then divide the index's sleep_time with > > > a number of workers and add it to heap's sleep time? Then, it will be > > > a bit easier to compare the data between parallel and non-parallel > > > case. > I have come up with a patch to compute the total delay during the > vacuum. So the idea of computing the total cost delay is > > Total cost delay = Total dealy of heap scan + Total dealy of > index/worker; Patch is attached for the same. > > I have prepared this patch on the latest patch of the parallel > vacuum[1]. I have also rebased the patch for the approach [b] for > dividing the vacuum cost limit and done some testing for computing the > I/O throttling. Attached patches 0001-POC-compute-total-cost-delay > and 0002-POC-divide-vacuum-cost-limit can be applied on top of > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't > rebased on top of v31-0006, because v31-0006 is implementing the I/O > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is > doing the same with another approach. But, > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as > well (just 1-2 lines conflict). > > Testing: I have performed 2 tests, one with the same size indexes and > second with the different size indexes and measured total I/O delay > with the attached patch. 
> > Setup: > VacuumCostDelay=10ms > VacuumCostLimit=2000 > > Test1 (Same size index): > create table test(a int, b varchar, c varchar); > create index idx1 on test(a); > create index idx2 on test(b); > create index idx3 on test(c); > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > generate_series(1,500000) as i; > delete from test where a < 200000; > > Vacuum (Head) Parallel Vacuum > Vacuum Cost Divide Patch > Total Delay 1784 (ms) 1398(ms) > 1938(ms) > > > Test2 (Variable size dead tuple in index) > create table test(a int, b varchar, c varchar); > create index idx1 on test(a); > create index idx2 on test(b) where a > 100000; > create index idx3 on test(c) where a > 150000; > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > generate_series(1,500000) as i; > delete from test where a < 200000; > > Vacuum (Head) Parallel Vacuum > Vacuum Cost Divide Patch > Total Delay 1438 (ms) 1029(ms) > 1529(ms) > > > Conclusion: > 1. The tests prove that the total I/O delay is significantly less with > the parallel vacuum. > 2. With the vacuum cost divide the problem is solved but the delay bit > more compared to the non-parallel version. The reason could be the > problem discussed at[2], but it needs further investigation. > > Next, I will test with the v31-0006 (shared vacuum cost) patch. I > will also try to test different types of indexes. > Thank you for testing! I realized that v31-0006 patch doesn't work fine so I've attached the updated version patch that also incorporated some comments I got so far. Sorry for the inconvenience. I'll apply your 0001 patch and also test the total delay time. Regards, -- Masahiko Sawada
Attachment
On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > > > some tests to see the behavior of throttling, that might help us in > > > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > > > What do you think? > > > > > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > > > by any chance will you be interested to write POC with approach (a)? > > > > > > Otherwise, I will try to write it after finishing the first one > > > > > > (approach b). > > > > > > > > > > > I have come up with the POC for approach (a). > > > > > > Can we compute the overall throttling (sleep time) in the operation > > > > separately for heap and index, then divide the index's sleep_time with > > > > a number of workers and add it to heap's sleep time? Then, it will be > > > > a bit easier to compare the data between parallel and non-parallel > > > > case. > > I have come up with a patch to compute the total delay during the > > vacuum. So the idea of computing the total cost delay is > > > > Total cost delay = Total dealy of heap scan + Total dealy of > > index/worker; Patch is attached for the same. > > > > I have prepared this patch on the latest patch of the parallel > > vacuum[1]. I have also rebased the patch for the approach [b] for > > dividing the vacuum cost limit and done some testing for computing the > > I/O throttling. Attached patches 0001-POC-compute-total-cost-delay > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't > > rebased on top of v31-0006, because v31-0006 is implementing the I/O > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is > > doing the same with another approach. But, > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as > > well (just 1-2 lines conflict). > > > > Testing: I have performed 2 tests, one with the same size indexes and > > second with the different size indexes and measured total I/O delay > > with the attached patch. 
> > > > Setup: > > VacuumCostDelay=10ms > > VacuumCostLimit=2000 > > > > Test1 (Same size index): > > create table test(a int, b varchar, c varchar); > > create index idx1 on test(a); > > create index idx2 on test(b); > > create index idx3 on test(c); > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > generate_series(1,500000) as i; > > delete from test where a < 200000; > > > > Vacuum (Head) Parallel Vacuum > > Vacuum Cost Divide Patch > > Total Delay 1784 (ms) 1398(ms) > > 1938(ms) > > > > > > Test2 (Variable size dead tuple in index) > > create table test(a int, b varchar, c varchar); > > create index idx1 on test(a); > > create index idx2 on test(b) where a > 100000; > > create index idx3 on test(c) where a > 150000; > > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > generate_series(1,500000) as i; > > delete from test where a < 200000; > > > > Vacuum (Head) Parallel Vacuum > > Vacuum Cost Divide Patch > > Total Delay 1438 (ms) 1029(ms) > > 1529(ms) > > > > > > Conclusion: > > 1. The tests prove that the total I/O delay is significantly less with > > the parallel vacuum. > > 2. With the vacuum cost divide the problem is solved but the delay bit > > more compared to the non-parallel version. The reason could be the > > problem discussed at[2], but it needs further investigation. > > > > Next, I will test with the v31-0006 (shared vacuum cost) patch. I > > will also try to test different types of indexes. > > > > Thank you for testing! > > I realized that v31-0006 patch doesn't work fine so I've attached the > updated version patch that also incorporated some comments I got so > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > test the total delay time. > FWIW I'd like to share the results of total delay time evaluation of approach (a) (shared cost balance). I used the same workloads that Dilip shared and set vacuum_cost_delay to 10. The results of two test cases are here: * Test1 normal : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552) 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477) 1 worker : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811) * Test2 normal : 1530 ms (hit 30645, miss 2, dirty 3, total 30650) 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650) 1 worker : 1538 ms (hit 30645, miss 2, dirty 3, total 30650) 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer misses and flushing dirty buffer, respectively. 'total' is the sum of these three values. In this evaluation I expect that parallel vacuum cases delay time as much as the time of normal vacuum because the total number of pages to vacuum is the same and we have the shared cost balance value and each workers decide to sleep based on that value. According to the above Test1 results, we can see that there is a big difference in the total delay time among these cases (normal vacuum case is shortest), but the cause of this is that parallel vacuum had to to flush more dirty pages. Actually after increased shared_buffer I got expected results: * Test1 (after increased shared_buffers) normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300) 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300) 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300) I updated the patch that computes the total cost delay shared by Dilip[1] so that it collects the number of buffer hits and so on, and have attached it. It can be applied on top of my latest patch set[1]. 
[1] https://www.postgresql.org/message-id/CAFiTN-thU-z8f04jO7xGMu5yUUpTpsBTvBrFW6EhRf-jGvEz%3Dg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com Regards, -- Masahiko Sawada
Attachment
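A quick sanity check of the Test1 "normal" numbers in the previous message: assuming the default cost parameters (vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, vacuum_cost_limit = 200) together with vacuum_cost_delay = 10 ms, the reported counters predict a sleep time close to what was measured. Whether those were the exact settings used for the run is an assumption.

#include <stdio.h>

/*
 * Back-of-the-envelope check of the Test1 "normal" case, using assumed
 * default cost parameters (hit = 1, miss = 10, dirty = 20, limit = 200)
 * and vacuum_cost_delay = 10 ms.
 */
int
main(void)
{
	long		hit = 50594, miss = 5700, dirty = 7258; /* counters reported above */
	long		cost = hit * 1 + miss * 10 + dirty * 20;	/* 252754 */

	/* each time the balance reaches the limit, vacuum sleeps for the delay */
	printf("predicted sleep: %.0f ms\n", (double) cost / 200 * 10); /* ~12638 ms vs. 12656 ms measured */
	return 0;
}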
On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > > > > some tests to see the behavior of throttling, that might help us in > > > > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > > > > What do you think? > > > > > > > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > > > > by any chance will you be interested to write POC with approach (a)? > > > > > > > Otherwise, I will try to write it after finishing the first one > > > > > > > (approach b). > > > > > > > > > > > > > I have come up with the POC for approach (a). > > > > > > > > Can we compute the overall throttling (sleep time) in the operation > > > > > separately for heap and index, then divide the index's sleep_time with > > > > > a number of workers and add it to heap's sleep time? Then, it will be > > > > > a bit easier to compare the data between parallel and non-parallel > > > > > case. > > > I have come up with a patch to compute the total delay during the > > > vacuum. So the idea of computing the total cost delay is > > > > > > Total cost delay = Total dealy of heap scan + Total dealy of > > > index/worker; Patch is attached for the same. > > > > > > I have prepared this patch on the latest patch of the parallel > > > vacuum[1]. I have also rebased the patch for the approach [b] for > > > dividing the vacuum cost limit and done some testing for computing the > > > I/O throttling. Attached patches 0001-POC-compute-total-cost-delay > > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of > > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't > > > rebased on top of v31-0006, because v31-0006 is implementing the I/O > > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is > > > doing the same with another approach. But, > > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as > > > well (just 1-2 lines conflict). > > > > > > Testing: I have performed 2 tests, one with the same size indexes and > > > second with the different size indexes and measured total I/O delay > > > with the attached patch. 
> > > > > > Setup: > > > VacuumCostDelay=10ms > > > VacuumCostLimit=2000 > > > > > > Test1 (Same size index): > > > create table test(a int, b varchar, c varchar); > > > create index idx1 on test(a); > > > create index idx2 on test(b); > > > create index idx3 on test(c); > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > > generate_series(1,500000) as i; > > > delete from test where a < 200000; > > > > > > Vacuum (Head) Parallel Vacuum > > > Vacuum Cost Divide Patch > > > Total Delay 1784 (ms) 1398(ms) > > > 1938(ms) > > > > > > > > > Test2 (Variable size dead tuple in index) > > > create table test(a int, b varchar, c varchar); > > > create index idx1 on test(a); > > > create index idx2 on test(b) where a > 100000; > > > create index idx3 on test(c) where a > 150000; > > > > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > > generate_series(1,500000) as i; > > > delete from test where a < 200000; > > > > > > Vacuum (Head) Parallel Vacuum > > > Vacuum Cost Divide Patch > > > Total Delay 1438 (ms) 1029(ms) > > > 1529(ms) > > > > > > > > > Conclusion: > > > 1. The tests prove that the total I/O delay is significantly less with > > > the parallel vacuum. > > > 2. With the vacuum cost divide the problem is solved but the delay bit > > > more compared to the non-parallel version. The reason could be the > > > problem discussed at[2], but it needs further investigation. > > > > > > Next, I will test with the v31-0006 (shared vacuum cost) patch. I > > > will also try to test different types of indexes. > > > > > > > Thank you for testing! > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > updated version patch that also incorporated some comments I got so > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > test the total delay time. > > > > FWIW I'd like to share the results of total delay time evaluation of > approach (a) (shared cost balance). I used the same workloads that > Dilip shared and set vacuum_cost_delay to 10. The results of two test > cases are here: > > * Test1 > normal : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552) > 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477) > 1 worker : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811) > > * Test2 > normal : 1530 ms (hit 30645, miss 2, dirty 3, total 30650) > 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650) > 1 worker : 1538 ms (hit 30645, miss 2, dirty 3, total 30650) > > 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer > misses and flushing dirty buffer, respectively. 'total' is the sum of > these three values. > > In this evaluation I expect that parallel vacuum cases delay time as > much as the time of normal vacuum because the total number of pages to > vacuum is the same and we have the shared cost balance value and each > workers decide to sleep based on that value. According to the above > Test1 results, we can see that there is a big difference in the total > delay time among these cases (normal vacuum case is shortest), but > the cause of this is that parallel vacuum had to to flush more dirty > pages. 
Actually after increased shared_buffer I got expected results: > > * Test1 (after increased shared_buffers) > normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300) > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300) > 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300) > > I updated the patch that computes the total cost delay shared by > Dilip[1] so that it collects the number of buffer hits and so on, and > have attached it. It can be applied on top of my latest patch set[1]. Thanks, Sawada-san. In my next test, I will use this updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 29, 2019 at 3:11 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > I am thinking if we can write the patch for both the approaches (a. > > > > > > > > > compute shared costs and try to delay based on that, b. try to divide > > > > > > > > > the I/O cost among workers as described in the email above[1]) and do > > > > > > > > > some tests to see the behavior of throttling, that might help us in > > > > > > > > > deciding what is the best strategy to solve this problem, if any. > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > I agree with this idea. I can come up with a POC patch for approach > > > > > > > > (b). Meanwhile, if someone is interested to quickly hack with the > > > > > > > > approach (a) then we can do some testing and compare. Sawada-san, > > > > > > > > by any chance will you be interested to write POC with approach (a)? > > > > > > > > Otherwise, I will try to write it after finishing the first one > > > > > > > > (approach b). > > > > > > > > > > > > > > > I have come up with the POC for approach (a). > > > > > > > > > > Can we compute the overall throttling (sleep time) in the operation > > > > > > separately for heap and index, then divide the index's sleep_time with > > > > > > a number of workers and add it to heap's sleep time? Then, it will be > > > > > > a bit easier to compare the data between parallel and non-parallel > > > > > > case. > > > > I have come up with a patch to compute the total delay during the > > > > vacuum. So the idea of computing the total cost delay is > > > > > > > > Total cost delay = Total dealy of heap scan + Total dealy of > > > > index/worker; Patch is attached for the same. > > > > > > > > I have prepared this patch on the latest patch of the parallel > > > > vacuum[1]. I have also rebased the patch for the approach [b] for > > > > dividing the vacuum cost limit and done some testing for computing the > > > > I/O throttling. Attached patches 0001-POC-compute-total-cost-delay > > > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of > > > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't > > > > rebased on top of v31-0006, because v31-0006 is implementing the I/O > > > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is > > > > doing the same with another approach. But, > > > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as > > > > well (just 1-2 lines conflict). > > > > > > > > Testing: I have performed 2 tests, one with the same size indexes and > > > > second with the different size indexes and measured total I/O delay > > > > with the attached patch. 
> > > > > > > > Setup: > > > > VacuumCostDelay=10ms > > > > VacuumCostLimit=2000 > > > > > > > > Test1 (Same size index): > > > > create table test(a int, b varchar, c varchar); > > > > create index idx1 on test(a); > > > > create index idx2 on test(b); > > > > create index idx3 on test(c); > > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > > > generate_series(1,500000) as i; > > > > delete from test where a < 200000; > > > > > > > > Vacuum (Head) Parallel Vacuum > > > > Vacuum Cost Divide Patch > > > > Total Delay 1784 (ms) 1398(ms) > > > > 1938(ms) > > > > > > > > > > > > Test2 (Variable size dead tuple in index) > > > > create table test(a int, b varchar, c varchar); > > > > create index idx1 on test(a); > > > > create index idx2 on test(b) where a > 100000; > > > > create index idx3 on test(c) where a > 150000; > > > > > > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from > > > > generate_series(1,500000) as i; > > > > delete from test where a < 200000; > > > > > > > > Vacuum (Head) Parallel Vacuum > > > > Vacuum Cost Divide Patch > > > > Total Delay 1438 (ms) 1029(ms) > > > > 1529(ms) > > > > > > > > > > > > Conclusion: > > > > 1. The tests prove that the total I/O delay is significantly less with > > > > the parallel vacuum. > > > > 2. With the vacuum cost divide the problem is solved but the delay bit > > > > more compared to the non-parallel version. The reason could be the > > > > problem discussed at[2], but it needs further investigation. > > > > > > > > Next, I will test with the v31-0006 (shared vacuum cost) patch. I > > > > will also try to test different types of indexes. > > > > > > > > > > Thank you for testing! > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > updated version patch that also incorporated some comments I got so > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > test the total delay time. > > > > > > > FWIW I'd like to share the results of total delay time evaluation of > > approach (a) (shared cost balance). I used the same workloads that > > Dilip shared and set vacuum_cost_delay to 10. The results of two test > > cases are here: > > > > * Test1 > > normal : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552) > > 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477) > > 1 worker : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811) > > > > * Test2 > > normal : 1530 ms (hit 30645, miss 2, dirty 3, total 30650) > > 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650) > > 1 worker : 1538 ms (hit 30645, miss 2, dirty 3, total 30650) > > > > 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer > > misses and flushing dirty buffer, respectively. 'total' is the sum of > > these three values. > > > > In this evaluation I expect that parallel vacuum cases delay time as > > much as the time of normal vacuum because the total number of pages to > > vacuum is the same and we have the shared cost balance value and each > > workers decide to sleep based on that value. According to the above > > Test1 results, we can see that there is a big difference in the total > > delay time among these cases (normal vacuum case is shortest), but > > the cause of this is that parallel vacuum had to to flush more dirty > > pages. 
Actually after increased shared_buffer I got expected results: > > > > * Test1 (after increased shared_buffers) > > normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300) > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300) > > 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300) > > > > I updated the patch that computes the total cost delay shared by > > Dilip[1] so that it collects the number of buffer hits and so on, and > > have attached it. It can be applied on top of my latest patch set[1]. > > Thanks, Sawada-san. In my next test, I will use this updated patch. > Few comments on the latest patch. +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) +{ ... + + stats = (IndexBulkDeleteResult **) + palloc0(nindexes * sizeof(IndexBulkDeleteResult *)); + + if (lvshared->maintenance_work_mem_worker > 0) + maintenance_work_mem = lvshared->maintenance_work_mem_worker; So for a worker, we have set the new value of the maintenance_work_mem, But if the leader is participating in the index vacuuming then shouldn't we set the new value of the maintenance_work_mem for the leader as well? +static void +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) +{ + char *p = (char *) GetSharedIndStats(lvshared); + int vac_work_mem = IsAutoVacuumWorkerProcess() && + autovacuum_work_mem != -1 ? + autovacuum_work_mem : maintenance_work_mem; + int nindexes_mwm = 0; + int i; Can this ever be called from the Autovacuum Worker? I think instead of adding handling for the auto vacuum worker we can have an assert. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 28, 2019 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I haven't yet read the new set of the patch. But, I have noticed one > > thing. That we are getting the size of the statistics using the AM > > routine. But, we are copying those statistics from local memory to > > the shared memory directly using the memcpy. Wouldn't it be a good > > idea to have an AM specific routine to get it copied from the local > > memory to the shared memory? I am not sure it is worth it or not but > > my thought behind this point is that it will give AM to have local > > stats in any form ( like they can store a pointer in that ) but they > > can serialize that while copying to shared stats. And, later when > > shared stats are passed back to the Am then it can deserialize in its > > local form and use it. > > > > You have a point, but after changing the gist index, we don't have any > current usage for indexes that need something like that. So, on one > side there is some value in having an API to copy the stats, but on > the other side without having clear usage of an API, it might not be > good to expose a new API for the same. I think we can expose such an > API in the future if there is a need for the same. Do you or anyone > know of any external IndexAM that has such a need? > > Few minor comments while glancing through the latest patchset. > > 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as > all three expose new variable/function from IndexAmRoutine. Fixed. > > 2. > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > +{ > + char *p = (char *) GetSharedIndStats(lvshared); > + int vac_work_mem = IsAutoVacuumWorkerProcess() && > + autovacuum_work_mem != -1 ? > + autovacuum_work_mem : maintenance_work_mem; > > I think this function won't be called from AutoVacuumWorkerProcess at > least not as of now, so isn't it a better idea to have an Assert for > it? Fixed. > > 3. > +void > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) > > This function is for performing a parallel operation on the index, so > why to start with heap? Because parallel vacuum supports only indexes that are created on heaps. > It is better to name it as > index_parallel_vacuum_main or simply parallel_vacuum_main. I'm concerned that both names index_parallel_vacuum_main and parallel_vacuum_main seem to be generic in spite of these codes are heap-specific code. > > 4. > /* useindex = true means two-pass strategy; false means one-pass */ > @@ -128,17 +280,12 @@ typedef struct LVRelStats > BlockNumber pages_removed; > double tuples_deleted; > BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */ > - /* List of TIDs of tuples we intend to delete */ > - /* NB: this list is ordered by TID address */ > - int num_dead_tuples; /* current # of entries */ > - int max_dead_tuples; /* # slots allocated in array */ > - ItemPointer dead_tuples; /* array of ItemPointerData */ > + LVDeadTuples *dead_tuples; > int num_index_scans; > TransactionId latestRemovedXid; > bool lock_waiter_detected; > } LVRelStats; > > - > /* A few variables that don't seem worth passing around as parameters */ > static int elevel = -1; > > It seems like a spurious line removal. Fixed. These above comments are incorporated in the latest patch set(v32) [1]. 
[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com Regards, -- Masahiko Sawada
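For reference, a rough standalone model of what the LVDeadTuples structure in
the diff above can look like once it is kept in dynamic shared memory;
ItemPointerData is redefined locally and the exact layout is an assumption for
illustration, not the patch's definition:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* local stand-in for the backend's ItemPointerData (block number + offset) */
typedef struct ItemPointerData
{
    uint32_t ip_blkid;
    uint16_t ip_posid;
} ItemPointerData;

/* dead tuple TIDs kept in one variable-sized chunk so the whole array can
 * live in the DSM segment and be shared by the leader and all workers */
typedef struct LVDeadTuples
{
    int             max_tuples;     /* # slots allocated */
    int             num_tuples;     /* # slots currently used */
    ItemPointerData itemptrs[];     /* TIDs, ordered by TID */
} LVDeadTuples;

/* bytes to reserve in the DSM segment for a given capacity */
static size_t
dead_tuples_size(int max_tuples)
{
    return offsetof(LVDeadTuples, itemptrs) +
           sizeof(ItemPointerData) * (size_t) max_tuples;
}

int
main(void)
{
    printf("space for 1 million dead TIDs: %zu bytes\n",
           dead_tuples_size(1000000));
    return 0;
}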
On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Actually after increased shared_buffer I got expected results:
>
> * Test1 (after increased shared_buffers)
> normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
>
> I updated the patch that computes the total cost delay shared by
> Dilip[1] so that it collects the number of buffer hits and so on, and
> have attached it. It can be applied on top of my latest patch set[1].

I tried to repeat the test to see the I/O delay with
v32-0004-PoC-shared-vacuum-cost-balance.patch [1]. I used 4GB of
shared_buffers, and I recreated the database and restarted the server before
each run. But I could not see the same I/O delay, and the cost is also not
the same. Can you please tell me what shared_buffers setting you used?

Test1 (4GB shared buffers)
normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
1 worker: stats delay 1821.255000, hit 78184, miss 2, dirty 14095, total 92281
2 workers: stats delay 2224.415000, hit 86482, miss 2, dirty 17665, total 104149

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > Actually after increased shared_buffer I got expected results:
> >
> > * Test1 (after increased shared_buffers)
> > normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> >
> > I updated the patch that computes the total cost delay shared by
> > Dilip[1] so that it collects the number of buffer hits and so on, and
> > have attached it. It can be applied on top of my latest patch set[1].

While reading your modified patch (PoC-delay-stats.patch), I noticed that in
my patch I used the formula below to compute the total delay:

total delay = delay in heap scan + (total delay of index scan / nworkers)

But in your patch it is just the total sum of all delays. IMHO, the total
sleep time during the index vacuum phase must be divided by the number of
workers, because even if at some point all the workers go to sleep at once
(e.g. for 10 msec), the delay in I/O will be only 10 msec, not 30 msec. I
think the same was discussed upthread[1].

[1] https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
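The projection formula above is small enough to write down as throwaway C;
this only models how the numbers in these mails are being compared, assuming
the heap scan runs serially and the index phase is shared by nworkers:

#include <stdio.h>

/* Model of the projection described above: the per-worker sleep times of the
 * parallel index phase are summed and divided by the number of workers, then
 * added to the (serial) heap scan sleep time. */
static double
projected_total_delay(double heap_scan_delay_ms,
                      double summed_index_delay_ms,
                      int nworkers)
{
    return heap_scan_delay_ms + summed_index_delay_ms / nworkers;
}

int
main(void)
{
    /* illustrative numbers only, not taken from the tests in this thread */
    printf("projected total delay: %.1f ms\n",
           projected_total_delay(500.0, 900.0, 3));
    return 0;
}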
On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Actually after increased shared_buffer I got expected results: > > > > > > * Test1 (after increased shared_buffers) > > > normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300) > > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300) > > > 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300) > > > > > > I updated the patch that computes the total cost delay shared by > > > Dilip[1] so that it collects the number of buffer hits and so on, and > > > have attached it. It can be applied on top of my latest patch set[1]. > > While reading your modified patch (PoC-delay-stats.patch), I have > noticed that in my patch I used below formulae to compute the total > delay > total delay = delay in heap scan + (total delay of index scan > /nworkers). But, in your patch, I can see that it is just total sum of > all delay. IMHO, the total sleep time during the index vacuum phase > must be divided by the number of workers, because even if at some > point, all the workers go for sleep (e.g. 10 msec) then the delay in > I/O will be only for 10msec not 30 msec. I think the same is > discussed upthread[1] > I think that two approaches make parallel vacuum worker wait in different way: in approach(a) the vacuum delay works as if vacuum is performed by single process, on the other hand in approach(b) the vacuum delay work for each workers independently. Suppose that the total number of blocks to vacuum is 10,000 blocks, the cost per blocks is 10, the cost limit is 200 and sleep time is 5 ms. In single process vacuum the total sleep time is 2,500ms (= (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. Because all parallel vacuum workers use the shared balance value and a worker sleeps once the balance value exceeds the limit. In approach(b), since the cost limit is divided evenly the value of each workers is 40 (e.g. when 5 parallel degree). And suppose each workers processes blocks evenly, the total sleep time of all workers is 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can compute the sleep time of approach(b) by dividing the total value by the number of parallel workers. IOW the approach(b) makes parallel vacuum delay much more than normal vacuum and parallel vacuum with approach(a) even with the same settings. Which behaviors do we expect? I thought the vacuum delay for parallel vacuum should work as if it's a single process vacuum as we did for memory usage. I might be missing something. If we prefer approach(b) I should change the patch so that the leader process divides the cost limit evenly. Regards, -- Masahiko Sawada
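The arithmetic in this example can be reproduced with a few lines of
throwaway C; the numbers are the ones from the mail above (10,000 blocks, cost
10 per block, cost limit 200, 5 ms sleep, 5 workers), and nothing here is code
from the patch:

#include <stdio.h>

int
main(void)
{
    const double blocks = 10000;
    const double cost_per_block = 10;
    const double cost_limit = 200;
    const double delay_ms = 5;
    const int    nworkers = 5;

    /* single-process vacuum and approach (a): one shared balance checked
     * against the full cost limit */
    double shared_total = (blocks * cost_per_block / cost_limit) * delay_ms;

    /* approach (b): each worker gets cost_limit / nworkers and throttles
     * independently; sum the sleep time over all workers */
    double per_worker_limit = cost_limit / nworkers;
    double divided_total = (blocks / nworkers) * cost_per_block /
                           per_worker_limit * delay_ms * nworkers;

    printf("approach (a) total sleep: %.0f ms\n", shared_total);   /* 2500 */
    printf("approach (b) summed sleep: %.0f ms\n", divided_total); /* 12500 */
    return 0;
}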
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Actually after increased shared_buffer I got expected results: > > > > > > > > * Test1 (after increased shared_buffers) > > > > normal : 2807 ms (hit 56295, miss 2, dirty 3, total 56300) > > > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300) > > > > 1 worker : 2841 ms (hit 56295, miss 2, dirty 3, total 56300) > > > > > > > > I updated the patch that computes the total cost delay shared by > > > > Dilip[1] so that it collects the number of buffer hits and so on, and > > > > have attached it. It can be applied on top of my latest patch set[1]. > > > > While reading your modified patch (PoC-delay-stats.patch), I have > > noticed that in my patch I used below formulae to compute the total > > delay > > total delay = delay in heap scan + (total delay of index scan > > /nworkers). But, in your patch, I can see that it is just total sum of > > all delay. IMHO, the total sleep time during the index vacuum phase > > must be divided by the number of workers, because even if at some > > point, all the workers go for sleep (e.g. 10 msec) then the delay in > > I/O will be only for 10msec not 30 msec. I think the same is > > discussed upthread[1] > > > > I think that two approaches make parallel vacuum worker wait in > different way: in approach(a) the vacuum delay works as if vacuum is > performed by single process, on the other hand in approach(b) the > vacuum delay work for each workers independently. > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > ms. In single process vacuum the total sleep time is 2,500ms (= > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > Because all parallel vacuum workers use the shared balance value and a > worker sleeps once the balance value exceeds the limit. In > approach(b), since the cost limit is divided evenly the value of each > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > processes blocks evenly, the total sleep time of all workers is > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > compute the sleep time of approach(b) by dividing the total value by > the number of parallel workers. > > IOW the approach(b) makes parallel vacuum delay much more than normal > vacuum and parallel vacuum with approach(a) even with the same > settings. Which behaviors do we expect? I thought the vacuum delay for > parallel vacuum should work as if it's a single process vacuum as we > did for memory usage. I might be missing something. If we prefer > approach(b) I should change the patch so that the leader process > divides the cost limit evenly. > I have repeated the same test (test1 and test2)[1] with a higher shared buffer (1GB). Currently, I have used the same formula for computing the total delay heap scan delay + index vacuuming delay / workers. Because, In my opinion, multiple workers are doing I/O here so the total delay should also be in multiple of the number of workers. So if we want to compare the delay with the sequential vacuum then we should divide total delay by the number of workers. 
But I am not sure whether computing the total delay is the right way to
measure the I/O throttling or not. Still, I support approach (b) for dividing
the I/O limit, because autovacuum workers already operate with this approach.

test1:
normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, total 79102 (cost divide patch)
2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, total 78994 (cost divide patch)
1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, total 92252 (share cost patch)
2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806, total 104290 (share cost patch)

test2:
normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total 40513 (cost divide patch)
2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total 40518 (cost divide patch)
1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total 42589 (share cost patch)
2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total 42871 (share cost patch)

So with higher shared_buffers I can see the same total delay with approach
(b), and a bit less total delay with approach (a). A point to be noted is that
I have used the same formula for computing the total delay for both
approaches, and Sawada-san explained in the above mail that it may not be the
right way to compute the total delay for approach (a). But my take is that
whether we are working with a shared cost or we are dividing the cost, the
delay must be divided by the number of workers in the parallel phase.

@Amit Kapila, what is your opinion on this?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Oct 28, 2019 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > I haven't yet read the new set of the patch. But, I have noticed one > > > thing. That we are getting the size of the statistics using the AM > > > routine. But, we are copying those statistics from local memory to > > > the shared memory directly using the memcpy. Wouldn't it be a good > > > idea to have an AM specific routine to get it copied from the local > > > memory to the shared memory? I am not sure it is worth it or not but > > > my thought behind this point is that it will give AM to have local > > > stats in any form ( like they can store a pointer in that ) but they > > > can serialize that while copying to shared stats. And, later when > > > shared stats are passed back to the Am then it can deserialize in its > > > local form and use it. > > > > > > > You have a point, but after changing the gist index, we don't have any > > current usage for indexes that need something like that. So, on one > > side there is some value in having an API to copy the stats, but on > > the other side without having clear usage of an API, it might not be > > good to expose a new API for the same. I think we can expose such an > > API in the future if there is a need for the same. > I agree with the point. But, the current patch exposes an API for > estimating the size for the statistics. So IMHO, either we expose > both APIs for estimating the size of the stats and copy the stats or > none. Am I missing something here? > I think the first one is a must as the things stand today because otherwise, we won't be able to copy the stats. The second one (expose an API to copy stats) is good to have but there is no usage of it immediately. We can expose the second API considering the future need but as there is no valid case as of now, it will be difficult to test and we are also not sure whether in future any IndexAM will require such an API. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
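To make the trade-off concrete, here is one possible shape such an API pair
could take. The callback names amestimatestats_function and
amcopystats_function, and the local type definitions, are invented for this
sketch; the patch exposes only the size-estimation side, and the copy today is
effectively a plain memcpy as quoted upthread:

#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* minimal stand-in for the backend's per-index vacuum statistics */
typedef struct IndexBulkDeleteResult
{
    double num_index_tuples;
    double tuples_removed;
} IndexBulkDeleteResult;

/* hypothetical callback pair: the first mirrors the size-estimation API the
 * patch already exposes, the second is the "serialize into shared memory"
 * hook being debated (no such API exists in the patch today) */
typedef size_t (*amestimatestats_function) (void);
typedef void (*amcopystats_function) (void *shared_dest,
                                      const IndexBulkDeleteResult *local);

/* what happens when no copy hook exists: a raw memcpy of the local stats */
static void
default_copy_stats(void *shared_dest, const IndexBulkDeleteResult *local)
{
    memcpy(shared_dest, local, sizeof(IndexBulkDeleteResult));
}

int
main(void)
{
    IndexBulkDeleteResult local = {.num_index_tuples = 100, .tuples_removed = 10};
    IndexBulkDeleteResult shared;

    default_copy_stats(&shared, &local);
    printf("copied stats: %.0f index tuples\n", shared.num_index_tuples);
    return 0;
}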
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I think that two approaches make parallel vacuum worker wait in > different way: in approach(a) the vacuum delay works as if vacuum is > performed by single process, on the other hand in approach(b) the > vacuum delay work for each workers independently. > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > ms. In single process vacuum the total sleep time is 2,500ms (= > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > Because all parallel vacuum workers use the shared balance value and a > worker sleeps once the balance value exceeds the limit. In > approach(b), since the cost limit is divided evenly the value of each > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > processes blocks evenly, the total sleep time of all workers is > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > compute the sleep time of approach(b) by dividing the total value by > the number of parallel workers. > > IOW the approach(b) makes parallel vacuum delay much more than normal > vacuum and parallel vacuum with approach(a) even with the same > settings. Which behaviors do we expect? > Yeah, this is an important thing to decide. I don't think that the conclusion you are drawing is correct because it that is true then the same applies to the current autovacuum work division where we divide the cost_limit among workers but the cost_delay is same (see autovac_balance_cost). Basically, if we consider the delay time of each worker independently, then it would appear that a parallel vacuum delay with approach (b) is more, but that is true only if the workers run serially which is not true. > I thought the vacuum delay for > parallel vacuum should work as if it's a single process vacuum as we > did for memory usage. I might be missing something. If we prefer > approach(b) I should change the patch so that the leader process > divides the cost limit evenly. > I am also not completely sure which approach is better but I slightly lean towards approach (b). I think we need input from some other people as well. I will start a separate thread to discuss this and see if that helps to get the input from others. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
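For readers unfamiliar with the autovacuum precedent mentioned here, below is
a much-simplified model of the balancing idea. The real autovac_balance_cost()
in autovacuum.c is more involved and weights workers by their own cost
settings; this sketch only shows the even split of the limit with an unchanged
delay, which is the property under discussion:

#include <stdio.h>

/* Even split of the cost limit across active workers, keeping the delay
 * unchanged, so that the aggregate I/O rate of all workers roughly matches
 * what a single process would be allowed. */
static int
balanced_cost_limit(int base_cost_limit, int active_workers)
{
    int limit = base_cost_limit / (active_workers > 0 ? active_workers : 1);

    return (limit > 0) ? limit : 1;     /* never let the limit reach zero */
}

int
main(void)
{
    const int vacuum_cost_limit = 200;

    for (int n = 1; n <= 5; n++)
        printf("%d active worker(s): per-worker cost limit %d\n",
               n, balanced_cost_limit(vacuum_cost_limit, n));
    return 0;
}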
On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I think that two approaches make parallel vacuum worker wait in > > different way: in approach(a) the vacuum delay works as if vacuum is > > performed by single process, on the other hand in approach(b) the > > vacuum delay work for each workers independently. > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > ms. In single process vacuum the total sleep time is 2,500ms (= > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > Because all parallel vacuum workers use the shared balance value and a > > worker sleeps once the balance value exceeds the limit. In > > approach(b), since the cost limit is divided evenly the value of each > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > processes blocks evenly, the total sleep time of all workers is > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > compute the sleep time of approach(b) by dividing the total value by > > the number of parallel workers. > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > vacuum and parallel vacuum with approach(a) even with the same > > settings. Which behaviors do we expect? I thought the vacuum delay for > > parallel vacuum should work as if it's a single process vacuum as we > > did for memory usage. I might be missing something. If we prefer > > approach(b) I should change the patch so that the leader process > > divides the cost limit evenly. > > > I have repeated the same test (test1 and test2)[1] with a higher > shared buffer (1GB). Currently, I have used the same formula for > computing the total delay > heap scan delay + index vacuuming delay / workers. Because, In my > opinion, multiple workers are doing I/O here so the total delay should > also be in multiple > of the number of workers. So if we want to compare the delay with the > sequential vacuum then we should divide total delay by the number of > workers. But, I am not > sure whether computing the total delay is the right way to compute the > I/O throttling or not. But, I support the approach (b) for dividing > the I/O limit because > auto vacuum workers are already operating with this approach. > > test1: > normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017 > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, > total 79102 (cost divide patch) > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, > total 78994 (cost divide patch) > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, > total 92252 (share cost patch) > 2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806, > total 104290 (share cost patch) > > test2: > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472 > 1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total > 40513 (cost divide patch) > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total > 40518 (cost divide patch) > 1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total > 42589 (share cost patch) > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total > 42871 (share cost patch) > > So with higher, shared buffers, I can see with approach (b) we can > see the same total delay. 
With approach (a) I can see a bit less > total delay. But, a point to be noted that I have used the same > formulae for computing the total delay for both the approaches. But, > Sawada-san explained in the above mail that it may not be the right > way to computing the total delay for the approach (a). But my take is > that whether we are working with shared cost or we are dividing the > cost, the delay must be divided by number of workers in the parallel > phase. > Why do you think so? I think with approach (b) if all the workers are doing equal amount of I/O, they will probably sleep at the same time whereas with approach (a) each of them will sleep at different times. So, probably dividing the delay in approach (b) makes more sense. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 4, 2019 at 10:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > I think that two approaches make parallel vacuum worker wait in > > > different way: in approach(a) the vacuum delay works as if vacuum is > > > performed by single process, on the other hand in approach(b) the > > > vacuum delay work for each workers independently. > > > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > > ms. In single process vacuum the total sleep time is 2,500ms (= > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > > Because all parallel vacuum workers use the shared balance value and a > > > worker sleeps once the balance value exceeds the limit. In > > > approach(b), since the cost limit is divided evenly the value of each > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > > processes blocks evenly, the total sleep time of all workers is > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > > compute the sleep time of approach(b) by dividing the total value by > > > the number of parallel workers. > > > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > > vacuum and parallel vacuum with approach(a) even with the same > > > settings. Which behaviors do we expect? I thought the vacuum delay for > > > parallel vacuum should work as if it's a single process vacuum as we > > > did for memory usage. I might be missing something. If we prefer > > > approach(b) I should change the patch so that the leader process > > > divides the cost limit evenly. > > > > > I have repeated the same test (test1 and test2)[1] with a higher > > shared buffer (1GB). Currently, I have used the same formula for > > computing the total delay > > heap scan delay + index vacuuming delay / workers. Because, In my > > opinion, multiple workers are doing I/O here so the total delay should > > also be in multiple > > of the number of workers. So if we want to compare the delay with the > > sequential vacuum then we should divide total delay by the number of > > workers. But, I am not > > sure whether computing the total delay is the right way to compute the > > I/O throttling or not. But, I support the approach (b) for dividing > > the I/O limit because > > auto vacuum workers are already operating with this approach. 
> > > > test1: > > normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017 > > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, > > total 79102 (cost divide patch) > > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, > > total 78994 (cost divide patch) > > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, > > total 92252 (share cost patch) > > 2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806, > > total 104290 (share cost patch) > > > > test2: > > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472 > > 1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total > > 40513 (cost divide patch) > > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total > > 40518 (cost divide patch) > > 1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total > > 42589 (share cost patch) > > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total > > 42871 (share cost patch) > > > > So with higher, shared buffers, I can see with approach (b) we can > > see the same total delay. With approach (a) I can see a bit less > > total delay. But, a point to be noted that I have used the same > > formulae for computing the total delay for both the approaches. But, > > Sawada-san explained in the above mail that it may not be the right > > way to computing the total delay for the approach (a). But my take is > > that whether we are working with shared cost or we are dividing the > > cost, the delay must be divided by number of workers in the parallel > > phase. > > > > Why do you think so? I think with approach (b) if all the workers are > doing equal amount of I/O, they will probably sleep at the same time > whereas with approach (a) each of them will sleep at different times. > So, probably dividing the delay in approach (b) makes more sense. Just to be clear, I did not mean that we divide the sleep time for each worker. Actually, I meant how to project the total delay in the test patch. So I think if we directly want to compare the sleep time of the sequential vs parallel then it's not fair to just compare the total sleep time because when multiple workers are working parallelly shouldn't we need to consider their average sleep time? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 4, 2019 at 10:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I think that two approaches make parallel vacuum worker wait in > > different way: in approach(a) the vacuum delay works as if vacuum is > > performed by single process, on the other hand in approach(b) the > > vacuum delay work for each workers independently. > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > ms. In single process vacuum the total sleep time is 2,500ms (= > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > Because all parallel vacuum workers use the shared balance value and a > > worker sleeps once the balance value exceeds the limit. In > > approach(b), since the cost limit is divided evenly the value of each > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > processes blocks evenly, the total sleep time of all workers is > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > compute the sleep time of approach(b) by dividing the total value by > > the number of parallel workers. > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > vacuum and parallel vacuum with approach(a) even with the same > > settings. Which behaviors do we expect? > > > > Yeah, this is an important thing to decide. I don't think that the > conclusion you are drawing is correct because it that is true then the > same applies to the current autovacuum work division where we divide > the cost_limit among workers but the cost_delay is same (see > autovac_balance_cost). Basically, if we consider the delay time of > each worker independently, then it would appear that a parallel vacuum > delay with approach (b) is more, but that is true only if the workers > run serially which is not true. > > > I thought the vacuum delay for > > parallel vacuum should work as if it's a single process vacuum as we > > did for memory usage. I might be missing something. If we prefer > > approach(b) I should change the patch so that the leader process > > divides the cost limit evenly. > > > > I am also not completely sure which approach is better but I slightly > lean towards approach (b). I think we need input from some other > people as well. I will start a separate thread to discuss this and > see if that helps to get the input from others. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I think that two approaches make parallel vacuum worker wait in > > different way: in approach(a) the vacuum delay works as if vacuum is > > performed by single process, on the other hand in approach(b) the > > vacuum delay work for each workers independently. > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > ms. In single process vacuum the total sleep time is 2,500ms (= > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > Because all parallel vacuum workers use the shared balance value and a > > worker sleeps once the balance value exceeds the limit. In > > approach(b), since the cost limit is divided evenly the value of each > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > processes blocks evenly, the total sleep time of all workers is > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > compute the sleep time of approach(b) by dividing the total value by > > the number of parallel workers. > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > vacuum and parallel vacuum with approach(a) even with the same > > settings. Which behaviors do we expect? > > > > Yeah, this is an important thing to decide. I don't think that the > conclusion you are drawing is correct because it that is true then the > same applies to the current autovacuum work division where we divide > the cost_limit among workers but the cost_delay is same (see > autovac_balance_cost). Basically, if we consider the delay time of > each worker independently, then it would appear that a parallel vacuum > delay with approach (b) is more, but that is true only if the workers > run serially which is not true. > > > I thought the vacuum delay for > > parallel vacuum should work as if it's a single process vacuum as we > > did for memory usage. I might be missing something. If we prefer > > approach(b) I should change the patch so that the leader process > > divides the cost limit evenly. > > > > I am also not completely sure which approach is better but I slightly > lean towards approach (b). Can we get the same sleep time as approach (b) if we divide the cost limit by the number of workers and have the shared cost balance (i.e. approach (a) with dividing the cost limit)? Currently the approach (b) seems better but I'm concerned that it might unnecessarily delay vacuum if some indexes are very small or bulk-deletions of indexes does almost nothing such as brin. > > I think we need input from some other > people as well. I will start a separate thread to discuss this and > see if that helps to get the input from others. +1 -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I think that two approaches make parallel vacuum worker wait in > > > different way: in approach(a) the vacuum delay works as if vacuum is > > > performed by single process, on the other hand in approach(b) the > > > vacuum delay work for each workers independently. > > > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > > ms. In single process vacuum the total sleep time is 2,500ms (= > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > > Because all parallel vacuum workers use the shared balance value and a > > > worker sleeps once the balance value exceeds the limit. In > > > approach(b), since the cost limit is divided evenly the value of each > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > > processes blocks evenly, the total sleep time of all workers is > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > > compute the sleep time of approach(b) by dividing the total value by > > > the number of parallel workers. > > > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > > vacuum and parallel vacuum with approach(a) even with the same > > > settings. Which behaviors do we expect? > > > > > > > Yeah, this is an important thing to decide. I don't think that the > > conclusion you are drawing is correct because it that is true then the > > same applies to the current autovacuum work division where we divide > > the cost_limit among workers but the cost_delay is same (see > > autovac_balance_cost). Basically, if we consider the delay time of > > each worker independently, then it would appear that a parallel vacuum > > delay with approach (b) is more, but that is true only if the workers > > run serially which is not true. > > > > > I thought the vacuum delay for > > > parallel vacuum should work as if it's a single process vacuum as we > > > did for memory usage. I might be missing something. If we prefer > > > approach(b) I should change the patch so that the leader process > > > divides the cost limit evenly. > > > > > > > I am also not completely sure which approach is better but I slightly > > lean towards approach (b). > > Can we get the same sleep time as approach (b) if we divide the cost > limit by the number of workers and have the shared cost balance (i.e. > approach (a) with dividing the cost limit)? Currently the approach (b) > seems better but I'm concerned that it might unnecessarily delay > vacuum if some indexes are very small or bulk-deletions of indexes > does almost nothing such as brin. Are you worried that some of the workers might not have much I/O to do but still we divide the cost limit equally? If that is the case then that is the case with the auto vacuum workers also right? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, 4 Nov 2019 at 17:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I think that two approaches make parallel vacuum worker wait in > > > > different way: in approach(a) the vacuum delay works as if vacuum is > > > > performed by single process, on the other hand in approach(b) the > > > > vacuum delay work for each workers independently. > > > > > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > > > ms. In single process vacuum the total sleep time is 2,500ms (= > > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > > > Because all parallel vacuum workers use the shared balance value and a > > > > worker sleeps once the balance value exceeds the limit. In > > > > approach(b), since the cost limit is divided evenly the value of each > > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > > > processes blocks evenly, the total sleep time of all workers is > > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > > > compute the sleep time of approach(b) by dividing the total value by > > > > the number of parallel workers. > > > > > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > > > vacuum and parallel vacuum with approach(a) even with the same > > > > settings. Which behaviors do we expect? > > > > > > > > > > Yeah, this is an important thing to decide. I don't think that the > > > conclusion you are drawing is correct because it that is true then the > > > same applies to the current autovacuum work division where we divide > > > the cost_limit among workers but the cost_delay is same (see > > > autovac_balance_cost). Basically, if we consider the delay time of > > > each worker independently, then it would appear that a parallel vacuum > > > delay with approach (b) is more, but that is true only if the workers > > > run serially which is not true. > > > > > > > I thought the vacuum delay for > > > > parallel vacuum should work as if it's a single process vacuum as we > > > > did for memory usage. I might be missing something. If we prefer > > > > approach(b) I should change the patch so that the leader process > > > > divides the cost limit evenly. > > > > > > > > > > I am also not completely sure which approach is better but I slightly > > > lean towards approach (b). > > > > Can we get the same sleep time as approach (b) if we divide the cost > > limit by the number of workers and have the shared cost balance (i.e. > > approach (a) with dividing the cost limit)? Currently the approach (b) > > seems better but I'm concerned that it might unnecessarily delay > > vacuum if some indexes are very small or bulk-deletions of indexes > > does almost nothing such as brin. > > Are you worried that some of the workers might not have much I/O to do > but still we divide the cost limit equally? Yes. > If that is the case then > that is the case with the auto vacuum workers also right? I think It is not right because we rebalance the cost after an autovacuum worker finished. 
So, as Amit mentioned on the new thread, we might need to have parallel
vacuum workers notify the leader once they exit so that it can rebalance the
cost.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Nov 4, 2019 at 2:11 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 4 Nov 2019 at 17:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > I think that two approaches make parallel vacuum worker wait in > > > > > different way: in approach(a) the vacuum delay works as if vacuum is > > > > > performed by single process, on the other hand in approach(b) the > > > > > vacuum delay work for each workers independently. > > > > > > > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks, > > > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5 > > > > > ms. In single process vacuum the total sleep time is 2,500ms (= > > > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms. > > > > > Because all parallel vacuum workers use the shared balance value and a > > > > > worker sleeps once the balance value exceeds the limit. In > > > > > approach(b), since the cost limit is divided evenly the value of each > > > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers > > > > > processes blocks evenly, the total sleep time of all workers is > > > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can > > > > > compute the sleep time of approach(b) by dividing the total value by > > > > > the number of parallel workers. > > > > > > > > > > IOW the approach(b) makes parallel vacuum delay much more than normal > > > > > vacuum and parallel vacuum with approach(a) even with the same > > > > > settings. Which behaviors do we expect? > > > > > > > > > > > > > Yeah, this is an important thing to decide. I don't think that the > > > > conclusion you are drawing is correct because it that is true then the > > > > same applies to the current autovacuum work division where we divide > > > > the cost_limit among workers but the cost_delay is same (see > > > > autovac_balance_cost). Basically, if we consider the delay time of > > > > each worker independently, then it would appear that a parallel vacuum > > > > delay with approach (b) is more, but that is true only if the workers > > > > run serially which is not true. > > > > > > > > > I thought the vacuum delay for > > > > > parallel vacuum should work as if it's a single process vacuum as we > > > > > did for memory usage. I might be missing something. If we prefer > > > > > approach(b) I should change the patch so that the leader process > > > > > divides the cost limit evenly. > > > > > > > > > > > > > I am also not completely sure which approach is better but I slightly > > > > lean towards approach (b). > > > > > > Can we get the same sleep time as approach (b) if we divide the cost > > > limit by the number of workers and have the shared cost balance (i.e. > > > approach (a) with dividing the cost limit)? Currently the approach (b) > > > seems better but I'm concerned that it might unnecessarily delay > > > vacuum if some indexes are very small or bulk-deletions of indexes > > > does almost nothing such as brin. > > > > Are you worried that some of the workers might not have much I/O to do > > but still we divide the cost limit equally? > > Yes. > > > If that is the case then > > that is the case with the auto vacuum workers also right? 
>
> I think it is not right because we rebalance the cost after an autovacuum
> worker finishes. So, as Amit mentioned on the new thread, we might need to
> have parallel vacuum workers notify the leader once they exit so that it can
> rebalance the cost.

I agree that if an autovacuum worker finishes then we rebalance the cost, and
we need to do something similar here. That will be a bit difficult to
implement in the parallel vacuum case. We might need some shared memory array
where we set a worker's status to running as soon as the worker starts, and
when a worker exits we set it to false and also set a flag saying we need cost
rebalancing. Then, in vacuum_delay_point, if we see that we need to rebalance,
we can scan the shared memory array, find out how many workers are running,
and rebalance based on that. Having said that, I think for rebalancing we just
need a shared memory counter of how many workers are running.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
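A sketch of the shared-counter idea described above, using C11 atomics as a
stand-in for a pg_atomic counter living in the DSM segment; the function names
are invented for illustration and none of this is code from the patch set:

#include <stdatomic.h>
#include <stdio.h>

/* Shared counter of parallel vacuum participants that are still running:
 * bumped at worker start, dropped at worker exit, and consulted whenever the
 * throttling code needs the current per-worker cost limit. */
static atomic_int nworkers_running = 1;   /* the leader counts as one */
static const int  vacuum_cost_limit = 200;

static void worker_started(void) { atomic_fetch_add(&nworkers_running, 1); }
static void worker_exited(void)  { atomic_fetch_sub(&nworkers_running, 1); }

/* what a vacuum_delay_point()-style check could use as its current limit */
static int
current_cost_limit(void)
{
    int n = atomic_load(&nworkers_running);

    return vacuum_cost_limit / (n > 0 ? n : 1);
}

int
main(void)
{
    worker_started();
    worker_started();
    printf("limit with 3 participants: %d\n", current_cost_limit());
    worker_exited();
    printf("limit after one exit:      %d\n", current_cost_limit());
    return 0;
}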
Hi
I took all the attached patches (v32-01 to v32-4) and one of Dilip's patches from the "Questions/Observations related to Gist vacuum" mail thread. On top of all these patches, I created one more patch to test parallel vacuum functionally across the entire existing test suite.
For reference, I am attaching the patch.
What does this patch do?
As we know, vacuum uses parallel workers only when the parallel option is given with VACUUM. So, to test, I used the existing GUC force_parallel_mode and tested parallel vacuuming.
If force_parallel_mode is set to regress and the parallel option is not given with VACUUM, I force vacuum to use parallel workers. If there is only one index and no parallel degree is given with VACUUM (or the parallel option is not given), and force_parallel_mode = regress, then I launch one parallel worker (the leader does no work in this case); but if there is more than one index, then I use the leader as a worker for one index and launch workers for all the other indexes.
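A small standalone model of the rule just described; forced_parallel_workers and the enum are invented names for illustration and not part of the patch:

#include <stdbool.h>
#include <stdio.h>

typedef enum { FPM_OFF, FPM_ON, FPM_REGRESS } ForceParallelMode;

/* Model of the decision described above: when force_parallel_mode = regress
 * and the VACUUM command did not ask for parallelism, force it anyway. With a
 * single index one worker does the work (the leader does not participate);
 * with more indexes the leader handles one index and workers take the rest. */
static int
forced_parallel_workers(ForceParallelMode mode, bool parallel_requested,
                        int nindexes, bool *leader_participates)
{
    *leader_participates = true;

    if (mode != FPM_REGRESS || parallel_requested || nindexes == 0)
        return 0;               /* leave the command as the user wrote it */

    if (nindexes == 1)
    {
        *leader_participates = false;
        return 1;               /* one worker, leader only coordinates */
    }

    return nindexes - 1;        /* leader takes one index, workers the rest */
}

int
main(void)
{
    bool leader;
    int  nworkers = forced_parallel_workers(FPM_REGRESS, false, 3, &leader);

    printf("workers=%d leader_participates=%d\n", nworkers, leader);
    return 0;
}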
After applying this patch and setting force_parallel_mode = regress, all test cases pass (make check-world).
I have a question regarding my patch: should we do vacuuming using parallel workers even if force_parallel_mode is set to on, or should we use a new GUC to test parallel vacuum workers with the existing test suite?
Please let me know your thoughts on this patch.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea. I can come up with a POC patch for approach
> > > > > (b). Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare. Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time? Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum. So the idea of computing the total cost delay is
>
> Total cost delay = Total delay of heap scan + Total delay of
> index/worker; Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1]. I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling. Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach. But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing: I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
> Total Delay    1784 (ms)       1398 (ms)         1938 (ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
> Total Delay    1438 (ms)       1029 (ms)         1529 (ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version. The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch. I
> will also try to test different types of indexes.
>
Thank you for testing!
I realized that v31-0006 patch doesn't work fine so I've attached the
updated version patch that also incorporated some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.
Regards,
--
Masahiko Sawada
Attachment
On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > Hi > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum"mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for allexistence test suite. > For reference, I am attaching patch. > > What does this patch? > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test,I used existence guc force_parallel_mode and tested parallel vacuuming. > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallelworkers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option isnot given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader inthis case), but if there is more than one index, then i am using leader as a worker for one index and launching workersfor all other indexes. > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world) > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode isset as on, or we should use new GUC to test parallel worker vacuum for existence test suite? IMHO, with force_parallel_mode=on we don't need to do anything here because that is useful for normal query parallelism where if the user thinks that the parallel plan should have been selected by the planer but planer did not select the parallel plan then the user can force and check. But, vacuum parallelism is itself forced by the user so there is no point in doing it with force_parallel_mode=on. However, force_parallel_mode=regress is useful for testing the vacuum with an existing test suit. > > Please let me know your thoughts for this patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > Hi > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum"mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for allexistence test suite. Thank you for looking at this patch! > > For reference, I am attaching patch. > > > > What does this patch? > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test,I used existence guc force_parallel_mode and tested parallel vacuuming. > > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallelworkers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option isnot given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader inthis case), but if there is more than one index, then i am using leader as a worker for one index and launching workersfor all other indexes. > > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world) > > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode isset as on, or we should use new GUC to test parallel worker vacuum for existence test suite? > > IMHO, with force_parallel_mode=on we don't need to do anything here > because that is useful for normal query parallelism where if the user > thinks that the parallel plan should have been selected by the planer > but planer did not select the parallel plan then the user can force > and check. But, vacuum parallelism is itself forced by the user so > there is no point in doing it with force_parallel_mode=on. Yeah I think so too. force_parallel_mode is a planner parameter and parallel vacuum can be forced by vacuum option. > However, > force_parallel_mode=regress is useful for testing the vacuum with an > existing test suit. If we want to control the leader participation by GUC parameter I think we would need to have another GUC parameter rather than using force_parallel_mode. And it's useful if we can use the parameter for parallel CREATE INDEX as well. But it should be a separate patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > > Hi > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum"mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for allexistence test suite. > > Thank you for looking at this patch! > > > > For reference, I am attaching patch. > > > > > > What does this patch? > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test,I used existence guc force_parallel_mode and tested parallel vacuuming. > > > > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallelworkers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option isnot given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader inthis case), but if there is more than one index, then i am using leader as a worker for one index and launching workersfor all other indexes. > > > > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world) > > > > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_modeis set as on, or we should use new GUC to test parallel worker vacuum for existence test suite? > > > > IMHO, with force_parallel_mode=on we don't need to do anything here > > because that is useful for normal query parallelism where if the user > > thinks that the parallel plan should have been selected by the planer > > but planer did not select the parallel plan then the user can force > > and check. But, vacuum parallelism is itself forced by the user so > > there is no point in doing it with force_parallel_mode=on. > > Yeah I think so too. force_parallel_mode is a planner parameter and > parallel vacuum can be forced by vacuum option. > > > However, > > force_parallel_mode=regress is useful for testing the vacuum with an > > existing test suit. > > If we want to control the leader participation by GUC parameter I > think we would need to have another GUC parameter rather than using > force_parallel_mode. I think the purpose is not to disable the leader participation, instead, I think the purpose of 'force_parallel_mode=regress' is that without changing the existing test suit we can execute the existing vacuum commands in the test suit with the worker. I did not study the patch but the idea should be that if "force_parallel_mode=regress" then normal vacuum command should be executed in parallel by using 1 worker. And it's useful if we can use the parameter for > parallel CREATE INDEX as well. But it should be a separate patch. > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, 6 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > > > > Hi > > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum"mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for allexistence test suite. > > > > Thank you for looking at this patch! > > > > > > For reference, I am attaching patch. > > > > > > > > What does this patch? > > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So totest, I used existence guc force_parallel_mode and tested parallel vacuuming. > > > > > > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallelworkers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option isnot given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader inthis case), but if there is more than one index, then i am using leader as a worker for one index and launching workersfor all other indexes. > > > > > > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world) > > > > > > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_modeis set as on, or we should use new GUC to test parallel worker vacuum for existence test suite? > > > > > > IMHO, with force_parallel_mode=on we don't need to do anything here > > > because that is useful for normal query parallelism where if the user > > > thinks that the parallel plan should have been selected by the planer > > > but planer did not select the parallel plan then the user can force > > > and check. But, vacuum parallelism is itself forced by the user so > > > there is no point in doing it with force_parallel_mode=on. > > > > Yeah I think so too. force_parallel_mode is a planner parameter and > > parallel vacuum can be forced by vacuum option. > > > > > However, > > > force_parallel_mode=regress is useful for testing the vacuum with an > > > existing test suit. > > > > If we want to control the leader participation by GUC parameter I > > think we would need to have another GUC parameter rather than using > > force_parallel_mode. > I think the purpose is not to disable the leader participation, > instead, I think the purpose of 'force_parallel_mode=regress' is that > without changing the existing test suit we can execute the existing > vacuum commands in the test suit with the worker. I did not study the > patch but the idea should be that if "force_parallel_mode=regress" > then normal vacuum command should be executed in parallel by using 1 > worker. Oh I got it. Considering the current parallel vacuum design I'm not sure that we can cover more test cases by forcing parallel vacuum during existing test suite because most of these would be tables with several indexes and one index vacuum cycle. It might be better to add more test cases for parallel vacuum. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 6 Nov 2019, 20:07 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 6 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > > >
> > > > Hi
> > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
> >
> > Thank you for looking at this patch!
> >
> > > > For reference, I am attaching patch.
> > > >
> > > > What does this patch?
> > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > > >
> > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > > >
> > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > > >
> > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> > >
> > > IMHO, with force_parallel_mode=on we don't need to do anything here
> > > because that is useful for normal query parallelism where if the user
> > > thinks that the parallel plan should have been selected by the planer
> > > but planer did not select the parallel plan then the user can force
> > > and check. But, vacuum parallelism is itself forced by the user so
> > > there is no point in doing it with force_parallel_mode=on.
> >
> > Yeah I think so too. force_parallel_mode is a planner parameter and
> > parallel vacuum can be forced by vacuum option.
> >
> > > However,
> > > force_parallel_mode=regress is useful for testing the vacuum with an
> > > existing test suit.
> >
> > If we want to control the leader participation by GUC parameter I
> > think we would need to have another GUC parameter rather than using
> > force_parallel_mode.
> I think the purpose is not to disable the leader participation,
> instead, I think the purpose of 'force_parallel_mode=regress' is that
> without changing the existing test suit we can execute the existing
> vacuum commands in the test suit with the worker. I did not study the
> patch but the idea should be that if "force_parallel_mode=regress"
> then normal vacuum command should be executed in parallel by using 1
> worker.
> Oh I got it. Considering the current parallel vacuum design I'm not
> sure that we can cover more test cases by forcing parallel vacuum
> during existing test suite because most of these would be tables with
> several indexes and one index vacuum cycle.
Oh sure, but still it would be good to get them tested with the parallel vacuum.
> It might be better to add
> more test cases for parallel vacuum.
I agree that it would be good to add additional test cases.
On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I realized that v31-0006 patch doesn't work fine so I've attached the > updated version patch that also incorporated some comments I got so > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > test the total delay time. > + /* + * Generally index cleanup does not scan the index when index + * vacuuming (ambulkdelete) was already performed. So we perform + * index cleanup with parallel workers only if we have not + * performed index vacuuming yet. Otherwise, we do it in the + * leader process alone. + */ + if (vacrelstats->num_index_scans == 0) + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, + stats, lps); Today, I was thinking about this point where this check will work for most cases, but still, exceptions are there like for brin index, the main work is done in amvacuumcleanup function. Similarly, I think there are few more indexes like gin, bloom where it seems we take another pass over-index in the amvacuumcleanup phase. Don't you think we should try to allow parallel workers for such cases? If so, I don't have any great ideas on how to do that, but what comes to my mind is to indicate that via stats ( IndexBulkDeleteResult) or via an indexam API. I am not sure if it is acceptable to have indexam API for this. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Thanks Masahiko san and Dilip for looking into this patch.
In the previous patch, when 'force_parallel_mode=regress', I was doing all the vacuuming using multiple workers, but we should do all the vacuuming using only 1 worker (the leader should not participate in vacuuming). So I am attaching a patch for the same.
What does this patch do?
If 'force_parallel_mode=regress' and the parallel option is not given with vacuum, then all the vacuuming work will be done by one single worker and the leader will not participate. But if the parallel option is given with vacuum, then preference will be given to the specified degree.
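To make the intended behaviour concrete, here is a minimal C sketch of the decision described above; it is not the patch itself, and the names choose_vacuum_workers, nworkers_requested and leaderparticipates are placeholders of my own:

#include <stdbool.h>

/* Hypothetical values mirroring force_parallel_mode's off/on/regress. */
typedef enum { FORCE_OFF, FORCE_ON, FORCE_REGRESS } ForceParallelMode;

static void
choose_vacuum_workers(ForceParallelMode mode, int nworkers_requested,
                      int *nworkers, bool *leaderparticipates)
{
    if (nworkers_requested < 0 && mode == FORCE_REGRESS)
    {
        /* No PARALLEL option given: force exactly one worker, leader stays out. */
        *nworkers = 1;
        *leaderparticipates = false;
    }
    else
    {
        /* Preference goes to the explicitly requested degree. */
        *nworkers = (nworkers_requested > 0) ? nworkers_requested : 0;
        *leaderparticipates = true;
    }
}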
After applying this patch, all the test cases pass (make check-world), but I can't see any improvement in code coverage with this patch.
Please let me know your thoughts on this patch.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Wed, 6 Nov 2019 at 16:59, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > > Hi
> > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
>
> Thank you for looking at this patch!
>
> > > For reference, I am attaching patch.
> > >
> > > What does this patch?
> > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > >
> > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > >
> > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > >
> > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> >
> > IMHO, with force_parallel_mode=on we don't need to do anything here
> > because that is useful for normal query parallelism where if the user
> > thinks that the parallel plan should have been selected by the planer
> > but planer did not select the parallel plan then the user can force
> > and check. But, vacuum parallelism is itself forced by the user so
> > there is no point in doing it with force_parallel_mode=on.
>
> Yeah I think so too. force_parallel_mode is a planner parameter and
> parallel vacuum can be forced by vacuum option.
>
> > However,
> > force_parallel_mode=regress is useful for testing the vacuum with an
> > existing test suit.
>
> If we want to control the leader participation by GUC parameter I
> think we would need to have another GUC parameter rather than using
> force_parallel_mode.
I think the purpose is not to disable the leader participation,
instead, I think the purpose of 'force_parallel_mode=regress' is that
without changing the existing test suit we can execute the existing
vacuum commands in the test suit with the worker. I did not study the
patch but the idea should be that if "force_parallel_mode=regress"
then normal vacuum command should be executed in parallel by using 1
worker.
> And it's useful if we can use the parameter for
> parallel CREATE INDEX as well. But it should be a separate patch.
>
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > updated version patch that also incorporated some comments I got so > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > test the total delay time. > > > > + /* > + * Generally index cleanup does not scan the index when index > + * vacuuming (ambulkdelete) was already performed. So we perform > + * index cleanup with parallel workers only if we have not > + * performed index vacuuming yet. Otherwise, we do it in the > + * leader process alone. > + */ > + if (vacrelstats->num_index_scans == 0) > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > + stats, lps); > > Today, I was thinking about this point where this check will work for > most cases, but still, exceptions are there like for brin index, the > main work is done in amvacuumcleanup function. Similarly, I think > there are few more indexes like gin, bloom where it seems we take > another pass over-index in the amvacuumcleanup phase. Don't you think > we should try to allow parallel workers for such cases? If so, I > don't have any great ideas on how to do that, but what comes to my > mind is to indicate that via stats ( > IndexBulkDeleteResult) or via an indexam API. I am not sure if it is > acceptable to have indexam API for this. > > Thoughts? Good point. gin and bloom do a certain heavy work during cleanup and during bulkdelete as you mentioned. Brin does it during cleanup, and hash and gist do it during bulkdelete. There are three types of index AM just inside postgres code. An idea I came up with is that we can control parallel vacuum and parallel cleanup separately. That is, adding a variable amcanparallelcleanup and we can do parallel cleanup on only indexes of which amcanparallelcleanup is true. IndexBulkDelete can be stored locally if both amcanparallelvacuum and amcanparallelcleanup of an index are false because only the leader process deals with such indexes. Otherwise we need to store it in DSM as always. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
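As a rough illustration of the idea above (not the actual IndexAmRoutine layout), the two capabilities could be sketched like this; the struct name is hypothetical and the per-AM values are only the guesses mentioned in this discussion:

#include <stdbool.h>

/* Sketch of the proposed per-AM capabilities (heavily simplified). */
typedef struct ParallelVacuumCaps
{
    bool amcanparallelvacuum;   /* ambulkdelete may run in a parallel worker */
    bool amcanparallelcleanup;  /* amvacuumcleanup may run in a parallel worker */
} ParallelVacuumCaps;

/* Guesses from this thread; assumptions, not settled values. */
static const ParallelVacuumCaps hash_caps = { true,  false };  /* heavy work in bulkdelete */
static const ParallelVacuumCaps brin_caps = { false, true  };  /* heavy work in cleanup */
static const ParallelVacuumCaps gin_caps  = { true,  true  };  /* heavy work in both */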
On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > updated version patch that also incorporated some comments I got so > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > test the total delay time. > > > > > > > + /* > > + * Generally index cleanup does not scan the index when index > > + * vacuuming (ambulkdelete) was already performed. So we perform > > + * index cleanup with parallel workers only if we have not > > + * performed index vacuuming yet. Otherwise, we do it in the > > + * leader process alone. > > + */ > > + if (vacrelstats->num_index_scans == 0) > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > > + stats, lps); > > > > Today, I was thinking about this point where this check will work for > > most cases, but still, exceptions are there like for brin index, the > > main work is done in amvacuumcleanup function. Similarly, I think > > there are few more indexes like gin, bloom where it seems we take > > another pass over-index in the amvacuumcleanup phase. Don't you think > > we should try to allow parallel workers for such cases? If so, I > > don't have any great ideas on how to do that, but what comes to my > > mind is to indicate that via stats ( > > IndexBulkDeleteResult) or via an indexam API. I am not sure if it is > > acceptable to have indexam API for this. > > > > Thoughts? > > Good point. gin and bloom do a certain heavy work during cleanup and > during bulkdelete as you mentioned. Brin does it during cleanup, and > hash and gist do it during bulkdelete. There are three types of index > AM just inside postgres code. An idea I came up with is that we can > control parallel vacuum and parallel cleanup separately. That is, > adding a variable amcanparallelcleanup and we can do parallel cleanup > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete > can be stored locally if both amcanparallelvacuum and > amcanparallelcleanup of an index are false because only the leader > process deals with such indexes. Otherwise we need to store it in DSM > as always. > IIUC, amcanparallelcleanup will be true for those indexes which does heavy work during cleanup irrespective of whether bulkdelete is called or not e.g. gin? If so, along with an amcanparallelcleanup flag, we need to consider vacrelstats->num_index_scans right? So if vacrelstats->num_index_scans == 0 then we need to launch parallel worker for all the indexes who support amcanparallelvacuum and if vacrelstats->num_index_scans > 0 then only for those who has amcanparallelcleanup as true. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > updated version patch that also incorporated some comments I got so > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > test the total delay time. > > > > > > > > > > + /* > > > + * Generally index cleanup does not scan the index when index > > > + * vacuuming (ambulkdelete) was already performed. So we perform > > > + * index cleanup with parallel workers only if we have not > > > + * performed index vacuuming yet. Otherwise, we do it in the > > > + * leader process alone. > > > + */ > > > + if (vacrelstats->num_index_scans == 0) > > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > > > + stats, lps); > > > > > > Today, I was thinking about this point where this check will work for > > > most cases, but still, exceptions are there like for brin index, the > > > main work is done in amvacuumcleanup function. Similarly, I think > > > there are few more indexes like gin, bloom where it seems we take > > > another pass over-index in the amvacuumcleanup phase. Don't you think > > > we should try to allow parallel workers for such cases? If so, I > > > don't have any great ideas on how to do that, but what comes to my > > > mind is to indicate that via stats ( > > > IndexBulkDeleteResult) or via an indexam API. I am not sure if it is > > > acceptable to have indexam API for this. > > > > > > Thoughts? > > > > Good point. gin and bloom do a certain heavy work during cleanup and > > during bulkdelete as you mentioned. Brin does it during cleanup, and > > hash and gist do it during bulkdelete. There are three types of index > > AM just inside postgres code. An idea I came up with is that we can > > control parallel vacuum and parallel cleanup separately. That is, > > adding a variable amcanparallelcleanup and we can do parallel cleanup > > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete > > can be stored locally if both amcanparallelvacuum and > > amcanparallelcleanup of an index are false because only the leader > > process deals with such indexes. Otherwise we need to store it in DSM > > as always. > > > IIUC, amcanparallelcleanup will be true for those indexes which does > heavy work during cleanup irrespective of whether bulkdelete is called > or not e.g. gin? Yes, I guess that gin and brin set amcanparallelcleanup to true (gin might set amcanparallevacuum to true as well). > If so, along with an amcanparallelcleanup flag, we > need to consider vacrelstats->num_index_scans right? So if > vacrelstats->num_index_scans == 0 then we need to launch parallel > worker for all the indexes who support amcanparallelvacuum and if > vacrelstats->num_index_scans > 0 then only for those who has > amcanparallelcleanup as true. Yes, you're right. But this won't work fine for brin indexes who don't want to participate in parallel vacuum but always want to participate in parallel cleanup. After more thoughts, I think we can have a ternary value: never, always, once. If it's 'never' the index never participates in parallel cleanup. I guess hash indexes use 'never'. 
Next, if it's 'always' the index always participates regardless of vacrelstats->num_index_scan. I guess gin, brin and bloom use 'always'. Finally if it's 'once' the index participates in parallel cleanup only when it's the first time (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and spgist use 'once'. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
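Spelled out as code, the ternary idea might look like the following sketch; the enum and function names are made up for illustration, and the per-AM guesses in the comments are only the ones suggested in this thread:

#include <stdbool.h>

/* Sketch of the proposed ternary cleanup behaviour. */
typedef enum ParallelCleanupChoice
{
    PARALLEL_CLEANUP_NEVER,     /* e.g. hash: cleanup never done by a worker */
    PARALLEL_CLEANUP_ONCE,      /* e.g. btree, gist, spgist: only if no bulkdelete ran */
    PARALLEL_CLEANUP_ALWAYS     /* e.g. gin, brin, bloom: always eligible */
} ParallelCleanupChoice;

/* Would this index's cleanup be eligible for a parallel worker? */
static inline bool
cleanup_in_worker(ParallelCleanupChoice choice, int num_index_scans)
{
    return choice == PARALLEL_CLEANUP_ALWAYS ||
           (choice == PARALLEL_CLEANUP_ONCE && num_index_scans == 0);
}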
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > updated version patch that also incorporated some comments I got so > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > test the total delay time. > > > > > > > > > > > > > + /* > > > > + * Generally index cleanup does not scan the index when index > > > > + * vacuuming (ambulkdelete) was already performed. So we perform > > > > + * index cleanup with parallel workers only if we have not > > > > + * performed index vacuuming yet. Otherwise, we do it in the > > > > + * leader process alone. > > > > + */ > > > > + if (vacrelstats->num_index_scans == 0) > > > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > > > > + stats, lps); > > > > > > > > Today, I was thinking about this point where this check will work for > > > > most cases, but still, exceptions are there like for brin index, the > > > > main work is done in amvacuumcleanup function. Similarly, I think > > > > there are few more indexes like gin, bloom where it seems we take > > > > another pass over-index in the amvacuumcleanup phase. Don't you think > > > > we should try to allow parallel workers for such cases? If so, I > > > > don't have any great ideas on how to do that, but what comes to my > > > > mind is to indicate that via stats ( > > > > IndexBulkDeleteResult) or via an indexam API. I am not sure if it is > > > > acceptable to have indexam API for this. > > > > > > > > Thoughts? > > > > > > Good point. gin and bloom do a certain heavy work during cleanup and > > > during bulkdelete as you mentioned. Brin does it during cleanup, and > > > hash and gist do it during bulkdelete. There are three types of index > > > AM just inside postgres code. An idea I came up with is that we can > > > control parallel vacuum and parallel cleanup separately. That is, > > > adding a variable amcanparallelcleanup and we can do parallel cleanup > > > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete > > > can be stored locally if both amcanparallelvacuum and > > > amcanparallelcleanup of an index are false because only the leader > > > process deals with such indexes. Otherwise we need to store it in DSM > > > as always. > > > > > IIUC, amcanparallelcleanup will be true for those indexes which does > > heavy work during cleanup irrespective of whether bulkdelete is called > > or not e.g. gin? > > Yes, I guess that gin and brin set amcanparallelcleanup to true (gin > might set amcanparallevacuum to true as well). > > > If so, along with an amcanparallelcleanup flag, we > > need to consider vacrelstats->num_index_scans right? So if > > vacrelstats->num_index_scans == 0 then we need to launch parallel > > worker for all the indexes who support amcanparallelvacuum and if > > vacrelstats->num_index_scans > 0 then only for those who has > > amcanparallelcleanup as true. > > Yes, you're right. 
But this won't work fine for brin indexes who don't > want to participate in parallel vacuum but always want to participate > in parallel cleanup. Yeah, right. > > After more thoughts, I think we can have a ternary value: never, > always, once. If it's 'never' the index never participates in parallel > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the > index always participates regardless of vacrelstats->num_index_scan. I > guess gin, brin and bloom use 'always'. Finally if it's 'once' the > index participates in parallel cleanup only when it's the first time > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and > spgist use 'once'. Yeah, this make sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I realized that v31-0006 patch doesn't work fine so I've attached the > updated version patch that also incorporated some comments I got so > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > test the total delay time. > While reviewing the 0002, I got one doubt related to how we are dividing the maintainance_work_mem +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) +{ + /* Compute the new maitenance_work_mem value for index vacuuming */ + lvshared->maintenance_work_mem_worker = + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : maintenance_work_mem; +} Is it fair to just consider the number of indexes which use maintenance_work_mem? Or we need to consider the number of worker as well. My point is suppose there are 10 indexes which will use the maintenance_work_mem but we are launching just 2 workers then what is the point in dividing the maintenance_work_mem by 10. IMHO the calculation should be like this lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? maintenance_work_mem / Min(nindexes_mwm, nworkers) : maintenance_work_mem; Am I missing something? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
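For clarity, here is the suggested calculation as a standalone sketch, with Min() spelled out; the function name is mine and this is an illustration of the proposal rather than the patch's exact code:

/*
 * Sketch of the suggested maintenance_work_mem split: divide by the number
 * of indexes that actually use maintenance_work_mem, but never by more than
 * the number of workers that will be launched.
 */
static int
compute_worker_work_mem(int maintenance_work_mem, int nindexes_mwm, int nworkers)
{
    int divisor;

    /* No index wants maintenance_work_mem: nothing to divide. */
    if (nindexes_mwm <= 0)
        return maintenance_work_mem;

    /* Divide by Min(nindexes_mwm, nworkers), but never by less than 1. */
    divisor = (nindexes_mwm < nworkers) ? nindexes_mwm : nworkers;
    if (divisor < 1)
        divisor = 1;

    return maintenance_work_mem / divisor;
}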
Hi All,
I did some performance testing, with the help of Dilip, to compare normal vacuum and parallel vacuum. Below is the test summary.
Configuration settings:
autovacuum = off
shared_buffers = 2GB
max_parallel_maintenance_workers = 6
Test 1: (table has 4 indexes covering all tuples, and alternate tuples are deleted)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a);
create index i2 on test (b);
create index i3 on test (c);
create index i4 on test (d);
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;
case 1: (run normal vacuum)
vacuum test;
1019.453 ms
Case 2: (run vacuum with parallel degree 1)
vacuum (parallel 1) test;
765.366 ms
Case 3: (run vacuum with parallel degree 3)
vacuum (parallel 3) test;
555.227 ms
From the above results, we can conclude that with the help of parallel vacuum, performance is improved for large indexes.
Test 2: (table has 16 indexes, the indexes are small, and alternate tuples are deleted)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a) where a < 100000;
create index i2 on test (a) where a > 100000 and a < 200000;
create index i3 on test (a) where a > 200000 and a < 300000;
create index i4 on test (a) where a > 300000 and a < 400000;
create index i5 on test (a) where a > 400000 and a < 500000;
create index i6 on test (a) where a > 500000 and a < 600000;
create index i7 on test (b) where a < 100000;
create index i8 on test (c) where a < 100000;
create index i9 on test (d) where a < 100000;
create index i10 on test (d) where a < 100000;
create index i11 on test (d) where a < 100000;
create index i12 on test (d) where a < 100000;
create index i13 on test (d) where a < 100000;
create index i14 on test (d) where a < 100000;
create index i15 on test (d) where a < 100000;
create index i16 on test (d) where a < 100000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;
case 1: (run normal vacuum)
vacuum test;
649.187 ms
Case 2: (run vacuum with parallel degree 1)
vacuum (parallel 1) test;
492.075 ms
Case 3: (run vacuum with parallel degree 3)
vacuum (parallel 3) test;
435.581 ms
For small indexes also, we gained some performance with parallel vacuum.
I will continue my testing for stats collection.
Please let me know if anybody has any suggestions for other testing (what should be tested).
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea. I can come up with a POC patch for approach
> > > > > (b). Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare. Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time? Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum. So the idea of computing the total cost delay is
>
> Total cost delay = Total delay of heap scan + Total delay of
> index/worker; Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1]. I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling. Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch. I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach. But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing: I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                  Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
> Total Delay      1784 (ms)        1398 (ms)          1938 (ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                  Vacuum (Head)    Parallel Vacuum    Vacuum Cost Divide Patch
> Total Delay      1438 (ms)        1029 (ms)          1529 (ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay is a bit
> more compared to the non-parallel version. The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch. I
> will also try to test different types of indexes.
>
Thank you for testing!
I realized that v31-0006 patch doesn't work fine so I've attached the
updated version patch that also incorporated some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.
Regards,
--
Masahiko Sawada
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Good point. gin and bloom do a certain heavy work during cleanup and > > > during bulkdelete as you mentioned. Brin does it during cleanup, and > > > hash and gist do it during bulkdelete. There are three types of index > > > AM just inside postgres code. An idea I came up with is that we can > > > control parallel vacuum and parallel cleanup separately. That is, > > > adding a variable amcanparallelcleanup and we can do parallel cleanup > > > on only indexes of which amcanparallelcleanup is true. > > > This is what I mentioned in my email as a second option (whether to expose via IndexAM). I am not sure if we can have a new variable just for this. > > > IndexBulkDelete > > > can be stored locally if both amcanparallelvacuum and > > > amcanparallelcleanup of an index are false because only the leader > > > process deals with such indexes. Otherwise we need to store it in DSM > > > as always. > > > > > IIUC, amcanparallelcleanup will be true for those indexes which does > > heavy work during cleanup irrespective of whether bulkdelete is called > > or not e.g. gin? > > Yes, I guess that gin and brin set amcanparallelcleanup to true (gin > might set amcanparallevacuum to true as well). > > > If so, along with an amcanparallelcleanup flag, we > > need to consider vacrelstats->num_index_scans right? So if > > vacrelstats->num_index_scans == 0 then we need to launch parallel > > worker for all the indexes who support amcanparallelvacuum and if > > vacrelstats->num_index_scans > 0 then only for those who has > > amcanparallelcleanup as true. > > Yes, you're right. But this won't work fine for brin indexes who don't > want to participate in parallel vacuum but always want to participate > in parallel cleanup. > > After more thoughts, I think we can have a ternary value: never, > always, once. If it's 'never' the index never participates in parallel > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the > index always participates regardless of vacrelstats->num_index_scan. I > guess gin, brin and bloom use 'always'. Finally if it's 'once' the > index participates in parallel cleanup only when it's the first time > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and > spgist use 'once'. > I think this 'once' option is confusing especially because it also depends on 'num_index_scans' which the IndexAM has no control over. It might be that the option name is not good, but I am not sure. Another thing is that for brin indexes, we don't want bulkdelete to participate in parallelism. Do we want to have separate variables for ambulkdelete and amvacuumcleanup which decides whether the particular phase can be done in parallel? Another possibility could be to just have one variable (say uint16 amparallelvacuum) which will tell us all the options but I don't think that will be a popular approach considering all the other methods and variables exposed. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > For small indexes also, we gained some performance by parallel vacuum. > Thanks for doing all these tests. It is clear with this and previous tests that this patch has benefit in wide variety of cases. However, we should try to see some worst cases as well. For example, if there are multiple indexes on a table and only one of them is large whereas all others are very small say having a few 100 or 1000 rows. Note: Please don't use the top-posting style to reply. Here, we use inline reply. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Good point. gin and bloom do a certain heavy work during cleanup and > > > > during bulkdelete as you mentioned. Brin does it during cleanup, and > > > > hash and gist do it during bulkdelete. There are three types of index > > > > AM just inside postgres code. An idea I came up with is that we can > > > > control parallel vacuum and parallel cleanup separately. That is, > > > > adding a variable amcanparallelcleanup and we can do parallel cleanup > > > > on only indexes of which amcanparallelcleanup is true. > > > > > > This is what I mentioned in my email as a second option (whether to > expose via IndexAM). I am not sure if we can have a new variable just > for this. > > > > > IndexBulkDelete > > > > can be stored locally if both amcanparallelvacuum and > > > > amcanparallelcleanup of an index are false because only the leader > > > > process deals with such indexes. Otherwise we need to store it in DSM > > > > as always. > > > > > > > IIUC, amcanparallelcleanup will be true for those indexes which does > > > heavy work during cleanup irrespective of whether bulkdelete is called > > > or not e.g. gin? > > > > Yes, I guess that gin and brin set amcanparallelcleanup to true (gin > > might set amcanparallevacuum to true as well). > > > > > If so, along with an amcanparallelcleanup flag, we > > > need to consider vacrelstats->num_index_scans right? So if > > > vacrelstats->num_index_scans == 0 then we need to launch parallel > > > worker for all the indexes who support amcanparallelvacuum and if > > > vacrelstats->num_index_scans > 0 then only for those who has > > > amcanparallelcleanup as true. > > > > Yes, you're right. But this won't work fine for brin indexes who don't > > want to participate in parallel vacuum but always want to participate > > in parallel cleanup. > > > > After more thoughts, I think we can have a ternary value: never, > > always, once. If it's 'never' the index never participates in parallel > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the > > index always participates regardless of vacrelstats->num_index_scan. I > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the > > index participates in parallel cleanup only when it's the first time > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and > > spgist use 'once'. > > > > I think this 'once' option is confusing especially because it also > depends on 'num_index_scans' which the IndexAM has no control over. > It might be that the option name is not good, but I am not sure. > Another thing is that for brin indexes, we don't want bulkdelete to > participate in parallelism. I thought brin should set amcanparallelvacuum is false and amcanparallelcleanup is 'always'. > Do we want to have separate variables for > ambulkdelete and amvacuumcleanup which decides whether the particular > phase can be done in parallel? You mean adding variables to ambulkdelete and amvacuumcleanup as function arguments? If so isn't it too late to tell the leader whether the particular pchase can be done in parallel? 
> Another possibility could be to just > have one variable (say uint16 amparallelvacuum) which will tell us all > the options but I don't think that will be a popular approach > considering all the other methods and variables exposed. What do you > think? Adding only one variable that can have flags would also be a good idea, instead of having multiple variables for each option. For instance FDW API uses such interface (see eflags of BeginForeignScan). -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > After more thoughts, I think we can have a ternary value: never, > > > always, once. If it's 'never' the index never participates in parallel > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the > > > index always participates regardless of vacrelstats->num_index_scan. I > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the > > > index participates in parallel cleanup only when it's the first time > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and > > > spgist use 'once'. > > > > > > > I think this 'once' option is confusing especially because it also > > depends on 'num_index_scans' which the IndexAM has no control over. > > It might be that the option name is not good, but I am not sure. > > Another thing is that for brin indexes, we don't want bulkdelete to > > participate in parallelism. > > I thought brin should set amcanparallelvacuum is false and > amcanparallelcleanup is 'always'. > In that case, it is better to name the variable as amcanparallelbulkdelete. > > Do we want to have separate variables for > > ambulkdelete and amvacuumcleanup which decides whether the particular > > phase can be done in parallel? > > You mean adding variables to ambulkdelete and amvacuumcleanup as > function arguments? > No, I mean separate variables amcanparallelbulkdelete (bool) and amcanparallelvacuumcleanup (unit16) variables. > > > Another possibility could be to just > > have one variable (say uint16 amparallelvacuum) which will tell us all > > the options but I don't think that will be a popular approach > > considering all the other methods and variables exposed. What do you > > think? > > Adding only one variable that can have flags would also be a good > idea, instead of having multiple variables for each option. For > instance FDW API uses such interface (see eflags of BeginForeignScan). > Yeah, maybe something like amparallelvacuumoptions. The options can be: VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor vacuumcleanup) can't be performed in parallel VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be performed in parallel (hash index will set this flag) VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this flag) VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, gin, gist, spgist, bloom will set this flag) VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in parallel even if bulkdelete is already performed (Indexes gin, brin, and bloom will set this flag) Does something like this make sense? If we all agree on this, then I think we can summarize the part of the discussion related to this API and get feedback from a broader audience. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
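Written out, the proposal above amounts to something like the following list of constants (taken from the list above; whether they end up as plain values or combinable bit flags is exactly what the follow-up messages discuss):

/* Sketch of the proposed amparallelvacuumoptions values, as listed above. */
#define VACUUM_OPTION_NO_PARALLEL            0  /* neither phase can run in parallel */
#define VACUUM_OPTION_NO_PARALLEL_CLEANUP    1  /* cleanup never in parallel (e.g. hash) */
#define VACUUM_OPTION_PARALLEL_BULKDEL       2  /* bulkdelete can run in parallel */
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  3  /* cleanup in parallel only if no bulkdelete ran */
#define VACUUM_OPTION_PARALLEL_CLEANUP       4  /* cleanup in parallel even after bulkdelete */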
On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > After more thoughts, I think we can have a ternary value: never, > > > > always, once. If it's 'never' the index never participates in parallel > > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the > > > > index always participates regardless of vacrelstats->num_index_scan. I > > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the > > > > index participates in parallel cleanup only when it's the first time > > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and > > > > spgist use 'once'. > > > > > > > > > > I think this 'once' option is confusing especially because it also > > > depends on 'num_index_scans' which the IndexAM has no control over. > > > It might be that the option name is not good, but I am not sure. > > > Another thing is that for brin indexes, we don't want bulkdelete to > > > participate in parallelism. > > > > I thought brin should set amcanparallelvacuum is false and > > amcanparallelcleanup is 'always'. > > > > In that case, it is better to name the variable as amcanparallelbulkdelete. > > > > Do we want to have separate variables for > > > ambulkdelete and amvacuumcleanup which decides whether the particular > > > phase can be done in parallel? > > > > You mean adding variables to ambulkdelete and amvacuumcleanup as > > function arguments? > > > > No, I mean separate variables amcanparallelbulkdelete (bool) and > amcanparallelvacuumcleanup (unit16) variables. > > > > > > Another possibility could be to just > > > have one variable (say uint16 amparallelvacuum) which will tell us all > > > the options but I don't think that will be a popular approach > > > considering all the other methods and variables exposed. What do you > > > think? > > > > Adding only one variable that can have flags would also be a good > > idea, instead of having multiple variables for each option. For > > instance FDW API uses such interface (see eflags of BeginForeignScan). > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > vacuumcleanup) can't be performed in parallel > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > performed in parallel (hash index will set this flag) Maybe we don't want this option? because if 3 or 4 is not set then we will not do the cleanup in parallel right? > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > flag) > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > gin, gist, spgist, bloom will set this flag) > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > parallel even if bulkdelete is already performed (Indexes gin, brin, > and bloom will set this flag) > > Does something like this make sense? Yeah, something like that seems better to me. > If we all agree on this, then I > think we can summarize the part of the discussion related to this API > and get feedback from a broader audience. Make sense. 
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, 11 Nov 2019 at 16:36, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> >
> > For small indexes also, we gained some performance by parallel vacuum.
> >
>
> Thanks for doing all these tests. It is clear with this and previous
> tests that this patch has benefit in wide variety of cases. However,
> we should try to see some worst cases as well. For example, if there
> are multiple indexes on a table and only one of them is large whereas
> all others are very small say having a few 100 or 1000 rows.
>
Thanks, Amit, for your comments. I did some testing along the above suggested lines. Below is the summary:
Test case: (I created 16 indexes, but only 1 index is large; the others are very small)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i3 on test (a) where a > 2000 and a < 3000;
create index i4 on test (a) where a > 3000 and a < 4000;
create index i5 on test (a) where a > 4000 and a < 5000;
create index i6 on test (a) where a > 5000 and a < 6000;
create index i7 on test (b) where a < 1000;
create index i8 on test (c) where a < 1000;
create index i9 on test (d) where a < 1000;
create index i10 on test (d) where a < 1000;
create index i11 on test (d) where a < 1000;
create index i12 on test (d) where a < 1000;
create index i13 on test (d) where a < 1000;
create index i14 on test (d) where a < 1000;
create index i15 on test (d) where a < 1000;
create index i16 on test (d) where a < 1000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;
case 1: vacuum without using parallel workers.
vacuum test;
228.259 ms
case 2: vacuum with 1 parallel worker.
vacuum (parallel 1) test;
251.725 ms
case 3: vacuum with 3 parallel workers.
vacuum (parallel 3) test;
259.986 ms
From the above results, it seems that if the indexes are small, then parallel vacuum is not beneficial compared to normal vacuum.
> Note: Please don't use the top-posting style to reply. Here, we use
> inline reply.
Okay. I will follow inline reply.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > After more thoughts, I think we can have a ternary value: never, > > > > > always, once. If it's 'never' the index never participates in parallel > > > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the > > > > > index always participates regardless of vacrelstats->num_index_scan. I > > > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the > > > > > index participates in parallel cleanup only when it's the first time > > > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and > > > > > spgist use 'once'. > > > > > > > > > > > > > I think this 'once' option is confusing especially because it also > > > > depends on 'num_index_scans' which the IndexAM has no control over. > > > > It might be that the option name is not good, but I am not sure. > > > > Another thing is that for brin indexes, we don't want bulkdelete to > > > > participate in parallelism. > > > > > > I thought brin should set amcanparallelvacuum is false and > > > amcanparallelcleanup is 'always'. > > > > > > > In that case, it is better to name the variable as amcanparallelbulkdelete. > > > > > > Do we want to have separate variables for > > > > ambulkdelete and amvacuumcleanup which decides whether the particular > > > > phase can be done in parallel? > > > > > > You mean adding variables to ambulkdelete and amvacuumcleanup as > > > function arguments? > > > > > > > No, I mean separate variables amcanparallelbulkdelete (bool) and > > amcanparallelvacuumcleanup (unit16) variables. > > > > > > > > > Another possibility could be to just > > > > have one variable (say uint16 amparallelvacuum) which will tell us all > > > > the options but I don't think that will be a popular approach > > > > considering all the other methods and variables exposed. What do you > > > > think? > > > > > > Adding only one variable that can have flags would also be a good > > > idea, instead of having multiple variables for each option. For > > > instance FDW API uses such interface (see eflags of BeginForeignScan). > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > vacuumcleanup) can't be performed in parallel > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > performed in parallel (hash index will set this flag) > > Maybe we don't want this option? because if 3 or 4 is not set then we > will not do the cleanup in parallel right? > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > flag) > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > gin, gist, spgist, bloom will set this flag) > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > and bloom will set this flag) > > > > Does something like this make sense? 
3 and 4 confused me because 4 also looks conditional. How about having two flags instead: one for doing parallel cleanup when not performed yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? That way, we can have flags as follows and index AM chooses two flags, one from the first two flags for bulk deletion and another from next three flags for cleanup. VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > Yeah, something like that seems better to me. > > > If we all agree on this, then I > > think we can summarize the part of the discussion related to this API > > and get feedback from a broader audience. > > Make sense. +1 Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
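As a sketch, the bit-flag variant above would let each index AM OR together one bulk-deletion flag and one cleanup flag; the example per-AM combinations below are hypothetical and only show how the flags compose:

#include <stdint.h>

#define VACUUM_OPTION_PARALLEL_NO_BULKDEL    (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL       (1 << 1)
#define VACUUM_OPTION_PARALLEL_NO_CLEANUP    (1 << 2)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  (1 << 3)
#define VACUUM_OPTION_PARALLEL_CLEANUP       (1 << 4)

/* Hypothetical per-AM settings, only to show how the flags combine. */
static const uint16 example_btree_like =        /* parallel bulkdel, cleanup only if no bulkdel yet */
    VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
static const uint16 example_brin_like =         /* no parallel bulkdel, always parallel cleanup */
    VACUUM_OPTION_PARALLEL_NO_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;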
On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I realized that v31-0006 patch doesn't work fine so I've attached the > > updated version patch that also incorporated some comments I got so > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > test the total delay time. > > > While reviewing the 0002, I got one doubt related to how we are > dividing the maintainance_work_mem > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > +{ > + /* Compute the new maitenance_work_mem value for index vacuuming */ > + lvshared->maintenance_work_mem_worker = > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > maintenance_work_mem; > +} > Is it fair to just consider the number of indexes which use > maintenance_work_mem? Or we need to consider the number of worker as > well. My point is suppose there are 10 indexes which will use the > maintenance_work_mem but we are launching just 2 workers then what is > the point in dividing the maintenance_work_mem by 10. > > IMHO the calculation should be like this > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > maintenance_work_mem; > > Am I missing something? No, I think you're right. On the other hand I think that dividing it by the number of indexes that will use the mantenance_work_mem makes sense when parallel degree > the number of such indexes. Suppose the table has 2 indexes and there are 10 workers then we should divide the maintenance_work_mem by 2 rather than 10 because it's possible that at most 2 indexes that uses the maintenance_work_mem are processed in parallel at a time. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > vacuumcleanup) can't be performed in parallel > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > performed in parallel (hash index will set this flag) > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > will not do the cleanup in parallel right? > > Yeah, but it is better to be explicit about this. > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > flag) > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > gin, gist, spgist, bloom will set this flag) > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > and bloom will set this flag) > > > > > > Does something like this make sense? > > 3 and 4 confused me because 4 also looks conditional. How about having > two flags instead: one for doing parallel cleanup when not performed > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure what makes you think 4 is conditional. > That way, we > can have flags as follows and index AM chooses two flags, one from the > first two flags for bulk deletion and another from next three flags > for cleanup. > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > This also looks reasonable, but if there is an index that doesn't want to support a parallel vacuum, it needs to set multiple flags. > > Yeah, something like that seems better to me. > > > > > If we all agree on this, then I > > > think we can summarize the part of the discussion related to this API > > > and get feedback from a broader audience. > > > > Make sense. > > +1 > Okay, then I will write a separate email for this topic. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > updated version patch that also incorporated some comments I got so > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > test the total delay time. > > > > > While reviewing the 0002, I got one doubt related to how we are > > dividing the maintainance_work_mem > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > +{ > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > + lvshared->maintenance_work_mem_worker = > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > maintenance_work_mem; > > +} > > Is it fair to just consider the number of indexes which use > > maintenance_work_mem? Or we need to consider the number of worker as > > well. My point is suppose there are 10 indexes which will use the > > maintenance_work_mem but we are launching just 2 workers then what is > > the point in dividing the maintenance_work_mem by 10. > > > > IMHO the calculation should be like this > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > maintenance_work_mem; > > > > Am I missing something? > > No, I think you're right. On the other hand I think that dividing it > by the number of indexes that will use the mantenance_work_mem makes > sense when parallel degree > the number of such indexes. Suppose the > table has 2 indexes and there are 10 workers then we should divide the > maintenance_work_mem by 2 rather than 10 because it's possible that at > most 2 indexes that uses the maintenance_work_mem are processed in > parallel at a time. > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > vacuumcleanup) can't be performed in parallel > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > performed in parallel (hash index will set this flag) > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > will not do the cleanup in parallel right? > > > > > Yeah, but it is better to be explicit about this. VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? I think brin indexes will use this flag. It will end up with (VACUUM_OPTION_NO_PARALLEL_CLEANUP | VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to VACUUM_OPTION_NO_PARALLEL, though. > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > flag) > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > gin, gist, spgist, bloom will set this flag) > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > and bloom will set this flag) > > > > > > > > Does something like this make sense? > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > two flags instead: one for doing parallel cleanup when not performed > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > what makes you think 4 is conditional. Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets 4 it doesn't need to set 3 because 4 means always doing cleanup in parallel. > > > That way, we > > can have flags as follows and index AM chooses two flags, one from the > > first two flags for bulk deletion and another from next three flags > > for cleanup. > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > This also looks reasonable, but if there is an index that doesn't want > to support a parallel vacuum, it needs to set multiple flags. Right. It would be better to use uint16 as two uint8. I mean that if first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags could be followings: VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
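To make that encoding concrete, here is a minimal sketch of how the uint16-as-two-uint8 idea could look; the three flag values come from the list above, while the two helper macros are illustrative assumptions rather than anything in the patch:

/* low byte describes bulk-deletion support, high byte describes cleanup
 * support; an all-zero value would then mean "no parallel vacuum at all" */
#define VACUUM_OPTION_PARALLEL_BULKDEL       0x0001
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  0x0100
#define VACUUM_OPTION_PARALLEL_CLEANUP       0x0200

#define VACUUM_OPTION_SUPPORTS_PARALLEL_BULKDEL(opts)  (((opts) & 0x00FF) != 0)
#define VACUUM_OPTION_SUPPORTS_PARALLEL_CLEANUP(opts)  (((opts) & 0xFF00) != 0)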
On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > updated version patch that also incorporated some comments I got so > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > test the total delay time. > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > dividing the maintainance_work_mem > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > > +{ > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > + lvshared->maintenance_work_mem_worker = > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > maintenance_work_mem; > > > +} > > > Is it fair to just consider the number of indexes which use > > > maintenance_work_mem? Or we need to consider the number of worker as > > > well. My point is suppose there are 10 indexes which will use the > > > maintenance_work_mem but we are launching just 2 workers then what is > > > the point in dividing the maintenance_work_mem by 10. > > > > > > IMHO the calculation should be like this > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > maintenance_work_mem; > > > > > > Am I missing something? > > > > No, I think you're right. On the other hand I think that dividing it > > by the number of indexes that will use the mantenance_work_mem makes > > sense when parallel degree > the number of such indexes. Suppose the > > table has 2 indexes and there are 10 workers then we should divide the > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > most 2 indexes that uses the maintenance_work_mem are processed in > > parallel at a time. > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). Thanks! I'll fix it in the next version patch. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
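For reference, the adjusted computation agreed on above would presumably look something like the following; variable names are taken from the quoted patch hunk, nworkers is assumed to be the planned parallel degree, and Min() is the usual PostgreSQL macro:

/* Compute the per-worker maintenance_work_mem value for index vacuuming */
lvshared->maintenance_work_mem_worker =
    (nindexes_mwm > 0) ?
    maintenance_work_mem / Min(nindexes_mwm, nworkers) :
    maintenance_work_mem;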
On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > > vacuumcleanup) can't be performed in parallel > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > > performed in parallel (hash index will set this flag) > > > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > > will not do the cleanup in parallel right? > > > > > > > > Yeah, but it is better to be explicit about this. > > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? > I am not sure if that is required. > I think brin indexes > will use this flag. > Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and it should work. > It will end up with > (VACUUM_OPTION_NO_PARALLEL_CLEANUP | > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to > VACUUM_OPTION_NO_PARALLEL, though. > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > flag) > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > > gin, gist, spgist, bloom will set this flag) > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > and bloom will set this flag) > > > > > > > > > > Does something like this make sense? > > > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > > two flags instead: one for doing parallel cleanup when not performed > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > > what makes you think 4 is conditional. > > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets > 4 it doesn't need to set 3 because 4 means always doing cleanup in > parallel. > Yeah, that makes sense. They can just set 4. > > > > > That way, we > > > can have flags as follows and index AM chooses two flags, one from the > > > first two flags for bulk deletion and another from next three flags > > > for cleanup. > > > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > > > > This also looks reasonable, but if there is an index that doesn't want > > to support a parallel vacuum, it needs to set multiple flags. > > Right. It would be better to use uint16 as two uint8. I mean that if > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. 
Other flags > could be followings: > > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 > Hmm, I think we should define these flags in the most simple way. Your previous proposal sounds okay to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 12 Nov 2019 at 22:33, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > > > performed in parallel (hash index will set this flag) > > > > > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > > > will not do the cleanup in parallel right? > > > > > > > > > > > Yeah, but it is better to be explicit about this. > > > > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? > > > > I am not sure if that is required. > > > I think brin indexes > > will use this flag. > > > > Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and > it should work. > > > It will end up with > > (VACUUM_OPTION_NO_PARALLEL_CLEANUP | > > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to > > VACUUM_OPTION_NO_PARALLEL, though. > > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > > flag) > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > > > gin, gist, spgist, bloom will set this flag) > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > and bloom will set this flag) > > > > > > > > > > > > Does something like this make sense? > > > > > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > > > two flags instead: one for doing parallel cleanup when not performed > > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > > > > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > > > what makes you think 4 is conditional. > > > > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets > > 4 it doesn't need to set 3 because 4 means always doing cleanup in > > parallel. > > > > Yeah, that makes sense. They can just set 4. Okay, > > > > > > > > That way, we > > > > can have flags as follows and index AM chooses two flags, one from the > > > > first two flags for bulk deletion and another from next three flags > > > > for cleanup. > > > > > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > > > > > > > This also looks reasonable, but if there is an index that doesn't want > > > to support a parallel vacuum, it needs to set multiple flags. > > > > Right. It would be better to use uint16 as two uint8. 
I mean that if > > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if > > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags > > could be followings: > > > > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 > > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 > > > > Hmm, I think we should define these flags in the most simple way. > Your previous proposal sounds okay to me. Okay. As you mentioned before, my previous proposal won't work for existing index AMs that don't set amparallelvacuumoptions. But since we have amcanparallelvacuum, which is false by default, I think we don't need to worry about the backward compatibility problem. Existing index AMs will use neither parallel bulk-deletion nor parallel cleanup by default. When an index AM wants to support parallel vacuum, it will set amparallelvacuumoptions as well as amcanparallelvacuum. I'll try my previous proposal and check it. If something goes wrong we can go back to your proposal or another one. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
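Just to illustrate what opting in would look like under the two-field scheme described above, an index AM handler (for example bthandler() in nbtree) might set something like the following; the field names are the ones discussed in this thread and the exact flag combination is an assumption:

IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

/* ... other fields filled in as today ... */
amroutine->amcanparallelvacuum = true;      /* default is false */
amroutine->amparallelvacuumoptions =
    VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;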
On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 12 Nov 2019 at 22:33, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Hmm, I think we should define these flags in the most simple way. > > Your previous proposal sounds okay to me. > > Okay. As you mentioned before, my previous proposal won't work for > existing index AMs that don't set amparallelvacuumoptions. > You mean to say it won't work because an IndexAm has to set multiple flags, i.e. if the IndexAm author doesn't set the value of amparallelvacuumoptions then it won't work? > But since we > have amcanparallelvacuum which is false by default I think we don't > need to worry about backward compatibility problem. The existing index > AM will use neither parallel bulk-deletion nor parallel cleanup by > default. When it wants to support parallel vacuum they will set > amparallelvacuumoptions as well as amcanparallelvacuum. > Hmm, I was not thinking of multiple variables, rather only one variable. The default value should indicate that the IndexAm doesn't support a parallel vacuum. It might be that we need to do it the way I originally proposed, with the different values of amparallelvacuumoptions, or maybe some variant of it where the default value can clearly say that the IndexAm doesn't support a parallel vacuum. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Nov 12, 2019 at 7:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Yeah, maybe something like amparallelvacuumoptions. The options can be: > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 0 # vacuum (neither bulkdelete nor > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP 1 # vacuumcleanup cannot be > > > > > > performed in parallel (hash index will set this flag) > > > > > > > > > > Maybe we don't want this option? because if 3 or 4 is not set then we > > > > > will not do the cleanup in parallel right? > > > > > > > > > > > Yeah, but it is better to be explicit about this. > > > > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing? > > > > I am not sure if that is required. > > > I think brin indexes > > will use this flag. > > > > Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and > it should work. IIUC, VACUUM_OPTION_PARALLEL_CLEANUP means no parallel bulk delete and always parallel cleanup? I am not sure whether this is the best way because for the cleanup option we are being explicit for each option i.e PARALLEL_CLEANUP, NO_PARALLEL_CLEANUP, etc, then why not the same for the bulk delete. I mean why don't we keep both PARALLEL_BULKDEL and NO_PARALLEL_BULKDEL? > > > It will end up with > > (VACUUM_OPTION_NO_PARALLEL_CLEANUP | > > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to > > VACUUM_OPTION_NO_PARALLEL, though. > > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 2 # bulkdelete can be done in > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > > flag) > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 3 # vacuumcleanup can be done in > > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash, > > > > > > gin, gist, spgist, bloom will set this flag) > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 4 # vacuumcleanup can be done in > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > and bloom will set this flag) > > > > > > > > > > > > Does something like this make sense? > > > > > > > > 3 and 4 confused me because 4 also looks conditional. How about having > > > > two flags instead: one for doing parallel cleanup when not performed > > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing > > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? > > > > > > > > > > Hmm, this is exactly what I intend to say with 3 and 4. I am not sure > > > what makes you think 4 is conditional. > > > > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets > > 4 it doesn't need to set 3 because 4 means always doing cleanup in > > parallel. > > > > Yeah, that makes sense. They can just set 4. > > > > > > > > That way, we > > > > can have flags as follows and index AM chooses two flags, one from the > > > > first two flags for bulk deletion and another from next three flags > > > > for cleanup. 
> > > > > > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0 > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 > > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2 > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3 > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4 > > > > > > > > > > This also looks reasonable, but if there is an index that doesn't want > > > to support a parallel vacuum, it needs to set multiple flags. > > > > Right. It would be better to use uint16 as two uint8. I mean that if > > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if > > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags > > could be followings: > > > > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001 > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100 > > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200 > > > > Hmm, I think we should define these flags in the most simple way. > Your previous proposal sounds okay to me. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, 13 Nov 2019 at 11:38, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 12 Nov 2019 at 22:33, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Hmm, I think we should define these flags in the most simple way. > > > Your previous proposal sounds okay to me. > > > > Okay. As you mentioned before, my previous proposal won't work for > > existing index AMs that don't set amparallelvacuumoptions. > > > > You mean to say it won't work because it has to set multiple flags > which means that if IndexAm user doesn't set the value of > amparallelvacuumoptions then it won't work? Yes. In my previous proposal every index AM needs to set two flags. > > > But since we > > have amcanparallelvacuum which is false by default I think we don't > > need to worry about backward compatibility problem. The existing index > > AM will use neither parallel bulk-deletion nor parallel cleanup by > > default. When it wants to support parallel vacuum they will set > > amparallelvacuumoptions as well as amcanparallelvacuum. > > > > Hmm, I was not thinking of multiple variables rather only one > variable. The default value should indicate that IndexAm doesn't > support a parallel vacuum. Yes. > It might be that we need to do it the way > I originally proposed the different values of amparallelvacuumoptions > or maybe some variant of it where the default value can clearly say > that IndexAm doesn't support a parallel vacuum. Okay. After more thought on your original proposal, what confuses me about it is that there are two types of flags: ones that enable options and ones that disable them. Looking at 2, 3 and 4, it looks like all options are disabled by default and setting these flags enables them. On the other hand, looking at 1, it looks like these options are enabled by default and setting the flag disables them. 0 makes sense to me. So how about having 0, 2, 3 and 4? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > updated version patch that also incorporated some comments I got so > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > test the total delay time. > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > dividing the maintainance_work_mem > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > > > +{ > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > + lvshared->maintenance_work_mem_worker = > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > maintenance_work_mem; > > > > +} > > > > Is it fair to just consider the number of indexes which use > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > well. My point is suppose there are 10 indexes which will use the > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > IMHO the calculation should be like this > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > maintenance_work_mem; > > > > > > > > Am I missing something? > > > > > > No, I think you're right. On the other hand I think that dividing it > > > by the number of indexes that will use the mantenance_work_mem makes > > > sense when parallel degree > the number of such indexes. Suppose the > > > table has 2 indexes and there are 10 workers then we should divide the > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > most 2 indexes that uses the maintenance_work_mem are processed in > > > parallel at a time. > > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). > > Thanks! I'll fix it in the next version patch. > One more comment. +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, + int nindexes, IndexBulkDeleteResult **stats, + LVParallelState *lps) +{ + .... + if (ParallelVacuumIsActive(lps)) + { + + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, + stats, lps); + + } + + for (idx = 0; idx < nindexes; idx++) + { + /* + * Skip indexes that we have already vacuumed during parallel index + * vacuuming. + */ + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) + continue; + + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, + vacrelstats->old_live_tuples); + } +} In this function, if ParallelVacuumIsActive, we perform the parallel vacuum for all the index for which parallel vacuum is supported and once that is over we finish vacuuming remaining indexes for which parallel vacuum is not supported. 
But my question is that inside lazy_parallel_vacuum_or_cleanup_indexes we wait for all the workers to finish their job and only then do we start the sequential vacuuming. Shouldn't we start that immediately, as soon as the leader's participation in the parallel vacuum is over? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 13, 2019 at 8:34 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 13 Nov 2019 at 11:38, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > It might be that we need to do it the way > > I originally proposed the different values of amparallelvacuumoptions > > or maybe some variant of it where the default value can clearly say > > that IndexAm doesn't support a parallel vacuum. > > Okay. After more thoughts on your original proposal, what I get > confused on your proposal is that there are two types of flags that > enable and disable options. Looking at 2, 3 and 4, it looks like all > options are disabled by default and setting these flags means to > enable them. On the other hand looking at 1, it looks like these > options are enabled by default and setting the flag means to disable > it. 0 makes sense to me. So how about having 0, 2, 3 and 4? > Yeah, 0,2,3 and 4 sounds reasonable to me. Earlier, Dilip also got confused with option 1. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 13, 2019 at 9:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > > updated version patch that also incorporated some comments I got so > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > > test the total delay time. > > > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > > dividing the maintainance_work_mem > > > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > > > > +{ > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > > + lvshared->maintenance_work_mem_worker = > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > > maintenance_work_mem; > > > > > +} > > > > > Is it fair to just consider the number of indexes which use > > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > > well. My point is suppose there are 10 indexes which will use the > > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > > > IMHO the calculation should be like this > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > > maintenance_work_mem; > > > > > > > > > > Am I missing something? > > > > > > > > No, I think you're right. On the other hand I think that dividing it > > > > by the number of indexes that will use the mantenance_work_mem makes > > > > sense when parallel degree > the number of such indexes. Suppose the > > > > table has 2 indexes and there are 10 workers then we should divide the > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > > most 2 indexes that uses the maintenance_work_mem are processed in > > > > parallel at a time. > > > > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). > > > > Thanks! I'll fix it in the next version patch. > > > One more comment. > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps) > +{ > + .... > > + if (ParallelVacuumIsActive(lps)) > + { > > + > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > + stats, lps); > + > + } > + > + for (idx = 0; idx < nindexes; idx++) > + { > + /* > + * Skip indexes that we have already vacuumed during parallel index > + * vacuuming. > + */ > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) > + continue; > + > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > + vacrelstats->old_live_tuples); > + } > +} > > In this function, if ParallelVacuumIsActive, we perform the parallel > vacuum for all the index for which parallel vacuum is supported and > once that is over we finish vacuuming remaining indexes for which > parallel vacuum is not supported. 
But, my question is that inside > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > to finish their job then only we start with the sequential vacuuming > shouldn't we start that immediately as soon as the leader > participation is over in the parallel vacuum? > + /* + * Since parallel workers cannot access data in temporary tables, parallel + * vacuum is not allowed for temporary relation. + */ + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) + { + ereport(WARNING, + (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", + RelationGetRelationName(onerel)))); + relation_close(onerel, lmode); + PopActiveSnapshot(); + CommitTransactionCommand(); + /* It's OK to proceed with ANALYZE on this table */ + return true; + } + If we cannot support parallel vacuum for temporary tables, then shouldn't we fall back to a normal vacuum instead of skipping the table? I think it's not fair that if the user has asked for a system-wide parallel vacuum then all the temp tables will be skipped and not vacuumed at all, and the user then needs to perform a normal vacuum on those tables again. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
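A minimal sketch of the fallback being suggested here, written as a rewrite of the quoted hunk: instead of skipping the relation, clear the requested parallel degree so the temporary table is still vacuumed, just serially. Treating nworkers = -1 as "parallelism not requested" is an assumption based on the >= 0 check in the quoted code.

if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
{
    ereport(WARNING,
            (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                    RelationGetRelationName(onerel))));
    params->nworkers = -1;      /* fall back to a serial lazy vacuum */
}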
On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Yeah, 0,2,3 and 4 sounds reasonable to me. Earlier, Dilip also got > confused with option 1. > Let me try to summarize the discussion on this point and see if others have any opinion on this matter. We need a way to allow IndexAm to specify whether it can participate in a parallel vacuum. As we know there are two phases of index-vacuum, bulkdelete and vacuumcleanup, and in many cases the bulkdelete performs the main deletion work while vacuumcleanup just returns index statistics. So, for such cases, we don't want the second phase to be performed by a parallel vacuum worker. Now, if the bulkdelete phase is not performed, then vacuumcleanup can process the entire index, in which case it is better to do that phase via a parallel worker. OTOH, in some cases vacuumcleanup takes another pass over the index to reclaim empty pages and record the same in the FSM even if bulkdelete is performed. This happens in gin and bloom indexes. Then, we have indexes where we do all the work in the cleanup phase, as in the case of brin indexes. For this category of indexes, we want the vacuumcleanup phase to also be performed by a parallel worker. In short, different indexes have different requirements for which phase of index vacuum can be performed in parallel. Just to be clear, we can't perform both phases (bulkdelete and cleanup) in one go, as bulkdelete can happen multiple times on a large index whereas vacuumcleanup is done once at the end. Based on these needs, we came up with a way to allow index AM authors to specify this information. Basically, IndexAm will expose a variable amparallelvacuumoptions which can have the below options VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor vacuumcleanup) can't be performed in parallel VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this flag) VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be done in parallel if bulkdelete is not performed (Indexes nbtree, brin, gin, gist, spgist, bloom will set this flag) VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in parallel even if bulkdelete is already performed (Indexes gin, brin, and bloom will set this flag) We have discussed exposing this information via two variables, but the above seems like a better idea to all the people involved. Any suggestions? Does anyone think this is not the right way to expose this information, or that there is no need to expose it, or does anyone have a better idea for this? Sawada-San, Dilip, feel free to correct me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
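To make the proposal easier to review, here is a minimal sketch of the flag values summarized above together with one way the leader could test them per phase. The helper function and the example assignments are illustrative assumptions, not part of the patch:

#define VACUUM_OPTION_NO_PARALLEL            (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL       (1 << 1)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  (1 << 2)
#define VACUUM_OPTION_PARALLEL_CLEANUP       (1 << 3)

/* e.g. an nbtree-like AM:
 *   amroutine->amparallelvacuumoptions =
 *       VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
 * a brin-like AM that does all the work in cleanup:
 *   amroutine->amparallelvacuumoptions = VACUUM_OPTION_PARALLEL_CLEANUP;
 */

/* would this index be handled by a parallel worker in the current phase? */
static inline bool
index_can_participate(uint8 options, bool bulkdel_performed, bool for_cleanup)
{
    if (!for_cleanup)
        return (options & VACUUM_OPTION_PARALLEL_BULKDEL) != 0;
    if (options & VACUUM_OPTION_PARALLEL_CLEANUP)
        return true;
    return (options & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0 &&
           !bulkdel_performed;
}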
On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > > updated version patch that also incorporated some comments I got so > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > > test the total delay time. > > > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > > dividing the maintainance_work_mem > > > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > > > > +{ > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > > + lvshared->maintenance_work_mem_worker = > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > > maintenance_work_mem; > > > > > +} > > > > > Is it fair to just consider the number of indexes which use > > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > > well. My point is suppose there are 10 indexes which will use the > > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > > > IMHO the calculation should be like this > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > > maintenance_work_mem; > > > > > > > > > > Am I missing something? > > > > > > > > No, I think you're right. On the other hand I think that dividing it > > > > by the number of indexes that will use the mantenance_work_mem makes > > > > sense when parallel degree > the number of such indexes. Suppose the > > > > table has 2 indexes and there are 10 workers then we should divide the > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > > most 2 indexes that uses the maintenance_work_mem are processed in > > > > parallel at a time. > > > > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). > > > > Thanks! I'll fix it in the next version patch. > > > One more comment. > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, > + int nindexes, IndexBulkDeleteResult **stats, > + LVParallelState *lps) > +{ > + .... > > + if (ParallelVacuumIsActive(lps)) > + { > > + > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > + stats, lps); > + > + } > + > + for (idx = 0; idx < nindexes; idx++) > + { > + /* > + * Skip indexes that we have already vacuumed during parallel index > + * vacuuming. > + */ > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) > + continue; > + > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > + vacrelstats->old_live_tuples); > + } > +} > > In this function, if ParallelVacuumIsActive, we perform the parallel > vacuum for all the index for which parallel vacuum is supported and > once that is over we finish vacuuming remaining indexes for which > parallel vacuum is not supported. 
But, my question is that inside > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > to finish their job then only we start with the sequential vacuuming > shouldn't we start that immediately as soon as the leader > participation is over in the parallel vacuum? If we do that, while the leader process is vacuuming indexes that don't support parallel vacuum sequentially, some workers might be vacuuming other indexes. Isn't that a problem? If it's not a problem, I think we can tie the indexes that don't support parallel vacuum to the leader and do a parallel index vacuum. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 12, 2019 at 3:14 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > On Mon, 11 Nov 2019 at 16:36, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > > > > > For small indexes also, we gained some performance by parallel vacuum. > > > > > > > Thanks for doing all these tests. It is clear with this and previous > > tests that this patch has benefit in wide variety of cases. However, > > we should try to see some worst cases as well. For example, if there > > are multiple indexes on a table and only one of them is large whereas > > all others are very small say having a few 100 or 1000 rows. > > > > Thanks Amit for your comments. > > I did some testing on the above suggested lines. Below is the summary: > Test case:(I created 16 indexes but only 1 index is large, other are very small) > create table test(a int, b int, c int, d int, e int, f int, g int, h int); > create index i3 on test (a) where a > 2000 and a < 3000; > create index i4 on test (a) where a > 3000 and a < 4000; > create index i5 on test (a) where a > 4000 and a < 5000; > create index i6 on test (a) where a > 5000 and a < 6000; > create index i7 on test (b) where a < 1000; > create index i8 on test (c) where a < 1000; > create index i9 on test (d) where a < 1000; > create index i10 on test (d) where a < 1000; > create index i11 on test (d) where a < 1000; > create index i12 on test (d) where a < 1000; > create index i13 on test (d) where a < 1000; > create index i14 on test (d) where a < 1000; > create index i15 on test (d) where a < 1000; > create index i16 on test (d) where a < 1000; > insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i; > delete from test where a %2=0; > > case 1: vacuum without using parallel workers. > vacuum test; > 228.259 ms > > case 2: vacuum with 1 parallel worker. > vacuum (parallel 1) test; > 251.725 ms > > case 3: vacuum with 3 parallel workers. > vacuum (parallel 3) test; > 259.986 > > From above results, it seems that if indexes are small, then parallel vacuum is not beneficial as compared to normal vacuum. > Right, and that is what is expected as well. However, I think it would be better if we could somehow disallow very small indexes from using a parallel worker. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
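A small sketch of how that cut-off might be applied when deciding which indexes go to workers; min_parallel_index_scan_size is the existing GUC (measured in blocks), while the surrounding loop and counter are assumptions for illustration:

int i;
int nindexes_parallel = 0;

for (i = 0; i < nindexes; i++)
{
    /* too small to be worth a worker: leave it to the leader */
    if (RelationGetNumberOfBlocks(Irel[i]) <
        (BlockNumber) min_parallel_index_scan_size)
        continue;
    nindexes_parallel++;
}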
On Wed, Nov 13, 2019 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Yeah, 0,2,3 and 4 sounds reasonable to me. Earlier, Dilip also got > > confused with option 1. > > > > Let me try to summarize the discussion on this point and see if others > have any opinion on this matter. > > We need a way to allow IndexAm to specify whether it can participate > in a parallel vacuum. As we know there are two phases of > index-vacuum, bulkdelete and vacuumcleanup and in many cases, the > bulkdelete performs the main deletion work and then vacuumcleanup just > returns index statistics. So, for such cases, we don't want the second > phase to be performed by a parallel vacuum worker. Now, if the > bulkdelete phase is not performed, then vacuumcleanup can process the > entire index in which case it is better to do that phase via parallel > worker. > > OTOH, in some cases vacuumcleanup takes another pass over-index to > reclaim empty pages and update record the same in FSM even if > bulkdelete is performed. This happens in gin and bloom indexes. > Then, we have an index where we do all the work in cleanup phase like > in the case of brin indexes. Now, for this category of indexes, we > want vacuumcleanup phase to be also performed by a parallel worker. > > In short different indexes have different requirements for which phase > of index vacuum can be performed in parallel. Just to be clear, we > can't perform both the phases (bulkdelete and cleanup) in one-go as > bulk-delete can happen multiple times on a large index whereas > vacuumcleanup is done once at the end. > > Based on these needs, we came up with a way to allow users to specify > this information for IndexAm's. Basically, Indexam will expose a > variable amparallelvacuumoptions which can have below options > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > vacuumcleanup) can't be performed in parallel > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > flag) > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > gin, gist, > spgist, bloom will set this flag) > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > parallel even if bulkdelete is already performed (Indexes gin, brin, > and bloom will set this flag) > > We have discussed to expose this information via two variables but the > above seems like a better idea to all the people involved. > > Any suggestions? Anyone thinks this is not the right way to expose > this information or there is no need to expose this information or > they have a better idea for this? > > Sawada-San, Dilip, feel free to correct me. Looks fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > In this function, if ParallelVacuumIsActive, we perform the parallel > > vacuum for all the index for which parallel vacuum is supported and > > once that is over we finish vacuuming remaining indexes for which > > parallel vacuum is not supported. But, my question is that inside > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > > to finish their job then only we start with the sequential vacuuming > > shouldn't we start that immediately as soon as the leader > > participation is over in the parallel vacuum? > > If we do that, while the leader process is vacuuming indexes that > don't not support parallel vacuum sequentially some workers might be > vacuuming for other indexes. Isn't it a problem? > Can you please explain what problem you see with that? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > > > updated version patch that also incorporated some comments I got so > > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > > > test the total delay time. > > > > > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > > > dividing the maintainance_work_mem > > > > > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > > > > > +{ > > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > > > + lvshared->maintenance_work_mem_worker = > > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > > > maintenance_work_mem; > > > > > > +} > > > > > > Is it fair to just consider the number of indexes which use > > > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > > > well. My point is suppose there are 10 indexes which will use the > > > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > > > > > IMHO the calculation should be like this > > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > > > maintenance_work_mem; > > > > > > > > > > > > Am I missing something? > > > > > > > > > > No, I think you're right. On the other hand I think that dividing it > > > > > by the number of indexes that will use the mantenance_work_mem makes > > > > > sense when parallel degree > the number of such indexes. Suppose the > > > > > table has 2 indexes and there are 10 workers then we should divide the > > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > > > most 2 indexes that uses the maintenance_work_mem are processed in > > > > > parallel at a time. > > > > > > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). > > > > > > Thanks! I'll fix it in the next version patch. > > > > > One more comment. > > > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, > > + int nindexes, IndexBulkDeleteResult **stats, > > + LVParallelState *lps) > > +{ > > + .... > > > > + if (ParallelVacuumIsActive(lps)) > > + { > > > > + > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > > + stats, lps); > > + > > + } > > + > > + for (idx = 0; idx < nindexes; idx++) > > + { > > + /* > > + * Skip indexes that we have already vacuumed during parallel index > > + * vacuuming. 
> > + */ > > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) > > + continue; > > + > > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > > + vacrelstats->old_live_tuples); > > + } > > +} > > > > In this function, if ParallelVacuumIsActive, we perform the parallel > > vacuum for all the index for which parallel vacuum is supported and > > once that is over we finish vacuuming remaining indexes for which > > parallel vacuum is not supported. But, my question is that inside > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > > to finish their job then only we start with the sequential vacuuming > > shouldn't we start that immediately as soon as the leader > > participation is over in the parallel vacuum? > > If we do that, while the leader process is vacuuming indexes that > don't not support parallel vacuum sequentially some workers might be > vacuuming for other indexes. Isn't it a problem? I am not sure what could be the problem. If it's not problem, > I think we can tie up indexes that don't support parallel vacuum to > the leader and do parallel index vacuum. I am not sure whether we can do that or not, because if we do a parallel vacuum from the leader for the indexes which don't support the parallel option then we will unnecessarily allocate shared memory for those indexes (index stats). Moreover, I think it could also cause a problem in a multi-pass vacuum if we try to copy their stats into the shared memory. I think a simple option would be that as soon as the leader's participation is over, we can have a loop over all the indexes that don't support parallelism in that phase, and after completing that we wait for the parallel workers to finish. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
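A rough sketch of the ordering being proposed here: the leader first does its share of the parallel-safe indexes, then serially vacuums the indexes that cannot participate in this phase, and only then waits for the workers. WaitForParallelWorkersToFinish() is the existing parallel-infrastructure call; the other two helper names and the lps->pcxt field are placeholders for whatever the patch actually uses:

/* leader takes part in the parallel index vacuum */
parallel_vacuum_indexes_as_worker(lps, Irel, nindexes, stats, vacrelstats);

/* then handle the indexes that must be processed serially in this phase */
for (idx = 0; idx < nindexes; idx++)
{
    if (!index_supports_parallel_phase(Irel[idx], lps))
        lazy_vacuum_index(Irel[idx], &stats[idx],
                          vacrelstats->dead_tuples,
                          vacrelstats->old_live_tuples);
}

/* finally wait for the workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);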
On Wed, 13 Nov 2019 at 17:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > In this function, if ParallelVacuumIsActive, we perform the parallel > > > vacuum for all the index for which parallel vacuum is supported and > > > once that is over we finish vacuuming remaining indexes for which > > > parallel vacuum is not supported. But, my question is that inside > > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > > > to finish their job then only we start with the sequential vacuuming > > > shouldn't we start that immediately as soon as the leader > > > participation is over in the parallel vacuum? > > > > If we do that, while the leader process is vacuuming indexes that > > don't not support parallel vacuum sequentially some workers might be > > vacuuming for other indexes. Isn't it a problem? > > > > Can you please explain what problem do you see with that? I think it depends on the index AM user's expectation. If disabling parallel vacuum for an index just means that the index AM user doesn't want to vacuum the index by a parallel worker, it's not a problem. But if it means that the user doesn't want to vacuum the index while other indexes are being processed in parallel, it's unexpected behaviour for the user. I'm probably worrying too much. -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 13, 2019 at 3:55 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 13 Nov 2019 at 17:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > In this function, if ParallelVacuumIsActive, we perform the parallel > > > > vacuum for all the index for which parallel vacuum is supported and > > > > once that is over we finish vacuuming remaining indexes for which > > > > parallel vacuum is not supported. But, my question is that inside > > > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > > > > to finish their job then only we start with the sequential vacuuming > > > > shouldn't we start that immediately as soon as the leader > > > > participation is over in the parallel vacuum? > > > > > > If we do that, while the leader process is vacuuming indexes that > > > don't not support parallel vacuum sequentially some workers might be > > > vacuuming for other indexes. Isn't it a problem? > > > > > > > Can you please explain what problem do you see with that? > > I think it depends on index AM user expectation. If disabling parallel > vacuum for an index means that index AM user doesn't just want to > vacuum the index by parallel worker, it's not problem. But if it means > that the user doesn't want to vacuum the index during other indexes is > being processed in parallel it's unexpected behaviour for the user. > I would expect the earlier. > I'm probably worrying too much. > Yeah, we can keep the behavior with respect to your first expectation (If disabling parallel vacuum for an index means that index AM user doesn't just want to vacuum the index by parallel worker, it's not problem). It might not be difficult to change later if there is an example of such a case. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 13 Nov 2019 at 18:49, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the > > > > > > > > updated version patch that also incorporated some comments I got so > > > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also > > > > > > > > test the total delay time. > > > > > > > > > > > > > > > While reviewing the 0002, I got one doubt related to how we are > > > > > > > dividing the maintainance_work_mem > > > > > > > > > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes) > > > > > > > +{ > > > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */ > > > > > > > + lvshared->maintenance_work_mem_worker = > > > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : > > > > > > > maintenance_work_mem; > > > > > > > +} > > > > > > > Is it fair to just consider the number of indexes which use > > > > > > > maintenance_work_mem? Or we need to consider the number of worker as > > > > > > > well. My point is suppose there are 10 indexes which will use the > > > > > > > maintenance_work_mem but we are launching just 2 workers then what is > > > > > > > the point in dividing the maintenance_work_mem by 10. > > > > > > > > > > > > > > IMHO the calculation should be like this > > > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ? > > > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers) : > > > > > > > maintenance_work_mem; > > > > > > > > > > > > > > Am I missing something? > > > > > > > > > > > > No, I think you're right. On the other hand I think that dividing it > > > > > > by the number of indexes that will use the mantenance_work_mem makes > > > > > > sense when parallel degree > the number of such indexes. Suppose the > > > > > > table has 2 indexes and there are 10 workers then we should divide the > > > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at > > > > > > most 2 indexes that uses the maintenance_work_mem are processed in > > > > > > parallel at a time. > > > > > > > > > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers). > > > > > > > > Thanks! I'll fix it in the next version patch. > > > > > > > One more comment. > > > > > > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel, > > > + int nindexes, IndexBulkDeleteResult **stats, > > > + LVParallelState *lps) > > > +{ > > > + .... > > > > > > + if (ParallelVacuumIsActive(lps)) > > > + { > > > > > > + > > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes, > > > + stats, lps); > > > + > > > + } > > > + > > > + for (idx = 0; idx < nindexes; idx++) > > > + { > > > + /* > > > + * Skip indexes that we have already vacuumed during parallel index > > > + * vacuuming. 
> > > + */ > > > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx)) > > > + continue; > > > + > > > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples, > > > + vacrelstats->old_live_tuples); > > > + } > > > +} > > > > > > In this function, if ParallelVacuumIsActive, we perform the parallel > > > vacuum for all the index for which parallel vacuum is supported and > > > once that is over we finish vacuuming remaining indexes for which > > > parallel vacuum is not supported. But, my question is that inside > > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers > > > to finish their job then only we start with the sequential vacuuming > > > shouldn't we start that immediately as soon as the leader > > > participation is over in the parallel vacuum? > > > > If we do that, while the leader process is vacuuming indexes that > > don't not support parallel vacuum sequentially some workers might be > > vacuuming for other indexes. Isn't it a problem? > > I am not sure what could be the problem. > > If it's not problem, > > I think we can tie up indexes that don't support parallel vacuum to > > the leader and do parallel index vacuum. > > I am not sure whether we can do that or not. Because if we do a > parallel vacuum from the leader for the indexes which don't support a > parallel option then we will unnecessarily allocate the shared memory > for those indexes (index stats). Moreover, I think it could also > cause a problem in a multi-pass vacuum if we try to copy its stats > into the shared memory. > > I think simple option would be that as soon as leader participation is > over we can have a loop for all the indexes who don't support > parallelism in that phase and after completing that we wait for the > parallel workers to finish. Hmm I thought we don't allocate DSM for indexes which don't support both parallel bulk deletion and parallel cleanup and we can always assign indexes to the leader process if they don't support particular phase during parallel index vacuuming. But your suggestion sounds more simple. I'll incorporate your suggestion in the next version patch. Thanks! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
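For reference, the ordering agreed on above (the leader vacuums the indexes that do not support the current parallel phase itself as soon as its own parallel participation is over, and only then waits for the workers) could look roughly like the sketch below. It is only a sketch against the structures quoted above; the field name lps->pcxt and the helpers parallel_vacuum_index(), index_supports_parallel_phase() and vacuum_one_index() are assumed names for illustration, not taken from the posted patch.

/*
 * Sketch: leader launches workers, does its share of the parallel phase,
 * then handles the non-parallel-capable indexes while workers are still
 * busy, and waits for the workers last.
 */
static void
lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
                                        int nindexes, IndexBulkDeleteResult **stats,
                                        LVParallelState *lps)
{
    int         idx;

    /* Launch workers and do the leader's share of the parallel phase. */
    LaunchParallelWorkers(lps->pcxt);
    parallel_vacuum_index(vacrelstats, Irel, nindexes, stats, lps);

    /*
     * Leader participation is over; process the indexes that cannot be
     * handled by workers in this phase, keeping their stats local.
     */
    for (idx = 0; idx < nindexes; idx++)
    {
        if (!index_supports_parallel_phase(Irel[idx], lps))
            vacuum_one_index(Irel[idx], &stats[idx], vacrelstats);
    }

    /* Finally wait for the parallel workers to finish. */
    WaitForParallelWorkersToFinish(lps->pcxt);
}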
On Wed, Nov 13, 2019 at 9:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > + /* > + * Since parallel workers cannot access data in temporary tables, parallel > + * vacuum is not allowed for temporary relation. > + */ > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > + { > + ereport(WARNING, > + (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary > tables in parallel", > + RelationGetRelationName(onerel)))); > + relation_close(onerel, lmode); > + PopActiveSnapshot(); > + CommitTransactionCommand(); > + /* It's OK to proceed with ANALYZE on this table */ > + return true; > + } > + > > If we can not support the parallel vacuum for the temporary table then > shouldn't we fall back to the normal vacuum instead of skipping the > table. I think it's not fair that if the user has given system-wide > parallel vacuum then all the temp table will be skipped and not at all > vacuumed then user need to again perform normal vacuum on those > tables. > Good point. However, I think the current coding also makes sense for cases like "Vacuum (analyze, parallel 2) tmp_tab;". In such a case, it will skip the vacuum part of it but will perform analyze. Having said that, I can see the merit of your point and I also vote to follow your suggestion and add a note to the document unless it makes code look ugly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
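If we go with the fallback behaviour suggested here, the check could simply downgrade the request instead of skipping the relation. A minimal sketch, reusing the quoted condition and assuming (as in the posted patches) that params->nworkers < 0 means the parallel option is disabled:

/*
 * Sketch of the suggested fallback: warn and continue with a serial
 * vacuum rather than skipping the temporary relation entirely.
 */
if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
{
    ereport(WARNING,
            (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                    RelationGetRelationName(onerel))));
    params->nworkers = -1;      /* proceed with a non-parallel vacuum */
}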
On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Yeah, 0,2,3 and 4 sounds reasonable to me. Earlier, Dilip also got > > confused with option 1. > > > > Let me try to summarize the discussion on this point and see if others > have any opinion on this matter. Thank you for summarizing. > > We need a way to allow IndexAm to specify whether it can participate > in a parallel vacuum. As we know there are two phases of > index-vacuum, bulkdelete and vacuumcleanup and in many cases, the > bulkdelete performs the main deletion work and then vacuumcleanup just > returns index statistics. So, for such cases, we don't want the second > phase to be performed by a parallel vacuum worker. Now, if the > bulkdelete phase is not performed, then vacuumcleanup can process the > entire index in which case it is better to do that phase via parallel > worker. > > OTOH, in some cases vacuumcleanup takes another pass over-index to > reclaim empty pages and update record the same in FSM even if > bulkdelete is performed. This happens in gin and bloom indexes. > Then, we have an index where we do all the work in cleanup phase like > in the case of brin indexes. Now, for this category of indexes, we > want vacuumcleanup phase to be also performed by a parallel worker. > > In short different indexes have different requirements for which phase > of index vacuum can be performed in parallel. Just to be clear, we > can't perform both the phases (bulkdelete and cleanup) in one-go as > bulk-delete can happen multiple times on a large index whereas > vacuumcleanup is done once at the end. > > Based on these needs, we came up with a way to allow users to specify > this information for IndexAm's. Basically, Indexam will expose a > variable amparallelvacuumoptions which can have below options > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > vacuumcleanup) can't be performed in parallel I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't want to support parallel vacuum don't have to set anything. > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > flag) > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > gin, gist, > spgist, bloom will set this flag) > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > parallel even if bulkdelete is already performed (Indexes gin, brin, > and bloom will set this flag) I think gin and bloom don't need to set both but should set only VACUUM_OPTION_PARALLEL_CLEANUP. And I'm going to disallow index AMs to set both VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP by assertions, is that okay? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
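To make the above concrete, the option bits could be declared along the following lines. This is only a sketch: the exact values and the header they live in are still open, VACUUM_OPTION_NO_PARALLEL is written as 0 per the suggestion above, and the checking helper is an illustrative name.

/* Sketch of the proposed amparallelvacuumoptions bits. */
#define VACUUM_OPTION_NO_PARALLEL           0           /* AM does not support parallel vacuum */
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)    /* bulkdelete can run in a worker */
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2)    /* cleanup can run in a worker only if
                                                         * no bulkdelete was performed */
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3)    /* cleanup can always run in a worker */

/*
 * e.g. nbtree would set PARALLEL_BULKDEL | PARALLEL_COND_CLEANUP, while
 * gin/bloom would set PARALLEL_BULKDEL | PARALLEL_CLEANUP, per the
 * discussion above.  COND_CLEANUP and CLEANUP are mutually exclusive.
 */
static inline void
check_parallel_vacuum_options(uint8 options)
{
    Assert(((options & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0) ||
           ((options & VACUUM_OPTION_PARALLEL_CLEANUP) == 0));
}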
On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Based on these needs, we came up with a way to allow users to specify > > this information for IndexAm's. Basically, Indexam will expose a > > variable amparallelvacuumoptions which can have below options > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > vacuumcleanup) can't be performed in parallel > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > want to support parallel vacuum don't have to set anything. > make sense. > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > flag) > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > gin, gist, > > spgist, bloom will set this flag) > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > and bloom will set this flag) > > I think gin and bloom don't need to set both but should set only > VACUUM_OPTION_PARALLEL_CLEANUP. > > And I'm going to disallow index AMs to set both > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > by assertions, is that okay? > Sounds reasonable to me. Are you planning to include the changes related to I/O throttling based on the discussion in the nearby thread [1]? I think you can do that if you agree with the conclusion in the last email[1], otherwise, we can explore it separately. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BuDgLwfnAhQWGpAe66D85PdkeBygZGVyX96%2BovN1PbOg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Based on these needs, we came up with a way to allow users to specify > > > this information for IndexAm's. Basically, Indexam will expose a > > > variable amparallelvacuumoptions which can have below options > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > > vacuumcleanup) can't be performed in parallel > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > > want to support parallel vacuum don't have to set anything. > > > > make sense. > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > flag) > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > > gin, gist, > > > spgist, bloom will set this flag) > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > and bloom will set this flag) > > > > I think gin and bloom don't need to set both but should set only > > VACUUM_OPTION_PARALLEL_CLEANUP. > > > > And I'm going to disallow index AMs to set both > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > > by assertions, is that okay? > > > > Sounds reasonable to me. > > Are you planning to include the changes related to I/O throttling > based on the discussion in the nearby thread [1]? I think you can do > that if you agree with the conclusion in the last email[1], otherwise, > we can explore it separately. Yes I agreed. I'm going to include that changes in the next version patches. And I think we will be able to do more discussion based on the patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Based on these needs, we came up with a way to allow users to specify > > > > this information for IndexAm's. Basically, Indexam will expose a > > > > variable amparallelvacuumoptions which can have below options > > > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > > > vacuumcleanup) can't be performed in parallel > > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > > > want to support parallel vacuum don't have to set anything. > > > > > > > make sense. > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > flag) > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > > > gin, gist, > > > > spgist, bloom will set this flag) > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > and bloom will set this flag) > > > > > > I think gin and bloom don't need to set both but should set only > > > VACUUM_OPTION_PARALLEL_CLEANUP. > > > > > > And I'm going to disallow index AMs to set both > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > > > by assertions, is that okay? > > > > > > > Sounds reasonable to me. > > > > Are you planning to include the changes related to I/O throttling > > based on the discussion in the nearby thread [1]? I think you can do > > that if you agree with the conclusion in the last email[1], otherwise, > > we can explore it separately. > > Yes I agreed. I'm going to include that changes in the next version > patches. And I think we will be able to do more discussion based on > the patch. > I've attached the latest version patch set. The patch set includes all discussed points regarding index AM options as well as shared cost balance. Also I added some test cases used all types of index AM. During developments I had one concern about the number of parallel workers to launch. In current design each index AMs can choose the participation of parallel bulk-deletion and parallel cleanup. That also means the number of parallel worker to launch might be different for each time of parallel bulk-deletion and parallel cleanup. In current patch the leader will always launch the number of indexes that support either one but it would not be efficient in some cases. For example, if we have 3 indexes supporting only parallel bulk-deletion and 2 indexes supporting only parallel index cleanup, we would launch 5 workers for each execution but some workers will do nothing at all. To deal with this problem, I wonder if we can improve the parallel query so that the leader process creates a parallel context with the maximum number of indexes and can launch a part of workers instead of all of them. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment

On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > I've attached the latest version patch set. The patch set includes all > discussed points regarding index AM options as well as shared cost > balance. Also I added some test cases used all types of index AM. > > During developments I had one concern about the number of parallel > workers to launch. In current design each index AMs can choose the > participation of parallel bulk-deletion and parallel cleanup. That > also means the number of parallel worker to launch might be different > for each time of parallel bulk-deletion and parallel cleanup. In > current patch the leader will always launch the number of indexes that > support either one but it would not be efficient in some cases. For > example, if we have 3 indexes supporting only parallel bulk-deletion > and 2 indexes supporting only parallel index cleanup, we would launch > 5 workers for each execution but some workers will do nothing at all. > To deal with this problem, I wonder if we can improve the parallel > query so that the leader process creates a parallel context with the > maximum number of indexes and can launch a part of workers instead of > all of them. > Can't we choose the number of workers as a maximum of "num_of_indexes_that_support_bulk_del" and "num_of_indexes_that_support_cleanup"? If we can do that, then we can always launch the required number of workers for each phase (bulk_del, cleanup). In your above example, it should choose 3 workers while creating a parallel context. Do you see any problem with that? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
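Under this suggestion the worker count at parallel-context creation time would be computed along these lines (a sketch only; nindexes_parallel_bulkdel, nindexes_parallel_cleanup and nrequested are illustrative names for the per-phase index counts and an explicit PARALLEL N request):

int     parallel_workers;

/* size the context for the larger of the two phases */
parallel_workers = Max(nindexes_parallel_bulkdel, nindexes_parallel_cleanup);

/* honour an explicit PARALLEL N request, if one was given */
if (nrequested > 0)
    parallel_workers = Min(parallel_workers, nrequested);

/* and never exceed the GUC limit */
parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);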
On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > During developments I had one concern about the number of parallel > > workers to launch. In current design each index AMs can choose the > > participation of parallel bulk-deletion and parallel cleanup. That > > also means the number of parallel worker to launch might be different > > for each time of parallel bulk-deletion and parallel cleanup. In > > current patch the leader will always launch the number of indexes that > > support either one but it would not be efficient in some cases. For > > example, if we have 3 indexes supporting only parallel bulk-deletion > > and 2 indexes supporting only parallel index cleanup, we would launch > > 5 workers for each execution but some workers will do nothing at all. > > To deal with this problem, I wonder if we can improve the parallel > > query so that the leader process creates a parallel context with the > > maximum number of indexes and can launch a part of workers instead of > > all of them. > > > > Can't we choose the number of workers as a maximum of > "num_of_indexes_that_support_bulk_del" and > "num_of_indexes_that_support_cleanup"? If we can do that, then we can > always launch the required number of workers for each phase (bulk_del, > cleanup). In your above example, it should choose 3 workers while > creating a parallel context. Do you see any problem with that? I might be missing something but if we create the parallel context with 3 workers the leader process always launches 3 workers. Therefore in the above case it launches 3 workers even in cleanup although 2 workers is enough. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I've attached the latest version patch set. The patch set includes all > > > discussed points regarding index AM options as well as shared cost > > > balance. Also I added some test cases used all types of index AM. > > > > > > During developments I had one concern about the number of parallel > > > workers to launch. In current design each index AMs can choose the > > > participation of parallel bulk-deletion and parallel cleanup. That > > > also means the number of parallel worker to launch might be different > > > for each time of parallel bulk-deletion and parallel cleanup. In > > > current patch the leader will always launch the number of indexes that > > > support either one but it would not be efficient in some cases. For > > > example, if we have 3 indexes supporting only parallel bulk-deletion > > > and 2 indexes supporting only parallel index cleanup, we would launch > > > 5 workers for each execution but some workers will do nothing at all. > > > To deal with this problem, I wonder if we can improve the parallel > > > query so that the leader process creates a parallel context with the > > > maximum number of indexes and can launch a part of workers instead of > > > all of them. > > > > > > > Can't we choose the number of workers as a maximum of > > "num_of_indexes_that_support_bulk_del" and > > "num_of_indexes_that_support_cleanup"? If we can do that, then we can > > always launch the required number of workers for each phase (bulk_del, > > cleanup). In your above example, it should choose 3 workers while > > creating a parallel context. Do you see any problem with that? > > I might be missing something but if we create the parallel context > with 3 workers the leader process always launches 3 workers. Therefore > in the above case it launches 3 workers even in cleanup although 2 > workers is enough. > Right, so we can either extend parallel API to launch fewer workers than it has in parallel context as suggested by you or we can use separate parallel context for each phase. Going with the earlier has the benefit that we don't need to recreate the parallel context and the latter has the advantage that we won't keep additional shared memory allocated. BTW, what kind of API change you have in mind for the approach you are suggesting? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > I've attached the latest version patch set. The patch set includes all > > > > discussed points regarding index AM options as well as shared cost > > > > balance. Also I added some test cases used all types of index AM. > > > > > > > > During developments I had one concern about the number of parallel > > > > workers to launch. In current design each index AMs can choose the > > > > participation of parallel bulk-deletion and parallel cleanup. That > > > > also means the number of parallel worker to launch might be different > > > > for each time of parallel bulk-deletion and parallel cleanup. In > > > > current patch the leader will always launch the number of indexes that > > > > support either one but it would not be efficient in some cases. For > > > > example, if we have 3 indexes supporting only parallel bulk-deletion > > > > and 2 indexes supporting only parallel index cleanup, we would launch > > > > 5 workers for each execution but some workers will do nothing at all. > > > > To deal with this problem, I wonder if we can improve the parallel > > > > query so that the leader process creates a parallel context with the > > > > maximum number of indexes and can launch a part of workers instead of > > > > all of them. > > > > > > > > > > Can't we choose the number of workers as a maximum of > > > "num_of_indexes_that_support_bulk_del" and > > > "num_of_indexes_that_support_cleanup"? If we can do that, then we can > > > always launch the required number of workers for each phase (bulk_del, > > > cleanup). In your above example, it should choose 3 workers while > > > creating a parallel context. Do you see any problem with that? > > > > I might be missing something but if we create the parallel context > > with 3 workers the leader process always launches 3 workers. Therefore > > in the above case it launches 3 workers even in cleanup although 2 > > workers is enough. > > > > Right, so we can either extend parallel API to launch fewer workers > than it has in parallel context as suggested by you or we can use > separate parallel context for each phase. Going with the earlier has > the benefit that we don't need to recreate the parallel context and > the latter has the advantage that we won't keep additional shared > memory allocated. I also thought to use separate parallel contexts for each phase but can the same DSM be used by parallel workers who initiated from different parallel contexts? If not I think that doesn't work because the parallel vacuum needs to set data to DSM of ambulkdelete and then parallel workers for amvacuumcleanup needs to access it. > BTW, what kind of API change you have in mind for > the approach you are suggesting? I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n), where n is the number of workers the caller wants to launch and should be lower than the value in the parallel context. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > I've attached the latest version patch set. The patch set includes all > > > > > discussed points regarding index AM options as well as shared cost > > > > > balance. Also I added some test cases used all types of index AM. > > > > > > > > > > During developments I had one concern about the number of parallel > > > > > workers to launch. In current design each index AMs can choose the > > > > > participation of parallel bulk-deletion and parallel cleanup. That > > > > > also means the number of parallel worker to launch might be different > > > > > for each time of parallel bulk-deletion and parallel cleanup. In > > > > > current patch the leader will always launch the number of indexes that > > > > > support either one but it would not be efficient in some cases. For > > > > > example, if we have 3 indexes supporting only parallel bulk-deletion > > > > > and 2 indexes supporting only parallel index cleanup, we would launch > > > > > 5 workers for each execution but some workers will do nothing at all. > > > > > To deal with this problem, I wonder if we can improve the parallel > > > > > query so that the leader process creates a parallel context with the > > > > > maximum number of indexes and can launch a part of workers instead of > > > > > all of them. > > > > > > > > > > > > > Can't we choose the number of workers as a maximum of > > > > "num_of_indexes_that_support_bulk_del" and > > > > "num_of_indexes_that_support_cleanup"? If we can do that, then we can > > > > always launch the required number of workers for each phase (bulk_del, > > > > cleanup). In your above example, it should choose 3 workers while > > > > creating a parallel context. Do you see any problem with that? > > > > > > I might be missing something but if we create the parallel context > > > with 3 workers the leader process always launches 3 workers. Therefore > > > in the above case it launches 3 workers even in cleanup although 2 > > > workers is enough. > > > > > > > Right, so we can either extend parallel API to launch fewer workers > > than it has in parallel context as suggested by you or we can use > > separate parallel context for each phase. Going with the earlier has > > the benefit that we don't need to recreate the parallel context and > > the latter has the advantage that we won't keep additional shared > > memory allocated. > > I also thought to use separate parallel contexts for each phase but > can the same DSM be used by parallel workers who initiated from > different parallel contexts? If not I think that doesn't work because > the parallel vacuum needs to set data to DSM of ambulkdelete and then > parallel workers for amvacuumcleanup needs to access it. > We can probably copy the stats in local memory instead of pointing it to dsm after bulk-deletion, but I think that would unnecessary overhead and doesn't sound like a good idea. > > BTW, what kind of API change you have in mind for > > the approach you are suggesting? 
> > I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n), > where n is the number of workers the caller wants to launch and should > be lower than the value in the parallel context. > For that won't you need to duplicate most of the code of LaunchParallelWorkers or maybe move the entire code in LaunchParallelNWorkers and then LaunchParallelWorkers can also call it. Another idea could be to just extend the existing API LaunchParallelWorkers to take input parameter as the number of workers, do you see any problem with that or is there a reason you prefer to write a new API for this? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Nov 21, 2019 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > I've attached the latest version patch set. The patch set includes all > > > > > > discussed points regarding index AM options as well as shared cost > > > > > > balance. Also I added some test cases used all types of index AM. > > > > > > > > > > > > During developments I had one concern about the number of parallel > > > > > > workers to launch. In current design each index AMs can choose the > > > > > > participation of parallel bulk-deletion and parallel cleanup. That > > > > > > also means the number of parallel worker to launch might be different > > > > > > for each time of parallel bulk-deletion and parallel cleanup. In > > > > > > current patch the leader will always launch the number of indexes that > > > > > > support either one but it would not be efficient in some cases. For > > > > > > example, if we have 3 indexes supporting only parallel bulk-deletion > > > > > > and 2 indexes supporting only parallel index cleanup, we would launch > > > > > > 5 workers for each execution but some workers will do nothing at all. > > > > > > To deal with this problem, I wonder if we can improve the parallel > > > > > > query so that the leader process creates a parallel context with the > > > > > > maximum number of indexes and can launch a part of workers instead of > > > > > > all of them. > > > > > > > > > > > > > > > > Can't we choose the number of workers as a maximum of > > > > > "num_of_indexes_that_support_bulk_del" and > > > > > "num_of_indexes_that_support_cleanup"? If we can do that, then we can > > > > > always launch the required number of workers for each phase (bulk_del, > > > > > cleanup). In your above example, it should choose 3 workers while > > > > > creating a parallel context. Do you see any problem with that? > > > > > > > > I might be missing something but if we create the parallel context > > > > with 3 workers the leader process always launches 3 workers. Therefore > > > > in the above case it launches 3 workers even in cleanup although 2 > > > > workers is enough. > > > > > > > > > > Right, so we can either extend parallel API to launch fewer workers > > > than it has in parallel context as suggested by you or we can use > > > separate parallel context for each phase. Going with the earlier has > > > the benefit that we don't need to recreate the parallel context and > > > the latter has the advantage that we won't keep additional shared > > > memory allocated. > > > > I also thought to use separate parallel contexts for each phase but > > can the same DSM be used by parallel workers who initiated from > > different parallel contexts? If not I think that doesn't work because > > the parallel vacuum needs to set data to DSM of ambulkdelete and then > > parallel workers for amvacuumcleanup needs to access it. 
> > > > We can probably copy the stats in local memory instead of pointing it > to dsm after bulk-deletion, but I think that would unnecessary > overhead and doesn't sound like a good idea. I agree that it will be unnecessary overhead. > > > > BTW, what kind of API change you have in mind for > > > the approach you are suggesting? > > > > I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n), > > where n is the number of workers the caller wants to launch and should > > be lower than the value in the parallel context. > > > > For that won't you need to duplicate most of the code of > LaunchParallelWorkers or maybe move the entire code in > LaunchParallelNWorkers and then LaunchParallelWorkers can also call > it. Another idea could be to just extend the existing API > LaunchParallelWorkers to take input parameter as the number of > workers, do you see any problem with that or is there a reason you > prefer to write a new API for this? I think we can pass an extra parameter to LaunchParallelWorkers therein we can try to launch min(pcxt->nworkers, n). Or we can put an assert (n <= pcxt->nworkers). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > Based on these needs, we came up with a way to allow users to specify > > > > > this information for IndexAm's. Basically, Indexam will expose a > > > > > variable amparallelvacuumoptions which can have below options > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > > > > vacuumcleanup) can't be performed in parallel > > > > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > > > > want to support parallel vacuum don't have to set anything. > > > > > > > > > > make sense. > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > flag) > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > > > > gin, gist, > > > > > spgist, bloom will set this flag) > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > and bloom will set this flag) > > > > > > > > I think gin and bloom don't need to set both but should set only > > > > VACUUM_OPTION_PARALLEL_CLEANUP. > > > > > > > > And I'm going to disallow index AMs to set both > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > > > > by assertions, is that okay? > > > > > > > > > > Sounds reasonable to me. > > > > > > Are you planning to include the changes related to I/O throttling > > > based on the discussion in the nearby thread [1]? I think you can do > > > that if you agree with the conclusion in the last email[1], otherwise, > > > we can explore it separately. > > > > Yes I agreed. I'm going to include that changes in the next version > > patches. And I think we will be able to do more discussion based on > > the patch. > > > > I've attached the latest version patch set. The patch set includes all > discussed points regarding index AM options as well as shared cost > balance. Also I added some test cases used all types of index AM. > > During developments I had one concern about the number of parallel > workers to launch. In current design each index AMs can choose the > participation of parallel bulk-deletion and parallel cleanup. That > also means the number of parallel worker to launch might be different > for each time of parallel bulk-deletion and parallel cleanup. In > current patch the leader will always launch the number of indexes that > support either one but it would not be efficient in some cases. For > example, if we have 3 indexes supporting only parallel bulk-deletion > and 2 indexes supporting only parallel index cleanup, we would launch > 5 workers for each execution but some workers will do nothing at all. 
> To deal with this problem, I wonder if we can improve the parallel > query so that the leader process creates a parallel context with the > maximum number of indexes and can launch a part of workers instead of > all of them. > + + /* compute new balance by adding the local value */ + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance); + new_balance = shared_balance + VacuumCostBalance; + /* also compute the total local balance */ + local_balance = VacuumCostBalanceLocal + VacuumCostBalance; + + if ((new_balance >= VacuumCostLimit) && + (local_balance > 0.5 * (VacuumCostLimit / nworkers))) + { + /* compute sleep time based on the local cost balance */ + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit; + new_balance = shared_balance - VacuumCostBalanceLocal; + VacuumCostBalanceLocal = 0; + } + + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance, + &shared_balance, + new_balance)) + { + /* Updated successfully, break */ + break; + } While looking at the shared costing delay part, I have noticed that while checking the delay condition, we are considering local_balance which is VacuumCostBalanceLocal + VacuumCostBalance, but while computing the new balance we only reduce shared balance by VacuumCostBalanceLocal, I think it should be reduced with local_balance? I see that later we are adding VacuumCostBalance to the VacuumCostBalanceLocal so we are not loosing accounting for this balance. But, I feel it is not right that we compare based on one value and operate based on other. I think we can immediately set VacuumCostBalanceLocal += VacuumCostBalance before checking the condition. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Based on these needs, we came up with a way to allow users to specify > > > > > > this information for IndexAm's. Basically, Indexam will expose a > > > > > > variable amparallelvacuumoptions which can have below options > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > > > > > want to support parallel vacuum don't have to set anything. > > > > > > > > > > > > > make sense. > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > > flag) > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > > > > > gin, gist, > > > > > > spgist, bloom will set this flag) > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > and bloom will set this flag) > > > > > > > > > > I think gin and bloom don't need to set both but should set only > > > > > VACUUM_OPTION_PARALLEL_CLEANUP. > > > > > > > > > > And I'm going to disallow index AMs to set both > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > > > > > by assertions, is that okay? > > > > > > > > > > > > > Sounds reasonable to me. > > > > > > > > Are you planning to include the changes related to I/O throttling > > > > based on the discussion in the nearby thread [1]? I think you can do > > > > that if you agree with the conclusion in the last email[1], otherwise, > > > > we can explore it separately. > > > > > > Yes I agreed. I'm going to include that changes in the next version > > > patches. And I think we will be able to do more discussion based on > > > the patch. > > > > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > During developments I had one concern about the number of parallel > > workers to launch. In current design each index AMs can choose the > > participation of parallel bulk-deletion and parallel cleanup. That > > also means the number of parallel worker to launch might be different > > for each time of parallel bulk-deletion and parallel cleanup. In > > current patch the leader will always launch the number of indexes that > > support either one but it would not be efficient in some cases. 
For > > example, if we have 3 indexes supporting only parallel bulk-deletion > > and 2 indexes supporting only parallel index cleanup, we would launch > > 5 workers for each execution but some workers will do nothing at all. > > To deal with this problem, I wonder if we can improve the parallel > > query so that the leader process creates a parallel context with the > > maximum number of indexes and can launch a part of workers instead of > > all of them. > > > + > + /* compute new balance by adding the local value */ > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance); > + new_balance = shared_balance + VacuumCostBalance; > > + /* also compute the total local balance */ > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance; > + > + if ((new_balance >= VacuumCostLimit) && > + (local_balance > 0.5 * (VacuumCostLimit / nworkers))) > + { > + /* compute sleep time based on the local cost balance */ > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit; > + new_balance = shared_balance - VacuumCostBalanceLocal; > + VacuumCostBalanceLocal = 0; > + } > + > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance, > + &shared_balance, > + new_balance)) > + { > + /* Updated successfully, break */ > + break; > + } > While looking at the shared costing delay part, I have noticed that > while checking the delay condition, we are considering local_balance > which is VacuumCostBalanceLocal + VacuumCostBalance, but while > computing the new balance we only reduce shared balance by > VacuumCostBalanceLocal, I think it should be reduced with > local_balance? I see that later we are adding VacuumCostBalance to > the VacuumCostBalanceLocal so we are not loosing accounting for this > balance. But, I feel it is not right that we compare based on one > value and operate based on other. I think we can immediately set > VacuumCostBalanceLocal += VacuumCostBalance before checking the > condition. > +/* + * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum + * + * Currently, we don't pass any information to the AM-specific estimator, + * so it can probably only return a constant. In the future, we might need + * to pass more information. + */ +Size +index_parallelvacuum_estimate(Relation indexRelation) +{ + Size nbytes; + + RELATION_CHECKS; + + /* + * If amestimateparallelvacuum is not provided, assume only + * IndexBulkDeleteResult is needed. + */ + if (indexRelation->rd_indam->amestimateparallelvacuum != NULL) + { + nbytes = indexRelation->rd_indam->amestimateparallelvacuum(); + Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult))); + } + else + nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult)); + + return nbytes; +} In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I am a bit doubtful about this kind of arrangement, where the code in the "if" is always unreachable with the current AMs. I am not sure what is the best way to handle this, should we just drop the amestimateparallelvacuum altogether? Because currently, we are just providing a size estimate function without a copy function, even if the in future some Am give an estimate about the size of the stats, we can not directly memcpy the stat from the local memory to the shared memory, we might then need a copy function also from the AM so that it can flatten the stats and store in proper format? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
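To illustrate the case the callback is meant to cover, here is a purely hypothetical third-party AM (not one of the in-core AMs) whose bulk-delete result carries extra state beyond IndexBulkDeleteResult; amestimateparallelvacuum lets it reserve enough DSM space for that. All names below are made up for illustration.

typedef struct FooBulkDeleteResult
{
    IndexBulkDeleteResult stats;    /* must be first */
    BlockNumber     pending_head;   /* AM-private bookkeeping */
    BlockNumber     pending_tail;
} FooBulkDeleteResult;

static Size
fooestimateparallelvacuum(void)
{
    return MAXALIGN(sizeof(FooBulkDeleteResult));
}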
On Thu, 21 Nov 2019 at 13:25, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Nov 21, 2019 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Nov 21, 2019 at 6:53 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 20 Nov 2019 at 20:36, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Nov 20, 2019 at 4:04 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > On Wed, 20 Nov 2019 at 17:02, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > I've attached the latest version patch set. The patch set includes all > > > > > > > discussed points regarding index AM options as well as shared cost > > > > > > > balance. Also I added some test cases used all types of index AM. > > > > > > > > > > > > > > During developments I had one concern about the number of parallel > > > > > > > workers to launch. In current design each index AMs can choose the > > > > > > > participation of parallel bulk-deletion and parallel cleanup. That > > > > > > > also means the number of parallel worker to launch might be different > > > > > > > for each time of parallel bulk-deletion and parallel cleanup. In > > > > > > > current patch the leader will always launch the number of indexes that > > > > > > > support either one but it would not be efficient in some cases. For > > > > > > > example, if we have 3 indexes supporting only parallel bulk-deletion > > > > > > > and 2 indexes supporting only parallel index cleanup, we would launch > > > > > > > 5 workers for each execution but some workers will do nothing at all. > > > > > > > To deal with this problem, I wonder if we can improve the parallel > > > > > > > query so that the leader process creates a parallel context with the > > > > > > > maximum number of indexes and can launch a part of workers instead of > > > > > > > all of them. > > > > > > > > > > > > > > > > > > > Can't we choose the number of workers as a maximum of > > > > > > "num_of_indexes_that_support_bulk_del" and > > > > > > "num_of_indexes_that_support_cleanup"? If we can do that, then we can > > > > > > always launch the required number of workers for each phase (bulk_del, > > > > > > cleanup). In your above example, it should choose 3 workers while > > > > > > creating a parallel context. Do you see any problem with that? > > > > > > > > > > I might be missing something but if we create the parallel context > > > > > with 3 workers the leader process always launches 3 workers. Therefore > > > > > in the above case it launches 3 workers even in cleanup although 2 > > > > > workers is enough. > > > > > > > > > > > > > Right, so we can either extend parallel API to launch fewer workers > > > > than it has in parallel context as suggested by you or we can use > > > > separate parallel context for each phase. Going with the earlier has > > > > the benefit that we don't need to recreate the parallel context and > > > > the latter has the advantage that we won't keep additional shared > > > > memory allocated. > > > > > > I also thought to use separate parallel contexts for each phase but > > > can the same DSM be used by parallel workers who initiated from > > > different parallel contexts? 
If not I think that doesn't work because > > > the parallel vacuum needs to set data to DSM of ambulkdelete and then > > > parallel workers for amvacuumcleanup needs to access it. > > > > > > > We can probably copy the stats in local memory instead of pointing it > > to dsm after bulk-deletion, but I think that would unnecessary > > overhead and doesn't sound like a good idea. Right. > > I agree that it will be unnecessary overhead. > > > > > > > BTW, what kind of API change you have in mind for > > > > the approach you are suggesting? > > > > > > I was thinking to add a new API, say LaunchParallelNWorkers(pcxt, n), > > > where n is the number of workers the caller wants to launch and should > > > be lower than the value in the parallel context. > > > > > > > For that won't you need to duplicate most of the code of > > LaunchParallelWorkers or maybe move the entire code in > > LaunchParallelNWorkers and then LaunchParallelWorkers can also call > > it. Another idea could be to just extend the existing API > > LaunchParallelWorkers to take input parameter as the number of > > workers, do you see any problem with that or is there a reason you > > prefer to write a new API for this? > Yeah, passing an extra parameter to LaunchParallelWorkers seems to be a good idea. I just thought that the current API is also reasonable because the caller of LaunchParallelWorkers doesn't need to care about the number of workers, which is helpful for some cases, for example, where the caller of CreateParallelContext and the caller of LaunchParallelWorker are in different components. However it's not be a problem since as far as I can see the current code there is no such designed feature (these functions are called in the same function). > I think we can pass an extra parameter to LaunchParallelWorkers > therein we can try to launch min(pcxt->nworkers, n). Or we can put an > assert (n <= pcxt->nworkers). I prefer to use min(pcxt->nworkers, n). Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
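A sketch of what the extended call might look like; the extra parameter is the proposal under discussion here, not the current API, and the variable name is illustrative:

/* Proposed signature: launch Min(pcxt->nworkers, nworkers) workers for
 * this pass instead of always launching pcxt->nworkers. */
extern void LaunchParallelWorkers(ParallelContext *pcxt, int nworkers);

/* e.g. at the start of an index-cleanup pass */
LaunchParallelWorkers(lps->pcxt, nindexes_parallel_cleanup);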
On Thu, 21 Nov 2019 at 14:16, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Based on these needs, we came up with a way to allow users to specify > > > > > > this information for IndexAm's. Basically, Indexam will expose a > > > > > > variable amparallelvacuumoptions which can have below options > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > > > > > want to support parallel vacuum don't have to set anything. > > > > > > > > > > > > > make sense. > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > > flag) > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > > > > > gin, gist, > > > > > > spgist, bloom will set this flag) > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > and bloom will set this flag) > > > > > > > > > > I think gin and bloom don't need to set both but should set only > > > > > VACUUM_OPTION_PARALLEL_CLEANUP. > > > > > > > > > > And I'm going to disallow index AMs to set both > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > > > > > by assertions, is that okay? > > > > > > > > > > > > > Sounds reasonable to me. > > > > > > > > Are you planning to include the changes related to I/O throttling > > > > based on the discussion in the nearby thread [1]? I think you can do > > > > that if you agree with the conclusion in the last email[1], otherwise, > > > > we can explore it separately. > > > > > > Yes I agreed. I'm going to include that changes in the next version > > > patches. And I think we will be able to do more discussion based on > > > the patch. > > > > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > During developments I had one concern about the number of parallel > > workers to launch. In current design each index AMs can choose the > > participation of parallel bulk-deletion and parallel cleanup. That > > also means the number of parallel worker to launch might be different > > for each time of parallel bulk-deletion and parallel cleanup. In > > current patch the leader will always launch the number of indexes that > > support either one but it would not be efficient in some cases. 
For > > example, if we have 3 indexes supporting only parallel bulk-deletion > > and 2 indexes supporting only parallel index cleanup, we would launch > > 5 workers for each execution but some workers will do nothing at all. > > To deal with this problem, I wonder if we can improve the parallel > > query so that the leader process creates a parallel context with the > > maximum number of indexes and can launch a part of workers instead of > > all of them. > > > + > + /* compute new balance by adding the local value */ > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance); > + new_balance = shared_balance + VacuumCostBalance; > > + /* also compute the total local balance */ > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance; > + > + if ((new_balance >= VacuumCostLimit) && > + (local_balance > 0.5 * (VacuumCostLimit / nworkers))) > + { > + /* compute sleep time based on the local cost balance */ > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit; > + new_balance = shared_balance - VacuumCostBalanceLocal; > + VacuumCostBalanceLocal = 0; > + } > + > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance, > + &shared_balance, > + new_balance)) > + { > + /* Updated successfully, break */ > + break; > + } > While looking at the shared costing delay part, I have noticed that > while checking the delay condition, we are considering local_balance > which is VacuumCostBalanceLocal + VacuumCostBalance, but while > computing the new balance we only reduce shared balance by > VacuumCostBalanceLocal, I think it should be reduced with > local_balance? Right. > I see that later we are adding VacuumCostBalance to > the VacuumCostBalanceLocal so we are not loosing accounting for this > balance. But, I feel it is not right that we compare based on one > value and operate based on other. I think we can immediately set > VacuumCostBalanceLocal += VacuumCostBalance before checking the > condition. I think we should not do VacuumCostBalanceLocal += VacuumCostBalance inside the while loop because it's repeatedly executed until CAS operation succeeds. Instead we can move it before the loop and remove local_balance? The code would be like the following: if (VacuumSharedCostBalance != NULL) { : VacuumCostBalanceLocal += VacuumCostBalance; : /* Update the shared cost balance value atomically */ while (true) { uint32 shared_balance; uint32 new_balance; msec = 0; /* compute new balance by adding the local value */ shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance); new_balance = shared_balance + VacuumCostBalance; if ((new_balance >= VacuumCostLimit) && (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers))) { /* compute sleep time based on the local cost balance */ msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit; new_balance = shared_balance - VacuumCostBalanceLocal; VacuumCostBalanceLocal = 0; } if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance, &shared_balance, new_balance)) { /* Updated successfully, break */ break; } } : VacuumCostBalance = 0; } Thoughts? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > Based on these needs, we came up with a way to allow users to specify > > > > > > > this information for IndexAm's. Basically, Indexam will expose a > > > > > > > variable amparallelvacuumoptions which can have below options > > > > > > > > > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor > > > > > > > vacuumcleanup) can't be performed in parallel > > > > > > > > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't > > > > > > want to support parallel vacuum don't have to set anything. > > > > > > > > > > > > > > > > make sense. > > > > > > > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in > > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this > > > > > > > flag) > > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be > > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin, > > > > > > > gin, gist, > > > > > > > spgist, bloom will set this flag) > > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in > > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin, > > > > > > > and bloom will set this flag) > > > > > > > > > > > > I think gin and bloom don't need to set both but should set only > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP. > > > > > > > > > > > > And I'm going to disallow index AMs to set both > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP > > > > > > by assertions, is that okay? > > > > > > > > > > > > > > > > Sounds reasonable to me. > > > > > > > > > > Are you planning to include the changes related to I/O throttling > > > > > based on the discussion in the nearby thread [1]? I think you can do > > > > > that if you agree with the conclusion in the last email[1], otherwise, > > > > > we can explore it separately. > > > > > > > > Yes I agreed. I'm going to include that changes in the next version > > > > patches. And I think we will be able to do more discussion based on > > > > the patch. > > > > > > > > > > I've attached the latest version patch set. The patch set includes all > > > discussed points regarding index AM options as well as shared cost > > > balance. Also I added some test cases used all types of index AM. > > > > > > During developments I had one concern about the number of parallel > > > workers to launch. In current design each index AMs can choose the > > > participation of parallel bulk-deletion and parallel cleanup. That > > > also means the number of parallel worker to launch might be different > > > for each time of parallel bulk-deletion and parallel cleanup. In > > > current patch the leader will always launch the number of indexes that > > > support either one but it would not be efficient in some cases. 
For > > > example, if we have 3 indexes supporting only parallel bulk-deletion > > > and 2 indexes supporting only parallel index cleanup, we would launch > > > 5 workers for each execution but some workers will do nothing at all. > > > To deal with this problem, I wonder if we can improve the parallel > > > query so that the leader process creates a parallel context with the > > > maximum number of indexes and can launch a part of workers instead of > > > all of them. > > > > > + > > + /* compute new balance by adding the local value */ > > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance); > > + new_balance = shared_balance + VacuumCostBalance; > > > > + /* also compute the total local balance */ > > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance; > > + > > + if ((new_balance >= VacuumCostLimit) && > > + (local_balance > 0.5 * (VacuumCostLimit / nworkers))) > > + { > > + /* compute sleep time based on the local cost balance */ > > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit; > > + new_balance = shared_balance - VacuumCostBalanceLocal; > > + VacuumCostBalanceLocal = 0; > > + } > > + > > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance, > > + &shared_balance, > > + new_balance)) > > + { > > + /* Updated successfully, break */ > > + break; > > + } > > While looking at the shared costing delay part, I have noticed that > > while checking the delay condition, we are considering local_balance > > which is VacuumCostBalanceLocal + VacuumCostBalance, but while > > computing the new balance we only reduce shared balance by > > VacuumCostBalanceLocal, I think it should be reduced with > > local_balance? I see that later we are adding VacuumCostBalance to > > the VacuumCostBalanceLocal so we are not loosing accounting for this > > balance. But, I feel it is not right that we compare based on one > > value and operate based on other. I think we can immediately set > > VacuumCostBalanceLocal += VacuumCostBalance before checking the > > condition. > > > > +/* > + * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum > + * > + * Currently, we don't pass any information to the AM-specific estimator, > + * so it can probably only return a constant. In the future, we might need > + * to pass more information. > + */ > +Size > +index_parallelvacuum_estimate(Relation indexRelation) > +{ > + Size nbytes; > + > + RELATION_CHECKS; > + > + /* > + * If amestimateparallelvacuum is not provided, assume only > + * IndexBulkDeleteResult is needed. > + */ > + if (indexRelation->rd_indam->amestimateparallelvacuum != NULL) > + { > + nbytes = indexRelation->rd_indam->amestimateparallelvacuum(); > + Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult))); > + } > + else > + nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult)); > + > + return nbytes; > +} > > In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I > am a bit doubtful about this kind of arrangement, where the code in > the "if" is always unreachable with the current AMs. I am not sure > what is the best way to handle this, should we just drop the > amestimateparallelvacuum altogether? IIUC the motivation of amestimateparallelvacuum is for third party index AM. If it allocates memory more than IndexBulkDeleteResult like the current gist indexes (although we'll change it) it will break index statistics of other indexes or even can be cause of crash. 
I'm not sure there is such third party index AMs and it's true that all index AMs in postgres code will not use this callback as you mentioned, but I think we need to take care of it because such usage is still possible. > Because currently, we are just > providing a size estimate function without a copy function, even if > the in future some Am give an estimate about the size of the stats, we > can not directly memcpy the stat from the local memory to the shared > memory, we might then need a copy function also from the AM so that it > can flatten the stats and store in proper format? I might be missing something but why can't we copy the stats from the local memory to the DSM without the callback for copying stats? The lazy vacuum code will get the pointer of the stats that are allocated by index AM and the code can know the size of it. So I think we can just memcpy to DSM. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 21 Nov 2019, 13:52 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
On Thu, 21 Nov 2019 at 14:16, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > variable amparallelvacuumoptions which can have below options
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > >
> > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > want to support parallel vacuum don't have to set anything.
> > > > >
> > > >
> > > > make sense.
> > > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be
> > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > gin, gist,
> > > > > > spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > >
> > > > > I think gin and bloom don't need to set both but should set only
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > >
> > > > > And I'm going to disallow index AMs to set both
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > by assertions, is that okay?
> > > > >
> > > >
> > > > Sounds reasonable to me.
> > > >
> > > > Are you planning to include the changes related to I/O throttling
> > > > based on the discussion in the nearby thread [1]? I think you can do
> > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > we can explore it separately.
> > >
> > > Yes I agreed. I'm going to include that changes in the next version
> > > patches. And I think we will be able to do more discussion based on
> > > the patch.
> > >
> >
> > I've attached the latest version patch set. The patch set includes all
> > discussed points regarding index AM options as well as shared cost
> > balance. Also I added some test cases used all types of index AM.
> >
> > During developments I had one concern about the number of parallel
> > workers to launch. In current design each index AMs can choose the
> > participation of parallel bulk-deletion and parallel cleanup. That
> > also means the number of parallel worker to launch might be different
> > for each time of parallel bulk-deletion and parallel cleanup. In
> > current patch the leader will always launch the number of indexes that
> > support either one but it would not be efficient in some cases. For
> > example, if we have 3 indexes supporting only parallel bulk-deletion
> > and 2 indexes supporting only parallel index cleanup, we would launch
> > 5 workers for each execution but some workers will do nothing at all.
> > To deal with this problem, I wonder if we can improve the parallel
> > query so that the leader process creates a parallel context with the
> > maximum number of indexes and can launch a part of workers instead of
> > all of them.
> >
> +
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
>
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> + /* compute sleep time based on the local cost balance */
> + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> + new_balance = shared_balance - VacuumCostBalanceLocal;
> + VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> + &shared_balance,
> + new_balance))
> + {
> + /* Updated successfully, break */
> + break;
> + }
> While looking at the shared costing delay part, I have noticed that
> while checking the delay condition, we are considering local_balance
> which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> computing the new balance we only reduce shared balance by
> VacuumCostBalanceLocal, I think it should be reduced with
> local_balance?
Right.
> I see that later we are adding VacuumCostBalance to
> the VacuumCostBalanceLocal so we are not losing accounting for this
> balance. But, I feel it is not right that we compare based on one
> value and operate based on other. I think we can immediately set
> VacuumCostBalanceLocal += VacuumCostBalance before checking the
> condition.
I think we should not do VacuumCostBalanceLocal += VacuumCostBalance
inside the while loop because it's repeatedly executed until the CAS
operation succeeds. Instead we can move it before the loop and remove
local_balance?
Right, I meant before loop.
The code would be like the following:
    if (VacuumSharedCostBalance != NULL)
    {
        :
        VacuumCostBalanceLocal += VacuumCostBalance;
        :
        /* Update the shared cost balance value atomically */
        while (true)
        {
            uint32  shared_balance;
            uint32  new_balance;

            msec = 0;

            /* compute new balance by adding the local value */
            shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
            new_balance = shared_balance + VacuumCostBalance;

            if ((new_balance >= VacuumCostLimit) &&
                (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
            {
                /* compute sleep time based on the local cost balance */
                msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
                new_balance = shared_balance - VacuumCostBalanceLocal;
                VacuumCostBalanceLocal = 0;
            }

            if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
                                               &shared_balance,
                                               new_balance))
            {
                /* Updated successfully, break */
                break;
            }
        }
        :
        VacuumCostBalance = 0;
    }
Thoughts?
Looks fine to me.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
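As a concrete illustration of the amparallelvacuumoptions discussion quoted above, an index AM handler could declare its supported phases roughly as follows. The flag names and values are taken from the quoted proposal; the bthandler() snippet is only a sketch of the combination described there, not code from the patch:

    /* values as in the proposal above; "no parallel" is simply 0 */
    #define VACUUM_OPTION_NO_PARALLEL           0
    #define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
    #define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 2)
    #define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 3)

    /* e.g. in bthandler(): bulk-deletion can run in parallel, and cleanup
     * can run in parallel only when no bulk-deletion was performed */
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;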
On Thu, 21 Nov 2019, 14:15 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote:
On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Nov 21, 2019 at 10:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Mon, 18 Nov 2019 at 15:38, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Mon, 18 Nov 2019 at 15:34, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 18, 2019 at 11:37 AM Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > On Wed, 13 Nov 2019 at 14:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Based on these needs, we came up with a way to allow users to specify
> > > > > > > this information for IndexAm's. Basically, Indexam will expose a
> > > > > > > variable amparallelvacuumoptions which can have below options
> > > > > > >
> > > > > > > VACUUM_OPTION_NO_PARALLEL 1 << 0 # vacuum (neither bulkdelete nor
> > > > > > > vacuumcleanup) can't be performed in parallel
> > > > > >
> > > > > > I think VACUUM_OPTION_NO_PARALLEL can be 0 so that index AMs who don't
> > > > > > want to support parallel vacuum don't have to set anything.
> > > > > >
> > > > >
> > > > > make sense.
> > > > >
> > > > > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1 # bulkdelete can be done in
> > > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > > flag)
> > > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 2 # vacuumcleanup can be
> > > > > > > done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> > > > > > > gin, gist,
> > > > > > > spgist, bloom will set this flag)
> > > > > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 3 # vacuumcleanup can be done in
> > > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > > and bloom will set this flag)
> > > > > >
> > > > > > I think gin and bloom don't need to set both but should set only
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP.
> > > > > >
> > > > > > And I'm going to disallow index AMs to set both
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP and VACUUM_OPTION_PARALLEL_CLEANUP
> > > > > > by assertions, is that okay?
> > > > > >
> > > > >
> > > > > Sounds reasonable to me.
> > > > >
> > > > > Are you planning to include the changes related to I/O throttling
> > > > > based on the discussion in the nearby thread [1]? I think you can do
> > > > > that if you agree with the conclusion in the last email[1], otherwise,
> > > > > we can explore it separately.
> > > >
> > > > Yes I agreed. I'm going to include that changes in the next version
> > > > patches. And I think we will be able to do more discussion based on
> > > > the patch.
> > > >
> > >
> > > I've attached the latest version patch set. The patch set includes all
> > > discussed points regarding index AM options as well as shared cost
> > > balance. Also I added some test cases used all types of index AM.
> > >
> > > During developments I had one concern about the number of parallel
> > > workers to launch. In current design each index AMs can choose the
> > > participation of parallel bulk-deletion and parallel cleanup. That
> > > also means the number of parallel worker to launch might be different
> > > for each time of parallel bulk-deletion and parallel cleanup. In
> > > current patch the leader will always launch the number of indexes that
> > > support either one but it would not be efficient in some cases. For
> > > example, if we have 3 indexes supporting only parallel bulk-deletion
> > > and 2 indexes supporting only parallel index cleanup, we would launch
> > > 5 workers for each execution but some workers will do nothing at all.
> > > To deal with this problem, I wonder if we can improve the parallel
> > > query so that the leader process creates a parallel context with the
> > > maximum number of indexes and can launch a part of workers instead of
> > > all of them.
> > >
> > +
> > + /* compute new balance by adding the local value */
> > + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> > + new_balance = shared_balance + VacuumCostBalance;
> >
> > + /* also compute the total local balance */
> > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> > +
> > + if ((new_balance >= VacuumCostLimit) &&
> > + (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> > + {
> > + /* compute sleep time based on the local cost balance */
> > + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > + new_balance = shared_balance - VacuumCostBalanceLocal;
> > + VacuumCostBalanceLocal = 0;
> > + }
> > +
> > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> > + &shared_balance,
> > + new_balance))
> > + {
> > + /* Updated successfully, break */
> > + break;
> > + }
> > While looking at the shared costing delay part, I have noticed that
> > while checking the delay condition, we are considering local_balance
> > which is VacuumCostBalanceLocal + VacuumCostBalance, but while
> > computing the new balance we only reduce shared balance by
> > VacuumCostBalanceLocal, I think it should be reduced with
> > local_balance? I see that later we are adding VacuumCostBalance to
> > the VacuumCostBalanceLocal so we are not losing accounting for this
> > balance. But, I feel it is not right that we compare based on one
> > value and operate based on other. I think we can immediately set
> > VacuumCostBalanceLocal += VacuumCostBalance before checking the
> > condition.
> >
>
> +/*
> + * index_parallelvacuum_estimate - estimate shared memory for parallel vacuum
> + *
> + * Currently, we don't pass any information to the AM-specific estimator,
> + * so it can probably only return a constant. In the future, we might need
> + * to pass more information.
> + */
> +Size
> +index_parallelvacuum_estimate(Relation indexRelation)
> +{
> + Size nbytes;
> +
> + RELATION_CHECKS;
> +
> + /*
> + * If amestimateparallelvacuum is not provided, assume only
> + * IndexBulkDeleteResult is needed.
> + */
> + if (indexRelation->rd_indam->amestimateparallelvacuum != NULL)
> + {
> + nbytes = indexRelation->rd_indam->amestimateparallelvacuum();
> + Assert(nbytes >= MAXALIGN(sizeof(IndexBulkDeleteResult)));
> + }
> + else
> + nbytes = MAXALIGN(sizeof(IndexBulkDeleteResult));
> +
> + return nbytes;
> +}
>
> In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I
> am a bit doubtful about this kind of arrangement, where the code in
> the "if" is always unreachable with the current AMs. I am not sure
> what is the best way to handle this, should we just drop the
> amestimateparallelvacuum altogether?
IIUC the motivation of amestimateparallelvacuum is third-party index
AMs. If such an AM allocates more memory than IndexBulkDeleteResult,
as the current gist indexes do (although we'll change that), it will
break the index statistics of other indexes or could even cause a
crash. I'm not sure such third-party index AMs exist, and it's true
that none of the index AMs in the postgres code will use this callback,
as you mentioned, but I think we need to take care of it because such
usage is still possible.
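To make that scenario concrete, a hypothetical third-party AM whose bulk-delete result carries extra, pointer-free fields might implement the callback like this (the MyAm* names are invented for the example; only amestimateparallelvacuum and IndexBulkDeleteResult come from the patch and core):

    /* AM-private result: the core struct must come first */
    typedef struct MyAmBulkDeleteResult
    {
        IndexBulkDeleteResult std;
        uint64      pages_rewritten;    /* illustrative extra counter */
    } MyAmBulkDeleteResult;

    static Size
    myam_estimateparallelvacuum(void)
    {
        /* reserve space for the whole flattened struct in the DSM area */
        return MAXALIGN(sizeof(MyAmBulkDeleteResult));
    }

Because the struct is flat (no pointers) and starts with IndexBulkDeleteResult, the generic code can still treat the DSM slot as an IndexBulkDeleteResult while the AM gets the extra space it asked for.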
> Because currently, we are just
> providing a size estimate function without a copy function, even if
> in the future some AM gives an estimate about the size of the stats, we
> can not directly memcpy the stat from the local memory to the shared
> memory, we might then need a copy function also from the AM so that it
> can flatten the stats and store in proper format?
I might be missing something, but why can't we copy the stats from
local memory to the DSM without a callback for copying stats? The
lazy vacuum code will get the pointer to the stats that are allocated
by the index AM, and the code can know their size. So I think we can
just memcpy them to the DSM.
Oh sure. But what I meant is that an AM may keep pointers in its stats, as GistBulkDeleteResult does, so we might not be able to copy it directly outside the AM. So I thought that if we have a callback for the copy, the AM can flatten the stats such that IndexBulkDeleteResult is followed by the AM-specific stats. Yeah, but someone may argue that we might force the AM to return the stats in a form that can be memcpy'd directly. So I think I am fine with the way it is.
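As a rough sketch of the copy this implies on the lazy vacuum side, assuming the stats are flat as discussed and that the estimate matches the size the AM actually allocates (get_indstats is used here simply as "return the i-th stats slot in the DSM area"; its exact shape is an assumption):

    /* copy the worker-local stats returned by ambulkdelete/amvacuumcleanup
     * into the slot reserved for this index in the DSM area */
    if (stats[i] != NULL)
        memcpy(get_indstats(lvshared, i), stats[i],
               index_parallelvacuum_estimate(Irel[i]));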
On Thu, Nov 21, 2019 at 3:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, 21 Nov 2019, 14:15 Masahiko Sawada, <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Thu, 21 Nov 2019 at 14:32, Dilip Kumar <dilipbalaut@gmail.com> wrote: >> > >> > >> > In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I >> > am a bit doubtful about this kind of arrangement, where the code in >> > the "if" is always unreachable with the current AMs. I am not sure >> > what is the best way to handle this, should we just drop the >> > amestimateparallelvacuum altogether? >> >> IIUC the motivation of amestimateparallelvacuum is for third party >> index AM. If it allocates memory more than IndexBulkDeleteResult like >> the current gist indexes (although we'll change it) it will break >> index statistics of other indexes or even can be cause of crash. I'm >> not sure there is such third party index AMs and it's true that all >> index AMs in postgres code will not use this callback as you >> mentioned, but I think we need to take care of it because such usage >> is still possible. >> >> > Because currently, we are just >> > providing a size estimate function without a copy function, even if >> > the in future some Am give an estimate about the size of the stats, we >> > can not directly memcpy the stat from the local memory to the shared >> > memory, we might then need a copy function also from the AM so that it >> > can flatten the stats and store in proper format? >> >> I might be missing something but why can't we copy the stats from the >> local memory to the DSM without the callback for copying stats? The >> lazy vacuum code will get the pointer of the stats that are allocated >> by index AM and the code can know the size of it. So I think we can >> just memcpy to DSM. > > > Oh sure. But, what I meant is that if AM may keep pointers in its stats as GistBulkDeleteResult do so we might not beable to copy directly outside the AM. So I thought that if we have a call back for the copy then the AM can flatten thestats such that IndexBulkDeleteResult, followed by AM specific stats. Yeah but someone may argue that we might forcethe AM to return the stats in a form that it can be memcpy directly. So I think I am fine with the way it is. > I think we have discussed this point earlier as well and the conclusion was to provide an API if there is a need for the same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > I've attached the latest version patch set. The patch set includes all > discussed points regarding index AM options as well as shared cost > balance. Also I added some test cases used all types of index AM. > I have reviewed the first patch and made a number of modifications that include adding/modifying comments, made some corrections and modifications in the documentation. You can find my changes in v33-0001-delta-amit.patch. See, if those look okay to you, if so, please include those in the next version of the patch. I am attaching both your version of patch and delta changes by me. One comment on v33-0002-Add-parallel-option-to-VACUUM-command: + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */ + est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN (nindexes))); .. + shared->offset = add_size(SizeOfLVShared, BITMAPLEN(nindexes)); Here, don't you need to do MAXALIGN to set offset as we are computing it that way while estimating shared memory? If not, then probably, some comments are required to explain it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
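In other words, the suggestion above amounts to keeping the offset computation consistent with the MAXALIGN'd estimate, roughly like this (a sketch only, not the patch's final code):

    /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
    est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
    ...
    /* must line up with the MAXALIGN'd estimate above so that the data
     * stored after the bitmap stays properly aligned */
    shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));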
On Fri, Nov 22, 2019 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > I have reviewed the first patch and made a number of modifications > that include adding/modifying comments, made some corrections and > modifications in the documentation. You can find my changes in > v33-0001-delta-amit.patch. > I have continued my review for this patch series and reviewed/hacked the second patch. I have added/modified comments, changed function ordering in file to make them look consistent and a few other changes. You can find my changes in v33-0002-delta-amit.patch. Are you working on review comments given recently, if you have not started yet, then it might be better to prepare a patch atop of v33 version as I am also going to work on this patch series, that way it will be easy to merge changes. OTOH, if you are already working on those, then it is fine. I can merge any remaining changes with your new patch. Whatever be the case, please let me know. Few more comments on v33-0002-Add-parallel-option-to-VACUUM-command.patch: --------------------------------------------------------------------------------------------------------------------------- 1. + * leader process re-initializes the parallel context while keeping recorded + * dead tuples so that the leader can launch parallel workers again in the next + * time. In this sentence, it is not clear to me why we need to keep the recorded dead tuples while re-initialize parallel workers? The next time when workers are launched, they should process a new set of dead tuples, no? 2. lazy_parallel_vacuum_or_cleanup_indexes() { .. + /* + * Increment the active worker count. We cannot decrement until the + * all parallel workers finish. + */ + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1); + + /* + * Join as parallel workers. The leader process alone does that in + * case where no workers launched. + */ + if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0) + vacuum_or_cleanup_indexes_worker (Irel, nindexes, stats, lps->lvshared, + vacrelstats->dead_tuples); + + /* + * Here, the indexes that had been skipped during parallel index vacuuming + * are remaining. If there are such indexes the leader process does vacuum + * or cleanup them one by one. + */ + nindexes_remains = nindexes - pg_atomic_read_u32(&(lps->lvshared->nprocessed)); + if (nindexes_remains > 0) + { + int i; +#ifdef USE_ASSERT_CHECKING + int nprocessed = 0; +#endif + + for (i = 0; i < nindexes; i++) + { + bool processed = !skip_parallel_index_vacuum(Irel[i], + lps->lvshared->for_cleanup, + lps->lvshared->first_time); + + /* Skip the already processed indexes */ + if (processed) + continue; + + if (lps->lvshared->for_cleanup) + lazy_cleanup_index(Irel[i], &stats[i], + vacrelstats->new_rel_tuples, + vacrelstats->tupcount_pages < vacrelstats->rel_pages); + else + lazy_vacuum_index(Irel[i], &stats[i], vacrelstats->dead_tuples, + vacrelstats- >old_live_tuples); +#ifdef USE_ASSERT_CHECKING + nprocessed++; +#endif + } +#ifdef USE_ASSERT_CHECKING + Assert (nprocessed == nindexes_remains); +#endif + } + + /* + * We have completed the index vacuum so decrement the active worker + * count. + */ + pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1); .. 
} Here, it seems that we can increment/decrement the VacuumActiveNWorkers even when there is no work performed by the leader backend. How about moving increment/decrement inside function vacuum_or_cleanup_indexes_worker? In that case, we need to do it in this function when we are actually doing an index vacuum or cleanup. After doing that the other usage of increment/decrement of VacuumActiveNWorkers in other function heap_parallel_vacuum_main can be removed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Fri, 22 Nov 2019 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've attached the latest version patch set. The patch set includes all > > discussed points regarding index AM options as well as shared cost > > balance. Also I added some test cases used all types of index AM. > > > > I have reviewed the first patch and made a number of modifications > that include adding/modifying comments, made some corrections and > modifications in the documentation. You can find my changes in > v33-0001-delta-amit.patch. See, if those look okay to you, if so, > please include those in the next version of the patch. I am attaching > both your version of patch and delta changes by me. Thank you. All changes look good to me. But after changed the 0002 patch the two macros for parallel vacuum options (VACUUM_OPTIONS_SUPPORT_XXX) is no longer necessary. So we can remove them and can add if we need them again. > > One comment on v33-0002-Add-parallel-option-to-VACUUM-command: > > + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */ > + est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN > (nindexes))); > .. > + shared->offset = add_size(SizeOfLVShared, BITMAPLEN(nindexes)); > > Here, don't you need to do MAXALIGN to set offset as we are computing > it that way while estimating shared memory? If not, then probably, > some comments are required to explain it. You're right. Will fix it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Nov 25, 2019 at 9:42 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 22 Nov 2019 at 10:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 20, 2019 at 11:01 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I've attached the latest version patch set. The patch set includes all > > > discussed points regarding index AM options as well as shared cost > > > balance. Also I added some test cases used all types of index AM. > > > > > > > I have reviewed the first patch and made a number of modifications > > that include adding/modifying comments, made some corrections and > > modifications in the documentation. You can find my changes in > > v33-0001-delta-amit.patch. See, if those look okay to you, if so, > > please include those in the next version of the patch. I am attaching > > both your version of patch and delta changes by me. > > Thank you. > > All changes look good to me. But after changed the 0002 patch the two > macros for parallel vacuum options (VACUUM_OPTIONS_SUPPORT_XXX) is no > longer necessary. So we can remove them and can add if we need them > again. > Sounds reasonable. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 25, 2019 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 2. > lazy_parallel_vacuum_or_cleanup_indexes() > { > .. > .. > } > > Here, it seems that we can increment/decrement the > VacuumActiveNWorkers even when there is no work performed by the > leader backend. How about moving increment/decrement inside function > vacuum_or_cleanup_indexes_worker? In that case, we need to do it in > this function when we are actually doing an index vacuum or cleanup. > After doing that the other usage of increment/decrement of > VacuumActiveNWorkers in other function heap_parallel_vacuum_main can > be removed. > One of my colleague Mahendra who was testing this patch found that stats for index reported by view pg_statio_all_tables are wrong for parallel vacuum. I debugged the issue and found that there were two problems in the stats related code. 1. The function get_indstats seem to be computing the wrong value of stats for the last index. 2. The function lazy_parallel_vacuum_or_cleanup_indexes() was not pointing to the computed stats when the parallel index scan is skipped. Find the above two fixes in the attached patch. This is on top of the patches I sent yesterday [1]. Some more comments on v33-0002-Add-parallel-option-to-VACUUM-command ------------------------------------------------------------------------------------------------------------- 1. The code in function lazy_parallel_vacuum_or_cleanup_indexes() that processes the indexes that have skipped parallel processing can be moved to a separate function. Further, the newly added code by the attached patch can also be moved to a separate function as the same code is used in function vacuum_or_cleanup_indexes_worker(). 2. +void +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) { .. + stats = (IndexBulkDeleteResult **) + palloc0(nindexes * sizeof(IndexBulkDeleteResult *)); .. } It would be neat if we free this memory once it is used. 3. + /* + * Compute the number of indexes that can participate to parallel index + * vacuuming. + */ /to/in 4. The function lazy_parallel_vacuum_or_cleanup_indexes() launches workers without checking whether it needs to do the same or not. For ex. in cleanup phase, it is possible that we don't need to launch any worker, so it will be waste. It might be that you are already planning to handle it based on the previous comments/discussion in which case you can ignore this. [1] - https://www.postgresql.org/message-id/CAA4eK1LQ%2BYGjmSS-XqhuAa6eb%3DXykpx1LiT7UXJHmEKP%3D0QtsA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, 26 Nov 2019 at 13:34, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 25, 2019 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 2. > > lazy_parallel_vacuum_or_cleanup_indexes() > > { > > .. > > .. > > } > > > > Here, it seems that we can increment/decrement the > > VacuumActiveNWorkers even when there is no work performed by the > > leader backend. How about moving increment/decrement inside function > > vacuum_or_cleanup_indexes_worker? In that case, we need to do it in > > this function when we are actually doing an index vacuum or cleanup. > > After doing that the other usage of increment/decrement of > > VacuumActiveNWorkers in other function heap_parallel_vacuum_main can > > be removed. Yeah we can move it inside vacuum_or_cleanup_indexes_worker but we still need to increment the count before processing the indexes that have skipped parallel operations because some workers might still be running yet. > > > > One of my colleague Mahendra who was testing this patch found that > stats for index reported by view pg_statio_all_tables are wrong for > parallel vacuum. I debugged the issue and found that there were two > problems in the stats related code. > 1. The function get_indstats seem to be computing the wrong value of > stats for the last index. > 2. The function lazy_parallel_vacuum_or_cleanup_indexes() was not > pointing to the computed stats when the parallel index scan is > skipped. > > Find the above two fixes in the attached patch. This is on top of the > patches I sent yesterday [1]. Thank you! During testing the current patch by myself I also found this bug. > > Some more comments on v33-0002-Add-parallel-option-to-VACUUM-command > ------------------------------------------------------------------------------------------------------------- > 1. The code in function lazy_parallel_vacuum_or_cleanup_indexes() > that processes the indexes that have skipped parallel processing can > be moved to a separate function. Further, the newly added code by the > attached patch can also be moved to a separate function as the same > code is used in function vacuum_or_cleanup_indexes_worker(). > > 2. > +void > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc) > { > .. > + stats = (IndexBulkDeleteResult **) > + palloc0(nindexes * sizeof(IndexBulkDeleteResult *)); > .. > } > > It would be neat if we free this memory once it is used. > > 3. > + /* > + * Compute the number of indexes that can participate to parallel index > + * vacuuming. > + */ > > /to/in > > 4. The function lazy_parallel_vacuum_or_cleanup_indexes() launches > workers without checking whether it needs to do the same or not. For > ex. in cleanup phase, it is possible that we don't need to launch any > worker, so it will be waste. It might be that you are already > planning to handle it based on the previous comments/discussion in > which case you can ignore this. I've incorporated the comments I got so far including the above and the memory alignment issue. Therefore the attached v34 patch includes that changes and changes in v33-0002-delta-amit.patch and v33-0002-delta2-fix-stats-issue.patch. In this version I add an extra argument to LaunchParallelWorkers function and make the leader process launch the parallel workers as much as the particular phase needs. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > > I've incorporated the comments I got so far including the above and > the memory alignment issue. > Thanks, I will look into the new version. BTW, why haven't you posted 0001 patch (IndexAM API's patch)? I think without that we need to use the previous version for that. Also, I think we should post Dilip's patch related to Gist index [1] modifications for parallel vacuum or at least have a mention for that while posting a new version as without that even make check fails. [1] - https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Nov 27, 2019 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I've incorporated the comments I got so far including the above and > > the memory alignment issue. > > > > Thanks, I will look into the new version. > Few comments: ----------------------- 1. +static void +vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes, + IndexBulkDeleteResult **stats, + LVShared *lvshared, + LVDeadTuples *dead_tuples) +{ + /* Increment the active worker count */ + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1); The above code is wrong because it is possible that this function is called even when there are no workers in which case VacuumActiveNWorkers will be NULL. 2. + /* Take over the shared balance value to heap scan */ + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); We can carry over shared balance only if the same is active. 3. + if (Irel[i]->rd_indam->amparallelvacuumoptions == + VACUUM_OPTION_NO_PARALLEL) + { + /* Set NULL as this index does not support parallel vacuum */ + lvshared->bitmap[i >> 3] |= 0 << (i & 0x07); Can we avoid setting this for each index by initializing bitmap as all NULL's as is done in the attached patch? 4. + /* + * Variables to control parallel index vacuuming. Index statistics + * returned from ambulkdelete and amvacuumcleanup is nullable variable + * length. 'offset' is NULL bitmap. Note that a 0 indicates a null, + * while 1 indicates non-null. The index statistics follows at end of + * struct. + */ This comment is not clear, so I have re-worded it. See, if the changed comment makes sense. I have fixed all the above issues, made a couple of other cosmetic changes and modified a few comments. See the changes in v34-0002-delta-amit. I am attaching just the delta patch on top of v34-0002-Add-parallel-option-to-VACUUM-command. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
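To illustrate comments 1 and 2 in the review above, the guarded pattern being suggested would look roughly like this, assuming both variables are NULL/inactive when no parallel vacuum is running (a sketch, not the actual delta patch):

    /* Increment the active worker count only when parallel vacuum is active */
    if (VacuumActiveNWorkers)
        pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);

    ...

    /* Take over the shared balance to the heap scan only if it is active */
    if (VacuumSharedCostBalance)
        VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);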
On Wed, 27 Nov 2019 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
>
> I've incorporated the comments I got so far including the above and
> the memory alignment issue.
>
Thanks, I will look into the new version. BTW, why haven't you posted
0001 patch (IndexAM API's patch)? I think without that we need to use
the previous version for that. Also, I think we should post Dilip's
patch related to Gist index [1] modifications for parallel vacuum or
at least have a mention for that while posting a new version as
without that even make check fails.
[1] - https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com
I did some testing on top of the v33 patch set. By debugging, I was able to hit one assert in lazy_parallel_vacuum_or_cleanup_indexes:
TRAP: FailedAssertion("nprocessed == nindexes_remains", File: "vacuumlazy.c", Line: 2099)
I further debugged and found that this assert is not valid in all cases. nprocessed can be less than nindexes_remains because it is possible that a parallel worker has been launched for vacuum and the idx count has been incremented in vacuum_or_cleanup_indexes_worker for a particular index, but the work is still not finished (lvshared->nprocessed is not incremented yet); in that case, nprocessed will be less than nindexes_remains. I think we should remove this assert.
I have one comment about the assert-only variable:
+#ifdef USE_ASSERT_CHECKING
+ int nprocessed = 0;
+#endif
I think we can write the above declaration as "int nprocessed PG_USED_FOR_ASSERTS_ONLY = 0" so that the code looks cleaner, because USE_ASSERT_CHECKING is currently used in 3 places within 20-30 lines of code.
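For reference, the suggested declaration style would be roughly (PG_USED_FOR_ASSERTS_ONLY marks the variable as possibly unused in non-assert builds, so the surrounding #ifdef USE_ASSERT_CHECKING blocks can be dropped):

    int     nprocessed PG_USED_FOR_ASSERTS_ONLY = 0;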
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Wed, 27 Nov 2019 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 27, 2019 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > I've incorporated the comments I got so far including the above and > > > the memory alignment issue. > > > > > > > Thanks, I will look into the new version. > > > > Few comments: > ----------------------- > 1. > +static void > +vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes, > + IndexBulkDeleteResult **stats, > + LVShared *lvshared, > + LVDeadTuples *dead_tuples) > +{ > + /* Increment the active worker count */ > + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1); > > The above code is wrong because it is possible that this function is > called even when there are no workers in which case > VacuumActiveNWorkers will be NULL. > > 2. > + /* Take over the shared balance value to heap scan */ > + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); > > We can carry over shared balance only if the same is active. > > 3. > + if (Irel[i]->rd_indam->amparallelvacuumoptions == > + VACUUM_OPTION_NO_PARALLEL) > + { > + > /* Set NULL as this index does not support parallel vacuum */ > + lvshared->bitmap[i >> 3] |= 0 << (i & 0x07); > > Can we avoid setting this for each index by initializing bitmap as all > NULL's as is done in the attached patch? > > 4. > + /* > + * Variables to control parallel index vacuuming. Index statistics > + * returned from ambulkdelete and amvacuumcleanup is nullable > variable > + * length. 'offset' is NULL bitmap. Note that a 0 indicates a null, > + * while 1 indicates non-null. The index statistics follows > at end of > + * struct. > + */ > > This comment is not clear, so I have re-worded it. See, if the > changed comment makes sense. > > I have fixed all the above issues, made a couple of other cosmetic > changes and modified a few comments. See the changes in > v34-0002-delta-amit. I am attaching just the delta patch on top of > v34-0002-Add-parallel-option-to-VACUUM-command. > Thank you for reviewing this patch. All changes you made looks good to me. I thought I already have posted all v34 patches but didn't, sorry. So I've attached v35 patch set that incorporated your changes and it includes Dilip's patch for gist index (0001). These patches can be applied on top of the current HEAD and make check should pass. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, 27 Nov 2019 at 13:28, Mahendra Singh <mahi6run@gmail.com> wrote: > > On Wed, 27 Nov 2019 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada >> <masahiko.sawada@2ndquadrant.com> wrote: >> > >> > >> > I've incorporated the comments I got so far including the above and >> > the memory alignment issue. >> > >> >> Thanks, I will look into the new version. BTW, why haven't you posted >> 0001 patch (IndexAM API's patch)? I think without that we need to use >> the previous version for that. Also, I think we should post Dilip's >> patch related to Gist index [1] modifications for parallel vacuum or >> at least have a mention for that while posting a new version as >> without that even make check fails. >> >> [1] - https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com >> > > I did some testing on the top of v33 patch set. By debugging, I was able to hit one assert in lazy_parallel_vacuum_or_cleanup_indexes. > TRAP: FailedAssertion("nprocessed == nindexes_remains", File: "vacuumlazy.c", Line: 2099) > > I further debugged and found that this assert is not valid in all the cases. Here, nprocessed can be less than nindexes_remainsin some cases because it is possible that parallel worker is launched for vacuum and idx count is incrementedin vacuum_or_cleanup_indexes_worker for particular index but work is still not finished(lvshared->nprocessedis not incremented yet) so in that case, nprocessed will be less than nindexes_remains. I think,we should remove this assert. > > I have one comment for assert used variable: > > +#ifdef USE_ASSERT_CHECKING > + int nprocessed = 0; > +#endif > > I think, we can make above declaration as " int nprocessed PG_USED_FOR_ASSERTS_ONLY = 0" so that code looks good becausethis USE_ASSERT_CHECKING is used in 3 places in 20-30 code lines. Thank you for testing! Yes, I think your analysis is right. I've removed the assertion in v35 patch that I've just posted[1]. [1] https://www.postgresql.org/message-id/CA%2Bfd4k5oAuGuwZ9XaOTv%2BcTU8-dmA3RjpJ%2Bi4x5kt9VbAFse1w%40mail.gmail.com Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 27 Nov 2019 at 23:14, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 27 Nov 2019 at 13:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 27, 2019 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 27, 2019 at 12:52 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > >
> > > I've incorporated the comments I got so far including the above and
> > > the memory alignment issue.
> > >
> >
> > Thanks, I will look into the new version.
> >
>
> Few comments:
> -----------------------
> 1.
> +static void
> +vacuum_or_cleanup_indexes_worker(Relation *Irel, int nindexes,
> + IndexBulkDeleteResult **stats,
> + LVShared *lvshared,
> + LVDeadTuples *dead_tuples)
> +{
> + /* Increment the active worker count */
> + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
>
> The above code is wrong because it is possible that this function is
> called even when there are no workers in which case
> VacuumActiveNWorkers will be NULL.
>
> 2.
> + /* Take over the shared balance value to heap scan */
> + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
>
> We can carry over shared balance only if the same is active.
>
> 3.
> + if (Irel[i]->rd_indam->amparallelvacuumoptions ==
> + VACUUM_OPTION_NO_PARALLEL)
> + {
> +
> /* Set NULL as this index does not support parallel vacuum */
> + lvshared->bitmap[i >> 3] |= 0 << (i & 0x07);
>
> Can we avoid setting this for each index by initializing bitmap as all
> NULL's as is done in the attached patch?
>
> 4.
> + /*
> + * Variables to control parallel index vacuuming. Index statistics
> + * returned from ambulkdelete and amvacuumcleanup is nullable
> variable
> + * length. 'offset' is NULL bitmap. Note that a 0 indicates a null,
> + * while 1 indicates non-null. The index statistics follows
> at end of
> + * struct.
> + */
>
> This comment is not clear, so I have re-worded it. See, if the
> changed comment makes sense.
>
> I have fixed all the above issues, made a couple of other cosmetic
> changes and modified a few comments. See the changes in
> v34-0002-delta-amit. I am attaching just the delta patch on top of
> v34-0002-Add-parallel-option-to-VACUUM-command.
>
Thank you for reviewing this patch. All the changes you made look good to me.
I thought I had already posted all the v34 patches but hadn't, sorry. So
I've attached the v35 patch set, which incorporates your changes and
includes Dilip's patch for gist indexes (0001). These patches can be
applied on top of the current HEAD and make check should pass.
Thanks for the re-based patches.
On top of the v35 patch, I can see one compilation warning.
parallel.c: In function ‘LaunchParallelWorkers’:
parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
int i;
^
The above warning is due to an extra semicolon added at the end of a declaration line in the v35-0003 patch. Please fix this in the next version.
+ int nworkers_to_launch = Min(nworkers, pcxt->nworkers);;
I will continue my testing on top of the v35 patch set and will post the results.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote: > > > Thanks for the re-based patches. > > On the top of v35 patch, I can see one compilation warning. >> >> parallel.c: In function ‘LaunchParallelWorkers’: >> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] >> int i; >> ^ > > > Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix this innext version. > + int nworkers_to_launch = Min(nworkers, pcxt->nworkers);; Thanks. I will fix it in the next version patch. > > I will continue my testing on the top of v35 patch set and will post results. Thank you! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 28 Nov 2019 at 13:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote:
>
>
> Thanks for the re-based patches.
>
> On the top of v35 patch, I can see one compilation warning.
>>
>> parallel.c: In function ‘LaunchParallelWorkers’:
>> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
>> int i;
>> ^
>
>
> Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix this in next version.
> + int nworkers_to_launch = Min(nworkers, pcxt->nworkers);;
Thanks. I will fix it in the next version patch.
>
> I will continue my testing on the top of v35 patch set and will post results.
While reviewing the v35 patch set and testing it, I found that if we disable leader participation, we launch 1 fewer parallel worker than the total number of indexes. (I am using max_parallel_workers = 20, max_parallel_maintenance_workers = 20.)
For example, if a table has 3 indexes and we give a parallel vacuum degree of 6 (with leader participation disabled), then I think we should launch 3 parallel workers, but we launch only 2 due to the check below:
+ nworkers = lps->nindexes_parallel_bulkdel - 1;
+
+ /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
+ nworkers = Min(nworkers, lps->pcxt->nworkers);
Please let me know your thoughts for this.
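For illustration, one possible way the bulk-deletion case shown above could account for a disabled leader is roughly the following; lps->leaderparticipates appears in the patch, but this exact adjustment is only a sketch:

    /* The leader takes over one index only when it participates */
    nworkers = lps->nindexes_parallel_bulkdel;
    if (lps->leaderparticipates)
        nworkers--;

    /* Cap by the worker we computed at the beginning of parallel lazy vacuum */
    nworkers = Min(nworkers, lps->pcxt->nworkers);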
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Thu, Nov 28, 2019 at 4:10 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > On Thu, 28 Nov 2019 at 13:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote: >> > >> > >> > Thanks for the re-based patches. >> > >> > On the top of v35 patch, I can see one compilation warning. >> >> >> >> parallel.c: In function ‘LaunchParallelWorkers’: >> >> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] >> >> int i; >> >> ^ >> > >> > >> > Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix thisin next version. >> > + int nworkers_to_launch = Min(nworkers, pcxt->nworkers);; >> >> Thanks. I will fix it in the next version patch. >> >> > >> > I will continue my testing on the top of v35 patch set and will post results. > > > While reviewing v35 patch set and doing testing, I found that if we disable leader participation, then we are launching1 less parallel worker than total number of indexes. (I am using max_parallel_workers = 20, max_parallel_maintenance_workers= 20) > > For example: If table have 3 indexes and we gave 6 parallel vacuum degree(leader participation is disabled), then I think,we should launch 3 parallel workers but we are launching 2 workers due to below check. > + nworkers = lps->nindexes_parallel_bulkdel - 1; > + > + /* Cap by the worker we computed at the beginning of parallel lazy vacuum */ > + nworkers = Min(nworkers, lps->pcxt->nworkers); > > Please let me know your thoughts for this. > I think it is probably because this part of the code doesn't consider PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. I think if we want we can change it but I am slightly nervous about the code complexity this will bring but maybe that is fine. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 28 Nov 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 28, 2019 at 4:10 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > On Thu, 28 Nov 2019 at 13:32, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > >> > >> On Wed, 27 Nov 2019 at 19:21, Mahendra Singh <mahi6run@gmail.com> wrote: > >> > > >> > > >> > Thanks for the re-based patches. > >> > > >> > On the top of v35 patch, I can see one compilation warning. > >> >> > >> >> parallel.c: In function ‘LaunchParallelWorkers’: > >> >> parallel.c:502:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] > >> >> int i; > >> >> ^ > >> > > >> > > >> > Above warning is due to one extra semicolon added at the end of declaration line in v35-0003 patch. Please fix thisin next version. > >> > + int nworkers_to_launch = Min(nworkers, pcxt->nworkers);; > >> > >> Thanks. I will fix it in the next version patch. > >> > >> > > >> > I will continue my testing on the top of v35 patch set and will post results. > > > > > > While reviewing v35 patch set and doing testing, I found that if we disable leader participation, then we are launching1 less parallel worker than total number of indexes. (I am using max_parallel_workers = 20, max_parallel_maintenance_workers= 20) > > > > For example: If table have 3 indexes and we gave 6 parallel vacuum degree(leader participation is disabled), then I think,we should launch 3 parallel workers but we are launching 2 workers due to below check. > > + nworkers = lps->nindexes_parallel_bulkdel - 1; > > + > > + /* Cap by the worker we computed at the beginning of parallel lazy vacuum */ > > + nworkers = Min(nworkers, lps->pcxt->nworkers); > > > > Please let me know your thoughts for this. Thanks! > I think it is probably because this part of the code doesn't consider > PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. I think if we want we > can change it but I am slightly nervous about the code complexity this > will bring but maybe that is fine. Right. I'll try to change so that. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Nov 29, 2019 at 7:11 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 28 Nov 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I think it is probably because this part of the code doesn't consider > > PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. I think if we want we > > can change it but I am slightly nervous about the code complexity this > > will bring but maybe that is fine. > > Right. I'll try to change so that. > I am thinking that as PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION is a debugging/testing facility, we should ideally separate this out from the main patch. BTW, I am hacking/reviewing the patch further, so request you to wait for a few day's time before we do anything in this regard. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hello

Its possible to change order of index processing by parallel leader? In v35 patchset I see following order:
- start parallel processes
- leader and parallel workers processed index list and possible skip some entries
- after that parallel leader recheck index list and process the skipped indexes
- WaitForParallelWorkersToFinish

I think it would be better to:
- start parallel processes
- parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true
- parallel workers processes indexes with skip_parallel_index_vacuum = false
- parallel leader start participate with remainings parallel-safe index processing
- WaitForParallelWorkersToFinish

This would be less running time and better load balance across leader and workers in case of few non-parallel and few parallel indexes.
(if this is expected and required by some reason, we need a comment in code)

Also few notes to vacuumdb:
Seems we need version check at least in vacuum_one_database and prepare_vacuum_command. Similar to SKIP_LOCKED or DISABLE_PAGE_SKIPPING features.
discussion question: difference between --parallel and --jobs parameters will be confusing? We need more description for this options?

regards, Sergei
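As a concrete illustration of the vacuumdb note above, the existing SKIP_LOCKED-style server version check could be mirrored roughly like this (the --parallel option name, the vacopts->parallel_workers field and the 130000 cutoff are assumptions about how the vacuumdb part of the patch would be shaped):

    if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
    {
        pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL 13",
                     "--parallel");
        exit(1);
    }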
On Sat, 30 Nov 2019 at 19:18, Sergei Kornilov <sk@zsrv.org> wrote:
Hello
Its possible to change order of index processing by parallel leader? In v35 patchset I see following order:
- start parallel processes
- leader and parallel workers process the index list and possibly skip some entries
- after that parallel leader recheck index list and process the skipped indexes
- WaitForParallelWorkersToFinish
I think it would be better to:
- start parallel processes
- parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true
- parallel workers processes indexes with skip_parallel_index_vacuum = false
- parallel leader start participate with remainings parallel-safe index processing
- WaitForParallelWorkersToFinish
This would be less running time and better load balance across leader and workers in case of few non-parallel and few parallel indexes.
(if this is expected and required by some reason, we need a comment in code)
Also few notes to vacuumdb:
Seems we need version check at least in vacuum_one_database and prepare_vacuum_command. Similar to SKIP_LOCKED or DISABLE_PAGE_SKIPPING features.
discussion question: will the difference between the --parallel and --jobs parameters be confusing? Do we need more description for these options?
While doing testing with different server configuration settings, I am getting an error (ERROR: no unpinned buffers available) during parallel vacuum, while normal vacuum works fine.
Test Setup:
max_worker_processes = 40
autovacuum = off
shared_buffers = 128kB
max_parallel_workers = 40
max_parallel_maintenance_workers = 40
vacuum_cost_limit = 2000
vacuum_cost_delay = 10
Table description: the table has 16 indexes (14 btree, 1 hash, 1 BRIN) and 1,000,000 tuples in total. I am deleting all the tuples and then firing the vacuum command.
Run attached .sql file (test_16_indexes.sql)
$ ./psql postgres
postgres=# \i test_16_indexes.sql
Re-start the server and do vacuum.
Case 1) normal vacuum:
postgres=# vacuum test ;
VACUUM
Time: 115174.470 ms (01:55.174)
Case 2) parallel vacuum using 10 parallel workers:
postgres=# vacuum (parallel 10)test ;
ERROR: no unpinned buffers available
CONTEXT: parallel worker
postgres=#
This error is coming from the 128kB shared_buffers setting. I launched 10 parallel workers and all of them are working in parallel, so with such a small shared buffer pool I am getting this error.
Is this expected behavior with a small shared buffer size, or should we try to come up with a solution for this? Please let me know your thoughts.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sat, Nov 30, 2019 at 7:18 PM Sergei Kornilov <sk@zsrv.org> wrote:
Hello
Its possible to change order of index processing by parallel leader? In v35 patchset I see following order:
- start parallel processes
- leader and parallel workers process the index list and possibly skip some entries
- after that parallel leader recheck index list and process the skipped indexes
- WaitForParallelWorkersToFinish
I think it would be better to:
- start parallel processes
- parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true
- parallel workers processes indexes with skip_parallel_index_vacuum = false
- parallel leader start participate with remainings parallel-safe index processing
- WaitForParallelWorkersToFinish
This would be less running time and better load balance across leader and workers in case of few non-parallel and few parallel indexes.
Why do you think so? I think the advantage of the current approach is that once the parallel workers are launched, the leader can process indexes that don't support parallelism. So, both types of indexes can be processed at the same time.
Hi

> I think the advantage of the current approach is that once the parallel workers are launched, the leader can process indexes that don't support parallelism. So, both types of indexes can be processed at the same time.

In lazy_parallel_vacuum_or_cleanup_indexes I see:

    /*
     * Join as a parallel worker. The leader process alone does that in
     * case where no workers launched.
     */
    if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0)
        vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared,
                                         vacrelstats->dead_tuples);

    /*
     * Here, the indexes that had been skipped during parallel index vacuuming
     * are remaining. If there are such indexes the leader process does vacuum
     * or cleanup them one by one.
     */
    vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats,
                                      lps);

So the parallel leader will process the parallel indexes first, along with the parallel workers, and skip the non-parallel ones. Only after the end of the index list will the parallel leader process the non-parallel indexes one by one. In the case of equal index processing time, the parallel leader will process (count of parallel indexes)/(nworkers+1) + all non-parallel indexes, while each parallel worker will process (count of parallel indexes)/(nworkers+1). Am I wrong here?

regards, Sergei
On Sun, 1 Dec 2019 at 11:06, Sergei Kornilov <sk@zsrv.org> wrote: > > Hi > > > I think the advantage of the current approach is that once the parallel workers are launched, the leader can processindexes that don't support parallelism. So, both type of indexes can be processed at the same time. > > In lazy_parallel_vacuum_or_cleanup_indexes I see: > > /* > * Join as a parallel worker. The leader process alone does that in > * case where no workers launched. > */ > if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0) > vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared, > vacrelstats->dead_tuples); > > /* > * Here, the indexes that had been skipped during parallel index vacuuming > * are remaining. If there are such indexes the leader process does vacuum > * or cleanup them one by one. > */ > vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats, > lps); > > So parallel leader will process parallel indexes first along with parallel workers and skip non-parallel ones. Only afterend of the index list parallel leader will process non-parallel indexes one by one. In case of equal index processingtime parallel leader will process (count of parallel indexes)/(nworkers+1) + all non-parallel, while parallel workerswill process (count of parallel indexes)/(nworkers+1). I am wrong here? > I think I got your point. Your proposal is that it's more efficient if we make the leader process vacuum the index that can be processed only the leader process (i.e. indexes not supporting parallel index vacuum) while workers are processing indexes supporting parallel index vacuum, right? That way, we can process indexes in parallel as much as possible. So maybe we can call vacuum_or_cleanup_skipped_indexes first and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that there are parallel-safe remaining indexes after the leader finished vacuum_or_cleanup_indexes_worker, as described on your proposal. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 30 Nov 2019 at 04:06, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 29, 2019 at 7:11 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 28 Nov 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think it is probably because this part of the code doesn't consider > > > PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. I think if we want we > > > can change it but I am slightly nervous about the code complexity this > > > will bring but maybe that is fine. > > > > Right. I'll try to change so that. > > > > I am thinking that as PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION is > a debugging/testing facility, we should ideally separate this out from > the main patch. BTW, I am hacking/reviewing the patch further, so > request you to wait for a few day's time before we do anything in this > regard. Sure, thank you so much. I'll wait for your comments and reviewing. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 30 Nov 2019 at 22:11, Mahendra Singh <mahi6run@gmail.com> wrote: > > On Sat, 30 Nov 2019 at 19:18, Sergei Kornilov <sk@zsrv.org> wrote: >> >> Hello >> >> Its possible to change order of index processing by parallel leader? In v35 patchset I see following order: >> - start parallel processes >> - leader and parallel workers processed index lixt and possible skip some entries >> - after that parallel leader recheck index list and process the skipped indexes >> - WaitForParallelWorkersToFinish >> >> I think it would be better to: >> - start parallel processes >> - parallel leader goes through index list and process only indexes which are skip_parallel_index_vacuum = true >> - parallel workers processes indexes with skip_parallel_index_vacuum = false >> - parallel leader start participate with remainings parallel-safe index processing >> - WaitForParallelWorkersToFinish >> >> This would be less running time and better load balance across leader and workers in case of few non-parallel and fewparallel indexes. >> (if this is expected and required by some reason, we need a comment in code) >> >> Also few notes to vacuumdb: >> Seems we need version check at least in vacuum_one_database and prepare_vacuum_command. Similar to SKIP_LOCKED or DISABLE_PAGE_SKIPPINGfeatures. >> discussion question: difference between --parallel and --jobs parameters will be confusing? We need more description forthis options > > > While doing testing with different server configuration settings, I am getting error (ERROR: no unpinned buffers available)in parallel vacuum but normal vacuum is working fine. > > Test Setup: > max_worker_processes = 40 > autovacuum = off > shared_buffers = 128kB > max_parallel_workers = 40 > max_parallel_maintenance_workers = 40 > vacuum_cost_limit = 2000 > vacuum_cost_delay = 10 > > Table description: table have 16 indexes(14 btree, 1 hash, 1 BRIN ) and total 10,00,000 tuples and I am deleting all thetuples, then firing vacuum command. > Run attached .sql file (test_16_indexes.sql) > $ ./psql postgres > postgres=# \i test_16_indexes.sql > > Re-start the server and do vacuum. > Case 1) normal vacuum: > postgres=# vacuum test ; > VACUUM > Time: 115174.470 ms (01:55.174) > > Case 2) parallel vacuum using 10 parallel workers: > postgres=# vacuum (parallel 10)test ; > ERROR: no unpinned buffers available > CONTEXT: parallel worker > postgres=# > > This error is coming due to 128kB shared buffer. I think, I launched 10 parallel workers and all are working parallelingso due to less shared buffer, I am getting this error. > Thank you for testing! > Is this expected behavior with small shared buffer size or we should try to come with a solution for this. Please letme know your thoughts. I think it's normal behavior when the shared buffer is not enough. Since the total 10 processes were processing different pages at the same time and you set a small value to shared_buffers the shared buffer gets full easily. And you got the proper error. So I think in this case we should consider either to increase the shared buffer size or to decrease the parallel degree. I guess you can get this error even when you vacuum 10 different tables concurrently instead. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
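For rough arithmetic: shared_buffers = 128kB is only 16 buffers of 8kB each, while the leader plus 10 parallel workers each need at least one buffer pinned at any given moment (and usually more, e.g. for index pages and the visibility map), so exhausting the pool of unpinned buffers at this setting is not surprising. This is only an illustration of the scale involved, not an exact accounting of buffer pins.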
Hi

> I think I got your point. Your proposal is that it's more efficient if
> we make the leader process vacuum the index that can be processed only
> the leader process (i.e. indexes not supporting parallel index vacuum)
> while workers are processing indexes supporting parallel index vacuum,
> right? That way, we can process indexes in parallel as much as
> possible.

Right

> So maybe we can call vacuum_or_cleanup_skipped_indexes first
> and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> there are parallel-safe remaining indexes after the leader finished
> vacuum_or_cleanup_indexes_worker, as described on your proposal.

I meant that after processing the skipped indexes (those not supporting parallel index vacuum), the leader can start processing the indexes that do support parallel index vacuum, along with the parallel workers. That is, call vacuum_or_cleanup_skipped_indexes after starting the parallel workers but before vacuum_or_cleanup_indexes_worker, or something with a similar effect. If we have zero skipped indexes, the parallel vacuum will run as in the current implementation, with leader participation.

Sorry for my unclear English...

regards, Sergei
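To make the proposed ordering concrete, a rough sketch of the leader's call sequence, using the function names quoted above (this is only an illustration of the idea, not actual patch code), could look like:

    LaunchParallelWorkers(lps->pcxt, nworkers);

    /* 1. The leader first vacuums the indexes the workers cannot process. */
    vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats, lps);

    /* 2. Then the leader joins the workers on the parallel-safe indexes. */
    if (lps->leaderparticipates || lps->pcxt->nworkers_launched == 0)
        vacuum_or_cleanup_indexes_worker(Irel, nindexes, stats, lps->lvshared,
                                         vacrelstats->dead_tuples);

    WaitForParallelWorkersToFinish(lps->pcxt);

With zero skipped indexes step 1 would be a no-op and the behaviour would match the current implementation.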
On Sun, Dec 1, 2019 at 11:01 PM Sergei Kornilov <sk@zsrv.org> wrote: > > Hi > > > I think I got your point. Your proposal is that it's more efficient if > > we make the leader process vacuum the index that can be processed only > > the leader process (i.e. indexes not supporting parallel index vacuum) > > while workers are processing indexes supporting parallel index vacuum, > > right? That way, we can process indexes in parallel as much as > > possible. > > Right > > > So maybe we can call vacuum_or_cleanup_skipped_indexes first > > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that > > there are parallel-safe remaining indexes after the leader finished > > vacuum_or_cleanup_indexes_worker, as described on your proposal. > > I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexesthat support the parallel index vacuum, along with parallel workers. > Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_workeror something with similar effect. > If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 1, 2019 at 11:01 PM Sergei Kornilov <sk@zsrv.org> wrote:
Hi
> I think I got your point. Your proposal is that it's more efficient if
> we make the leader process vacuum the index that can be processed only
> the leader process (i.e. indexes not supporting parallel index vacuum)
> while workers are processing indexes supporting parallel index vacuum,
> right? That way, we can process indexes in parallel as much as
> possible.
Right
> So maybe we can call vacuum_or_cleanup_skipped_indexes first
> and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> there are parallel-safe remaining indexes after the leader finished
> vacuum_or_cleanup_indexes_worker, as described on your proposal.
I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexes that support the parallel index vacuum, along with parallel workers.
Your idea is good, but remember that we have always considered the leader as one worker if the leader can participate. If we do what you are suggesting, that won't be completely true, as the leader will not fully participate in the parallel vacuum. It might be that we don't consider the leader equivalent to one worker in the presence of indexes that don't support a parallel vacuum, but I am not sure that really matters much. I think overall it should not matter much because we won't have that many indexes that don't support a parallel vacuum.
On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote: > > Hi > > > I think I got your point. Your proposal is that it's more efficient if > > we make the leader process vacuum the index that can be processed only > > the leader process (i.e. indexes not supporting parallel index vacuum) > > while workers are processing indexes supporting parallel index vacuum, > > right? That way, we can process indexes in parallel as much as > > possible. > > Right > > > So maybe we can call vacuum_or_cleanup_skipped_indexes first > > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that > > there are parallel-safe remaining indexes after the leader finished > > vacuum_or_cleanup_indexes_worker, as described on your proposal. > > I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexesthat support the parallel index vacuum, along with parallel workers. > Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_workeror something with similar effect. > If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation. I think your idea might not work well in some cases. That is, I think there are some cases where it's better if leader participates to parallel vacuum as a worker as soon as possible especially if a table has many indexes that designedly don't support parallel vacuum (e.g. bulkdelete of brin and using VACUUM_OPTION_PARALLEL_COND_CLEANUP). Suppose the table has both 3 indexes that support parallel vacuum and takes time 5 sec, 10 sec and 10 sec to vacuum respectively and 3 indexes that don't support and takes 2 sec for each. In current patch we launch 2 workers. Then they take two indexes to vacuum and will take 5 sec and 10 sec. At the same time the leader processes 3 indexes that don't support parallel index and takes 6 sec. Therefore after the worker finishes its index it takes the next index and takes 10 sec more. The total execution time will be 15 sec. On the other hand, if the leader participated to parallel vacuum first the total execution time can be 11 sec (taking 5 sec and 2 sec * 3). It's just an example, I'm not saying your idea is bad. ISTM the idea is good on an assumption that all indexes take the same time or take a long time so I'd also like to consider if this is true even in production and which approaches is better if we don't have such assumption. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/27/19 11:13 PM, Masahiko Sawada wrote: > Thank you for reviewing this patch. All changes you made looks good to me. > > I thought I already have posted all v34 patches but didn't, sorry. So > I've attached v35 patch set that incorporated your changes and it > includes Dilip's patch for gist index (0001). These patches can be > applied on top of the current HEAD and make check should pass. > Regards, While doing testing of this feature against v35- patches ( minus 004) on Master , getting crash when user connect to server using single mode and try to perform vacuum (parallel 1 ) o/p tushar@localhost bin]$ ./postgres --single -D data/ postgres 2019-12-03 12:49:26.967 +0530 [70300] LOG: database system was interrupted; last known up at 2019-12-03 12:48:51 +0530 2019-12-03 12:49:26.987 +0530 [70300] LOG: database system was not properly shut down; automatic recovery in progress 2019-12-03 12:49:26.990 +0530 [70300] LOG: invalid record length at 0/29F1638: wanted 24, got 0 2019-12-03 12:49:26.990 +0530 [70300] LOG: redo is not required PostgreSQL stand-alone backend 13devel backend> backend> vacuum full; backend> vacuum (parallel 1); TRAP: FailedAssertion("IsUnderPostmaster", File: "dsm.c", Line: 444) ./postgres(ExceptionalCondition+0x53)[0x8c6fa3] ./postgres[0x785ced] ./postgres(GetSessionDsmHandle+0xca)[0x49304a] ./postgres(InitializeParallelDSM+0x74)[0x519d64] ./postgres(heap_vacuum_rel+0x18d3)[0x4e47e3] ./postgres[0x631d9a] ./postgres(vacuum+0x444)[0x632f14] ./postgres(ExecVacuum+0x2bb)[0x63369b] ./postgres(standard_ProcessUtility+0x4cf)[0x7b312f] ./postgres[0x7b02c6] ./postgres[0x7b0dd3] ./postgres(PortalRun+0x162)[0x7b1b02] ./postgres[0x7ad874] ./postgres(PostgresMain+0x1002)[0x7aebf2] ./postgres(main+0x1ce)[0x48188e] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4fe6908505] ./postgres[0x481b6a] Aborted (core dumped) -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
On Tue, Dec 3, 2019 at 12:55 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 11/27/19 11:13 PM, Masahiko Sawada wrote:
> Thank you for reviewing this patch. All changes you made looks good to me.
>
> I thought I already have posted all v34 patches but didn't, sorry. So
> I've attached v35 patch set that incorporated your changes and it
> includes Dilip's patch for gist index (0001). These patches can be
> applied on top of the current HEAD and make check should pass.
> Regards,
While doing testing of this feature against v35- patches ( minus 004) on
Master ,
Thanks for doing the testing of these patches.
getting crash when user connect to server using single mode and try to
perform vacuum (parallel 1 ) o/p
tushar@localhost bin]$ ./postgres --single -D data/ postgres
2019-12-03 12:49:26.967 +0530 [70300] LOG: database system was
interrupted; last known up at 2019-12-03 12:48:51 +0530
2019-12-03 12:49:26.987 +0530 [70300] LOG: database system was not
properly shut down; automatic recovery in progress
2019-12-03 12:49:26.990 +0530 [70300] LOG: invalid record length at
0/29F1638: wanted 24, got 0
2019-12-03 12:49:26.990 +0530 [70300] LOG: redo is not required
PostgreSQL stand-alone backend 13devel
backend>
backend> vacuum full;
backend> vacuum (parallel 1);
Parallel vacuum shouldn't be allowed in standalone backends, as we can't create DSM segments in that mode; the same is true for parallel query. It should internally proceed with a serial vacuum. I'll fix it in the next version I am planning to post. BTW, it seems that the same problem exists for parallel CREATE INDEX.
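A minimal sketch of such a fallback (illustrative only, assuming the convention used elsewhere in this thread that nworkers = -1 means the parallel option is disabled) would be a guard early in the vacuum path:

    /*
     * Parallel vacuum needs DSM, which is unavailable in a standalone
     * backend, so silently fall back to a serial vacuum there.
     */
    if (!IsUnderPostmaster && params->nworkers >= 0)
        params->nworkers = -1;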
On Tue, Dec 3, 2019 at 12:56 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hi
>
> > I think I got your point. Your proposal is that it's more efficient if
> > we make the leader process vacuum the index that can be processed only
> > the leader process (i.e. indexes not supporting parallel index vacuum)
> > while workers are processing indexes supporting parallel index vacuum,
> > right? That way, we can process indexes in parallel as much as
> > possible.
>
> Right
>
> > So maybe we can call vacuum_or_cleanup_skipped_indexes first
> > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
> > there are parallel-safe remaining indexes after the leader finished
> > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>
> I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processing indexes that support the parallel index vacuum, along with parallel workers.
> Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_worker or something with similar effect.
> If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation.
I think your idea might not work well in some cases.
Good point. I am also not sure whether it is a good idea to make the suggested change, but I think adding a comment along those lines is not a bad idea, which I have done in the attached patch.
I have made some other changes as well.
1.
+ if (VacuumSharedCostBalance != NULL)
  {
-     double      msec;
+     int         nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);
+
+     /* At least count itself */
+     Assert(nworkers >= 1);
+
+     /* Update the shared cost balance value atomically */
+     while (true)
+     {
+         uint32      shared_balance;
+         uint32      new_balance;
+         uint32      local_balance;
+
+         msec = 0;
+
+         /* compute new balance by adding the local value */
+         shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
+         new_balance = shared_balance + VacuumCostBalance;
+         /* also compute the total local balance */
+         local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
+
+         if ((new_balance >= VacuumCostLimit) &&
+             (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
+         {
+             /* compute sleep time based on the local cost balance */
+             msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
+             new_balance = shared_balance - VacuumCostBalanceLocal;
+             VacuumCostBalanceLocal = 0;
+         }
+
+         if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
+                                            &shared_balance,
+                                            new_balance))
+         {
+             /* Updated successfully, break */
+             break;
+         }
+     }
+
+     VacuumCostBalanceLocal += VacuumCostBalance;
I see multiple problems with this code. (a) if the VacuumSharedCostBalance is changed by the time of compare and exchange, then the next iteration might not compute the correct values as you might have reset VacuumCostBalanceLocal by that time. (b) In code line, new_balance = shared_balance - VacuumCostBalanceLocal, you need to use new_balance instead of shared_balance, otherwise, it won't account for the balance of the latest cycle. (c) In code line, msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;, I think you need to use local_balance for reasons similar to (b). (d) I think we can write this code with a lesser number of variables.
I have fixed all these problems and used a slightly different way to compute the parallel delay. See compute_parallel_delay() in the attached delta patch.
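To make the intended accounting easier to follow, here is a standalone model of the shared cost-balance loop written with plain C11 atomics. It is not the PostgreSQL code: the names (shared_balance, cost_limit, cost_delay, nworkers) are illustrative, and it only mirrors the compare-and-exchange retry plus the "sleep in proportion to your own local balance" rule that the review comments above describe.

    #include <stdatomic.h>
    #include <stdio.h>

    /* Models the shared balance: total unaccounted cost of all workers. */
    static atomic_uint shared_balance;

    /*
     * Add this worker's newly incurred cost to the shared balance and return
     * how many milliseconds it should sleep (0 if it should not sleep).
     * local_balance models the per-worker local balance.
     */
    static double
    add_cost_and_compute_delay(unsigned int *local_balance, unsigned int new_cost,
                               unsigned int cost_limit, double cost_delay,
                               unsigned int nworkers)
    {
        unsigned int old_val = atomic_load(&shared_balance);
        double       msec = 0;

        for (;;)
        {
            unsigned int total_local = *local_balance + new_cost;
            unsigned int new_val = old_val + new_cost;

            msec = 0;
            if (new_val >= cost_limit &&
                total_local > 0.5 * ((double) cost_limit / nworkers))
            {
                /* Sleep in proportion to this worker's own balance ... */
                msec = cost_delay * total_local / cost_limit;
                /* ... and give that share back to the shared balance. */
                new_val -= total_local;
            }

            /* Retry if another worker changed the shared balance meanwhile. */
            if (atomic_compare_exchange_weak(&shared_balance, &old_val, new_val))
                break;
        }

        if (msec > 0)
            *local_balance = 0;          /* we will sleep: reset local balance */
        else
            *local_balance += new_cost;  /* keep accumulating locally */

        return msec;
    }

    int
    main(void)
    {
        unsigned int local = 40;

        atomic_store(&shared_balance, 95);
        printf("sleep %.1f ms\n",
               add_cost_and_compute_delay(&local, 10, 100, 20.0, 4));
        return 0;
    }

Only the successful compare-and-exchange path touches the local balance, which addresses review comment (a); the sleep time and the amount subtracted are both computed from balances that include the latest cycle's cost, in the spirit of comments (b) and (c).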
2.
+ /* Setup the shared cost-based vacuum delay and launch workers*/
+ if (nworkers > 0)
+ {
+ /*
+ * Reset the local value so that we compute cost balance during
+ * parallel index vacuuming.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ LaunchParallelWorkers(lps->pcxt, nworkers);
+
+ /* Enable shared costing iff we process indexes in parallel. */
+ if (lps->pcxt->nworkers_launched > 0)
+ {
+ /* Enable shared cost balance */
+ VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
+ VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay.
+ */
+ pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
+ pg_atomic_write_u32(VacuumActiveNWorkers, 0);
This code has issues. We can't initialize VacuumSharedCostBalance/VacuumActiveNWorkers after launching workers as by that time some other worker would have changed its value. This has been reported offlist by Mahendra and I have fixed it.
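For clarity, the corrected ordering amounts to something like the following sketch (not the exact patch hunk): the shared atomics in the DSM must be written before LaunchParallelWorkers(), because a worker may start reading and updating them as soon as it is launched.

    /* Initialize the shared cost accounting before any worker can touch it. */
    pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
    pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);

    LaunchParallelWorkers(lps->pcxt, nworkers);

    /* Enable shared costing only if we actually got workers. */
    if (lps->pcxt->nworkers_launched > 0)
    {
        VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
        VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
    }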
3. Changed the names of functions that were too long; I think the new names are more meaningful. If you don't agree with these changes, then we can discuss it.
4. Changed the order of parameters in many functions to match with existing code.
5. Refactored the code at a few places so that it can be easy to follow.
6. Added/edited many comments and made other cosmetic changes.
You can find all these changes in v35-0003-Code-review-amit.patch.
Few other things, I would like you to consider.
1. I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly a debug/test aid. You can also fix the problem reported by Mahendra in that context.
2. I think if we can somehow disallow very small indexes from using parallel workers, then it will be better. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
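As a sketch of what point 2 could look like (illustrative only; whether min_parallel_index_scan_size is the right threshold is exactly the question being raised), the code that counts indexes eligible for parallel workers might skip small ones like this:

    /*
     * Treat an index as a candidate for a parallel worker only if it is at
     * least min_parallel_index_scan_size (both values are in blocks); smaller
     * indexes are left to the leader.
     */
    if (RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
        continue;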
Attachment
On Tue, Dec 3, 2019 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other things, I would like you to consider.
1. I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly a debug/test aid. You can also fix the problem reported by Mahendra in that context.
2. I think if we can somehow disallow very small indexes from using parallel workers, then it will be better. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
Forgot one minor point. Please run pgindent on all the patches.
--
On Tue, 3 Dec 2019 at 16:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 3, 2019 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few other things, I would like you to consider.
1. I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly a debug/test aid. You can also fix the problem reported by Mahendra in that context.
2. I think if we can somehow disallow very small indexes from using parallel workers, then it will be better. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum?
Forgot one minor point. Please run pgindent on all the patches.
While reviewing and testing the v35 patch set, I noticed some problems. Below are some comments:
1.
/*
+ * Since parallel workers cannot access data in temporary tables, parallel
+ * vacuum is not allowed for temporary relation. However rather than
+ * skipping vacuum on the table, just disabling parallel option is better
+ * option in most cases.
+ */
+ if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+ {
+ ereport(WARNING,
+ (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
+ RelationGetRelationName(onerel))));
+ params->nworkers = 0;
+ }
Here, I think we should set params->nworkers = -1 to disable parallel vacuum for temporary tables. I noticed that even after the warning, we were still doing the vacuum in parallel mode and launching parallel workers, which is wrong.
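In other words, the proposed change is just to the assignment in the hunk quoted above (a sketch, assuming -1 is what fully disables the parallel path):

    if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
    {
        ereport(WARNING,
                (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                        RelationGetRelationName(onerel))));
        params->nworkers = -1;      /* was 0, which still entered parallel mode */
    }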
2.
Amit suggested that I check the time taken by the vacuum.sql regression test.
vacuum ... ok 20684 ms ------- on top of the v35 patch set
vacuum ... ok 1020 ms ------- without the v35 patch set
Here we can see that the time taken by the vacuum test is increased significantly due to the parallel vacuum test cases, so I will try to come up with a smaller test case.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 3, 2019 at 12:56 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: >> >> On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <sk@zsrv.org> wrote: >> > >> > Hi >> > >> > > I think I got your point. Your proposal is that it's more efficient if >> > > we make the leader process vacuum the index that can be processed only >> > > the leader process (i.e. indexes not supporting parallel index vacuum) >> > > while workers are processing indexes supporting parallel index vacuum, >> > > right? That way, we can process indexes in parallel as much as >> > > possible. >> > >> > Right >> > >> > > So maybe we can call vacuum_or_cleanup_skipped_indexes first >> > > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that >> > > there are parallel-safe remaining indexes after the leader finished >> > > vacuum_or_cleanup_indexes_worker, as described on your proposal. >> > >> > I meant that after processing missing indexes (not supporting parallel index vacuum), the leader can start processingindexes that support the parallel index vacuum, along with parallel workers. >> > Exactly call vacuum_or_cleanup_skipped_indexes after start parallel workers but before vacuum_or_cleanup_indexes_workeror something with similar effect. >> > If we have 0 missed indexes - parallel vacuum will run as in current implementation, with leader participation. >> >> I think your idea might not work well in some cases. > > > Good point. I am also not sure whether it is a good idea to make the suggested change, but I think adding a comment onthose lines is not a bad idea which I have done in the attached patch. Thank you for updating the patch! > > I have made some other changes as well. > 1. > + if (VacuumSharedCostBalance != NULL) > { > - double msec; > + int nworkers = pg_atomic_read_u32 > (VacuumActiveNWorkers); > + > + /* At least count itself */ > + Assert(nworkers >= 1); > + > + /* Update the shared cost > balance value atomically */ > + while (true) > + { > + uint32 shared_balance; > + uint32 new_balance; > + > uint32 local_balance; > + > + msec = 0; > + > + /* compute new balance by adding the local value */ > + > shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance); > + new_balance = shared_balance + VacuumCostBalance; > + > /* also compute the total local balance */ > + local_balance = VacuumCostBalanceLocal + VacuumCostBalance; > + > + > if ((new_balance >= VacuumCostLimit) && > + (local_balance > 0.5 * (VacuumCostLimit / nworkers))) > + { > + > /* compute sleep time based on the local cost balance */ > + msec = VacuumCostDelay * > VacuumCostBalanceLocal / VacuumCostLimit; > + new_balance = shared_balance - VacuumCostBalanceLocal; > + > VacuumCostBalanceLocal = 0; > + } > + > + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance, > + > &shared_balance, > + > new_balance)) > + { > + /* Updated successfully, break */ > + > break; > + } > + } > + > + VacuumCostBalanceLocal += VacuumCostBalance; > > I see multiple problems with this code. (a) if the VacuumSharedCostBalance is changed by the time of compare and exchange,then the next iteration might not compute the correct values as you might have reset VacuumCostBalanceLocal by thattime. (b) In code line, new_balance = shared_balance - VacuumCostBalanceLocal, you need to use new_balance instead ofshared_balance, otherwise, it won't account for the balance of the latest cycle. 
(c) In code line, msec = VacuumCostDelay* VacuumCostBalanceLocal / VacuumCostLimit;, I think you need to use local_balance for reasons similar to(b). (d) I think we can write this code with a lesser number of variables. In your code, I think if two workers enter to compute_parallel_delay function at the same time, they add their local balance to VacuumSharedCostBalance and both workers sleep because both values reach the VacuumCostLimit. But either one worker should not sleep in this case. > > I have fixed all these problems and used a slightly different way to compute the parallel delay. See compute_parallel_delay()in the attached delta patch. > > 2. > + /* Setup the shared cost-based vacuum delay and launch workers*/ > + if (nworkers > 0) > + { > + /* > + * Reset the local value so that we compute cost balance during > + * parallel index vacuuming. > + */ > + VacuumCostBalance = 0; > + VacuumCostBalanceLocal = 0; > + > + LaunchParallelWorkers(lps->pcxt, nworkers); > + > + /* Enable shared costing iff we process indexes in parallel. */ > + if (lps->pcxt->nworkers_launched > 0) > + { > + /* Enable shared cost balance */ > + VacuumSharedCostBalance = &(lps->lvshared->cost_balance); > + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers); > + > + /* > + * Set up shared cost balance and the number of active workers for > + * vacuum delay. > + */ > + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance); > + pg_atomic_write_u32(VacuumActiveNWorkers, 0); > > This code has issues. We can't initialize VacuumSharedCostBalance/VacuumActiveNWorkers after launching workers as by thattime some other worker would have changed its value. This has been reported offlist by Mahendra and I have fixed it. > > 3. Changed the name of functions which were too long and I think new names are more meaningful. If you don't agree withthese changes, then we can discuss it. > > 4. Changed the order of parameters in many functions to match with existing code. > > 5. Refactored the code at a few places so that it can be easy to follow. > > 6. Added/Edited many comments and other cosmetic changes. > > You can find all these changes in v35-0003-Code-review-amit.patch. I've confirmed these changes and these look good to me. > Few other things, I would like you to consider. > 1. I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainly adebug/test aid. You can also fix the problem reported by Mahendra in that context. Agreed. I'll create a patch for disable_parallel_leader_participation. > 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum? I think it's a good idea but I'm concerned that the default value of min_parallel_index_scan_size, 512kB, is too small for parallel vacuum purpose. Given that people who want to use parallel vacuum are likely to have a big table the indexes that can be skipped by the default value would be only brin indexes, I think. Also I guess that the reason why the default value is small is that min_parallel_index_scan_size compares to the number of blocks being scanned during index scan, not whole index. On the other hand in parallel vacuum we will compare it to the whole index blocks because the index vacuuming is always full scan. So I'm also concerned that user will get confused about reasonable setting. 
As another idea how about using min_parallel_table_scan_size instead? That is, we cannot do parallel vacuum on the table smaller than that value. I think this idea had already been proposed once in this thread but now I think it's also a good idea. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 3 Dec 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 3, 2019 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> Few other things, I would like you to consider. >> 1. I think disable_parallel_leader_participation related code can be extracted into a separate patch as it is mainlya debug/test aid. You can also fix the problem reported by Mahendra in that context. >> >> 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum? > > > Forgot one minor point. Please run pgindent on all the patches. Got it. I will run pgindent before sending patch from next time. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > In your code, I think if two workers enter to compute_parallel_delay > function at the same time, they add their local balance to > VacuumSharedCostBalance and both workers sleep because both values > reach the VacuumCostLimit. > True, but isn't it more appropriate because the local cost of any worker should be ideally added to shared cost as soon as it occurred? I mean to say that we are not adding any cost in shared balance without actually incurring it. Then we also consider the individual worker's local balance as well and sleep according to local balance. > > > 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better. Can we use min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum? > > I think it's a good idea but I'm concerned that the default value of > min_parallel_index_scan_size, 512kB, is too small for parallel vacuum > purpose. Given that people who want to use parallel vacuum are likely > to have a big table the indexes that can be skipped by the default > value would be only brin indexes, I think. > Yeah or probably hash indexes in some cases. > Also I guess that the > reason why the default value is small is that > min_parallel_index_scan_size compares to the number of blocks being > scanned during index scan, not whole index. On the other hand in > parallel vacuum we will compare it to the whole index blocks because > the index vacuuming is always full scan. So I'm also concerned that > user will get confused about reasonable setting. > This setting is about how much of index we are going to scan, so I am not sure if it matters whether it is part or full index scan. Also, in an index scan, we will launch multiple workers to scan that index and here we will consider launching just one worker. > As another idea how about using min_parallel_table_scan_size instead? > Hmm, yeah, that can be another option, but it might not be a good idea for partial indexes. > That is, we cannot do parallel vacuum on the table smaller than that > value. > Yeah, that makes sense, but I feel if we can directly target index scan size that may be a better option. If we can't use min_parallel_index_scan_size, then we can consider this. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > In your code, I think if two workers enter to compute_parallel_delay > > function at the same time, they add their local balance to > > VacuumSharedCostBalance and both workers sleep because both values > > reach the VacuumCostLimit. > > > > True, but isn't it more appropriate because the local cost of any > worker should be ideally added to shared cost as soon as it occurred? > I mean to say that we are not adding any cost in shared balance > without actually incurring it. Then we also consider the individual > worker's local balance as well and sleep according to local balance. Even I think it is better to add the balance to the shared balance at the earliest opportunity. Just consider the case that there are 5 workers and all have I/O balance of 20, and VacuumCostLimit is 50. So Actually, there combined balance is 100 (which is double of the VacuumCostLimit) but if we don't add immediately then none of the workers will sleep and it may go to the next cycle which is not very good. OTOH, if we add 20 immediately then check the shared balance then all the workers might go for sleep if their local balances have reached the limit but they will only sleep in proportion to their local balance. So IMHO, adding the current balance to shared balance early is more close to the model we are trying to implement i.e. shared cost accounting. > > > > > > 2. I think if we cam somehow disallow very small indexes to use parallel workers, then it will be better. Can weuse min_parallel_index_scan_size to decide whether a particular index can participate in a parallel vacuum? > > > > I think it's a good idea but I'm concerned that the default value of > > min_parallel_index_scan_size, 512kB, is too small for parallel vacuum > > purpose. Given that people who want to use parallel vacuum are likely > > to have a big table the indexes that can be skipped by the default > > value would be only brin indexes, I think. > > > > Yeah or probably hash indexes in some cases. > > > Also I guess that the > > reason why the default value is small is that > > min_parallel_index_scan_size compares to the number of blocks being > > scanned during index scan, not whole index. On the other hand in > > parallel vacuum we will compare it to the whole index blocks because > > the index vacuuming is always full scan. So I'm also concerned that > > user will get confused about reasonable setting. > > > > This setting is about how much of index we are going to scan, so I am > not sure if it matters whether it is part or full index scan. Also, > in an index scan, we will launch multiple workers to scan that index > and here we will consider launching just one worker. > > > As another idea how about using min_parallel_table_scan_size instead? > > > > Hmm, yeah, that can be another option, but it might not be a good idea > for partial indexes. > > > That is, we cannot do parallel vacuum on the table smaller than that > > value. > > > > Yeah, that makes sense, but I feel if we can directly target index > scan size that may be a better option. If we can't use > min_parallel_index_scan_size, then we can consider this. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 4, 2019 at 2:01 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 3 Dec 2019 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Forgot one minor point. Please run pgindent on all the patches. > > Got it. I will run pgindent before sending patch from next time. > Today, I again read the patch and found a few more minor comments: 1. void -LaunchParallelWorkers(ParallelContext *pcxt) +LaunchParallelWorkers(ParallelContext *pcxt, int nworkers) I think we should add a comment for this API change which should indicate why we need to pass nworkers as an additional parameter when the context itself contains information about the number of workers. 2. At the beginning of a lazy vacuum (at lazy_scan_heap) we + * prepare the parallel context and initialize the DSM segment that contains + * shared information as well as the memory space for storing dead tuples. + * When starting either index vacuuming or index cleanup, we launch parallel + * worker processes. Once all indexes are processed the parallel worker + * processes exit. And then the leader process re-initializes the parallel + * context so that it can use the same DSM for multiple passses of index + * vacuum and for performing index cleanup. a. /And then the leader/After that, the leader .. This will avoid using 'and' two times in this sentence. b. typo, /passses/passes 3. + * Macro to check if we are in a parallel lazy vacuum. If true, we are + * in the parallel mode and prepared the DSM segment. How about changing it slightly as /and prepared the DSM segment./ and the DSM segment is initialized.? 4. - /* non-export function prototypes */ static void lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, Relation *Irel, int nindexes, bool aggressive); Spurious change, please remove. I think this is done by me in one of the versions. 5. + * function we exit from parallel mode. Index bulk-deletion results are + * stored in the DSM segment and update index statistics as a whole after + * exited from parallel mode since all writes are not allowed during parallel + * mode. Can we slightly change the above sentence as "Index bulk-deletion results are stored in the DSM segment and we update index statistics as a whole after exited from parallel mode since writes are not allowed during the parallel mode."? 6. /* + * Reset the local value so that we compute cost balance during + * parallel index vacuuming. + */ This comment is a bit unclear. How about "Reset the local cost values for leader backend as we have already accumulated the remaining balance of heap."? 7. + /* Do vacuum or cleanup one index */ How about changing it as: Do vacuum or cleanup of the index? 8. The copying the result normally + * happens only after the first time of index vacuuming. /The copying the ../The copying of the 9. + /* + * no longer need the locally allocated result and now + * stats[idx] points to the DSM segment. + */ How about changing it as below: "Now that the stats[idx] points to the DSM segment, we don't need the locally allocated results." -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 4 Dec 2019 at 04:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > In your code, I think if two workers enter to compute_parallel_delay > > > function at the same time, they add their local balance to > > > VacuumSharedCostBalance and both workers sleep because both values > > > reach the VacuumCostLimit. > > > > > > > True, but isn't it more appropriate because the local cost of any > > worker should be ideally added to shared cost as soon as it occurred? > > I mean to say that we are not adding any cost in shared balance > > without actually incurring it. Then we also consider the individual > > worker's local balance as well and sleep according to local balance. > > Even I think it is better to add the balance to the shared balance at > the earliest opportunity. Just consider the case that there are 5 > workers and all have I/O balance of 20, and VacuumCostLimit is 50. So > Actually, there combined balance is 100 (which is double of the > VacuumCostLimit) but if we don't add immediately then none of the > workers will sleep and it may go to the next cycle which is not very > good. OTOH, if we add 20 immediately then check the shared balance > then all the workers might go for sleep if their local balances have > reached the limit but they will only sleep in proportion to their > local balance. So IMHO, adding the current balance to shared balance > early is more close to the model we are trying to implement i.e. > shared cost accounting. I agree to add the balance as soon as it occurred. But the problem I'm concerned is, let's suppose we have 4 workers, the cost limit is 100 and the shared balance is now 95. Two workers, whom local balance(VacuumCostBalanceLocal) are 40, consumed I/O, added 10 to theirs local balance and entered compute_parallel_delay function at the same time. One worker adds 10 to the shared balance(VacuumSharedCostBalance) and another worker also adds 10 to the shared balance. The one worker then subtracts the local balance from the shared balance and sleeps because the shared cost is now 115 (> the cost limit) and its local balance is 50 (> 0.5*(100/4)). Even another worker also does the same for the same reason. On the other hand if two workers do that serially, only one worker sleeps and another worker doesn't because the total shared cost will be 75 when the later worker enters the condition. At first glance it looks like a concurrency problem but is that expected behaviour? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 21, 2019 at 12:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I > am a bit doubtful about this kind of arrangement, where the code in > the "if" is always unreachable with the current AMs. I am not sure > what is the best way to handle this, should we just drop the > amestimateparallelvacuum altogether? Because currently, we are just > providing a size estimate function without a copy function, even if > the in future some Am give an estimate about the size of the stats, we > can not directly memcpy the stat from the local memory to the shared > memory, we might then need a copy function also from the AM so that it > can flatten the stats and store in proper format? I agree that it's a crock to add an AM method that is never used for anything. That's just asking for the design to prove buggy and inadequate. One way to avoid this would be to require that every AM that wants to support parallel vacuuming supply this method, and if it wants to just return sizeof(IndexBulkDeleteResult), then it can. But I also think someone should modify one of the AMs to use a differently-sized object, and then see whether they can really make parallel vacuum work with this patch. If, as you speculated here, it needs another API, then we should add both of them or neither. A half-baked solution is worse than nothing at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > It's just an example, I'm not saying your idea is bad. ISTM the idea > is good on an assumption that all indexes take the same time or take a > long time so I'd also like to consider if this is true even in > production and which approaches is better if we don't have such > assumption. I think his idea is good. You're not wrong when you say that there are cases where it could work out badly, but I think on the whole it's a clear improvement. Generally, the indexes should be of relatively similar size because index size is driven by table size; it's true that different AMs could result in different-size indexes, but it seems like a stretch to suppose that the indexes that don't support parallelism are also going to be the little tiny ones that go fast anyway, unless we have some evidence that this is really true. I also wonder whether we really need the option to disable parallel vacuum in the first place. Maybe I'm looking in the right place, but I'm not finding anything in the way of comments or documentation explaining why some AMs don't support it. It's an important design point, and should be documented. I also think PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION seems like a waste of space. For parallel queries, there is a trade-off between having the leader do work (which can speed up the query) and having it remain idle so that it can immediately absorb tuples from workers and keep them from having their tuple queues fill up (which can speed up the query). But here, at least as I understand it, there's no such trade-off. Having the leader fail to participate is just a loser. Maybe it's useful to test while debugging the patch, but why should the committed code support it? To respond to another point from a different part of the email chain, the reason why LaunchParallelWorkers() does not take an argument for the number of workers is because I believed that the caller should always know how many workers they're going to want at the time they CreateParallelContext(). Increasing it later is not possible, because the DSM has already sized based on the count provided. I grant that it would be possible to allow the number to be reduced later, but why would you want to do that? Why not get the number right when first creating the DSM? Is there any legitimate use case for parallel vacuum in combination with vacuum cost delay? As I understand it, any serious vacuuming is going to be I/O-bound, so can you really need multiple workers at the same time that you are limiting the I/O rate? Perhaps it's possible if the I/O limit is so high that a single worker can't hit the limit by itself, but multiple workers can, but it seems like a bad idea to spawn more workers and then throttle them rather than just starting fewer workers. In any case, the algorithm suggested in vacuumlazy.c around the definition of VacuumSharedCostBalance seems almost the opposite of what you probably want. The idea there seems to be that you shouldn't make a worker sleep if it hasn't actually got to do anything. Apparently the idea is that if you have 3 workers and you only have enough I/O rate for 1 worker, you want all 3 workers to run at once, so that the I/O is random, rather than having them run 1 at a time, so that the I/O is sequential. That seems counterintuitive. It could be right if the indexes are in different tablespaces, but if they are in the same tablespace it's probably wrong. 
I guess it could still be right if there's just so much I/O that you aren't going to run out ever, and the more important consideration is that you don't know which index will take longer to vacuum and so want to start them all at the same time so that you don't accidentally start the slow one last, but that sounds like a stretch. I think this whole area needs more thought. I feel like we're trying to jam a go-slower feature and a go-faster feature into the same box. + * vacuum and for performing index cleanup. Note that all parallel workers + * live during either index vacuuming or index cleanup but the leader process + * neither exits from the parallel mode nor destroys the parallel context. + * For updating the index statistics, since any updates are not allowed during + * parallel mode we update the index statistics after exited from the parallel The first of these sentences ("Note that all...") is not very clear to me, and seems like it may amount to a statement that the leader doesn't try to destroy the parallel context too early, but since I don't understand it, maybe that's not what it is saying. The second sentence needs exited -> exiting, and maybe some more work on the grammar, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Dec 5, 2019 at 12:21 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 4 Dec 2019 at 04:57, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Dec 4, 2019 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Dec 4, 2019 at 1:58 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > In your code, I think if two workers enter to compute_parallel_delay > > > > function at the same time, they add their local balance to > > > > VacuumSharedCostBalance and both workers sleep because both values > > > > reach the VacuumCostLimit. > > > > > > > > > > True, but isn't it more appropriate because the local cost of any > > > worker should be ideally added to shared cost as soon as it occurred? > > > I mean to say that we are not adding any cost in shared balance > > > without actually incurring it. Then we also consider the individual > > > worker's local balance as well and sleep according to local balance. > > > > Even I think it is better to add the balance to the shared balance at > > the earliest opportunity. Just consider the case that there are 5 > > workers and all have I/O balance of 20, and VacuumCostLimit is 50. So > > Actually, there combined balance is 100 (which is double of the > > VacuumCostLimit) but if we don't add immediately then none of the > > workers will sleep and it may go to the next cycle which is not very > > good. OTOH, if we add 20 immediately then check the shared balance > > then all the workers might go for sleep if their local balances have > > reached the limit but they will only sleep in proportion to their > > local balance. So IMHO, adding the current balance to shared balance > > early is more close to the model we are trying to implement i.e. > > shared cost accounting. > > I agree to add the balance as soon as it occurred. But the problem I'm > concerned is, let's suppose we have 4 workers, the cost limit is 100 > and the shared balance is now 95. Two workers, whom local > balance(VacuumCostBalanceLocal) are 40, consumed I/O, added 10 to > theirs local balance and entered compute_parallel_delay function at > the same time. One worker adds 10 to the shared > balance(VacuumSharedCostBalance) and another worker also adds 10 to > the shared balance. The one worker then subtracts the local balance > from the shared balance and sleeps because the shared cost is now 115 > (> the cost limit) and its local balance is 50 (> 0.5*(100/4)). Even > another worker also does the same for the same reason. On the other > hand if two workers do that serially, only one worker sleeps and > another worker doesn't because the total shared cost will be 75 when > the later worker enters the condition. At first glance it looks like a > concurrency problem but is that expected behaviour? If both workers sleep then the remaining shared balance will be 15 and their local balances will be 0. OTOH if one worker sleep then the remaining shared balance will be 75, so the second worker has missed this sleep cycle but on the next opportunity when the shared value again reaches 100 and if the second worker performs more I/O it will sleep for a longer duration. Even if we add it to the shared balance later (like you were doing earlier) then also we can reproduce the similar behavior, suppose shared balance is 85 and both workers have local balance 40 each. Now, each worker has done the I/O of 10. 
Now, suppose we don't add to the shared balance immediately; then both workers will see the balance as 85+10 = 95, so neither of them will sleep. OTOH, if they do it serially, the first worker will add 10 and make it 95, and then the second worker will locally check 95+10, which is more than 100, and it will sleep. Right? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
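For reference, the add-then-check accounting this exchange describes amounts to roughly the sketch below. The Vacuum* names follow the patch (the shared counters live in DSM there); the exact thresholds and bookkeeping in the patch may differ from this approximation.

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"

/* In the patch these live in the DSM segment; shown as plain pointers here. */
static pg_atomic_uint32 *VacuumSharedCostBalance;
static pg_atomic_uint32 *VacuumActiveNWorkers;
static int	VacuumCostBalanceLocal;

static void
parallel_vacuum_delay_sketch(void)
{
	uint32		shared_balance;
	int			nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);

	/* Publish the cost incurred since the last check as soon as it happens. */
	shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
											 VacuumCostBalance);
	VacuumCostBalanceLocal += VacuumCostBalance;
	VacuumCostBalance = 0;

	/*
	 * Sleep only when the shared balance has reached the limit AND this
	 * worker has itself contributed at least roughly a proportional share,
	 * so a worker that did little I/O is not punished for the others' work.
	 */
	if (nworkers > 0 &&
		shared_balance >= (uint32) VacuumCostLimit &&
		VacuumCostBalanceLocal > 0.5 * ((double) VacuumCostLimit / nworkers))
	{
		double		msec;

		/* Give back only what this worker has accounted for, then sleep. */
		pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
								VacuumCostBalanceLocal);
		msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
		pg_usleep((long) (msec * 1000));
		VacuumCostBalanceLocal = 0;
	}
}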
On Thu, Dec 5, 2019 at 1:41 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > It's just an example, I'm not saying your idea is bad. ISTM the idea > > is good on an assumption that all indexes take the same time or take a > > long time so I'd also like to consider if this is true even in > > production and which approaches is better if we don't have such > > assumption. > > I think his idea is good. You're not wrong when you say that there are > cases where it could work out badly, but I think on the whole it's a > clear improvement. Generally, the indexes should be of relatively > similar size because index size is driven by table size; it's true > that different AMs could result in different-size indexes, but it > seems like a stretch to suppose that the indexes that don't support > parallelism are also going to be the little tiny ones that go fast > anyway, unless we have some evidence that this is really true. I also > wonder whether we really need the option to disable parallel vacuum in > the first place. > I think it could be required for the cases where the AM doesn't have a way (or it is difficult to come up with a way) to communicate the stats allocated by the first ambulkdelete call to the subsequent ones until amvacuumcleanup. Currently, we have such a case for the Gist index, see email thread [1]. Though we have come up with a way to avoid that for Gist indexes, I am not sure if we can assume that it is the case for any possible index AM especially when there is a provision that indexAM can have additional stats information. In the worst case, if we have to modify some existing index AM like we did for the Gist index, we need such a provision so that it is possible. In the ideal case, the index AM should provide a way to copy such stats, but we can't assume that, so we come up with this option. We have used this for dummy_index_am which also provides a way to test it. > Maybe I'm looking in the right place, but I'm not > finding anything in the way of comments or documentation explaining > why some AMs don't support it. It's an important design point, and > should be documented. > Agreed. We should do this. > I also think PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION seems like a > waste of space. For parallel queries, there is a trade-off between > having the leader do work (which can speed up the query) and having it > remain idle so that it can immediately absorb tuples from workers and > keep them from having their tuple queues fill up (which can speed up > the query). But here, at least as I understand it, there's no such > trade-off. Having the leader fail to participate is just a loser. > Maybe it's useful to test while debugging the patch, > Yeah, it is primarily a debugging/testing aid patch and it helped us in discovering some issues. During development, it is being used for tesing purpose as well. This is the reason the code is under #ifdef > but why should > the committed code support it? > I am also not sure whether we should commit this part of code and that is why I told in one of the above emails to keep it as a separate patch. We can later see whether to commit this code. Now, the point in its favor is that we already have a similar define (DISABLE_LEADER_PARTICIPATION) for parallel create index, so having it here is not a bad idea. I think it might help us in debugging some bugs where we want forcefully the index to be vacuumed by some worker. 
We might want to have something like force_parallel_mode for testing/debugging purpose, but not sure which is better. I think having something as a debugging aid for such features is good. > To respond to another point from a different part of the email chain, > the reason why LaunchParallelWorkers() does not take an argument for > the number of workers is because I believed that the caller should > always know how many workers they're going to want at the time they > CreateParallelContext(). Increasing it later is not possible, because > the DSM has already sized based on the count provided. I grant that it > would be possible to allow the number to be reduced later, but why > would you want to do that? Why not get the number right when first > creating the DSM? > Here, we have a need to reduce the number of workers. Index Vacuum has two different phases (index vacuum and index cleanup) which uses the same parallel-context/DSM but both could have different requirements for workers. The second phase (cleanup) would normally need fewer workers as if the work is done in the first phase, second wouldn't need it, but we have exceptions like gin indexes where we need it for the second phase as well because it takes the pass over-index again even if we have cleaned the index in the first phase. Now, consider the case where we have 3 btree indexes and 2 gin indexes, we would need 5 workers for index vacuum phase and 2 workers for index cleanup phase. There are other cases too. We also considered to have a separate DSM for each phase, but that appeared to have overhead without much benefit. > Is there any legitimate use case for parallel vacuum in combination > with vacuum cost delay? > Yeah, we also initially thought that it is not legitimate to use a parallel vacuum with a cost delay. But to get a wider view, we started a separate thread [2] and there we reach to the conclusion that we need a solution for throttling [3]. > > + * vacuum and for performing index cleanup. Note that all parallel workers > + * live during either index vacuuming or index cleanup but the leader process > + * neither exits from the parallel mode nor destroys the parallel context. > + * For updating the index statistics, since any updates are not allowed during > + * parallel mode we update the index statistics after exited from the parallel > > The first of these sentences ("Note that all...") is not very clear to > me, and seems like it may amount to a statement that the leader > doesn't try to destroy the parallel context too early, but since I > don't understand it, maybe that's not what it is saying. > Your understanding is correct. How about if we modify it to something like: "Note that parallel workers are alive only during index vacuum or index cleanup but the leader process neither exits from the parallel mode nor destroys the parallel context until the entire parallel operation is finished." OR something like "The leader backend holds the parallel context till the index vacuum and cleanup is finished. Both index vacuum and cleanup separately perform the work with parallel workers." -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 5, 2019 at 1:41 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > It's just an example, I'm not saying your idea is bad. ISTM the idea > > is good on an assumption that all indexes take the same time or take a > > long time so I'd also like to consider if this is true even in > > production and which approaches is better if we don't have such > > assumption. > > I think his idea is good. You're not wrong when you say that there are > cases where it could work out badly, but I think on the whole it's a > clear improvement. Generally, the indexes should be of relatively > similar size because index size is driven by table size; it's true > that different AMs could result in different-size indexes, but it > seems like a stretch to suppose that the indexes that don't support > parallelism are also going to be the little tiny ones that go fast > anyway, unless we have some evidence that this is really true. I also > wonder whether we really need the option to disable parallel vacuum in > the first place. Maybe I'm looking in the right place, but I'm not > finding anything in the way of comments or documentation explaining > why some AMs don't support it. It's an important design point, and > should be documented. > > I also think PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION seems like a > waste of space. For parallel queries, there is a trade-off between > having the leader do work (which can speed up the query) and having it > remain idle so that it can immediately absorb tuples from workers and > keep them from having their tuple queues fill up (which can speed up > the query). But here, at least as I understand it, there's no such > trade-off. Having the leader fail to participate is just a loser. > Maybe it's useful to test while debugging the patch, but why should > the committed code support it? > > To respond to another point from a different part of the email chain, > the reason why LaunchParallelWorkers() does not take an argument for > the number of workers is because I believed that the caller should > always know how many workers they're going to want at the time they > CreateParallelContext(). Increasing it later is not possible, because > the DSM has already sized based on the count provided. I grant that it > would be possible to allow the number to be reduced later, but why > would you want to do that? Why not get the number right when first > creating the DSM? > > Is there any legitimate use case for parallel vacuum in combination > with vacuum cost delay? As I understand it, any serious vacuuming is > going to be I/O-bound, so can you really need multiple workers at the > same time that you are limiting the I/O rate? Perhaps it's possible if > the I/O limit is so high that a single worker can't hit the limit by > itself, but multiple workers can, but it seems like a bad idea to > spawn more workers and then throttle them rather than just starting > fewer workers. I agree that there is no point is first to spawn more workers to get the work done faster and later throttle them. Basically, that will lose the whole purpose of running it in parallel. OTOH, we should also consider the cases where there could be some vacuum that may not hit the I/O limit right? because it may find all the pages in the shared buffers and they might not need to dirty a lot of pages. So I think for such cases it is advantageous to run in parallel. 
The problem is that there is no way to know in advance whether the total I/O for the vacuum will hit the I/O limit or not so we can not decide in advance whether to run it in parallel or not. In any case, the algorithm suggested in vacuumlazy.c > around the definition of VacuumSharedCostBalance seems almost the > opposite of what you probably want. The idea there seems to be that > you shouldn't make a worker sleep if it hasn't actually got to do > anything. Apparently the idea is that if you have 3 workers and you > only have enough I/O rate for 1 worker, you want all 3 workers to run > at once, so that the I/O is random, rather than having them run 1 at a > time, so that the I/O is sequential. That seems counterintuitive. It > could be right if the indexes are in different tablespaces, but if > they are in the same tablespace it's probably wrong. I guess it could > still be right if there's just so much I/O that you aren't going to > run out ever, and the more important consideration is that you don't > know which index will take longer to vacuum and so want to start them > all at the same time so that you don't accidentally start the slow one > last, but that sounds like a stretch. I think this whole area needs > more thought. I feel like we're trying to jam a go-slower feature and > a go-faster feature into the same box. > > + * vacuum and for performing index cleanup. Note that all parallel workers > + * live during either index vacuuming or index cleanup but the leader process > + * neither exits from the parallel mode nor destroys the parallel context. > + * For updating the index statistics, since any updates are not allowed during > + * parallel mode we update the index statistics after exited from the parallel > > The first of these sentences ("Note that all...") is not very clear to > me, and seems like it may amount to a statement that the leader > doesn't try to destroy the parallel context too early, but since I > don't understand it, maybe that's not what it is saying. The second > sentence needs exited -> exiting, and maybe some more work on the > grammar, too. > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 5, 2019 at 10:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 5, 2019 at 1:41 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Mon, Dec 2, 2019 at 2:26 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > It's just an example, I'm not saying your idea is bad. ISTM the idea > > > is good on an assumption that all indexes take the same time or take a > > > long time so I'd also like to consider if this is true even in > > > production and which approaches is better if we don't have such > > > assumption. > > > > I think his idea is good. You're not wrong when you say that there are > > cases where it could work out badly, but I think on the whole it's a > > clear improvement. Generally, the indexes should be of relatively > > similar size because index size is driven by table size; it's true > > that different AMs could result in different-size indexes, but it > > seems like a stretch to suppose that the indexes that don't support > > parallelism are also going to be the little tiny ones that go fast > > anyway, unless we have some evidence that this is really true. I also > > wonder whether we really need the option to disable parallel vacuum in > > the first place. > > > > I think it could be required for the cases where the AM doesn't have a > way (or it is difficult to come up with a way) to communicate the > stats allocated by the first ambulkdelete call to the subsequent ones > until amvacuumcleanup. Currently, we have such a case for the Gist > index, see email thread [1]. > oops, I had referred to a couple of other discussions in my reply but forgot to mention the links, doing it now. [1] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0%2Bw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/20191106022550.zq7nai2ct2ashegq%40alap3.anarazel.de -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 5, 2019 at 12:54 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Nov 21, 2019 at 12:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > In v33-0001-Add-index-AM-field-and-callback-for-parallel-ind patch, I > > am a bit doubtful about this kind of arrangement, where the code in > > the "if" is always unreachable with the current AMs. I am not sure > > what is the best way to handle this, should we just drop the > > amestimateparallelvacuum altogether? Because currently, we are just > > providing a size estimate function without a copy function, even if > > the in future some Am give an estimate about the size of the stats, we > > can not directly memcpy the stat from the local memory to the shared > > memory, we might then need a copy function also from the AM so that it > > can flatten the stats and store in proper format? > > I agree that it's a crock to add an AM method that is never used for > anything. That's just asking for the design to prove buggy and > inadequate. One way to avoid this would be to require that every AM > that wants to support parallel vacuuming supply this method, and if it > wants to just return sizeof(IndexBulkDeleteResult), then it can. But I > also think someone should modify one of the AMs to use a > differently-sized object, and then see whether they can really make > parallel vacuum work with this patch. If, as you speculated here, it > needs another API, then we should add both of them or neither. A > half-baked solution is worse than nothing at all. > It is possible that we need another API to make it work as is currently the case for Gist Index where we need to someway first serialize it (which as mentioned earlier that we have now a way to avoid serializing it). However, if it is for some simple case where there are some additional constants apart from IndexBulkDeleteResult, then we don't need it. I think here, we were cautious to not expose more API's unless there is a real need, but I guess it is better to completely avoid such cases and don't expose any API unless we have some examples. In any case, the user will have the facility to disable a parallel vacuum for such cases. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 5, 2019 at 12:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > I think it could be required for the cases where the AM doesn't have a > way (or it is difficult to come up with a way) to communicate the > stats allocated by the first ambulkdelete call to the subsequent ones > until amvacuumcleanup. Currently, we have such a case for the Gist > index, see email thread [1]. Though we have come up with a way to > avoid that for Gist indexes, I am not sure if we can assume that it is > the case for any possible index AM especially when there is a > provision that indexAM can have additional stats information. In the > worst case, if we have to modify some existing index AM like we did > for the Gist index, we need such a provision so that it is possible. > In the ideal case, the index AM should provide a way to copy such > stats, but we can't assume that, so we come up with this option. > > We have used this for dummy_index_am which also provides a way to test it. I think it might be a good idea to change what we expect index AMs to do rather than trying to make anything that they happen to be doing right now work, no matter how crazy. In particular, suppose we say that you CAN'T add data on to the end of IndexBulkDeleteResult any more, and that instead the extra data is passed through a separate parameter. And then you add an estimate method that gives the size of the space provided by that parameter (and if the estimate method isn't defined then the extra parameter is passed as NULL) and document that the data stored there might get flat-copied. Now, you've taken the onus off of parallel vacuum to cope with any crazy thing a hypothetical AM might be doing, and instead you've defined the behavior of that hypothetical AM as wrong. If somebody really needs that, it's now their job to modify the index AM machinery further instead of your job to somehow cope. > Here, we have a need to reduce the number of workers. Index Vacuum > has two different phases (index vacuum and index cleanup) which uses > the same parallel-context/DSM but both could have different > requirements for workers. The second phase (cleanup) would normally > need fewer workers as if the work is done in the first phase, second > wouldn't need it, but we have exceptions like gin indexes where we > need it for the second phase as well because it takes the pass > over-index again even if we have cleaned the index in the first phase. > Now, consider the case where we have 3 btree indexes and 2 gin > indexes, we would need 5 workers for index vacuum phase and 2 workers > for index cleanup phase. There are other cases too. > > We also considered to have a separate DSM for each phase, but that > appeared to have overhead without much benefit. How about adding an additional argument to ReinitializeParallelDSM() that allows the number of workers to be reduced? That seems like it would be less confusing than what you have now, and would involve modify code in a lot fewer places. > > Is there any legitimate use case for parallel vacuum in combination > > with vacuum cost delay? > > > > Yeah, we also initially thought that it is not legitimate to use a > parallel vacuum with a cost delay. But to get a wider view, we > started a separate thread [2] and there we reach to the conclusion > that we need a solution for throttling [3]. OK, thanks for the pointer. 
This doesn't address the other part of my complaint, though, which is that the whole discussion between you and Dilip and Sawada-san presumes that you want the delays ought to be scattered across the workers roughly in proportion to their share of the I/O, and it seems NOT AT ALL clear that this is actually a desirable property. You're all assuming that, but none of you has justified it, and I think the opposite might be true in some cases. You're adding extra complexity for something that isn't a clear improvement. > Your understanding is correct. How about if we modify it to something > like: "Note that parallel workers are alive only during index vacuum > or index cleanup but the leader process neither exits from the > parallel mode nor destroys the parallel context until the entire > parallel operation is finished." OR something like "The leader backend > holds the parallel context till the index vacuum and cleanup is > finished. Both index vacuum and cleanup separately perform the work > with parallel workers." How about if you just delete it? You don't need a comment explaining that this caller of CreateParallelContext() does something which *every* caller of CreateParallelContext() must do. If you didn't do that, you'd fail assertions and everything would break, so *of course* you are doing it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[ Please trim excess quoted material from your replies. ]

On Thu, Dec 5, 2019 at 12:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I agree that there is no point is first to spawn more workers to get
> the work done faster and later throttle them. Basically, that will
> lose the whole purpose of running it in parallel.

Right. I mean if you throttle something that would have otherwise kept 3 workers running full blast back to the point where it uses the equivalent of 2.5 workers, that might make sense. It's a little marginal, maybe, but sure. But once you throttle it back to <= 2 workers, you're just wasting resources.

I think my concern here is ultimately more about usability than whether or not we allow throttling. I agree that there are some possible cases where throttling a parallel vacuum is useful, so I guess we should support it. But I also think there's a real risk of people not realizing that throttling is happening and then being sad because they used parallel VACUUM and it was still slow. I think we should document explicitly that parallel VACUUM is still potentially throttled and that you should consider setting the cost delay to a higher value or 0 before using it.

We might even want to add a FAST option (or similar) to VACUUM that makes it behave as if vacuum_cost_delay = 0, and add something to the examples section for VACUUM that suggests e.g.

VACUUM (PARALLEL 3, FAST) my_big_table
    Vacuum my_big_table with 3 workers and with resource throttling disabled for maximum performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 5 Dec 2019 at 19:54, Robert Haas <robertmhaas@gmail.com> wrote:
>
> [ Please trim excess quoted material from your replies. ]
>
> On Thu, Dec 5, 2019 at 12:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I agree that there is no point is first to spawn more workers to get
> > the work done faster and later throttle them. Basically, that will
> > lose the whole purpose of running it in parallel.
>
> Right. I mean if you throttle something that would have otherwise
> kept 3 workers running full blast back to the point where it uses the
> equivalent of 2.5 workers, that might make sense. It's a little
> marginal, maybe, but sure. But once you throttle it back to <= 2
> workers, you're just wasting resources.
>
> I think my concern here is ultimately more about usability than
> whether or not we allow throttling. I agree that there are some
> possible cases where throttling a parallel vacuum is useful, so I
> guess we should support it. But I also think there's a real risk of
> people not realizing that throttling is happening and then being sad
> because they used parallel VACUUM and it was still slow. I think we
> should document explicitly that parallel VACUUM is still potentially
> throttled and that you should consider setting the cost delay to a
> higher value or 0 before using it.
>
> We might even want to add a FAST option (or similar) to VACUUM that
> makes it behave as if vacuum_cost_delay = 0, and add something to the
> examples section for VACUUM that suggests e.g.
>
> VACUUM (PARALLEL 3, FAST) my_big_table
> Vacuum my_big_table with 3 workers and with resource throttling
> disabled for maximum performance.
>

Please find some review comments for the v35 patch set:

1.
+ /* Return immediately when parallelism disabled */
+ if (max_parallel_maintenance_workers == 0)
+ return 0;
+
Here, we should also check max_worker_processes, because if max_worker_processes is set to 0 then we can't launch any workers, so we should return from here.

2.
+ /* cap by max_parallel_maintenace_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
Here also, we should take max_worker_processes into account when calculating parallel_workers. (By default, max_worker_processes = 8.)

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
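What the review above asks for amounts to clamping the worker count by both GUCs, roughly as in the sketch below. The helper name and its nindexes_parallel/nrequested parameters are illustrative rather than the patch's actual code; max_parallel_maintenance_workers, max_worker_processes, and Min() are the existing GUCs and macro.

#include "postgres.h"
#include "miscadmin.h"

static int
choose_parallel_vacuum_workers(int nindexes_parallel, int nrequested)
{
	int			parallel_workers;

	/* Parallelism is effectively unavailable if either GUC is zero. */
	if (max_parallel_maintenance_workers == 0 || max_worker_processes == 0)
		return 0;

	/* One worker per parallel-safe index, unless the user asked for fewer. */
	parallel_workers = (nrequested > 0) ?
		Min(nrequested, nindexes_parallel) : nindexes_parallel;

	/* Cap by both limits, as the review suggests. */
	parallel_workers = Min(parallel_workers,
						   Min(max_parallel_maintenance_workers,
							   max_worker_processes));

	return parallel_workers;
}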
On Fri, Dec 6, 2019 at 12:55 AM Mahendra Singh <mahi6run@gmail.com> wrote: > > On Thu, 5 Dec 2019 at 19:54, Robert Haas <robertmhaas@gmail.com> wrote: > > > > [ Please trim excess quoted material from your replies. ] > > > > On Thu, Dec 5, 2019 at 12:27 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > I agree that there is no point is first to spawn more workers to get > > > the work done faster and later throttle them. Basically, that will > > > lose the whole purpose of running it in parallel. > > > > Right. I mean if you throttle something that would have otherwise > > kept 3 workers running full blast back to the point where it uses the > > equivalent of 2.5 workers, that might make sense. It's a little > > marginal, maybe, but sure. But once you throttle it back to <= 2 > > workers, you're just wasting resources. > > > > I think my concern here is ultimately more about usability than > > whether or not we allow throttling. I agree that there are some > > possible cases where throttling a parallel vacuum is useful, so I > > guess we should support it. But I also think there's a real risk of > > people not realizing that throttling is happening and then being sad > > because they used parallel VACUUM and it was still slow. I think we > > should document explicitly that parallel VACUUM is still potentially > > throttled and that you should consider setting the cost delay to a > > higher value or 0 before using it. > > > > We might even want to add a FAST option (or similar) to VACUUM that > > makes it behave as if vacuum_cost_delay = 0, and add something to the > > examples section for VACUUM that suggests e.g. > > > > VACUUM (PARALLEL 3, FAST) my_big_table > > Vacuum my_big_table with 3 workers and with resource throttling > > disabled for maximum performance. > > > > Please find some review comments for v35 patch set > > 1. > + /* Return immediately when parallelism disabled */ > + if (max_parallel_maintenance_workers == 0) > + return 0; > + > Here, we should add check of max_worker_processes because if > max_worker_processes is set as 0, then we can't launch any worker so > we should return from here. > > 2. > + /* cap by max_parallel_maintenace_workers */ > + parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers); > + > Here also, we should consider max_worker_processes to calculate > parallel_workers. (by default, max_worker_processes = 8) IMHO, it's enough to cap with max_parallel_maintenace_workers. So I think it's the user's responsibility to keep max_parallel_maintenace_workers under parallel_workers limit. And, if the user fails to set max_parallel_maintenace_workers under the parallel_workers or enough workers are not available then LaunchParallel worker will take care. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > I think it might be a good idea to change what we expect index AMs to > do rather than trying to make anything that they happen to be doing > right now work, no matter how crazy. In particular, suppose we say > that you CAN'T add data on to the end of IndexBulkDeleteResult any > more, and that instead the extra data is passed through a separate > parameter. And then you add an estimate method that gives the size of > the space provided by that parameter (and if the estimate method isn't > defined then the extra parameter is passed as NULL) and document that > the data stored there might get flat-copied. > I think this is a good idea and serves the purpose we are trying to achieve currently. However, if there are any IndexAM that is using the current way to pass stats with additional information, they would need to change even if they don't want to use parallel vacuum functionality (say because their indexes are too small or whatever other reasons). I think this is a reasonable trade-off and the changes on their end won't be that big. So, we should do this. > Now, you've taken the > onus off of parallel vacuum to cope with any crazy thing a > hypothetical AM might be doing, and instead you've defined the > behavior of that hypothetical AM as wrong. If somebody really needs > that, it's now their job to modify the index AM machinery further > instead of your job to somehow cope. > makes sense. > > Here, we have a need to reduce the number of workers. Index Vacuum > > has two different phases (index vacuum and index cleanup) which uses > > the same parallel-context/DSM but both could have different > > requirements for workers. The second phase (cleanup) would normally > > need fewer workers as if the work is done in the first phase, second > > wouldn't need it, but we have exceptions like gin indexes where we > > need it for the second phase as well because it takes the pass > > over-index again even if we have cleaned the index in the first phase. > > Now, consider the case where we have 3 btree indexes and 2 gin > > indexes, we would need 5 workers for index vacuum phase and 2 workers > > for index cleanup phase. There are other cases too. > > > > We also considered to have a separate DSM for each phase, but that > > appeared to have overhead without much benefit. > > How about adding an additional argument to ReinitializeParallelDSM() > that allows the number of workers to be reduced? That seems like it > would be less confusing than what you have now, and would involve > modify code in a lot fewer places. > Yeah, we can do that. We can maintain some information in LVParallelState which indicates whether we need to reinitialize the DSM before launching workers. Sawada-San, do you see any problem with this idea? > > > Is there any legitimate use case for parallel vacuum in combination > > > with vacuum cost delay? > > > > > > > Yeah, we also initially thought that it is not legitimate to use a > > parallel vacuum with a cost delay. But to get a wider view, we > > started a separate thread [2] and there we reach to the conclusion > > that we need a solution for throttling [3]. > > OK, thanks for the pointer. 
This doesn't address the other part of my > complaint, though, which is that the whole discussion between you and > Dilip and Sawada-san presumes that you want the delays ought to be > scattered across the workers roughly in proportion to their share of > the I/O, and it seems NOT AT ALL clear that this is actually a > desirable property. You're all assuming that, but none of you has > justified it, and I think the opposite might be true in some cases. > IIUC, your complaint is that in some cases, even if the I/O rate is enough for one worker, we will still launch more workers and throttle them. The point is we can't know in advance how much I/O is required for each index. We can try to do that based on index size, but I don't think that will be right because it is possible that for the bigger index, we don't need to dirty the pages and most of the pages are in shared buffers, etc. The current algorithm won't use more I/O than required and it will be good for cases where one or some of the indexes are doing more I/O as compared to others and it will also work equally good even when the indexes have a similar amount of work. I think we could do better if we can predict how much I/O each index requires before actually scanning the index. I agree with the other points (add a FAST option for parallel vacuum and document that parallel vacuum is still potentially throttled ...) you made in a separate email. > You're adding extra complexity for something that isn't a clear > improvement. > > > Your understanding is correct. How about if we modify it to something > > like: "Note that parallel workers are alive only during index vacuum > > or index cleanup but the leader process neither exits from the > > parallel mode nor destroys the parallel context until the entire > > parallel operation is finished." OR something like "The leader backend > > holds the parallel context till the index vacuum and cleanup is > > finished. Both index vacuum and cleanup separately perform the work > > with parallel workers." > > How about if you just delete it? You don't need a comment explaining > that this caller of CreateParallelContext() does something which > *every* caller of CreateParallelContext() must do. If you didn't do > that, you'd fail assertions and everything would break, so *of course* > you are doing it. > Fair enough, we can just remove this part of the comment. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 6 Dec 2019 at 10:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > I think it might be a good idea to change what we expect index AMs to
> > do rather than trying to make anything that they happen to be doing
> > right now work, no matter how crazy. In particular, suppose we say
> > that you CAN'T add data on to the end of IndexBulkDeleteResult any
> > more, and that instead the extra data is passed through a separate
> > parameter. And then you add an estimate method that gives the size of
> > the space provided by that parameter (and if the estimate method isn't
> > defined then the extra parameter is passed as NULL) and document that
> > the data stored there might get flat-copied.
> >
>
> I think this is a good idea and serves the purpose we are trying to
> achieve currently. However, if there are any IndexAM that is using
> the current way to pass stats with additional information, they would
> need to change even if they don't want to use parallel vacuum
> functionality (say because their indexes are too small or whatever
> other reasons). I think this is a reasonable trade-off and the
> changes on their end won't be that big. So, we should do this.
>
> > Now, you've taken the
> > onus off of parallel vacuum to cope with any crazy thing a
> > hypothetical AM might be doing, and instead you've defined the
> > behavior of that hypothetical AM as wrong. If somebody really needs
> > that, it's now their job to modify the index AM machinery further
> > instead of your job to somehow cope.
> >
>
> makes sense.
>
> > > Here, we have a need to reduce the number of workers. Index Vacuum
> > > has two different phases (index vacuum and index cleanup) which uses
> > > the same parallel-context/DSM but both could have different
> > > requirements for workers. The second phase (cleanup) would normally
> > > need fewer workers as if the work is done in the first phase, second
> > > wouldn't need it, but we have exceptions like gin indexes where we
> > > need it for the second phase as well because it takes the pass
> > > over-index again even if we have cleaned the index in the first phase.
> > > Now, consider the case where we have 3 btree indexes and 2 gin
> > > indexes, we would need 5 workers for index vacuum phase and 2 workers
> > > for index cleanup phase. There are other cases too.
> > >
> > > We also considered to have a separate DSM for each phase, but that
> > > appeared to have overhead without much benefit.
> >
> > How about adding an additional argument to ReinitializeParallelDSM()
> > that allows the number of workers to be reduced? That seems like it
> > would be less confusing than what you have now, and would involve
> > modify code in a lot fewer places.
> >
>
> Yeah, we can do that. We can maintain some information in
> LVParallelState which indicates whether we need to reinitialize the
> DSM before launching workers. Sawada-San, do you see any problem with
> this idea?
>
>
> > > > Is there any legitimate use case for parallel vacuum in combination
> > > > with vacuum cost delay?
> > > >
> > >
> > > Yeah, we also initially thought that it is not legitimate to use a
> > > parallel vacuum with a cost delay. But to get a wider view, we
> > > started a separate thread [2] and there we reach to the conclusion
> > > that we need a solution for throttling [3].
> >
> > OK, thanks for the pointer. This doesn't address the other part of my
> > complaint, though, which is that the whole discussion between you and
> > Dilip and Sawada-san presumes that you want the delays ought to be
> > scattered across the workers roughly in proportion to their share of
> > the I/O, and it seems NOT AT ALL clear that this is actually a
> > desirable property. You're all assuming that, but none of you has
> > justified it, and I think the opposite might be true in some cases.
> >
>
> IIUC, your complaint is that in some cases, even if the I/O rate is
> enough for one worker, we will still launch more workers and throttle
> them. The point is we can't know in advance how much I/O is required
> for each index. We can try to do that based on index size, but I
> don't think that will be right because it is possible that for the
> bigger index, we don't need to dirty the pages and most of the pages
> are in shared buffers, etc. The current algorithm won't use more I/O
> than required and it will be good for cases where one or some of the
> indexes are doing more I/O as compared to others and it will also work
> equally good even when the indexes have a similar amount of work. I
> think we could do better if we can predict how much I/O each index
> requires before actually scanning the index.
>
> I agree with the other points (add a FAST option for parallel vacuum
> and document that parallel vacuum is still potentially throttled ...)
> you made in a separate email.
>
>
> > You're adding extra complexity for something that isn't a clear
> > improvement.
> >
> > > Your understanding is correct. How about if we modify it to something
> > > like: "Note that parallel workers are alive only during index vacuum
> > > or index cleanup but the leader process neither exits from the
> > > parallel mode nor destroys the parallel context until the entire
> > > parallel operation is finished." OR something like "The leader backend
> > > holds the parallel context till the index vacuum and cleanup is
> > > finished. Both index vacuum and cleanup separately perform the work
> > > with parallel workers."
> >
> > How about if you just delete it? You don't need a comment explaining
> > that this caller of CreateParallelContext() does something which
> > *every* caller of CreateParallelContext() must do. If you didn't do
> > that, you'd fail assertions and everything would break, so *of course*
> > you are doing it.
> >
>
> Fair enough, we can just remove this part of the comment.
>
Hi All,
Below is a brief summary of my testing of the v35 patch set.
1.
All the test cases are passing on top of the v35 patch set (make check-world and all contrib test cases).
2.
With PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION enabled, "make check-world" is passing.
3.
With the v35 patch, the vacuum.sql regression test takes too much time due to the large number of inserts, so we can reduce that time by reducing the number of tuples.
+INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;
Here, instead of 100000, we can use 1000 to reduce the time of this test case, because we only want to test the code and functionality.
4.
I tested the functionality of parallel vacuum with different server configuration settings, and the behavior is as expected:
shared_buffers, max_parallel_workers, max_parallel_maintenance_workers, vacuum_cost_limit, vacuum_cost_delay, maintenance_work_mem, max_worker_processes
5.
Index and table stats after a parallel vacuum match those after a normal vacuum.
postgres=# select * from pg_statio_all_tables where relname = 'test';
relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
16384 | public | test | 399 | 5000 | 3 | 0 | 0 | 0 | 0 | 0
(1 row)
relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
16384 | public | test | 399 | 5000 | 3 | 0 | 0 | 0 | 0 | 0
(1 row)
6.
Vacuum progress reporting is as expected.
postgres=# select * from pg_stat_progress_vacuum;
pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-------+-------+----------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
44161 | 13577 | postgres | 16384 | cleaning up indexes | 41650 | 41650 | 41650 | 1 | 11184810 | 1000000
(1 row)
pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples
-------+-------+----------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
44161 | 13577 | postgres | 16384 | cleaning up indexes | 41650 | 41650 | 41650 | 1 | 11184810 | 1000000
(1 row)
7.
If any worker (or the leader) hits an error, then all the workers exit immediately and the operation is aborted.
8.
I tested parallel vacuum for all the index types and with varying index sizes; all are working and I didn't see any unexpected behavior.
9.
While testing, I found that if we delete all the tuples from the table, the size of the btree indexes still does not shrink.
Delete all tuples from the table.
Before vacuum, total pages in btree index: 8000
After vacuum (normal/parallel), total pages in btree index: 8000
But the size of the table does shrink after deleting all the tuples.
Can we add a check in vacuum to truncate all the pages of btree indexes if there are no tuples in the table?
Please let me know if you have any inputs for more testing.
Sorry for the late reply. On Fri, 6 Dec 2019 at 14:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > I think it might be a good idea to change what we expect index AMs to > > do rather than trying to make anything that they happen to be doing > > right now work, no matter how crazy. In particular, suppose we say > > that you CAN'T add data on to the end of IndexBulkDeleteResult any > > more, and that instead the extra data is passed through a separate > > parameter. And then you add an estimate method that gives the size of > > the space provided by that parameter (and if the estimate method isn't > > defined then the extra parameter is passed as NULL) and document that > > the data stored there might get flat-copied. > > > > I think this is a good idea and serves the purpose we are trying to > achieve currently. However, if there are any IndexAM that is using > the current way to pass stats with additional information, they would > need to change even if they don't want to use parallel vacuum > functionality (say because their indexes are too small or whatever > other reasons). I think this is a reasonable trade-off and the > changes on their end won't be that big. So, we should do this. > > > Now, you've taken the > > onus off of parallel vacuum to cope with any crazy thing a > > hypothetical AM might be doing, and instead you've defined the > > behavior of that hypothetical AM as wrong. If somebody really needs > > that, it's now their job to modify the index AM machinery further > > instead of your job to somehow cope. > > > > makes sense. > > > > Here, we have a need to reduce the number of workers. Index Vacuum > > > has two different phases (index vacuum and index cleanup) which uses > > > the same parallel-context/DSM but both could have different > > > requirements for workers. The second phase (cleanup) would normally > > > need fewer workers as if the work is done in the first phase, second > > > wouldn't need it, but we have exceptions like gin indexes where we > > > need it for the second phase as well because it takes the pass > > > over-index again even if we have cleaned the index in the first phase. > > > Now, consider the case where we have 3 btree indexes and 2 gin > > > indexes, we would need 5 workers for index vacuum phase and 2 workers > > > for index cleanup phase. There are other cases too. > > > > > > We also considered to have a separate DSM for each phase, but that > > > appeared to have overhead without much benefit. > > > > How about adding an additional argument to ReinitializeParallelDSM() > > that allows the number of workers to be reduced? That seems like it > > would be less confusing than what you have now, and would involve > > modify code in a lot fewer places. > > > > Yeah, we can do that. We can maintain some information in > LVParallelState which indicates whether we need to reinitialize the > DSM before launching workers. Sawada-San, do you see any problem with > this idea? I think the number of workers could be increased in cleanup phase. For example, if we have 1 brin index and 2 gin indexes then in bulkdelete phase we need only 1 worker but in cleanup we need 2 workers. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 13, 2019 at 10:03 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Sorry for the late reply. > > On Fri, 6 Dec 2019 at 14:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Here, we have a need to reduce the number of workers. Index Vacuum > > > > has two different phases (index vacuum and index cleanup) which uses > > > > the same parallel-context/DSM but both could have different > > > > requirements for workers. The second phase (cleanup) would normally > > > > need fewer workers as if the work is done in the first phase, second > > > > wouldn't need it, but we have exceptions like gin indexes where we > > > > need it for the second phase as well because it takes the pass > > > > over-index again even if we have cleaned the index in the first phase. > > > > Now, consider the case where we have 3 btree indexes and 2 gin > > > > indexes, we would need 5 workers for index vacuum phase and 2 workers > > > > for index cleanup phase. There are other cases too. > > > > > > > > We also considered to have a separate DSM for each phase, but that > > > > appeared to have overhead without much benefit. > > > > > > How about adding an additional argument to ReinitializeParallelDSM() > > > that allows the number of workers to be reduced? That seems like it > > > would be less confusing than what you have now, and would involve > > > modify code in a lot fewer places. > > > > > > > Yeah, we can do that. We can maintain some information in > > LVParallelState which indicates whether we need to reinitialize the > > DSM before launching workers. Sawada-San, do you see any problem with > > this idea? > > I think the number of workers could be increased in cleanup phase. For > example, if we have 1 brin index and 2 gin indexes then in bulkdelete > phase we need only 1 worker but in cleanup we need 2 workers. > I think it shouldn't be more than the number with which we have created a parallel context, no? If that is the case, then I think it should be fine. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 13, 2019 at 10:03 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Sorry for the late reply. > > > > On Fri, 6 Dec 2019 at 14:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > Here, we have a need to reduce the number of workers. Index Vacuum > > > > > has two different phases (index vacuum and index cleanup) which uses > > > > > the same parallel-context/DSM but both could have different > > > > > requirements for workers. The second phase (cleanup) would normally > > > > > need fewer workers as if the work is done in the first phase, second > > > > > wouldn't need it, but we have exceptions like gin indexes where we > > > > > need it for the second phase as well because it takes the pass > > > > > over-index again even if we have cleaned the index in the first phase. > > > > > Now, consider the case where we have 3 btree indexes and 2 gin > > > > > indexes, we would need 5 workers for index vacuum phase and 2 workers > > > > > for index cleanup phase. There are other cases too. > > > > > > > > > > We also considered to have a separate DSM for each phase, but that > > > > > appeared to have overhead without much benefit. > > > > > > > > How about adding an additional argument to ReinitializeParallelDSM() > > > > that allows the number of workers to be reduced? That seems like it > > > > would be less confusing than what you have now, and would involve > > > > modify code in a lot fewer places. > > > > > > > > > > Yeah, we can do that. We can maintain some information in > > > LVParallelState which indicates whether we need to reinitialize the > > > DSM before launching workers. Sawada-San, do you see any problem with > > > this idea? > > > > I think the number of workers could be increased in cleanup phase. For > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete > > phase we need only 1 worker but in cleanup we need 2 workers. > > > > I think it shouldn't be more than the number with which we have > created a parallel context, no? If that is the case, then I think it > should be fine. Right. I thought that ReinitializeParallelDSM() with an additional argument would reduce DSM but I understand that it doesn't actually reduce DSM but just have a variable for the number of workers to launch, is that right? And we also would need to call ReinitializeParallelDSM() at the beginning of vacuum index or vacuum cleanup since we don't know that we will do either index vacuum or index cleanup, at the end of index vacum. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > How about adding an additional argument to ReinitializeParallelDSM() > > > > > that allows the number of workers to be reduced? That seems like it > > > > > would be less confusing than what you have now, and would involve > > > > > modify code in a lot fewer places. > > > > > > > > > > > > > Yeah, we can do that. We can maintain some information in > > > > LVParallelState which indicates whether we need to reinitialize the > > > > DSM before launching workers. Sawada-San, do you see any problem with > > > > this idea? > > > > > > I think the number of workers could be increased in cleanup phase. For > > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete > > > phase we need only 1 worker but in cleanup we need 2 workers. > > > > > > > I think it shouldn't be more than the number with which we have > > created a parallel context, no? If that is the case, then I think it > > should be fine. > > Right. I thought that ReinitializeParallelDSM() with an additional > argument would reduce DSM but I understand that it doesn't actually > reduce DSM but just have a variable for the number of workers to > launch, is that right? > Yeah, probably, we need to change the nworkers stored in the context and it should be lesser than the value already stored in that number. > And we also would need to call > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum > cleanup since we don't know that we will do either index vacuum or > index cleanup, at the end of index vacum. > Right. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > How about adding an additional argument to ReinitializeParallelDSM() > > > > > > that allows the number of workers to be reduced? That seems like it > > > > > > would be less confusing than what you have now, and would involve > > > > > > modify code in a lot fewer places. > > > > > > > > > > > > > > > > Yeah, we can do that. We can maintain some information in > > > > > LVParallelState which indicates whether we need to reinitialize the > > > > > DSM before launching workers. Sawada-San, do you see any problem with > > > > > this idea? > > > > > > > > I think the number of workers could be increased in cleanup phase. For > > > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete > > > > phase we need only 1 worker but in cleanup we need 2 workers. > > > > > > > > > > I think it shouldn't be more than the number with which we have > > > created a parallel context, no? If that is the case, then I think it > > > should be fine. > > > > Right. I thought that ReinitializeParallelDSM() with an additional > > argument would reduce DSM but I understand that it doesn't actually > > reduce DSM but just have a variable for the number of workers to > > launch, is that right? > > > > Yeah, probably, we need to change the nworkers stored in the context > and it should be lesser than the value already stored in that number. > > > And we also would need to call > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum > > cleanup since we don't know that we will do either index vacuum or > > index cleanup, at the end of index vacum. > > > > Right. I've attached the latest version patch set. These patches requires the gist vacuum patch[1]. The patch incorporated the review comments. In current version patch only indexes that support parallel vacuum and whose size is larger than min_parallel_index_scan_size can participate parallel vacuum. I'm still not unclear to me that using min_parallel_index_scan_size is the best approach but I agreed to set a lower bound of relation size. I separated the patch for PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION from the main patch and I'm working on that patch. Please review it. [1] https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
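To make the participation rule in v36 concrete, here is a rough sketch of the worker computation. It is illustrative only: index_supports_parallel_vacuum() is a hypothetical stand-in for the patch's check of the AM's parallel-vacuum options, while min_parallel_index_scan_size (measured in blocks) and max_parallel_maintenance_workers are the existing GUCs.

/*
 * Sketch of the v36 participation rule: only indexes whose AM supports
 * parallel vacuum and whose size is at least min_parallel_index_scan_size
 * are counted, and the result is capped by the PARALLEL option (if any)
 * and by max_parallel_maintenance_workers.
 */
static int
compute_parallel_vacuum_workers_sketch(Relation *Irel, int nindexes,
                                       int nrequested)
{
    int         parallel_workers = 0;
    int         i;

    for (i = 0; i < nindexes; i++)
    {
        if (!index_supports_parallel_vacuum(Irel[i]))   /* hypothetical helper */
            continue;

        if (RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size)
            continue;

        parallel_workers++;
    }

    /* an explicit PARALLEL N in the VACUUM command also caps the count */
    if (nrequested > 0)
        parallel_workers = Min(parallel_workers, nrequested);

    /* cap by max_parallel_maintenance_workers */
    return Min(parallel_workers, max_parallel_maintenance_workers);
}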
On Tue, 17 Dec 2019 at 18:07, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 13 Dec 2019 at 14:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > > > >
> > > > > > > How about adding an additional argument to ReinitializeParallelDSM()
> > > > > > > that allows the number of workers to be reduced? That seems like it
> > > > > > > would be less confusing than what you have now, and would involve
> > > > > > > modify code in a lot fewer places.
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that. We can maintain some information in
> > > > > > LVParallelState which indicates whether we need to reinitialize the
> > > > > > DSM before launching workers. Sawada-San, do you see any problem with
> > > > > > this idea?
> > > > >
> > > > > I think the number of workers could be increased in cleanup phase. For
> > > > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> > > > > phase we need only 1 worker but in cleanup we need 2 workers.
> > > > >
> > > >
> > > > I think it shouldn't be more than the number with which we have
> > > > created a parallel context, no? If that is the case, then I think it
> > > > should be fine.
> > >
> > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > argument would reduce DSM but I understand that it doesn't actually
> > > reduce DSM but just have a variable for the number of workers to
> > > launch, is that right?
> > >
> >
> > Yeah, probably, we need to change the nworkers stored in the context
> > and it should be lesser than the value already stored in that number.
> >
> > > And we also would need to call
> > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > cleanup since we don't know that we will do either index vacuum or
> > > index cleanup, at the end of index vacum.
> > >
> >
> > Right.
>
> I've attached the latest version patch set. These patches requires the
> gist vacuum patch[1]. The patch incorporated the review comments. In
> current version patch only indexes that support parallel vacuum and
> whose size is larger than min_parallel_index_scan_size can participate
> parallel vacuum. I'm still not unclear to me that using
> min_parallel_index_scan_size is the best approach but I agreed to set
> a lower bound of relation size. I separated the patch for
> PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION from the main patch and
> I'm working on that patch.
>
> Please review it.
>
> [1] https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com
Thanks for the updated patches. I verified all of my reported issues and all of them are fixed in the v36 patch set.
Below are some review comments:
1.
+ /* cap by max_parallel_maintenace_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
Here, the spelling of max_parallel_maintenace_workers is wrong (correct: max_parallel_maintenance_workers).
2.
+ * size of stats for each index. Also, this function Since currently we don't support parallel vacuum
+ * for autovacuum we don't need to care about autovacuum_work_mem
Here, I think the first line should be changed because it is not grammatically correct.
On Tue, Dec 17, 2019 at 6:07 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I think it shouldn't be more than the number with which we have > > > > created a parallel context, no? If that is the case, then I think it > > > > should be fine. > > > > > > Right. I thought that ReinitializeParallelDSM() with an additional > > > argument would reduce DSM but I understand that it doesn't actually > > > reduce DSM but just have a variable for the number of workers to > > > launch, is that right? > > > > > > > Yeah, probably, we need to change the nworkers stored in the context > > and it should be lesser than the value already stored in that number. > > > > > And we also would need to call > > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum > > > cleanup since we don't know that we will do either index vacuum or > > > index cleanup, at the end of index vacum. > > > > > > > Right. > > I've attached the latest version patch set. These patches requires the > gist vacuum patch[1]. The patch incorporated the review comments. > I was analyzing your changes related to ReinitializeParallelDSM() and it seems like we might launch more number of workers for the bulkdelete phase. While creating a parallel context, we used the maximum of "workers required for bulkdelete phase" and "workers required for cleanup", but now if the number of workers required in bulkdelete phase is lesser than a cleanup phase(as mentioned by you in one example), then we would launch more workers for bulkdelete phase. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 17, 2019 at 6:07 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 13 Dec 2019 at 15:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think it shouldn't be more than the number with which we have > > > > > created a parallel context, no? If that is the case, then I think it > > > > > should be fine. > > > > > > > > Right. I thought that ReinitializeParallelDSM() with an additional > > > > argument would reduce DSM but I understand that it doesn't actually > > > > reduce DSM but just have a variable for the number of workers to > > > > launch, is that right? > > > > > > > > > > Yeah, probably, we need to change the nworkers stored in the context > > > and it should be lesser than the value already stored in that number. > > > > > > > And we also would need to call > > > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum > > > > cleanup since we don't know that we will do either index vacuum or > > > > index cleanup, at the end of index vacum. > > > > > > > > > > Right. > > > > I've attached the latest version patch set. These patches requires the > > gist vacuum patch[1]. The patch incorporated the review comments. > > > > I was analyzing your changes related to ReinitializeParallelDSM() and > it seems like we might launch more number of workers for the > bulkdelete phase. While creating a parallel context, we used the > maximum of "workers required for bulkdelete phase" and "workers > required for cleanup", but now if the number of workers required in > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in > one example), then we would launch more workers for bulkdelete phase. Good catch. Currently when creating a parallel context the number of workers passed to CreateParallelContext() is set not only to pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to specify the number of workers actually to launch after created the parallel context or when creating it. Or I think we call ReinitializeParallelDSM() even the first time running index vacuum. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, 18 Dec 2019 at 03:39, Mahendra Singh <mahi6run@gmail.com> wrote: > > > Thanks for updated patches. I verified my all reported issues and all are fixed in v36 patch set. > > Below are some review comments: > 1. > + /* cap by max_parallel_maintenace_workers */ > + parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers); > > Here, spell of max_parallel_maintenace_workers is wrong. (correct: max_parallel_maintenance_workers) > > 2. > + * size of stats for each index. Also, this function Since currently we don't support parallel vacuum > + * for autovacuum we don't need to care about autovacuum_work_mem > > Here, I think, 1st line should be changed because it is not looking correct as grammatically. Thank you for reviewing and testing this patch. I'll incorporate your comments in the next version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote: > > On Fri, 6 Dec 2019 at 10:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Dec 5, 2019 at 7:44 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > I think it might be a good idea to change what we expect index AMs to > > > do rather than trying to make anything that they happen to be doing > > > right now work, no matter how crazy. In particular, suppose we say > > > that you CAN'T add data on to the end of IndexBulkDeleteResult any > > > more, and that instead the extra data is passed through a separate > > > parameter. And then you add an estimate method that gives the size of > > > the space provided by that parameter (and if the estimate method isn't > > > defined then the extra parameter is passed as NULL) and document that > > > the data stored there might get flat-copied. > > > > > > > I think this is a good idea and serves the purpose we are trying to > > achieve currently. However, if there are any IndexAM that is using > > the current way to pass stats with additional information, they would > > need to change even if they don't want to use parallel vacuum > > functionality (say because their indexes are too small or whatever > > other reasons). I think this is a reasonable trade-off and the > > changes on their end won't be that big. So, we should do this. > > > > > Now, you've taken the > > > onus off of parallel vacuum to cope with any crazy thing a > > > hypothetical AM might be doing, and instead you've defined the > > > behavior of that hypothetical AM as wrong. If somebody really needs > > > that, it's now their job to modify the index AM machinery further > > > instead of your job to somehow cope. > > > > > > > makes sense. > > > > > > Here, we have a need to reduce the number of workers. Index Vacuum > > > > has two different phases (index vacuum and index cleanup) which uses > > > > the same parallel-context/DSM but both could have different > > > > requirements for workers. The second phase (cleanup) would normally > > > > need fewer workers as if the work is done in the first phase, second > > > > wouldn't need it, but we have exceptions like gin indexes where we > > > > need it for the second phase as well because it takes the pass > > > > over-index again even if we have cleaned the index in the first phase. > > > > Now, consider the case where we have 3 btree indexes and 2 gin > > > > indexes, we would need 5 workers for index vacuum phase and 2 workers > > > > for index cleanup phase. There are other cases too. > > > > > > > > We also considered to have a separate DSM for each phase, but that > > > > appeared to have overhead without much benefit. > > > > > > How about adding an additional argument to ReinitializeParallelDSM() > > > that allows the number of workers to be reduced? That seems like it > > > would be less confusing than what you have now, and would involve > > > modify code in a lot fewer places. > > > > > > > Yeah, we can do that. We can maintain some information in > > LVParallelState which indicates whether we need to reinitialize the > > DSM before launching workers. Sawada-San, do you see any problem with > > this idea? > > > > > > > > > Is there any legitimate use case for parallel vacuum in combination > > > > > with vacuum cost delay? > > > > > > > > > > > > > Yeah, we also initially thought that it is not legitimate to use a > > > > parallel vacuum with a cost delay. 
But to get a wider view, we > > > > started a separate thread [2] and there we reach to the conclusion > > > > that we need a solution for throttling [3]. > > > > > > OK, thanks for the pointer. This doesn't address the other part of my > > > complaint, though, which is that the whole discussion between you and > > > Dilip and Sawada-san presumes that you want the delays ought to be > > > scattered across the workers roughly in proportion to their share of > > > the I/O, and it seems NOT AT ALL clear that this is actually a > > > desirable property. You're all assuming that, but none of you has > > > justified it, and I think the opposite might be true in some cases. > > > > > > > IIUC, your complaint is that in some cases, even if the I/O rate is > > enough for one worker, we will still launch more workers and throttle > > them. The point is we can't know in advance how much I/O is required > > for each index. We can try to do that based on index size, but I > > don't think that will be right because it is possible that for the > > bigger index, we don't need to dirty the pages and most of the pages > > are in shared buffers, etc. The current algorithm won't use more I/O > > than required and it will be good for cases where one or some of the > > indexes are doing more I/O as compared to others and it will also work > > equally good even when the indexes have a similar amount of work. I > > think we could do better if we can predict how much I/O each index > > requires before actually scanning the index. > > > > I agree with the other points (add a FAST option for parallel vacuum > > and document that parallel vacuum is still potentially throttled ...) > > you made in a separate email. > > > > > > > You're adding extra complexity for something that isn't a clear > > > improvement. > > > > > > > Your understanding is correct. How about if we modify it to something > > > > like: "Note that parallel workers are alive only during index vacuum > > > > or index cleanup but the leader process neither exits from the > > > > parallel mode nor destroys the parallel context until the entire > > > > parallel operation is finished." OR something like "The leader backend > > > > holds the parallel context till the index vacuum and cleanup is > > > > finished. Both index vacuum and cleanup separately perform the work > > > > with parallel workers." > > > > > > How about if you just delete it? You don't need a comment explaining > > > that this caller of CreateParallelContext() does something which > > > *every* caller of CreateParallelContext() must do. If you didn't do > > > that, you'd fail assertions and everything would break, so *of course* > > > you are doing it. > > > > > > > Fair enough, we can just remove this part of the comment. > > > > Hi All, > Below is the brief about testing of v35 patch set. > > 1. > All the test cases are passing on the top of v35 patch set (make check world and all contrib test cases) > > 2. > By enabling PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION, "make check world" is passing. > > 3. > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing numberof tuples, we can reduce that time. > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i; > > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality. 
As we added check of min_parallel_index_scan_size in v36 patch set to decide parallel vacuum, 1000 tuples are not enough to do parallel vacuum. I can see that we are not launching any workers in vacuum.sql test case and hence, code coverage also decreased. I am not sure that how to fix this. Thanks and Regards Mahendra Thalor EnterpriseDB: http://www.enterprisedb.com > > 4. > I tested functionality of parallel vacuum with different server configuration setting and behavior is as per expected. > shared_buffers, max_parallel_workers, max_parallel_maintenance_workers, vacuum_cost_limit, vacuum_cost_delay, maintenance_work_mem,max_worker_processes > > 5. > index and table stats of parallel vacuum are matching with normal vacuum. > > postgres=# select * from pg_statio_all_tables where relname = 'test'; > relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit| tidx_blks_read | tidx_blks_hit > -------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+--------------- > 16384 | public | test | 399 | 5000 | 3 | 0 | 0 | 0 | 0 | 0 > (1 row) > > 6. > vacuum Progress Reporting is as per expectation. > postgres=# select * from pg_stat_progress_vacuum; > pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count| max_dead_tuples | num_dead_tuples > -------+-------+----------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+----------------- > 44161 | 13577 | postgres | 16384 | cleaning up indexes | 41650 | 41650 | 41650 | 1 | 11184810 | 1000000 > (1 row) > > 7. > If any worker(or main worker) got error, then immediately all the workers are exiting and action is marked as abort. > > 8. > I tested parallel vacuum for all the types of indexes and by varying size of indexes, all are working and didn't got anyunexpected behavior. > > 9. > While doing testing, I found that if we delete all the tuples from table, then also size of btree indexes was not reducing. > > delete all tuples from table. > before vacuum, total pages in btree index: 8000 > after vacuum(normal/parallel), total pages in btree index: 8000 > but size of table is reducing after deleting all the tuples. > Can we add a check in vacuum to truncate all the pages of btree indexes if there is no tuple in table. > > Please let me know if you have any inputs for more testing. > > Thanks and Regards > Mahendra Thalor > EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I was analyzing your changes related to ReinitializeParallelDSM() and > > it seems like we might launch more number of workers for the > > bulkdelete phase. While creating a parallel context, we used the > > maximum of "workers required for bulkdelete phase" and "workers > > required for cleanup", but now if the number of workers required in > > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in > > one example), then we would launch more workers for bulkdelete phase. > > Good catch. Currently when creating a parallel context the number of > workers passed to CreateParallelContext() is set not only to > pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to > specify the number of workers actually to launch after created the > parallel context or when creating it. Or I think we call > ReinitializeParallelDSM() even the first time running index vacuum. > How about just having ReinitializeParallelWorkers which can be called only via vacuum even for the first time before the launch of workers as of now? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
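Such a helper could be as small as the sketch below (whether this is the exact shape that would go into parallel.c is of course open); it only lowers the number of workers to launch, leaving the DSM that was sized for pcxt->nworkers untouched. The leader would call it before LaunchParallelWorkers() for both the bulkdelete and the cleanup phase, passing the per-phase count.

/*
 * Sketch of the proposed ReinitializeParallelWorkers(): adjust only how
 * many workers will be launched by the next LaunchParallelWorkers()
 * call, keeping the DSM that was sized for pcxt->nworkers at
 * CreateParallelContext() time.
 */
void
ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
{
    /* the count can only be kept or reduced, never increased */
    Assert(nworkers_to_launch <= pcxt->nworkers);
    pcxt->nworkers_to_launch = nworkers_to_launch;
}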
[please trim extra text before responding] On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > > 3. > > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing numberof tuples, we can reduce that time. > > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i; > > > > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality. > > As we added check of min_parallel_index_scan_size in v36 patch set to > decide parallel vacuum, 1000 tuples are not enough to do parallel > vacuum. I can see that we are not launching any workers in vacuum.sql > test case and hence, code coverage also decreased. I am not sure that > how to fix this. > Try by setting min_parallel_index_scan_size to 0 in test case. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 18, 2019 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I was analyzing your changes related to ReinitializeParallelDSM() and > > > it seems like we might launch more number of workers for the > > > bulkdelete phase. While creating a parallel context, we used the > > > maximum of "workers required for bulkdelete phase" and "workers > > > required for cleanup", but now if the number of workers required in > > > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in > > > one example), then we would launch more workers for bulkdelete phase. > > > > Good catch. Currently when creating a parallel context the number of > > workers passed to CreateParallelContext() is set not only to > > pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to > > specify the number of workers actually to launch after created the > > parallel context or when creating it. Or I think we call > > ReinitializeParallelDSM() even the first time running index vacuum. > > > > How about just having ReinitializeParallelWorkers which can be called > only via vacuum even for the first time before the launch of workers > as of now? > See in the attached what I have in mind. Few other comments: 1. + shared->disable_delay = (params->options & VACOPT_FAST); This should be part of the third patch. 2. +lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, + LVRelStats *vacrelstats, LVParallelState *lps, + int nindexes) { .. .. + /* Cap by the worker we computed at the beginning of parallel lazy vacuum */ + nworkers = Min(nworkers, lps->pcxt->nworkers); .. } This should be Assert. In no case, the computed workers can be more than what we have in context. 3. + if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) || + ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)) + nindexes_parallel_cleanup++; I think the second condition should be VACUUM_OPTION_PARALLEL_COND_CLEANUP. I have fixed the above comments and some given by me earlier [1] in the attached patch. The attached patch is a diff on top of v36-0002-Add-parallel-option-to-VACUUM-command. Few other comments which I have not fixed: 4. + if (Irel[i]->rd_indam->amusemaintenanceworkmem) + nindexes_mwm++; + + /* Skip indexes that don't participate parallel index vacuum */ + if (vacoptions == VACUUM_OPTION_NO_PARALLEL || + RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size) + continue; Won't we need to worry about the number of indexes that uses maintenance_work_mem only for indexes that can participate in a parallel vacuum? If so, the above checks need to be reversed. 5. /* + * Remember indexes that can participate parallel index vacuum and use + * it for index statistics initialization on DSM because the index + * size can get bigger during vacuum. + */ + can_parallel_vacuum[i] = true; I am not able to understand the second part of the comment ("because the index size can get bigger during vacuum."). What is its relevance? 6. +/* + * Vacuum or cleanup indexes that can be processed by only the leader process + * because these indexes don't support parallel operation at that phase. + * Therefore this function must be called by the leader process. + */ +static void +vacuum_indexes_leader(Relation *Irel, int nindexes, IndexBulkDeleteResult **stats, + LVRelStats *vacrelstats, LVParallelState *lps) { .. 
Why you have changed the order of nindexes parameter? I think in the previous patch, it was the last parameter and that seems to be better place for it. Also, I think after the latest modifications, you can remove the second sentence in the above comment ("Therefore this function must be called by the leader process.). 7. + for (i = 0; i < nindexes; i++) + { + bool leader_only = (get_indstats(lps->lvshared, i) == NULL || + skip_parallel_vacuum_index(Irel[i], lps->lvshared)); + + /* Skip the indexes that can be processed by parallel workers */ + if (!leader_only) + continue; It is better to name this parameter as skip_index or something like that. [1] - https://www.postgresql.org/message-id/CAA4eK1%2BKBAt1JS%2BasDd7K9C10OtBiyuUC75y8LR6QVnD2wrsMw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
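On points 6 and 7 above, the leader-only pass boils down to something like the sketch below. The helper names are the ones used in the patch, but the exact argument list of vacuum_one_index() and the LVSharedIndStats type are approximated, and skip_index follows the suggested renaming.

/*
 * Sketch of the leader-only pass: the leader vacuums just the indexes
 * that parallel workers cannot handle, either because the index has no
 * stats slot in the DSM or because its AM does not support this phase
 * in parallel.
 */
for (i = 0; i < nindexes; i++)
{
    LVSharedIndStats *shared_indstats = get_indstats(lps->lvshared, i);
    bool        skip_index;

    /* true when a parallel worker will (or could) handle this index */
    skip_index = (shared_indstats != NULL &&
                  !skip_parallel_vacuum_index(Irel[i], lps->lvshared));

    if (skip_index)
        continue;

    vacuum_one_index(Irel[i], &stats[i], lps->lvshared,
                     shared_indstats, vacrelstats->dead_tuples);
}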
Hi all,
While testing on v36 patch with gist index, I came across below segmentation fault.
-- PG Head+ v36_patch
create table tab1(c1 int, c2 text PRIMARY KEY, c3 bool, c4 timestamp without time zone, c5 timestamp with time zone, p point);
create index gist_idx1 on tab1 using gist(p);
create index gist_idx2 on tab1 using gist(p);
create index gist_idx3 on tab1 using gist(p);
create index gist_idx4 on tab1 using gist(p);
create index gist_idx5 on tab1 using gist(p);
-- Cancel the insert statement in middle:
postgres=# insert into tab1 (select x, x||'_c2', 'T', current_date-x/100, current_date-x/100,point (x,x) from generate_series(1,1000000) x);
^CCancel request sent
ERROR: canceling statement due to user request
-- Segmentation fault during VACUUM(PARALLEL):
postgres=# vacuum(parallel 10) tab1 ;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
-- Below is the stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.14650 postgres
Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
[New LWP 14650]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: centos postgres [local] VACUUM '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000075e713 in intset_num_entries (intset=0x1f62) at integerset.c:353
353 return intset->num_entries;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x000000000075e713 in intset_num_entries (intset=0x1f62) at integerset.c:353
#1 0x00000000004cbe0f in gistvacuum_delete_empty_pages (info=0x7fff32f8eba0, stats=0x7f2923b3f4d8) at gistvacuum.c:478
#2 0x00000000004cb353 in gistvacuumcleanup (info=0x7fff32f8eba0, stats=0x7f2923b3f4d8) at gistvacuum.c:124
#3 0x000000000050dcca in index_vacuum_cleanup (info=0x7fff32f8eba0, stats=0x7f2923b3f4d8) at indexam.c:711
#4 0x00000000005079ba in lazy_cleanup_index (indrel=0x7f292e149560, stats=0x2db5e40, reltuples=0, estimated_count=false) at vacuumlazy.c:2380
#5 0x00000000005074f0 in vacuum_one_index (indrel=0x7f292e149560, stats=0x2db5e40, lvshared=0x7f2923b3f460, shared_indstats=0x7f2923b3f4d0,
dead_tuples=0x7f2922fbe2c0) at vacuumlazy.c:2196
#6 0x0000000000507428 in vacuum_indexes_leader (Irel=0x2db5de0, nindexes=6, stats=0x2db5e38, vacrelstats=0x2db5cb0, lps=0x2db5e90) at vacuumlazy.c:2155
#7 0x0000000000507126 in lazy_parallel_vacuum_indexes (Irel=0x2db5de0, stats=0x2db5e38, vacrelstats=0x2db5cb0, lps=0x2db5e90, nindexes=6)
at vacuumlazy.c:2045
#8 0x0000000000507770 in lazy_cleanup_indexes (Irel=0x2db5de0, stats=0x2db5e38, vacrelstats=0x2db5cb0, lps=0x2db5e90, nindexes=6) at vacuumlazy.c:2300
#9 0x0000000000506076 in lazy_scan_heap (onerel=0x7f292e1473b8, params=0x7fff32f8f3e0, vacrelstats=0x2db5cb0, Irel=0x2db5de0, nindexes=6, aggressive=false)
at vacuumlazy.c:1675
#10 0x0000000000504228 in heap_vacuum_rel (onerel=0x7f292e1473b8, params=0x7fff32f8f3e0, bstrategy=0x2deb3a0) at vacuumlazy.c:475
#11 0x00000000006ea059 in table_relation_vacuum (rel=0x7f292e1473b8, params=0x7fff32f8f3e0, bstrategy=0x2deb3a0)
at ../../../src/include/access/tableam.h:1432
#12 0x00000000006ecb74 in vacuum_rel (relid=16384, relation=0x2cf5cf8, params=0x7fff32f8f3e0) at vacuum.c:1885
#13 0x00000000006eac8d in vacuum (relations=0x2deb548, params=0x7fff32f8f3e0, bstrategy=0x2deb3a0, isTopLevel=true) at vacuum.c:440
#14 0x00000000006ea776 in ExecVacuum (pstate=0x2deaf90, vacstmt=0x2cf5de0, isTopLevel=true) at vacuum.c:241
#15 0x000000000091da3d in standard_ProcessUtility (pstmt=0x2cf5ea8, queryString=0x2cf51a0 "vacuum(parallel 10) tab1 ;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x2cf6188, completionTag=0x7fff32f8f840 "") at utility.c:665
#16 0x000000000091d270 in ProcessUtility (pstmt=0x2cf5ea8, queryString=0x2cf51a0 "vacuum(parallel 10) tab1 ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x2cf6188, completionTag=0x7fff32f8f840 "") at utility.c:359
#17 0x000000000091c187 in PortalRunUtility (portal=0x2d5c530, pstmt=0x2cf5ea8, isTopLevel=true, setHoldSnapshot=false, dest=0x2cf6188,
completionTag=0x7fff32f8f840 "") at pquery.c:1175
#18 0x000000000091c39e in PortalRunMulti (portal=0x2d5c530, isTopLevel=true, setHoldSnapshot=false, dest=0x2cf6188, altdest=0x2cf6188,
completionTag=0x7fff32f8f840 "") at pquery.c:1321
#19 0x000000000091b8c8 in PortalRun (portal=0x2d5c530, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2cf6188, altdest=0x2cf6188,
completionTag=0x7fff32f8f840 "") at pquery.c:796
#20 0x00000000009156d4 in exec_simple_query (query_string=0x2cf51a0 "vacuum(parallel 10) tab1 ;") at postgres.c:1227
#21 0x0000000000919a1c in PostgresMain (argc=1, argv=0x2d1f608, dbname=0x2d1f520 "postgres", username=0x2d1f500 "centos") at postgres.c:4288
#22 0x000000000086de39 in BackendRun (port=0x2d174e0) at postmaster.c:4498
#23 0x000000000086d617 in BackendStartup (port=0x2d174e0) at postmaster.c:4189
#24 0x0000000000869992 in ServerLoop () at postmaster.c:1727
#25 0x0000000000869248 in PostmasterMain (argc=3, argv=0x2cefd70) at postmaster.c:1400
#26 0x0000000000778593 in main (argc=3, argv=0x2cefd70) at main.c:210
With Regards,
Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.
The Postgres Database Company
On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few other comments which I have not fixed: > + /* interface function to support parallel vacuum */ + amestimateparallelvacuum_function amestimateparallelvacuum; /* can be NULL */ } IndexAmRoutine; One more thing, why have you removed the estimate function for API patch? It seems to me Robert has given a different suggestion [1] to deal with it. I think he suggests to add a new member like void *private_data to IndexBulkDeleteResult and then provide an estimate function. See his email [1] for detailed explanation. Did I misunderstood it or you have handled it differently? Can you please share your thoughts on this? [1] - https://www.postgresql.org/message-id/CA%2BTgmobjtHdLfQhmzqBNt7VEsz%2B5w3P0yy0-EsoT05yAJViBSQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
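For reference, the shape Robert appears to be describing could look roughly like the sketch below. It is hypothetical and not taken from any posted patch: the amestimateparallelvacuum_function name is the one from the removed API hunk quoted above, but its signature and the IndexVacuumExtra struct are invented here purely to illustrate the idea of a separate, flat-copyable blob instead of data appended to IndexBulkDeleteResult.

/*
 * Hypothetical sketch only.  An AM needing extra per-index state during
 * vacuum would provide an estimate callback; vacuum would allocate a
 * blob of that size (in DSM for a parallel vacuum) and pass it to the
 * AM alongside IndexBulkDeleteResult, or NULL if the callback is not
 * defined.  The blob may be flat-copied between processes.
 */
typedef Size (*amestimateparallelvacuum_function) (Relation indexRelation);

typedef struct IndexVacuumExtra
{
    Size        size;           /* as returned by the estimate callback */
    char        data[FLEXIBLE_ARRAY_MEMBER];    /* AM-private, flat-copyable */
} IndexVacuumExtra;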
On Wed, Dec 18, 2019 at 6:01 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi all,
While testing on v36 patch with gist index, I came across below segmentation fault.
It seems you forgot to apply the Gist index patch as mentioned by Masahiko-San. You need to first apply the patch at https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com and then apply other v-36 patches. If you have already done that, then we need to investigate. Kindly confirm.
On Wed, Dec 18, 2019 at 6:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 18, 2019 at 6:01 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi all,
While testing on v36 patch with gist index, I came across below segmentation fault.
It seems you forgot to apply the Gist index patch as mentioned by Masahiko-San. You need to first apply the patch at https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com and then apply other v-36 patches. If you have already done that, then we need to investigate. Kindly confirm.
Yes Amit, thanks for the suggestion. I had forgotten to apply the v4 patch.
I have retested the same scenario; now the issue is not reproducible and it works fine.
With Regards,
Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.
The Postgres Database Company
On Wed, Dec 18, 2019 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Few other comments which I have not fixed: > > > > + /* interface function to support parallel vacuum */ > + amestimateparallelvacuum_function amestimateparallelvacuum; /* > can be NULL */ > } IndexAmRoutine; > > One more thing, why have you removed the estimate function for API > patch? > Again thinking about this, it seems to me what you have done here is probably the right direction because whatever else we will do we need to have some untested code or we need to write/enhance some IndexAM to test this. The point is that we don't have any IndexAM in the core (after working around Gist index) which has this requirement and we have not even heard from anyone of such usage, so there is a good chance that whatever we do might not be sufficient for the IndexAM that have such usage. Now, we are already providing an option that one can set VACUUM_OPTION_NO_PARALLEL to indicate that the IndexAM can't participate in a parallel vacuum. So, I feel if there is any IndexAM which would like to pass more data along with IndexBulkDeleteResult, they can use that option. It won't be very difficult to enhance or provide the new APIs to support a parallel vacuum if we come across such a usage. I think we should just modify the comments atop VACUUM_OPTION_NO_PARALLEL to mention this. I think this should be good enough for the first version of parallel vacuum considering we are able to support a parallel vacuum for all in-core indexes. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
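Under that conclusion, a hypothetical out-of-core AM that really does attach extra state to IndexBulkDeleteResult would simply opt out, roughly as in the sketch below. The amparallelvacuumoptions field and VACUUM_OPTION_NO_PARALLEL are the names used in this patch set; the handler and its callbacks are made up.

/*
 * Sketch: a hypothetical index AM whose bulk-delete result carries
 * state that cannot be flat-copied opts out of parallel vacuum by
 * setting VACUUM_OPTION_NO_PARALLEL in its handler, so the core code
 * always processes it in the leader.  myambulkdelete and
 * myamvacuumcleanup stand for the AM's existing callbacks; other
 * amroutine fields are elided.
 */
Datum
myamhandler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;
    amroutine->ambulkdelete = myambulkdelete;
    amroutine->amvacuumcleanup = myamvacuumcleanup;

    PG_RETURN_POINTER(amroutine);
}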
On Thu, Dec 19, 2019 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 18, 2019 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Few other comments which I have not fixed: > > > > > > > + /* interface function to support parallel vacuum */ > > + amestimateparallelvacuum_function amestimateparallelvacuum; /* > > can be NULL */ > > } IndexAmRoutine; > > > > One more thing, why have you removed the estimate function for API > > patch? > > > > Again thinking about this, it seems to me what you have done here is > probably the right direction because whatever else we will do we need > to have some untested code or we need to write/enhance some IndexAM to > test this. The point is that we don't have any IndexAM in the core > (after working around Gist index) which has this requirement and we > have not even heard from anyone of such usage, so there is a good > chance that whatever we do might not be sufficient for the IndexAM > that have such usage. > > Now, we are already providing an option that one can set > VACUUM_OPTION_NO_PARALLEL to indicate that the IndexAM can't > participate in a parallel vacuum. So, I feel if there is any IndexAM > which would like to pass more data along with IndexBulkDeleteResult, > they can use that option. It won't be very difficult to enhance or > provide the new APIs to support a parallel vacuum if we come across > such a usage. I think we should just modify the comments atop > VACUUM_OPTION_NO_PARALLEL to mention this. I think this should be > good enough for the first version of parallel vacuum considering we > are able to support a parallel vacuum for all in-core indexes. > > Thoughts? +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, 19 Dec 2019 at 11:47, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 18, 2019 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 18, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Few other comments which I have not fixed: > > > > > > > + /* interface function to support parallel vacuum */ > > + amestimateparallelvacuum_function amestimateparallelvacuum; /* > > can be NULL */ > > } IndexAmRoutine; > > > > One more thing, why have you removed the estimate function for API > > patch? > > > > Again thinking about this, it seems to me what you have done here is > probably the right direction because whatever else we will do we need > to have some untested code or we need to write/enhance some IndexAM to > test this. The point is that we don't have any IndexAM in the core > (after working around Gist index) which has this requirement and we > have not even heard from anyone of such usage, so there is a good > chance that whatever we do might not be sufficient for the IndexAM > that have such usage. > > Now, we are already providing an option that one can set > VACUUM_OPTION_NO_PARALLEL to indicate that the IndexAM can't > participate in a parallel vacuum. So, I feel if there is any IndexAM > which would like to pass more data along with IndexBulkDeleteResult, > they can use that option. It won't be very difficult to enhance or > provide the new APIs to support a parallel vacuum if we come across > such a usage. Yeah that's exactly what I was thinking. I was about to send such email. The idea is good but I thought we can exclude this feature from the first version patch because we still don't have index AMs that uses that callback in core after gist index patch gets committed. That is, an index AM that does vacuum like the current gist indexes should set VACUUM_OPTION_NO_PARALLEL and we can discuss that again when we got real voice from index AM developers. > I think we should just modify the comments atop > VACUUM_OPTION_NO_PARALLEL to mention this. I think this should be > good enough for the first version of parallel vacuum considering we > are able to support a parallel vacuum for all in-core indexes. I added some comments about that in v36 patch but I slightly modified it. I'll submit an updated version patch soon. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
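[Editor's note] A minimal sketch of the kind of note being proposed atop VACUUM_OPTION_NO_PARALLEL. The wording is illustrative, not the patch's actual comment, and the macro value 0 is an assumption here; only the macro name comes from the patch:

    /*
     * Set VACUUM_OPTION_NO_PARALLEL if the index AM cannot participate in a
     * parallel vacuum at all.  This is also the escape hatch for AMs that
     * would need to pass extra data along with IndexBulkDeleteResult;
     * dedicated APIs for that can be added if such an AM shows up.
     */
    #define VACUUM_OPTION_NO_PARALLEL   0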
On Wed, 18 Dec 2019 at 19:06, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 18, 2019 at 12:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 18 Dec 2019 at 15:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I was analyzing your changes related to ReinitializeParallelDSM() and > > > > it seems like we might launch more number of workers for the > > > > bulkdelete phase. While creating a parallel context, we used the > > > > maximum of "workers required for bulkdelete phase" and "workers > > > > required for cleanup", but now if the number of workers required in > > > > bulkdelete phase is lesser than a cleanup phase(as mentioned by you in > > > > one example), then we would launch more workers for bulkdelete phase. > > > > > > Good catch. Currently when creating a parallel context the number of > > > workers passed to CreateParallelContext() is set not only to > > > pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to > > > specify the number of workers actually to launch after created the > > > parallel context or when creating it. Or I think we call > > > ReinitializeParallelDSM() even the first time running index vacuum. > > > > > > > How about just having ReinitializeParallelWorkers which can be called > > only via vacuum even for the first time before the launch of workers > > as of now? > > > > See in the attached what I have in mind. Few other comments: > > 1. > + shared->disable_delay = (params->options & VACOPT_FAST); > > This should be part of the third patch. > > 2. > +lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, > + LVRelStats *vacrelstats, LVParallelState *lps, > + int nindexes) > { > .. > .. > + /* Cap by the worker we computed at the beginning of parallel lazy vacuum */ > + nworkers = Min(nworkers, lps->pcxt->nworkers); > .. > } > > This should be Assert. In no case, the computed workers can be more > than what we have in context. > > 3. > + if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) || > + ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)) > + nindexes_parallel_cleanup++; > > I think the second condition should be VACUUM_OPTION_PARALLEL_COND_CLEANUP. > > I have fixed the above comments and some given by me earlier [1] in > the attached patch. The attached patch is a diff on top of > v36-0002-Add-parallel-option-to-VACUUM-command. Thank you! - /* Cap by the worker we computed at the beginning of parallel lazy vacuum */ - nworkers = Min(nworkers, lps->pcxt->nworkers); + /* + * The number of workers required for parallel vacuum phase must be less + * than the number of workers with which parallel context is initialized. + */ + Assert(lps->pcxt->nworkers >= nworkers); Regarding the above change in your patch I think we need to cap the number of workers by lps->pcxt->nworkers because the computation of the number of indexes based on lps->nindexes_paralle_XXX can be larger than the number determined when creating the parallel context, for example, when max_parallel_maintenance_workers is smaller than the number of indexes that can be vacuumed in parallel at bulkdelete phase. > > Few other comments which I have not fixed: > > 4. 
> + if (Irel[i]->rd_indam->amusemaintenanceworkmem) > + nindexes_mwm++; > + > + /* Skip indexes that don't participate parallel index vacuum */ > + if (vacoptions == VACUUM_OPTION_NO_PARALLEL || > + RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size) > + continue; > > Won't we need to worry about the number of indexes that uses > maintenance_work_mem only for indexes that can participate in a > parallel vacuum? If so, the above checks need to be reversed. You're right. Fixed. > > 5. > /* > + * Remember indexes that can participate parallel index vacuum and use > + * it for index statistics initialization on DSM because the index > + * size can get bigger during vacuum. > + */ > + can_parallel_vacuum[i] = true; > > I am not able to understand the second part of the comment ("because > the index size can get bigger during vacuum."). What is its > relevance? I meant that the indexes can be begger even during vacuum. So we need to check the size of indexes and determine participations of parallel index vacuum at one place. > > 6. > +/* > + * Vacuum or cleanup indexes that can be processed by only the leader process > + * because these indexes don't support parallel operation at that phase. > + * Therefore this function must be called by the leader process. > + */ > +static void > +vacuum_indexes_leader(Relation *Irel, int nindexes, > IndexBulkDeleteResult **stats, > + LVRelStats *vacrelstats, LVParallelState *lps) > { > .. > > Why you have changed the order of nindexes parameter? I think in the > previous patch, it was the last parameter and that seems to be better > place for it. Since some existing codes place nindexes right after *Irel I thought it's more understandable but I'm also fine with the previous order. > Also, I think after the latest modifications, you can > remove the second sentence in the above comment ("Therefore this > function must be called by the leader process.). Fixed. > > 7. > + for (i = 0; i < nindexes; i++) > + { > + bool leader_only = (get_indstats(lps->lvshared, i) == NULL || > + skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > + > + /* Skip the indexes that can be processed by parallel workers */ > + if (!leader_only) > + continue; > > It is better to name this parameter as skip_index or something like that. Fixed. Attached the updated version patch. This version patch incorporates the above comments and the comments from Mahendra. I also fixed one bug around determining the indexes that are vacuumed in parallel based on their option and size. Please review it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, 18 Dec 2019 at 12:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > [please trim extra text before responding] > > On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > > > On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote: > > > > > > > > > 3. > > > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing numberof tuples, we can reduce that time. > > > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i; > > > > > > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality. > > > > As we added check of min_parallel_index_scan_size in v36 patch set to > > decide parallel vacuum, 1000 tuples are not enough to do parallel > > vacuum. I can see that we are not launching any workers in vacuum.sql > > test case and hence, code coverage also decreased. I am not sure that > > how to fix this. > > > > Try by setting min_parallel_index_scan_size to 0 in test case. Thanks Amit for the fix. Yes, we can add "set min_parallel_index_scan_size = 0;" in vacuum.sql test case. I tested by setting min_parallel_index_scan_size=0 and it is working fine. @Masahiko san, please add above line in vacuum.sql test case. Thanks and Regards Mahendra Thalor EnterpriseDB: http://www.enterprisedb.com
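[Editor's note] A minimal sketch of how the adjusted regression test could exercise parallel vacuum. The SET/RESET and the reduced row count follow the suggestions above; the VACUUM (PARALLEL n) syntax is assumed here rather than taken from the posted vacuum.sql, so treat the exact statements as illustrative:

    SET min_parallel_index_scan_size = 0;
    INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1)
        FROM generate_series(1, 1000) i;
    VACUUM (PARALLEL 2) pvactst;
    RESET min_parallel_index_scan_size;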
On Thu, Dec 19, 2019 at 11:11 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 18 Dec 2019 at 19:06, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > - /* Cap by the worker we computed at the beginning of parallel lazy vacuum */ > - nworkers = Min(nworkers, lps->pcxt->nworkers); > + /* > + * The number of workers required for parallel vacuum phase must be less > + * than the number of workers with which parallel context is initialized. > + */ > + Assert(lps->pcxt->nworkers >= nworkers); > > Regarding the above change in your patch I think we need to cap the > number of workers by lps->pcxt->nworkers because the computation of > the number of indexes based on lps->nindexes_paralle_XXX can be larger > than the number determined when creating the parallel context, for > example, when max_parallel_maintenance_workers is smaller than the > number of indexes that can be vacuumed in parallel at bulkdelete > phase. > oh, right, but then probably, you can write a comment as this is not so obvious. > > > > Few other comments which I have not fixed: > > > > 4. > > + if (Irel[i]->rd_indam->amusemaintenanceworkmem) > > + nindexes_mwm++; > > + > > + /* Skip indexes that don't participate parallel index vacuum */ > > + if (vacoptions == VACUUM_OPTION_NO_PARALLEL || > > + RelationGetNumberOfBlocks(Irel[i]) < min_parallel_index_scan_size) > > + continue; > > > > Won't we need to worry about the number of indexes that uses > > maintenance_work_mem only for indexes that can participate in a > > parallel vacuum? If so, the above checks need to be reversed. > > You're right. Fixed. > > > > > 5. > > /* > > + * Remember indexes that can participate parallel index vacuum and use > > + * it for index statistics initialization on DSM because the index > > + * size can get bigger during vacuum. > > + */ > > + can_parallel_vacuum[i] = true; > > > > I am not able to understand the second part of the comment ("because > > the index size can get bigger during vacuum."). What is its > > relevance? > > I meant that the indexes can be begger even during vacuum. So we need > to check the size of indexes and determine participations of parallel > index vacuum at one place. > Okay, but that doesn't go with the earlier part of the comment. We can either remove it or explain it a bit more. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
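[Editor's note] In other words, a small sketch of the combination being discussed; the variable names follow the quoted patch and the comment wording is illustrative:

    /*
     * Cap by the number of workers the parallel context was initialized
     * with.  The per-phase computation can exceed it, e.g. when
     * max_parallel_maintenance_workers is smaller than the number of
     * indexes that can be vacuumed in parallel in the bulkdelete phase.
     */
    nworkers = Min(nworkers, lps->pcxt->nworkers);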
On Thu, Dec 19, 2019 at 12:41 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > Attached the updated version patch. This version patch incorporates > the above comments and the comments from Mahendra. I also fixed one > bug around determining the indexes that are vacuumed in parallel based > on their option and size. Please review it. I'm not enthusiastic about the fact that 0003 calls the fast option 'disable_delay' in some places. I think it would be more clear to call it 'fast' everywhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 19 Dec 2019 at 22:48, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Dec 19, 2019 at 12:41 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > Attached the updated version patch. This version patch incorporates > > the above comments and the comments from Mahendra. I also fixed one > > bug around determining the indexes that are vacuumed in parallel based > > on their option and size. Please review it. > > I'm not enthusiastic about the fact that 0003 calls the fast option > 'disable_delay' in some places. I think it would be more clear to call > it 'fast' everywhere. > Agreed. I've attached the updated version patch that incorporated the all review comments I go so far. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi,
While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch.
Changed configuration parameters and Stack trace are as below:
autovacuum = on
max_worker_processes = 4
shared_buffers = 10MB
max_parallel_workers = 8
max_parallel_maintenance_workers = 8
vacuum_cost_limit = 2000
vacuum_cost_delay = 10
min_parallel_table_scan_size = 8MB
min_parallel_index_scan_size = 0
-- Stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
[New LWP 1399]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: autovacuum worker postgres '.
Program terminated with signal 6, Aborted.
#0 0x00007f4517d80337 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f4517d80337 in raise () from /lib64/libc.so.6
#1 0x00007f4517d81a28 in abort () from /lib64/libc.so.6
#2 0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb "FailedAssertion",
fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
#3 0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442
#4 0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., count=1024,
fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195
#5 0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., len=1024,
fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110
#6 0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538)
at stringinfo.c:149
#7 0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832
#8 0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249
#9 0x0000000000857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1689
#10 0x000000000085772f in StartAutoVacWorker () at autovacuum.c:1483
#11 0x000000000086e64f in StartAutovacuumWorker () at postmaster.c:5562
#12 0x000000000086e106 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5279
#13 <signal handler called>
#14 0x00007f4517e3f933 in __select_nocancel () from /lib64/libc.so.6
#15 0x0000000000869838 in ServerLoop () at postmaster.c:1691
#16 0x0000000000869212 in PostmasterMain (argc=3, argv=0x256bd70) at postmaster.c:1400
#17 0x000000000077855d in main (argc=3, argv=0x256bd70) at main.c:210
(gdb)
I have tried to reproduce the issue with all previously executed queries, but now I am not able to reproduce it.
On Thu, Dec 19, 2019 at 11:26 AM Mahendra Singh <mahi6run@gmail.com> wrote:
On Wed, 18 Dec 2019 at 12:07, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> [please trim extra text before responding]
>
> On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh <mahi6run@gmail.com> wrote:
> >
> > On Tue, 10 Dec 2019 at 00:30, Mahendra Singh <mahi6run@gmail.com> wrote:
> > >
> > >
> > > 3.
> > > After v35 patch, vacuum.sql regression test is taking too much time due to large number of inserts so by reducing number of tuples, we can reduce that time.
> > > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,100000) i;
> > >
> > > here, instead of 100000, we can make 1000 to reduce time of this test case because we only want to test code and functionality.
> >
> > As we added check of min_parallel_index_scan_size in v36 patch set to
> > decide parallel vacuum, 1000 tuples are not enough to do parallel
> > vacuum. I can see that we are not launching any workers in vacuum.sql
> > test case and hence, code coverage also decreased. I am not sure that
> > how to fix this.
> >
>
> Try by setting min_parallel_index_scan_size to 0 in test case.
Thanks Amit for the fix.
Yes, we can add "set min_parallel_index_scan_size = 0;" in vacuum.sql
test case. I tested by setting min_parallel_index_scan_size=0 and it
is working fine.
@Masahiko san, please add above line in vacuum.sql test case.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
With Regards,
Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Software India Pvt. Ltd.
The Postgres Database Company
On Fri, Dec 20, 2019 at 5:17 PM Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote:
Hi,
While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch.
From the call stack, it is not clear whether it is related to a patch at all. Have you checked your test with and without the patch? The reason is that the patch doesn't perform a parallel vacuum on temporary tables.
Changed configuration parameters and Stack trace are as below:
-- Stack trace:
[centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres
Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done.
[New LWP 1399]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: autovacuum worker postgres '.
Program terminated with signal 6, Aborted.
#0 0x00007f4517d80337 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f4517d80337 in raise () from /lib64/libc.so.6
#1 0x00007f4517d81a28 in abort () from /lib64/libc.so.6
#2 0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb "FailedAssertion",
fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67
#3 0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442
#4 0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., count=1024,
fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195
#5 0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats 151 times>..., len=1024,
fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110
#6 0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538)
at stringinfo.c:149
#7 0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832
#8 0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249
The call stack indicates that the backend from which you were doing the operations on temporary tables crashed somehow, and then autovacuum tried to clean up that orphaned temporary table. It crashes while printing the message for dropping orphan tables. Below is that message:
(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
get_database_name(MyDatabaseId),
get_namespace_name(classForm->relnamespace),
NameStr(classForm->relname))));
Now it can fail the assertion only if one of the three parameters (database name, namespace, relname) is NULL, and I can't see any way for that to happen unless you have manually removed the namespace or the database.
(gdb)
I have tried to reproduce the same with all previously executed queries but now I am not able to reproduce the same.
I am not sure how we can conclude from this whether there is any problem with this patch, unless you have some steps to show us what you have done. It could happen if you somehow corrupted the database by manually removing stuff, or maybe there is some genuine bug, but it is not at all clear.
On Fri, 20 Dec 2019 at 17:17, Prabhat Sahu <prabhat.sahu@enterprisedb.com> wrote: > > Hi, > > While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch. > Changed configuration parameters and Stack trace are as below: > > autovacuum = on > max_worker_processes = 4 > shared_buffers = 10MB > max_parallel_workers = 8 > max_parallel_maintenance_workers = 8 > vacuum_cost_limit = 2000 > vacuum_cost_delay = 10 > min_parallel_table_scan_size = 8MB > min_parallel_index_scan_size = 0 > > -- Stack trace: > [centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres > Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done. > [New LWP 1399] > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > Core was generated by `postgres: autovacuum worker postgres '. > Program terminated with signal 6, Aborted. > #0 0x00007f4517d80337 in raise () from /lib64/libc.so.6 > Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 > (gdb) bt > #0 0x00007f4517d80337 in raise () from /lib64/libc.so.6 > #1 0x00007f4517d81a28 in abort () from /lib64/libc.so.6 > #2 0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb "FailedAssertion", > fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67 > #3 0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442 > #4 0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats151 times>..., count=1024, > fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195 > #5 0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats151 times>..., len=1024, > fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110 > #6 0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp table\"%s.%s.%s\"", args=0x7ffdb0e38538) > at stringinfo.c:149 > #7 0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832 > #8 0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249 > #9 0x0000000000857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1689 > #10 0x000000000085772f in StartAutoVacWorker () at autovacuum.c:1483 > #11 0x000000000086e64f in StartAutovacuumWorker () at postmaster.c:5562 > #12 0x000000000086e106 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5279 > #13 <signal handler called> > #14 0x00007f4517e3f933 in __select_nocancel () from /lib64/libc.so.6 > #15 0x0000000000869838 in ServerLoop () at postmaster.c:1691 > #16 0x0000000000869212 in PostmasterMain (argc=3, argv=0x256bd70) at postmaster.c:1400 > #17 0x000000000077855d in main (argc=3, argv=0x256bd70) at main.c:210 > (gdb) > > I have tried to reproduce the same with all previously executed queries but now I am not able to reproduce the same. Thanks Prabhat for reporting this issue. I am able to reproduce this issue at my end. 
I tested and verified that this issue is not related to the parallel vacuum patch. I am able to reproduce this issue on HEAD without the parallel vacuum patch (v37). I will report this issue in a new thread with a reproducible test case. Thanks and Regards Mahendra Thalor EnterpriseDB: http://www.enterprisedb.com
On Mon, 23 Dec 2019 at 16:24, Mahendra Singh <mahi6run@gmail.com> wrote: > > On Fri, 20 Dec 2019 at 17:17, Prabhat Sahu > <prabhat.sahu@enterprisedb.com> wrote: > > > > Hi, > > > > While testing this feature with parallel vacuum on "TEMPORARY TABLE", I got a server crash on PG Head+V36_patch. > > Changed configuration parameters and Stack trace are as below: > > > > autovacuum = on > > max_worker_processes = 4 > > shared_buffers = 10MB > > max_parallel_workers = 8 > > max_parallel_maintenance_workers = 8 > > vacuum_cost_limit = 2000 > > vacuum_cost_delay = 10 > > min_parallel_table_scan_size = 8MB > > min_parallel_index_scan_size = 0 > > > > -- Stack trace: > > [centos@parallel-vacuum-testing bin]$ gdb -q -c data/core.1399 postgres > > Reading symbols from /home/centos/BLP_Vacuum/postgresql/inst/bin/postgres...done. > > [New LWP 1399] > > [Thread debugging using libthread_db enabled] > > Using host libthread_db library "/lib64/libthread_db.so.1". > > Core was generated by `postgres: autovacuum worker postgres '. > > Program terminated with signal 6, Aborted. > > #0 0x00007f4517d80337 in raise () from /lib64/libc.so.6 > > Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 > > (gdb) bt > > #0 0x00007f4517d80337 in raise () from /lib64/libc.so.6 > > #1 0x00007f4517d81a28 in abort () from /lib64/libc.so.6 > > #2 0x0000000000a96341 in ExceptionalCondition (conditionName=0xd18efb "strvalue != NULL", errorType=0xd18eeb "FailedAssertion", > > fileName=0xd18ee0 "snprintf.c", lineNumber=442) at assert.c:67 > > #3 0x0000000000b02522 in dopr (target=0x7ffdb0e38450, format=0xc5fa95 ".%s\"", args=0x7ffdb0e38538) at snprintf.c:442 > > #4 0x0000000000b01ea6 in pg_vsnprintf (str=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats151 times>..., count=1024, > > fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at snprintf.c:195 > > #5 0x0000000000afbadf in pvsnprintf (buf=0x256df50 "autovacuum: dropping orphan temp table \"postgres.", '\177' <repeats151 times>..., len=1024, > > fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"", args=0x7ffdb0e38538) at psprintf.c:110 > > #6 0x0000000000afd34b in appendStringInfoVA (str=0x7ffdb0e38550, fmt=0xc5fa68 "autovacuum: dropping orphan temp table\"%s.%s.%s\"", args=0x7ffdb0e38538) > > at stringinfo.c:149 > > #7 0x0000000000a970fd in errmsg (fmt=0xc5fa68 "autovacuum: dropping orphan temp table \"%s.%s.%s\"") at elog.c:832 > > #8 0x00000000008588d2 in do_autovacuum () at autovacuum.c:2249 > > #9 0x0000000000857b29 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1689 > > #10 0x000000000085772f in StartAutoVacWorker () at autovacuum.c:1483 > > #11 0x000000000086e64f in StartAutovacuumWorker () at postmaster.c:5562 > > #12 0x000000000086e106 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5279 > > #13 <signal handler called> > > #14 0x00007f4517e3f933 in __select_nocancel () from /lib64/libc.so.6 > > #15 0x0000000000869838 in ServerLoop () at postmaster.c:1691 > > #16 0x0000000000869212 in PostmasterMain (argc=3, argv=0x256bd70) at postmaster.c:1400 > > #17 0x000000000077855d in main (argc=3, argv=0x256bd70) at main.c:210 > > (gdb) > > > > I have tried to reproduce the same with all previously executed queries 
but now I am not able to reproduce the same. > > Thanks Prabhat for reporting this issue. > > I am able to reproduce this issue at my end. I tested and verified > that this issue is not related to parallel vacuum patch. I am able to > reproduce this issue on HEAD without parallel vacuum patch(v37). > > I will report this issue in new thread with reproducible test case. Thank you so much! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 20, 2019 at 12:13 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > I've attached the updated version patch that incorporated the all > review comments I go so far. > I have further edited the first two patches posted by you. The changes include (a) changed tests to reset the guc, (b) removing some stuff which is not required in this version, (c) moving some variables around to make them in better order, (d) changed comments and few other cosmetic things and (e) commit messages for first two patches. I think the first two patches attached in this email are in good shape and we can commit those unless you or someone has more comments on them, the main parallel vacuum patch can still be improved by some more test/polish/review. I am planning to push the first two patches next week after another pass. The first two patches are explained in brief as below: 1. v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM: It allows us to delete empty pages in each pass during GIST VACUUM. Earlier, we use to postpone deleting empty pages till the second stage of vacuum to amortize the cost of scanning internal pages. However, that can sometimes (say vacuum is canceled or errored between first and second stage) delay the pages to be recycled. Another thing is that to facilitate deleting empty pages in the second stage, we need to share the information of internal and empty pages between different stages of vacuum. It will be quite tricky to share this information via DSM which is required for the main parallel vacuum patch. Also, it will bring the logic to reclaim deleted pages closer to nbtree where we delete empty pages in each pass. Overall, the advantages of deleting empty pages in each pass outweigh the advantages of postponing the same. This patch is discussed in detail in a separate thread [1]. 2. v39-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch: Introduce new fields amusemaintenanceworkmem and amparallelvacuumoptions in IndexAmRoutine for parallel vacuum. The amusemaintenanceworkmem tells whether a particular IndexAM uses maintenance_work_mem or not. This will help in controlling the memory used by individual workers as otherwise, each worker can consume memory equal to maintenance_work_mem. This has been discussed in detail in a separate thread as well [2]. The amparallelvacuumoptions tell whether a particular IndexAM participates in a parallel vacuum and if so in which phase (bulkdelete, vacuumcleanup) of vacuum. [1] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
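[Editor's note] For illustration, a minimal sketch of how an index AM handler might fill in the two new IndexAmRoutine fields, and how an AM that cannot participate would opt out. The field names come from the patch; the VACUUM_OPTION_PARALLEL_BULKDEL flag name and the per-AM choices shown are assumptions:

    /* e.g. in a btree-like AM's handler function */
    amroutine->amusemaintenanceworkmem = false;
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;

    /* e.g. in a gin-like AM that uses maintenance_work_mem during cleanup */
    amroutine->amusemaintenanceworkmem = true;
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;

    /* an AM that cannot run any vacuum phase in parallel */
    amroutine->amparallelvacuumoptions = VACUUM_OPTION_NO_PARALLEL;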
On Mon, 23 Dec 2019 at 16:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Dec 20, 2019 at 12:13 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I've attached the updated version patch that incorporated the all
> > review comments I go so far.
> >
>
> I have further edited the first two patches posted by you. The
> changes include (a) changed tests to reset the guc, (b) removing some
> stuff which is not required in this version, (c) moving some variables
> around to make them in better order, (d) changed comments and few
> other cosmetic things and (e) commit messages for first two patches.
>
> I think the first two patches attached in this email are in good shape
> and we can commit those unless you or someone has more comments on
> them, the main parallel vacuum patch can still be improved by some
> more test/polish/review. I am planning to push the first two patches
> next week after another pass. The first two patches are explained in
> brief as below:
>
> 1. v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM: It
> allows us to delete empty pages in each pass during GIST VACUUM.
> Earlier, we use to postpone deleting empty pages till the second stage
> of vacuum to amortize the cost of scanning internal pages. However,
> that can sometimes (say vacuum is canceled or errored between first
> and second stage) delay the pages to be recycled. Another thing is
> that to facilitate deleting empty pages in the second stage, we need
> to share the information of internal and empty pages between different
> stages of vacuum. It will be quite tricky to share this information
> via DSM which is required for the main parallel vacuum patch. Also,
> it will bring the logic to reclaim deleted pages closer to nbtree
> where we delete empty pages in each pass. Overall, the advantages of
> deleting empty pages in each pass outweigh the advantages of
> postponing the same. This patch is discussed in detail in a separate
> thread [1].
>
> 2. v39-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch:
> Introduce new fields amusemaintenanceworkmem and
> amparallelvacuumoptions in IndexAmRoutine for parallel vacuum. The
> amusemaintenanceworkmem tells whether a particular IndexAM uses
> maintenance_work_mem or not. This will help in controlling the memory
> used by individual workers as otherwise, each worker can consume
> memory equal to maintenance_work_mem. This has been discussed in
> detail in a separate thread as well [2]. The amparallelvacuumoptions
> tell whether a particular IndexAM participates in a parallel vacuum
> and if so in which phase (bulkdelete, vacuumcleanup) of vacuum.
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com
Hi,
I reviewed the v39 patch set. Below are some minor review comments:
1.
+ * memory equal to maitenance_work_mem, the new maitenance_work_mem for
maitenance_work_mem should be replaced by maintenance_work_mem.
2.
+ * The number of workers can vary between and bulkdelete and cleanup
I think the above sentence is not grammatically correct; there is an extra "and".
3.
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because The lockmode does not conflict among the parallel workers.
+ */
I think, "lock mode" and "lockmode", both should be same.(means extra space should be removed from "lock mode"). In "The", "T" should be small case letter.
4.
+ /* We don't support parallel vacuum for autovacuum for now */
I think the above sentence should be something like "As of now, we don't support parallel vacuum for autovacuum".
5. I am not sure that I am right, but I can see that we are not consistent in how we end single-line comments.
I think that if a single-line comment starts with an upper case letter, then we should not put a period (dot) at the end of the comment, but if the comment starts with a lower case letter, then we should put a period (dot) at the end.
a)
+ /* parallel vacuum must be active */
I think we should either end the above comment with a dot or make the "p" of "parallel" an upper case letter.
b)
+ /* At least count itself */
I think the above is correct.
If my understanding is correct, then please let me know so that I can make these changes on top of the v39 patch set.
6.
+ bool amusemaintenanceworkmem;
I think we haven't run pgindent.
Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 23, 2019 at 11:02 PM Mahendra Singh <mahi6run@gmail.com> wrote: > > 5. I am not sure that I am right but I can see that we are not consistent while ending the single line comments. > > I think, if single line comment is started with "upper case letter", then we should not put period(dot) at the end of comment,but if comment started with "lower case letter", then we should put period(dot) at the end of comment. > > a) > + /* parallel vacuum must be active */ > I think. we should end above comment with dot or we should make "p" of parallel as upper case letter. > > b) > + /* At least count itself */ > I think, above is correct. > I have checked a few files in this context and I don't see any consistency, so I would suggest keeping the things matching with the nearby code. Do you have any reason for the above conclusion? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 23 Dec 2019 at 19:41, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 20, 2019 at 12:13 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've attached the updated version patch that incorporated the all > > review comments I go so far. > > > > I have further edited the first two patches posted by you. The > changes include (a) changed tests to reset the guc, (b) removing some > stuff which is not required in this version, (c) moving some variables > around to make them in better order, (d) changed comments and few > other cosmetic things and (e) commit messages for first two patches. > > I think the first two patches attached in this email are in good shape > and we can commit those unless you or someone has more comments on > them, the main parallel vacuum patch can still be improved by some > more test/polish/review. I am planning to push the first two patches > next week after another pass. The first two patches are explained in > brief as below: > > 1. v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM: It > allows us to delete empty pages in each pass during GIST VACUUM. > Earlier, we use to postpone deleting empty pages till the second stage > of vacuum to amortize the cost of scanning internal pages. However, > that can sometimes (say vacuum is canceled or errored between first > and second stage) delay the pages to be recycled. Another thing is > that to facilitate deleting empty pages in the second stage, we need > to share the information of internal and empty pages between different > stages of vacuum. It will be quite tricky to share this information > via DSM which is required for the main parallel vacuum patch. Also, > it will bring the logic to reclaim deleted pages closer to nbtree > where we delete empty pages in each pass. Overall, the advantages of > deleting empty pages in each pass outweigh the advantages of > postponing the same. This patch is discussed in detail in a separate > thread [1]. > > 2. v39-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch: > Introduce new fields amusemaintenanceworkmem and > amparallelvacuumoptions in IndexAmRoutine for parallel vacuum. The > amusemaintenanceworkmem tells whether a particular IndexAM uses > maintenance_work_mem or not. This will help in controlling the memory > used by individual workers as otherwise, each worker can consume > memory equal to maintenance_work_mem. This has been discussed in > detail in a separate thread as well [2]. The amparallelvacuumoptions > tell whether a particular IndexAM participates in a parallel vacuum > and if so in which phase (bulkdelete, vacuumcleanup) of vacuum. > > Thank you for updating the patches! The first patches look good to me. I'm reviewing other patches and will post comments if there is. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > > The first patches look good to me. I'm reviewing other patches and > will post comments if there is. > Okay, feel free to address few comments raised by Mahendra along with whatever you find. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > The first patches look good to me. I'm reviewing other patches and > > will post comments if there is. > > Oops I meant first "two" patches look good to me. > > Okay, feel free to address few comments raised by Mahendra along with > whatever you find. Thanks! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > The first patches look good to me. I'm reviewing other patches and > > > will post comments if there is. > > > > > Oops I meant first "two" patches look good to me. > > > > > Okay, feel free to address few comments raised by Mahendra along with > > whatever you find. > > Thanks! > I've attached updated patch set as the previous version patch set conflicts to the current HEAD. This patch set incorporated the review comments, a few fix and the patch for PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same as previous version. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, 25 Dec 2019 at 17:47, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > The first patches look good to me. I'm reviewing other patches and > > > > will post comments if there is. > > > > > > > > Oops I meant first "two" patches look good to me. > > > > > > > > Okay, feel free to address few comments raised by Mahendra along with > > > whatever you find. > > > > Thanks! > > > > I've attached updated patch set as the previous version patch set > conflicts to the current HEAD. This patch set incorporated the review > comments, a few fix and the patch for > PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same > as previous version. I verified my all review comments in v40 patch set. All are fixed. v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch doesn't apply on HEAD. Please send re-based patch. Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Hi, On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote: >On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada ><masahiko.sawada@2ndquadrant.com> wrote: >> >> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada >> > <masahiko.sawada@2ndquadrant.com> wrote: >> > > >> > > >> > > The first patches look good to me. I'm reviewing other patches and >> > > will post comments if there is. >> > > >> >> Oops I meant first "two" patches look good to me. >> >> > >> > Okay, feel free to address few comments raised by Mahendra along with >> > whatever you find. >> >> Thanks! >> > >I've attached updated patch set as the previous version patch set >conflicts to the current HEAD. This patch set incorporated the review >comments, a few fix and the patch for >PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same >as previous version. > I've been reviewing the updated patches over the past couple of days, so let me share some initial review comments. I initially started to read the thread, but then I realized it's futile - the thread is massive, and the patch changed so much re-reading the whole thread is a waste of time. It might be useful write a summary of the current design, but AFAICS the original plan to parallelize the heap scan is abandoned and we now do just the steps that vacuum indexes in parallel. Which is fine, but it means the subject "block level parallel vacuum" is a bit misleading. Anyway, most of the logic is implemented in part 0002, which actually does all the parallel worker stuff. The remaining parts 0001, 0003 and 0004 are either preparing infrastructure or not directlyrelated to the primary feature. v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch ----------------------------------------------------------- I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe it should be called just 'amvacuumoptions' or something like that? The 'parallel' part is actually encoded in names of the options. Also, why do we need a separate amusemaintenanceworkmem option? Why don't we simply track it using a separate flag in 'amvacuumoptions' (or whatever we end up calling it)? Would it make sense to track m_w_m usage separately for the two index cleanup phases? Or is that unnecessary / pointless? v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch ---------------------------------------------------------- I haven't found any issues yet, but I've only started with the code review. I'll continue with the review. It seems in a fairly good shape though, I think, I only have two minor comments at the moment: - The SizeOfLVDeadTuples macro seems rather weird. It does include space for one ItemPointerData, but we really need an array of them. But then all the places where the macro is used explicitly add space for the pointers, so the sizeof(ItemPointerData) seems unnecessary. So it should be either #define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs)) or #define SizeOfLVDeadTuples(cnt) \ (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData)) in which case the callers can be simplified. - It's not quite clear to me why we need the new nworkers_to_launch field in ParallelContext. v40-0003-Add-FAST-option-to-vacuum-command.patch ------------------------------------------------ I do have a bit of an issue with this part - I'm not quite convinved we actually need a FAST option, and I actually suspect we'll come to regret it sooner than later. 
AFAIK it pretty much does exactly the same thing as setting vacuum_cost_delay to 0, and IMO it's confusing to provide multiple ways to do the same thing - I do expect reports from confused users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a sufficient solution? The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do we need a separate VACUUM option, instead of just using the existing max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX so why not here? Maybe it's explained somewhere deep in the thread, of course ... v40-0004-Add-ability-to-disable-leader-participation-in-p.patch --------------------------------------------------------------- IMHO this should be simply merged into 0002. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 27 Dec 2019 at 11:24, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Hi, > > On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote: > >On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada > ><masahiko.sawada@2ndquadrant.com> wrote: > >> > >> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > >> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada > >> > <masahiko.sawada@2ndquadrant.com> wrote: > >> > > > >> > > > >> > > The first patches look good to me. I'm reviewing other patches and > >> > > will post comments if there is. > >> > > > >> > >> Oops I meant first "two" patches look good to me. > >> > >> > > >> > Okay, feel free to address few comments raised by Mahendra along with > >> > whatever you find. > >> > >> Thanks! > >> > > > >I've attached updated patch set as the previous version patch set > >conflicts to the current HEAD. This patch set incorporated the review > >comments, a few fix and the patch for > >PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same > >as previous version. > > > > I've been reviewing the updated patches over the past couple of days, so > let me share some initial review comments. I initially started to read > the thread, but then I realized it's futile - the thread is massive, and > the patch changed so much re-reading the whole thread is a waste of time. Thank you for reviewing this patch! > > It might be useful write a summary of the current design, but AFAICS the > original plan to parallelize the heap scan is abandoned and we now do > just the steps that vacuum indexes in parallel. Which is fine, but it > means the subject "block level parallel vacuum" is a bit misleading. > Yeah I should have renamed it. I'll summarize the current design. > Anyway, most of the logic is implemented in part 0002, which actually > does all the parallel worker stuff. The remaining parts 0001, 0003 and > 0004 are either preparing infrastructure or not directlyrelated to the > primary feature. > > > v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch > ----------------------------------------------------------- > > I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe > it should be called just 'amvacuumoptions' or something like that? The > 'parallel' part is actually encoded in names of the options. > amvacuumoptions seems good to me. > Also, why do we need a separate amusemaintenanceworkmem option? Why > don't we simply track it using a separate flag in 'amvacuumoptions' > (or whatever we end up calling it)? > It also seems like a good idea. > Would it make sense to track m_w_m usage separately for the two index > cleanup phases? Or is that unnecessary / pointless? We could do that but currently index AM uses this option is only gin indexes. And gin indexes could use maintenance_work_mem both during bulkdelete and cleanup. So it might be unnecessary at least as of now. > > > v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch > ---------------------------------------------------------- > > I haven't found any issues yet, but I've only started with the code > review. I'll continue with the review. It seems in a fairly good shape > though, I think, I only have two minor comments at the moment: > > - The SizeOfLVDeadTuples macro seems rather weird. It does include space > for one ItemPointerData, but we really need an array of them. But then > all the places where the macro is used explicitly add space for the > pointers, so the sizeof(ItemPointerData) seems unnecessary. 
So it > should be either > > #define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs)) > > or > > #define SizeOfLVDeadTuples(cnt) \ > (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData)) > > in which case the callers can be simplified. Fixed it to the former. > > - It's not quite clear to me why we need the new nworkers_to_launch > field in ParallelContext. The motivation of nworkers_to_launch is to let us specify the number of workers to actually launch when we use the same parallel context several times with a different number of workers each time. Since an index AM can choose whether it participates in bulkdelete and/or cleanup, the number of workers required for each vacuum phase can be different. I originally changed LaunchParallelWorkers to take the number of workers to launch so that it could launch a different number of workers for each vacuum phase, but Robert suggested changing the routine that reinitializes the parallel context instead[1]. It would be less confusing and would involve modifying code in a lot fewer places. So with this patch we specify the number of workers when initializing the parallel context as the maximum number of workers, and then use ReinitializeParallelWorkers before doing either bulkdelete or cleanup to specify the number of workers to actually launch. > > > v40-0003-Add-FAST-option-to-vacuum-command.patch > ------------------------------------------------ > > I do have a bit of an issue with this part - I'm not quite convinved we > actually need a FAST option, and I actually suspect we'll come to regret > it sooner than later. AFAIK it pretty much does exactly the same thing > as setting vacuum_cost_delay to 0, and IMO it's confusing to provide > multiple ways to do the same thing - I do expect reports from confused > users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a > sufficient solution? I think the motivation of this option is similar to FREEZE. I think it's sometimes a good idea to have a shortcut for popular usage and give it a name corresponding to its job. From that perspective I think having a FAST option would make sense, but maybe we need more discussion about how parallel vacuum and vacuum delay combine. > > The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do > we need a separate VACUUM option, instead of just using the existing > max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX > so why not here? AFAIR there was no such discussion so far, but I think one reason could be that parallel vacuum should be disabled by default. If parallel vacuum used max_parallel_maintenance_workers (2 by default) rather than having its own option, it would kick in with the default settings, but I think that would have a big impact on users because the disk access could become random reads and writes when several indexes are on the same tablespace. > > Maybe it's explained somewhere deep in the thread, of course ... > > > v40-0004-Add-ability-to-disable-leader-participation-in-p.patch > --------------------------------------------------------------- > > IMHO this should be simply merged into 0002. We discussed this; it's still unclear whether we really want to commit this code, and therefore it's separated from the main part. Please see more details here[2]. I've fixed the code based on the review comments and rebased to the current HEAD. Some comments around the vacuum option name and the FAST option are still open, as we need more discussion.
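To make this flow a little easier to follow, here is a minimal sketch of how the leader can reuse one parallel context across the two phases. It is only an illustration of the pattern (DSM estimation, shared-state setup and error handling are elided); ReinitializeParallelWorkers is the routine the patch adds, and parallel_vacuum_main is assumed to be the worker entry point.

    #include "postgres.h"
    #include "access/parallel.h"
    #include "access/xact.h"

    /*
     * Sketch only: leader-side skeleton for running the bulkdelete and
     * cleanup phases with a different number of workers each time.
     */
    static void
    parallel_vacuum_all_indexes_sketch(int nworkers_max,
                                       int nworkers_bulkdel,
                                       int nworkers_cleanup)
    {
        ParallelContext *pcxt;

        EnterParallelMode();

        /* allocate worker slots and DSM for the maximum we might ever use */
        pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
                                     nworkers_max);
        InitializeParallelDSM(pcxt);    /* shm_toc estimation elided */

        /* bulkdelete phase: launch only as many workers as this phase needs */
        ReinitializeParallelWorkers(pcxt, nworkers_bulkdel);
        LaunchParallelWorkers(pcxt);
        WaitForParallelWorkersToFinish(pcxt);

        /* cleanup phase: reset shared state, then launch a different number */
        ReinitializeParallelDSM(pcxt);
        ReinitializeParallelWorkers(pcxt, nworkers_cleanup);
        LaunchParallelWorkers(pcxt);
        WaitForParallelWorkersToFinish(pcxt);

        DestroyParallelContext(pcxt);
        ExitParallelMode();
    }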
Regards, [1] https://www.postgresql.org/message-id/CA%2BTgmobjtHdLfQhmzqBNt7VEsz%2B5w3P0yy0-EsoT05yAJViBSQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAA4eK1%2BC8OBhm4g3Mnfx%2BVjGfZ4ckLOLSU9i7Smo1sp4k0V5HA%40mail.gmail.com -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote: >On Fri, 27 Dec 2019 at 11:24, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> Hi, >> >> On Wed, Dec 25, 2019 at 09:17:16PM +0900, Masahiko Sawada wrote: >> >On Tue, 24 Dec 2019 at 15:46, Masahiko Sawada >> ><masahiko.sawada@2ndquadrant.com> wrote: >> >> >> >> On Tue, 24 Dec 2019 at 15:44, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> > >> >> > On Tue, Dec 24, 2019 at 12:08 PM Masahiko Sawada >> >> > <masahiko.sawada@2ndquadrant.com> wrote: >> >> > > >> >> > > >> >> > > The first patches look good to me. I'm reviewing other patches and >> >> > > will post comments if there is. >> >> > > >> >> >> >> Oops I meant first "two" patches look good to me. >> >> >> >> > >> >> > Okay, feel free to address few comments raised by Mahendra along with >> >> > whatever you find. >> >> >> >> Thanks! >> >> >> > >> >I've attached updated patch set as the previous version patch set >> >conflicts to the current HEAD. This patch set incorporated the review >> >comments, a few fix and the patch for >> >PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION. 0001 patch is the same >> >as previous version. >> > >> >> I've been reviewing the updated patches over the past couple of days, so >> let me share some initial review comments. I initially started to read >> the thread, but then I realized it's futile - the thread is massive, and >> the patch changed so much re-reading the whole thread is a waste of time. > >Thank you for reviewing this patch! > >> >> It might be useful write a summary of the current design, but AFAICS the >> original plan to parallelize the heap scan is abandoned and we now do >> just the steps that vacuum indexes in parallel. Which is fine, but it >> means the subject "block level parallel vacuum" is a bit misleading. >> > >Yeah I should have renamed it. I'll summarize the current design. > OK >> Anyway, most of the logic is implemented in part 0002, which actually >> does all the parallel worker stuff. The remaining parts 0001, 0003 and >> 0004 are either preparing infrastructure or not directlyrelated to the >> primary feature. >> >> >> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch >> ----------------------------------------------------------- >> >> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe >> it should be called just 'amvacuumoptions' or something like that? The >> 'parallel' part is actually encoded in names of the options. >> > >amvacuumoptions seems good to me. > >> Also, why do we need a separate amusemaintenanceworkmem option? Why >> don't we simply track it using a separate flag in 'amvacuumoptions' >> (or whatever we end up calling it)? >> > >It also seems like a good idea. > I think there's another question we need to ask - why to we introduce a bitmask, instead of using regular boolean struct members? Until now, the IndexAmRoutine struct had simple boolean members describing capabilities of the AM implementation. Why shouldn't this patch do the same thing, i.e. add one boolean flag for each AM feature? >> Would it make sense to track m_w_m usage separately for the two index >> cleanup phases? Or is that unnecessary / pointless? > >We could do that but currently index AM uses this option is only gin >indexes. And gin indexes could use maintenance_work_mem both during >bulkdelete and cleanup. So it might be unnecessary at least as of now. 
> OK >> >> >> v40-0002-Add-a-parallel-option-to-the-VACUUM-command.patch >> ---------------------------------------------------------- >> >> I haven't found any issues yet, but I've only started with the code >> review. I'll continue with the review. It seems in a fairly good shape >> though, I think, I only have two minor comments at the moment: >> >> - The SizeOfLVDeadTuples macro seems rather weird. It does include space >> for one ItemPointerData, but we really need an array of them. But then >> all the places where the macro is used explicitly add space for the >> pointers, so the sizeof(ItemPointerData) seems unnecessary. So it >> should be either >> >> #define SizeOfLVDeadTuples (offsetof(LVDeadTuples, itemptrs)) >> >> or >> >> #define SizeOfLVDeadTuples(cnt) \ >> (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData)) >> >> in which case the callers can be simplified. > >Fixed it to the former. > Hmmm, I'd actually suggest to use the latter variant, because it allows simplifying the callers. Just translating it to offsetof() is not saving much code, I think. >> >> - It's not quite clear to me why we need the new nworkers_to_launch >> field in ParallelContext. > >The motivation of nworkers_to_launch is to specify the number of >workers to actually launch when we use the same parallel context >several times while changing the number of workers to launch. Since >index AM can choose the participation of bulkdelete and/or cleanup, >the number of workers required for each vacuum phrases can be >different. I originally changed LaunchParallelWorkers to have the >number of workers to launch so that it launches different number of >workers for each vacuum phases but Robert suggested to change the >routine of reinitializing parallel context[1]. It would be less >confusing and would involve modify code in a lot fewer places. So with >this patch we specify the number of workers during initializing the >parallel context as a maximum number of workers. And using >ReinitializeParallelWorkers before doing either bulkdelete or cleanup >we specify the number of workers to launch. > Hmmm. I find it a bit confusing, but I don't know a better solution. >> >> >> v40-0003-Add-FAST-option-to-vacuum-command.patch >> ------------------------------------------------ >> >> I do have a bit of an issue with this part - I'm not quite convinved we >> actually need a FAST option, and I actually suspect we'll come to regret >> it sooner than later. AFAIK it pretty much does exactly the same thing >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide >> multiple ways to do the same thing - I do expect reports from confused >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a >> sufficient solution? > >I think the motivation of this option is similar to FREEZE. I think >it's sometimes a good idea to have a shortcut of popular usage and >make it have an name corresponding to its job. From that perspective I >think having FAST option would make sense but maybe we need more >discussion the combination parallel vacuum and vacuum delay. > OK. I think it's mostly independent piece, so maybe we should move it to a separate patch. It's more likely to get attention/feedback when not buried in this thread. >> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do >> we need a separate VACUUM option, instead of just using the existing >> max_parallel_maintenance_workers GUC? It's good enough for CREATE INDEX >> so why not here? 
> >AFAIR There was no such discussion so far but I think one reason could >be that parallel vacuum should be disabled by default. If the parallel >vacuum uses max_parallel_maintenance_workers (2 by default) rather >than having the option the parallel vacuum would work with default >setting but I think that it would become a big impact for user because >the disk access could become random reads and writes when some indexes >are on the same tablespace. > I'm not quite convinced VACUUM should have parallelism disabled by default. I know some people argued we should do that because making vacuum faster may put pressure on other parts of the system. Which is true, but I don't think the solution is to make vacuum slower by default. IMHO we should do the opposite - have it parallel by default (as driven by max_parallel_maintenance_workers), and have an option to disable parallelism. It's pretty much the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you're worried about VACUUM causing issues, you should cost delay. The way it's done now we pretty much don't handle either case without having to tweak something: - If you really want to go as fast as possible (e.g. during maintenance window) you have to say "PARALLEL". - If you need to restrict VACUUM activity, you have to et cost_delay because just not using parallelism seems unreliable. Of course, the question is what to do about autovacuum - I agree it may make sense to have parallelism disabled in this case (just like we already have throttling enabled by default for autovacuum). >> >> Maybe it's explained somewhere deep in the thread, of course ... >> >> >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch >> --------------------------------------------------------------- >> >> IMHO this should be simply merged into 0002. > >We discussed it's still unclear whether we really want to commit this >code and therefore it's separated from the main part. Please see more >details here[2]. > IMO there's not much reason for the leader not to participate. For regular queries the leader may be doing useful stuff (essentially running the non-parallel part of the query) but AFAIK for VAUCUM that's not the case and the worker is not doing anything. >I've fixed code based on the review comments and rebased to the >current HEAD. Some comments around vacuum option name and FAST option >are still left as we would need more discussion. > Thanks, I'll take a look. regards >-- >Masahiko Sawada http://www.2ndQuadrant.com/ >PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
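To illustrate the caller simplification being suggested here, the count-taking variant would look roughly like the sketch below. It is based on the struct as quoted above; the helper name is made up for illustration, and MAXALIGN is the usual alignment macro.

    /* size of an LVDeadTuples holding cnt item pointers (itemptrs is the
     * flexible array member quoted earlier in the thread) */
    #define SizeOfLVDeadTuples(cnt) \
        (offsetof(LVDeadTuples, itemptrs) + (cnt) * sizeof(ItemPointerData))

    /* a caller can then size the shared area in one step ... */
    static Size
    dead_tuples_space(long maxtuples)
    {
        return MAXALIGN(SizeOfLVDeadTuples(maxtuples));
    }
    /* ... instead of repeating offsetof() + maxtuples * sizeof(ItemPointerData). */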
On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote: > >> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch > >> ----------------------------------------------------------- > >> > >> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe > >> it should be called just 'amvacuumoptions' or something like that? The > >> 'parallel' part is actually encoded in names of the options. > >> > > > >amvacuumoptions seems good to me. > > > >> Also, why do we need a separate amusemaintenanceworkmem option? Why > >> don't we simply track it using a separate flag in 'amvacuumoptions' > >> (or whatever we end up calling it)? > >> > > > >It also seems like a good idea. > > > > I think there's another question we need to ask - why to we introduce a > bitmask, instead of using regular boolean struct members? Until now, the > IndexAmRoutine struct had simple boolean members describing capabilities > of the AM implementation. Why shouldn't this patch do the same thing, > i.e. add one boolean flag for each AM feature? > This structure member describes mostly one property of index which is about a parallel vacuum which I am not sure is true for other members. Now, we can use separate bool variables for it which we were initially using in the patch but that seems to be taking more space in a structure without any advantage. Also, using one variable makes a code bit better because otherwise, in many places we need to check and set four variables instead of one. This is also the reason we used parallel in its name (we also use *parallel* for parallel index scan related things). Having said that, we can remove parallel from its name if we want to extend/use it for something other than a parallel vacuum. I think we might need to add a flag or two for parallelizing heap scan of vacuum when we enhance this feature, so keeping it for just a parallel vacuum is not completely insane. I think keeping amusemaintenanceworkmem separate from this variable seems to me like a better idea as it doesn't describe whether IndexAM can participate in a parallel vacuum or not. You can see more discussion about that variable in the thread [1]. > >> > >> > >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch > >> --------------------------------------------------------------- > >> > >> IMHO this should be simply merged into 0002. > > > >We discussed it's still unclear whether we really want to commit this > >code and therefore it's separated from the main part. Please see more > >details here[2]. > > > > IMO there's not much reason for the leader not to participate. > The only reason for this is just a debugging/testing aid because during the development of other parallel features we required such a knob. The other way could be to have something similar to force_parallel_mode and there is some discussion about that as well on this thread but we haven't concluded which is better. So, we decided to keep it as a separate patch which we can use to test this feature during development and decide later whether we really need to commit it. BTW, we have found few bugs by using this knob in the patch. [1] - https://www.postgresql.org/message-id/CAA4eK1LmcD5aPogzwim5Nn58Ki+74a6Edghx4Wd8hAskvHaq5A@mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
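For readers skimming the thread, the single-variable approach defended above looks roughly like the sketch below. The flag names and values are only illustrative (whatever the patch finally settles on), but they show why one bitmask keeps the checks compact.

    /* illustrative bits for amparallelvacuumoptions (names are not final) */
    #define VACUUM_OPTION_NO_PARALLEL           0
    #define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 0)
    #define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 1)
    #define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 2)

    /* an AM declares its whole parallel-vacuum behaviour in one assignment,
     * e.g. parallel bulkdelete plus cleanup only when bulkdelete didn't run */
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;

    /* and the vacuum code needs a single test per question it asks */
    if ((amroutine->amparallelvacuumoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
        nindexes_parallel_bulkdel++;

    if (amroutine->amparallelvacuumoptions == VACUUM_OPTION_NO_PARALLEL)
        nindexes_nonparallel++;     /* this index is processed by the leader */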
On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote: > >> > >> v40-0003-Add-FAST-option-to-vacuum-command.patch > >> ------------------------------------------------ > >> > >> I do have a bit of an issue with this part - I'm not quite convinved we > >> actually need a FAST option, and I actually suspect we'll come to regret > >> it sooner than later. AFAIK it pretty much does exactly the same thing > >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide > >> multiple ways to do the same thing - I do expect reports from confused > >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a > >> sufficient solution? > > > >I think the motivation of this option is similar to FREEZE. I think > >it's sometimes a good idea to have a shortcut of popular usage and > >make it have an name corresponding to its job. From that perspective I > >think having FAST option would make sense but maybe we need more > >discussion the combination parallel vacuum and vacuum delay. > > > > OK. I think it's mostly independent piece, so maybe we should move it to > a separate patch. It's more likely to get attention/feedback when not > buried in this thread. > +1. It is already a separate patch and I think we can even discuss more on it in a new thread once the main patch is committed or do you think we should have a conclusion about it now itself? To me, this option appears to be an extension to the main feature which can be useful for some users and people might like to have a separate option, so we can discuss it and get broader feedback after the main patch is committed. > >> > >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do > >> we need a separate VACUUM option, instead of just using the existing > >> max_parallel_maintenance_workers GUC? > >> How will user specify parallel degree? The parallel degree is helpful because in some cases users can decide how many workers should be launched based on size and type of indexes. > >> It's good enough for CREATE INDEX > >> so why not here? > > That is a different feature and I think here users can make a better judgment based on the size of indexes. Moreover, users have an option to control a parallel degree for 'Create Index' via Alter Table <tbl_name> Set (parallel_workers = <n>) which I am not sure is a good idea for parallel vacuum as the parallelism is more derived from size and type of indexes. Now, we can think of a similar parameter at the table/index level for parallel vacuum, but I don't see it equally useful in this case. > >AFAIR There was no such discussion so far but I think one reason could > >be that parallel vacuum should be disabled by default. If the parallel > >vacuum uses max_parallel_maintenance_workers (2 by default) rather > >than having the option the parallel vacuum would work with default > >setting but I think that it would become a big impact for user because > >the disk access could become random reads and writes when some indexes > >are on the same tablespace. > > > > I'm not quite convinced VACUUM should have parallelism disabled by > default. I know some people argued we should do that because making > vacuum faster may put pressure on other parts of the system. Which is > true, but I don't think the solution is to make vacuum slower by > default. 
IMHO we should do the opposite - have it parallel by default > (as driven by max_parallel_maintenance_workers), and have an option > to disable parallelism. > I think driving parallelism for vacuum by max_parallel_maintenance_workers might not be sufficient. We need to give finer control as it depends a lot on the size of indexes. Also, unlike Create Index, Vacuum can be performed on an entire database and it is quite possible that some tables/indexes are relatively smaller and forcing parallelism on them by default might slow down the operation. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote: >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote: >> >> >> >> v40-0003-Add-FAST-option-to-vacuum-command.patch >> >> ------------------------------------------------ >> >> >> >> I do have a bit of an issue with this part - I'm not quite convinved we >> >> actually need a FAST option, and I actually suspect we'll come to regret >> >> it sooner than later. AFAIK it pretty much does exactly the same thing >> >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide >> >> multiple ways to do the same thing - I do expect reports from confused >> >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a >> >> sufficient solution? >> > >> >I think the motivation of this option is similar to FREEZE. I think >> >it's sometimes a good idea to have a shortcut of popular usage and >> >make it have an name corresponding to its job. From that perspective I >> >think having FAST option would make sense but maybe we need more >> >discussion the combination parallel vacuum and vacuum delay. >> > >> >> OK. I think it's mostly independent piece, so maybe we should move it to >> a separate patch. It's more likely to get attention/feedback when not >> buried in this thread. >> > >+1. It is already a separate patch and I think we can even discuss >more on it in a new thread once the main patch is committed or do you >think we should have a conclusion about it now itself? To me, this >option appears to be an extension to the main feature which can be >useful for some users and people might like to have a separate option, >so we can discuss it and get broader feedback after the main patch is >committed. > I don't think it's an extension of the main feature - it does not depend on it, it could be committed before or after the parallel vacuum (with some conflicts, but the feature itself is not affected). My point was that by moving it into a separate thread we're more likely to get feedback on it, e.g. from people who don't feel like reviewing the parallel vacuum feature and/or feel intimidated by t100+ messages in this thread. >> >> >> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do >> >> we need a separate VACUUM option, instead of just using the existing >> >> max_parallel_maintenance_workers GUC? >> >> > >How will user specify parallel degree? The parallel degree is helpful >because in some cases users can decide how many workers should be >launched based on size and type of indexes. > By setting max_maintenance_parallel_workers. >> >> It's good enough for CREATE INDEX >> >> so why not here? >> > > >That is a different feature and I think here users can make a better >judgment based on the size of indexes. Moreover, users have an option >to control a parallel degree for 'Create Index' via Alter Table ><tbl_name> Set (parallel_workers = <n>) which I am not sure is a good >idea for parallel vacuum as the parallelism is more derived from size >and type of indexes. Now, we can think of a similar parameter at the >table/index level for parallel vacuum, but I don't see it equally >useful in this case. > I'm a bit skeptical about users being able to pick good parallel degree. If we (i.e. experienced developers/hackers with quite a bit of knowledge) can't come up with a reasonable heuristics, how likely is it that a regular user will come up with something better? 
Not sure I understand why "parallel_workers" would not be suitable for parallel vacuum? I mean, even for CREATE INDEX it certainly matters the size/type of indexes, no? I may be wrong in both cases, of course. >> >AFAIR There was no such discussion so far but I think one reason could >> >be that parallel vacuum should be disabled by default. If the parallel >> >vacuum uses max_parallel_maintenance_workers (2 by default) rather >> >than having the option the parallel vacuum would work with default >> >setting but I think that it would become a big impact for user because >> >the disk access could become random reads and writes when some indexes >> >are on the same tablespace. >> > >> >> I'm not quite convinced VACUUM should have parallelism disabled by >> default. I know some people argued we should do that because making >> vacuum faster may put pressure on other parts of the system. Which is >> true, but I don't think the solution is to make vacuum slower by >> default. IMHO we should do the opposite - have it parallel by default >> (as driven by max_parallel_maintenance_workers), and have an option >> to disable parallelism. >> > >I think driving parallelism for vacuum by >max_parallel_maintenance_workers might not be sufficient. We need to >give finer control as it depends a lot on the size of indexes. Also, >unlike Create Index, Vacuum can be performed on an entire database and >it is quite possible that some tables/indexes are relatively smaller >and forcing parallelism on them by default might slow down the >operation. > Why wouldn't it be sufficient? Why couldn't this use similar logic to what we have in plan_create_index_workers for CREATE INDEX? Sure, it may be useful to give power users a way to override the default logic, but I very much doubt users can make reliable judgments about parallelism. Also, it's not like the risks are comparable in those two cases. If you have very large table with a lot of indexes, the gains with parallel vacuum are pretty much bound to be significant, possibly 10x or more. OTOH if the table is small, parallelism may not give you much and it may even be less efficient, but I doubt it's going to be 10x slower. And considering min_parallel_index_scan_size already protects us against this, at least partially. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
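Just to make the comparison with CREATE INDEX concrete, a plan_create_index_workers-style default for vacuum could look something like the sketch below: count the indexes that are large enough to bother with (reusing min_parallel_index_scan_size, as mentioned above) and cap that by max_parallel_maintenance_workers. This only illustrates the heuristic under discussion; it is not code from the patch.

    /* sketch of a size-driven default parallel degree for VACUUM */
    static int
    default_parallel_vacuum_workers(Relation *Irel, int nindexes)
    {
        int         nlarge = 0;

        for (int i = 0; i < nindexes; i++)
        {
            /* ignore indexes smaller than min_parallel_index_scan_size */
            if (RelationGetNumberOfBlocks(Irel[i]) <
                (BlockNumber) min_parallel_index_scan_size)
                continue;
            nlarge++;
        }

        /* one worker per "large" index, bounded by the maintenance cap */
        return Min(nlarge, max_parallel_maintenance_workers);
    }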
On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote: >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote: >> >> v40-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch >> >> ----------------------------------------------------------- >> >> >> >> I wonder if 'amparallelvacuumoptions' is unnecessarily specific. Maybe >> >> it should be called just 'amvacuumoptions' or something like that? The >> >> 'parallel' part is actually encoded in names of the options. >> >> >> > >> >amvacuumoptions seems good to me. >> > >> >> Also, why do we need a separate amusemaintenanceworkmem option? Why >> >> don't we simply track it using a separate flag in 'amvacuumoptions' >> >> (or whatever we end up calling it)? >> >> >> > >> >It also seems like a good idea. >> > >> >> I think there's another question we need to ask - why to we introduce a >> bitmask, instead of using regular boolean struct members? Until now, the >> IndexAmRoutine struct had simple boolean members describing capabilities >> of the AM implementation. Why shouldn't this patch do the same thing, >> i.e. add one boolean flag for each AM feature? >> > >This structure member describes mostly one property of index which is >about a parallel vacuum which I am not sure is true for other members. >Now, we can use separate bool variables for it which we were initially >using in the patch but that seems to be taking more space in a >structure without any advantage. Also, using one variable makes a >code bit better because otherwise, in many places we need to check and >set four variables instead of one. This is also the reason we used >parallel in its name (we also use *parallel* for parallel index scan >related things). Having said that, we can remove parallel from its >name if we want to extend/use it for something other than a parallel >vacuum. I think we might need to add a flag or two for parallelizing >heap scan of vacuum when we enhance this feature, so keeping it for >just a parallel vacuum is not completely insane. > >I think keeping amusemaintenanceworkmem separate from this variable >seems to me like a better idea as it doesn't describe whether IndexAM >can participate in a parallel vacuum or not. You can see more >discussion about that variable in the thread [1]. > I don't know, but IMHO it's somewhat easier to work with separate flags. Bitmasks make sense when space usage matters a lot, e.g. for on-disk representation, but that doesn't seem to be the case here I think (if it was, we'd probably use bitmasks already). It seems like we're mixing two ways to design the struct unnecessarily, but I'm not going to nag about this any further. >> >> >> >> >> >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch >> >> --------------------------------------------------------------- >> >> >> >> IMHO this should be simply merged into 0002. >> > >> >We discussed it's still unclear whether we really want to commit this >> >code and therefore it's separated from the main part. Please see more >> >details here[2]. >> > >> >> IMO there's not much reason for the leader not to participate. >> > >The only reason for this is just a debugging/testing aid because >during the development of other parallel features we required such a >knob. The other way could be to have something similar to >force_parallel_mode and there is some discussion about that as well on >this thread but we haven't concluded which is better. 
So, we decided >to keep it as a separate patch which we can use to test this feature >during development and decide later whether we really need to commit >it. BTW, we have found few bugs by using this knob in the patch. > OK, understood. Then why not just use force_parallel_mode? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote: > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> I think there's another question we need to ask - why to we introduce a > >> bitmask, instead of using regular boolean struct members? Until now, the > >> IndexAmRoutine struct had simple boolean members describing capabilities > >> of the AM implementation. Why shouldn't this patch do the same thing, > >> i.e. add one boolean flag for each AM feature? > >> > > > >This structure member describes mostly one property of index which is > >about a parallel vacuum which I am not sure is true for other members. > >Now, we can use separate bool variables for it which we were initially > >using in the patch but that seems to be taking more space in a > >structure without any advantage. Also, using one variable makes a > >code bit better because otherwise, in many places we need to check and > >set four variables instead of one. This is also the reason we used > >parallel in its name (we also use *parallel* for parallel index scan > >related things). Having said that, we can remove parallel from its > >name if we want to extend/use it for something other than a parallel > >vacuum. I think we might need to add a flag or two for parallelizing > >heap scan of vacuum when we enhance this feature, so keeping it for > >just a parallel vacuum is not completely insane. > > > >I think keeping amusemaintenanceworkmem separate from this variable > >seems to me like a better idea as it doesn't describe whether IndexAM > >can participate in a parallel vacuum or not. You can see more > >discussion about that variable in the thread [1]. > > > > I don't know, but IMHO it's somewhat easier to work with separate flags. > Bitmasks make sense when space usage matters a lot, e.g. for on-disk > representation, but that doesn't seem to be the case here I think (if it > was, we'd probably use bitmasks already). > > It seems like we're mixing two ways to design the struct unnecessarily, > but I'm not going to nag about this any further. > Fair enough. I see your point and as mentioned earlier that we started with the approach of separate booleans, but later found that this is a better way as it was easier to set and check the different parallel options for a parallel vacuum. I think we can go back to the individual booleans if we want but I am not sure if that is a better approach for this usage. Sawada-San, others, do you have any opinion here? > >> >> > >> >> > >> >> v40-0004-Add-ability-to-disable-leader-participation-in-p.patch > >> >> --------------------------------------------------------------- > >> >> > >> >> IMHO this should be simply merged into 0002. > >> > > >> >We discussed it's still unclear whether we really want to commit this > >> >code and therefore it's separated from the main part. Please see more > >> >details here[2]. > >> > > >> > >> IMO there's not much reason for the leader not to participate. > >> > > > >The only reason for this is just a debugging/testing aid because > >during the development of other parallel features we required such a > >knob. The other way could be to have something similar to > >force_parallel_mode and there is some discussion about that as well on > >this thread but we haven't concluded which is better. 
So, we decided > >to keep it as a separate patch which we can use to test this feature > >during development and decide later whether we really need to commit > >it. BTW, we have found few bugs by using this knob in the patch. > > > > OK, understood. Then why not just use force_parallel_mode? > Because we are not sure what should be its behavior under different modes especially what should we do when user set force_parallel_mode = on. We can even consider introducing new guc specific for this, but as of now, I am not convinced that is required. See some more discussion around this parameter in emails [1][2]. I think we can decide on this later (probably once the main patch is committed) as we already have one way to test the patch. [1] - https://www.postgresql.org/message-id/CAFiTN-sUuLASVXm2qOjufVH3tBZHPLdujMJ0RHr47Tnctjk9YA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CA%2Bfd4k6VgA_DG%3D8%3Dui7UvHhqx9VbQ-%2B72X%3D_GdTzh%3DJ_xN%2BVEg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 30, 2019 at 6:37 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote: > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > > > >+1. It is already a separate patch and I think we can even discuss > >more on it in a new thread once the main patch is committed or do you > >think we should have a conclusion about it now itself? To me, this > >option appears to be an extension to the main feature which can be > >useful for some users and people might like to have a separate option, > >so we can discuss it and get broader feedback after the main patch is > >committed. > > > > I don't think it's an extension of the main feature - it does not depend > on it, it could be committed before or after the parallel vacuum (with > some conflicts, but the feature itself is not affected). > > My point was that by moving it into a separate thread we're more likely > to get feedback on it, e.g. from people who don't feel like reviewing > the parallel vacuum feature and/or feel intimidated by t100+ messages in > this thread. > I agree with this point. > >> >> > >> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do > >> >> we need a separate VACUUM option, instead of just using the existing > >> >> max_parallel_maintenance_workers GUC? > >> >> > > > >How will user specify parallel degree? The parallel degree is helpful > >because in some cases users can decide how many workers should be > >launched based on size and type of indexes. > > > > By setting max_maintenance_parallel_workers. > > >> >> It's good enough for CREATE INDEX > >> >> so why not here? > >> > > > > >That is a different feature and I think here users can make a better > >judgment based on the size of indexes. Moreover, users have an option > >to control a parallel degree for 'Create Index' via Alter Table > ><tbl_name> Set (parallel_workers = <n>) which I am not sure is a good > >idea for parallel vacuum as the parallelism is more derived from size > >and type of indexes. Now, we can think of a similar parameter at the > >table/index level for parallel vacuum, but I don't see it equally > >useful in this case. > > > > I'm a bit skeptical about users being able to pick good parallel degree. > If we (i.e. experienced developers/hackers with quite a bit of > knowledge) can't come up with a reasonable heuristics, how likely is it > that a regular user will come up with something better? > In this case, it is highly dependent on the number of indexes (as for each index, we can spawn one worker). So, it is a bit easier for the users to specify it. Now, we can internally also identify the same and we do that in case the user doesn't specify it, however, that can easily lead to more resource (CPU, I/O) usage than the user would like to do for a particular vacuum. So, giving an option to the user sounds quite reasonable to me. Anyway, in case user doesn't specify the parallel_degree, we are going to select one internally. > Not sure I understand why "parallel_workers" would not be suitable for > parallel vacuum? I mean, even for CREATE INDEX it certainly matters the > size/type of indexes, no? > The difference here is that in parallel vacuum each worker can scan a separate index whereas parallel_workers is more of an option for scanning heap in parallel. 
So, if the size of the heap is bigger, then increasing that value helps whereas here if there are more number of indexes on the table, increasing corresponding value for parallel vacuum can help. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
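To spell out how the degree follows the indexes rather than the heap, the per-phase worker count could be computed roughly as below. This is a sketch of the idea only, not the patch's exact code; nrequested stands for the user-supplied degree, 0 when none was given.

    /* sketch: choose the number of workers for one phase (bulkdelete or cleanup) */
    static int
    parallel_vacuum_workers_for_phase(int nindexes_in_phase, int nrequested)
    {
        /* never more than one worker per index participating in this phase */
        int         nworkers = nindexes_in_phase;

        /* honour an explicit degree from VACUUM (PARALLEL n), if given */
        if (nrequested > 0)
            nworkers = Min(nworkers, nrequested);

        /* and stay within the maintenance-worker cap */
        return Min(nworkers, max_parallel_maintenance_workers);
    }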
On Tue, 31 Dec 2019 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote: > > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra > > ><tomas.vondra@2ndquadrant.com> wrote: > > >> I think there's another question we need to ask - why to we introduce a > > >> bitmask, instead of using regular boolean struct members? Until now, the > > >> IndexAmRoutine struct had simple boolean members describing capabilities > > >> of the AM implementation. Why shouldn't this patch do the same thing, > > >> i.e. add one boolean flag for each AM feature? > > >> > > > > > >This structure member describes mostly one property of index which is > > >about a parallel vacuum which I am not sure is true for other members. > > >Now, we can use separate bool variables for it which we were initially > > >using in the patch but that seems to be taking more space in a > > >structure without any advantage. Also, using one variable makes a > > >code bit better because otherwise, in many places we need to check and > > >set four variables instead of one. This is also the reason we used > > >parallel in its name (we also use *parallel* for parallel index scan > > >related things). Having said that, we can remove parallel from its > > >name if we want to extend/use it for something other than a parallel > > >vacuum. I think we might need to add a flag or two for parallelizing > > >heap scan of vacuum when we enhance this feature, so keeping it for > > >just a parallel vacuum is not completely insane. > > > > > >I think keeping amusemaintenanceworkmem separate from this variable > > >seems to me like a better idea as it doesn't describe whether IndexAM > > >can participate in a parallel vacuum or not. You can see more > > >discussion about that variable in the thread [1]. > > > > > > > I don't know, but IMHO it's somewhat easier to work with separate flags. > > Bitmasks make sense when space usage matters a lot, e.g. for on-disk > > representation, but that doesn't seem to be the case here I think (if it > > was, we'd probably use bitmasks already). > > > > It seems like we're mixing two ways to design the struct unnecessarily, > > but I'm not going to nag about this any further. > > > > Fair enough. I see your point and as mentioned earlier that we > started with the approach of separate booleans, but later found that > this is a better way as it was easier to set and check the different > parallel options for a parallel vacuum. I think we can go back to > the individual booleans if we want but I am not sure if that is a > better approach for this usage. Sawada-San, others, do you have any > opinion here? If we go back to the individual booleans we would end up with having three booleans: bulkdelete, cleanup and conditional cleanup. I think making the bulkdelete option to a boolean makes sense but having two booleans for cleanup and conditional cleanup might be slightly odd because these options are exclusive. If we don't stick to have only booleans the having a ternary value for cleanup might be understandable but I'm not sure it's better to have it for only vacuum purpose. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
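For what it's worth, the ternary alternative mentioned above could be spelled as something like the following; the struct and enum names here are only illustrative, not proposed names.

    /* sketch: one boolean for bulkdelete plus a three-valued cleanup setting */
    typedef enum AmParallelCleanup
    {
        AM_PARALLEL_CLEANUP_NONE,           /* never do cleanup in a worker */
        AM_PARALLEL_CLEANUP_CONDITIONAL,    /* only if bulkdelete has not run */
        AM_PARALLEL_CLEANUP_ALWAYS
    } AmParallelCleanup;

    typedef struct AmParallelVacuumCaps
    {
        bool                amparallelbulkdelete;
        AmParallelCleanup   amparallelcleanup;
    } AmParallelVacuumCaps;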
On Thu, Jan 2, 2020 at 8:29 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 31 Dec 2019 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote: > > > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra > > > ><tomas.vondra@2ndquadrant.com> wrote: > > > >> I think there's another question we need to ask - why to we introduce a > > > >> bitmask, instead of using regular boolean struct members? Until now, the > > > >> IndexAmRoutine struct had simple boolean members describing capabilities > > > >> of the AM implementation. Why shouldn't this patch do the same thing, > > > >> i.e. add one boolean flag for each AM feature? > > > >> > > > > > > > >This structure member describes mostly one property of index which is > > > >about a parallel vacuum which I am not sure is true for other members. > > > >Now, we can use separate bool variables for it which we were initially > > > >using in the patch but that seems to be taking more space in a > > > >structure without any advantage. Also, using one variable makes a > > > >code bit better because otherwise, in many places we need to check and > > > >set four variables instead of one. This is also the reason we used > > > >parallel in its name (we also use *parallel* for parallel index scan > > > >related things). Having said that, we can remove parallel from its > > > >name if we want to extend/use it for something other than a parallel > > > >vacuum. I think we might need to add a flag or two for parallelizing > > > >heap scan of vacuum when we enhance this feature, so keeping it for > > > >just a parallel vacuum is not completely insane. > > > > > > > >I think keeping amusemaintenanceworkmem separate from this variable > > > >seems to me like a better idea as it doesn't describe whether IndexAM > > > >can participate in a parallel vacuum or not. You can see more > > > >discussion about that variable in the thread [1]. > > > > > > > > > > I don't know, but IMHO it's somewhat easier to work with separate flags. > > > Bitmasks make sense when space usage matters a lot, e.g. for on-disk > > > representation, but that doesn't seem to be the case here I think (if it > > > was, we'd probably use bitmasks already). > > > > > > It seems like we're mixing two ways to design the struct unnecessarily, > > > but I'm not going to nag about this any further. > > > > > > > Fair enough. I see your point and as mentioned earlier that we > > started with the approach of separate booleans, but later found that > > this is a better way as it was easier to set and check the different > > parallel options for a parallel vacuum. I think we can go back to > > the individual booleans if we want but I am not sure if that is a > > better approach for this usage. Sawada-San, others, do you have any > > opinion here? > > If we go back to the individual booleans we would end up with having > three booleans: bulkdelete, cleanup and conditional cleanup. I think > making the bulkdelete option to a boolean makes sense but having two > booleans for cleanup and conditional cleanup might be slightly odd > because these options are exclusive. > If we have only three booleans, then we need to check for all three to conclude that a parallel vacuum is not enabled for any index. Alternatively, we can have a fourth boolean to indicate that a parallel vacuum is not enabled. 
And in the future, when we allow supporting multiple workers for an index, we might need another variable unless we can allow it for all types of indexes. This was my point that having multiple variables for the purpose of a parallel vacuum (for indexes) doesn't sound like a better approach than having a single uint8 variable. > If we don't stick to have only > booleans the having a ternary value for cleanup might be > understandable but I'm not sure it's better to have it for only vacuum > purpose. > If we want to keep the possibility of extending it for other purposes, then we can probably rename it to amoptions or something like that. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 31, 2019 at 9:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote: > > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra > > ><tomas.vondra@2ndquadrant.com> wrote: > > >> I think there's another question we need to ask - why to we introduce a > > >> bitmask, instead of using regular boolean struct members? Until now, the > > >> IndexAmRoutine struct had simple boolean members describing capabilities > > >> of the AM implementation. Why shouldn't this patch do the same thing, > > >> i.e. add one boolean flag for each AM feature? > > >> > > > > > >This structure member describes mostly one property of index which is > > >about a parallel vacuum which I am not sure is true for other members. > > >Now, we can use separate bool variables for it which we were initially > > >using in the patch but that seems to be taking more space in a > > >structure without any advantage. Also, using one variable makes a > > >code bit better because otherwise, in many places we need to check and > > >set four variables instead of one. This is also the reason we used > > >parallel in its name (we also use *parallel* for parallel index scan > > >related things). Having said that, we can remove parallel from its > > >name if we want to extend/use it for something other than a parallel > > >vacuum. I think we might need to add a flag or two for parallelizing > > >heap scan of vacuum when we enhance this feature, so keeping it for > > >just a parallel vacuum is not completely insane. > > > > > >I think keeping amusemaintenanceworkmem separate from this variable > > >seems to me like a better idea as it doesn't describe whether IndexAM > > >can participate in a parallel vacuum or not. You can see more > > >discussion about that variable in the thread [1]. > > > > > > > I don't know, but IMHO it's somewhat easier to work with separate flags. > > Bitmasks make sense when space usage matters a lot, e.g. for on-disk > > representation, but that doesn't seem to be the case here I think (if it > > was, we'd probably use bitmasks already). > > > > It seems like we're mixing two ways to design the struct unnecessarily, > > but I'm not going to nag about this any further. > > > > Fair enough. I see your point and as mentioned earlier that we > started with the approach of separate booleans, but later found that > this is a better way as it was easier to set and check the different > parallel options for a parallel vacuum. I think we can go back to > the individual booleans if we want but I am not sure if that is a > better approach for this usage. Sawada-San, others, do you have any > opinion here? IMHO, having multiple bools will be confusing compared to what we have now because these are all related to enabling parallelism for different phases of the vacuum. So it makes more sense to keep it as a single variable with multiple options. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 2, 2020 at 9:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 2, 2020 at 8:29 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 31 Dec 2019 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Dec 30, 2019 at 6:46 PM Tomas Vondra > > > <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > On Mon, Dec 30, 2019 at 08:25:28AM +0530, Amit Kapila wrote: > > > > >On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra > > > > ><tomas.vondra@2ndquadrant.com> wrote: > > > > >> I think there's another question we need to ask - why to we introduce a > > > > >> bitmask, instead of using regular boolean struct members? Until now, the > > > > >> IndexAmRoutine struct had simple boolean members describing capabilities > > > > >> of the AM implementation. Why shouldn't this patch do the same thing, > > > > >> i.e. add one boolean flag for each AM feature? > > > > >> > > > > > > > > > >This structure member describes mostly one property of index which is > > > > >about a parallel vacuum which I am not sure is true for other members. > > > > >Now, we can use separate bool variables for it which we were initially > > > > >using in the patch but that seems to be taking more space in a > > > > >structure without any advantage. Also, using one variable makes a > > > > >code bit better because otherwise, in many places we need to check and > > > > >set four variables instead of one. This is also the reason we used > > > > >parallel in its name (we also use *parallel* for parallel index scan > > > > >related things). Having said that, we can remove parallel from its > > > > >name if we want to extend/use it for something other than a parallel > > > > >vacuum. I think we might need to add a flag or two for parallelizing > > > > >heap scan of vacuum when we enhance this feature, so keeping it for > > > > >just a parallel vacuum is not completely insane. > > > > > > > > > >I think keeping amusemaintenanceworkmem separate from this variable > > > > >seems to me like a better idea as it doesn't describe whether IndexAM > > > > >can participate in a parallel vacuum or not. You can see more > > > > >discussion about that variable in the thread [1]. > > > > > > > > > > > > > I don't know, but IMHO it's somewhat easier to work with separate flags. > > > > Bitmasks make sense when space usage matters a lot, e.g. for on-disk > > > > representation, but that doesn't seem to be the case here I think (if it > > > > was, we'd probably use bitmasks already). > > > > > > > > It seems like we're mixing two ways to design the struct unnecessarily, > > > > but I'm not going to nag about this any further. > > > > > > > > > > Fair enough. I see your point and as mentioned earlier that we > > > started with the approach of separate booleans, but later found that > > > this is a better way as it was easier to set and check the different > > > parallel options for a parallel vacuum. I think we can go back to > > > the individual booleans if we want but I am not sure if that is a > > > better approach for this usage. Sawada-San, others, do you have any > > > opinion here? > > > > If we go back to the individual booleans we would end up with having > > three booleans: bulkdelete, cleanup and conditional cleanup. I think > > making the bulkdelete option to a boolean makes sense but having two > > booleans for cleanup and conditional cleanup might be slightly odd > > because these options are exclusive. 
> > > > If we have only three booleans, then we need to check for all three to > conclude that a parallel vacuum is not enabled for any index. > Alternatively, we can have a fourth boolean to indicate that a > parallel vacuum is not enabled. And in the future, when we allow > supporting multiple workers for an index, we might need another > variable unless we can allow it for all types of indexes. This was my > point that having multiple variables for the purpose of a parallel > vacuum (for indexes) doesn't sound like a better approach than having > a single uint8 variable. > > > If we don't stick to have only > > booleans the having a ternary value for cleanup might be > > understandable but I'm not sure it's better to have it for only vacuum > > purpose. > > > > If we want to keep the possibility of extending it for other purposes, > then we can probably rename it to amoptions or something like that. > What do you think? I think it makes more sense to just keep it for the purpose of enabling/disabling parallelism in different phases. I am not sure that adding more options (which are not related to enabling parallelism in vacuum phases) to the same variable will make sense. So I think the current name is good for its purpose. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sun, Dec 29, 2019 at 4:23 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > IMO there's not much reason for the leader not to participate. For > regular queries the leader may be doing useful stuff (essentially > running the non-parallel part of the query) but AFAIK for VAUCUM that's > not the case and the worker is not doing anything. I agree, and said the same thing in http://postgr.es/m/CA+Tgmob7JLrngeHz6i60_TqdvE1YBcvGYVoEQ6_xvP=vN7DwGg@mail.gmail.com I really don't know why we have that code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 3, 2020 at 10:15 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Sun, Dec 29, 2019 at 4:23 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > IMO there's not much reason for the leader not to participate. For > > regular queries the leader may be doing useful stuff (essentially > > running the non-parallel part of the query) but AFAIK for VAUCUM that's > > not the case and the worker is not doing anything. > > I agree, and said the same thing in > http://postgr.es/m/CA+Tgmob7JLrngeHz6i60_TqdvE1YBcvGYVoEQ6_xvP=vN7DwGg@mail.gmail.com > > I really don't know why we have that code. > We have removed that code from the main patch. It is in a separate patch and used mainly for development testing where we want to debug/test the worker code. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, 4 Jan 2020 at 07:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 3, 2020 at 10:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Sun, Dec 29, 2019 at 4:23 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > > IMO there's not much reason for the leader not to participate. For
> > > regular queries the leader may be doing useful stuff (essentially
> > > running the non-parallel part of the query) but AFAIK for VAUCUM that's
> > > not the case and the worker is not doing anything.
> >
> > I agree, and said the same thing in
> > http://postgr.es/m/CA+Tgmob7JLrngeHz6i60_TqdvE1YBcvGYVoEQ6_xvP=vN7DwGg@mail.gmail.com
> >
> > I really don't know why we have that code.
> >
>
> We have removed that code from the main patch. It is in a separate
> patch and used mainly for development testing where we want to
> debug/test the worker code.
>
Hi All,
In the other thread, "parallel vacuum options/syntax" [1], Amit Kapila asked for opinions about the syntax for making a normal vacuum run in parallel. From that thread, I can see that people are in favor of implementing option (b). So I have implemented option (b) as a delta patch on top of the v41 patch set.
How vacuum will work?
If user gave "vacuum" or "vacuum table_name", then based on the number of parallel supported indexes, we will launch workers.
Ex: vacuum table_name;
or vacuum (parallel) table_name; //both are same.
If user has requested parallel degree (1-1024), then we will launch workers based on requested degree and parallel supported indexes.
Ex: vacuum (parallel 8) table_name;
If user don't want parallel vacuum, then he should set parallel degree as zero.
Ex: vacuum (parallel 0) table_name;
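As a rough sketch, the worker count under this scheme could be derived as below; the function and parameter names are invented for illustration and the real computation in the patch may differ.

/*
 * Sketch only: derive the number of parallel workers to launch.
 *   requested_degree < 0  -> no explicit degree, decide from the indexes
 *   requested_degree == 0 -> parallel vacuum disabled
 *   requested_degree > 0  -> user-specified degree
 */
static int
choose_parallel_workers(int requested_degree,
                        int nindexes_parallel,
                        int max_parallel_maintenance_workers)
{
    int     nworkers;

    if (requested_degree == 0 || nindexes_parallel <= 1)
        return 0;               /* disabled, or the leader alone is enough */

    if (requested_degree > 0)
        nworkers = requested_degree;
    else
        nworkers = nindexes_parallel - 1;   /* the leader takes one index */

    /* Never launch more workers than there are parallel-supported indexes. */
    if (nworkers > nindexes_parallel)
        nworkers = nindexes_parallel;

    /* Cap by max_parallel_maintenance_workers. */
    if (nworkers > max_parallel_maintenance_workers)
        nworkers = max_parallel_maintenance_workers;

    return nworkers;
}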
I also did some testing and didn't find any issue after forcing normal vacuum to run in parallel. All the test cases pass and make check-world passes as well.
Here, I am attaching the delta patch that can be applied on top of the v41 patch set. Apart from the delta patch, I am attaching the gist index patch (v4) and the complete v41 patch set.
Please let me know your thoughts on this.
[1] : https://www.postgresql.org/message-id/CAA4eK1LBUfVQu7jCfL20MAF%2BRzUssP06mcBEcSZb8XktD7X1BA%40mail.gmail.com
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Attachment
- v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM.patch
- v41-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
- v41-0003-Add-FAST-option-to-vacuum-command.patch
- v41-0004-Add-ability-to-disable-leader-participation-in-p.patch
- v41-0002-Add-a-parallel-option-to-the-VACUUM-command.patch
- delta_patch_to_make_vacuum_as_parallel.patch
On Sat, Jan 4, 2020 at 6:48 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Hi All, > > In other thread "parallel vacuum options/syntax" [1], Amit Kapila asked opinion about syntax for making normal vacuum toparallel. From that thread, I can see that people are in favor of option(b) to implement. So I tried to implement option(b)on the top of v41 patch set and implemented a delta patch. > I looked at your code and changed it slightly to allow the vacuum to be performed in parallel by default. Apart from that, I have made a few other modifications (a) changed the macro SizeOfLVDeadTuples as preferred by Tomas [1], (b) updated the documentation, (c) changed a few comments. The first two patches are the same. I have not posted the patch related to the FAST option as I am not sure we have a consensus for that and I have also intentionally left DISABLE_LEADER_PARTICIPATION related patch to avoid confusion. What do you think of the attached? Sawada-san, kindly verify the changes and let me know your opinion. [1] - https://www.postgresql.org/message-id/20191229212354.tqivttn23lxjg2jz%40development -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 4, 2020 at 6:48 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > Hi All, > > > > In other thread "parallel vacuum options/syntax" [1], Amit Kapila asked opinion about syntax for making normal vacuumto parallel. From that thread, I can see that people are in favor of option(b) to implement. So I tried to implementoption(b) on the top of v41 patch set and implemented a delta patch. > > > > I looked at your code and changed it slightly to allow the vacuum to > be performed in parallel by default. Apart from that, I have made a > few other modifications (a) changed the macro SizeOfLVDeadTuples as > preferred by Tomas [1], (b) updated the documentation, (c) changed a > few comments. Thanks. > > The first two patches are the same. I have not posted the patch > related to the FAST option as I am not sure we have a consensus for > that and I have also intentionally left DISABLE_LEADER_PARTICIPATION > related patch to avoid confusion. > > What do you think of the attached? Sawada-san, kindly verify the > changes and let me know your opinion. I agreed to not include both the FAST option patch and DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus on the main part and we can discuss and add them later if want. I've looked at the latest version patch you shared. Overall it looks good and works fine. I have a few small comments: 1. + refer to <xref linkend="vacuum-phases"/>). If the + <literal>PARALLEL</literal>option or parallel degree A space is needed between </literal> and 'option'. 2. + /* + * Variables to control parallel index vacuuming. We have a bitmap to + * indicate which index has stats in shared memory. The set bit in the + * map indicates that the particular index supports a parallel vacuum. + */ + pg_atomic_uint32 idx; /* counter for vacuuming and clean up */ + pg_atomic_uint32 nprocessed; /* # of indexes done during parallel + * execution */ + uint32 offset; /* sizeof header incl. bitmap */ + bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */ + + /* Shared index statistics data follows at end of struct */ +} LVShared; It seems to me that we no longer use nprocessed at all. So we can remove it. 3. + * Compute the number of parallel worker processes to request. Both index + * vacuuming and index cleanup can be executed with parallel workers. The + * relation sizes of table don't affect to the parallel degree for now. I think the last sentence should be "The relation size of table doesn't affect to the parallel degree for now". 4. + /* cap by max_parallel_maintenance_workers */ + parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers); + /* + * a parallel vacuum must be requested and there must be indexes on the + * relation + */ + /* copy the updated statistics */ + /* parallel vacuum must be active */ + Assert(VacuumSharedCostBalance); All comments that the patches newly added except for the above four places start with a capital letter. Maybe we can change them for consistency. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > What do you think of the attached? Sawada-san, kindly verify the > > changes and let me know your opinion. > > I agreed to not include both the FAST option patch and > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus > on the main part and we can discuss and add them later if want. > > I've looked at the latest version patch you shared. Overall it looks > good and works fine. I have a few small comments: > I have addressed all your comments and slightly change nearby comments and ran pgindent. I think we can commit the first two preparatory patches now unless you or someone else has any more comments on those. Tomas, most of your comments were in the main patch (v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel) which are now addressed and we have provided the reasons for the proposed API changes in patch v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum. Do you have any concerns if we commit the API patch and then in a few days time (after another pass or two) commit the main patch? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Hello I noticed that parallel vacuum uses min_parallel_index_scan_size GUC to skip small indexes but this is not mentioned in documentationfor both vacuum command and GUC itself. + /* Determine the number of parallel workers to launch */ + if (lps->lvshared->for_cleanup) + { + if (lps->lvshared->first_time) + nworkers = lps->nindexes_parallel_cleanup + + lps->nindexes_parallel_condcleanup - 1; + else + nworkers = lps->nindexes_parallel_cleanup - 1; + + } + else + nworkers = lps->nindexes_parallel_bulkdel - 1; (lazy_parallel_vacuum_indexes) Perhaps we need to add a comment for future readers, why we reduce the number of workers by 1. Maybe this would be cleaner? + /* Determine the number of parallel workers to launch */ + if (lps->lvshared->for_cleanup) + { + if (lps->lvshared->first_time) + nworkers = lps->nindexes_parallel_cleanup + + lps->nindexes_parallel_condcleanup; + else + nworkers = lps->nindexes_parallel_cleanup; + + } + else + nworkers = lps->nindexes_parallel_bulkdel; + + /* The leader process will participate */ + nworkers--; I have no more comments after reading the patches. regards, Sergei
On Thu, 9 Jan 2020 at 17:31, Sergei Kornilov <sk@zsrv.org> wrote: > > Hello > > I noticed that parallel vacuum uses min_parallel_index_scan_size GUC to skip small indexes but this is not mentioned indocumentation for both vacuum command and GUC itself. > > + /* Determine the number of parallel workers to launch */ > + if (lps->lvshared->for_cleanup) > + { > + if (lps->lvshared->first_time) > + nworkers = lps->nindexes_parallel_cleanup + > + lps->nindexes_parallel_condcleanup - 1; > + else > + nworkers = lps->nindexes_parallel_cleanup - 1; > + > + } > + else > + nworkers = lps->nindexes_parallel_bulkdel - 1; v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum and v43-0001-Introduce-IndexAM-fields-for-parallel-vacuum patches look fine to me. Below are some review comments for v43-0002 patch. 1. + <term><replaceable class="parameter">integer</replaceable></term> + <listitem> + <para> + Specifies a positive integer value passed to the selected option. + The <replaceable class="parameter">integer</replaceable> value can + also be omitted, in which case the value is decided by the command + based on the option used. + </para> + </listitem I think, now we are supporting zero also as a degree, so it should be changed from "positive integer" to "positive integer(including zero)" 2. + * with parallel worker processes. Individual indexes are processed by one + * vacuum process. At the beginning of a lazy vacuum (at lazy_scan_heap) we I think, above sentence should be like "Each individual index is processed by one vacuum process." or one worker 3. + * Lazy vacuum supports parallel execution with parallel worker processes. In + * a parallel lazy vacuum, we perform both index vacuuming and index cleanup Here, "index vacuuming" should be changed to "index vacuum" or "index cleanup" to "index cleaning" Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Thu, 9 Jan 2020 at 19:33, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > What do you think of the attached? Sawada-san, kindly verify the > > > changes and let me know your opinion. > > > > I agreed to not include both the FAST option patch and > > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus > > on the main part and we can discuss and add them later if want. > > > > I've looked at the latest version patch you shared. Overall it looks > > good and works fine. I have a few small comments: > > > > I have addressed all your comments and slightly change nearby comments > and ran pgindent. I think we can commit the first two preparatory > patches now unless you or someone else has any more comments on those. Yes. I'd like to briefly summarize the v43-0002-Allow-vacuum-command-to-process-indexes-in-parallel for other reviewers who wants to newly starts to review this patch: Introduce PARALLEL option to VACUUM command. Parallel vacuum is enabled by default. The number of parallel workers is determined based on the number of indexes that support parallel index when user didn't specify the parallel degree or PARALLEL option is omitted. Specifying PARALLEL 0 disables parallel vacuum. In parallel vacuum of this patch, only the leader process does heap scan and collect dead tuple TIDs on the DSM segment. Before starting index vacuum or index cleanup the leader launches the parallel workers and perform it together with parallel workers. Individual index are processed by one vacuum worker process. Therefore parallel vacuum can be used when the table has at least 2 indexes (the leader always takes one index). After launched parallel workers, the leader process vacuums indexes first that don't support parallel index after launched parallel workers. The parallel workers process indexes that support parallel index vacuum and the leader process join as a worker after completing such indexes. Once all indexes are processed the parallel worker processes exit. After that, the leader process re-initializes the parallel context so that it can use the same DSM for multiple passes of index vacuum and for performing index cleanup. For updating the index statistics, we need to update the system table and since updates are not allowed during parallel mode we update the index statistics after exiting from the parallel mode. When the vacuum cost-based delay is enabled, even parallel vacuum is throttled. The basic idea of a cost-based vacuum delay for parallel index vacuuming is to allow all parallel vacuum workers including the leader process to have a shared view of cost related parameters (mainly VacuumCostBalance). We allow each worker to update it as and when it has incurred any cost and then based on that decide whether it needs to sleep. We allow the worker to sleep proportional to the work done and reduce the VacuumSharedCostBalance by the amount which is consumed by the current worker (VacuumCostBalanceLocal). This can avoid letting the workers sleep who have done less or no I/O as compared to other workers and therefore can ensure that workers who are doing more I/O got throttled more. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
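A compact, standalone model of the shared cost-balance throttling summarized above (plain C11; the patch itself uses PostgreSQL's DSM and atomics facilities, and the names below are only illustrative):

#include <stdatomic.h>

/* Shared by the leader and all workers (kept in DSM in the patch). */
static atomic_uint shared_cost_balance;

/* Cost this process has accumulated since it last slept (per process). */
static unsigned local_cost_balance;

/*
 * Called whenever a process incurs some I/O cost: add it to the shared
 * balance, and once the shared balance reaches the limit, sleep in
 * proportion to this process's own contribution and give back only the
 * amount this process consumed.
 */
static void
parallel_vacuum_delay(unsigned cost, unsigned cost_limit,
                      double cost_delay_ms, void (*sleep_ms)(double))
{
    unsigned    total;

    local_cost_balance += cost;
    total = atomic_fetch_add(&shared_cost_balance, cost) + cost;

    if (total >= cost_limit && local_cost_balance > 0)
    {
        /* Sleep proportionally to the work this process has done. */
        sleep_ms(cost_delay_ms * local_cost_balance / cost_limit);

        /* Reduce the shared balance only by what this process consumed. */
        atomic_fetch_sub(&shared_cost_balance, local_cost_balance);
        local_cost_balance = 0;
    }
}

This way a worker that has done little or no I/O contributes little to the shared balance and therefore sleeps little, while the workers doing most of the I/O are throttled the most.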
On Thu, Jan 9, 2020 at 5:31 PM Sergei Kornilov <sk@zsrv.org> wrote: > > Hello > > I noticed that parallel vacuum uses min_parallel_index_scan_size GUC to skip small indexes but this is not mentioned indocumentation for both vacuum command and GUC itself. > Changed documentation at both places. > + /* Determine the number of parallel workers to launch */ > + if (lps->lvshared->for_cleanup) > + { > + if (lps->lvshared->first_time) > + nworkers = lps->nindexes_parallel_cleanup + > + lps->nindexes_parallel_condcleanup - 1; > + else > + nworkers = lps->nindexes_parallel_cleanup - 1; > + > + } > + else > + nworkers = lps->nindexes_parallel_bulkdel - 1; > > (lazy_parallel_vacuum_indexes) > Perhaps we need to add a comment for future readers, why we reduce the number of workers by 1. Maybe this would be cleaner? > Adapted your suggestion. > > I have no more comments after reading the patches. > Thank you for reviewing the patch. > 1. > + <term><replaceable class="parameter">integer</replaceable></term> > + <listitem> > + <para> > + Specifies a positive integer value passed to the selected option. > + The <replaceable class="parameter">integer</replaceable> value can > + also be omitted, in which case the value is decided by the command > + based on the option used. > + </para> > + </listitem > > I think, now we are supporting zero also as a degree, so it should be > changed from "positive integer" to "positive integer(including zero)" > I have replaced it with "non-negative integer .." > 2. > + * with parallel worker processes. Individual indexes are processed by one > + * vacuum process. At the beginning of a lazy vacuum (at lazy_scan_heap) we > > I think, above sentence should be like "Each individual index is > processed by one vacuum process." or one worker > Hmm, in the above sentence vacuum process refers to either a leader or worker process, so not sure if what you are suggesting is an improvement over current. > 3. > + * Lazy vacuum supports parallel execution with parallel worker processes. In > + * a parallel lazy vacuum, we perform both index vacuuming and index cleanup > > Here, "index vacuuming" should be changed to "index vacuum" or "index > cleanup" to "index cleaning" > Okay, changed at the place you mentioned and other places where similar change is required. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi Thank you for update! I looked again (vacuum_indexes_leader) + /* Skip the indexes that can be processed by parallel workers */ + if (!skip_index) + continue; Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database(and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) + { + ereport(WARNING, + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", + RelationGetRelationName(onerel)))); + params->nworkers = -1; + } And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? regards, Sergei
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > Hi > Thank you for update! I looked again > > (vacuum_indexes_leader) > + /* Skip the indexes that can be processed by parallel workers */ > + if (!skip_index) > + continue; > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? I also agree with your point. > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database(and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > + { > + ereport(WARNING, > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tablesin parallel", > + RelationGetRelationName(onerel)))); > + params->nworkers = -1; > + } > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? Good point. Yes, we should improve this. I tried to fix this. Attaching a delta patch that is fixing both the comments. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Attachment
Hello > Yes, we should improve this. I tried to fix this. Attaching a delta > patch that is fixing both the comments. Thank you, I have no objections. I think that status of CF entry is outdated and the most appropriate status for this patch is "Ready to Commiter". Changed.I also added an annotation with a link to recently summarized results. regards, Sergei
On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> >
> > Hi
> > Thank you for update! I looked again
> >
> > (vacuum_indexes_leader)
> > + /* Skip the indexes that can be processed by parallel workers */
> > + if (!skip_index)
> > + continue;
> >
> > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
>
> I also agree with your point.
I don't think the change is a good idea.
- bool skip_index = (get_indstats(lps->lvshared, i) == NULL ||
- skip_parallel_vacuum_index(Irel[i], lps->lvshared));
+ bool can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
+ skip_parallel_vacuum_index(Irel[i],
+ lps->lvshared));
The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index and changing the comment to something like “We are interested only in indexes that are skipped for parallel vacuum”?
>
> >
> > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> >
> > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > + {
> > + ereport(WARNING,
> > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > + RelationGetRelationName(onerel))));
> > + params->nworkers = -1;
> > + }
> >
> > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
>
> Good point.
> Yes, we should improve this. I tried to fix this.
+1
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > Hi > > > Thank you for update! I looked again > > > > > > (vacuum_indexes_leader) > > > + /* Skip the indexes that can be processed by parallel workers */ > > > + if (!skip_index) > > > + continue; > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > I also agree with your point. > > I don't think the change is a good idea. > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || > + skip_parallel_vacuum_index(Irel[i], > + lps->lvshared)); > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index andchange the comment to something like “We are interested in only index skipped parallel vacuum”? > Hmm, I find the current code and comment better than what you or Sergei are proposing. I am not sure what is the point of confusion in the current code? > > > > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database(and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > > > + { > > > + ereport(WARNING, > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporarytables in parallel", > > > + RelationGetRelationName(onerel)))); > > > + params->nworkers = -1; > > > + } > > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? > > > > Good point. > > Yes, we should improve this. I tried to fix this. > > +1 > Yeah, we can improve the situation here. I think we don't need to change the value of params->nworkers at first place if allow lazy_scan_heap to take care of this. Also, I think we shouldn't display warning unless the user has explicitly asked for parallel option. See the fix in the attached patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > > > Hi > > > > Thank you for update! I looked again > > > > > > > > (vacuum_indexes_leader) > > > > + /* Skip the indexes that can be processed by parallel workers */ > > > > + if (!skip_index) > > > > + continue; > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > I also agree with your point. > > > > I don't think the change is a good idea. > > > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || > > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || > > + skip_parallel_vacuum_index(Irel[i], > > + lps->lvshared)); > > > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_indexand change the comment to something like “We are interested in only index skipped parallel vacuum”? > > > > Hmm, I find the current code and comment better than what you or > Sergei are proposing. I am not sure what is the point of confusion in > the current code? Yeah the current code is also good. I just thought they were concerned that the variable name skip_index might be confusing because we skip if skip_index is NOT true. > > > > > > > > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entiredatabase (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: > > > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > > > > + { > > > > + ereport(WARNING, > > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporarytables in parallel", > > > > + RelationGetRelationName(onerel)))); > > > > + params->nworkers = -1; > > > > + } > > > > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? > > > > > > Good point. > > > Yes, we should improve this. I tried to fix this. > > > > +1 > > > > Yeah, we can improve the situation here. I think we don't need to > change the value of params->nworkers at first place if allow > lazy_scan_heap to take care of this. Also, I think we shouldn't > display warning unless the user has explicitly asked for parallel > option. See the fix in the attached patch. Agreed. But with the updated patch the PARALLEL option without the parallel degree doesn't display warning because params->nworkers = 0 in that case. So how about restoring params->nworkers at the end of vacuum_rel()? + /* + * Give warning only if the user explicitly tries to perform a + * parallel vacuum on the temporary table. + */ + if (params->nworkers > 0) + ereport(WARNING, + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", + RelationGetRelationName(onerel)))); Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 11 Jan 2020 at 19:48, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > + /* Skip the indexes that can be processed by parallel workers */
> > > > > + if (!skip_index)
> > > > > + continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > >
> > > > I also agree with your point.
> > >
> > > I don't think the change is a good idea.
> > >
> > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL ||
> > > - skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL ||
> > > + skip_parallel_vacuum_index(Irel[i],
> > > + lps->lvshared));
> > >
> > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_index and change the comment to something like “We are interested in only index skipped parallel vacuum”?
> > >
> >
> > Hmm, I find the current code and comment better than what you or
> > Sergei are proposing. I am not sure what is the point of confusion in
> > the current code?
>
> Yeah the current code is also good. I just thought they were concerned
> that the variable name skip_index might be confusing because we skip
> if skip_index is NOT true.
>
> >
> > > >
> > > > >
> > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entire database (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit:
> > > > >
> > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > + {
> > > > > + ereport(WARNING,
> > > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > + RelationGetRelationName(onerel))));
> > > > > + params->nworkers = -1;
> > > > > + }
> > > > >
> > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case?
> > > >
> > > > Good point.
> > > > Yes, we should improve this. I tried to fix this.
> > >
> > > +1
> > >
> >
> > Yeah, we can improve the situation here. I think we don't need to
> > change the value of params->nworkers at first place if allow
> > lazy_scan_heap to take care of this. Also, I think we shouldn't
> > display warning unless the user has explicitly asked for parallel
> > option. See the fix in the attached patch.
>
> Agreed. But with the updated patch the PARALLEL option without the
> parallel degree doesn't display warning because params->nworkers = 0
> in that case. So how about restoring params->nworkers at the end of
> vacuum_rel()?
>
> + /*
> + * Give warning only if the user explicitly
> tries to perform a
> + * parallel vacuum on the temporary table.
> + */
> + if (params->nworkers > 0)
> + ereport(WARNING,
> + (errmsg("disabling
> parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables
> in parallel",
> +
> RelationGetRelationName(onerel))));
Hi,
I have some doubts about the warning for temporary tables. Below are some examples.
Let's say we have one temporary table named "temp_table".
Case 1:
vacuum;
I think, in this case, we should not give any warning for the temp table. We should do a parallel vacuum (considering zero as the parallel degree) for all the tables except temporary tables.
Case 2:
vacuum (parallel);
Case 3:
vacuum(parallel 5);
Case 4:
vacuum(parallel) temp_table;
Case 5:
vacuum(parallel 4) temp_table;
I think, for cases 2 and 4, as per the new design, we should give an error (ERROR: parallel degree should be specified between 0 and 1024), because parallel vacuum is ON by default, so if the user gives the parallel option without a degree, then we can give an error.
If we can give an error for cases 2 and 4, then we can give a warning for cases 3 and 5.
Thoughts?
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > > > > > Hi > > > > > Thank you for update! I looked again > > > > > > > > > > (vacuum_indexes_leader) > > > > > + /* Skip the indexes that can be processed by parallel workers */ > > > > > + if (!skip_index) > > > > > + continue; > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > > > I also agree with your point. > > > > > > I don't think the change is a good idea. > > > > > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || > > > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > > > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || > > > + skip_parallel_vacuum_index(Irel[i], > > > + lps->lvshared)); > > > > > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_indexand change the comment to something like “We are interested in only index skipped parallel vacuum”? > > > > > > > Hmm, I find the current code and comment better than what you or > > Sergei are proposing. I am not sure what is the point of confusion in > > the current code? > > Yeah the current code is also good. I just thought they were concerned > that the variable name skip_index might be confusing because we skip > if skip_index is NOT true. > Okay, would it better if we get rid of this variable and have code like below? /* Skip the indexes that can be processed by parallel workers */ if ( !(get_indstats(lps->lvshared, i) == NULL || skip_parallel_vacuum_index(Irel[i], lps->lvshared))) continue; ... > > > > > > > > > > > > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entiredatabase (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: > > > > > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > > > > > + { > > > > > + ereport(WARNING, > > > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporarytables in parallel", > > > > > + RelationGetRelationName(onerel)))); > > > > > + params->nworkers = -1; > > > > > + } > > > > > > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? > > > > > > > > Good point. > > > > Yes, we should improve this. I tried to fix this. > > > > > > +1 > > > > > > > Yeah, we can improve the situation here. I think we don't need to > > change the value of params->nworkers at first place if allow > > lazy_scan_heap to take care of this. Also, I think we shouldn't > > display warning unless the user has explicitly asked for parallel > > option. See the fix in the attached patch. > > Agreed. But with the updated patch the PARALLEL option without the > parallel degree doesn't display warning because params->nworkers = 0 > in that case. So how about restoring params->nworkers at the end of > vacuum_rel()? > I had also thought on those lines, but I was not entirely sure about this resetting of workers. 
Today, again thinking about it, it seems the idea Mahendra is suggesting that is giving an error if the parallel degree is not specified seems reasonable to me. This means Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an error "parallel degree must be specified". This idea has merit as now we are supporting a parallel vacuum by default, so a 'parallel' option without a parallel degree doesn't have any meaning. If we do that, then we don't need to do anything additional about the handling of temp tables (other than what patch is already doing) as well. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
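To make the proposed behaviour concrete, here is a small model of the option handling (plain C, names invented; the ERROR and WARNING texts follow the ones mentioned in this thread, and the real command would of course report them through ereport and abort on error):

#include <stdbool.h>
#include <stdio.h>

/*
 * Model only: map the parsed PARALLEL option to the nworkers request.
 *   option omitted            -> 0   (degree decided from the indexes)
 *   PARALLEL without a degree -> error, as proposed above
 *   PARALLEL 0                -> -1  (parallel vacuum disabled)
 *   PARALLEL n (n > 0)        -> n   (user-requested degree)
 */
static int
resolve_nworkers(bool parallel_given, bool degree_given, int degree)
{
    if (!parallel_given)
        return 0;

    if (!degree_given)
    {
        fprintf(stderr, "ERROR:  parallel degree must be specified\n");
        return -1;              /* the real command would abort here */
    }

    return (degree == 0) ? -1 : degree;
}

/* Warn only when the user explicitly asked for workers on a temp table. */
static void
maybe_warn_temp_table(bool is_temp, int nworkers, const char *relname)
{
    if (is_temp && nworkers > 0)
        fprintf(stderr,
                "WARNING:  disabling parallel option of vacuum on \"%s\" "
                "--- cannot vacuum temporary tables in parallel\n",
                relname);
}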
On Mon, Jan 13, 2020 at 9:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > > > > > > > Hi > > > > > > Thank you for update! I looked again > > > > > > > > > > > > (vacuum_indexes_leader) > > > > > > + /* Skip the indexes that can be processed by parallel workers */ > > > > > > + if (!skip_index) > > > > > > + continue; > > > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > > > > > I also agree with your point. > > > > > > > > I don't think the change is a good idea. > > > > > > > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || > > > > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > > > > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || > > > > + skip_parallel_vacuum_index(Irel[i], > > > > + lps->lvshared)); > > > > > > > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_indexand change the comment to something like “We are interested in only index skipped parallel vacuum”? > > > > > > > > > > Hmm, I find the current code and comment better than what you or > > > Sergei are proposing. I am not sure what is the point of confusion in > > > the current code? > > > > Yeah the current code is also good. I just thought they were concerned > > that the variable name skip_index might be confusing because we skip > > if skip_index is NOT true. > > > > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; > ... > > > > > > > > > > > > > > > > > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entiredatabase (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: > > > > > > > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > > > > > > + { > > > > > > + ereport(WARNING, > > > > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporarytables in parallel", > > > > > > + RelationGetRelationName(onerel)))); > > > > > > + params->nworkers = -1; > > > > > > + } > > > > > > > > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? > > > > > > > > > > Good point. > > > > > Yes, we should improve this. I tried to fix this. > > > > > > > > +1 > > > > > > > > > > Yeah, we can improve the situation here. I think we don't need to > > > change the value of params->nworkers at first place if allow > > > lazy_scan_heap to take care of this. Also, I think we shouldn't > > > display warning unless the user has explicitly asked for parallel > > > option. See the fix in the attached patch. > > > > Agreed. 
But with the updated patch the PARALLEL option without the > > parallel degree doesn't display warning because params->nworkers = 0 > > in that case. So how about restoring params->nworkers at the end of > > vacuum_rel()? > > > > I had also thought on those lines, but I was not entirely sure about > this resetting of workers. Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. This means > Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an > error "parallel degree must be specified". This idea has merit as now > we are supporting a parallel vacuum by default, so a 'parallel' option > without a parallel degree doesn't have any meaning. If we do that, > then we don't need to do anything additional about the handling of > temp tables (other than what patch is already doing) as well. What do > you think? > This idea make sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Hello > I just thought they were concerned > that the variable name skip_index might be confusing because we skip > if skip_index is NOT true. Right. >> > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || >> > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); >> > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || >> > + skip_parallel_vacuum_index(Irel[i], >> > + lps->lvshared)); >> > >> > The above condition is true when the index can *not* do parallel index vacuum. Ouch, right. I was wrong. (or the variable name and the comment really confused me) > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; Complex condition... Not sure. > How about changing it to skipped_index and change the comment to something like “We are interested in only index skippedparallel vacuum”? I prefer this idea. > Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. +1 regards, Sergei
On Thu, Jan 9, 2020 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 9, 2020 at 10:41 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 8 Jan 2020 at 22:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > What do you think of the attached? Sawada-san, kindly verify the > > > changes and let me know your opinion. > > > > I agreed to not include both the FAST option patch and > > DISABLE_LEADER_PARTICIPATION patch at this stage. It's better to focus > > on the main part and we can discuss and add them later if want. > > > > I've looked at the latest version patch you shared. Overall it looks > > good and works fine. I have a few small comments: > > > > I have addressed all your comments and slightly change nearby comments > and ran pgindent. I think we can commit the first two preparatory > patches now unless you or someone else has any more comments on those. > I have pushed the first one (4e514c6) and I am planning to commit the next one (API: v46-0001-Introduce-IndexAM-fields-for-parallel-vacuum) patch on Wednesday. We are still discussing a few things for the main parallel vacuum patch (v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel) which we should reach conclusion soon. In the attached, I have made a few changes in the comments of patch v46-0002-Allow-vacuum-command-to-process-indexes-in-parallel. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > Hi > Thank you for update! I looked again > > (vacuum_indexes_leader) > + /* Skip the indexes that can be processed by parallel workers */ > + if (!skip_index) > + continue; > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > Again I looked into code and thought that somehow if we can add a boolean flag(can_parallel) in IndexBulkDeleteResult structure to identify that this index is supporting parallel vacuum or not, then it will be easy to skip those indexes and multiple time we will not call skip_parallel_vacuum_index (from vacuum_indexes_leader and parallel_vacuum_index) We can have a linked list of non-parallel supported indexes, then directly we can pass to vacuum_indexes_leader. Ex: let suppose we have 5 indexes into a table. If before launching parallel workers, if we can add boolean flag(can_parallel) IndexBulkDeleteResult structure to identify that this index is supporting parallel vacuum or not. Let index 1, 4 are not supporting parallel vacuum so we already have info in a linked list that 1->4 are not supporting parallel vacuum, so parallel_vacuum_index will process these indexes and rest will be processed by parallel workers. If parallel worker found that can_parallel is false, then it will skip that index. As per my understanding, if we implement this, then we can avoid multiple function calling of skip_parallel_vacuum_index and if there is no index which can't performe parallel vacuum, then we will not call vacuum_indexes_leader as head of list pointing to null. (we can save unnecessary calling of vacuum_indexes_leader) Thoughts? -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Mon, 13 Jan 2020 at 12:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sat, 11 Jan 2020 at 13:18, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > > > > > > > Hi > > > > > > Thank you for update! I looked again > > > > > > > > > > > > (vacuum_indexes_leader) > > > > > > + /* Skip the indexes that can be processed by parallel workers */ > > > > > > + if (!skip_index) > > > > > > + continue; > > > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > > > > > I also agree with your point. > > > > > > > > I don't think the change is a good idea. > > > > > > > > - bool skip_index = (get_indstats(lps->lvshared, i) == NULL || > > > > - skip_parallel_vacuum_index(Irel[i], lps->lvshared)); > > > > + bool can_parallel = (get_indstats(lps->lvshared, i) == NULL || > > > > + skip_parallel_vacuum_index(Irel[i], > > > > + lps->lvshared)); > > > > > > > > The above condition is true when the index can *not* do parallel index vacuum. How about changing it to skipped_indexand change the comment to something like “We are interested in only index skipped parallel vacuum”? > > > > > > > > > > Hmm, I find the current code and comment better than what you or > > > Sergei are proposing. I am not sure what is the point of confusion in > > > the current code? > > > > Yeah the current code is also good. I just thought they were concerned > > that the variable name skip_index might be confusing because we skip > > if skip_index is NOT true. > > > > Okay, would it better if we get rid of this variable and have code like below? > > /* Skip the indexes that can be processed by parallel workers */ > if ( !(get_indstats(lps->lvshared, i) == NULL || > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > continue; Make sense to me. > ... > > > > > > > > > > > > > > > > > > > > > Another question about behavior on temporary tables. Use case: the user commands just "vacuum;" to vacuum entiredatabase (and has enough maintenance workers). Vacuum starts fine in parallel, but on first temporary table we hit: > > > > > > > > > > > > + if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0) > > > > > > + { > > > > > > + ereport(WARNING, > > > > > > + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporarytables in parallel", > > > > > > + RelationGetRelationName(onerel)))); > > > > > > + params->nworkers = -1; > > > > > > + } > > > > > > > > > > > > And therefore we turn off the parallel vacuum for the remaining tables... Can we improve this case? > > > > > > > > > > Good point. > > > > > Yes, we should improve this. I tried to fix this. > > > > > > > > +1 > > > > > > > > > > Yeah, we can improve the situation here. I think we don't need to > > > change the value of params->nworkers at first place if allow > > > lazy_scan_heap to take care of this. Also, I think we shouldn't > > > display warning unless the user has explicitly asked for parallel > > > option. See the fix in the attached patch. > > > > Agreed. 
But with the updated patch the PARALLEL option without the > > parallel degree doesn't display warning because params->nworkers = 0 > > in that case. So how about restoring params->nworkers at the end of > > vacuum_rel()? > > > > I had also thought on those lines, but I was not entirely sure about > this resetting of workers. Today, again thinking about it, it seems > the idea Mahendra is suggesting that is giving an error if the > parallel degree is not specified seems reasonable to me. This means > Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an > error "parallel degree must be specified". This idea has merit as now > we are supporting a parallel vacuum by default, so a 'parallel' option > without a parallel degree doesn't have any meaning. If we do that, > then we don't need to do anything additional about the handling of > temp tables (other than what patch is already doing) as well. What do > you think? > Good point! Agreed. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > Hi > > Thank you for update! I looked again > > > > (vacuum_indexes_leader) > > + /* Skip the indexes that can be processed by parallel workers */ > > + if (!skip_index) > > + continue; > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > Again I looked into code and thought that somehow if we can add a > boolean flag(can_parallel) in IndexBulkDeleteResult structure to > identify that this index is supporting parallel vacuum or not, then it > will be easy to skip those indexes and multiple time we will not call > skip_parallel_vacuum_index (from vacuum_indexes_leader and > parallel_vacuum_index) > We can have a linked list of non-parallel supported indexes, then > directly we can pass to vacuum_indexes_leader. > > Ex: let suppose we have 5 indexes into a table. If before launching > parallel workers, if we can add boolean flag(can_parallel) > IndexBulkDeleteResult structure to identify that this index is > supporting parallel vacuum or not. > Let index 1, 4 are not supporting parallel vacuum so we already have > info in a linked list that 1->4 are not supporting parallel vacuum, so > parallel_vacuum_index will process these indexes and rest will be > processed by parallel workers. If parallel worker found that > can_parallel is false, then it will skip that index. > > As per my understanding, if we implement this, then we can avoid > multiple function calling of skip_parallel_vacuum_index and if there > is no index which can't performe parallel vacuum, then we will not > call vacuum_indexes_leader as head of list pointing to null. (we can > save unnecessary calling of vacuum_indexes_leader) > > Thoughts? > We skip not only indexes that don't support parallel index vacuum but also indexes supporting it depending on vacuum phase. That is, we could skip different indexes at different vacuum phase. Therefore with your idea, we would need to have at least three linked lists for each possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is that right? I think we can check if there are indexes that should be processed by the leader process before entering the loop in vacuum_indexes_leader by comparing nindexes_parallel_XXX of LVParallelState to the number of indexes but I'm not sure it's effective since the number of indexes on a table should be small. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
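Since which indexes get skipped differs by phase, a per-phase test (rather than a single can_parallel flag) is what is really needed. A rough model, with invented flag names, could look like this:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-AM capability bits, one per parallelizable phase. */
#define PV_BULKDEL       (1 << 0)   /* ambulkdelete may run in a worker */
#define PV_CLEANUP       (1 << 1)   /* amvacuumcleanup may always run in a worker */
#define PV_COND_CLEANUP  (1 << 2)   /* amvacuumcleanup may run in a worker only
                                     * when no bulkdelete has been performed */

typedef enum VacPhase
{
    PHASE_BULKDELETE,       /* index vacuum with collected dead tuples */
    PHASE_FIRST_CLEANUP,    /* cleanup, no bulkdelete was performed */
    PHASE_CLEANUP           /* cleanup after at least one bulkdelete */
} VacPhase;

/* True if the leader (not a worker) must process this index in this phase. */
static bool
index_skipped_by_workers(uint8_t amoptions, VacPhase phase)
{
    switch (phase)
    {
        case PHASE_BULKDELETE:
            return (amoptions & PV_BULKDEL) == 0;
        case PHASE_FIRST_CLEANUP:
            return (amoptions & (PV_CLEANUP | PV_COND_CLEANUP)) == 0;
        case PHASE_CLEANUP:
            return (amoptions & PV_CLEANUP) == 0;
    }
    return true;                /* not reached */
}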
On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > Hi > > > Thank you for update! I looked again > > > > > > (vacuum_indexes_leader) > > > + /* Skip the indexes that can be processed by parallel workers */ > > > + if (!skip_index) > > > + continue; > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > > Again I looked into code and thought that somehow if we can add a > > boolean flag(can_parallel) in IndexBulkDeleteResult structure to > > identify that this index is supporting parallel vacuum or not, then it > > will be easy to skip those indexes and multiple time we will not call > > skip_parallel_vacuum_index (from vacuum_indexes_leader and > > parallel_vacuum_index) > > We can have a linked list of non-parallel supported indexes, then > > directly we can pass to vacuum_indexes_leader. > > > > Ex: let suppose we have 5 indexes into a table. If before launching > > parallel workers, if we can add boolean flag(can_parallel) > > IndexBulkDeleteResult structure to identify that this index is > > supporting parallel vacuum or not. > > Let index 1, 4 are not supporting parallel vacuum so we already have > > info in a linked list that 1->4 are not supporting parallel vacuum, so > > parallel_vacuum_index will process these indexes and rest will be > > processed by parallel workers. If parallel worker found that > > can_parallel is false, then it will skip that index. > > > > As per my understanding, if we implement this, then we can avoid > > multiple function calling of skip_parallel_vacuum_index and if there > > is no index which can't performe parallel vacuum, then we will not > > call vacuum_indexes_leader as head of list pointing to null. (we can > > save unnecessary calling of vacuum_indexes_leader) > > > > Thoughts? > > > > We skip not only indexes that don't support parallel index vacuum but > also indexes supporting it depending on vacuum phase. That is, we > could skip different indexes at different vacuum phase. Therefore with > your idea, we would need to have at least three linked lists for each > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is > that right? > > I think we can check if there are indexes that should be processed by > the leader process before entering the loop in vacuum_indexes_leader > by comparing nindexes_parallel_XXX of LVParallelState to the number of > indexes but I'm not sure it's effective since the number of indexes on > a table should be small. > Hi, + /* + * Try to initialize the parallel vacuum if requested + */ + if (params->nworkers >= 0 && vacrelstats->useindex) + { + /* + * Since parallel workers cannot access data in temporary tables, we + * can't perform parallel vacuum on them. + */ + if (RelationUsesLocalBuffers(onerel)) + { + /* + * Give warning only if the user explicitly tries to perform a + * parallel vacuum on the temporary table. + */ + if (params->nworkers > 0) + ereport(WARNING, + (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel", From v45 patch, we moved warning of temporary table into "params->nworkers >= 0 && vacrelstats->useindex)" check so if table don't have any index, then we are not giving any warning. 
I think, we should give warning for all the temporary tables if parallel degree is given. (Till v44 patch, we were giving warning for all the temporary tables(having index and without index)) Thoughts? -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
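A minimal sketch of the behaviour Mahendra argues for, with illustrative names rather than the patch's actual code: the warning decision depends only on the relation being temporary and an explicit parallel degree being requested, not on whether the table has any indexes.

#include <stdbool.h>

/*
 * Decide whether to warn about ignoring the PARALLEL option on a temporary
 * table. The decision depends only on the relation being temporary (local
 * buffers) and the user explicitly requesting workers (nworkers > 0), not on
 * whether the table has any indexes.
 */
static bool
warn_parallel_on_temp_table(bool rel_uses_local_buffers, int nworkers_requested)
{
    return rel_uses_local_buffers && nworkers_requested > 0;
}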
On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote:
> > > >
> > > > Hi
> > > > Thank you for update! I looked again
> > > >
> > > > (vacuum_indexes_leader)
> > > > + /* Skip the indexes that can be processed by parallel workers */
> > > > + if (!skip_index)
> > > > + continue;
> > > >
> > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel?
> > > >
> > >
> > > Again I looked into code and thought that somehow if we can add a
> > > boolean flag(can_parallel) in IndexBulkDeleteResult structure to
> > > identify that this index is supporting parallel vacuum or not, then it
> > > will be easy to skip those indexes and multiple time we will not call
> > > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > > parallel_vacuum_index)
> > > We can have a linked list of non-parallel supported indexes, then
> > > directly we can pass to vacuum_indexes_leader.
> > >
> > > Ex: let suppose we have 5 indexes into a table. If before launching
> > > parallel workers, if we can add boolean flag(can_parallel)
> > > IndexBulkDeleteResult structure to identify that this index is
> > > supporting parallel vacuum or not.
> > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > parallel_vacuum_index will process these indexes and rest will be
> > > processed by parallel workers. If parallel worker found that
> > > can_parallel is false, then it will skip that index.
> > >
> > > As per my understanding, if we implement this, then we can avoid
> > > multiple function calling of skip_parallel_vacuum_index and if there
> > > is no index which can't performe parallel vacuum, then we will not
> > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > save unnecessary calling of vacuum_indexes_leader)
> > >
> > > Thoughts?
> > >
> >
> > We skip not only indexes that don't support parallel index vacuum but
> > also indexes supporting it depending on vacuum phase. That is, we
> > could skip different indexes at different vacuum phase. Therefore with
> > your idea, we would need to have at least three linked lists for each
> > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > that right?
> >
> > I think we can check if there are indexes that should be processed by
> > the leader process before entering the loop in vacuum_indexes_leader
> > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > indexes but I'm not sure it's effective since the number of indexes on
> > a table should be small.
> >
>
> Hi,
>
> + /*
> + * Try to initialize the parallel vacuum if requested
> + */
> + if (params->nworkers >= 0 && vacrelstats->useindex)
> + {
> + /*
> + * Since parallel workers cannot access data in temporary tables, we
> + * can't perform parallel vacuum on them.
> + */
> + if (RelationUsesLocalBuffers(onerel))
> + {
> + /*
> + * Give warning only if the user explicitly tries to perform a
> + * parallel vacuum on the temporary table.
> + */
> + if (params->nworkers > 0)
> + ereport(WARNING,
> + (errmsg("disabling parallel option of vacuum
> on \"%s\" --- cannot vacuum temporary tables in parallel",
>
> From v45 patch, we moved warning of temporary table into
> "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> don't have any index, then we are not giving any warning. I think, we
> should give warning for all the temporary tables if parallel degree is
> given. (Till v44 patch, we were giving warning for all the temporary
> tables(having index and without index))
>
> Thoughts?
Hi,
I did some more review. Below is one review comment for v46-0002.
+ /*
+ * Initialize the state for parallel vacuum
+ */
+ if (params->nworkers >= 0 && vacrelstats->useindex)
+ {
+ /*
+ * Since parallel workers cannot access data in temporary tables, we
+ * can't perform parallel vacuum on them.
+ */
+ if (RelationUsesLocalBuffers(onerel)
In the above check, we should add a "nindexes > 1" check so that if there is only one index, we will not call begin_parallel_vacuum.
We can also improve the "Initialize the state for parallel vacuum" comment by mentioning what we are doing here (if the table has more than one index and a parallel vacuum is requested, then try to start a parallel vacuum).
On Tue, Jan 14, 2020 at 10:04 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 13 Jan 2020 at 12:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Okay, would it better if we get rid of this variable and have code like below? > > > > /* Skip the indexes that can be processed by parallel workers */ > > if ( !(get_indstats(lps->lvshared, i) == NULL || > > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > > continue; > > Make sense to me. > I have changed the comment and condition to make it a positive test so that it is more clear. > > ... > > > Agreed. But with the updated patch the PARALLEL option without the > > > parallel degree doesn't display warning because params->nworkers = 0 > > > in that case. So how about restoring params->nworkers at the end of > > > vacuum_rel()? > > > > > > > I had also thought on those lines, but I was not entirely sure about > > this resetting of workers. Today, again thinking about it, it seems > > the idea Mahendra is suggesting that is giving an error if the > > parallel degree is not specified seems reasonable to me. This means > > Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an > > error "parallel degree must be specified". This idea has merit as now > > we are supporting a parallel vacuum by default, so a 'parallel' option > > without a parallel degree doesn't have any meaning. If we do that, > > then we don't need to do anything additional about the handling of > > temp tables (other than what patch is already doing) as well. What do > > you think? > > > > Good point! Agreed. > Thanks, changed accordingly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, Jan 14, 2020 at 4:17 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Hi, > > + /* > + * Try to initialize the parallel vacuum if requested > + */ > + if (params->nworkers >= 0 && vacrelstats->useindex) > + { > + /* > + * Since parallel workers cannot access data in temporary tables, we > + * can't perform parallel vacuum on them. > + */ > + if (RelationUsesLocalBuffers(onerel)) > + { > + /* > + * Give warning only if the user explicitly tries to perform a > + * parallel vacuum on the temporary table. > + */ > + if (params->nworkers > 0) > + ereport(WARNING, > + (errmsg("disabling parallel option of vacuum > on \"%s\" --- cannot vacuum temporary tables in parallel", > > From v45 patch, we moved warning of temporary table into > "params->nworkers >= 0 && vacrelstats->useindex)" check so if table > don't have any index, then we are not giving any warning. I think, we > should give warning for all the temporary tables if parallel degree is > given. (Till v44 patch, we were giving warning for all the temporary > tables(having index and without index)) > I am not sure how useful it is to give WARNING in this case as we are anyway not going to perform a parallel vacuum because it doesn't have an index? One can also say that WARNING is expected in the cases where we skip a parallel vacuum due to any reason (ex., if the size of the index is small), but I don't think that will be a good idea. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 14 Jan 2020 at 17:16, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > > > > > Hi > > > > > Thank you for update! I looked again > > > > > > > > > > (vacuum_indexes_leader) > > > > > + /* Skip the indexes that can be processed by parallel workers */ > > > > > + if (!skip_index) > > > > > + continue; > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > > > > > > > > Again I looked into code and thought that somehow if we can add a > > > > boolean flag(can_parallel) in IndexBulkDeleteResult structure to > > > > identify that this index is supporting parallel vacuum or not, then it > > > > will be easy to skip those indexes and multiple time we will not call > > > > skip_parallel_vacuum_index (from vacuum_indexes_leader and > > > > parallel_vacuum_index) > > > > We can have a linked list of non-parallel supported indexes, then > > > > directly we can pass to vacuum_indexes_leader. > > > > > > > > Ex: let suppose we have 5 indexes into a table. If before launching > > > > parallel workers, if we can add boolean flag(can_parallel) > > > > IndexBulkDeleteResult structure to identify that this index is > > > > supporting parallel vacuum or not. > > > > Let index 1, 4 are not supporting parallel vacuum so we already have > > > > info in a linked list that 1->4 are not supporting parallel vacuum, so > > > > parallel_vacuum_index will process these indexes and rest will be > > > > processed by parallel workers. If parallel worker found that > > > > can_parallel is false, then it will skip that index. > > > > > > > > As per my understanding, if we implement this, then we can avoid > > > > multiple function calling of skip_parallel_vacuum_index and if there > > > > is no index which can't performe parallel vacuum, then we will not > > > > call vacuum_indexes_leader as head of list pointing to null. (we can > > > > save unnecessary calling of vacuum_indexes_leader) > > > > > > > > Thoughts? > > > > > > > > > > We skip not only indexes that don't support parallel index vacuum but > > > also indexes supporting it depending on vacuum phase. That is, we > > > could skip different indexes at different vacuum phase. Therefore with > > > your idea, we would need to have at least three linked lists for each > > > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is > > > that right? > > > > > > I think we can check if there are indexes that should be processed by > > > the leader process before entering the loop in vacuum_indexes_leader > > > by comparing nindexes_parallel_XXX of LVParallelState to the number of > > > indexes but I'm not sure it's effective since the number of indexes on > > > a table should be small. > > > > > > > Hi, > > > > + /* > > + * Try to initialize the parallel vacuum if requested > > + */ > > + if (params->nworkers >= 0 && vacrelstats->useindex) > > + { > > + /* > > + * Since parallel workers cannot access data in temporary tables, we > > + * can't perform parallel vacuum on them. 
> > + */ > > + if (RelationUsesLocalBuffers(onerel)) > > + { > > + /* > > + * Give warning only if the user explicitly tries to perform a > > + * parallel vacuum on the temporary table. > > + */ > > + if (params->nworkers > 0) > > + ereport(WARNING, > > + (errmsg("disabling parallel option of vacuum > > on \"%s\" --- cannot vacuum temporary tables in parallel", > > > > From v45 patch, we moved warning of temporary table into > > "params->nworkers >= 0 && vacrelstats->useindex)" check so if table > > don't have any index, then we are not giving any warning. I think, we > > should give warning for all the temporary tables if parallel degree is > > given. (Till v44 patch, we were giving warning for all the temporary > > tables(having index and without index)) > > > > Thoughts? > > Hi, > I did some more review. Below is the 1 review comment for v46-0002. > > + /* > + * Initialize the state for parallel vacuum > + */ > + if (params->nworkers >= 0 && vacrelstats->useindex) > + { > + /* > + * Since parallel workers cannot access data in temporary tables, we > + * can't perform parallel vacuum on them. > + */ > + if (RelationUsesLocalBuffers(onerel) > > In above check, we should add "nindexes > 1" check so that if there is only 1 index, then we will not call begin_parallel_vacuum. I think, " if (params->nworkers >= 0 && nindexes > 1)" check will be enough here . Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Tue, 14 Jan 2020 at 21:43, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 14, 2020 at 10:04 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 13 Jan 2020 at 12:50, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Okay, would it better if we get rid of this variable and have code like below? > > > > > > /* Skip the indexes that can be processed by parallel workers */ > > > if ( !(get_indstats(lps->lvshared, i) == NULL || > > > skip_parallel_vacuum_index(Irel[i], lps->lvshared))) > > > continue; > > > > Make sense to me. > > > > I have changed the comment and condition to make it a positive test so > that it is more clear. > > > > ... > > > > Agreed. But with the updated patch the PARALLEL option without the > > > > parallel degree doesn't display warning because params->nworkers = 0 > > > > in that case. So how about restoring params->nworkers at the end of > > > > vacuum_rel()? > > > > > > > > > > I had also thought on those lines, but I was not entirely sure about > > > this resetting of workers. Today, again thinking about it, it seems > > > the idea Mahendra is suggesting that is giving an error if the > > > parallel degree is not specified seems reasonable to me. This means > > > Vacuum (parallel), Vacuum (parallel) <tbl_name>, etc. will give an > > > error "parallel degree must be specified". This idea has merit as now > > > we are supporting a parallel vacuum by default, so a 'parallel' option > > > without a parallel degree doesn't have any meaning. If we do that, > > > then we don't need to do anything additional about the handling of > > > temp tables (other than what patch is already doing) as well. What do > > > you think? > > > > > > > Good point! Agreed. > > > > Thanks, changed accordingly. > Thank you for updating the patch! I have a few small comments. The rest looks good to me. 1. + * Compute the number of parallel worker processes to request. Both index + * vacuum and index cleanup can be executed with parallel workers. The + * relation size of the table don't affect the parallel degree for now. s/don't/doesn't/ 2. @@ -383,6 +435,7 @@ vacuum(List *relations, VacuumParams *params, VacuumPageHit = 0; VacuumPageMiss = 0; VacuumPageDirty = 0; + VacuumSharedCostBalance = NULL; I think we can initialize VacuumCostBalanceLocal and VacuumActiveNWorkers here. We use these parameters during parallel index vacuum and reset at the end but we might want to initialize them for safety. 3. + /* Set cost-based vacuum delay */ + VacuumCostActive = (VacuumCostDelay > 0); + VacuumCostBalance = 0; + VacuumPageHit = 0; + VacuumPageMiss = 0; + VacuumPageDirty = 0; + VacuumSharedCostBalance = &(lvshared->cost_balance); + VacuumActiveNWorkers = &(lvshared->active_nworkers); VacuumCostBalanceLocal also needs to be initialized. 4. The regression tests don't have the test case of PARALLEL 0. Since I guess you already modifies the code locally I've attached the diff containing the above review comments. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
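As a stand-alone illustration of review comments 2 and 3 above (the variables here are declared locally only so the sketch compiles; in the patch they are existing globals), the point is simply that VacuumCostBalanceLocal is reset together with the other cost-based delay counters.

/*
 * All cost-based vacuum delay counters, including the per-backend local
 * balance, are reset before accounting (re)starts. VacuumCostBalanceLocal
 * is the one the review notes was missing from the initialization.
 */
static int VacuumCostBalance;
static int VacuumCostBalanceLocal;
static int VacuumPageHit;
static int VacuumPageMiss;
static int VacuumPageDirty;

static void
reset_vacuum_cost_accounting(void)
{
    VacuumCostBalance = 0;
    VacuumCostBalanceLocal = 0;
    VacuumPageHit = 0;
    VacuumPageMiss = 0;
    VacuumPageDirty = 0;
}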
On Wed, 15 Jan 2020 at 12:34, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Tue, 14 Jan 2020 at 17:16, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov <sk@zsrv.org> wrote: > > > > > > > > > > > > Hi > > > > > > Thank you for update! I looked again > > > > > > > > > > > > (vacuum_indexes_leader) > > > > > > + /* Skip the indexes that can be processed by parallel workers */ > > > > > > + if (!skip_index) > > > > > > + continue; > > > > > > > > > > > > Does the variable name skip_index not confuse here? Maybe rename to something like can_parallel? > > > > > > > > > > > > > > > > Again I looked into code and thought that somehow if we can add a > > > > > boolean flag(can_parallel) in IndexBulkDeleteResult structure to > > > > > identify that this index is supporting parallel vacuum or not, then it > > > > > will be easy to skip those indexes and multiple time we will not call > > > > > skip_parallel_vacuum_index (from vacuum_indexes_leader and > > > > > parallel_vacuum_index) > > > > > We can have a linked list of non-parallel supported indexes, then > > > > > directly we can pass to vacuum_indexes_leader. > > > > > > > > > > Ex: let suppose we have 5 indexes into a table. If before launching > > > > > parallel workers, if we can add boolean flag(can_parallel) > > > > > IndexBulkDeleteResult structure to identify that this index is > > > > > supporting parallel vacuum or not. > > > > > Let index 1, 4 are not supporting parallel vacuum so we already have > > > > > info in a linked list that 1->4 are not supporting parallel vacuum, so > > > > > parallel_vacuum_index will process these indexes and rest will be > > > > > processed by parallel workers. If parallel worker found that > > > > > can_parallel is false, then it will skip that index. > > > > > > > > > > As per my understanding, if we implement this, then we can avoid > > > > > multiple function calling of skip_parallel_vacuum_index and if there > > > > > is no index which can't performe parallel vacuum, then we will not > > > > > call vacuum_indexes_leader as head of list pointing to null. (we can > > > > > save unnecessary calling of vacuum_indexes_leader) > > > > > > > > > > Thoughts? > > > > > > > > > > > > > We skip not only indexes that don't support parallel index vacuum but > > > > also indexes supporting it depending on vacuum phase. That is, we > > > > could skip different indexes at different vacuum phase. Therefore with > > > > your idea, we would need to have at least three linked lists for each > > > > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is > > > > that right? > > > > > > > > I think we can check if there are indexes that should be processed by > > > > the leader process before entering the loop in vacuum_indexes_leader > > > > by comparing nindexes_parallel_XXX of LVParallelState to the number of > > > > indexes but I'm not sure it's effective since the number of indexes on > > > > a table should be small. 
> > > > > > > > > > Hi, > > > > > > + /* > > > + * Try to initialize the parallel vacuum if requested > > > + */ > > > + if (params->nworkers >= 0 && vacrelstats->useindex) > > > + { > > > + /* > > > + * Since parallel workers cannot access data in temporary tables, we > > > + * can't perform parallel vacuum on them. > > > + */ > > > + if (RelationUsesLocalBuffers(onerel)) > > > + { > > > + /* > > > + * Give warning only if the user explicitly tries to perform a > > > + * parallel vacuum on the temporary table. > > > + */ > > > + if (params->nworkers > 0) > > > + ereport(WARNING, > > > + (errmsg("disabling parallel option of vacuum > > > on \"%s\" --- cannot vacuum temporary tables in parallel", > > > > > > From v45 patch, we moved warning of temporary table into > > > "params->nworkers >= 0 && vacrelstats->useindex)" check so if table > > > don't have any index, then we are not giving any warning. I think, we > > > should give warning for all the temporary tables if parallel degree is > > > given. (Till v44 patch, we were giving warning for all the temporary > > > tables(having index and without index)) > > > > > > Thoughts? > > > > Hi, > > I did some more review. Below is the 1 review comment for v46-0002. > > > > + /* > > + * Initialize the state for parallel vacuum > > + */ > > + if (params->nworkers >= 0 && vacrelstats->useindex) > > + { > > + /* > > + * Since parallel workers cannot access data in temporary tables, we > > + * can't perform parallel vacuum on them. > > + */ > > + if (RelationUsesLocalBuffers(onerel) > > > > In above check, we should add "nindexes > 1" check so that if there is only 1 index, then we will not call begin_parallel_vacuum. > > I think, " if (params->nworkers >= 0 && nindexes > 1)" check will be > enough here . > Hmm I think if we removed vacrelstats->useindex from that condition we will call begin_parallel_vacuum even when index cleanup is disabled. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Thank you for updating the patch! I have a few small comments. > I have adapted all your changes, fixed the comment by Mahendra related to initializing parallel state only when there are at least two indexes. Additionally, I have changed a few comments (make the reference to parallel vacuum consistent, at some places we were referring it as 'parallel lazy vacuum' and at other places it was 'parallel index vacuum'). > The > rest looks good to me. > Okay, I think the patch is in good shape. I am planning to read it a few more times (at least 2 times) and then probably will commit it early next week (Monday or Tuesday) unless there are any major comments. I have already committed the API patch (4d8a8d0c73). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, 15 Jan 2020 at 17:27, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Thank you for updating the patch! I have a few small comments.
> >
>
> I have adapted all your changes, fixed the comment by Mahendra related
> to initializing parallel state only when there are at least two
> indexes. Additionally, I have changed a few comments (make the
> reference to parallel vacuum consistent, at some places we were
> referring it as 'parallel lazy vacuum' and at other places it was
> 'parallel index vacuum').
>
> > The
> > rest looks good to me.
> >
>
> Okay, I think the patch is in good shape. I am planning to read it a
> few more times (at least 2 times) and then probably will commit it
> early next week (Monday or Tuesday) unless there are any major
> comments. I have already committed the API patch (4d8a8d0c73).
>
Hi,
Thanks, Amit, for fixing the review comments.
I reviewed the v48 patch, and below are some comments.
1.
+ * based on the number of indexes. -1 indicates a parallel vacuum is
I think the above should read "-1 indicates that parallel vacuum is"
2.
+/* Variables for cost-based parallel vacuum */
At the end of the comment, there are 2 spaces. I think it should be only 1 space.
3.
I think we should add a test case for the parallel option (when the degree is not specified).
Ex:
postgres=# VACUUM (PARALLEL) tmp;
ERROR: parallel option requires a value between 0 and 1024
LINE 1: VACUUM (PARALLEL) tmp;
^
postgres=#
Because the above error is added by this parallel patch, we should have a test case for it to increase code coverage.
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Wed, 15 Jan 2020 at 17:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Thank you for updating the patch! I have a few small comments. > > > > > > > I have adapted all your changes, fixed the comment by Mahendra related > > to initializing parallel state only when there are at least two > > indexes. Additionally, I have changed a few comments (make the > > reference to parallel vacuum consistent, at some places we were > > referring it as 'parallel lazy vacuum' and at other places it was > > 'parallel index vacuum'). > > > > > The > > > rest looks good to me. > > > > > > > Okay, I think the patch is in good shape. I am planning to read it a > > few more times (at least 2 times) and then probably will commit it > > early next week (Monday or Tuesday) unless there are any major > > comments. I have already committed the API patch (4d8a8d0c73). > > > > Hi, > Thanks Amit for fixing review comments. > > I reviewed v48 patch and below are some comments. > > 1. > + * based on the number of indexes. -1 indicates a parallel vacuum is > > I think, above should be like "-1 indicates that parallel vacuum is" > > 2. > +/* Variables for cost-based parallel vacuum */ > > At the end of comment, there is 2 spaces. I think, it should be only 1 space. > > 3. > I think, we should add a test case for parallel option(when degree is not specified). > Ex: > postgres=# VACUUM (PARALLEL) tmp; > ERROR: parallel option requires a value between 0 and 1024 > LINE 1: VACUUM (PARALLEL) tmp; > ^ > postgres=# > > Because above error is added in this parallel patch, so we should have test case for this to increase code coverage. > Hi Below are some more review comments for v48 patch. 1. #include "storage/bufpage.h" #include "storage/lockdefs.h" +#include "storage/shm_toc.h" +#include "storage/dsm.h" Here, order of header file is not alphabetically. (storage/dsm.h should come before storage/lockdefs.h) 2. + /* No index supports parallel vacuum */ + if (nindexes_parallel == 0) + return 0; + + /* The leader process takes one index */ + nindexes_parallel--; Above code can be rearranged as: + /* The leader process takes one index */ + nindexes_parallel--; + + /* No index supports parallel vacuum */ + if (nindexes_parallel <= 0) + return 0; If we do like this, then in some cases, we can skip some calculations of parallel workers. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
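A compilable sketch of the reordering suggested in point 2 above. The variable names follow the quoted hunks; the clamping at the end is an assumption about how the final degree is bounded, not a claim about the patch's exact code.

/*
 * Decrement for the index the leader takes first, then bail out if nothing
 * is left for workers; a single "<= 0" test then covers both the
 * zero-index and the one-index cases.
 */
static int
compute_parallel_workers_sketch(int nindexes_parallel, int nrequested,
                                int max_parallel_workers)
{
    /* The leader process takes one index */
    nindexes_parallel--;

    /* No index left over for parallel workers */
    if (nindexes_parallel <= 0)
        return 0;

    /* Cap by the requested degree, if any, and by the configured maximum */
    if (nrequested > 0 && nrequested < nindexes_parallel)
        nindexes_parallel = nrequested;
    if (nindexes_parallel > max_parallel_workers)
        nindexes_parallel = max_parallel_workers;

    return nindexes_parallel;
}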
On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Wed, 15 Jan 2020 at 17:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jan 15, 2020 at 10:05 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Thank you for updating the patch! I have a few small comments. > > > > > > > > > > I have adapted all your changes, fixed the comment by Mahendra related > > > to initializing parallel state only when there are at least two > > > indexes. Additionally, I have changed a few comments (make the > > > reference to parallel vacuum consistent, at some places we were > > > referring it as 'parallel lazy vacuum' and at other places it was > > > 'parallel index vacuum'). > > > > > > > The > > > > rest looks good to me. > > > > > > > > > > Okay, I think the patch is in good shape. I am planning to read it a > > > few more times (at least 2 times) and then probably will commit it > > > early next week (Monday or Tuesday) unless there are any major > > > comments. I have already committed the API patch (4d8a8d0c73). > > > > > > > Hi, > > Thanks Amit for fixing review comments. > > > > I reviewed v48 patch and below are some comments. > > > > 1. > > + * based on the number of indexes. -1 indicates a parallel vacuum is > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > > 2. > > +/* Variables for cost-based parallel vacuum */ > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 space. > > > > 3. > > I think, we should add a test case for parallel option(when degree is not specified). > > Ex: > > postgres=# VACUUM (PARALLEL) tmp; > > ERROR: parallel option requires a value between 0 and 1024 > > LINE 1: VACUUM (PARALLEL) tmp; > > ^ > > postgres=# > > > > Because above error is added in this parallel patch, so we should have test case for this to increase code coverage. > > > > Hi > Below are some more review comments for v48 patch. > > 1. > #include "storage/bufpage.h" > #include "storage/lockdefs.h" > +#include "storage/shm_toc.h" > +#include "storage/dsm.h" > > Here, order of header file is not alphabetically. (storage/dsm.h > should come before storage/lockdefs.h) > > 2. > + /* No index supports parallel vacuum */ > + if (nindexes_parallel == 0) > + return 0; > + > + /* The leader process takes one index */ > + nindexes_parallel--; > > Above code can be rearranged as: > > + /* The leader process takes one index */ > + nindexes_parallel--; > + > + /* No index supports parallel vacuum */ > + if (nindexes_parallel <= 0) > + return 0; > > If we do like this, then in some cases, we can skip some calculations > of parallel workers. > > -- > Thanks and Regards > Mahendra Singh Thalor > EnterpriseDB: http://www.enterprisedb.com Hi, I checked code coverage and time taken by vacuum.sql test with and without v48 patch. Below are some findings (I ran "make check-world -i" to get coverage.) 1. With v45 patch, compute_parallel_delay is never called so function hit is zero. I think, we can add some delay options into vacuum.sql test to hit function. 2. I checked time taken by vacuum.sql test. Execution time is almost same with and without v45 patch. Without v45 patch: Run1) vacuum ... ok 701 ms Run2) vacuum ... ok 549 ms Run3) vacuum ... ok 559 ms Run4) vacuum ... ok 480 ms With v45 patch: Run1) vacuum ... ok 842 ms Run2) vacuum ... ok 808 ms Run3) vacuum ... ok 774 ms Run4) vacuum ... 
ok 792 ms -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > I reviewed v48 patch and below are some comments. > > > > > > 1. > > > + * based on the number of indexes. -1 indicates a parallel vacuum is > > > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > I am not an expert in this matter, but I am not sure if your suggestion is correct. I thought an article is required here, but I could be wrong. Can you please clarify? > > > 2. > > > +/* Variables for cost-based parallel vacuum */ > > > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 space. > > > > > > 3. > > > I think, we should add a test case for parallel option(when degree is not specified). > > > Ex: > > > postgres=# VACUUM (PARALLEL) tmp; > > > ERROR: parallel option requires a value between 0 and 1024 > > > LINE 1: VACUUM (PARALLEL) tmp; > > > ^ > > > postgres=# > > > > > > Because above error is added in this parallel patch, so we should have test case for this to increase code coverage. > > > I thought about it but was not sure to add a test for it. We might not want to add a test for each and every case as that will increase the number and time of tests without a significant advantage. Now that you have pointed this, I can add a test for it unless someone else thinks otherwise. > > 1. > With v45 patch, compute_parallel_delay is never called so function hit > is zero. I think, we can add some delay options into vacuum.sql test > to hit function. > But how can we meaningfully test the functionality of the delay? It would be tricky to come up with a portable test that can always produce consistent results. > 2. > I checked time taken by vacuum.sql test. Execution time is almost same > with and without v45 patch. > > Without v45 patch: > Run1) vacuum ... ok 701 ms > Run2) vacuum ... ok 549 ms > Run3) vacuum ... ok 559 ms > Run4) vacuum ... ok 480 ms > > With v45 patch: > Run1) vacuum ... ok 842 ms > Run2) vacuum ... ok 808 ms > Run3) vacuum ... ok 774 ms > Run4) vacuum ... ok 792 ms > I see some variance in results, have you run with autovacuum as off. I was expecting that this might speed up some cases where parallel vacuum is used by default. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 16 Jan 2020 at 08:22, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 16, 2020 at 1:02 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Wed, 15 Jan 2020 at 19:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > On Wed, 15 Jan 2020 at 19:04, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > > > > I reviewed v48 patch and below are some comments. > > > > > > > > 1. > > > > + * based on the number of indexes. -1 indicates a parallel vacuum is > > > > > > > > I think, above should be like "-1 indicates that parallel vacuum is" > > > > > > I am not an expert in this matter, but I am not sure if your > suggestion is correct. I thought an article is required here, but I > could be wrong. Can you please clarify? > > > > > 2. > > > > +/* Variables for cost-based parallel vacuum */ > > > > > > > > At the end of comment, there is 2 spaces. I think, it should be only 1 space. > > > > > > > > 3. > > > > I think, we should add a test case for parallel option(when degree is not specified). > > > > Ex: > > > > postgres=# VACUUM (PARALLEL) tmp; > > > > ERROR: parallel option requires a value between 0 and 1024 > > > > LINE 1: VACUUM (PARALLEL) tmp; > > > > ^ > > > > postgres=# > > > > > > > > Because above error is added in this parallel patch, so we should have test case for this to increase code coverage. > > > > > > I thought about it but was not sure to add a test for it. We might > not want to add a test for each and every case as that will increase > the number and time of tests without a significant advantage. Now > that you have pointed this, I can add a test for it unless someone > else thinks otherwise. > > > > > 1. > > With v45 patch, compute_parallel_delay is never called so function hit > > is zero. I think, we can add some delay options into vacuum.sql test > > to hit function. > > > > But how can we meaningfully test the functionality of the delay? It > would be tricky to come up with a portable test that can always > produce consistent results. > > > 2. > > I checked time taken by vacuum.sql test. Execution time is almost same > > with and without v45 patch. > > > > Without v45 patch: > > Run1) vacuum ... ok 701 ms > > Run2) vacuum ... ok 549 ms > > Run3) vacuum ... ok 559 ms > > Run4) vacuum ... ok 480 ms > > > > With v45 patch: > > Run1) vacuum ... ok 842 ms > > Run2) vacuum ... ok 808 ms > > Run3) vacuum ... ok 774 ms > > Run4) vacuum ... ok 792 ms > > > > I see some variance in results, have you run with autovacuum as off. > I was expecting that this might speed up some cases where parallel > vacuum is used by default. I think, this is expected difference in timing because we are adding some vacuum related test. I am not starting server manually(means I am starting server with only default setting). If we start server with default settings, then we will not hit vacuum related test cases to parallel because size of index relation is very small so we will not do parallel vacuum. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 16, 2020 at 10:11 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Thu, 16 Jan 2020 at 08:22, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > 2. > > > I checked time taken by vacuum.sql test. Execution time is almost same > > > with and without v45 patch. > > > > > > Without v45 patch: > > > Run1) vacuum ... ok 701 ms > > > Run2) vacuum ... ok 549 ms > > > Run3) vacuum ... ok 559 ms > > > Run4) vacuum ... ok 480 ms > > > > > > With v45 patch: > > > Run1) vacuum ... ok 842 ms > > > Run2) vacuum ... ok 808 ms > > > Run3) vacuum ... ok 774 ms > > > Run4) vacuum ... ok 792 ms > > > > > > > I see some variance in results, have you run with autovacuum as off. > > I was expecting that this might speed up some cases where parallel > > vacuum is used by default. > > I think, this is expected difference in timing because we are adding > some vacuum related test. I am not starting server manually(means I am > starting server with only default setting). > Can you once test by setting autovacuum = off? The autovacuum leads to variability in test timing. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, 16 Jan 2020 at 14:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 16, 2020 at 10:11 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Thu, 16 Jan 2020 at 08:22, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > 2. > > > > I checked time taken by vacuum.sql test. Execution time is almost same > > > > with and without v45 patch. > > > > > > > > Without v45 patch: > > > > Run1) vacuum ... ok 701 ms > > > > Run2) vacuum ... ok 549 ms > > > > Run3) vacuum ... ok 559 ms > > > > Run4) vacuum ... ok 480 ms > > > > > > > > With v45 patch: > > > > Run1) vacuum ... ok 842 ms > > > > Run2) vacuum ... ok 808 ms > > > > Run3) vacuum ... ok 774 ms > > > > Run4) vacuum ... ok 792 ms > > > > > > > > > > I see some variance in results, have you run with autovacuum as off. > > > I was expecting that this might speed up some cases where parallel > > > vacuum is used by default. > > > > I think, this is expected difference in timing because we are adding > > some vacuum related test. I am not starting server manually(means I am > > starting server with only default setting). > > > > Can you once test by setting autovacuum = off? The autovacuum leads > to variability in test timing. > > I've also run the regression tests with and without the patch: * w/o patch and autovacuum = on: 255 ms * w/o patch and autovacuum = off: 258 ms * w/ patch and autovacuum = on: 370 ms * w/ patch and autovacuum = off: 375 ms > > If we start server with default settings, then we will not hit vacuum > > related test cases to parallel because size of index relation is very > > small so we will not do parallel vacuum. Right. Most indexes (all?) of tables that are used in the regression tests are smaller than min_parallel_index_scan_size. And we set min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not be speeded-up much because of the relation size. Since we instead populate new table for parallel vacuum testing the regression test for vacuum would take a longer time. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Right. Most indexes (all?) of tables that are used in the regression > tests are smaller than min_parallel_index_scan_size. And we set > min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not > be speeded-up much because of the relation size. Since we instead > populate new table for parallel vacuum testing the regression test for > vacuum would take a longer time. > Fair enough and I think it is good in a way that it won't change the coverage of existing vacuum code. I have fixed all the issues reported by Mahendra and have fixed a few other cosmetic things in the attached patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi all,
I would like to share my observations on this PG feature, "Block-level parallel vacuum".
I have tested the earlier patch (i.e., v48) with the below high-level test scenarios, and they are working as expected.
- I have played around with these GUC parameters while testing
max_worker_processes, autovacuum = off, shared_buffers, max_parallel_workers, max_parallel_maintenance_workers, min_parallel_index_scan_size, vacuum_cost_limit, vacuum_cost_delay
- Tested parallel vacuum with tables and partitioned tables having various possible datatypes, and columns having various indexes (like btree, gist, etc.) on part of / the full table.
- Tested the pgbench tables' data with multiple indexes created manually, and ran a script (vacuum_test.sql) with DMLs and VACUUM for multiple clients, jobs, and durations as below.
./pgbench -c 8 -j 16 -T 900 postgres -f vacuum_test.sql
We observe the usage of parallel workers during VACUUM.
- Ran a few isolation schedule test cases (in regression) with huge data and indexes, performed DMLs -> VACUUM
- Tested with PARTITION TABLEs -> global/local indexes -> DMLs -> VACUUM
- Tested with PARTITION TABLE having different TABLESPACE in different location -> global/local indexes -> DMLs -> VACUUM
- With Changing STORAGE options for columns(as PLAIN / EXTERNAL / EXTENDED) -> DMLs -> VACUUM
- Create index with CONCURRENTLY option / Changing storage_parameter for index as below -> DMLs -> VACUUM
with(buffering=auto) / with(buffering=on) / with(buffering=off) / with(fillfactor=30);
- Tested with creating Simple and Partitioned tables -> DMLs -> pg_dump/pg_restore/pg_upgrade -> VACUUM
Verified the data after restore / upgrade / VACUUM.
- Indexes on UUID-OSSP data -> DMLs -> pg_upgrade -> VACUUM
- Verified with various test scenarios for better performance of parallel VACUUM as compared to Non-parallel VACUUM.
Time taken by VACUUM on PG HEAD+PATCH(with PARALLEL) < Time taken by VACUUM on PG HEAD (without PARALLEL)
Machine configuration: (16 VCPUs / RAM: 16GB / Disk size: 640GB)
PG HEAD: VACUUM tab1;
Time: 38915.384 ms (00:38.915)
Time: 48389.006 ms (00:48.389)
Time: 41324.223 ms (00:41.324)
Time: 37640.874 ms (00:37.641) --median
Time: 36897.325 ms (00:36.897)
Time: 36351.022 ms (00:36.351)
Time: 36198.890 ms (00:36.199)
PG HEAD + v48 Patch: VACUUM tab1;
Time: 37051.589 ms (00:37.052)
Time: 33647.459 ms (00:33.647) --median
Time: 31580.894 ms (00:31.581)
Time: 34442.046 ms (00:34.442)
Time: 31335.960 ms (00:31.336)
Time: 34441.245 ms (00:34.441)
Time: 31159.639 ms (00:31.160)
With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 16, 2020 at 5:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Right. Most indexes (all?) of tables that are used in the regression > > tests are smaller than min_parallel_index_scan_size. And we set > > min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not > > be speeded-up much because of the relation size. Since we instead > > populate new table for parallel vacuum testing the regression test for > > vacuum would take a longer time. > > > > Fair enough and I think it is good in a way that it won't change the > coverage of existing vacuum code. I have fixed all the issues > reported by Mahendra and have fixed a few other cosmetic things in the > attached patch. > I have few small comments. 1. logical streaming for large in-progress transactions+ + /* Can't perform vacuum in parallel */ + if (parallel_workers <= 0) + { + pfree(can_parallel_vacuum); + return lps; + } why are we checking parallel_workers <= 0, Function compute_parallel_vacuum_workers only returns 0 or greater than 0 so isn't it better to just check if (parallel_workers == 0) ? 2. +/* + * Macro to check if we are in a parallel vacuum. If true, we are in the + * parallel mode and the DSM segment is initialized. + */ +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL) (LVParallelState *) (lps) -> this typecast is not required, just (lps) != NULL should be enough. 3. + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes))); + prepare_index_statistics(shared, can_parallel_vacuum, nindexes); + pg_atomic_init_u32(&(shared->idx), 0); + pg_atomic_init_u32(&(shared->cost_balance), 0); + pg_atomic_init_u32(&(shared->active_nworkers), 0); I think it will look cleaner if we can initialize in the order they are declared in structure. 4. + VacuumSharedCostBalance = &(lps->lvshared->cost_balance); + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers); + + /* + * Set up shared cost balance and the number of active workers for + * vacuum delay. + */ + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance); + pg_atomic_write_u32(VacuumActiveNWorkers, 0); + + /* + * The number of workers can vary between bulkdelete and cleanup + * phase. + */ + ReinitializeParallelWorkers(lps->pcxt, nworkers); + + LaunchParallelWorkers(lps->pcxt); + + if (lps->pcxt->nworkers_launched > 0) + { + /* + * Reset the local cost values for leader backend as we have + * already accumulated the remaining balance of heap. + */ + VacuumCostBalance = 0; + VacuumCostBalanceLocal = 0; + } + else + { + /* + * Disable shared cost balance if we are not able to launch + * workers. + */ + VacuumSharedCostBalance = NULL; + VacuumActiveNWorkers = NULL; + } + I don't like the idea of first initializing the VacuumSharedCostBalance with lps->lvshared->cost_balance and then uninitialize if nworkers_launched is 0. I am not sure why do we need to initialize VacuumSharedCostBalance here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);? I think we can initialize it only if nworkers_launched > 0 then we can get rid of the else branch completely. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Jan 16, 2020 at 5:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Right. Most indexes (all?) of tables that are used in the regression > > > tests are smaller than min_parallel_index_scan_size. And we set > > > min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not > > > be speeded-up much because of the relation size. Since we instead > > > populate new table for parallel vacuum testing the regression test for > > > vacuum would take a longer time. > > > > > > > Fair enough and I think it is good in a way that it won't change the > > coverage of existing vacuum code. I have fixed all the issues > > reported by Mahendra and have fixed a few other cosmetic things in the > > attached patch. > > > I have few small comments. > > 1. > logical streaming for large in-progress transactions+ > + /* Can't perform vacuum in parallel */ > + if (parallel_workers <= 0) > + { > + pfree(can_parallel_vacuum); > + return lps; > + } > > why are we checking parallel_workers <= 0, Function > compute_parallel_vacuum_workers only returns 0 or greater than 0 > so isn't it better to just check if (parallel_workers == 0) ? > > 2. > +/* > + * Macro to check if we are in a parallel vacuum. If true, we are in the > + * parallel mode and the DSM segment is initialized. > + */ > +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL) > > (LVParallelState *) (lps) -> this typecast is not required, just (lps) > != NULL should be enough. > > 3. > > + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes))); > + prepare_index_statistics(shared, can_parallel_vacuum, nindexes); > + pg_atomic_init_u32(&(shared->idx), 0); > + pg_atomic_init_u32(&(shared->cost_balance), 0); > + pg_atomic_init_u32(&(shared->active_nworkers), 0); > > I think it will look cleaner if we can initialize in the order they > are declared in structure. > > 4. > + VacuumSharedCostBalance = &(lps->lvshared->cost_balance); > + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers); > + > + /* > + * Set up shared cost balance and the number of active workers for > + * vacuum delay. > + */ > + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance); > + pg_atomic_write_u32(VacuumActiveNWorkers, 0); > + > + /* > + * The number of workers can vary between bulkdelete and cleanup > + * phase. > + */ > + ReinitializeParallelWorkers(lps->pcxt, nworkers); > + > + LaunchParallelWorkers(lps->pcxt); > + > + if (lps->pcxt->nworkers_launched > 0) > + { > + /* > + * Reset the local cost values for leader backend as we have > + * already accumulated the remaining balance of heap. > + */ > + VacuumCostBalance = 0; > + VacuumCostBalanceLocal = 0; > + } > + else > + { > + /* > + * Disable shared cost balance if we are not able to launch > + * workers. > + */ > + VacuumSharedCostBalance = NULL; > + VacuumActiveNWorkers = NULL; > + } > + > > I don't like the idea of first initializing the > VacuumSharedCostBalance with lps->lvshared->cost_balance and then > uninitialize if nworkers_launched is 0. > I am not sure why do we need to initialize VacuumSharedCostBalance > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance, > VacuumCostBalance);? > I think we can initialize it only if nworkers_launched > 0 then we can > get rid of the else branch completely. 
I missed one of my comment + /* Carry the shared balance value to heap scan */ + if (VacuumSharedCostBalance) + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); + + if (nworkers > 0) + { + /* Disable shared cost balance */ + VacuumSharedCostBalance = NULL; + VacuumActiveNWorkers = NULL; + } Doesn't make sense to keep them as two conditions, we can combine them as below /* If shared costing is enable, carry the shared balance value to heap scan and disable the shared costing */ if (VacuumSharedCostBalance) { VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); VacuumSharedCostBalance = NULL; VacuumActiveNWorkers = NULL; } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have few small comments. > > 1. > logical streaming for large in-progress transactions+ > + /* Can't perform vacuum in parallel */ > + if (parallel_workers <= 0) > + { > + pfree(can_parallel_vacuum); > + return lps; > + } > > why are we checking parallel_workers <= 0, Function > compute_parallel_vacuum_workers only returns 0 or greater than 0 > so isn't it better to just check if (parallel_workers == 0) ? > Why to have such an assumption about compute_parallel_vacuum_workers()? The function compute_parallel_vacuum_workers() returns int, so such a check (<= 0) seems reasonable to me. > 2. > +/* > + * Macro to check if we are in a parallel vacuum. If true, we are in the > + * parallel mode and the DSM segment is initialized. > + */ > +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL) > > (LVParallelState *) (lps) -> this typecast is not required, just (lps) > != NULL should be enough. > I think the better idea would be to just replace it PointerIsValid like below. I see similar usage in other places. #define ParallelVacuumIsActive(lps) PointerIsValid(lps) > 3. > > + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes))); > + prepare_index_statistics(shared, can_parallel_vacuum, nindexes); > + pg_atomic_init_u32(&(shared->idx), 0); > + pg_atomic_init_u32(&(shared->cost_balance), 0); > + pg_atomic_init_u32(&(shared->active_nworkers), 0); > > I think it will look cleaner if we can initialize in the order they > are declared in structure. > Okay. > 4. > + VacuumSharedCostBalance = &(lps->lvshared->cost_balance); > + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers); > + > + /* > + * Set up shared cost balance and the number of active workers for > + * vacuum delay. > + */ > + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance); > + pg_atomic_write_u32(VacuumActiveNWorkers, 0); > + > + /* > + * The number of workers can vary between bulkdelete and cleanup > + * phase. > + */ > + ReinitializeParallelWorkers(lps->pcxt, nworkers); > + > + LaunchParallelWorkers(lps->pcxt); > + > + if (lps->pcxt->nworkers_launched > 0) > + { > + /* > + * Reset the local cost values for leader backend as we have > + * already accumulated the remaining balance of heap. > + */ > + VacuumCostBalance = 0; > + VacuumCostBalanceLocal = 0; > + } > + else > + { > + /* > + * Disable shared cost balance if we are not able to launch > + * workers. > + */ > + VacuumSharedCostBalance = NULL; > + VacuumActiveNWorkers = NULL; > + } > + > > I don't like the idea of first initializing the > VacuumSharedCostBalance with lps->lvshared->cost_balance and then > uninitialize if nworkers_launched is 0. > I am not sure why do we need to initialize VacuumSharedCostBalance > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance, > VacuumCostBalance);? > I think we can initialize it only if nworkers_launched > 0 then we can > get rid of the else branch completely. > No, we can't initialize after nworkers_launched > 0 because by that time some workers would have already tried to access the shared cost balance. So, it needs to be done before launching the workers as is done in code. We can probably add a comment. 
> > + /* Carry the shared balance value to heap scan */ > + if (VacuumSharedCostBalance) > + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); > + > + if (nworkers > 0) > + { > + /* Disable shared cost balance */ > + VacuumSharedCostBalance = NULL; > + VacuumActiveNWorkers = NULL; > + } > > Doesn't make sense to keep them as two conditions, we can combine them as below > > /* If shared costing is enable, carry the shared balance value to heap > scan and disable the shared costing */ > if (VacuumSharedCostBalance) > { > VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); > VacuumSharedCostBalance = NULL; > VacuumActiveNWorkers = NULL; > } > makes sense to me, will change. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have few small comments. > > > > 1. > > logical streaming for large in-progress transactions+ > > + /* Can't perform vacuum in parallel */ > > + if (parallel_workers <= 0) > > + { > > + pfree(can_parallel_vacuum); > > + return lps; > > + } > > > > why are we checking parallel_workers <= 0, Function > > compute_parallel_vacuum_workers only returns 0 or greater than 0 > > so isn't it better to just check if (parallel_workers == 0) ? > > > > Why to have such an assumption about > compute_parallel_vacuum_workers()? The function > compute_parallel_vacuum_workers() returns int, so such a check > (<= 0) seems reasonable to me. Okay so I should probably change my statement to why compute_parallel_vacuum_workers is returning "int" instead of uint? I mean when this function is designed to return 0 or more worker why to make it return int and then handle extra values on caller. Am I missing something, can it really return negative in some cases? I find the below code in "compute_parallel_vacuum_workers" a bit confusing +static int +compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested, + bool *can_parallel_vacuum) +{ ...... + /* The leader process takes one index */ + nindexes_parallel--; --> nindexes_parallel can become -1 + + /* No index supports parallel vacuum */ + if (nindexes_parallel == 0) . -> Now if it is 0 then return 0 but if its -1 then continue. seems strange no? I think here itself we can handle if (nindexes_parallel <= 0), that will make code cleaner. + return 0; + + /* Compute the parallel degree */ + parallel_workers = (nrequested > 0) ? + Min(nrequested, nindexes_parallel) : nindexes_parallel; > > > 2. > > +/* > > + * Macro to check if we are in a parallel vacuum. If true, we are in the > > + * parallel mode and the DSM segment is initialized. > > + */ > > +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL) > > > > (LVParallelState *) (lps) -> this typecast is not required, just (lps) > > != NULL should be enough. > > > > I think the better idea would be to just replace it PointerIsValid > like below. I see similar usage in other places. > #define ParallelVacuumIsActive(lps) PointerIsValid(lps) Make sense to me. > > > 3. > > > > + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes))); > > + prepare_index_statistics(shared, can_parallel_vacuum, nindexes); > > + pg_atomic_init_u32(&(shared->idx), 0); > > + pg_atomic_init_u32(&(shared->cost_balance), 0); > > + pg_atomic_init_u32(&(shared->active_nworkers), 0); > > > > I think it will look cleaner if we can initialize in the order they > > are declared in structure. > > > > Okay. > > > 4. > > + VacuumSharedCostBalance = &(lps->lvshared->cost_balance); > > + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers); > > + > > + /* > > + * Set up shared cost balance and the number of active workers for > > + * vacuum delay. > > + */ > > + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance); > > + pg_atomic_write_u32(VacuumActiveNWorkers, 0); > > + > > + /* > > + * The number of workers can vary between bulkdelete and cleanup > > + * phase. 
> > + */ > > + ReinitializeParallelWorkers(lps->pcxt, nworkers); > > + > > + LaunchParallelWorkers(lps->pcxt); > > + > > + if (lps->pcxt->nworkers_launched > 0) > > + { > > + /* > > + * Reset the local cost values for leader backend as we have > > + * already accumulated the remaining balance of heap. > > + */ > > + VacuumCostBalance = 0; > > + VacuumCostBalanceLocal = 0; > > + } > > + else > > + { > > + /* > > + * Disable shared cost balance if we are not able to launch > > + * workers. > > + */ > > + VacuumSharedCostBalance = NULL; > > + VacuumActiveNWorkers = NULL; > > + } > > + > > > > I don't like the idea of first initializing the > > VacuumSharedCostBalance with lps->lvshared->cost_balance and then > > uninitialize if nworkers_launched is 0. > > I am not sure why do we need to initialize VacuumSharedCostBalance > > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance, > > VacuumCostBalance);? > > I think we can initialize it only if nworkers_launched > 0 then we can > > get rid of the else branch completely. > > > > No, we can't initialize after nworkers_launched > 0 because by that > time some workers would have already tried to access the shared cost > balance. So, it needs to be done before launching the workers as is > done in code. We can probably add a comment. I don't think so, VacuumSharedCostBalance is a process local which is just pointing to the shared memory variable right? and each process has to point it to the shared memory and that we are already doing in parallel_vacuum_main. So we can initialize it after worker is launched. Basically code will look like below pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance); pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0); .. ReinitializeParallelWorkers(lps->pcxt, nworkers); LaunchParallelWorkers(lps->pcxt); if (lps->pcxt->nworkers_launched > 0) { .. VacuumCostBalance = 0; VacuumCostBalanceLocal = 0; VacuumSharedCostBalance = &(lps->lvshared->cost_balance); VacuumActiveNWorkers = &(lps->lvshared->active_nworkers); } -- remove the else part completely.. > > > > > + /* Carry the shared balance value to heap scan */ > > + if (VacuumSharedCostBalance) > > + VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); > > + > > + if (nworkers > 0) > > + { > > + /* Disable shared cost balance */ > > + VacuumSharedCostBalance = NULL; > > + VacuumActiveNWorkers = NULL; > > + } > > > > Doesn't make sense to keep them as two conditions, we can combine them as below > > > > /* If shared costing is enable, carry the shared balance value to heap > > scan and disable the shared costing */ > > if (VacuumSharedCostBalance) > > { > > VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance); > > VacuumSharedCostBalance = NULL; > > VacuumActiveNWorkers = NULL; > > } > > > > makes sense to me, will change. ok > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
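[Editorial note: for readability, here is the restructuring Dilip sketches above laid out as it might sit in the leader's setup path. This is only a rendering of the proposal, using the variable and function names quoted in this thread, not the committed code; it assumes, as stated above, that each worker attaches its own VacuumSharedCostBalance/VacuumActiveNWorkers pointers in parallel_vacuum_main().]

    /*
     * Sketch of the proposed ordering: prime the shared atomics, launch the
     * workers, and point the leader's local variables at the shared area
     * only once we know at least one worker was actually launched.
     */
    pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
    pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);

    /* The number of workers can vary between bulkdelete and cleanup phase */
    ReinitializeParallelWorkers(lps->pcxt, nworkers);
    LaunchParallelWorkers(lps->pcxt);

    if (lps->pcxt->nworkers_launched > 0)
    {
        /*
         * The leader's remaining balance was handed over to the shared
         * counter above, so reset the local balances and switch to the
         * shared ones.
         */
        VacuumCostBalance = 0;
        VacuumCostBalanceLocal = 0;
        VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
        VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
    }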
On Fri, Jan 17, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I have few small comments. > > > > > > 1. > > > logical streaming for large in-progress transactions+ > > > + /* Can't perform vacuum in parallel */ > > > + if (parallel_workers <= 0) > > > + { > > > + pfree(can_parallel_vacuum); > > > + return lps; > > > + } > > > > > > why are we checking parallel_workers <= 0, Function > > > compute_parallel_vacuum_workers only returns 0 or greater than 0 > > > so isn't it better to just check if (parallel_workers == 0) ? > > > > > > > Why to have such an assumption about > > compute_parallel_vacuum_workers()? The function > > compute_parallel_vacuum_workers() returns int, so such a check > > (<= 0) seems reasonable to me. > > Okay so I should probably change my statement to why > compute_parallel_vacuum_workers is returning "int" instead of uint? > Hmm, I think the number of workers at most places is int, so it is better to return int here which will keep it consistent with how we do at other places. See, the similar usage in compute_parallel_worker. I > mean when this function is designed to return 0 or more worker why to > make it return int and then handle extra values on caller. Am I > missing something, can it really return negative in some cases? > > I find the below code in "compute_parallel_vacuum_workers" a bit confusing > > +static int > +compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested, > + bool *can_parallel_vacuum) > +{ > ...... > + /* The leader process takes one index */ > + nindexes_parallel--; --> nindexes_parallel can become -1 > + > + /* No index supports parallel vacuum */ > + if (nindexes_parallel == 0) . -> Now if it is 0 then return 0 but > if its -1 then continue. seems strange no? I think here itself we can > handle if (nindexes_parallel <= 0), that will make code cleaner. > + return 0; > + I think this got recently introduce by one of my changes based on the comment by Mahendra, we can adjust this check. > > > > > > I don't like the idea of first initializing the > > > VacuumSharedCostBalance with lps->lvshared->cost_balance and then > > > uninitialize if nworkers_launched is 0. > > > I am not sure why do we need to initialize VacuumSharedCostBalance > > > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance, > > > VacuumCostBalance);? > > > I think we can initialize it only if nworkers_launched > 0 then we can > > > get rid of the else branch completely. > > > > > > > No, we can't initialize after nworkers_launched > 0 because by that > > time some workers would have already tried to access the shared cost > > balance. So, it needs to be done before launching the workers as is > > done in code. We can probably add a comment. > I don't think so, VacuumSharedCostBalance is a process local which is > just pointing to the shared memory variable right? > > and each process has to point it to the shared memory and that we are > already doing in parallel_vacuum_main. So we can initialize it after > worker is launched. > Basically code will look like below > > pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance); > pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0); > oh, I thought you were telling to initialize the shared memory itself after launching the workers. 
However, you are asking to change the usage of the local variable, I think we can do that. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
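[Editorial note: to make the adjustment mentioned above concrete, the guard in compute_parallel_vacuum_workers() could simply treat zero and negative values uniformly once the leader has claimed its index. A sketch only, built from the code quoted in this thread; the committed wording may differ.]

    /* The leader process takes one index itself */
    nindexes_parallel--;

    /* No index left that can benefit from a parallel worker */
    if (nindexes_parallel <= 0)
        return 0;

    /* Compute the parallel degree */
    parallel_workers = (nrequested > 0) ?
        Min(nrequested, nindexes_parallel) : nindexes_parallel;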
On Fri, Jan 17, 2020 at 11:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 17, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Jan 17, 2020 at 10:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I have few small comments. > > > > > > > > 1. > > > > logical streaming for large in-progress transactions+ > > > > + /* Can't perform vacuum in parallel */ > > > > + if (parallel_workers <= 0) > > > > + { > > > > + pfree(can_parallel_vacuum); > > > > + return lps; > > > > + } > > > > > > > > why are we checking parallel_workers <= 0, Function > > > > compute_parallel_vacuum_workers only returns 0 or greater than 0 > > > > so isn't it better to just check if (parallel_workers == 0) ? > > > > > > > > > > Why to have such an assumption about > > > compute_parallel_vacuum_workers()? The function > > > compute_parallel_vacuum_workers() returns int, so such a check > > > (<= 0) seems reasonable to me. > > > > Okay so I should probably change my statement to why > > compute_parallel_vacuum_workers is returning "int" instead of uint? > > > > Hmm, I think the number of workers at most places is int, so it is > better to return int here which will keep it consistent with how we do > at other places. See, the similar usage in compute_parallel_worker. Okay, I see. > > I > > mean when this function is designed to return 0 or more worker why to > > make it return int and then handle extra values on caller. Am I > > missing something, can it really return negative in some cases? > > > > I find the below code in "compute_parallel_vacuum_workers" a bit confusing > > > > +static int > > +compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested, > > + bool *can_parallel_vacuum) > > +{ > > ...... > > + /* The leader process takes one index */ > > + nindexes_parallel--; --> nindexes_parallel can become -1 > > + > > + /* No index supports parallel vacuum */ > > + if (nindexes_parallel == 0) . -> Now if it is 0 then return 0 but > > if its -1 then continue. seems strange no? I think here itself we can > > handle if (nindexes_parallel <= 0), that will make code cleaner. > > + return 0; > > + > > I think this got recently introduce by one of my changes based on the > comment by Mahendra, we can adjust this check. Ok > > > > > > > > > I don't like the idea of first initializing the > > > > VacuumSharedCostBalance with lps->lvshared->cost_balance and then > > > > uninitialize if nworkers_launched is 0. > > > > I am not sure why do we need to initialize VacuumSharedCostBalance > > > > here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance, > > > > VacuumCostBalance);? > > > > I think we can initialize it only if nworkers_launched > 0 then we can > > > > get rid of the else branch completely. > > > > > > > > > > No, we can't initialize after nworkers_launched > 0 because by that > > > time some workers would have already tried to access the shared cost > > > balance. So, it needs to be done before launching the workers as is > > > done in code. We can probably add a comment. > > I don't think so, VacuumSharedCostBalance is a process local which is > > just pointing to the shared memory variable right? > > > > and each process has to point it to the shared memory and that we are > > already doing in parallel_vacuum_main. So we can initialize it after > > worker is launched. 
> > Basically code will look like below > > > > pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance); > > pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0); > > > > oh, I thought you were telling to initialize the shared memory itself > after launching the workers. However, you are asking to change the > usage of the local variable, I think we can do that. Okay. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 17, 2020 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have performed cost delay testing on the latest patch (I have used the
same scripts as attached in [1] and [2]).
vacuum_cost_delay = 10
vacuum_cost_limit = 2000

Observation: As we have concluded earlier, the delay time is in sync with
the I/O performed by the worker, and the total delay (heap + index) is
almost the same as the non-parallel operation.

test1:[1]
Vacuum non-parallel
WARNING: VacuumCostTotalDelay=11332.320000

Vacuum 2 workers
WARNING: worker 0 delay=171.085000 total io=34288 hit=22208 miss=0 dirty=604
WARNING: worker 1 delay=87.790000 total io=17910 hit=17890 miss=0 dirty=1
WARNING: worker 2 delay=88.620000 total io=17910 hit=17890 miss=0 dirty=1
WARNING: VacuumCostTotalDelay=11505.650000

Vacuum 4 workers
WARNING: worker 0 delay=87.750000 total io=17910 hit=17890 miss=0 dirty=1
WARNING: worker 1 delay=89.155000 total io=17910 hit=17890 miss=0 dirty=1
WARNING: worker 2 delay=87.080000 total io=17910 hit=17890 miss=0 dirty=1
WARNING: worker 3 delay=78.745000 total io=16378 hit=4318 miss=0 dirty=603
WARNING: VacuumCostTotalDelay=11590.680000

test2:[2]
Vacuum non-parallel
WARNING: VacuumCostTotalDelay=22835.970000

Vacuum 2 workers
WARNING: worker 0 delay=345.550000 total io=69338 hit=45338 miss=0 dirty=1200
WARNING: worker 1 delay=177.150000 total io=35807 hit=35787 miss=0 dirty=1
WARNING: worker 2 delay=178.105000 total io=35807 hit=35787 miss=0 dirty=1
WARNING: VacuumCostTotalDelay=23191.405000

Vacuum 4 workers
WARNING: worker 0 delay=177.265000 total io=35807 hit=35787 miss=0 dirty=1
WARNING: worker 1 delay=177.175000 total io=35807 hit=35787 miss=0 dirty=1
WARNING: worker 2 delay=177.385000 total io=35807 hit=35787 miss=0 dirty=1
WARNING: worker 3 delay=166.515000 total io=33531 hit=9551 miss=0 dirty=1199
WARNING: VacuumCostTotalDelay=23357.115000

[1] https://www.postgresql.org/message-id/CAFiTN-tFLN%3Dvdu5Ra-23E9_7Z1JXkk5MkRY3Bkj2zAoWK7fULA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFiTN-tC%3DNcvcEd%2B5J62fR8-D8x7EHuVi2xhS-0DMf1bnJs4hw%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
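[Editorial note: for readers following these numbers, the shared-costing logic being exercised works roughly as below. This is a simplified sketch built from the behaviour described in this thread, not the exact patch code; the 0.5 fairness factor and the bookkeeping details are assumptions and may differ in the actual patch. Each participant publishes its local cost into the shared balance and sleeps in proportion to its own contribution, which is why each worker's delay tracks its own I/O while the total stays close to the serial case.]

    /*
     * Simplified sketch (illustrative only) of how a parallel vacuum
     * participant computes its sleep time from the shared cost balance.
     */
    static double
    compute_parallel_delay_sketch(void)
    {
        double      msec = 0;
        uint32      shared_balance;
        int         nworkers;

        Assert(VacuumSharedCostBalance != NULL);

        nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);
        Assert(nworkers >= 1);      /* counts the current process as well */

        /* Publish the cost accumulated locally since the last check */
        shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                                                 VacuumCostBalance);
        VacuumCostBalanceLocal += VacuumCostBalance;
        VacuumCostBalance = 0;

        /*
         * Sleep only when the shared balance has crossed the limit and this
         * process has contributed a fair share of it; the sleep is sized by
         * the local contribution, which keeps each process's delay
         * proportional to the I/O it actually performed.
         */
        if (shared_balance >= VacuumCostLimit &&
            VacuumCostBalanceLocal > 0.5 * ((double) VacuumCostLimit / nworkers))
        {
            msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
            pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
                                    VacuumCostBalanceLocal);
            VacuumCostBalanceLocal = 0;
        }

        return msec;
    }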
On Fri, Jan 17, 2020 at 12:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jan 17, 2020 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > I have performed cost delay testing on the latest test(I have used > same script as attahced in [1] and [2]. > vacuum_cost_delay = 10 > vacuum_cost_limit = 2000 > > Observation: As we have concluded earlier, the delay time is in sync > with the I/O performed by the worker > and the total delay (heap + index) is almost the same as the > non-parallel operation. > Thanks for doing this test again. In the attached patch, I have addressed all the comments and modified a few comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Fri, 17 Jan 2020 at 14:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 17, 2020 at 12:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Jan 17, 2020 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I have performed cost delay testing on the latest test(I have used
> > same script as attahced in [1] and [2].
> > vacuum_cost_delay = 10
> > vacuum_cost_limit = 2000
> >
> > Observation: As we have concluded earlier, the delay time is in sync
> > with the I/O performed by the worker
> > and the total delay (heap + index) is almost the same as the
> > non-parallel operation.
> >
>
> Thanks for doing this test again. In the attached patch, I have
> addressed all the comments and modified a few comments.
>
Hi,
Below are some review comments for v50 patch.
1.
+LVShared
+LVSharedIndStats
+LVParallelState
LWLock
I think, LVParallelState should come before LVSharedIndStats.
2.
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * then the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ */
This comment is confusing. I think "then" should be replaced with "than".
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 17, 2020 at 1:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > Thanks for doing this test again. In the attached patch, I have > addressed all the comments and modified a few comments. I am in favor of the general idea of parallel VACUUM that parallelizes the processing of each index (I haven't looked at the patch, though). I observed something during a recent benchmark of the deduplication patch that seems like it might be relevant to parallel VACUUM. This happened during a recreation of the original WARM benchmark, which is described here: https://www.postgresql.org/message-id/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com (There is an extra pgbench_accounts index on abalance, plus 4 indexes on large text columns with filler MD5 hashes, all of which are random.) On the master branch, I can clearly observe that the "filler" MD5 indexes are bloated to a degree that is affected by the order of their original creation/pg_class OID order. These are all indexes that become bloated purely due to "version churn" -- or what I like to call "unnecessary" page splits. The keys used in each pgbench_accounts logical row never change, except in the case of the extra abalance index (the idea is to prevent all HOT updates without ever updating most indexed columns). I noticed that pgb_a_filler1 is a bit less bloated than pgb_a_filler2, which is a little less bloated than pgb_a_filler3, which is a little less bloated than pgb_a_filler4. Even after 4 hours, and even though the "shape" of each index is identical. This demonstrates an important general principle about vacuuming indexes: timeliness can matter a lot. In general, a big benefit of the deduplication patch is that it "buys time" for VACUUM to run before "unnecessary" page splits can occur -- that is why the deduplication patch prevents *all* page splits in these "filler" indexes, whereas on the master branch the filler indexes are about 2x larger (the exact amount varies based on VACUUM processing order, at least earlier on). For tables with several indexes, giving each index its own VACUUM worker process will prevent "unnecessary" page splits caused by version churn, simply because VACUUM will start to clean each index sooner than it would compared to serial processing (except for the "lucky" first index). There is no "lucky" first index that gets preferential treatment -- presumably VACUUM will start processing each index at the same time with this patch, making each index equally "lucky". I think that there may even be a *complementary* effect with parallel VACUUM, though I haven't tested that theory. Deduplication "buys time" for VACUUM to run, while at the same time VACUUM takes less time to show up and prevent "unnecessary" page splits. My guess is that these two seemingly unrelated patches may actually address this "unnecessary page split" problem from two completely different angles, with an overall effect that is greater than the sum of its parts. While the difference in size of each filler index on the master branch wasn't that significant on its own, it's still interesting. It's probably quite workload dependent. -- Peter Geoghegan
On Sun, Jan 19, 2020 at 2:15 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Fri, Jan 17, 2020 at 1:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Thanks for doing this test again. In the attached patch, I have > > addressed all the comments and modified a few comments. > > I am in favor of the general idea of parallel VACUUM that parallelizes > the processing of each index (I haven't looked at the patch, though). > I observed something during a recent benchmark of the deduplication > patch that seems like it might be relevant to parallel VACUUM. This > happened during a recreation of the original WARM benchmark, which is > described here: > > https://www.postgresql.org/message-id/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com > > (There is an extra pgbench_accounts index on abalance, plus 4 indexes > on large text columns with filler MD5 hashes, all of which are > random.) > > On the master branch, I can clearly observe that the "filler" MD5 > indexes are bloated to a degree that is affected by the order of their > original creation/pg_class OID order. These are all indexes that > become bloated purely due to "version churn" -- or what I like to call > "unnecessary" page splits. The keys used in each pgbench_accounts > logical row never change, except in the case of the extra abalance > index (the idea is to prevent all HOT updates without ever updating > most indexed columns). I noticed that pgb_a_filler1 is a bit less > bloated than pgb_a_filler2, which is a little less bloated than > pgb_a_filler3, which is a little less bloated than pgb_a_filler4. Even > after 4 hours, and even though the "shape" of each index is identical. > This demonstrates an important general principle about vacuuming > indexes: timeliness can matter a lot. > > In general, a big benefit of the deduplication patch is that it "buys > time" for VACUUM to run before "unnecessary" page splits can occur -- > that is why the deduplication patch prevents *all* page splits in > these "filler" indexes, whereas on the master branch the filler > indexes are about 2x larger (the exact amount varies based on VACUUM > processing order, at least earlier on). > > For tables with several indexes, giving each index its own VACUUM > worker process will prevent "unnecessary" page splits caused by > version churn, simply because VACUUM will start to clean each index > sooner than it would compared to serial processing (except for the > "lucky" first index). There is no "lucky" first index that gets > preferential treatment -- presumably VACUUM will start processing each > index at the same time with this patch, making each index equally > "lucky". > > I think that there may even be a *complementary* effect with parallel > VACUUM, though I haven't tested that theory. Deduplication "buys time" > for VACUUM to run, while at the same time VACUUM takes less time to > show up and prevent "unnecessary" page splits. My guess is that these > two seemingly unrelated patches may actually address this "unnecessary > page split" problem from two completely different angles, with an > overall effect that is greater than the sum of its parts. > Good analysis and I agree that the parallel vacuum patch can help in such cases. However, as of now, it only works via Vacuum command, so some user intervention is required to realize the benefit. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 17, 2020 at 4:35 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Below are some review comments for v50 patch. > > 1. > +LVShared > +LVSharedIndStats > +LVParallelState > LWLock > > I think, LVParallelState should come before LVSharedIndStats. > > 2. > + /* > + * It is possible that parallel context is initialized with fewer workers > + * then the number of indexes that need a separate worker in the current > + * phase, so we need to consider it. See compute_parallel_vacuum_workers. > + */ > > This comment is confusing me. I think, "then" should be replaced with "than". > Pushed, after fixing these two comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 20 Jan 2020 at 12:39, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 17, 2020 at 4:35 PM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > Below are some review comments for v50 patch. > > > > 1. > > +LVShared > > +LVSharedIndStats > > +LVParallelState > > LWLock > > > > I think, LVParallelState should come before LVSharedIndStats. > > > > 2. > > + /* > > + * It is possible that parallel context is initialized with fewer workers > > + * then the number of indexes that need a separate worker in the current > > + * phase, so we need to consider it. See compute_parallel_vacuum_workers. > > + */ > > > > This comment is confusing me. I think, "then" should be replaced with "than". > > > > Pushed, after fixing these two comments. Thank you for committing! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> Pushed, after fixing these two comments.

When attempting to vacuum a large table I just got:

postgres=# vacuum FREEZE ;
ERROR: invalid memory alloc request size 1073741828

#0 palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
#1 0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8, relblocks=24686152)
   at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
#2 lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>, params=0x7ffdf8c00290, onerel=<optimized out>)
   at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
#3 heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>)
   at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
#4 0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
   at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
#5 vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290)
   at /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882

Looks to me like the calculation moved into compute_max_dead_tuples()
continues to use an allocation ceiling

    maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));

but the actual allocation now is

    #define SizeOfLVDeadTuples(cnt) \
        add_size((offsetof(LVDeadTuples, itemptrs)), \
                 mul_size(sizeof(ItemPointerData), cnt))

i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
account.

Regards,

Andres
On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote: > > Pushed, after fixing these two comments. > > When attempting to vacuum a large table I just got: > > postgres=# vacuum FREEZE ; > ERROR: invalid memory alloc request size 1073741828 > > #0 palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959 > #1 0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8, relblocks=24686152) > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741 > #2 lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>, params=0x7ffdf8c00290,onerel=<optimized out>) > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786 > #3 heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>) > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472 > #4 0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248) > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450 > #5 vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882 > > Looks to me that the calculation moved into compute_max_dead_tuples() > continues to use use an allocation ceiling > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > but the actual allocation now is > > #define SizeOfLVDeadTuples(cnt) \ > add_size((offsetof(LVDeadTuples, itemptrs)), \ > mul_size(sizeof(ItemPointerData), cnt)) > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into > account. > Right, I think we need to take into account in both the places in compute_max_dead_tuples(): maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); .. maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
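[Editorial note: in other words, both bounds in compute_max_dead_tuples() need to be expressed in terms of what fits after the LVDeadTuples header, so that the later allocation of SizeOfLVDeadTuples(maxtuples) can never exceed MaxAllocSize. A sketch of the adjustment based on the two places named above; the committed fix may be shaped differently.]

    /* Sketch only: account for the LVDeadTuples header in both clamps */
    maxtuples = ((vac_work_mem * 1024L) - offsetof(LVDeadTuples, itemptrs)) /
        sizeof(ItemPointerData);
    ...
    maxtuples = Min(maxtuples,
                    (MaxAllocSize - offsetof(LVDeadTuples, itemptrs)) /
                    sizeof(ItemPointerData));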
On Tue, 21 Jan 2020 at 15:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote: > > > Pushed, after fixing these two comments. > > > > When attempting to vacuum a large table I just got: > > > > postgres=# vacuum FREEZE ; > > ERROR: invalid memory alloc request size 1073741828 > > > > #0 palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959 > > #1 0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8, relblocks=24686152) > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741 > > #2 lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>, params=0x7ffdf8c00290,onerel=<optimized out>) > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786 > > #3 heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>) > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472 > > #4 0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248) > > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450 > > #5 vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882 > > > > Looks to me that the calculation moved into compute_max_dead_tuples() > > continues to use use an allocation ceiling > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > > but the actual allocation now is > > > > #define SizeOfLVDeadTuples(cnt) \ > > add_size((offsetof(LVDeadTuples, itemptrs)), \ > > mul_size(sizeof(ItemPointerData), cnt)) > > > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into > > account. > > > > Right, I think we need to take into account in both the places in > compute_max_dead_tuples(): > > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); > .. > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > > Agreed. Attached patch should fix this issue. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Jan 21, 2020 at 12:11 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 21 Jan 2020 at 15:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > Hi, > > > > > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote: > > > > Pushed, after fixing these two comments. > > > > > > When attempting to vacuum a large table I just got: > > > > > > postgres=# vacuum FREEZE ; > > > ERROR: invalid memory alloc request size 1073741828 > > > > > > #0 palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959 > > > #1 0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8, relblocks=24686152) > > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741 > > > #2 lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>, params=0x7ffdf8c00290,onerel=<optimized out>) > > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786 > > > #3 heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>) > > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472 > > > #4 0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248) > > > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450 > > > #5 vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882 > > > > > > Looks to me that the calculation moved into compute_max_dead_tuples() > > > continues to use use an allocation ceiling > > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > > > but the actual allocation now is > > > > > > #define SizeOfLVDeadTuples(cnt) \ > > > add_size((offsetof(LVDeadTuples, itemptrs)), \ > > > mul_size(sizeof(ItemPointerData), cnt)) > > > > > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into > > > account. > > > > > > > Right, I think we need to take into account in both the places in > > compute_max_dead_tuples(): > > > > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); > > .. > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > > > > > > Agreed. Attached patch should fix this issue. > if (useindex) { - maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); + maxtuples = ((vac_work_mem * 1024L) - SizeOfLVDeadTuplesHeader) / sizeof(ItemPointerData); SizeOfLVDeadTuplesHeader is not defined by patch. Do you think it makes sense to add a comment here about the calculation? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 21, 2020 at 12:11 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 21 Jan 2020 at 15:35, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund <andres@anarazel.de> wrote: > > > > > > > > Hi, > > > > > > > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote: > > > > > Pushed, after fixing these two comments. > > > > > > > > When attempting to vacuum a large table I just got: > > > > > > > > postgres=# vacuum FREEZE ; > > > > ERROR: invalid memory alloc request size 1073741828 > > > > > > > > #0 palloc (size=1073741828) at /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959 > > > > #1 0x000056452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, vacrelstats=0x56452e5ab0e8, relblocks=24686152) > > > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741 > > > > #2 lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, vacrelstats=<optimized out>, params=0x7ffdf8c00290,onerel=<optimized out>) > > > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786 > > > > #3 heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, bstrategy=<optimized out>) > > > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472 > > > > #4 0x000056452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248) > > > > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450 > > > > #5 vacuum_rel (relid=16454, relation=<optimized out>, params=params@entry=0x7ffdf8c00290) at /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882 > > > > > > > > Looks to me that the calculation moved into compute_max_dead_tuples() > > > > continues to use use an allocation ceiling > > > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > > > > but the actual allocation now is > > > > > > > > #define SizeOfLVDeadTuples(cnt) \ > > > > add_size((offsetof(LVDeadTuples, itemptrs)), \ > > > > mul_size(sizeof(ItemPointerData), cnt)) > > > > > > > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into > > > > account. > > > > > > > > > > Right, I think we need to take into account in both the places in > > > compute_max_dead_tuples(): > > > > > > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); > > > .. > > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData)); > > > > > > > > > > Agreed. Attached patch should fix this issue. > > > > if (useindex) > { > - maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData); > + maxtuples = ((vac_work_mem * 1024L) - SizeOfLVDeadTuplesHeader) / > sizeof(ItemPointerData); > > SizeOfLVDeadTuplesHeader is not defined by patch. Do you think it > makes sense to add a comment here about the calculation? Oops, it should be SizeOfLVDeadTuples. Attached updated version. I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples struct and SizeOfDeadTuples is the size including LVDeadTuples struct and dead tuples. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Jan 21, 2020 at 12:51 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > SizeOfLVDeadTuplesHeader is not defined by patch. Do you think it > > makes sense to add a comment here about the calculation? > > Oops, it should be SizeOfLVDeadTuples. Attached updated version. > > I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples > struct and SizeOfDeadTuples is the size including LVDeadTuples struct > and dead tuples. > I have reproduced the issue by defining MaxAllocSize as 10240000 and then during debugging, skipped the check related to LAZY_ALLOC_TUPLES. After patch, it fixes the problem for me. I have slightly modified your patch to define the macros on the lines of existing macros TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP. What do you think about it? Andres, see if you get a chance to run the test again with the attached patch, otherwise, I will commit it tomorrow morning. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, Jan 21, 2020 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 21, 2020 at 12:51 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > SizeOfLVDeadTuplesHeader is not defined by patch. Do you think it > > > makes sense to add a comment here about the calculation? > > > > Oops, it should be SizeOfLVDeadTuples. Attached updated version. > > > > I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples > > struct and SizeOfDeadTuples is the size including LVDeadTuples struct > > and dead tuples. > > > > I have reproduced the issue by defining MaxAllocSize as 10240000 and > then during debugging, skipped the check related to LAZY_ALLOC_TUPLES. > After patch, it fixes the problem for me. I have slightly modified > your patch to define the macros on the lines of existing macros > TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP. What do you think > about it? > > Andres, see if you get a chance to run the test again with the > attached patch, otherwise, I will commit it tomorrow morning. > Patch looks fine to me except, we better use parentheses for the variable passed in macro. +#define MAXDEADTUPLES(max_size) \ + ((max_size - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData)) change to -> (((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData)) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
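[Editorial note: putting the parenthesization point above together with the two size macros discussed earlier, the end state could look roughly like this. A sketch only; the names follow the messages in this thread, and the committed spelling may differ in detail.]

    /* Size of the struct header plus cnt dead-tuple pointers */
    #define SizeOfDeadTuples(cnt) \
        add_size(offsetof(LVDeadTuples, itemptrs), \
                 mul_size(sizeof(ItemPointerData), (cnt)))

    /* How many dead-tuple pointers fit into an allocation of max_size bytes */
    #define MAXDEADTUPLES(max_size) \
        (((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))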
On Tue, 21 Jan 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 21, 2020 at 12:51 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 21 Jan 2020 at 16:13, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > SizeOfLVDeadTuplesHeader is not defined by patch. Do you think it > > > makes sense to add a comment here about the calculation? > > > > Oops, it should be SizeOfLVDeadTuples. Attached updated version. > > > > I defined two macros: SizeOfLVDeadTuples is the size of LVDeadTuples > > struct and SizeOfDeadTuples is the size including LVDeadTuples struct > > and dead tuples. > > > > I have reproduced the issue by defining MaxAllocSize as 10240000 and > then during debugging, skipped the check related to LAZY_ALLOC_TUPLES. > After patch, it fixes the problem for me. I have slightly modified > your patch to define the macros on the lines of existing macros > TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP. What do you think > about it? Thank you for updating the patch. Yeah MAXDEADTUPLES is better than what I did in the previous version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 21 Jan 2020 at 18:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have reproduced the issue by defining MaxAllocSize as 10240000 and > > then during debugging, skipped the check related to LAZY_ALLOC_TUPLES. > > After patch, it fixes the problem for me. I have slightly modified > > your patch to define the macros on the lines of existing macros > > TXID_SNAPSHOT_SIZE and TXID_SNAPSHOT_MAX_NXIP. What do you think > > about it? > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than > what I did in the previous version patch. > Pushed after making the change suggested by Dilip. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than > what I did in the previous version patch. > Would you like to resubmit your vacuumdb utility patch for this enhancement? I see some old version of it and it seems to me that you need to update that patch. + if (optarg != NULL) + { + parallel_workers = atoi(optarg); + if (parallel_workers <= 0) + { + pg_log_error("number of parallel workers must be at least 1"); + exit(1); + } + } This will no longer be true. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
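[Editorial note: since PARALLEL 0 is now a valid way of disabling parallel vacuum, the vacuumdb check quoted above would presumably only need to reject negative values. A sketch of how the option handling might change; the error wording and the vacopts.parallel_workers field are assumptions based on the patch under discussion, not confirmed code.]

    case 'P':
        {
            int     parallel_workers = atoi(optarg);

            /*
             * Sketch only: 0 now means "disable parallel vacuum", so reject
             * just the negative values here and let the server validate the
             * rest.
             */
            if (parallel_workers < 0)
            {
                pg_log_error("parallel workers for vacuum must be greater than or equal to zero");
                exit(1);
            }
            vacopts.parallel_workers = parallel_workers;
            break;
        }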
On Wed, 22 Jan 2020 at 11:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than > > what I did in the previous version patch. > > > > Would you like to resubmit your vacuumdb utility patch for this > enhancement? I see some old version of it and it seems to me that you > need to update that patch. > > + if (optarg != NULL) > + { > + parallel_workers = atoi(optarg); > + if (parallel_workers <= 0) > + { > + pg_log_error("number of parallel workers must be at least 1"); > + exit(1); > + } > + } > > This will no longer be true. Attached the updated version patch. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 22 Jan 2020 at 11:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than > > > what I did in the previous version patch. > > > > > > > Would you like to resubmit your vacuumdb utility patch for this > > enhancement? I see some old version of it and it seems to me that you > > need to update that patch. > > > > + if (optarg != NULL) > > + { > > + parallel_workers = atoi(optarg); > > + if (parallel_workers <= 0) > > + { > > + pg_log_error("number of parallel workers must be at least 1"); > > + exit(1); > > + } > > + } > > > > This will no longer be true. > > Attached the updated version patch. > Thanks Sawada-san for the re-based patch. I reviewed and tested this patch. Patch looks good to me. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 22 Jan 2020 at 11:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jan 22, 2020 at 7:14 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Thank you for updating the patch. Yeah MAXDEADTUPLES is better than > > > > what I did in the previous version patch. > > > > > > > > > > Would you like to resubmit your vacuumdb utility patch for this > > > enhancement? I see some old version of it and it seems to me that you > > > need to update that patch. > > > > > > + if (optarg != NULL) > > > + { > > > + parallel_workers = atoi(optarg); > > > + if (parallel_workers <= 0) > > > + { > > > + pg_log_error("number of parallel workers must be at least 1"); > > > + exit(1); > > > + } > > > + } > > > > > > This will no longer be true. > > > > Attached the updated version patch. > > > > Thanks Sawada-san for the re-based patch. > > I reviewed and tested this patch. Patch looks good to me. As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option functionality with older versions(<13) and also I tested vacuumdb by giving "-j" option with "-P". All are working as per expectation and I didn't find any issue with these options. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Attached the updated version patch. > > > > Thanks Sawada-san for the re-based patch. > > > > I reviewed and tested this patch. Patch looks good to me. > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option > functionality with older versions(<13) and also I tested vacuumdb by > giving "-j" option with "-P". All are working as per expectation and I > didn't find any issue with these options. > I have made few modifications in the patch. 1. I think we should try to block the usage of 'full' and 'parallel' option in the utility rather than allowing the server to return an error. 2. It is better to handle 'P' option in getopt_long in the order of its declaration in long_options array. 3. Added an Assert for server version while handling of parallel option. 4. Added a few sentences in the documentation. What do you guys think of the attached? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
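[Editorial note: to illustrate points 1 and 3 above, the client-side guard would sit with the other option checks in vacuumdb's main(), and the version assertion in the path that appends the PARALLEL clause to the generated command. Both fragments are sketches built from the behaviour described in this thread; the assumption that vacopts.parallel_workers is -1 when -P was not given, the "sep"/"comma" separator bookkeeping, and the v13 version cutoff are illustrative, not quotes of the committed code.]

    /*
     * 1. In main(), after option parsing: reject --full together with
     *    --parallel in the client itself, rather than relying on the
     *    server's error.
     */
    if (vacopts.full && vacopts.parallel_workers >= 0)
    {
        pg_log_error("cannot use the \"%s\" option when performing full vacuum",
                     "parallel");
        exit(1);
    }

    /*
     * 3. In the command-building path (vacopts is a pointer there): only
     *    emit PARALLEL for servers that understand it; the caller is
     *    expected to have checked the server version, hence the Assert.
     */
    if (vacopts->parallel_workers >= 0)
    {
        /* PARALLEL is supported since v13 */
        Assert(serverVersion >= 130000);
        appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
                          vacopts->parallel_workers);
        sep = comma;
    }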
On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > >
> > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Attached the updated version patch.
> > >
> > > Thanks Sawada-san for the re-based patch.
> > >
> > > I reviewed and tested this patch. Patch looks good to me.
> >
> > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > functionality with older versions(<13) and also I tested vacuumdb by
> > giving "-j" option with "-P". All are working as per expectation and I
> > didn't find any issue with these options.
> >
>
> I have made few modifications in the patch.
>
> 1. I think we should try to block the usage of 'full' and 'parallel'
> option in the utility rather than allowing the server to return an
> error.
> 2. It is better to handle 'P' option in getopt_long in the order of
> its declaration in long_options array.
> 3. Added an Assert for server version while handling of parallel option.
> 4. Added a few sentences in the documentation.
>
> What do you guys think of the attached?
>
I took one more review round. Below are some review comments:
1.
-P, --parallel=PARALLEL_DEGREE do parallel vacuum
I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we can use like "degree for parallel vacuum"
2. Error message inconsistent for FULL and parallel option:
Error for normal vacuum:
ERROR: cannot specify both FULL and PARALLEL options
Error for vacuumdb:
error: cannot use the "parallel" option when performing full
I think we should use the 2nd error message in both places, as it gives more clarity.
On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor > > <mahi6run@gmail.com> wrote: > > > > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > Attached the updated version patch. > > > > > > > > Thanks Sawada-san for the re-based patch. > > > > > > > > I reviewed and tested this patch. Patch looks good to me. > > > > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option > > > functionality with older versions(<13) and also I tested vacuumdb by > > > giving "-j" option with "-P". All are working as per expectation and I > > > didn't find any issue with these options. > > > > > > > I have made few modifications in the patch. > > > > 1. I think we should try to block the usage of 'full' and 'parallel' > > option in the utility rather than allowing the server to return an > > error. > > 2. It is better to handle 'P' option in getopt_long in the order of > > its declaration in long_options array. > > 3. Added an Assert for server version while handling of parallel option. > > 4. Added a few sentences in the documentation. > > > > What do you guys think of the attached? > > > > I took one more review round. Below are some review comments: > > 1. > -P, --parallel=PARALLEL_DEGREE do parallel vacuum > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we can uselike "degree for parallel vacuum" > I am not sure if 'degree' makes it very clear. How about "use this many background workers for vacuum, if available"? > 2. Error message inconsistent for FULL and parallel option: > Error for normal vacuum: > ERROR: cannot specify both FULL and PARALLEL options > > Error for vacuumdb: > error: cannot use the "parallel" option when performing full > > I think, both the places, we should use 2nd error message as it is giving more clarity. > Which message are you advocating here "cannot use the "parallel" option when performing full" or "cannot specify both FULL and PARALLEL options"? The message used in this patch is mainly because of consistency with nearby messages in the vacuumdb utility. If you are advocating to change "cannot specify both FULL and PARALLEL options" to match what we are using in this patch, then it is better to do that separately and maybe ask for more opinions. I think I understand your desire to use the same message at both places, but it seems to me the messages used in both the places are to maintain consistency with the nearby code or the message used for a similar purpose. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, 25 Jan 2020 at 15:41, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Attached the updated version patch. > > > > > > Thanks Sawada-san for the re-based patch. > > > > > > I reviewed and tested this patch. Patch looks good to me. > > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option > > functionality with older versions(<13) and also I tested vacuumdb by > > giving "-j" option with "-P". All are working as per expectation and I > > didn't find any issue with these options. > > > > I have made few modifications in the patch. > > 1. I think we should try to block the usage of 'full' and 'parallel' > option in the utility rather than allowing the server to return an > error. > 2. It is better to handle 'P' option in getopt_long in the order of > its declaration in long_options array. > 3. Added an Assert for server version while handling of parallel option. > 4. Added a few sentences in the documentation. > > What do you guys think of the attached? Your changes look good me. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 28 Jan 2020 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor
> > > <mahi6run@gmail.com> wrote:
> > > >
> > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> > > > >
> > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > Attached the updated version patch.
> > > > >
> > > > > Thanks Sawada-san for the re-based patch.
> > > > >
> > > > > I reviewed and tested this patch. Patch looks good to me.
> > > >
> > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option
> > > > functionality with older versions(<13) and also I tested vacuumdb by
> > > > giving "-j" option with "-P". All are working as per expectation and I
> > > > didn't find any issue with these options.
> > > >
> > >
> > > I have made few modifications in the patch.
> > >
> > > 1. I think we should try to block the usage of 'full' and 'parallel'
> > > option in the utility rather than allowing the server to return an
> > > error.
> > > 2. It is better to handle 'P' option in getopt_long in the order of
> > > its declaration in long_options array.
> > > 3. Added an Assert for server version while handling of parallel option.
> > > 4. Added a few sentences in the documentation.
> > >
> > > What do you guys think of the attached?
> > >
> >
> > I took one more review round. Below are some review comments:
> >
> > 1.
> > -P, --parallel=PARALLEL_DEGREE do parallel vacuum
> >
> > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we can use like "degree for parallel vacuum"
> >
>
> I am not sure if 'degree' makes it very clear. How about "use this
> many background workers for vacuum, if available"?
If many background workers are available, we use them automatically (parallel vacuum is the default). This option is to put a limit on the background workers (a limit for vacuum workers) to be used by the vacuum process. So I think we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel vacuum workers".
>
> > 2. Error message inconsistent for FULL and parallel option:
> > Error for normal vacuum:
> > ERROR: cannot specify both FULL and PARALLEL options
> >
> > Error for vacuumdb:
> > error: cannot use the "parallel" option when performing full
> >
> > I think, both the places, we should use 2nd error message as it is giving more clarity.
> >
>
> Which message are you advocating here "cannot use the "parallel"
> option when performing full" or "cannot specify both FULL and PARALLEL
> options"? The message used in this patch is mainly because of
I mean that "cannot use the "parallel" option when performing full" should be used in both the places.
> advocating to change "cannot specify both FULL and PARALLEL options"
> to match what we are using in this patch, then it is better to do that
> separately and maybe ask for more opinions. I think I understand your
> desire to use the same message at both places, but it seems to me the
> messages used in both the places are to maintain consistency with the
> nearby code or the message used for a similar purpose.
Okay. I agree with your points. Let's keep it as it is.
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
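To make the outcome above concrete: the two messages live in different layers, and each follows the conventions of its own neighbourhood. Schematically, and only schematically (the variable and field names below are invented for illustration, not the committed code; only the two message strings are taken from the thread), the checks look roughly like this:

/* Backend side, in the VACUUM option processing (schematic): */
if (full && nworkers_requested > 0)
    ereport(ERROR,
            (errcode(ERRCODE_SYNTAX_ERROR),
             errmsg("cannot specify both FULL and PARALLEL options")));

/* Client side, in vacuumdb's option handling (schematic): */
if (vacopts.full && vacopts.parallel_workers >= 0)
{
    pg_log_error("cannot use the \"parallel\" option when performing full");
    exit(1);
}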
On Tue, Jan 28, 2020 at 12:04 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Tue, 28 Jan 2020 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor > > <mahi6run@gmail.com> wrote: > > > > > > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor > > > > <mahi6run@gmail.com> wrote: > > > > > > > > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada > > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > Attached the updated version patch. > > > > > > > > > > > > Thanks Sawada-san for the re-based patch. > > > > > > > > > > > > I reviewed and tested this patch. Patch looks good to me. > > > > > > > > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option > > > > > functionality with older versions(<13) and also I tested vacuumdb by > > > > > giving "-j" option with "-P". All are working as per expectation and I > > > > > didn't find any issue with these options. > > > > > > > > > > > > > I have made few modifications in the patch. > > > > > > > > 1. I think we should try to block the usage of 'full' and 'parallel' > > > > option in the utility rather than allowing the server to return an > > > > error. > > > > 2. It is better to handle 'P' option in getopt_long in the order of > > > > its declaration in long_options array. > > > > 3. Added an Assert for server version while handling of parallel option. > > > > 4. Added a few sentences in the documentation. > > > > > > > > What do you guys think of the attached? > > > > > > > > > > I took one more review round. Below are some review comments: > > > > > > 1. > > > -P, --parallel=PARALLEL_DEGREE do parallel vacuum > > > > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so we canuse like "degree for parallel vacuum" > > > > > > > I am not sure if 'degree' makes it very clear. How about "use this > > many background workers for vacuum, if available"? > > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option is to putlimit on background workers(limit for vacuum workers) to be used by vacuum process. > I don't think that the option is just to specify the max limit because that is generally controlled by guc parameters. This option allows users to specify the number of workers for the cases where he has more knowledge about the size/type of indexes. In some cases, the user might be able to make a better decision and that was the reason we have added this option in the first place. > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel vacuumworkers" > Hmm, I feel what I suggested is better because of the above explanation. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 28 Jan 2020 at 12:32, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 28, 2020 at 12:04 PM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Tue, 28 Jan 2020 at 08:14, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jan 28, 2020 at 2:13 AM Mahendra Singh Thalor > > > <mahi6run@gmail.com> wrote: > > > > > > > > On Sat, 25 Jan 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Fri, Jan 24, 2020 at 4:58 PM Mahendra Singh Thalor > > > > > <mahi6run@gmail.com> wrote: > > > > > > > > > > > > On Thu, 23 Jan 2020 at 15:32, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > > > > > > > > > On Wed, 22 Jan 2020 at 12:48, Masahiko Sawada > > > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > > > Attached the updated version patch. > > > > > > > > > > > > > > Thanks Sawada-san for the re-based patch. > > > > > > > > > > > > > > I reviewed and tested this patch. Patch looks good to me. > > > > > > > > > > > > As offline, suggested by Amit Kapila, I verified vacuumdb "-P" option > > > > > > functionality with older versions(<13) and also I tested vacuumdb by > > > > > > giving "-j" option with "-P". All are working as per expectation and I > > > > > > didn't find any issue with these options. > > > > > > > > > > > > > > > > I have made few modifications in the patch. > > > > > > > > > > 1. I think we should try to block the usage of 'full' and 'parallel' > > > > > option in the utility rather than allowing the server to return an > > > > > error. > > > > > 2. It is better to handle 'P' option in getopt_long in the order of > > > > > its declaration in long_options array. > > > > > 3. Added an Assert for server version while handling of parallel option. > > > > > 4. Added a few sentences in the documentation. > > > > > > > > > > What do you guys think of the attached? > > > > > > > > > > > > > I took one more review round. Below are some review comments: > > > > > > > > 1. > > > > -P, --parallel=PARALLEL_DEGREE do parallel vacuum > > > > > > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum so wecan use like "degree for parallel vacuum" > > > > > > > > > > I am not sure if 'degree' makes it very clear. How about "use this > > > many background workers for vacuum, if available"? > > > > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option is toput limit on background workers(limit for vacuum workers) to be used by vacuum process. > > > > I don't think that the option is just to specify the max limit because > that is generally controlled by guc parameters. This option allows > users to specify the number of workers for the cases where he has more > knowledge about the size/type of indexes. In some cases, the user > might be able to make a better decision and that was the reason we > have added this option in the first place. > > > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel vacuumworkers" > > > > Hmm, I feel what I suggested is better because of the above explanation. Agreed. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 28, 2020 at 12:53 PM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > > > 1. > > > > > -P, --parallel=PARALLEL_DEGREE do parallel vacuum > > > > > > > > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum sowe can use like "degree for parallel vacuum" > > > > > > > > > > > > > I am not sure if 'degree' makes it very clear. How about "use this > > > > many background workers for vacuum, if available"? > > > > > > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option is toput limit on background workers(limit for vacuum workers) to be used by vacuum process. > > > > > > > I don't think that the option is just to specify the max limit because > > that is generally controlled by guc parameters. This option allows > > users to specify the number of workers for the cases where he has more > > knowledge about the size/type of indexes. In some cases, the user > > might be able to make a better decision and that was the reason we > > have added this option in the first place. > > > > > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel vacuumworkers" > > > > > > > Hmm, I feel what I suggested is better because of the above explanation. > > Agreed. > Okay, thanks for the review. Attached is an updated patch. I have additionally run pgindent. I am planning to commit the attached tomorrow unless I see more comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Tue, Jan 28, 2020 at 8:56 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 25 Jan 2020 at 15:41, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I have made few modifications in the patch. > > > > 1. I think we should try to block the usage of 'full' and 'parallel' > > option in the utility rather than allowing the server to return an > > error. > > 2. It is better to handle 'P' option in getopt_long in the order of > > its declaration in long_options array. > > 3. Added an Assert for server version while handling of parallel option. > > 4. Added a few sentences in the documentation. > > > > What do you guys think of the attached? > > Your changes look good me. > Thanks for the review. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, 28 Jan 2020 at 18:47, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 28, 2020 at 12:53 PM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > > > > > 1. > > > > > > -P, --parallel=PARALLEL_DEGREE do parallel vacuum > > > > > > > > > > > > I think, "do parallel vacuum" should be modified. Without specifying -P, we are still doing parallel vacuum sowe can use like "degree for parallel vacuum" > > > > > > > > > > > > > > > > I am not sure if 'degree' makes it very clear. How about "use this > > > > > many background workers for vacuum, if available"? > > > > > > > > If background workers are many, then automatically, we are using them(by default parallel vacuum). This option isto put limit on background workers(limit for vacuum workers) to be used by vacuum process. > > > > > > > > > > I don't think that the option is just to specify the max limit because > > > that is generally controlled by guc parameters. This option allows > > > users to specify the number of workers for the cases where he has more > > > knowledge about the size/type of indexes. In some cases, the user > > > might be able to make a better decision and that was the reason we > > > have added this option in the first place. > > > > > > > So I think, we can use "max parallel vacuum workers (by default, based on no. of indexes)" or "control parallel vacuumworkers" > > > > > > > > > > Hmm, I feel what I suggested is better because of the above explanation. > > > > Agreed. > > > > Okay, thanks for the review. Attached is an updated patch. I have > additionally run pgindent. I am planning to commit the attached > tomorrow unless I see more comments. Thank you for committing it! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 29, 2020 at 7:20 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > > > > Okay, thanks for the review. Attached is an updated patch. I have > > additionally run pgindent. I am planning to commit the attached > > tomorrow unless I see more comments. > > Thank you for committing it! > I have marked this patch as committed in CF. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com